The Development of a High-level Computer Language and its Associated Compiler
In this paper the development of a new computer language and an associated
compiler is described in four sections. The first section introduces the topic
of compiler design and its importance in undergraduate studies. Section 2 gives
a general overview of the features of the newly designed language and discusses
their respective merits. Sections 3 details the construction of the compiler in
three subsections which describe the three major phases of compiler
construction: the scanner, the symbol table, and the parser. The last section
concludes the paper by describing the educational benefits of the project for
the developers. The project was part of a course in compiler design conducted
at McKendree College in the fall semester of 2002 by Dr. Kian Pokorny.
"The quick, brown fox jumps over the lazy dog." When a person reads a sentence such as this, a series of events occur that allow him/her to understand it. First, the symbols are transmitted to the brain, which recognizes them as letters and groups them into words. The mind then looks up each word in its memory of known words, the person's internal dictionary, and connects each with some meaning. Words are grouped into phrases and grammar rules are applied to those phrases to determine whether or not they make sense in any language the person understands. When a computer program parses a sentence written in English or any other language, it uses a very similar process. The program reads each character and connects groups of them to form words. Then, the program analyzes those words and the ways in which they are combined to see if they comply with a given set of rules that govern the language.
This ability to parse languages is the essence of computing. Every problem a computer may solve is simply a problem of languages. Consider a desktop calculator. At its core, it is a translator. On one side, the user is speaking to it in the language of mathematics. The user might say to it, “2 + 3,” or, “36 - 34,” both of which are valid sentences - they follow the rules for arithmetic expressions. On the other side, the calculator speaks the language of numbers, the answers to the user's sentences. Given the examples above, the calculator should say, “5,” and, “2,” respectively. Should the user speak an improperly structured sentence, such as “3) + 4,” the calculator tells him/her so instead of giving an answer because it cannot easily understand what he/she meant. Because every computer application takes some input and produces some output, they are all merely translating from one language to another.
The focal point of this project is one particular application of language translation in computing, that of programming language compilation. A compiler is a translator between programming languages. Most of the time, a compiler translates from a powerful, human-readable language, such as C++ or Java, into a simple, machine-readable language that a computer may run. The language parsing, or grammatical analysis, techniques that most compilers use are more primitive versions of those used by Microsoft Word when it performs grammar checks on English compositions. Language parsing methods that originated in the study of compiler design are also now used for executing Internet searches and for natural language processing by voice recognition programs. The concepts that are the foundation of the tu language and its compiler are virtually everywhere in modern applications of computing.
There are many other reasons to study compiler design. The most important of these reasons is that compiler construction overlaps many areas of Computer Science, granting students valuable experience in software engineering, a more in-depth understanding of programming languages, and knowledge of the intricacies of machine architecture (Aho, Sethi and Ullman 1). Additionally, many of the techniques used to build compilers have applicability elsewhere, such as in reading structured data, rapidly introducing new formats, and converting files. For instance, PostScript interpreters convert PostScript text, describing how to print something in terms any computer can understand, to printer-specific instructions, and they utilize the translation techniques of compiler design in order to do so (Grune et al. 8). Therefore, in order to broaden understandings of various areas of Computer Science, to gain skills applicable to future projects, and to satisfy simple academic curiosity, a new programming language was designed and a compiler was created to convert it to interpreted machine code. The language was designed with a range of features that have relevant benefits and disadvantages. The compiler for this language was constructed in three parts: a scanner, a symbol table, and a parser.
2 The Construction of a New Language
The new language is called tu. Dr. Kian Pokorny named it thus because of an odd reoccurrence of the number two during its development. tu is a case-sensitive, block-structured language in which each block of code consists of an optional list of declarations followed by one or more statements. Each tu program is contained entirely within its own source file and has this same structure of declarations followed by statements except for the fact that the main program code is undelimited. Each of these statements and declarations must start with the beginning-of-line marker, a colon, because any line not starting with a colon is a comment. A comment may also appear on a line with a statement provided that it occurs after the statement and the comment marker. In tu, comments should dominate a given source file because normal code is delimited while comments are the default. This encourages proper documentation, an absolute requirement of modern software engineering practice that enhances readability and comprehension and thus reduces the potential for flaws in the resulting code by maximizing the possibility of someone spotting them before a release.
tu has arithmetic operators for performing the following functions: addition, subtraction, negation, multiplication, division, modulus, all common comparisons, and all common logical operators. A couple of tu’s sixteen operator symbols perform multiple functions with the same symbol. For example, the minus sign is used for both subtraction and negation. This pluralism allows for a smaller number of operator symbols and a higher level of simplicity, both of which increase readability (Sebesta 9) and make tu easier to master (Ghezzi and Jazayeri 11). In keeping with the theme of simplicity, tu contains a fairly small number of control-flow structures including a conditional execution ('if') statement, a discrete value selection statement ('case'/'switch'/'select'), conditional loops ('do'/'while'), a counting ('for') loop, and an explicitly infinite loop. Much of tu's core constructs were designed in order to maximize utility while minimizing complexity.
Functions are also present in tu and can be defined in the declarations of any level of nesting which is one of the qualities that makes tu an ALGOL-family language (Pokorny). The only basic data types in tu are integers, floating-point numbers, and single characters (vectors of characters function as strings). In tu, the first byte of a string of characters contains the length of the string, which may only be as high as 255 (a limitation of tu's virtual machine), while the remaining bytes contain the actual ASCII-character string. It is important to note that every variable declared in a tu program has a type associated with it at compile-time and that the scope of each variable can be statically determined prior to execution making tu statically typed (Ghezzi and Jazayeri 54) and statically scoped (Sebesta 175), respectively. Also, arrays of any basic type and of any dimensions are legal including multidimensional arrays of characters. Variables may be explicitly declared for placement in static storage, increasing memory usage yet also increasing performance and allowing for persistent function-level data. Finally, tu supports only single-tier pointer variables; a pointer may not point to another pointer, not even one of the same type.
With the exception of single-tier pointers, tu is a very orthogonal language, meaning that it contains a small number of primitive constructs which can be combined in many ways to create valid, meaningful structures in the language (Sebesta 10). Overall program structure in tu is highly orthogonal because allowing nested functions to exist in any block causes the entire program to be, essentially, just a block, with no power beyond that of any inner block. Orthogonality is a valuable trait because it increases simplicity and regularity (MacLennan 13). Regularity is the absence of exceptions to rules that makes learning and using the language easier (MacLennan 11). Largely due to its orthogonal design, tu is quite intuitive.
Unfortunately, the simplicity that makes tu easy to use also results in certain limitations. tu does not have complex data structures, valuable pointer functionality, sophisticated I/O, I/O from or to devices other than the screen, or support for modular programming. All of these omissions seriously limit the power of the language and its usefulness for most purposes. However, it is important to note that creating a compiler for tu was not nearly as large scale of a project as it would have been if the language had had more features. With its simplicity, tu made the construction of a compiler into a manageable project for a group of only two developers under the given time constraints. The small group size was more of a limiting factor in terms of productive potential than the time constraints were.
3 Development of a Compiler
The standard compiler for the tu language is a single-pass, top-down, recursive-descent compiler that translates tu source code into machine code for a stack-based virtual machine developed by Dr. Kian Pokorny. “Single-pass” means that the compiler only reads the code once and produces machine code from the information that it gathers on that one pass (Aho, Sethi and Ullman 20). “Top-down” is a technique for parsing that generates a parse tree beginning at the root (Aho, Sethi and Ullman 41). “Recursive-descent” refers to the use of recursive calls to construct an implicit parse tree (Grune et al. 15). A stack architecture is a computer architecture in which most instructions have zero operands and all operations implicitly reference the stack (Pokorny). The compiler was written in C++ over the course of a one-semester class as a group project. Theoretically, a compiler has four main parts: a scanner, a symbol table, a parser, and a code generator, but, for practical purposes, there were only three major subprojects in the process of creating this compiler: building the scanner, building the symbol table, and creating the parser, which performed the code generation itself.
3.1 The Scanner
The first phase of compiler construction involved the scanner. The scanner performs lexical analysis, meaning that it reads in the source code as a series of characters and recognizes the groups of characters in the code that have some meaning when put together (Wilhelm and Maurer 223). These character groups are called lexemes, and they are analogous to words in a normal written language. The scanner translates these lexemes into tokens, tags that represent words by their structure instead of their meaning (Aho, Sethi and Ullman 56). For example, variable names and function names both fall under the “identifier” token, even though they obviously have totally different meanings. The scanner stores extra information in the symbol table in order to resolve these conflicts. The least interesting task performed by the lexical scanner is that of digesting all comments and some spaces in order to present to the parser only the input that has meaning as a tu program. When lexical analysis is complete, the input, now a stream of tokens, is passed on to the next module, the parser.
To begin scanner construction, a set of tokens and a set of reserved words - specific lexeme strings, each of which corresponds to exactly one unique token - for the language were assembled. Then, a transition graph had to be written for each token. The transition graphs showed how the scanner would process any combination of characters that it might receive into certain tokens. The next step was to actually write the code for the scanner by referring to the transition graphs. To this end, a Scanner class was created that contained a method that produced some token if the correct input was received. Tokens were defined as numeric constants, and reserved words were implemented as tokens associated with particular string literals. The scanner does the lexical analysis by reading in a character, forming a lexeme starting with it, and then filtering out spaces until it finds the next character. Once a valid lexeme is formed, the scanner outputs the token associated with it and moves on, as long as context is unimportant. If the meaning of the lexeme is dependent upon context, the scanner must store some contextual information in the symbol table module before producing the associated token, or the meaning of the lexeme is lost. The scanner and the symbol table work in parallel with the parser, sending it the input tokens along with any additional information needed to understand them.
3.2 Symbol Table
In computing theory, a context-free grammar is one in which the production rules each consist of a single non-terminal symbol that goes to a combination of terminal and non-terminal symbols (Cleaveland and Uzgalis 16). In other words, a given rule may apply anywhere in valid text, regardless of the surrounding symbols - the context. It follows that a context-free language such as tu is one that can be generated by a context-free grammar (Aho, Sethi and Ullman 168). Not all elements of tu are completely independent of context, however. In particular, identifiers are highly dependent upon context. If an identifier comes in the form of a function call, it should refer to a function name; if that same identifier occurs in the form of a variable assignment, it should refer to a variable name. Furthermore, the relative location of the identifier with respect to the block structure of the program determines what code has access to that identifier. Information about the context in which an identifier is used must be stored alongside its all-encompassing “identifier” token. The symbol table is the data structure used by the compiler to store this information and, in so doing, to allow for a transparent degree of context-sensitivity. The symbol table stores the attributes of an identifier, including the type, name, scope, kind, and address (Pokorny). Initially, the symbol table was implemented as a simple fixed-size array of pointers-to-structs inside of a symbol table class, but that design suffered from a truly horrific lack of efficiency and was later changed dramatically. In the new design, the symbol table is represented by a stack of scopes. Each scope is a triple of namespaces, functions, in the mathematical sense, that map identifier names to associated attributes. The mappings are as follows for functions, normal variables, and arrays to their respective data fields:
F: name -> kind x type x staticflag x address x nesting x typesignature
V: name -> kind x type x staticflag x address x nesting
A: name -> kind x type x staticflag x address x nesting x dimensions.
The only field among those listed above that can ever be changed is the one for the address of a function. This mutability is necessary because a function prototype, a pre-declaration of a function used for clarity and necessary for certain recursive configurations, is, essentially, just a function without an address; when the code for the function is generated, the prototype's address is updated to point to that code. The functionality provided by the symbol table is what allows tu to have functions and variables, language features that are nearly absolutely necessary yet do not quite fit in simple language theory.
3.3 The Parser
Before the symbol table was complete, work began on the most complex part of the compiler - the parser. The parser's construction was by far the most difficult portion of the project primarily because it housed the most functionality. The job of a traditional parser is to take the token stream sent by the scanner, examine each token, and determine if its placement is consistent with the grammatical rules of the programming language (Wilhelm and Maurer 266). Our parser, however, must also perform the job of a code generator, writing out instructions in machine language based on the meaning of the corresponding tu code, assuming that it is composed of legal constructs (Pokorny). If, instead, the tokens do not make sense, then the parser must be able to recognize that there was an error in the source code and issue an appropriate error message. In other words, the parser does the vast task of syntax analysis (Wilhelm and Maurer 266) making it the bulkiest of the compiler's components.
Before parser coding could commence, parsing rules had to be precisely determined. Therefore, a formal grammar based on tu's specifications needed to be written to assist in bridging the mental gap between conceptualizing the parser and actually coding it. The grammar was done in Backus-Naur form which is a standardized way of formatting a context-free grammar (Cleaveland and Uzgalis 22). Productions in the grammar have an N:1 correspondence with methods in the Parser class. More often than not, two or more productions are coded within a single method.
Formally, a leftmost derivation is a particular series of production rules that may be applied to a string in which only the leftmost non-terminal in a series of terminals and non-terminals is replaced at each step (Aho, Sethi and Ullman 169). More basically, finding a leftmost derivation involves recognizing a particular construct as quickly as you can, starting from the left. An LL(1) parser is one that parses its input from left to right searching for leftmost derivations while only knowing the first token that lies ahead at a given point in the token stream produced by the scanner. Otherwise stated, an LL(1) parser can determine exactly which rule to apply by looking at only the next token (Wilhelm and Maurer 305). However, the tu compiler is not LL(1) because it relies on knowing more than just the single next token in certain cases. All of tu's constructs can be precisely identified by their first tokens except for three: the function call statement, the pointer-assignment statement, and the assignment statement. These three types of statements all begin with identifiers, so additional tokens are required in order to determine which of them involves the given identifier. As a result, our parser is LL(k) instead of LL(1). LL(k) means that it parses its input from left to right searching for leftmost derivations yet requires some integer, greater than one, of immediately proceeding tokens in order to decide when to apply a production rule (Wilhelm and Maurer 304-05).
Most of the methods for the foundational productions in the grammar do little
more than call methods representing other productions. For example, the
“declaration list” method just walks through tokens calling the “declaration” function for each one until it finds something that is not recognizable as a declaration. The lower-level productions, which generate the vast majority of output code, are far more interesting.
The parser interacts with the symbol table in many of the lower-level production methods. For instance, the “variable identifier list” method, called as part of a variable declaration statement, reads “identifier” tokens from the scanner and enters the data for them into the symbol table. Similarly, the “l-value” method, which parses the left-hand side of an assignment, queries the symbol table information for the “identifier” lexeme it sees in order to ensure that it has been previously declared. Virtually all methods associated with production rules involving functions or variables access the symbol table either directly or indirectly.
The additional task that our parser must perform is that of code generation, the actual production of translated code. Our target language, like most, is extremely low-level. One line of tu code may produce tens of lines of tu virtual machine code, because the virtual machine instructions can only do simple, atomic tasks. tu has many powerful structures that utilize a large amount of logic that is invisible in the tu code yet must exist in the machine code. Consider a counting loop, tu's “dofor.” This structure counts with a variable from an initial value to a final value by a step value. In tu, the programmer simply supplies these things on one line. The resulting machine code, however, must explicitly perform each step:
1.The initial value must be calculated (it could be an expression, like “2 +
2.The initial value must be stored in the given variable.
3.The final value and the step must both be calculated for later use.
4.The current value of the variable is checked to determine whether it is beyond the final value; if so, the loop is skipped.
5.Otherwise, the code within the tu loop is executed.
6.Afterwards, the step value is added to the counter variable, and the process begins again.
As a bare minimum, a single “dofor” line results in the generation of thirty-five lines of machine language code. Thus, the task of code generation is a very complex one that our parser handles.
Code generation was, perhaps, the most challenging aspect of writing the
compiler, and it provided a particularly good learning experience about the
workings of stack architecture. In actuality, all stages of the project
including language design, scanner building, symbol table development, and
parser implementation had significant educational value. For example, the
completion of this project resulted in increased awareness of language parsing
and its countless recognizable, widespread applications, including natural
language processing, HTML parsing, and SQL-based database management systems.
There were many other complex issues, such as symbol table efficiency and
function call sequences, which required a lot of deep thought not only to
comprehend well, but also to resolve. Therefore, a deep understanding of
compilers and their structure was gained through this project while
problem-solving skills were tested and strengthened.
Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Massachusetts: Addison Wesley Longman, Inc. 1988.
Cleaveland, Craig J. and Robert C. Uzgalis. Grammars for Programming Languages. Ed. Thomas E. Cheatham. New York: Elsevier North-Holland, Inc. 1977.
Ghezzi, Carlo and Mehdi Jazayeri. Programming Language Concepts. 3rd ed. New York: John Wiley & Sons. 1998.
Grune, Dick, Henri E. Bal, Ceriel J. H. Jacobs, and Koen G. Langendoen. Modern Compiler Design. Chichester: John Wiley & Sons, Ltd. 2000.
MacLennan, Bruce J. Principles of Programming Languages. 3rd ed. New York: Oxford University Press. 1999.
Pokorny, Kian. "Compiler Design." CSI465: Compiler Design Class. McKendree College, Lebanon, IL. 26 August 2002 to 6 December 2002.
Sebesta, Robert W. Concepts of Programming Languages. 4th ed. Massachusetts: Addison Wesley Longman, Inc. 1999.
Wilhelm, Reinhard and Dieter Maurer. Compiler Design. Trans. Stephen S.
Wilson. Harlow, England: Addison-Wesley Publishing Company. 1995.
Specification for tu
·The language is case-sensitive.
·Normal blocks of statements are enclosed in [ and ]. The imperative stop exists for escaping from blocks:
stop n - Jumps out of the innermost n levels of block structure, where n is a positive integer literal. An n must be specified.
·Comments are the default; statements occur when the first non-whitespace character on a line is :. Comments occurring on the same line as a statement must occur after that statement and the comment delimiter character, |.
·Empty statements such as
and blank lines are allowed but ignored. The NOOP imperative can be used to create a statement that does not execute any operation.
·A tu program is entirely contained within one tu source file. The structure of a tu source file is strictly as follows:
1.Declarations - These include global variable declarations, function prototypes, and function definitions in no specific order.
Note that blocks are statements and that the structure of a tu block is as follows:
1.Declarations - These include local variable declarations, function prototypes, and function definitions in no specific order.
tu contains the following operators:
·+, for addition
·-, for binary subtraction and unary negation
·*, for multiplication and string variable dereferencing
·/, for both floating-point and integer division
·%, for modulus which is not defined for real numbers)
for both assignment (which is actually a form of simple statement, not a normal
operator) and equality testing. When occurring in an expression, = refers
to an equality test. When occurring in the following form:
: lvalue = expression
The first = is actually the symbol for the assignment imperative, which assigns the value of expression into the variable named by lvalue, which is a valid identifier and any array indexing. Note that any other = on the above line will be treated as an equality test because it is inside of an expression.
·=/, for inequality testing
·>, for greater-than testing
·<, for less-than testing
·<_, for less-than-or-equal-to testing
·>_, for greater-than-or-equal-to testing
·and, for logical and
·or, for logical inclusive or
·not, for logical unary not
·xor, for logical exclusive or
·->, for “pointing” (see pointers in the variable section)
·Operator precedence is as follows from first to last done; all parenthesized operators are equivalent in precedence: unary -, (*, /, %), (+, -), (=, =/, <, >, <_, >_), not, and, (or, xor).
·All operators are right-associative except for *, /, %, +, and -, all of which are left-associative.
·The result of any Boolean expression (i.e., one involving and, or, not, xor, =, =/, <, >, <_, or >_) is an integer.
·The result of any arithmetic expression involving at least one real number is a real number.
·The result of an addition or subtraction involving one character and one integer is a character, and the result of a modulus involving one character and one integer is a character. All other arithmetic involving one integer and one character yields an integer result.
·The result of any arithmetic operator applied to only arguments of any one type is always of that same type For example, char * char results in a char.
4. Control Structures
All control structures in tu involve special sequences followed by one or more code blocks. The beginning delimiter of a given block must occur at the end of a statement line. The ending delimiter of a given block must occur by itself on a line, with one exception. When there are two code blocks in a row following an if statement, the ending delimiter of the first block and the beginning delimiter of the second occur in order at the end of one statement line, as in the following example:
:if x < 10 [
:printline “It's less than ten!”][
:printline “It's ten or greater!”]
As shown in the second line above, in this case, the ending delimiter of the first block is actually the second-to-last character on a statement line, with the last character being the beginning delimiter of the next block.
The following control structures exist in tu:
·if expression [ true_block ] [ false_block ] - Conditional execution. If expression evaluates to true (non-zero), true_block is executed. If expression evaluates to false (zero), the optional false_block is executed; if no false_block was specified, then execution continues at the next statement following the true_block.
·select expression [ select_block ] - Discrete value selection. The expression in a select must result in an integer or a character. The only statements in select_block that are executed are those whose option prefix has a value equivalent to expression.
·option value [ option_block ] - Valid only within a select block, option associates the statements in option_block with the integer or character value, value.
§Note that a select block looks like a normal code block, but there is a significant difference in their logic. In a normal code block, all statements are executed in order; in a select block, only certain option blocks are selected for execution.
·do [ loop_block ] - Infinite loop. The statements in loop_block are executed in order indefinitely. The loop may only be terminated with a stop imperative.
·doif expression [ loop_block ] - Conditional loop without guaranteed execution. The statements in loop_block are executed in order repeatedly as long as expression (evaluated before the loop) evaluates to true (non-zero).
·dowhile expression [ loop_block ] - Conditional loop with one guaranteed iteration. The statements in loop_block are executed in order repeatedly as long as expression (evaluated after the loop) evaluates to true (non-zero).
·dofor counter, initial_value, final_value, increment [ loop_block ] - Counting loop. counter is a valid identifier that will be an integer variable defined only for loop_block. The expression, initial_value, will be assigned into counter before anything else. The loop_block code will then be executed repeatedly until counter is less than or equal to (if final_value is less than initial_value) or greater than or equal to (if final_value is greater than initial_value) final_value. After each iteration of loop_block, the value of the expression increment will be added to counter. increment, initial_value and final_value must each return an integer or character value.
Function definitions take the following form:
F name type_signature [ function_body ]
name is the identifier associated with this function, and function_body contains the code associated with this function. The optional type_signature field contains return-value and parameter information, in the following format:
where parameter_list is an optional comma-separated list of variable declarations without initializers and return_spec is an optional return-variable specification of the form:
, returns return_variable
where return_variable is a normal variable declaration (any initialization occurs just before function_body is executed). Function parameters may not be arrays. The parameter variables declared in the parameter_list exist as normally visible variables within only function_body. The return-variable declared in the return_spec also exists as a normally visible variable within only function_body, but the current value of return_variable is returned by the function when it terminates, either at the end of function_body or as the result of a stop imperative.
Here are some examples for the sake of clarification:
·A function named add that takes two integers, x and y, and
returns their sum, z:
:F add int x, int y, returns int z [
:z = x + y
·A function that takes no arguments yet returns a value:
:F magic, returns int m [
:m = 4096
·A function that takes some arguments yet returns nothing:
:F printints int a, int b [
:print a, b]
·A function that takes no arguments and returns nothing:
:F hello [
:printline “Hello, world!”
Function prototypes are, essentially, function definitions, prefixed by P in place of F, with neither variable names nor function bodies. Here are the prototypes for the four example functions above in respective order:
·:P add int, int, returns int
·:P magic, returns int
·:P printints int, int
A function call consists of a valid identifier for a function followed by an argument list enclosed in curly braces. Each argument is an expression and is evaluated before being passed to its function. Note that function calls can be statements as follows:
tu contains the following possible variable types:
·int - Machine-word-sized integer
·rea - Double-precision floating-point number
·cha - Single ASCII character
·Arrays - Arrays of values of a given type also exist. An array type is specified by the size of each array dimension in parentheses occurring after the specification for the type of the constituent values. For example, int(4) is the type of one-dimensional arrays of four integers. rea ptr(10)(5) is the type of two-dimensional 10x5 arrays of pointers to reals. Arrays of characters are used to represent strings. They differ from other types of arrays in that the first position of each array of characters contains the length of that array which may not be more than 255. Note multi-dimensional arrays of chas are allowed, so cha (8) (8) (10) is the type of a 8x8x10 array of characters.
·Boolean values are just integers, with zero representing false and non-zero representing true.
Variables are declared using the V imperative:
V type id_list
where type is a valid variable type (see above) and id_list is a comma-separated list of one or more valid identifiers. V declares all of the identifiers in id_list to be new variables of type type. For example,
:V int x, y, z
declares x, y, and z to be new integer variables, and
:V cha(8)(5)(3) bigarray
declares bigarray to be an 8x5x3 three-dimensional array of characters.
To specify that a variable is static, V is followed by S in declaration. For example,
:V S rea(3) z
creates a static array of reals.
Pointers deserve special attention. A variable is a pointer if its variable type is followed by ptr at declaration. For example,
:V int ptr x
creates the integer pointer x. A normal assignment into a pointer variable actually assigns the given value into the variable at which it points. To specify where a given pointer variable points, use the point operator, ->. For example, the following code changes the value of a to 4:
:V int a
:V int ptr b
:b -> a
:b = 4
There are four imperatives for doing input/output in tu:
1. print print_list - Prints the comma-separated expressions, string literals, and dereferenced string variables in print_list in order to standard output.
2. printline print_list - Acts exactly like print, but adds an end-of-line character after printing the given expressions, string literals, and dereferenced string variables.
3. read variable_list - Reads values from standard input and stores them into the variables in the comma-separated variable_list in order. Note that read can not read in whole arrays but it can read in individual elements from arrays.
4. readline string_list - Reads lines from standard input and stores them into the single-dimensional character arrays in the comma-separated string_list in order. Note that arrays of characters (strings) do not need to be preceded by the dereferencing operator, *, to be read in by readline.