Monday, December 10, 2007

Functions and Global Variables

The program expand processes the files named as its arguments (or its standard input if no file arguments are specified) by expanding hard tab characters (\t, ASCII character 9) to a number of spaces. The default behavior is to set tab stops every eight characters; this can be overridden by a comma- or space-separated numeric list specified using the -t option. An interesting aspect of the program's implementation, and the reason we are examining it, is that it uses all of the control flow statements available in the C family of languages. Figure 2.2 contains the variable and function declarations of expand,[10] Figure 2.3 contains the main code body,[11] and Figure 2.5 (in Section 2.5) contains the two supplementary functions used.[12]

[10] netbsdsrc/usr.bin/expand/expand.c:36–62

[11] netbsdsrc/usr.bin/expand/expand.c:64–151

[12] netbsdsrc/usr.bin/expand/expand.c:153–185
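As a rough illustration of the default behavior described above (tab stops every eight characters), the following minimal sketch computes the column reached after a tab. The function name next_default_stop is hypothetical and does not appear in expand.c, and columns are assumed to be counted from zero.

```c
/*
 * Hypothetical helper, not part of expand.c: returns the column
 * reached after a tab typed at `column` (counted from zero) when
 * tab stops are set every eight columns.
 */
int
next_default_stop(int column)
{
    /* Round up to the next multiple of eight; a tab always
     * advances by at least one column. */
    return (column / 8 + 1) * 8;
}
```

Under these assumptions a tab at column 3 or column 7 moves to column 8, while a tab at column 8 moves to column 16.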

When examining a nontrivial program, it is useful to first identify its major constituent parts. In our case, these are the global variables (Figure 2.2:1) and the functions main (Figure 2.3), getstops (see Figure 2.5:1), and usage (see Figure 2.5:8).

The integer variable nstops and the array of integers tabstops are declared as global variables, outside the scope of function blocks. They are therefore visible to all functions in the file we are examining.

The three function declarations that follow (Figure 2.2:2) declare functions that will appear later within the file. Since some of these functions are used before they are defined, in C/C++ programs the declarations allow the compiler to verify the arguments passed to the function and their return values and generate correct corresponding code. When no forward declarations are given, the C compiler will make assumptions about the function return type and the arguments when the function is first used; C++ compilers will flag such cases as errors. If the following function definition does not match these assumptions, the compiler will issue a warning or error message. However, if the wrong declaration is supplied for a function defined in another file, the program may compile without a problem and fail at runtime.
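The role of a forward declaration can be sketched with a small, self-contained example. The names count_fields and count_commas are hypothetical and are not taken from expand.c; the point is only that the declaration lets the compiler check the call that appears before the definition.

```c
/*
 * Forward declaration: gives the compiler the argument and return
 * types of count_commas before its first use. (Hypothetical name.)
 */
static int count_commas(const char *);

/* Uses count_commas before its definition appears in the file. */
int
count_fields(const char *spec)
{
    return count_commas(spec) + 1;
}

/* The definition must match the forward declaration above. */
static int
count_commas(const char *spec)
{
    int n = 0;

    for (; *spec != '\0'; spec++)
        if (*spec == ',')
            n++;
    return n;
}
```

If the definition were changed to take, say, an int argument, a compiler would typically report the mismatch with the earlier declaration instead of silently generating incorrect code.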



Figure 2.2 Expanding tab stops (declarations).
<-- a
#include
#include
#include
#include
#include

int nstops;
int tabstops[100];

static void getstops(char *);
int main(int, char **);
static void usage(void);



(a) Header files

(1) Global variables

(2) Forward function declarations



Notice how the two functions are declared as static while the variables are not. This means that the two functions are visible only within the file, while the variables are potentially visible to all files comprising the program. Since expand consists only of a single file, this distinction is not important in our case. Most linkers that combine compiled C files are rather primitive; variables that are visible to all program files (that is, not declared as static) can interact in surprising ways with variables with the same name defined in other files. It is therefore a good practice when inspecting code to ensure that all variables needed only in a single file are declared as static.
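The practice recommended above can be sketched as follows. All names here (stop_count, record_stop, record_stops) are hypothetical and do not come from expand.c; the sketch only assumes a source file that needs some private state and exports a single function.

```c
/*
 * File-private state: static gives the variable internal linkage,
 * so a variable named stop_count defined in another source file
 * cannot clash with this one at link time.
 */
static int stop_count;

/* File-private helper, likewise invisible outside this file. */
static void
record_stop(void)
{
    stop_count++;
}

/* The only symbol this file makes visible to the rest of the program. */
int
record_stops(int n)
{
    int i;

    for (i = 0; i < n; i++)
        record_stop();
    return stop_count;
}
```

Only record_stops is visible to the linker here; the counter and the helper can be renamed or removed without affecting any other file.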



Let us now look at the functions comprising expand. To understand what a function (or method) is doing you can employ one of the following strategies.

Guess, based on the function name.

Read the comment at the beginning of the function.

Examine how the function is used.

Read the code in the function body.

Consult external program documentation.

In our case we can safely guess that the function usage will display program usage information and then exit; many command-line programs have a function with the same name and functionality. When you examine a large body of code, you will gradually pick up names and naming conventions for variables and functions. These will help you correctly guess what they do. However, you should always be prepared to revise your initial guesses in light of new evidence that your code reading will inevitably uncover. In addition, when modifying code based on guesswork, you should plan the process that will verify your initial hypotheses. This process can involve checks by the compiler, the introduction of assertions, or the execution of appropriate test cases.
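As one possible way of verifying a hypothesis with assertions, the sketch below checks the guess that the default-tab loop of Figure 2.3 (do ... while (column & 07)) always advances to the next multiple of eight. The function names are hypothetical; the loop is copied from the figure with the output call omitted.

```c
#include <assert.h>

/*
 * Mirrors the default-tab loop of Figure 2.3, counting columns
 * instead of printing spaces. (Hypothetical helper.)
 */
static int
advance_like_expand(int column)
{
    do {
        column++;
    } while (column & 07);
    return column;
}

/*
 * Checks the hypothesis "a tab moves to the next multiple of eight"
 * for columns 0 through max - 1; returns the number of columns checked.
 */
int
check_tab_hypothesis(int max)
{
    int column;

    for (column = 0; column < max; column++)
        assert(advance_like_expand(column) == (column / 8 + 1) * 8);
    return column;
}
```

If the guess were wrong for any column, the assertion would abort the program at the first counterexample, which is the kind of early signal that makes guesswork safe to build on.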



Figure 2.3 Expanding tab stops (main part).
int
main(int argc, char *argv[])
{
    int c, column;
    int n;

    while ((c = getopt(argc, argv, "t:")) != -1) {
        switch (c) {
        case 't':
            getstops(optarg);
            break;
        case '?': default:    /* (a) */
            usage();
        }
    }
    argc -= optind;
    argv += optind;
    do {

        if (argc > 0) {
            if (freopen(argv[0], "r", stdin) == NULL) {
                perror(argv[0]);
                exit(1);
            }
            argc--, argv++;
        }

        column = 0;
        while ((c = getchar()) != EOF) {
            switch (c) {
            case '\t':    /* (b) */
                if (nstops == 0) {
                    do {
                        putchar(' ');
                        column++;
                    } while (column & 07);
                    continue;
                }
                if (nstops == 1) {
                    do {
                        putchar(' ');
                        column++;
                    } while (((column - 1) % tabstops[0]) != (tabstops[0] - 1));
                    continue;
                }
                for (n = 0; n < nstops; n++)
                    if (tabstops[n] > column)
                        break;
                if (n == nstops) {
                    putchar(' ');
                    column++;
                    continue;
                }
                while (column < tabstops[n]) {
                    putchar(' ');
                    column++;
                }
                continue;
            case '\b':    /* (c) */
                if (column)
                    column--;
                putchar('\b');
                continue;
            default:      /* (d) */
                putchar(c);
                column++;
                continue;
            case '\n':    /* (e) */
                putchar(c);
                column = 0;
                continue;
            }             /* (f) */
        }                 /* (g) */
    } while (argc > 0);   /* (h) */
    exit(0);
}



(1) Variables local to main

(2) Argument processing using getopt

(3) Process the -t option

(a) Switch labels grouped together

End of switch block

At least once

(7) Process remaining arguments

Read characters until EOF

(b) Tab character

Process next character

(c) Backspace

(d) All other characters

(e) Newline

(f) End of switch block

(g) End of while block

(h) End of do block



The role of getstops is more difficult to understand. There is no comment, the code in the function body is not trivial, and its name can be interpreted in different ways. Noting that it is used in a single part of the program (Figure 2.3:3) can help us further. The program part where getstops is used is the part responsible for processing the program's options (Figure 2.3:2). We can therefore safely (and correctly in our case) assume that getstops will process the tab stop specification option. This form of gradual understanding is common when reading code; understanding one part of the code can make others fall into place. Based on this form of gradual understanding you can employ a strategy for understanding difficult code similar to the one often used to combine the pieces of a jigsaw puzzle: start with the easy parts.

Exercise 2.7 Examine the visibility of functions and variables in programs in your environment. Can it be improved (made more conservative)?

Exercise 2.8 Pick some functions or methods from the book's CD-ROM or from your environment and determine their role using the strategies we outlined. Try to minimize the time you spend on each function or method. Order the strategies by their success.
