Monday, December 10, 2007

while Loops, Conditions, and Blocks

We can now examine how options are processed. Although expand accepts only a single option, it uses the Unix library function getopt to process options. A summarized version of the Unix on-line documentation for the getopt function appears in Figure 2.4. Most development environments provide on-line documentation for library functions, classes, and methods. On Unix systems you can use the man command and on Windows the Microsoft Developer Network Library (MSDN),[13] while the Java API is documented in HTML format as part of the Sun JDK. Make it a habit to read the documentation of library elements you encounter; it will enhance both your code-reading and code-writing skills.

[13] http://msdn.microsoft.com

Figure 2.4. The getopt manual page.


Based on our understanding of getopt, we can now examine the relevant code (Figure 2.3:2). The option string passed to getopt allows for a single option -t, which is to be followed by an argument. getopt is used as a condition expression in a while statement. A while statement will repeatedly execute its body as long as the condition specified in the parentheses is true (in C/C++, if it evaluates to a value other than 0). In our case the condition for the while loop calls getopt, assigns its result to c, and compares it with -1, which is the value used to signify that all options have been processed. To perform these operations in a single expression, the code uses the fact that in the C language family assignment is performed by an operator (=), that is, assignment expressions have a value. The value of an assignment expression is the value stored in the left operand (the variable c in our case) after the assignment has taken place. Many programs will call a function, assign its return value to a variable, and compare the result against some special-case value in a single expression. The following typical example assigns the result of readLine to line and compares it against null (which signifies that the end of the stream was reached).[14]

[14] cocoon/src/java/org/apache/cocoon/components/language/programming/java/Javac.java:106–112



if ((line = input.readLine()) == null) [...]
return errors;
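
The same call, assign, and compare idiom can be sketched in C. The function below is a hypothetical illustration (count_commas is not part of expand): it assigns the result of the standard strchr function to p and compares it against NULL in a single while condition.

```c
#include <string.h>

/* Count the commas in s (hypothetical helper, for illustration only). */
size_t count_commas(const char *s)
{
    const char *p = s;
    size_t n = 0;

    /* Call strchr, store its result in p, and test it against NULL. */
    while ((p = strchr(p, ',')) != NULL) {
        n++;
        p++;    /* continue the search after the comma just found */
    }
    return n;
}
```

For a tab list such as "4,8,16,24" the loop body runs once per comma found.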

It is imperative to enclose the assignment within parentheses, as is the case in the two examples we have examined. As the comparison operators typically used in conjunction with assignments bind more tightly than the assignment, the following expression



c = getopt (argc, argv, "t:") != -1

will evaluate as

c = (getopt (argc, argv, "t:") != -1)

thus assigning to c the result of comparing the return value of getopt against -1 rather than the getopt return value. In addition, the variable used for assigning the result of the function call should be able to hold both the normal function return values and any exceptional values indicating an error. Thus functions such as getopt and getc, which return characters but can also return an exceptional value such as -1 or EOF, typically have their results stored in an integer variable rather than a character variable, so that the variable can hold the superset of all characters and the exceptional value (Figure 2.3:7). The following is another typical use of the same construct, which copies characters from the file stream pf to the file stream active until the pf end of file is reached.[15]

[15] netbsdsrc/usr.bin/m4/eval.c:601–602



while ((c = getc(pf)) != EOF)
    putc(c, active);
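
The effect of the missing parentheses can be observed with a minimal sketch. The function next_option is a hypothetical stand-in for a function such as getopt; it is assumed to return a character, or -1 when nothing is left.

```c
/* Hypothetical stand-in for a function such as getopt. */
int next_option(void)
{
    return 't';
}

/* Without parentheses the comparison is performed first. */
int unparenthesized(void)
{
    int c;

    c = next_option() != -1;    /* parsed as c = (next_option() != -1) */
    return c;                   /* the truth value of the comparison */
}

/* With parentheses c receives the function's return value. */
int parenthesized(void)
{
    int c;                      /* int: holds any character and also -1 */

    if ((c = next_option()) != -1)
        return c;
    return -1;
}
```

Here unparenthesized yields 1, the result of the comparison, while parenthesized yields the character 't'.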

The body of a while statement can be either a single statement or a block: one or more statements enclosed in braces. The same is true for all statements that control the program flow, namely, if, do, for, and switch. Programs typically indent lines to show the statements that form part of the control statement. However, the indentation is only a visual clue for the human program reader; if no braces are given, the control will affect only the single statement that follows the respective control statement, regardless of the indentation. As an example, the following code does not do what is suggested by its indentation.[16]

[16] netbsdsrc/usr.sbin/timed/timed/timed.c:564–568



for (ntp = nettab; ntp != NULL; ntp = ntp->next) {
    if (ntp->status == MASTER)
        rmnetmachs(ntp);
        ntp->status = NOMASTER;
}

The line ntp->status = NOMASTER; will be executed for every iteration of the for loop and not just when the if condition is true.
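
The difference can be sketched with a simplified array-based version. All names below (the status constants and both functions) are hypothetical; the first function is indented the way the compiler actually groups the statements, and the second adds the braces that the original indentation suggests were intended.

```c
enum { NOMASTER, SLAVE, MASTER };   /* hypothetical status values */

/* As compiled: only the counter update is controlled by the if. */
int reset_as_compiled(int status[], int n)
{
    int i, masters = 0;

    for (i = 0; i < n; i++) {
        if (status[i] == MASTER)
            masters++;
        status[i] = NOMASTER;       /* executed on every iteration */
    }
    return masters;
}

/* As probably intended: braces make both statements conditional. */
int reset_with_braces(int status[], int n)
{
    int i, masters = 0;

    for (i = 0; i < n; i++) {
        if (status[i] == MASTER) {
            masters++;
            status[i] = NOMASTER;   /* executed only for masters */
        }
    }
    return masters;
}
```

Both functions count the same number of masters, but only the second leaves the non-master entries untouched.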

Exercise 2.9 Discover how the editor you are using can identify matching braces and parentheses. If it cannot, consider switching to another editor.

Exercise 2.10 The source code of expand contains some superfluous braces. Identify them. Examine all control structures that do not use braces and mark the statements that will get executed.

Exercise 2.11 Verify that the indentation of expand matches the control flow. Do the same for programs in your environment.

Exercise 2.12 The Perl language mandates the use of braces for all its control structures. Comment on how this affects the readability of Perl programs.

switch Statements

The normal return values of getopt are handled by a switch statement. You will find switch statements used when a number of discrete integer or character values are being processed. The code to handle each value is preceded by a case label. When the value of the expression in the switch statement matches the value of one of the case labels, the program will start to execute statements from that point onward. If none of the label values match the expression value and a default label exists, control will transfer to that point; otherwise, no code within the switch block will get executed. Note that additional labels encountered after transferring execution control to a label will not terminate the execution of statements within the switch block; to stop processing code within the switch block and continue with statements outside it, a break statement must be executed. You will often see this feature used to group case labels together, merging common code elements. In our case when getopt returns 't', the statements that handle -t are executed, with break causing a transfer of execution control immediately after the closing brace of the switch block (Figure 2.3:4). In addition, we can see that the code for the default switch label and the error return value '?' is common since the two corresponding labels are grouped together.



When the code for a given case or default label does not end with a statement that transfers control out of the switch block (such as break, return, or continue), the program will continue to execute the statements following the next label. When examining code, look out for this error. In rare cases the programmer might actually want this behavior. To alert maintainers to that fact, it is common to mark these places with a comment, such as FALLTHROUGH, as in the following example.[17]

[17] netbsdsrc/bin/ls/ls.c:173–178



case 'a':
    fts_options |= FTS_SEEDOT;
    /* FALLTHROUGH */
case 'A':
    f_listdot = 1;
    break;

The code above comes from the option processing of the Unix ls command, which lists files in a directory. The option -A will include in the list files starting with a dot (which are, by convention, hidden), while the option -a modifies this behavior by adding to the list the two directory entries . and .. (the current and the parent directory). Programs that automatically verify source code against common errors, such as the Unix lint command, can use the FALLTHROUGH comment to suppress spurious warnings.
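
A self-contained sketch of the same pattern follows. The struct and function names are hypothetical simplifications, not the actual ls data structures; the point is only that 'a' performs its own action and then falls through into the code shared with 'A'.

```c
/* Hypothetical, simplified stand-in for the ls option flags. */
struct listing {
    int see_dot;    /* also list . and .. */
    int list_dot;   /* list names starting with a dot */
};

void apply_option(int c, struct listing *l)
{
    switch (c) {
    case 'a':
        l->see_dot = 1;
        /* FALLTHROUGH */
    case 'A':
        l->list_dot = 1;
        break;
    default:
        break;      /* other options are ignored in this sketch */
    }
}
```

Calling apply_option with 'A' sets only list_dot, whereas 'a' sets both fields.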

A switch statement lacking a default label will silently ignore unexpected values. Even when one knows that only a fixed set of values will be processed by a switch statement, it is good defensive programming practice to include a default label. Such a default label can catch programming errors that yield unexpected values and alert the program maintainer, as in the following example.[18]

[18] netbsdsrc/usr.bin/at/at.c:535–561



switch (program) {
case ATQ:
    [...]
case BATCH:
    writefile(time(NULL), 'b');
    break;
default:
    panic("Internal error");
    break;
}
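
The defensive default label can be sketched in a form that is easy to test. The enumeration and function below are hypothetical; instead of calling a panic function, the default label reports the unexpected value to the caller by returning NULL.

```c
#include <stddef.h>

enum mode { MODE_READ, MODE_WRITE };    /* hypothetical values */

/* Map a mode to its name; NULL signals a value that should never occur. */
const char *mode_name(int mode)
{
    switch (mode) {
    case MODE_READ:
        return "read";
    case MODE_WRITE:
        return "write";
    default:
        return NULL;    /* unexpected value: let the caller notice it */
    }
}
```

Without the default label an out-of-range value would fall out of the switch block unnoticed.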

In our case the switch statement can handle two getopt return values.

't' is returned to handle the -t option. optarg will point to the argument of -t. The processing is handled by calling the function getstops with the tab specification as its argument.

'?' is returned when an unknown option or another error is found by getopt. In that case the usage function will print program usage information and exit the program.

A switch statement is also used as part of the program's character-processing loop (Figure 2.3:7). Each character is examined and some characters (the tab, the newline, and the backspace) receive special processing.

Exercise 2.13 The code body of switch statements in the source code collection is formatted differently from the other statements. Express the formatting rule used, and explain its rationale.

Exercise 2.14 Examine the handling of unexpected values in switch statements in the programs you read. Propose changes to detect errors. Discuss how these changes will affect the robustness of programs in a production environment.

Exercise 2.15 Is there a tool or a compiler option in your environment for detecting missing break statements in switch code? Use it, and examine the results on some sample programs.

for Loops

To complete our understanding of how expand processes its command-line options, we now need to examine the getstops function. Although the role of its single cp argument is not obvious from its name, it becomes apparent when we examine how getstops is used. getstops is passed the argument of the -t option, which is a list of tab stops, for example, 4, 8, 16, 24. The strategies outlined for determining the roles of functions (Section 2.2) can also be employed for their arguments. Thus a pattern for reading code slowly emerges. Code reading involves many alternative strategies: bottom-up and top-down examination, the use of heuristics, and review of comments and external documentation should all be tried as the problem dictates.

After setting nstops to 0, getstops enters a for loop. Typically a for loop is specified by an expression to be evaluated before the loop starts, an expression to be evaluated before each iteration to determine if the loop body will be entered, and an expression to be evaluated after the execution of the loop body. for loops are often used to execute a body of code a specific number of times.[19]

[19] cocoon/src/java/org/apache/cocoon/util/StringUtils.java:85

for (i = 0; i < len; i++) {

Loops of this type appear very frequently in programs; learn to read them as "execute the body of code len times." On the other hand, any deviation from this style, such as an initial value other than 0 or a comparison operator other than <, should alert you to carefully reason about the loop's behavior. Consider the number of times the loop body is executed in the following examples.



Loop extrknt + 1 times:[20]

[20] netbsdsrc/usr.bin/fsplit/fsplit.c:173

for (i = 0; i <= extrknt; i++)

Loop month - 1 times:[21]

[21] netbsdsrc/usr.bin/cal/cal.c:332

for (i = 1; i < month; i++)

Loop nargs times:[22]

[22] netbsdsrc/usr.bin/apply/apply.c:130

for (i = 1; i <= nargs; i++)

Note that the last expression need not be an increment operator. The following line will loop 256 times, decrementing code in the process:[23]

[23] netbsdsrc/usr.bin/compress/zopen.c:510

for (code = 255; code >= 0; code--) {
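
The iteration counts quoted above can be checked with a small sketch. The function names are hypothetical; each function runs one of the loop forms with an empty body and returns how many times the body was entered. Note that the counting-down form relies on a signed counter: with an unsigned variable the condition code >= 0 would always be true.

```c
/* for (i = 0; i < n; i++): n times */
int from_zero_lt(int n)
{
    int i, count = 0;

    for (i = 0; i < n; i++)
        count++;
    return count;
}

/* for (i = 0; i <= n; i++): n + 1 times */
int from_zero_le(int n)
{
    int i, count = 0;

    for (i = 0; i <= n; i++)
        count++;
    return count;
}

/* for (i = 1; i < n; i++): n - 1 times */
int from_one_lt(int n)
{
    int i, count = 0;

    for (i = 1; i < n; i++)
        count++;
    return count;
}

/* for (i = 1; i <= n; i++): n times */
int from_one_le(int n)
{
    int i, count = 0;

    for (i = 1; i <= n; i++)
        count++;
    return count;
}

/* for (i = start; i >= 0; i--): start + 1 times */
int down_to_zero(int start)
{
    int i, count = 0;

    for (i = start; i >= 0; i--)
        count++;
    return count;
}
```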

In addition, you will find for statements used to loop over result sets returned by library functions. The following loop is performed for all files in the directory dir.[24]

[24] netbsdsrc/usr.bin/ftp/complete.c:193–198



if ((dd = opendir(dir)) == NULL)
    return (CC_ERROR);
for (dp = readdir(dd); dp != NULL; dp = readdir(dd)) {

The call to opendir returns a value that can be passed to readdir to sequentially access each directory entry of dir. When there are no more entries in the directory, readdir will return NULL and the loop will terminate.
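
A complete function built around the same loop might look as follows. The function count_entries is a hypothetical example that uses the POSIX opendir, readdir, and closedir calls; it returns -1 instead of the CC_ERROR value used by the ftp code.

```c
#include <dirent.h>     /* opendir, readdir, closedir (POSIX) */
#include <stddef.h>

/* Return the number of entries in dir, or -1 if it cannot be opened. */
long count_entries(const char *dir)
{
    DIR *dd;
    struct dirent *dp;
    long n = 0;

    if ((dd = opendir(dir)) == NULL)
        return -1;
    /* readdir returns NULL when no more entries are available */
    for (dp = readdir(dd); dp != NULL; dp = readdir(dd))
        n++;
    closedir(dd);
    return n;
}
```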

The three parts of the for specification are expressions and not statements. Therefore, if more than one operation needs to be performed when the loop begins or at the end of each iteration, the expressions cannot be grouped together using braces. You will, however, often find expressions grouped together using the expression-sequencing comma (,) operator.[25]

[25] netbsdsrc/usr.bin/vi/vi/vs_smap.c:389



for (cnt = 1, t = p; cnt <= cnt_orig; ++t, ++cnt) {

The value of two expressions joined with the comma operator is just the value of the second expression. In our case the expressions are evaluated only for their side effects: before the loop starts, cnt will be set to 1 and t to p, and after every loop iteration t and cnt will be incremented by one.
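
As a further illustration of the comma operator in a for statement, the following sketch (a hypothetical reverse function, not taken from the vi source) initializes two indices before the loop starts and moves both after every iteration.

```c
#include <string.h>

/* Reverse the string s in place. */
void reverse(char *s)
{
    size_t len = strlen(s);
    size_t i, j;

    if (len == 0)
        return;
    /* Two initializations and two updates, each joined by a comma. */
    for (i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = s[i];

        s[i] = s[j];
        s[j] = tmp;
    }
}
```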

Any expression of a for statement can be omitted. When the second expression is missing, it is taken as true. Many programs use a statement of the form for (;;) to perform an "infinite" loop. Very seldom are such loops really infinite. The following example—taken out of init, the program that continuously loops, controlling all Unix processes—is an exception.[26]

[26] netbsdsrc/sbin/init/init.c:540–545

for (;;) {
    s = (state_t) (*s)();
    quiet = 0;
}

Figure 2.5 Expanding tab stops (supplementary functions).

static void
getstops(char *cp)
{
    int i;

    nstops = 0;
    /* 1: Parse tab stop specification */
    for (;;) {
        i = 0;
        /* 2: Convert string to number */
        while (*cp >= '0' && *cp <= '9')
            i = i * 10 + *cp++ - '0';
        /* 3: Complain about unreasonable specifications */
        if (i <= 0 || i > 256) {
bad:
            fprintf(stderr, "Bad tab stop spec\n");
            exit(1);
        }
        /* 4: Verify ascending order */
        if (nstops > 0 && i <= tabstops[nstops-1])
            goto bad;
        tabstops[nstops++] = i;
        /* 5: Break out of the loop */
        if (*cp == 0)
            break;
        /* 6: Verify valid delimiters */
        if (*cp != ',' && *cp != ' ')
            goto bad;
        cp++;
    }
    /* 7: break will transfer control here */
}

/* 8: Print program usage and exit */
static void
usage(void)
{
    (void)fprintf (stderr, "usage: expand [-t tablist] [file ...]\n");
    exit(1);
}

In most cases an "infinite" loop is a way to express a loop whose exit condition(s) cannot be specified at its beginning or its end. These loops are typically exited either by a return statement that exits the function, a break statement that exits the loop body, or a call to exit or a similar function that exits the entire program. C++, C#, and Java programs can also exit such loops through an exception (see Section 5.2).



A quick look through the code of the loop in Figure 2.5 provides us with the possible exit routes.

A bad stop specification will cause the program to terminate with an error message (Figure 2.5:3).

The end of the tab specification string will break out of the loop.
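
The exit routes become easier to exercise in a variant that reports errors to its caller. The following sketch is an adaptation, not the expand code: parse_stops, its stops and max parameters, the capacity check, and the early rejection of values above 256 while digits are still being read are all assumptions added for illustration, and bad specifications return -1 instead of printing a message and calling exit.

```c
/*
 * Parse a tab stop list such as "4,8,16,24" into stops[].
 * Returns the number of stops stored, or -1 for a bad specification.
 */
int parse_stops(const char *cp, int stops[], int max)
{
    int n = 0;

    for (;;) {
        int i = 0;

        while (*cp >= '0' && *cp <= '9') {
            i = i * 10 + (*cp++ - '0');
            if (i > 256)
                return -1;  /* exit route: value too large */
        }
        if (i <= 0)
            return -1;      /* exit route: missing or zero value */
        if (n > 0 && i <= stops[n - 1])
            return -1;      /* exit route: not in ascending order */
        if (n == max)
            return -1;      /* exit route: no room left (added check) */
        stops[n++] = i;
        if (*cp == '\0')
            break;          /* exit route: end of the specification */
        if (*cp != ',' && *cp != ' ')
            return -1;      /* exit route: invalid delimiter */
        cp++;
    }
    return n;               /* break transfers control here */
}
```

As in the original, every route out of the loop is either the single break at the end of the string or an error exit.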

Exercise 2.16 The for statement in the C language family is very flexible. Examine the source code provided to create a list of ten different uses.

Exercise 2.17 Express the examples in this section using while instead of for. Which of the two forms do you find more readable?

Exercise 2.18 Devise a style guideline specifying when while loops should be used in preference to for loops. Verify the guideline against representative examples from the book's CD-ROM.
