Villanova University
CSC 2400: Computer Systems I

A "De-Comment" Program


Purpose

The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) the details of the "de-commenting" task of the C preprocessor, and (3) how to use the GNU/Unix programming tools, especially bash, emacs, and gcc.


Background

The C preprocessor is an important part of the C programming system. Given a C source code file, the C preprocessor performs three jobs:

  • Merge "physical" lines of source code into "logical" lines. That is, when the preprocessor detects a line that ends with the backslash character, it merges that physical line with the next physical line to form one logical line.
  • Remove comments from ("de-comment") the source code.
  • Handle preprocessor directives (#define, #include, etc.) that reside in the source code.
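
As a small illustration of the first and third jobs, the sketch below spreads one #define directive over two physical lines. The macro name SUM_OF_THREE and the function sum_demo are made up for this example and are not part of the assignment.

```c
/* The backslash at the end of the first physical line makes the
   preprocessor join it with the next physical line, so the #define
   directive is handled as one logical line. */
#define SUM_OF_THREE(a, b, c) \
    ((a) + (b) + (c))

/* Hypothetical helper that expands the macro. */
int sum_demo(void)
{
    return SUM_OF_THREE(1, 2, 3);
}
```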

The de-comment job is substantial. For example, the C preprocessor must be sensitive to:

  • The fact that a comment is a token delimiter. After removing a comment, the C preprocessor must make sure that a whitespace character is in its place.
  • Line numbers. After removing a comment, the C preprocessor sometimes must insert blank lines in its place to preserve the original line numbering.
  • String literal boundaries. The preprocessor must not consider the character sequence (//) to be the start of a comment if it occurs inside a string literal ("...").
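
The first and third points can be seen in ordinary C code. In the sketch below, the function names are invented for illustration, and the first function uses a (/* ... */) comment only because that form can sit between two tokens on one line.

```c
#include <string.h>

/* The comment is the only thing separating the tokens "int" and
   "value". Because the preprocessor leaves whitespace in its place,
   the declaration still reads as: int value = 42; */
int token_delimiter_demo(void)
{
    int/* separates the two tokens */value = 42;
    return value;
}

/* The two slashes are inside a string literal, so they are data and
   do not start a comment. The literal holds the four characters
   a, /, /, b. */
size_t string_literal_demo(void)
{
    const char *text = "a//b";
    return strlen(text);
}
```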

Your Task

Your task is to compose a C program named "decomment" that performs a subset of the de-comment job of the C preprocessor, as defined below.



Functionality

Your program should be a Unix "filter." That is, your program should read characters from the standard input stream, and write characters to the standard output stream. Specifically, your program should (1) read text, presumably a C program, from the standard input stream, (2) write that same text to the standard output stream with each comment replaced by a newline, and (3) write error and warning messages as appropriate to the standard output stream. A typical execution of your program from the shell might look like this:

         decomment < somefile.c > somefilewithoutcomments.c 

To write characters to the standard output stream, use the standard putchar function. A call putchar(nextc) (where nextc is declared as an int variable) is equivalent to either

               printf("%c", nextc);  /* or */

               fprintf(stdout, "%c", nextc);
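
A minimal sketch of the filter pattern follows. It copies its input to its output unchanged, so it is only a starting skeleton and not a solution. The function name copy_stream is hypothetical; it takes FILE pointers so that it can be tried on any stream, and getc(stdin) and putc(c, stdout) behave like getchar() and putchar(c).

```c
#include <stdio.h>

/* Copy every character from in to out unchanged and return the number
   of characters copied. Hypothetical helper, for illustration only. */
long copy_stream(FILE *in, FILE *out)
{
    int nextc;       /* int, not char, so EOF differs from every character */
    long count = 0;

    while ((nextc = getc(in)) != EOF) {
        putc(nextc, out);
        count++;
    }
    return count;
}
```

Calling copy_stream(stdin, stdout) from main gives a program that can be run with redirection in the same way as the decomment command shown above.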

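In the following examples a newline character is shown as "n". The example text is built from the letter groups abc, def, ghi, jkl, mno, and pqr, so any "n" that is not part of the group "mno" denotes a newline.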

A.     Your program should treat only text of the form (//…) as a comment, and should copy text of the form (/* … */) to the output unchanged. Each comment (//…), together with the newline that ends it, should be replaced with a single newline. Examples:

Standard Input Stream          Standard Output Stream

abc//defn                      abcn
abc//def//ghin                 abcn
abc/*def*/ghin                 abc/*def*/ghin

B.     Text of the form (// ...) that occurs within a string literal ("...") should not be considered a comment. Examples:

Standard Input Stream          Standard Output Stream

abc"def//ghi/jkl"mnon          abc"def//ghi/jkl"mnon
abc//def"ghi//jkl"mnon         abcn

C.     Your program should handle escaped characters within string literals. That is, when your program reads a backslash (\) while processing a string literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ("...\" ...") to be a valid string literal which happens to contain the double quote character. Examples:

Standard Input Stream          Standard Output Stream

abc"def\"ghi"jkln              abc"def\"ghi"jkln
abc"def\tghi"jkln              abc"def\tghi"jkln

D.     Your program should handle newline characters in C string literals without generating errors or warnings. Examples:

Standard Input Stream          Standard Output Stream

abc"defnghi"jkln               abc"defnghi"jkln
abc"defnghinjkl"mno//pqrn      abc"defnghinjkl"mnon

Note that a C compiler would consider those examples to be erroneous (newline character in a string literal). But many C preprocessors would not, and your program should not.

E.      Your program should handle unterminated string literals without generating errors or warnings. Examples:

Standard Input Stream          Standard Output Stream

abc"def//ghin                  abc"def//ghin

Note that a C compiler would consider those examples to be erroneous (unterminated string literal). But many C preprocessors would not, and your program should not.

You may assume that the final line of the standard input stream ends with the newline character, as files created with emacs typically do.

Your program may assume that the backslash-newline character sequence does not occur in the standard input stream. That is, your program may assume that logical lines are identical to physical lines in the standard input stream.


Design

Design your program as a finite state automaton (FSA); the steps under Logistics refer to it as a DFA (deterministic finite automaton). The FSA concept is described in the lecture notes and on Wikipedia: http://en.wikipedia.org/wiki/Finite-state_machine.

Generally, a (large) C program should consist of multiple source code files. For this assignment, you need not split your source code into multiple files. Instead you may place all source code in a single source code file. Subsequent assignments will ask you to write programs consisting of multiple source code files.

We suggest that your program use the standard C getchar() function to read characters from the standard input stream. Also use enum or #define constructs to name the states of your FSA in your C implementation.
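
To illustrate this style without giving away the assignment, here is a sketch of a different and much smaller FSA, one that capitalizes the first letter of each word. The state names NORMAL and INWORD and the function next_state are assumptions chosen for this example only.

```c
#include <ctype.h>

/* One named constant per state of the automaton. */
enum Statetype { NORMAL, INWORD };

/* Perform one transition: given the current state and the input
   character c, store the character to be written in *out and
   return the next state. */
enum Statetype next_state(enum Statetype state, int c, int *out)
{
    switch (state) {
    case NORMAL:                 /* between words */
        if (isalpha(c)) {
            *out = toupper(c);   /* first letter of a word */
            return INWORD;
        }
        *out = c;
        return NORMAL;
    case INWORD:                 /* inside a word */
        *out = c;
        return isalpha(c) ? INWORD : NORMAL;
    }
    *out = c;                    /* not reached for valid states */
    return NORMAL;
}
```

A driver would start in the NORMAL state, read each character with getchar() until EOF, call next_state, and write the resulting character with putchar(). Keeping each transition in one switch case makes the code easy to check against the arrows of the diagram.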

Logistics

You should create your program on tanner/degas/rodin using bash, emacs and gcc.

Step 1: Design a DFA

Express your DFA using the traditional "ovals and labeled arrows" notation. More precisely, use the same notation as is used in the examples shown in class. Capture as much of the program's logic as you can within your DFA. The more logic you express in your DFA, the better your grade on the DFA will be.

Step 2: Create Source Code

Use emacs to create source code in a file named decomment.c that implements your DFA. Make sure your file decomment.c is located in the directory csc2400.

Step 3: Preprocess, Compile, Assemble, and Link

Use the gcc command to preprocess, compile, assemble, and link your program. Perform each step individually (the gcc options -E, -S, and -c stop after preprocessing, compiling, and assembling, respectively), and examine the intermediate results to the extent possible.

Step 4: Execute

Execute your program multiple times on various input files that test all logical paths through your code.

You should also test your decomment program against its own source code using a command sequence such as this:

decomment < decomment.c > output 

Step 5: Create a readme File

Use emacs to create a text file named "readme" (not "readme.txt", or "README", or "Readme", etc.) that contains:

  • Your name and the assignment number.
  • A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated.
  • An indication of how much time you spent doing the assignment.
  • Your assessment of the assignment: Did it help you to learn? What did it help you to learn? Do you have any suggestions for improvement? Etc.
  • Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.

Descriptions of your code should not be in the readme file. Instead they should be integrated into your code as comments. Your readme file should be a plain text file. Don't create your readme file using Microsoft Word or any other word processor.

Step 6: Submit

Hand in a printout of your decomment.c file, a printout of your readme file, and a hardcopy of your "ovals and labeled arrows" DFA. A DFA drawn using drawing software (e.g. Microsoft PowerPoint) would be good, but it is sufficient to submit a neatly hand-drawn DFA.


Grading

We will grade your work on two kinds of quality: quality from the user's point of view, and quality from the programmer's point of view. To encourage good coding practices, we will deduct points if gcc generates warning messages.

From the user's point of view, a program has quality if it behaves as it should. The correct behavior of the decomment program is defined by the previous sections of this assignment specification.

From the programmer's point of view, a program has quality if it is well styled and thereby easy to maintain. In part, style is defined by the rules summarized in the Basic Rules of Programming Style document. These additional rules apply:

  • Names: You should use a clear and consistent style for variable and function names. One example of such a style is to prefix each variable name with characters that indicate its type. For example, the prefix "c" might indicate that the variable is of type char, "i" might indicate int, etc. But it is fine to use another style as long as the result is a clear and readable program.
  • Comments: Each source code file should begin with a comment that includes your name, the number of the assignment, and the name of the file. Include comments in your source code to explain what blocks of code do.
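
As a rough sketch of these two rules applied together, the fragment below shows a file header comment and type-prefixed names. The file name vowels.c, the function countVowels, and the "pc" prefix (pointer to char) are assumptions made for this example; the angle-bracket fields are placeholders to fill in.

```c
/*
 * Author: <your name>
 * Assignment: <assignment number>
 * File: vowels.c
 */

#include <string.h>

/* Return the number of lowercase vowels in the string pcText. */
int countVowels(const char *pcText)
{
    int iCount = 0;    /* "i" prefix: int */
    char cNext;        /* "c" prefix: char */

    /* Examine each character up to the terminating null character. */
    while ((cNext = *pcText++) != '\0') {
        if (strchr("aeiou", cNext) != NULL)
            iCount++;
    }
    return iCount;
}
```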

Acknowledgement. This project was originally designed by Prof. Jennifer Rexford of Princeton University and was modified by Prof. Mirela Damian.