CSC 2400: Computer Systems

A "De-Comment" Program


Purpose

The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) the details of the "de-commenting" task of the C preprocessor, and (3) how to use the GNU/Unix programming tools, especially bash, emacs/pico, and gcc.


Background

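The C preprocessor is an important part of the C programming system. Given a C source code file, the C preprocessor performs three jobs: it merges "physical" lines that end with a backslash into "logical" lines, it removes comments from ("de-comments") the source code, and it handles preprocessor directives (#define, #include, etc.).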

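The de-comment job is substantial. For example, the C preprocessor must be sensitive to the fact that a comment separates tokens (so something must take its place when it is removed), to line numbering (which must be preserved when a multi-line comment is removed), and to string and character literal boundaries (inside which comment delimiters have no special meaning).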


Your Task

Your task is to compose a C program named "decomment" that performs a subset of the de-comment job of the C preprocessor, as defined below.


Functionality

Your program should be a Unix "filter." That is, your program should read characters from the standard input stream, and write characters to the standard output stream and possibly to the standard error stream. Specifically, your program should (1) read text, presumably a C program, from the standard input stream, and (2) write that same text to the standard output stream with each comment replaced by a space. A typical execution of your program from the shell might look like this:

decomment < somefile.c > somefilewithoutcomments.c 
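The filter pattern itself can be sketched in a few lines. The following is only a minimal sketch under stated assumptions, not a solution: it copies its input to its output unchanged, and the function name copy_stream is hypothetical. When it is given stdin and stdout, getc and putc behave like getchar and putchar.

```c
#include <stdio.h>

/* Hypothetical helper: copy every character from in to out unchanged.
   A decomment filter would replace the putc call with its DFA logic. */
void copy_stream(FILE *in, FILE *out)
{
    int c; /* int, not char, so that EOF can be told apart from data */

    while ((c = getc(in)) != EOF)
        putc(c, out);
}
```

Calling copy_stream(stdin, stdout) from main gives a program that can already be run with the redirections shown above; the de-commenting logic then decides what to write for each character read.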

In the following examples, a space character is shown as "s" and a newline character as "n". For instance, "abcsghin" denotes the characters abc, a space, ghi, and a newline.

Your program should replace each comment with a space. Examples:

Standard Input Stream    Standard Output Stream
abc/*def*/ghin           abcsghin
abc/*def*/sghin          abcssghin
abcs/*def*/ghin          abcssghin

Your program should consider text of the form (/* ... */) to be a comment. It should not consider text of the form (// ... ) to be a comment. Example:

Standard Input Stream    Standard Output Stream
abc//defn                abc//defn

Your program should allow a comment to span multiple lines. That is, your program should allow a comment to contain newline characters. Your program should add blank lines as necessary to preserve the original line numbering. Examples:

Standard Input Stream         Standard Output Stream
abc/*defnghi*/jklnmnon        abcsnjklnmnon
abc/*defnghinjkl*/mnonpqrn    abcsnnmnonpqrn
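The line-preservation rule can be stated as: a comment is replaced by one space followed by one newline for every newline it contained. Below is a minimal string-based sketch of that rule; the function name comment_replacement and its buffer contract are assumptions made for illustration. A streaming filter would instead write the space when the comment opens and each newline as it is read.

```c
#include <stddef.h>

/* Hypothetical helper: body is the text between the comment delimiters.
   Writes what should replace the whole comment into out: one space,
   then one newline for each newline inside the comment. Returns the
   number of characters written, not counting the terminating '\0'.
   out must have room for strlen(body) + 2 characters. */
size_t comment_replacement(const char *body, char *out)
{
    size_t n = 0;

    out[n++] = ' ';
    for (; *body != '\0'; body++)
        if (*body == '\n')
            out[n++] = '\n';
    out[n] = '\0';
    return n;
}
```

For the comment body in the first example above (def, newline, ghi), the replacement is a space followed by a single newline, which matches the expected output.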

Your program should not recognize nested comments. Example:

Standard Input Stream       Standard Output Stream
abc/*def/*ghi*/jkl*/mnon    abcsjkl*/mnon

Your program should detect an unterminated comment. If your program detects end-of-file before a comment is terminated, it should write the message "Error: line X: unterminated comment" to the standard error stream. "X" should be the number of the line on which the unterminated comment begins. Examples:

Standard Input Stream    Standard Output Stream    Error Message
abc/*defnghin            abcsnn                    Error:slines1:sunterminatedscommentn
abcdefnghi/*n            abcdefnghisn              Error:slines2:sunterminatedscommentn
ab/c//*def/ghinjkln      ab/c/snn                  Error:slines1:sunterminatedscommentn
abc/*def*ghinjkln        abcsnn                    Error:slines1:sunterminatedscommentn
abc/*defnghi*n           abcsnn                    Error:slines1:sunterminatedscommentn
abc/*defnghi/n           abcsnn                    Error:slines1:sunterminatedscommentn
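Reporting the right line number means remembering the line on which the current comment opened, not the line on which end-of-file was reached. The sketch below illustrates that bookkeeping on an in-memory string; it is a simplification that uses one character of lookahead rather than a DFA, and both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: scan text and return the line number on which an
   unterminated comment begins, or 0 if every comment is terminated. */
int unterminated_comment_line(const char *text)
{
    int line = 1;   /* current line number, 1-based */
    int start = 0;  /* line where the open comment began; 0 = not in one */
    const char *p = text;

    while (*p != '\0') {
        if (start == 0 && p[0] == '/' && p[1] == '*') {
            start = line;   /* remember where this comment opened */
            p += 2;
        } else if (start != 0 && p[0] == '*' && p[1] == '/') {
            start = 0;      /* comment closed */
            p += 2;
        } else {
            if (*p == '\n')
                line++;
            p++;
        }
    }
    return start;
}

/* Write the message required by the specification to err. */
void report_unterminated(FILE *err, int line)
{
    fprintf(err, "Error: line %d: unterminated comment\n", line);
}
```

A caller would pass stderr to report_unterminated only when unterminated_comment_line returns a nonzero value. In a getchar-based filter, the same two counters (current line, line of the open comment) would be updated as each character is read.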

Your program should work for standard input lines of any length.


Design

Design your program as a deterministic finite state automaton (DFA, alias FSA). The FSA concept is described on Wikipedia at http://en.wikipedia.org/wiki/Finite-state_machine.

Generally, a (large) C program should consist of multiple source code files. For this assignment, you need not split your source code into multiple files. Instead you may place all source code in a single source code file. Subsequent assignments will ask you to write programs consisting of multiple source code files.

We suggest that your program use the standard C getchar() function to read characters from the standard input stream. Also use enum or #define constructs to represent the states of your FSA in your C implementation.
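To make the enum suggestion concrete, here is one possible sketch of a four-state DFA for the required behavior. The state names, the step function, and the run_dfa driver are all illustrative assumptions rather than a prescribed design. The sketch reads with getc from a FILE * (equivalent to getchar when given stdin), and it leaves out the unterminated-comment message and the optional literal handling.

```c
#include <stdio.h>

/* States of the DFA (names are illustrative, not prescribed). */
enum State {
    NORMAL,      /* outside any comment */
    SLASH,       /* just read '/' outside a comment */
    IN_COMMENT,  /* inside a comment */
    STAR         /* inside a comment, just read '*' */
};

/* Consume one character c in state s, write any output to out, and
   return the next state. */
enum State step(enum State s, int c, FILE *out)
{
    switch (s) {
    case NORMAL:
        if (c == '/')
            return SLASH;       /* hold the '/' until we know more */
        putc(c, out);
        return NORMAL;
    case SLASH:
        if (c == '*') {
            putc(' ', out);     /* the comment becomes one space */
            return IN_COMMENT;
        }
        putc('/', out);         /* the held '/' was ordinary text */
        if (c == '/')
            return SLASH;       /* this new '/' may start a comment */
        putc(c, out);
        return NORMAL;
    case IN_COMMENT:
        if (c == '\n')
            putc('\n', out);    /* preserve line numbering */
        return (c == '*') ? STAR : IN_COMMENT;
    case STAR:
        if (c == '/')
            return NORMAL;      /* comment terminated */
        if (c == '\n')
            putc('\n', out);
        return (c == '*') ? STAR : IN_COMMENT;
    }
    return s;                   /* not reached */
}

/* Drive the DFA over in. Returns the final state so that a caller can
   check for an unterminated comment (IN_COMMENT or STAR). */
enum State run_dfa(FILE *in, FILE *out)
{
    enum State s = NORMAL;
    int c;

    while ((c = getc(in)) != EOF)
        s = step(s, c, out);
    if (s == SLASH)
        putc('/', out);         /* flush a trailing held '/' */
    return s;
}
```

Keeping each state's behavior in its own case mirrors the ovals and arrows of the drawn DFA: every arrow corresponds to one branch, which makes it easy to check the code against the diagram.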

Logistics

You should create your program on one of our Unix machines (tanner or felix).

Step 1: Design a DFA

Express your DFA using the traditional "ovals and labeled arrows" notation. More precisely, use the same notation as is used in the examples shown in class. Capture as much of the program's logic as you can within your DFA. The more logic you express in your DFA, the better your grade on the DFA will be.

Step 2: Create Source Code

In your csc2400 directory, create another directory called decomment. Your solution should reside in the directory csc2400/decomment. Write your code in a file named decomment.c that implements your DFA.

Step 3: Compile and Test your Source Code

Execute your program multiple times on various input files that test all logical paths through your code.

You should also test your decomment program against its own source code using a command sequence such as this:

decomment < decomment.c > output 

Step 4: Create a readme File

Create a text file named "readme" (not "readme.txt", or "README", or "Readme", etc.) that contains:

Descriptions of your code should not be in the readme file. Instead they should be integrated into your code as comments.

Your readme file should be a plain text file. Don't create your readme file using Microsoft Word or any other word processor.

Step 5: Submit

Hand in a printout of your decomment.c file, your readme file, and a hardcopy of your "ovals and labeled arrows" DFA. A DFA drawn using drawing software (e.g. Microsoft PowerPoint) would be good, but it is sufficient to submit a neatly hand-drawn DFA.


Grading

We will grade your work on two kinds of quality: quality from the user's point of view, and quality from the programmer's point of view. To encourage good coding practices, we will deduct points if gcc generates warning messages.

From the user's point of view, a program has quality if it behaves as it should. The correct behavior of the decomment program is defined by the previous sections of this assignment specification.

From the programmer's point of view, a program has quality if it is well styled and thereby easy to maintain. In part, style is defined by the rules summarized in the Basic Rules of Programming Style document. These additional rules apply:


OPTIONAL Functionality (Bonus Points)

You may expand your code to handle string literals and character literals as follows (just like a real preprocessor does). Text of the form (/* ... */) that occurs within a string literal ("...") should not be considered a comment. Examples:

Standard Input Stream     Standard Output Stream
abc"def/*ghi*/jkl"mnon    abc"def/*ghi*/jkl"mnon
abc/*def"ghi"jkl*/mnon    abcsmnon
abc/*def"ghijkl*/mnon     abcsmnon

Similarly, text of the form (/* ... */) that occurs within a character literal ('...') should not be considered a comment. Examples:

Standard Input Stream     Standard Output Stream
abc'def/*ghi*/jkl'mnon    abc'def/*ghi*/jkl'mnon
abc/*def'ghi'jkl*/mnon    abcsmnon
abc/*def'ghijkl*/mnon     abcsmnon

Note that the C compiler would consider the first of those examples to be erroneous (multiple characters in a character literal). But many C preprocessors would not, and your program should not.

Your program should handle escaped characters within string literals. That is, when your program reads a backslash (\) while processing a string literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ("...\" ...") to be a valid string literal which happens to contain the double quote character. Examples:

Standard Input Stream    Standard Output Stream
abc"def\"ghi"jkln        abc"def\"ghi"jkln
abc"def\'ghi"jkln        abc"def\'ghi"jkln

Similarly, your program should handle escaped characters within character literals. When your program reads a backslash (\) while processing a character literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ('...\' ...') to be a valid character literal which happens to contain the single quote character. Examples:

Standard Input Stream    Standard Output Stream
abc'def\'ghi'jkln        abc'def\'ghi'jkln
abc'def\"ghi'jkln        abc'def\"ghi'jkln
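The escape rule for both kinds of literal can be sketched as a single scan: after a backslash, the next character is skipped without being examined, so an escaped quote never closes the literal. The helper below works on an in-memory string and its name, skip_literal, is hypothetical. In a DFA-based filter the same idea would appear as extra states (for example, "in string" and "in string after backslash").

```c
#include <stddef.h>

/* Hypothetical helper: s[i] must be an opening quote (" or ').
   Returns the index just past the matching closing quote, or the index
   of the terminating '\0' if the literal is unterminated. A backslash
   makes the following character ordinary, so \" and \' do not close
   the literal. */
size_t skip_literal(const char *s, size_t i)
{
    char quote = s[i++];        /* remember which quote opened it */

    while (s[i] != '\0') {
        if (s[i] == '\\' && s[i + 1] != '\0')
            i += 2;             /* skip the backslash and escaped char */
        else if (s[i++] == quote)
            break;              /* consumed the closing quote */
    }
    return i;
}
```

Everything between the opening index and the returned index would be copied to the output verbatim, including any text that looks like a comment.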

Note that the C compiler would consider both of those examples to be erroneous (multiple characters in a character literal). But many C preprocessors would not, and your program should not.

Your program should handle newline characters in C string literals without generating errors or warnings. Examples:

Standard Input Stream             Standard Output Stream
abc"defnghi"jkln                  abc"defnghi"jkln
abc"defnghinjkl"mno/*pqr*/stun    abc"defnghinjkl"mnosstun

Note that a C compiler would consider those examples to be erroneous (newline character in a string literal). But many C preprocessors would not, and your program should not.

Similarly, your program should handle newline characters in C character literals without generating errors or warnings. Examples:

Standard Input Stream             Standard Output Stream
abc'defnghi'jkln                  abc'defnghi'jkln
abc'defnghinjkl'mno/*pqr*/stun    abc'defnghinjkl'mnosstun

Note that a C compiler would consider those examples to be erroneous (multiple characters in a character literal, newline character in a character literal). But many C preprocessors would not, and your program should not.

Your program should handle unterminated string and character literals without generating errors or warnings. Examples:

Standard Input Stream    Standard Output Stream
abc"def/*ghi*/jkln       abc"def/*ghi*/jkln
abc'def/*ghi*/jkln       abc'def/*ghi*/jkln

Note that a C compiler would consider those examples to be erroneous (unterminated string literal, unterminated character literal, multiple characters in a character literal). But many C preprocessors would not, and your program should not.

Acknowledgement

This project was designed by Prof. Jennifer Rexford of Princeton University and slightly modified by Prof. Mirela Damian.