Manipulating Files in C


Standard "C" File Read And Write

This tutorial describes standard C methods for reading files and writing into files. This works portably across all operating systems.
  1. The FILE Structure
  2. Opening And Closing A File
  3. Reading From An Open File
  4. Writing Into An Open File
  5. Moving The Read/Write Location In An Open File
  6. A Complete Example


The FILE Structure

The FILE structure is the basic data type used when handling files with the standard C library. When we open a file, we get a pointer to such a structure, that we later use with all other operations on the file, until we close it. This structure contains information such as the location in the file from which we will read next (or to which we will write next), the read buffer, the write buffer, and so on. Sometimes this structure is also referred to as a "file stream", or just "stream".

Opening And Closing A File

In order to work with a file, we must open it first, using the fopen() function. We specify the path to the file (full path, or relative to the current working directory), as well as the mode for opening the file (open for reading, for writing, for reading and writing, for appending only, etc.). Here are a few examples of how to use it:
/* FILE structure pointers, for the return value of fopen() */
FILE* f_read;
FILE* f_write;
FILE* f_readwrite;
FILE* f_append;

/* Open the file /home/me/data.txt for reading */
f_read = fopen("/home/me/data.txt", "r");
if (!f_read) { /* open operation failed. */
    perror("Failed opening file '/home/me/data.txt' for reading:");
    exit(1);
}

/* Open the file logfile in the current directory for writing. */
/* if the file does not exist, it is being created.            */
/* if the file already exists, its contents is erased.         */
f_write = fopen("logfile", "w");

/* Open the file /usr/local/lib/db/users for both reading and writing    */
/* Any data written to the file is written at the beginning of the file, */
/* over-writing the existing data.                                       */
f_readwrite = fopen("/usr/local/lib/db/users", "r+");

/* Open the file /var/adm/messages for appending.       */
/* Any data written to the file is appended to its end. */
f_append = fopen("/var/adm/messages", "a");
As you can see, the mode of opening the file is given as an abbreviation. More options are documented in the manual page for the fopen() function. The fopen() function returns a pointer to a FILE structure on success, or a NULL pointer in case of failure. The exact reason for the failure may be anything from "file does not exist" (in read mode), "permission denied" (if we don't have permission to access the file or its directory), I/O error (in case of a disk failure), etc. In such a case, the global variable "errno" is being set to the proper error code, and the perror() function may be used to print out a text string related to the exact error code.

Once we are done working with the file, we need to close it. This has two effects:

  1. Flushing any un-saved changes to disk (actually, to the operating system's disk cache).
  2. Freeing any resources associated with the open file.
Closing the file is done with the fclose() function, as follows:
if (!fclose(f_readwrite)) {
    perror("Failed closing file '/usr/local/lib/db/users':");
    exit(1);
}
fclose() returns 0 on success, or EOF (usually '-1') on failure. It will then set "errno" to zero. One may wonder how could closing a file fail - this may happen if any buffered writes were not saved to disk, and are being saved during the close operation. Whether the function succeeded or not, the FILE structure may not be used any more by the program.

Reading From An Open File

Once we have a pointer for an open file's structure, we may read from it using any of several functions. In the following code, assume f_read and f_readwrite pointers to FILE structures returned by previous calls to fopen().
/* variables used by the various read operations.            */
int c;
char buf[201];

/* read a single character from the file.                    */
/* variable c will contain its ASCII code, or the value EOF, */
/* if we encountered the end of the file's data.             */
c = fgetc(f_read);
/* read a single character from the file using fscanf        */
/* similar to scanf                                         */
fscanf(f_read, "%c", &c);

/* read one line from the file. A line is all characters up to a new-line  */
/* character, or up to the end of the file. At most 200 characters will be */
/* read in (i.e. one less then the number we supply to the function call). */
/* The string read in will be terminated by a null character, so that is   */
/* why the buffer was made 201 characters long, not 200. If a new line     */
/* character is read in, it is placed in the buffer, not removed.          */
/* note that 'stdin' is a FILE structure pre-allocated by the */
/* C library, and refers to the standard input of the process (normally    */
/* input from the keyboard).                                              */
fgets(buf, 201, stdin);

/* place the given character back into the given file stream. The next     */
/* operation on this file will return this character. Mostly used by       */
/* parsers that analyze a given text, and try to guess what the next       */
/* is. If they miss their guess, it is easier to push the last character   */
/* back to the file stream, then to make book-keeping operations.          */
ungetc(c, stdin);

/* check if the read/write head has reached past the end of the file.      */
if (feof(f_read)) {
    printf("End of file reached\n");
}

/* read one block of 120 characters from the file stream, into 'buf'.   */
/* (the third parameter to fread() is the number of blocks to read).    */
char buf[120];
if (fread(buf, 120, 1, f_read) != 1) {
    perror("fread");
}
There are various other file reading functions (getc() for example), but you'll be able to learn them from the on-line manual.

Note that when we read in some text, the C library actually reads it from disk in full blocks (with a size of 512 characters, or something else, as optimal for the operating system we work with). For example, if we read 20 consecutive characters using fgetc() 20 times, only one disk operation is made. The rest of the read operations are made from the buffer kept in the FILE structure.


Writing Into An Open File

Just like the read operations, we have write operations as well. They are performed at the current location of the read/write pointer kept in the FILE structure, and are also done in a buffered mode - only if we fill in a full block, the C library's write functions actually write the data to disk. Yet, we can force it to write data at a given time (e.g. if we print to the screen and want partially written lines to appear immediately). In the following example, assume that f_readwrite is a pointer to a FILE structure returned from a previous call to fopen().
/* variables used by the various write operations.   */
int c;
char buf[201];

/* write the character 'a' to the given file.        */
c = 'a';
fputc(c, f_readwrite);

/* write the string "hello world" to the given file. */
strcpy(buf, "hello world");
fputs(buf, f_readwrite);

/* write the string "hi there, mate" to the given file */
/* a new-line in placed in the string, to make the cursor move      */
/* to the next line in the file after writing the string.             */
fprintf(f_readwrite, "hi there, mate\n");

/* write out any buffered writes to the given file stream.          */
fflush(stdout);

/* write once the string "hello, great world. we feel fine!\n" to 'f_readwrite'. */
/* (the third parameter to fwrite() is the number of blocks to write).            */
char buf[100];
strcpy(buf, "hello, great world. we feel fine!\n");
if (fwrite(buf, strlen(buf), 1, f_readwrite) != 1) {
    perror("fwrite");
}
Note that when the output is to the screen, the buffering is done in line mode, i.e. whenever we write a new-line character, the output is being flushed automatically. This is not the case when our output is to a file, or when the standard output is being redirected to a file. In such cases the buffering is done for larger chunks of data, and is said to be in "block-buffered mode".

Moving The Read/Write Location In An Open File

Until now we have seen how input and output is done in a serial mode. However, in various occasions we want to be able to move inside the file, and write to different locations, or read from different locations, without having to scan the whole code. This is common in database files, when we have some index telling us the location of each record of data in the file. Traveling in a file stream in such a manner is also called "random access".

The fseek() function allows us to move the read/write pointer of a file stream to a desired location, stated as the number of bytes from the beginning of the file (or from the end of file, or from the current position of the read/write pointer). The ftell() function tells us the current location of the read/write header of the given file stream. Here is how to use these functions:

/* move the read/write pointer of the file stream to position '30' */
/* in the file. Note that the first position in the file is '0',   */
/* not '1'.                                                        */
fseek(f_read, 29L, SEEK_START);

/* move the read/write pointer of the file stream 25 characters    */
/* forward from its given location.                                */
fseek(f_read, 25L, SEEK_SET);

/* remember the current read/write pointer's position, move it     */
/* to location '520' in the file, write the string "hello world",  */
/* and move the pointer back to the previous location.             */
long old_position = ftell(f_readwrite);
if (old_position < 0) {
    perror("ftell");
    exit(0);
}
if (fseek(f_readwrite, 520L, SEEK_START) < 0) {
    perror("fseek(f_readwrite, 520L, SEEK_SET)");
    exit(0);
}
fputs("hello world", f_readwrite);
if (fseek(f_readwrite, old_position, SEEK_START) < 0) {
    perror("fseek(f_readwrite, old_position, SEEK_SET)");
    exit(0);
}
Note that if we move inside the file with fseek(), any character put to the stream using ungetc() is lost and forgotten.

Note: It is OK to seek past the end of a file. If we will try to read from there, we will get an error, but if we try to write there, the file's size will be automatically enlarged to contain the new data we wrote. All characters between the previous end of file and the newly written data will contain nulls ('\0') when read. Note that the size of the file has grown, but the file itself does not occupy so much space on disk - the system knows to leave "holes" in the file. However, if we try to copy the file to a new location using the Unix "cp" command, the new file will have all wholes filled in, and will occupy much more disk space then the original file.


A Complete Example

Two examples are given for the usage of the standard C library I/O functions. The first example is a file copying program, that reads a given file one line at a time, and writes these lines to a second file. The source code is found in the file stdc-file-copy.c. Note that this program does not check if a file with the name of the target already exists, and thus viciously erases any existing file. Be careful when running it! Later, when discussing the system calls interface, we will see how to avoid this danger.

The second example manages a small database file with fixed-length records (i.e. all records have the same size), using the fseek() function. The source is found in the file stdc-small-db.c. Functions are supplied for reading a record and for writing a record, based on an index number. See the source code for more info. This program uses the fread() and fwrite() functions to read data from the file, or write data to the file. Check the on-line manual page for these functions to see exactly what they do.


Acknowledgement: This notes have been taken from the LUPG tutorial and slightly modified by Mirela Damian.