Strings in C

Review of Strings in C

A string in C is a sequence of zero or more characters followed by the null character '\0'. A null character is quite different conceptually from a null pointer (NULL), although both are represented by the integer 0.

String-valued variables are usually declared to be pointers of type char *. Such variables do not include memory space for the text of a string. Memory space must be allocated by means of an array declaration, or a string constant, or dynamic memory allocation. It is up to you to store the address of the chosen memory space into the pointer variable. Examples:

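Constant string:

char * p = "hello";

Array declaration:

char buf[6];
char * p = &buf[0];
strcpy(buf, "hello");

Dynamic memory allocation:

char * p = NULL;
p = (char *)malloc(strlen("hello")+1);
strcpy(p, "hello");

In each of these cases, six bytes of memory end up holding the string: the five characters 'h', 'e', 'l', 'l', 'o' followed by the null character '\0'.

Note that the first line of code in the dynamic memory allocation example stores a null pointer in the pointer variable. A null pointer does not point to any object, so attempting to access the string it "points to" is undefined behavior, and on most systems the program crashes. The same danger applies if malloc fails, because malloc returns a null pointer when it cannot allocate the requested memory.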

Important note: the C standard library functions that operate on strings require their string arguments to be terminated by a null character.

Standard C string library

Operations on strings (or arrays of characters) are an important part of many programs. The GNU C library provides an extensive set of string utility functions, including functions for copying, concatenating, comparing, and searching strings. To use these functions, include the header file string.h in your program.

Function: size_t strlen (const char *s);

The strlen function returns the length of the null-terminated string s (not counting the null character). In other words, it returns the offset of the terminating null character within the array. For example,

strlen ("hello, world")
    => 12

When applied to a character array, the strlen function returns the length of the string stored there, not the number of bytes allocated for the string. You can get the size of the character array that holds a string using the sizeof operator:

char string[32] = "hello, world"; 
sizeof (string)
    => 32
strlen (string)
    => 12

Function: char * strcpy (char *to, const char *from);

This copies characters from the string from (up to and including the terminating null character) into the string to. Like memcpy, this function has undefined results if the strings overlap. strcpy does no bounds checking, so to must have room for at least strlen(from) + 1 bytes. The return value is the value of to. Example:

char buf[10];
char * p = "hello";
strcpy(buf, p);   /* buf: 'h' 'e' 'l' 'l' 'o' '\0', followed by 4 bytes strcpy leaves untouched */

Function: char * strncpy (char *to, const char *from, size_t size);

This function is similar to strcpy but always copies exactly size characters into to.

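If the length of from is size or more, then strncpy copies just the first size characters and does not add a terminating null character, so to does not hold a valid string until you terminate it yourself.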

If the length of from is less than size, then strncpy copies all of from, followed by enough null characters to add up to size characters in all. This behavior is rarely useful, but it is specified by the ANSI C standard. Example:

char buf[10];
char* p = "hello";
strncpy(buf, p, 2);   /* buf: 'h' 'e', followed by 8 untouched bytes; no '\0' is written */

Function: char * strcat (char *to, const char *from);

The strcat function is similar to strcpy, except that the characters from from are concatenated or appended to the end of to, instead of overwriting it. That is, the first character from from overwrites the null character marking the end of to. Example:

char buf[10] = "Say";
strcat(buf, " hello");   /* buf: 'S' 'a' 'y' ' ' 'h' 'e' 'l' 'l' 'o' '\0' (all 10 bytes used) */

Function: char * strncat (char *to, const char *from, size_t size);

This function is like strcat except that not more than size characters from from are appended to the end of to. A single null character is also always appended to to, so the total allocated size of to must be at least size + 1 bytes longer than its initial length.

Here is an example showing the use of strncpy and strncat. Notice how, in the call to strncat, the size parameter is computed to avoid overflowing the character array buffer.

#include <string.h>
#include <stdio.h>

#define SIZE 10

static char buffer[SIZE];

int
main (void)
{
  strncpy (buffer, "hello", SIZE);
  printf ("%s\n", buffer);
  strncat (buffer, ", world", SIZE - strlen (buffer) - 1);
  printf ("%s\n", buffer);
  return 0;
}

The output produced by this program looks like:

hello
hello, wo

Function: int strcmp (const char *s1, const char *s2);

The strcmp function compares the string s1 against s2, returning a value that has the same sign as the difference between the first differing pair of characters (interpreted as unsigned char objects, then promoted to int).

If the two strings are equal, strcmp returns 0.

A consequence of the ordering used by strcmp is that if s1 is an initial substring of s2, then s1 is considered to be "less than" s2.

Function: int strncmp (const char *s1, const char *s2, size_t size);

This function is similar to strcmp, except that no more than size characters are compared. In other words, if the two strings are the same in their first size characters, the return value is zero.

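Here are some examples showing the use of strcmp and strncmp. These examples assume the use of the ASCII character set. The C standard guarantees only the sign of the result; the exact values shown are those of an implementation that returns the difference between the first pair of differing characters.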

strcmp ("hello", "hello")
    => 0    /* These two strings are the same. */
strcmp ("hello", "Hello")
    => 32   /* Comparisons are case-sensitive. */
strcmp ("hello", "world")
    => -15  /* The character 'h' comes before 'w'. */
strcmp ("hello", "hello, world")
    => -44  /* Comparing a null character against a comma. */
strncmp ("hello", "hello, world", 5)
    => 0    /* The initial 5 characters are the same. */
strncmp ("hello, world", "hello, wide world!!!", 5)
    => 0    /* The initial 5 characters are the same. */

Function: char * strtok (char *newstring, const char *delimiters);

A string can be split into tokens by making a series of calls to the function strtok.

The string to be split up is passed as the newstring argument on the first call only. The strtok function uses this to set up some internal state information. Subsequent calls to get additional tokens from the same string are indicated by passing a null pointer as the newstring argument. Calling strtok with another non-null newstring argument reinitializes the state information. It is guaranteed that no other library function ever calls strtok behind your back (which would mess up this internal state information).

The delimiters argument is a string that specifies a set of delimiters that may surround the token being extracted. All the initial characters that are members of this set are discarded. The first character that is not a member of this set of delimiters marks the beginning of the next token. The end of the token is found by looking for the next character that is a member of the delimiter set. This character in the original string newstring is overwritten by a null character, and the pointer to the beginning of the token in newstring is returned.

On the next call to strtok, the searching begins at the next character beyond the one that marked the end of the previous token. Note that the set of delimiters does not have to be the same on every call in a series of calls to strtok.

If the end of the string newstring is reached, or if the remainder of the string consists only of delimiter characters, strtok returns a null pointer.

Warning: Since strtok alters the string it is parsing, you should always copy the string to a temporary buffer before parsing it with strtok. If you allow strtok to modify a string that came from another part of your program, you are asking for trouble; that string may be part of a data structure that could be used for other purposes during the parsing, when alteration by strtok makes the data structure temporarily inaccurate.

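The string that you are operating on might even be a string constant. Modifying a string constant is undefined behavior, and on most systems string constants are stored in read-only memory, so when strtok tries to modify one, your program gets a fatal signal for writing in read-only memory.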

Here is a simple example showing the use of strtok.

#include <string.h>
#include <stddef.h>
...
char buf[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *token;
...
token = strtok (buf, delimiters);    /* token => "words" */
token = strtok (NULL, delimiters);   /* token => "separated" */
token = strtok (NULL, delimiters);   /* token => "by" */
token = strtok (NULL, delimiters);   /* token => "spaces" */
token = strtok (NULL, delimiters);   /* token => "and" */
token = strtok (NULL, delimiters);   /* token => "punctuation" */
token = strtok (NULL, delimiters);   /* token => NULL */

See the tutorialspoint site for a comprehensive list of standard C string functions.
