C Programming/Strings

Objective

  • Learn more about arrays, char arrays, and strings.

String as char array


There is no built-in string type in C. Instead, a string is stored as an array of chars, and we can initialize such an array from a string constant. The string "Hello World!" can be stored as in the example below.

char s[] = "Hello World!";

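This creates an array of 13 chars in which s[0] is 'H', s[1] is 'e', and so on. Because C does not record the length of a string anywhere, the compiler appends a final, terminating character, s[12], which is '\0' (the null character). Code that works with strings relies on this terminator to find where the string ends.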

Equivalently, one could create s element by element:

char s[13] = {'H','e','l','l','o',' ','W','o','r','l','d','!','\0'};
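As a small sketch of the equivalence described above, the following compares the two ways of writing the array and reads the final element. The function names initializers_match and last_element are hypothetical, chosen only for this illustration.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the string-constant form and the
   element-by-element form produce the same 13 bytes, 0 otherwise. */
int initializers_match(void)
{
    char a[] = "Hello World!";
    char b[13] = {'H','e','l','l','o',' ','W','o','r','l','d','!','\0'};

    /* sizeof counts the terminator, so both arrays are 13 bytes long. */
    return sizeof a == sizeof b && memcmp(a, b, sizeof a) == 0;
}

/* Hypothetical helper: returns the character stored just after the '!'. */
char last_element(void)
{
    char s[] = "Hello World!";
    return s[sizeof s - 1];   /* index 12, the terminating '\0' */
}
```

Note that sizeof reports 13 here while the visible text has only 12 characters; the extra byte is the terminator.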

String as char pointer


C-style strings are often handled not as an array of chars, but rather as a pointer to the first element of an array of chars sitting in memory. The type "char *" (pronounced "char star") is used for this.

char *spointer = &s[0]; // The & (address-of) operator gets the memory location of an object.
                        // In this case we set spointer to point to the first character in the array.
spointer = s;           // Simplified way of writing the above:
                        // an array implicitly "decays" into a pointer to its first element.

The value of spointer is something like 1386120007. This is a memory address: the data we want starts at byte number 1386120007 in memory. (The actual number will differ from machine to machine and from run to run.) To inspect the data at that address we dereference the pointer using the * (star) operator, and we find that it is the 'H' in "Hello World!".

spointer == 1386120007
*spointer == 'H'
*(spointer + 1) == 'e' // (spointer + 1) is an address (like 1386120008) while *(spointer + 1) is a character at that address
*(spointer + 2) == 'l' // (spointer + 2) is an address (like 1386120009) while *(spointer + 2) is a character at that address

We see that to get at the next value we can simply look at the next memory address (spointer + 1). This is because our string is stored sequentially in memory (unlike in a linked list for example).

Using this memory-address arithmetic and the string terminator discussed above we can very easily walk down a string, doing something at each character:

char *tempPointer;           // A temporary pointer to trace along the string
tempPointer = spointer;      // Start at the beginning of the string (the 'H')
while (*tempPointer != '\0') // While we haven't reached the end
{
    *tempPointer = 'a';      // Do something arbitrary to the character currently pointed to by tempPointer
    tempPointer++;           // Move one character forward
}
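The loop above can be wrapped in a function so that it works on any writable string. This is only a sketch: the name fill_string and its return value (the number of characters overwritten) are assumptions made for the example, not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper: overwrite every character of str with fill,
   leaving the terminating '\0' in place. Returns the number of
   characters overwritten. str must point to writable memory
   (an array, not a string constant). */
size_t fill_string(char *str, char fill)
{
    size_t count = 0;
    char *p = str;            /* start at the first character */

    while (*p != '\0') {      /* stop at the terminator */
        *p = fill;            /* change the current character */
        p++;                  /* move one character forward */
        count++;
    }
    return count;
}
```

Calling fill_string(s, 'a') on the array s from the first section would turn it into twelve 'a' characters followed by the terminator.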

String Pointers Pretending to be Arrays


In the last section we saw that we could access the ith element of a string using

int i = ...
char *spointer = ...
*(spointer + i) = 'z'; // Set ith element of the string to 'z'

In C this can be rewritten in Array Syntax as follows:

spointer[i] = 'z'; // Same as *(spointer + i) = 'z'

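Please note that even though spointer can be indexed like an array of characters, it is not one: it is a pointer variable that holds the address of the first character. For example, sizeof applied to an array gives the size of the whole array, while sizeof applied to a pointer gives only the size of the pointer itself. This distinction can cause a lot of confusion for students just starting with pointers. It is recommended that you use the *(spointer + i) method for your first few assignments.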
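One way to see the difference between an array and a pointer to its first element is to compare what sizeof reports for each. The sketch below uses two hypothetical function names, array_object_size and pointer_object_size, purely for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: size of the array itself, terminator included. */
size_t array_object_size(void)
{
    char s[] = "Hello World!";
    return sizeof s;            /* the whole array: 13 bytes */
}

/* Hypothetical helper: size of a pointer to that array's first element. */
size_t pointer_object_size(void)
{
    char s[] = "Hello World!";
    char *spointer = s;         /* s decays to a pointer here */
    return sizeof spointer;     /* size of a char *, whatever the string length */
}
```

The second value depends on the platform (it is whatever sizeof(char *) is there), not on how long the string is.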

If we include string.h, we can use the size_t strlen(const char *s) function to obtain the length of the string and then use a for loop, a structure that may be more familiar to those coming from other languages. The example below also replaces all characters with 'a'.

#include <string.h>

...

size_t i;
size_t len = strlen(spointer); // strlen returns a size_t, an unsigned integer type
for (i = 0; i < len; i++)
    spointer[i] = 'a';

Beware! strlen returns the length of the string not including the terminating character '\0'. It is very important that you always terminate any string you create; otherwise functions such as strlen will keep reading through adjacent memory until they happen to find a '\0'. Reading or writing outside your array is undefined behavior: your program may crash or silently corrupt its own data.
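As a sketch of terminating a string that you build yourself, the function below writes n copies of a character into a caller-supplied buffer and then adds the '\0'. The name make_repeated is hypothetical, and the function assumes the caller has provided room for at least n + 1 chars.

```c
#include <stddef.h>

/* Hypothetical helper: store n copies of c in buf, then terminate.
   Assumption: buf has room for at least n + 1 chars. */
void make_repeated(char *buf, size_t n, char c)
{
    size_t i;

    for (i = 0; i < n; i++)
        buf[i] = c;
    buf[n] = '\0';   /* without this line, buf would not be a valid string */
}
```

After make_repeated(buf, 5, 'x'), strlen(buf) is 5 even though six bytes of buf are in use.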

This array-style method is probably more familiar to many programmers not accustomed to pointers. It is strongly recommended, however, that students new to C do not use this method until they understand pointers. While convenient, it hides what is happening inside the computer. Additionally, this method passes over the string twice: once inside strlen to determine the length, and once more in the for loop.
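Array syntax does not have to mean two passes. A minimal sketch, assuming a hypothetical function name fill_by_index, tests for the terminator in the loop condition instead of calling strlen first:

```c
#include <stddef.h>

/* Hypothetical helper: overwrite every character of str with fill
   using array syntax, in a single pass. Returns the string length.
   str must point to writable memory. */
size_t fill_by_index(char *str, char fill)
{
    size_t i;

    for (i = 0; str[i] != '\0'; i++)   /* stop at the terminator */
        str[i] = fill;
    return i;                          /* i now equals the length */
}
```

This does the same work as the pointer loop from the previous section; only the notation differs.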

Allocating Strings Dynamically


When creating fixed strings like "Hello World!" the compiler can allocate the space ahead of time. If your program needs to create strings whose length is not known until run time, then you'll have to allocate the memory yourself using malloc, which is declared in stdlib.h.

To duplicate a string s, for example, we would first need to find the length of that string:

size_t len = strlen(s);

And then allocate the same amount of space plus one for the terminator and create a variable that points to that area in memory:

char *s2 = malloc((len + 1) * sizeof(char));

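This creates the appropriate amount of space and gives us a pointer to the location of the first character. malloc returns NULL if the allocation fails, so s2 should be checked before it is used. Now we just need to copy the string s over, character by character, including the terminating '\0'. This could be accomplished with the while or for loop methods described above. When the copy is no longer needed, release it with free(s2).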
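Putting these steps together, a duplication routine might look like the sketch below. The name duplicate_string is hypothetical, and returning NULL on allocation failure is a design choice made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of s,
   or NULL if malloc fails. The caller must free() the result. */
char *duplicate_string(const char *s)
{
    size_t len = strlen(s);
    size_t i;
    char *copy = malloc(len + 1);   /* + 1 for the terminator; sizeof(char) is 1 */

    if (copy == NULL)
        return NULL;                /* allocation failed */

    for (i = 0; i <= len; i++)      /* <= so the '\0' is copied too */
        copy[i] = s[i];
    return copy;
}
```

A caller would write something like char *s2 = duplicate_string(s); check s2 against NULL, use it, and finally call free(s2).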

Assignments


Completion status: Almost complete, but you can help make it more thorough.