Char arrays are used very often in C, but newcomers are often confused about how to declare and use them. In this post, I'll show the difference between three kinds of char array in C. First of all, let's take a look at the code below:
/* test.c */
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memset, strcpy */

int main(void) {
    // Declare four strings in three different ways
    const char *r = "hello world!";
    char *s = "hello world!";
    char t[20] = "hello world!";
    char* u = (char*)malloc(20*sizeof(char));
    if (u == NULL) return 1;
    memset(u, 0, 20);
    strcpy(u, "hello world!");
    // print the address of each variable
    printf("0x%lx\n", (unsigned long)&r);
    printf("0x%lx\n", (unsigned long)&s);
    printf("0x%lx\n", (unsigned long)&t);
    printf("0x%lx\n", (unsigned long)&u);
    printf("-------------\n");
    // print the address of the first element of each string
    printf("0x%lx\n", (unsigned long)r);
    printf("0x%lx\n", (unsigned long)s);
    printf("0x%lx\n", (unsigned long)t);
    printf("0x%lx\n", (unsigned long)u);
    printf("-------------\n");
    // print the content of the strings (correct)
    printf("%s\n", r);
    printf("%s\n", s);
    printf("%s\n", t);
    printf("%s\n", u);
    printf("-------------\n");
    // print the content of the strings (incorrect on purpose:
    // %s expects a char *, so these calls are undefined behavior)
    printf("%s\n", &r);
    printf("%s\n", &s);
    printf("%s\n", &t);
    printf("%s\n", &u);
    free(u);
    return 0;
}
Let's compile this program with GCC, using the -m32 option so that the addresses are 32-bit and easier to read:
gcc -m32 -o test test.c
The program prints:
0xffffd22c
0xffffd230
0xffffd238
0xffffd234
-------------
0x8048770
0x804877d
0xffffd238
0x804b008
-------------
hello world!
hello world!
hello world!
hello world!
-------------
p�} hello world!
} hello world!
hello world!
hello world!
From the output of this program, we can clearly see the difference between these three kinds of char arrays:
1. Defined as a const char array using a pointer
const char *r = "hello world!";
char *s = "hello world!";
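For this kind of char array, the pointer variable itself (its l-value, or address) is on the stack, while the characters it points to (its r-value, or content) are a string literal stored in a read-only data area (typically .rodata with GCC on Linux, rather than the writable .data). Leaving out const does not make the string writable: s still points to read-only memory, and writing through it is undefined behavior. In this example, r and s behave the same, although in this run the compiler gave each of them its own copy of the literal (0x8048770 and 0x804877d).
2. Defined as a char array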
char t[20] = "hello world!";
For this kind of char array, both the l-value (address) and the r-value (content) are on the stack.
3. Defined as a dynamically allocated char array
char* u = (char*)malloc(20*sizeof(char));
For this kind of char array, the l-value (address) is on the stack, and the r-value (content) is on the heap.
From this program, we can see that for an array declared without dynamic memory, the address of the array (&t) is the same as the address of its first element (t). Remember, t plays two roles: it is the name of the whole char array (sizeof(t) gives 20), and in most expressions it is converted to the address of the first element of the array. For the array allocated with dynamic memory, the address of the pointer (&u) differs from the address of its first element (u): &u is on the stack, while u points into the heap. Likewise, for the arrays declared through a pointer to a string literal, the addresses of the pointers (&r and &s) differ from the addresses of their first elements (r and s): &r and &s are on the stack, while r and s point into the read-only data area. The table below illustrates how they exist in main memory.
Simple virtual memory image for the code example
// reminder of our sample code
const char *r = "hello world!"; // const char array
char *s = "hello world!"; // const char array
char t[20] = "hello world!"; // char array
char* u = (char*)malloc(20*sizeof(char)); // dynamic char array
memset(u, 0, 20);
strcpy(u, "hello world!");
In theory, the incorrect code in the last four printf calls should not print the strings at all. So how could all of them produce "hello world!" as well? Now let's analyze how the incorrect code produced "correct" or "partially correct" output. We can use GDB to see how the variables are arranged in memory:
(gdb) x/10x r
0x8048770: 0x6c6c6568 0x6f77206f 0x21646c72 0x6c656800
0x8048780: 0x77206f6c 0x6c64726f 0x78300021 0x0a786c25
0x8048790: 0x2d2d2d00 0x2d2d2d2d
(gdb) x/10x &r
0xffffd22c: 0x08048770 0x0804877d 0x0804b008 0x6c6c6568
0xffffd23c: 0x6f77206f 0x21646c72 0x00000000 0x00000000
0xffffd24c: 0x80ad5600 0xf7fb43dc
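As we can see in the GDB info above and the previous output, r sits at 0xffffd22c, s at 0xffffd230, t at 0xffffd238, and u at 0xffffd234. If we look carefully, t ended up behind u on the stack even though we declared t before u in the code. My guess is that GCC reordered them for alignment reasons; in any case, C does not guarantee the order of local variables in memory. Thus, t is the last of the four variables, right after the three pointers, and its content is also on the stack. The weird output of the last four printf calls makes sense now: printing from &r, &s, or &u as a string first emits the bytes of the pointers stored there (none of them is zero; 0x70 is 'p' and 0x7d is '}' in ASCII) and then runs straight into the characters of t, while &t is simply the address of the first character of t.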