* Representing graphs in C: ======================== - Two standard representations: adjacency matrix, and adjacency lists. By convention, vertices are usually represented as index numbers from 0 to n-1. In that case, an adjacency matrix is simply an [n x n] array whose (i,j)-th entry indicates whether the edge (i,j) is part of the graph or not. Adjacency lists are used most often for *sparse* graphs (without many edges), for which storing an entire adjacency matrix would be wasteful: for each vertex i in the graph, we store a list (as an array, or a linked list) of the vertices to which i is adjacent. - If our code will be dealing only with graphs of a fixed size, we can declare adjacency matrices at compile-time: #define GRAPH_SIZE 133 int adj_matrix[GRAPH_SIZE][GRAPH_SIZE]; - Unfortunately, that is rarely the case. Most of the time, the graph size is part of the input so it is not known until run-time. This means we have to allocate the adjacency matrix dynamically. In order to do this properly, we have to remember a few things about arrays and pointers in C... - In C, arrays and pointers are closely associated: if we have the following declaration: ``int a[10];'', then the name `a' is in fact a *pointer* to the first element in the array (in other words, a == &a[0]). More generally, `a+i' is the same as `&a[i]' so `*(a+i)' is the same as `a[i]'. (This is why array parameters or return values can be declared simply as pointers.) - Also, because an array is a lot like a pointer, a pointer can be used like an array! For example, if we declare ``int *p;'' and assign ``p = a;'', then we can use `p[5]' to refer to `a[5]'. Because of pointer arithmetic, we can also use `*(p+5)' to refer to the same element. (King's book has an excellent discussion of these issues in Chapter 12.) - Since arrays can only be declared with a constant size, how do we create arrays with a variable size? Using dynamic allocation, i.e., the `malloc' function. Remember that malloc takes as argument the number of bytes to allocate, and it returns a "generic" pointer to the first address of the block of memory allocated. For example, the following block of code is almost exactly equivalent to declaring ``int a[10];'': int *a; a = (int *)malloc( 10 * sizeof(int) ); - with the difference that we can put any expression (not just a constant) in the place of the `10', to allow us to get an array whose size is variable. (`sizeof' is an operator that returns the number of bytes required to store one element of the given type.) You can get more information on malloc from Chapter 17 in King's book. * Multidimensional arrays and memory allocation in C: ================================================== - A two-dimensional array ``a[][]'' is actually stored as an "array of arrays" or equivalently, as an array of pointers. In other words, if ``int a[10][10];'' has been declared, then ``a[0]'' is actually an array of 10 ints, which is the same as a pointer to an int. * So how do we dynamically allocate a two-dimensional array? Suppose we need an array of size [n x n] (to store an adjacency matrix, for example). The first thing we need is to declare a variable to store the array: since arrays are like pointers, a two-dimensional array is like a pointer to a pointer, so we declare it like this: int **adj_matrix; Next, we need to allocate memory for the matrix. How much should we allocate? Well, remember that adj_matrix will store n entries, each one of which will have what type? Pointer to an int (meaning, "an array of ints"). So we can simply allocate it like this: adj_matrix = (int **)malloc( n * sizeof(int *) ); This only reserves room to hold each "row" of the array. Now, we need to actually allocate the room for each single element: the only way to do this is row by row, with a loop like this one: for ( row = 0; row < n; row++ ) { adj_matrix[row] = (int *)malloc( n * sizeof(int) ); } Finally, we have to initialize every entry in this new array (entries are *not* initialized by malloc): for ( row = 0; row < n; row++ ) for ( col = 0; col < n; col++ ) adj_matrix[row][col] = 0; * More on dynamic allocation: ========================== - The only problem with writing functions that create new structs is that we have to copy the entire structure to the new variable after its been created. The same thing happens with struct parameters. One way around this is to use *pointers* to structs instead of the structs themselves. - We also need pointers to structs when dealing with linked lists. How do we access the members of a struct when we only have a pointer to it? There are two ways. We can use indirection, just like with any other pointer, or we can use the "->" operator. For example, if we have a pointer ``struct node node1;'' (where the struct is defined as above), we can access node1's key either with ``(*node1).key '' or with ``node1->key''. The second form is usually preferred (and it is typed just this way: '-' immediately followed by '>'). - Now, suppose we're writing functions to create linked lists and manipulate them, inserting new elements or deleting them. Obviously, we don't know ahead of time how many elements we'll need, so we cannot declare all the nodes before the program is started. And unlike, Java, we don't have a "new" operator we can call to create new structures at run-time... or do we? - In C, there are two main functions used to work with "dynamically" allocated memory (i.e., memory allocated at run-time). They are declared in : `malloc' and `free'. - `malloc' takes as argument the number of bytes to reserve in memory, and returns a "generic pointer" (type ``void *'') to the bloc of memory allocated. It has to return a generic pointer because it doesn't know what kind of data structure you want to allocate. There are a few issues to address here: Q: How do you know how many bytes to reserve if you want to allocate space for a struct, for example? A: Use the "sizeof" operator. sizeof(type_name) returns the number of bytes required to store one element of type "type_name". So in our example of linked lists, the call ``malloc(sizeof(struct node));'' will allocate the right amount of memory to store one node. Q: What do you do with this "generic" pointer that's returned by malloc? A: You need to cast it explicitly to the type of data you expect to get back, and assign it to some pointer variable. In our linked list example again, if we had already declared a variable as follows: ``struct node *first;'', then we could write first = (struct node *)malloc(sizeof(struct node)); Q: What if `malloc' is unable to allocate the memory? How do you find out? A: If malloc is unable to allocate the memory, it returns a special value NULL (a symbolic constant defined in the header file and also in ). Just testing against this value will let you determine if the allocation was successful. So in our example again, we would normally have code like this: struct node *first; first = (struct node *)malloc(sizeof(struct node)); if ( first == NULL ) { /* allocation failed: take appropriate action... */ } /*-if-*/ - `malloc' can be used to allocate room for *any* type of data, not just structs: we can use it for arrays, even for basic data types like int, char, etc. - Remember that the memory allocated by `malloc' is *not* initialized in any way. - MAJOR DIFFERENCE FROM JAVA: in C, memory that's allocated dynamically is NOT automatically reclaimed when it's not being used anymore. THERE IS NO "GARBAGE COLLECTOR" IN C! What this means is that you need to keep track yourself of all blocks of memory that were dynamically allocated and make sure to "recycle" them explicitly. This is done with the `free' function: free takes as argument a pointer to some block of dynamically allocated memory and makes this memory available again for the program to allocate anew. WARNING: trying to free a value that was not obtained from one of the dynamic allocation functions, or trying to use a pointer after the memory it points to has been freed can lead to unpredictable behaviour. - READ section 17.5 of King's book (on linked lists) to see how linked lists are implemented in C, and how the memory allocation functions malloc and free are used in practice. * Makefiles and multifile programs in C: ===================================== - All this can be found in King's book, Chapter 15. - When a program gets large (i.e., it has many functions), it helps to organize it by splitting it into many source files such that related functions are put in the same file. How do we structure such a program? First, put all the functions of the program into related groups. Each group will correspond to one file (e.g., in A2, we have three groups: graph-related functions, search-related functions, and the main part of the program). Each "group" of functions will consist of some functions that are used in the rest of the program, and some "internal" functions useful only to the other functions inside the group (this should sound familiar: it's a lot like the distinction between public and private methods of a class). For each source file, there should also be a corresponding header file (ending with ".h") that contains the declarations of data types and functions that are meant to be used in the rest of the program (this is what we've done in A2). These header files can then be included where they are needed to make sure the functions are properly declared before being used. Doing things this way makes it easy to change the function later: we only need to change the source file and the header file, without worrying about all the places where the function is used. - How do we compile a program consisting of many files? For example, if we have "graph.c", "search.c", and "main.c" that should all be combined somehow into one executable file... It turns out that gcc has options that allow us to do this, for instance: % gcc -c search.c will compile "search.c" but only create an object file "search.o" from it (it won't link it with system libraries to create an executable). So, we could compile the entire program like this: % gcc -c search.c % gcc -c graph.c % gcc -c main.c at which point we have three object files "search.o", "graph.o", and "main.o". But how do we create one executable file from these three object files? Simply call the compiler again to link all the files together (and with the standard library functions), like this: % gcc -o search search.o graph.o main.o - Sounds simple enough, but what if you have a program that consists of 50 different source files? Every time you make a change, you have to remember which files have changed so that you can recompile them into an object file and link the object files together into an executable... Not a fun thing to do by hand! - Luckily for us, there is a nifty UNIX program that can take care of all these details automatically: it's called `make'. `make' works by reading information from a "makefile" that tells it something about the structure of the program (i.e., which files depend on which others), and how to build the executable. Makefiles can be quite complex, but the basic building blocks are: - `macro' definitions of the form NAME = definition allow us to define "variables" and give them a value. The value of such a "variable" can later be accessed as $(NAME). - `dependencies' have the form file: file1 file2 ... and indicate that the first file "depends on" file1, file2, ... in the sense that if file1 is changed or file2 is changed or ..., or if "file" doesn't exist, then "file" needs to be compiled. "file" is called the "target" of the dependency. - `rules' have the form command-line1 command-line2 ... where literaly means a TAB character, and the command-lines specify how a particular file is to be compiled. - For example, let's look at the Makefile for A2: CC = gcc SRC = graph.c prim.c main.c output.c OBJ = $(SRC:.c=.o) graph.o: graph.h graph.c $(CC) -c graph.c prim.o: prim.c $(CC) -c prim.c output.o: output.c $(CC) -c output.c prim : $(OBJ) main.c $(CC) -o prim $(OBJ) clean: rm $(OBJ) prim The first three lines are macro definitions. Some of these macros are "standard" macros, e.g., "CC" specifies the name of the compiler. We may have also "CFLAGS" that specifies the command-line options to give to the compiler. The three following lines simply list dependencies for the three source files in the program. `make' know how to compile these files; it uses a set of "implicit" rules for source files that end in ".c": compile using ``$(CC) -c''. Following the macros are dependencies and rules. The following dependency tells `make' that the file "prim" depends on the object files defined in the macro "OBJS", so that if any of the object files change, or if the file named "prim" doesn't exist, "prim" will be recompiled. The next line is a rule that tells make how to compile "prim" from the object files. The last two lines deserve special mention: the name "clean" isn't actually the name of a file, and there is no rule in the Makefile that will ever create a file named "clean", so when you type ``make clean'', the following rule will always be executed: it's designed to remove all object files from the directory, leaving only the source and header files. One last thing that's important about Makefiles: if you type just ``make'', how does make know which target to look at? By default, it always looks at the first target in the Makefile, in this case "prim".