Programming in C: Make and Split

0x00pf · August 13, 2016, 4:47pm

At some point, your C program is gonna grow. Maintaining all that code in just one file is painful. Even more, you may be using different libraries, and typing all those flags every time you want to compile your program also becomes painful. When you get to that point, it is time to Make and Split.

Hello World

We will use a simple hello world program to introduce the basics of writing makefiles and of splitting your program into different files to keep it tidy and easy to maintain. Let’s start with some code:

#include <stdio.h>

int
greetings (char *str)
{
  printf ("Hello %s\n", str);

  return 0;
}

int
main (int argc, char *argv[])
{
  greetings ("World!");

  return 0;
}

This case is very simple and you can just compile it by typing make helloworld or something like gcc -o helloworld helloworld.c. However, for educational purposes, we are going to write our own Makefile to build this simple program.

A Makefile

A Makefile for our example may look like this:

helloworld: helloworld.c
	${CC} ${CFLAGS} -o $@ $<

.PHONY: clean
clean:
	rm helloworld

Makefiles are usually named Makefile. If you just run make, the tool will look for that file in the current directory. If you want to give it some fancy name, then you need to use the -f flag followed by the name you chose: make -f fancy_make_file.mk.

OK, the first thing you need to know about Makefiles is that they are a list of rules describing how to produce a file from another file or set of files. Rules follow the structure depicted below:

target: dependencies
<TAB>command to run

Let’s take a closer look at the first rule in our Makefile to understand how those rules work:

target. This is the name of the file you want to generate. In this case it is helloworld, the name of the executable we want to build.
dependencies. This is the list of files required to generate the file specified by the target. Whenever one of those dependencies changes (the file is modified), make will know that something has changed and the rule will be fired again. If none have changed, make will just let you know that there is nothing to do. In our example, whenever we change helloworld.c and we run make, the commands below will be executed.
command to run. If the target file does not exist, or any of the dependencies has changed, the commands specified in this block will be executed. This is usually a compilation command that generates the target file. We will look in detail at how we built the compilation command and why we did it like that. All those commands have to be prefixed with a <TAB> character… a bunch of spaces will not work.

Environmental variables

The compilation command in our helloworld rule makes use of some default environment variables and some make built-in variables.

The CC variable is a standard way to specify the default compiler to use. CC stands for C compiler. This variable is usually unset and, in that case, make will use the default C compiler for a UNIX system, which is named cc. On modern GNU/Linux systems, this is usually a link to gcc, maybe through the alternatives system.
The CFLAGS variable is also a standard variable, used to specify the compiler flags we want to use to build our executable.

You may be wondering what standard means in the bullets above. Well, let’s explain this. make provides some implicit rules to make our lives easy. These rules are the ones that let us run make helloworld to just compile our program. Let’s see:

$ rm helloworld; make helloworld
cc  -o helloworld helloworld.c

As we can see, make uses cc as the default compiler and it also uses the content of the variable CFLAGS to set the compiler options… well, in this case the variable is empty so we just see an extra space. Let’s put something in that variable:

$ rm helloworld; CFLAGS="-static" make helloworld
cc -static -o helloworld helloworld.c

Voilà! We have built a static binary just by setting the CFLAGS variable. Try to change CC to gcc. Or even better, to arm-linux-gnueabi-gcc. Wow, you are cross-compiling your program now…

When we write our own rules in a makefile this default behaviour is overridden, and if we want to keep it (and we should) we have to manually add these variables to the rule. That is why we use:

${CC} ${CFLAGS} -o $@ $<

instead of

gcc -static -o helloworld helloworld.c

The second is a valid command. You can just add it to the makefile and it will work. However, it does not really take advantage of the make tool.

Make Built-in Variables

make is a powerful tool and it defines some internal variables, among them the so-called automatic variables, whose value depends on the rule being executed. In our simple example we are using two of those variables:

$@. This variable represents the current target associated to the rule. In our example it is the same as typing helloworld, the name of the binary. The advantage of this is that you can easily change the name of your binary in just one place.
$<. This variable represents the first dependency, that is, the first file just after the colon, which is also a convenient way to save some keystrokes.

PHONY commands

In general, make does not fire every rule it finds in the Makefile. When you run make without arguments it builds the first target in the file (the default goal), and a rule is fired only when its target does not exist or any of its dependencies has changed since the last build (the modification time of the dependency is later than the modification time of the target).

However, we will want to fire some rules manually. Two classical examples of those rules are clean and install. These targets are not files produced by the rule, they are just names for an action, so we declare them as .PHONY: targets. That way make will always run their commands, even if a file named clean happens to exist in the directory. To fire those rules we have to explicitly name them in the command-line. For instance, to run our clean rule, which deletes the binary, we should run:

make clean

In this example we just delete the executable. In a bigger project you may need to also delete intermediate object files, libraries,…

Well, this is a pretty basic introduction to the make tool. In most cases it is enough for working with small projects. For bigger projects you had better use a build system like GNU autotools or CMake.

Splitting

So, to finish this introduction on how to go from small demo programs to mini projects ;), we need to know the basics of splitting our program into pieces so we can easily deal with it. Again, we are going to split this minimal hello world program into pieces just for educational purposes. In general, files above a few thousand lines should be split but, as usual, in the end this is a matter of personal taste.

We are going to move our greetings function into a separate file, create a header file to be able to properly access the function, and change our Makefile to compile everything together.

Let’s start by moving the function into a new file called greet.c.

#include <stdio.h>
#include "greet.h"

int
greetings (const char *str)
{
  printf ("Hello %s\n", str);

  return 0;
}

As you can see, we have to keep the stdio.h include in this file because the code is using printf (stdio.h contains the declaration of that function). We are also adding a new header for our component/module. Strictly speaking it is not necessary in this case, but in general, if you are chopping up a big program, you will surely need to include your specific header also in the .c file (the implementation).

In this case we are using quotes to include greet.h because we do not want this file to be installed in the system. We will just use it during the compilation of our program, and it will not be used anywhere else. Using the quotes instructs the pre-processor to look for the file first in the folder of the file that includes it, before searching the system folders (such as /usr/include).

The Interface

Now we have a C file with the code we want to split, but we need an interface so this code can be invoked from other files. This is what the header files are for (among other things). So, let’s write a basic general header file for our greetings function:

#ifndef GREET_HEADER
#define GREET_HEADER

#ifdef __cplusplus
extern "C" {
#endif

int greetings (const char *str);

#ifdef __cplusplus
}
#endif

#endif

Here we see quite a bit of pre-processor stuff. The first part of the header file is intended to avoid multiple inclusions of the file. Note that the macro name has to be a valid C identifier, so it cannot start with a digit.

So, the header file instructs the pre-processor to check if the macro GREET_HEADER has been defined. If it is defined, meaning that this file has already been included, then the whole file is skipped. There is no need to process it again. If it is not defined, then the file is processed, and the first thing it does is to define the GREET_HEADER macro, so future includes will be discarded.

Towards the middle of the header file we see the prototype of our function, the one we are moving into a different file. This is all we need in this simple case. In a real project you may have quite a few functions declared here, as well as a bunch of data types used by those functions. Whenever you have to do this, you will know what has to be included here.

As you can see, a C prototype is just the first line of the function definition (return type, name and parameters) followed by a semicolon, without the code.

The main program

The main program will now look like this:

#include "greet.h"

int
main (int argc, char *argv[])
{
  greetings ("World!");

  return 0;
}

Now, we do not need stdio.h anymore, as we are not calling printf from here (in this example). We need our new greet.h, which contains the declaration of the greetings function we are calling from main.

Now, assuming the main program is saved as hello.c, you can compile the program with

gcc -o hello hello.c greet.c

Or you can change your current Makefile.

Improving our Makefile

If you tried to change your makefile, you may have noticed that $< only expands to the first file in the dependency list of the rule. However, now we have two files. One way to solve this is to use a variable in our Makefile to list all the source files you want to use. Something like this:

SRC= hello.c greet.c

hello1: ${SRC}
	${CC} ${CFLAGS} -o $@ ${SRC}

This works but, unfortunately, it is not the best way to do it. I will leave the good Makefile for you to try or maybe for the comments, because there is still one thing we have to discuss and this post is already quite long.

Using C code in a C++ application

You may have noticed that I have skipped a couple of pre-processor commands in our greet.h file. I reserved those for the very end. Those lines are intended to let you mix C and C++ code. This is a bit difficult to explain, but I will try my best. I will get the Makefile out of the scene now so we can focus on what is going on with the pre-processor.

First we will recompile our simple program using g++ instead of gcc.

$ g++ -o hellocpp hello.c greet.c
$ nm hellocpp | grep greet
000000000040054d T greetings

We see that our function gets into the binary with the proper name greetings. Now, let’s remove the extra pre-processor lines from greet.h. It will look like this now:

#ifndef GREET_HEADER
#define GREET_HEADER

int greetings (const char *str);

#endif

And if we try to recompile our program:

$ g++ -o hellocpp hello.c greet.c
/tmp/ccENSxa4.o: In function `main':
hello.c:(.text+0x15): undefined reference to `greetings(char*)'
collect2: error: ld returned 1 exit status

The linker cannot find the function now. Let’s see what is going on by just compiling the main program, without linking it, to avoid the linking error:

$ g++ -c -o hellocpp.o hello.c
$ nm hellocpp.o | grep greet
                 U _Z9greetingsPKc

Yes, that is a pretty strange name for our greetings function. Those characters around our function name encode its signature (here PKc stands for a pointer to const char). This is known as name mangling, and it is the way C++ provides function overloading, among other things. In other words, this is why you can define functions in C++ with the same name but a different list of parameters. The same kind of encoding also shows up in other fancy things C++ can do, such as RTTI (Run-Time Type Identification).

We could talk a lot about this, its impact on the ABI issues C++ has had for years, and more… but I will not bother you with this now.

The missing preprocessor lines

I will reproduce here the relevant part of the header file for the reader’s convenience.

#ifdef __cplusplus
extern "C" {
#endif

int greetings (const char *str);

#ifdef __cplusplus
}
#endif

The pre-processor lines above check if the __cplusplus macro is defined. This macro is defined by the C++ compiler precisely to allow this kind of construct. This is the way for source code to know if it is being compiled by a C or by a C++ compiler. So, if we are using a C++ compiler, the function prototype ends up wrapped in an extern "C" { ... } block. This tells the C++ compiler that the function has been written in C, so its name must not be mangled in order to access it.

You may be wondering why you should care about this. Well, many important libraries in the system are written in C. When you code in C++ and you need to use one of those libraries, you will face this problem if the library headers are not properly written. So if you write some C code (sometimes this is the only way) and you want your C++ mates to use your code, you should add those lines to your header.

Conclusions

I would say these are the very basics of using make and splitting your code into different files. There are a lot of other things to learn from here, but I believe that from this point on you can do it yourself. Just RTFM.

https://www.gnu.org/software/make/manual/html_node/index.html#SEC_Contents

To finish, as an example to show you that all this is pretty common, take a look, for instance, at the beginning of /usr/include/pcap/pcap.h.

Hack Fun!

Cromical · August 13, 2016, 8:54pm

Thanks for the tut! It was very helpful.

oaktree · August 13, 2016, 9:23pm

Great stuff. I’ll be implementing this soon in my bot project – no doubt.

EDIT: Done! Worked like a charm! Thanks @0x00pf.

0x00pf · August 14, 2016, 8:30am

Glad to hear it was useful!
