CE - Intro to programming/Programming concepts/pre-processor

This article is part of a series of articles intending to offer a curriculum of Computer Engineering. For information, please see Category:Computer engineering curriculum.

One last concept to revisit before we get back to programming is the pre-processor. We already saw this when we put #include <stdio.h> in our example HelloWorld program. The pre-processor is just another tool in the tool chain; it goes through and does some text-based substitutions in your source file before the compiler gets to it. This is mostly just to help you stay organized and make it so you don't have to type so much.

The pre-processor processes things called directives. Some of the most common directives are #include, #define, and #ifdef.

#include

The #include directive should be followed by the path to a header file. We'll get into details about these later, but header files basically provide at least the declarations of functions, variables, and other things, so that you can use them in your code without the compiler complaining that it doesn't know what they are. The #include directive tells the pre-processor to take the contents of the specified header file and insert them directly in place in your source file. Now remember, like I said before, this doesn't affect your actual file, just the virtual copy of your file that the compiler will process. Your source file remains intact.

There are two ways to specify a file for inclusion with the #include directive. The first is the one we've seen above, where you wrap the path in angle brackets (< and >). These angle brackets tell the pre-processor to go to the system's standard includes directory and look for the specified file there. The location of the standard includes directory is system dependent, and your build tools probably accept a command line argument to specify an alternate path. The basic concept is that these are widely used header files that a lot of different programs will want to use. For example, the header files that declare the functions you use to interface with the operating system are usually stored here.

The other way is by wrapping the name in standard double quotes (" and "). This tells the pre-processor to look for the file relative to the current directory first (typically the directory containing the file doing the #include). This is the form to use when you start writing your own header files, which you will probably store in the same directory as your source files.

For both quoted and angle-bracketed files, the path can include directories using the standard Unix path notation. That is, directories are separated with a forward slash (/). So if, for instance, in your system header directory, there's a folder called math, and inside that folder there's a header file called angles.h (.h is the standard extension for header files), then you could include this file with:

#include <math/angles.h> 

The same works for quoted paths.

#define

The #define directive (lovingly called "pound-define") is a very dangerous and horribly overused directive. But it's very common and can be useful, even powerful, if used correctly. Essentially what it does is define a codeword for the pre-processor and a value for the codeword. Now, anywhere in your code that it sees the codeword as a word on its own, it's going to replace it with the given value, no matter where that is or what you meant by it. (It won't touch the codeword when it's only part of a longer name or inside a quoted string.)

Constants

Now why would anyone want to do this? Well, it's used for constant values a lot. For instance, a very common #define is:

#define PI 3.141592653 

Now anywhere you need to use the value of pi in your code, you would just type PI, instead of typing all those digits. For instance, you might have some code that looks like this:

#define PI 3.141592653
 
double circumference(double radius){
     return 2*radius*PI;
}

This is a function that calculates the circumference of a circle, given its radius. (Note that double is a primitive numerical type used for numbers that aren't integers. We'll discuss this more later.) When the pre-processor is done with this code and passes it on to the compiler, the compiler will see it like this:

double circumference(double radius){
     return 2*radius*3.141592653;
}

The #define'd value is substituted in directly. Don't make the mistake of thinking that PI is a variable, though. Remember, variables store values, and are replaced by their actual value at run time. #defines "store" source code and are replaced before compile time. In this case, the compiler would actually see 3.141592653 as a literal.

One of the biggest problems people run into with this is the no-holds-barred attack of the pre-processor. It knows nothing about scope or about what a name is being used for: every place the codeword appears as a word on its own after the #define gets replaced. Imagine the following code, similar to our HelloWorld program, but with a pre-processor #define for PI and a variable that happens to use the same name:

#include <stdio.h>
 
#define PI 3.141592653
 
int main(){
     int PI = 3;
     printf("CODING IS BORING, HACKING IS HAPPINESS!");
     return 0;
}

Go ahead if you want, code that up and try to build it. Guess what the compiler is going to see for the first line of main?

     int 3.141592653 = 3;

The pre-processor has replaced the variable name PI with exactly what you told it to replace PI with, a numerical approximation of pi. The compiler's going to complain about that, and chances are it's going to be real difficult to track down since that line doesn't actually appear anywhere in your source code, only in the post-pre-processed (sorry about that) code. Note that the message is left untouched: the pre-processor only replaces whole words, so the PI in the middle of HAPPINESS is safe, and so is anything else inside a quoted string. Likewise, a variable named PINENUTS is a different word from PI and would not be changed.

Not-so-constants

So you might be thinking you'd rather just bite the bullet and type out 3.141592653 every time and not even worry about the pre-processor. But what if the value of pi is going to change? Okay, so pi probably won't be changing anytime soon. But what if, instead of pi, you had a high-precision (meaning lots of digits) number that corresponded to some real-life value, say the angle of the trusses on a bridge? You've decided to play it safe and type out 57.6431576 every time you need it in your code. Then the civil engineer comes back to you and says they had to revise the plans, and the angle is now 54.1755632 degrees. Are you going to go back and change all those values? Because if you had just carefully used a #define instead, you would only have to change it once.

Now there are actually still better alternatives than that (like using a variable) for the example above, but there are times when a #define is very useful. The key is to just be careful about it. For one thing, #define codewords should pretty much always be all uppercase, and you should avoid using all uppercase elsewhere in your code. Second, don't use short, generic #defines like PI; try something more specific that is less likely to show up unexpectedly, like MATH_PI or CIRCLE_PI. (Don't start the name with underscores, as in __PI; names like that are reserved for the compiler and the standard library.) Other than that, just be careful.
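As a minimal sketch of the "use a variable" alternative for the truss angle example, the value can be stored in a constant variable instead of a #define. The names TRUSS_ANGLE_DEGREES and truss_angle_radians are made up for illustration; only the angle value comes from the example above.

```c
/* A typed constant: the compiler (not the pre-processor) handles this name,
   so it follows the normal scoping rules. The name is hypothetical. */
static const double TRUSS_ANGLE_DEGREES = 54.1755632;

/* Hypothetical helper that converts the truss angle to radians. */
double truss_angle_radians(void) {
    const double pi = 3.141592653;
    return TRUSS_ANGLE_DEGREES * pi / 180.0;
}
```

If the engineer revises the plans again, the number still only has to be changed in one place, and the compiler knows it is a double.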

Macros

I'm only touching on this briefly, because it's a little more complicated than we need just now. The #define directive can also be used to create macros, which allow you to do amazingly sinister things with your code. Very loosely, macros are to functions as #define substitutions are to variables. A macro is a #define that takes arguments, and then substitutes in some source code which contains those arguments. An example of a macro is one that computes the hypotenuse of a right triangle, given the lengths of the two legs:

#define HYPOTENUSE(A,B) (sqrt(A*A + B*B)) 

We're assuming that sqrt is an actual function defined somewhere in our code. This macro might be used as follows:

int x;
int y;
int h;
 
x = 7;
y = 10;
h = HYPOTENUSE(x, y);

After pre-processing, the compiler would get code that looks like this:

int x;
int y;
int h;
 
x = 7;
y = 10;
h = (sqrt(x*x + y*y));

Again, it's just a straight text substitution of source code; the pre-processor doesn't know or care anything about types or values. The following would run through the pre-processor just as well, even though the compiler's going to choke on it:

int x;
int y;
int h;
 
x = 7;
y = 10;
h = HYPOTENUSE(Silver-Bells, Turtle-Doves);

Turns into the following invalid code:

int x;
int y;
int h;
 
x = 7;
y = 10;
h = (sqrt(Silver-Bells*Silver-Bells + Turtle-Doves*Turtle-Doves));

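Of course if Silver, Bells, Turtle, and Doves were all declared as numerical variables, then that would compile fine, but it would definitely not give you what you might expect from the unprocessed code. Multiplication is done before subtraction, so Silver-Bells*Silver-Bells is computed as Silver - (Bells*Silver) - Bells, not as (Silver-Bells) squared.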
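A common way to guard against this kind of surprise is to wrap every macro argument in parentheses. The sketch below is only an illustration under some assumptions: to avoid depending on a sqrt function, it uses just the sum-of-squares part of HYPOTENUSE, and the names SUM_SQUARES, SUM_SQUARES_SAFE, naive_plus_one, and safe_plus_one are made up.

```c
/* Same style as HYPOTENUSE above: the arguments are pasted in as-is. */
#define SUM_SQUARES(A,B) (A*A + B*B)

/* Hypothetical safer variant: each argument is wrapped in parentheses. */
#define SUM_SQUARES_SAFE(A,B) ((A)*(A) + (B)*(B))

/* Expands to (x+1*x+1 + y+1*y+1), which is x + x + 1 + y + y + 1. */
int naive_plus_one(int x, int y) {
    return SUM_SQUARES(x+1, y+1);
}

/* Expands to ((x+1)*(x+1) + (y+1)*(y+1)), which is what was intended. */
int safe_plus_one(int x, int y) {
    return SUM_SQUARES_SAFE(x+1, y+1);
}
```

With x = 2 and y = 3, the first function works out 2 + 2 + 1 + 3 + 3 + 1, while the second works out 3*3 + 4*4.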

Pre-processor flags

The final usage of #define we're going to talk about is as a pre-processor flag. In this usage, you don't actually define a value for the codeword, you just define the codeword. This usage doesn't do anything directly, but the pre-processor will remember, for the rest of your code, that this codeword has been defined. We'll see how this is useful with the next directive.


#ifdef

The next directive goes along with the last one (#define), and is very useful with the first one (#include). This is the #ifdef directive. This directive, and some of its friends, allow you to conditionally include or exclude source code from going to the compiler. What this directive actually means is "If a certain codeword has been #defined, then include this source code...". The codeword you want to test for is specified in plain text after the directive, and the directive #endif is used to terminate the block of source code that is conditionally included. It looks like this:

#ifdef PI
 
//this code will only be included if we've #define'd the symbol PI
#define CIRCUMFERENCE(R) (2*R*PI)
 
int x = 2*PI;
int r = 10;
int c = CIRCUMFERENCE(r);
 
#endif
 
//This code gets included no matter what
int z = 17;

Now the above example is plain stupid, but it illustrates the syntax. More commonly, #ifdef is used with a pre-processor flag, as described in the last section. That is, a #define with no associated value. The only purpose for such a thing is to act as a flag for #ifdef and its friends.
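As a rough sketch of a value-less flag in action, the following picks between two pieces of code depending on whether the flag is defined. It uses #else, one of the friends of #ifdef, to supply the code for the "not defined" case. The names VERBOSE and verbose_enabled are made up for illustration.

```c
/* A flag: defined, but with no value. Comment this line out to turn it off. */
#define VERBOSE

/* Returns 1 if this file was built with the VERBOSE flag, 0 otherwise. */
int verbose_enabled(void) {
#ifdef VERBOSE
    return 1;   /* only this line reaches the compiler when VERBOSE is defined */
#else
    return 0;   /* only this line reaches the compiler when it is not */
#endif
}
```

Only one of the two return statements ever reaches the compiler; the other is dropped by the pre-processor.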

A more realistic and quite common example is for use with debugging (i.e., finding problems in your code). Let's say you're coding a rather complex algorithm and it's not giving you the value you expect. To see where the problem is, you want to see what various values are at different points in the algorithm, so you print them to the screen using our old friend printf. But you only want to do this when you're debugging; you don't want other people using your code to have their screens littered with the output. So you use the #ifdef directive to conditionally include or exclude the print statements based on a pre-processor flag called DEBUG. It looks like this:

#include <stdio.h>
 
#define DEBUG
 
int myAlgorithm(int x){
 
#ifdef DEBUG
     //print x if we're debugging
     //Never mind about the %d and \n, that's just the syntax for printing a number (and a linebreak)
     // with printf. We'll talk about that a while later.
     printf("%d\n", x);
#endif
 
     int s = x + 15;
     int s2 = x / 4.71;
 
#ifdef DEBUG
     //Maybe this is where we're getting an error, see what the value is here
     printf("%d\n", s2);
#endif
 
     // etc....
 
    return s;
 
}

So if we want to debug, we leave the code just as it is, and the compiler gets it looking like this:

 
int myAlgorithm(int x){
 
     //print x if we're debugging
     //Never mind about the %d and \n, that's just the syntax for printing a number (and a linebreak)
     // with printf. We'll talk about that a while later.
     printf("%d\n", x);
 
     int s = x + 15;
     int s2 = x / 4.71;
 
     //Maybe this is where we're getting an error, see what the value is here
     printf("%d\n", s2);
 
     // etc....
 
    return s;
 
}

If we figure out what the problem is and fix it and no longer need to debug, then we simply get rid of the DEBUG #define, and the compiler gets code looking like this:

 
int myAlgorithm(int x){
 
 
     int s = x + 15;
     int s2 = x / 4.71;
 
 
     // etc....
 
    return s;
 
}

Note that we can also comment out the #define DEBUG, instead of actually deleting it. The pre-processor skips over directives in comments just like the compiler skips comments. This is probably a better idea in case it turns out something is wrong, and you need to go back to debugging again. The syntax for comments is the same; a double forward slash (//) starts a comment that extends to the end of the line.

#ifndef and Header files

A friend of #ifdef is #ifndef (notice the n). This is just the opposite of #ifdef: it only includes things if the given codeword is not defined. This is widely used when writing header files, which we talked about with the #include directive. The issue arises when the same header file ends up being included more than once while compiling a single source file. If that happens, then everything in it will be seen more than once, and for many things (type definitions such as structs and classes, for instance) that is illegal in C++, and the compiler will tell you so (very cryptically, usually). It's a common problem, too, because header files can include other header files, so even if your source file doesn't directly #include the same header file twice, the files it includes (or the ones they include, etc...) might.

The way to avoid problems here is to use a pre-processor flag to keep track of whether or not a file has been included, and then only include it if it hasn't. Alternatively, and more simply, we can wrap the contents of the entire header file in a #ifndef and a #endif so that if it's already been included, it will still be included again, but it will be completely empty this time. The common style looks something like this:

//Only include this header file if this flag isn't set
#ifndef MY_HEADER_FILE_INCLUDED
 
//Set the flag the first time it's included, so we don't include it again.
#define MY_HEADER_FILE_INCLUDED
 
// This is where all the actual contents of the header file goes...
 
//At the end of the file, we have to terminate the #ifndef
#endif
 

Note that the codeword for the flag (MY_HEADER_FILE_INCLUDED, in this case) should include the actual name of the header file in it, so that each header file has a different flag.
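To see why the guard works, here is a sketch of roughly what the pre-processor has to deal with when a guarded header gets included twice in the same source file. The header name angles.h is borrowed from the #include example earlier; its contents (struct angle, to_radians) and the flag ANGLES_H_INCLUDED are made up for illustration.

```c
/* ---- first copy of the hypothetical angles.h ---- */
#ifndef ANGLES_H_INCLUDED
#define ANGLES_H_INCLUDED

struct angle {
    double degrees;
};

double to_radians(struct angle a);

#endif

/* ---- second copy: the flag is already set, so all of this is skipped ---- */
#ifndef ANGLES_H_INCLUDED
#define ANGLES_H_INCLUDED

struct angle {
    double degrees;
};

double to_radians(struct angle a);

#endif

/* The source file's own code, which uses what the header declared. */
double to_radians(struct angle a) {
    return a.degrees * 3.141592653 / 180.0;
}
```

Without the #ifndef and #endif lines, the compiler would see struct angle defined twice and refuse to build the file.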