Apple Developer Connection

The GNU Compiler Collection on Mac OS X

This article provides an overview of the GNU Compiler Collection (GCC) from the Free Software Foundation and its use on Mac OS X. GCC is a free software project that has been used for many years to create software for UNIX and other platforms. Apple's developer tool Xcode (and previously, Project Builder) uses GCC under the hood to build executable images from source code. The GNU Debugger (GDB), a companion to GCC, forms the foundation of the Xcode debugger.

The examples presented here are command-line driven for several reasons:

  1. Many UNIX developers are comfortable with command-line based tools.
  2. Command-line tools provide a least common denominator for developers working on multiple platforms (including UNIX variants).
  3. Many development projects are already set up with makefiles that simplify the build process. Rewriting those as Xcode projects may be time consuming and non-portable.
  4. GCC is widely used in universities, and students new to Mac OS X programming will find the implementation on Mac OS X works very similarly to other platforms, with the addition of Apple-specific options.

How GCC Works

GCC consists of a set of tools that may be invoked using a single command plus options. True to the UNIX philosophy, the GCC tools perform specialized functions but work in harmony, passing the output from one tool as input to the next.

GCC invokes these tools in a series of stages: compiler, assembler, and linker.

  1. The compiler preprocesses, parses and analyzes source code; the output is a set of assembly language instructions. Preprocessing, which expands directives such as #include and #define, is performed by the compiler and not by a separate program. This is a change from previous versions of GCC.
  2. The assembler produces object files from the assembly input. In the examples in this article the assembler stage is included implicitly.
  3. The linker transforms the object code into an executable image.
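To make the stages concrete, here is a minimal sketch of a source file whose comments show where the process can be stopped. The file name stages.c, the macro SCALE, and the function scale_value are illustrative names that do not appear in the article's examples. The -E, -S, and -c options are standard GCC options that stop after preprocessing, compilation, and assembly respectively.

```c
/* stages.c: a hypothetical file for watching each GCC stage.
 *
 *   gcc -E stages.c    preprocess only; SCALE is replaced by 3 in the output
 *   gcc -S stages.c    stop after compilation; writes assembly to stages.s
 *   gcc -c stages.c    stop after assembly; writes object code to stages.o
 */
#define SCALE 3   /* expanded by the preprocessor before parsing */

/* Multiplies value by the SCALE constant. */
int scale_value( int value )
{
    return value * SCALE;
}
```

Because this file has no main function, the link stage would need another object file that supplies one.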

Source and header files may be any of several types, most commonly .c, .cc, .cp, .m, .mm, and .h on Mac OS X. GCC supports a variety of languages and extensions denoting source files, precompiled headers, source files not requiring preprocessing, and so on. Refer to the GCC manual for more information. A link is provided at the end of this article.

Apple's version of GCC is based on standard GCC releases and adds features that support Mac OS X. Some of these features get folded into the standard releases. Since GCC is an open source project, it depends on its developer community for enhancements and fixes. Developers are encouraged to participate so that GCC can continue to evolve to support new languages, new processors, better optimization techniques, and so on.

A Simple Example

The Carbon Example (52KB) illustrates compiling, linking, and running a simple application that has no user interface but does use a timer task. It installs the task in the Time Manager queue. Every second the task triggers and invokes a TimerUPP, or callback function, defined in our source code. The callback simply logs the current time to stdout.

The #include statement at the top of the file pulls in all other frameworks and headers used by the CoreServices framework. The #include syntax is <frameworkName/headerFileName.h>. For example, Mac OS X Time Manager tasks are handled by functions declared in Timer.h in the CarbonCore framework. Since CoreServices is an umbrella framework, its single include file CoreServices.h includes CarbonCore/CarbonCore.h, which includes CarbonCore/Timer.h and other appropriate headers. Put another way:

CoreServices/CoreServices.h includes CarbonCore/CarbonCore.h includes CarbonCore/Timer.h

This eases the burden on the programmer: rather than hunt down individual header files, you only need to include one framework/header combination. You can include other frameworks as needed.

The top of the file contains declarations for several global variables, including the timer proc that gets installed in the Time Manager queue. Once the application runs its course (five iterations), it disposes of the timer proc rather than leave it installed.

main first calls an init function to set up the timer task, then loops waiting for a flag to be set. After disposing of the timer proc, main returns.

The timer proc (MyTimerProc) prints the current time and increments the counter. If the counter is below its limit, MyTimerProc re-primes the timer task, otherwise it sets the done flag.

MyInit sets up the timer task, installs it, then primes it the first time.

You could write this application to instead provide a user interface with windows, menus, and so on. Simply include the appropriate frameworks in the source code files.

#include <CoreServices/CoreServices.h>

void MyInit( void );
void MyTimerProc( TMTaskPtr tmTaskPtr );

Boolean gQuitFlag = false;
int gCount = 0;
TimerUPP gMyTimerProc = NULL;

int main( int argc, char *argv[])
{
    MyInit();

    while ( false == gQuitFlag ) {
        ;
    }

    DisposeTimerUPP( gMyTimerProc );

    return 0;
}

void MyTimerProc( TMTaskPtr tmTaskPtr )
{
    DateTimeRec localDateTime;
    GetTime( &localDateTime );

    printf( "MyTimerProc at %d:%d:%d\n", localDateTime.hour, 
        localDateTime.minute, localDateTime.second );

    gCount++;

    if ( gCount > 4 )
        gQuitFlag = true;
    else
        PrimeTimeTask( ( QElemPtr ) tmTaskPtr, 1000 );
}

void MyInit( void )
{
    struct TMTask myTask;
    OSErr err = 0;

    gMyTimerProc = NewTimerUPP( MyTimerProc );

    if ( gMyTimerProc != NULL ) {
        myTask.qLink = NULL;
        myTask.qType = 0;
        myTask.tmAddr = gMyTimerProc;
        myTask.tmCount = 0;
        myTask.tmWakeUp = 0;
        myTask.tmReserved = 0;
        err = InstallTimeTask( ( QElemPtr )&myTask );
        if ( err == noErr )
            PrimeTimeTask( ( QElemPtr )&myTask, 1000 );
        else {
            DisposeTimerUPP( gMyTimerProc );
            gMyTimerProc = NULL;
            gQuitFlag = true;
        }
    }
}
Mac OS X frameworks allow you to add system features and user interface capabilities to your applications. The frameworks are arranged hierarchically, so you only need to include in your source code the top-level framework of interest, rather than sub-frameworks underneath. When invoking GCC, the -framework linker option may be fed to the compiler, which will pass the option to the linker. The following GCC invocation compiles test.c, links against the CoreServices framework, and generates the executable output file test, as specified by the -o flag.

% gcc -framework CoreServices -o test test.c

To run the executable, invoke it by name. This example runs the file test in the current directory. The "./" specifies that the path to the command starts in the current directory, and "test" is the name of the file to execute. The output appears in the Terminal window.

% ./test
MyTimerProc at 16:41:27
MyTimerProc at 16:41:28
MyTimerProc at 16:41:29
MyTimerProc at 16:41:30
MyTimerProc at 16:41:31

Useful Flags

Here are brief descriptions of several GCC command-line options:

  • The verbose flag (-v) displays details of each command executed by GCC.

    % gcc -v
    Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
    Thread model: posix
    gcc version 3.3 20030304 (Apple Computer, Inc. build 1640)

    This example adds -v to the Carbon build example. Note the options passed to the linker as part of the ld invocation, near the bottom of the listing.

    % gcc -v -framework CoreServices -o test test.c
    Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
    Thread model: posix
    gcc version 3.3 20030304 (Apple Computer, Inc. build 1640)
     /usr/libexec/gcc/darwin/ppc/3.3/cc1 -quiet -v -D__GNUC__=3 -D__GNUC_MINOR__=3 
        -D__GNUC_PATCHLEVEL__=0 -D__APPLE_CC__=1640 -D__DYNAMIC__ test.c -fPIC -quiet 
        -dumpbase test.c -auxbase test -version -o /var/tmp//ccPmjEgo.s
    GNU C version 3.3 20030304 (Apple Computer, Inc. build 1640) (ppc-darwin)
            compiled by GNU C version 3.3 20030304 (Apple Computer, Inc. build 1640).
    GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=131072
    ignoring nonexistent directory "/usr/local/include"
    ignoring nonexistent directory "/usr/ppc-darwin/include"
    ignoring nonexistent directory "/Local/Library/Frameworks"
    #include "..." search starts here:
    #include <...> search starts here:
    End of search list.
    Framework search starts here:
    End of framework search list.
     /usr/libexec/gcc/darwin/ppc/as -arch ppc -o /var/tmp//ccpC3PmB.o 
     ld -arch ppc -dynamic -o test -lcrt1.o -lcrt2.o -L/usr/lib/gcc/darwin/3.3 
        -L/usr/lib/gcc/darwin -L/usr/libexec/gcc/darwin/ppc/3.3/../../.. 
        -framework CoreServices /var/tmp//ccpC3PmB.o -lgcc -lSystem

  • The following GCC invocation compiles test.c but does not invoke the linker; the -c flag stops the process after compilation.

    % gcc -c test.c

    The output file by default will be named test.o. If you need a different file, then specify the -o flag followed by a file name.

    If you choose to stop the build process after compilation, you can then link by invoking the compiler again but allowing it to continue after the compilation step. This is the recommended approach. If you invoke the linker manually you may spend a lot of time getting the linker flags correct.

  • The -g flag instructs GCC to include debug info for use when running GDB.

    % gcc -g -c test.c
  • Warning flags, including -W and the pickier -Wall, display warnings regarding code that is not technically in error, but that may cause problems. For example, removing the explicit cast in the call to PrimeTimeTask in the Carbon Example results in a warning:

    PrimeTimeTask( tmTaskPtr, 1000 );

    % gcc -c -W test.c
    test.c: In function `MyInit':
    test.c:63: warning: passing arg 1 of `PrimeTimeTask' from incompatible 
        pointer type
  • The -framework linker option was discussed above. This example links against the CoreServices framework, and invokes the compiler in verbose mode (-v) so you can see the generated linker call. The output file will be named test, and the input to this step is the file test.o.

    % gcc -v -framework CoreServices -o test test.o
    Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
    Thread model: posix
    gcc version 3.3 20030304 (Apple Computer, Inc. build 1640)
     ld -arch ppc -dynamic -o test -lcrt1.o -lcrt2.o -L/usr/lib/gcc/darwin/3.3 
        -L/usr/lib/gcc/darwin -L/usr/libexec/gcc/darwin/ppc/3.3/../../.. 
        -framework CoreServices test.o -lgcc -lSystem

    To link against multiple frameworks, include each with its own -framework flag:

    -framework CoreServices -framework Carbon

  • You can change compiler versions using gcc_select.

    sudo /usr/sbin/gcc_select <version: 2, 3 or 3.x>
    % sudo /usr/sbin/gcc_select 3.1
    Default compiler has been set to:
    Apple Computer, Inc. GCC version 1256, based on gcc version 3.1 20021003 

    The -l flag lists available compiler versions.

    % gcc_select -l
    Available compiler versions:
    2.95.2          3.1             3.3             3.3-fast

    Run gcc_select -h to view additional options.

    The compiler version matters because changes to the Application Binary Interface since Mac OS X v10.0 have rendered C++ and Objective-C++ executables incompatible with earlier releases. C and Objective-C programs, however, still run the same. To build for v10.1 and earlier you must use version 2 of GCC for C++ and Objective-C++ applications and kernel extensions; version 2.95.2 was the final GCC release that shipped with Mac OS X v10.1. GCC version 3.1 shipped with Mac OS X v10.2 Jaguar, and 3.3 with Mac OS X v10.3 Panther. If you mix languages in an application, rebuild all of it using the compiler version that matches the target system: 2.95.2 for Mac OS X v10.0 and v10.1, 3.1 for v10.2 Jaguar, and 3.3 for v10.3 Panther.
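
Source code can also check which compiler built it. The verbose listings above show GCC defining __GNUC__ and __APPLE_CC__ on the cc1 command line, so a minimal sketch can read those macros. The function names here are hypothetical, and the fallback value of 0 is an assumption made for compilers that do not define the macros.

```c
#include <stdio.h>

/* Returns the GCC major version this file was compiled with,
   or 0 when the compiler does not define __GNUC__. */
int CompilerMajorVersion( void )
{
#ifdef __GNUC__
    return __GNUC__;
#else
    return 0;
#endif
}

/* Returns the Apple build number (1640 in the listings above),
   or 0 when the compiler is not an Apple build. */
int AppleBuildNumber( void )
{
#ifdef __APPLE_CC__
    return __APPLE_CC__;
#else
    return 0;
#endif
}

/* Prints both values on one line. */
void PrintCompilerVersion( void )
{
    printf( "GCC major version: %d, Apple build: %d\n",
        CompilerMajorVersion(), AppleBuildNumber() );
}
```

A check like this can guard code that depends on a particular compiler version after switching with gcc_select.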

This list is not exhaustive. See the gcc man page (man gcc) or one of the recommended references at the end of this article for additional flags.
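
The incompatible pointer type warning from the flag list can be reproduced in portable C. This is only a sketch: Elem, Task, IsLastElem, and CheckTask are hypothetical names, loosely modeled on the relationship between QElemPtr and TMTaskPtr in the Carbon example.

```c
#include <stddef.h>

/* Hypothetical queue element and task types. The element is the
   first member of the task, so a task pointer can be viewed as an
   element pointer. */
typedef struct Elem { struct Elem *next; } Elem;
typedef struct Task { Elem link; long delay; } Task;

/* Accepts a generic queue element pointer; returns 1 when the
   element has no successor. */
int IsLastElem( const Elem *elem )
{
    return elem->next == NULL;
}

/* Passes a task where an element is expected. */
int CheckTask( Task *task )
{
    /* Writing IsLastElem( task ) here would draw the incompatible
       pointer type diagnostic. The explicit cast states the intent. */
    return IsLastElem( ( const Elem * ) task );
}
```

Passing &task->link instead of casting would avoid the conversion altogether, which is often the cleaner choice when you control the types.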


Makefiles

Makefiles help automate the build process. A makefile is typically named makefile or Makefile, and contains commands for GCC regarding various targets and their dependencies. You can change the commands in the file and, once it is working, not worry about forgetting a flag or option. This is very useful in the middle of the night when you are tired and likely to make mistakes. Since GCC command lines can get very long, keeping them in a makefile also makes you less likely to invoke the compiler incorrectly.

A makefile contains a set of targets. Each target may be dependent on other targets. Each target also includes a command-line invocation preceded by a <tab> character. Here is the syntax:

# Comments begin with a '#'.
target-name: [dependency_1 dependency_2 ...]
	command [flags] input-file(s)
another-target-name: dependency
	command [flags] input-file(s)

The following example incorporates the test.c file used to generate the code optimization samples. The first target, named test, depends on the target test.obj. If the output generated by target test.obj is newer than test, or if test is not a file, then the make utility runs the command line for the target. In this case, make invokes GCC and generates a file named test, using the file test.o as input.

Where does test.o come from? It is the output generated by the test.obj target. The input to target test.obj is the file test.c. You can specify an output file using the -o option, or let GCC name the output file using the pattern input-filename.o (lowercase letter 'o').

# The target named test depends on target test.obj.
# The command for target test:
#   1. invokes gcc, 
#   2. generates a file named test (no extension) as output, and
#   3. uses file test.o as input.
test: test.obj
	gcc -o test test.o
# The next target name is test.obj. That is only its name. 
# The name does not have to relate to what the target actually builds.
# This target builds object files from source.
# The command for test.obj instructs gcc to:
#   1. stop after compilation (no linking), 
#   2. print verbose output, and
#   3. use the file test.c as input.
# The default output file name here will be test.o.
test.obj: test.c
	gcc -c -v test.c
# Remove unwanted binaries, both the file named test and any .o files.
# (The target name clean follows the usual convention.)
clean:
	rm test *.o
# Generate assembly files. Useful for debugging.
# (The target name asm is illustrative.)
asm:
	gcc -v -S -O0 -o test.O0.s test.c
	gcc -v -S -O1 -o test.O1.s test.c
	gcc -v -S -O2 -o test.O2.s test.c
	gcc -v -S -O3 -o test.O3.s test.c
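
The rule make applies to each target (rebuild when the target is missing or older than a dependency) can be sketched in C. This is a simplified illustration, not how make is implemented: NeedsRebuild is a hypothetical function, it handles one dependency only, and it assumes a POSIX system that provides stat.

```c
#include <sys/stat.h>

/* Returns 1 when target should be rebuilt from dependency, 0 when
   it is up to date, and -1 when the dependency itself is missing. */
int NeedsRebuild( const char *target, const char *dependency )
{
    struct stat targetInfo, dependencyInfo;

    if ( stat( dependency, &dependencyInfo ) != 0 )
        return -1;                 /* nothing to build from */
    if ( stat( target, &targetInfo ) != 0 )
        return 1;                  /* target does not exist yet */

    /* Rebuild when the dependency was modified after the target. */
    return dependencyInfo.st_mtime > targetInfo.st_mtime;
}
```

For the makefile above, NeedsRebuild( "test.o", "test.c" ) corresponds to the decision make takes before running the compile command.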

You invoke the make utility and specify the target, shown here from within a Terminal session. Any messages will appear in the window.

% make test

gcc -c -v test.c
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1640)
 /usr/libexec/gcc/darwin/ppc/3.3/cc1 -quiet -v -D__GNUC__=3 -D__GNUC_MINOR__=3 
    -D__GNUC_PATCHLEVEL__=0 -D__APPLE_CC__=1640 -D__DYNAMIC__ test.c -fPIC 
    -quiet -dumpbase test.c -auxbase test -version -o /var/tmp//ccmr23gL.s
GNU C version 3.3 20030304 (Apple Computer, Inc. build 1640) (ppc-darwin)
        compiled by GNU C version 3.3 20030304 (Apple Computer, Inc. build 1640).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/ppc-darwin/include"
ignoring nonexistent directory "/Local/Library/Frameworks"
#include "..." search starts here:
#include <...> search starts here:
End of search list.
Framework search starts here:
End of framework search list.
 /usr/libexec/gcc/darwin/ppc/as -arch ppc -o test.o /var/tmp//ccmr23gL.s
gcc -o test test.o


If the make script is stored in a file named something other than makefile, pass the file name with the -f flag. For example, if the file was instead named test, use:

make -f test

GNU Debugger

The GNU debugger (GDB) allows you to step through code, watch values, and monitor execution from the command-line. Xcode uses GDB as the basis for its debugger. If you are using GCC you will want to also learn to use GDB.

The first step is to compile your code with the -g flag: this includes GDB information in the object files. Without it you will get strange errors when you try to use GDB to run your executable.

You invoke GDB using the gdb command along with the name of the executable to debug. GDB prints a banner, after which it is ready to accept commands. The example used here is the Carbon application discussed earlier.

% gdb test
GNU gdb 5.3-20030128 (Apple version gdb-309) (Thu Dec  4 15:41:30 GMT 2003)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin".
Reading symbols for shared libraries ... done

The best thing to start with is a breakpoint. This command sets a breakpoint on line 1.

(gdb) break 1
Breakpoint 1 at 0x1bd8: file test.c, line 1.

Begin execution by typing run. GDB prints information about what it is doing, then proceeds to and stops at the breakpoint.

(gdb) run
Starting program: /ktree/test 
[Switching to thread 1 (process 1254 thread 0x1603)]
Reading symbols for shared libraries ......... done

Breakpoint 1, main (argc=795571314, argv=0x65652f74) at test.c:12
12      {

The step command steps into a function.

(gdb) step
MyInit () at test.c:47
47          OSErr err = 0;

Use the print command to view a variable value.

(gdb) print err
$1 = 0

Step over a function using next.

(gdb) next
main (argc=1, argv=0xbffffbb0) at test.c:13
13          MyInit();
(gdb) next
15          while ( false == gQuitFlag ) {

Next, set a breakpoint in the callback (line 28), then continue execution. GDB pauses when execution reaches the breakpoint. Check the date/time value. Note that GDB does its best when displaying the structure fields. The structure looks a bit strange, so check the data type of localDateTime using the whatis command. This action may be performed on any variable that is in scope, and is useful if you do not want to jump back to a code editor window and look at the source code.

(gdb) break 28
Breakpoint 2 at 0x2b8c: file test.c, line 28.
(gdb) continue
[Switching to process 1080 thread 0x1203]

Breakpoint 2, MyTimerProc (tmTaskPtr=0xbffffd10) at test.c:28
28          GetTime( &localDateTime );
(gdb) print localDateTime
$3 = {
  year = 0, 
  month = 119, 
  day = 25967, 
  hour = 7018, 
  minute = 0, 
  second = 0, 
  dayOfWeek = 0
}
(gdb) whatis localDateTime
type = DateTimeRec

Step through the next couple of lines and view the formatted output, which looks correct.

(gdb) step
30          printf( "MyTimerProc at %d:%d:%d\n", localDateTime.hour, 
                localDateTime.minute, localDateTime.second );
(gdb) step
MyTimerProc at 14:45:21
32          gCount++;

Use the where command to view a stack trace:

(gdb) where
#0  MyTimerProc (tmTaskPtr=0xbffffd10) at test.c:32
#1  0x902be7e8 in TimerThread ()
#2  0x900246e8 in _pthread_body ()

Change a variable value using set. This example sets gCount past its threshold value and causes the program to terminate prematurely. Notice that the source code following each line number is the next line to be executed, not the last line executed.

(gdb) print gCount
$4 = 0
(gdb) step
34          if ( gCount > 4 )
(gdb) print gCount
$5 = 1
(gdb) set gCount = 5
(gdb) step
36              gQuitFlag = true;
(gdb) continue

Program exited normally.

Here is the conventional way to stop debugging and exit GDB:

(gdb) stop
(gdb) quit


Optimization

Code optimization may be performed by the programmer, the compiler, or the runtime environment. This section focuses on optimizations that GCC can perform at build time. The typical tradeoff is between smaller code size and faster execution speed. It is impossible to fully optimize for both at the same time, though GCC does its best, as do other compilers. When in doubt, it may be better to optimize for size, since smaller code may execute relatively faster. For example, large functions or loops containing data access patterns that do not exhibit a strong locality of reference may not fit into a processor's cache lines, which can lead to cache misses and subsequent fetches from memory. Smaller functions or loops with local data access stand a better chance of fitting in a given cache line and requiring fewer memory accesses.
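
As a small illustration of locality of reference, the sketch below sums the same two-dimensional array in two orders. The names ROWS, COLS, FillGrid, SumRowOrder, and SumColumnOrder are hypothetical. Both functions return the same value; only the order of memory accesses differs, and whether one is measurably faster depends on the processor and the array size.

```c
#define ROWS 64
#define COLS 64

/* Fills each element with the sum of its row and column index. */
void FillGrid( int grid[ ROWS ][ COLS ] )
{
    int i, j;
    for ( i = 0; i < ROWS; i++ )
        for ( j = 0; j < COLS; j++ )
            grid[ i ][ j ] = i + j;
}

/* Visits elements in the order C stores them (row by row), so
   consecutive accesses touch adjacent memory. */
long SumRowOrder( int grid[ ROWS ][ COLS ] )
{
    long sum = 0;
    int i, j;
    for ( i = 0; i < ROWS; i++ )
        for ( j = 0; j < COLS; j++ )
            sum += grid[ i ][ j ];
    return sum;
}

/* Visits elements column by column, so each access lands COLS
   elements away from the previous one. */
long SumColumnOrder( int grid[ ROWS ][ COLS ] )
{
    long sum = 0;
    int i, j;
    for ( j = 0; j < COLS; j++ )
        for ( i = 0; i < ROWS; i++ )
            sum += grid[ i ][ j ];
    return sum;
}
```

Timing the two functions with a profiler on a larger array is one way to see the effect on a given machine.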

Another reason for looking at optimization settings is that developers often debug without optimization enabled, then release an optimized version to the public. Being familiar (not necessarily intimate) with the assembly listing of your program under various optimization settings may help you determine where execution failed when you receive the occasional crash report.

Remember that optimized code typically bears little resemblance to the original source code. This makes it difficult to look at an assembly listing for an optimized program and determine the flow of control. It can be nearly impossible to look at a crash log and determine the point in an optimized program where a problem occurred, unless you have symbols included, which is not typically the case.

Unoptimized code follows the original source code directly, making it easier to debug. In fact, you will have better luck first debugging the code and then optimizing it, rather than the other way around. Trying to do both simultaneously is also a bad idea.

You can use the -S GCC option to stop the compilation process before running the assembler. The following command generates an unoptimized (level 0) output file named test.O0.s from input file test.c. You can then dissect the assembly code in the file using a text editor.

gcc -S -O0 -o test.O0.s test.c

Several of the common options are discussed here. The GCC manual contains additional information regarding optimization settings. The source code and assembly listings are available in the Optimization Example (88KB) folder.

Optimization levels and modes

  • Level 0

    Using an optimization flag of -O0 turns off optimization. This is the best setting when debugging code the first time, and maybe beyond. You should use this setting to generate a baseline build from which to start your debugging and subsequent performance analysis efforts. The machine instructions map easily to the source code, so to twist the WYSIWYG acronym a bit, "what you wrote is what you get" in the debugger. Several Level 0 examples are provided for reference in the following discussions.

  • Level 1

    GCC attempts to both reduce the code size and execution time. Only certain types of optimizations apply here. Register allocation attempts to place as many variables in registers as will fit, for faster access and fewer load/store instruction pairs.

    This function generates the accompanying machine instructions under Level 0 and Level 1:

    void arrayAssignmentLoop( void ) {
       unsigned int count = 10;
       unsigned int array[ 10 ], item = 0;
       do {
          array[ item++ ] = count;
          count--;
       } while ( count > 0 );
    }

    With no optimization (-O0) enabled, the value for count is stored and updated on the stack.

    	stmw r30,-8(r1)
    	stwu r1,-128(r1)
    	mr r30,r1
    	li r0,10
    	stw r0,32(r30)       ; count stored at 32 bytes off the SP
    	li r0,0
    	stw r0,96(r30)
    	addi r11,r30,96
    	lwz r9,0(r11)
    	mr r0,r9
    	slwi r2,r0,2
    	addi r0,r30,32
    	add r2,r2,r0
    	addi r2,r2,16
    	lwz r0,32(r30)
    	stw r0,0(r2)
    	addi r9,r9,1
    	stw r9,0(r11)
    	lwz r2,32(r30)       ; Load count into r2
    	addi r0,r2,-1        ; Decrement count
    	stw r0,32(r30)       ; Store count back on the stack
    	lwz r0,32(r30)       ; Load count for comparison
    	cmpwi cr7,r0,0
    	bne cr7,L11
    	lwz r1,0(r1)
    	lmw r30,-8(r1)

    Enabling Level 1 optimization (-O1) moves those values to registers. It eliminates the need for the variable count, loading and using the count register instead.

    	li r0,10              ; max stored in r0
    	mtctr r0              ; Move 10 to count register
                                  ; var count has been optimized away
    	li r2,0
    	addi r9,r1,-64
    	slwi r0,r2,2
    	mfctr r11
    	stwx r11,r9,r0
    	addi r2,r2,1
    	bdnz L10              ; Decrement count register and branch if not zero
  • Level 2

    GCC applies additional optimizations but excludes loop unrolling and implicit function inlining, both of which reduce execution time but increase code size. You can use the inline keyword to indicate functions that should be inlined and GCC will make a determination on whether to perform inlining. Common subexpression elimination, strength reduction, and loop optimizations are also performed. (See definitions in the following section, Specific Optimizations.)

    For example, this source code generates the accompanying machine instructions under Level 0 and Level 2:

    unsigned int doWhileWithReturn( void ) {
       unsigned int i = 100;
       unsigned int result = 0;
       unsigned int a = 31, b = 2, c = 99;
       do {
          result += a * b;
          c = a * b;
       } while ( i-- > 0 );
       c = a * b;
       return result;
    }

    Here is the unoptimized code (the -O0 option):

    	stmw r30,-8(r1) ; save non-volatile registers
    	stwu r1,-80(r1) ; SP update
    	mr r30,r1       ; i
    	li r0,100
    	stw r0,32(r30)
    	li r0,0         ; result
    	stw r0,36(r30)
    	li r0,31        ; a
    	stw r0,40(r30)
    	li r0,2         ; b
    	stw r0,44(r30)
    	li r0,99        ; c
    	stw r0,48(r30)
    	lwz r2,40(r30)  ; load a into r2
    	lwz r0,44(r30)  ; load b into r0
    	mullw r2,r2,r0  ; multiply a and b, store in r2
    	lwz r0,36(r30)  ; load result
    	add r0,r0,r2    ; add product to result
    	stw r0,36(r30)  ; store new value of result
    	lwz r2,40(r30)
    	lwz r0,44(r30)
    	mullw r0,r2,r0  ; multiply a and b, store in r0
    	stw r0,48(r30)  ; store new value of c
    	lwz r2,32(r30)  ; load i
    	addi r0,r2,-1   ; subtract 1 from i
    	mr r2,r0
    	stw r2,32(r30)  ; store i
    	li r0,-1
    	cmpw cr7,r2,r0  ; compare i to -1, update condition register
    	bne cr7,L16     ; loop if i > 0 (branch to label l16)
    	lwz r2,40(r30)
    	lwz r0,44(r30)
    	mullw r0,r2,r0  ; multiply a and b, store in r0
    	stw r0,48(r30)  ; store new value of c
    	lwz r0,36(r30)  ; Load result
    	mr r3,r0        ; Move result to r3
    	lwz r1,0(r1)    ; Restore SP
    	lmw r30,-8(r1)  ; Restore registers
    	blr             ; Return

    The -O2 option optimizes most of the loop and eliminates unused variables:

    	li r0,101       ; Load count register with 101
    	mtctr r0
    L21:                    ; Loop has been almost completely optimized away
    	bdnz L21        ; Decrement count register and branch to label L21 
                            ;   if not zero
    	li r3,6262      ; Load result value of 6,262 into r3
  • Level 3

    GCC applies additional optimizations including implicit inlining, or inlining of functions not marked with the keyword inline. This is a general-purpose speed optimization setting.

  • Fast

    This mode, invoked by -fast (for C and Objective-C; use -fastf for C++ and Objective-C++), packages a number of optimizations that target the G5. This mode generates faster, though probably larger, code. It will unroll loops, transpose nested loops (change the access order to improve locality of reference), convert loop initialization to memset calls, and inline library calls.

    This setting aggressively inlines functions. For example, here is a main function that, aside from a few variable assignments, simply calls other functions.

    int main( void ) {
       unsigned int   result;
       double doubleResult;
       arrayAssignmentLoop();
       result = doWhileWithReturn();
       printf( "doWhileWithReturn returned %u\n", result );
       doubleResult = doubleTest();
       printf( "doubleTest returned %lf\n", doubleResult );
       return 0;
    }
    The unoptimized version calls each function:

    	mflr r0
    	stmw r30,-8(r1)
    	stw r0,8(r1)
    	stwu r1,-96(r1)
    	mr r30,r1
    	bcl 20,31,"L00000000001$pb"
    	mflr r31
    	bl L_arrayAssignmentLoop$stub  ; Branch to arrayAssignmentLoop
    	bl L_doWhileWithReturn$stub    ; Branch to doWhileWithReturn
    	mr r0,r3
    	stw r0,64(r30)
    	addis r3,r31,ha16(LC0-"L00000000001$pb")
    	la r3,lo16(LC0-"L00000000001$pb")(r3)
    	lwz r4,64(r30)
    	bl L_printf$stub
    	bl L_doubleTest$stub           ; Branch to doubleTest

    The -fast optimized version has inlined the calls to arrayAssignmentLoop and doWhileWithReturn:

    	mflr r2
    	li r3,2                ; Begin inlined and unrolled arrayAssignmentLoop
    	li r11,10
    	li r10,9
    	li r9,8
    	li r8,7
    	li r7,6
    	li r6,5
    	li r5,4
    	stw r2,8(r1)
    	stwu r1,-128(r1)
    	li r4,3
    	stw r3,96(r1)
    	stw r11,64(r1)
    	stw r10,68(r1)
    	stw r9,72(r1)
    	stw r8,76(r1)
    	stw r7,80(r1)
    	stw r6,84(r1)
    	stw r5,88(r1)
    	stw r4,92(r1)           ; End of arrayAssignmentLoop
    	li r3,1
    	li r2,99
    	stw r3,100(r1)
    	.p2align 4,,15
    L149:                           ; Top of loop for doWhileWithReturn
    	cmpwi cr0,r2,3
    	addi r2,r2,-4
    	bne cr0,L149            ; Branch to top of loop
    	lis r5,ha16(LC1)
    	li r4,6262              ; Result of doWhileWithReturn
    	la r3,lo16(LC1)(r5)
    	bl L_printf$stub
    	bl _doubleTest          ; Branch to doubleTest
  • Size

    You can force smaller code size using the -Os flag. Smaller code may be a good choice because it can reduce cache misses and paging.

    This example compares doubleTest under both -fast and -Os. Here is the source code:

    double doubleTest( void ) {
       const unsigned int limit = 100;
       double array[ limit ][ limit ], sum = 0;
       unsigned int i, j;
       for ( i = 0; i < limit; i++ ) {
          for ( j = 0; j < limit; j++ ) {
             array[ i ][ j ] = i;
             sum += array[ i ][ j ];
          }
       }
       return sum;
    }

    Under -fast the inner loop gets unrolled, resulting in 10 each of the instructions stfd (store double precision floating-point) and fadd (floating-point add double precision).

    L117:                      ; Top of outer loop
    	rldicl r5,r11,0,32
    	li r9,0
    	add r2,r10,r8
    	std r5,32(r30)
    	lfd f2,32(r30)
    	fcfid f0,f2
    	.p2align 4,,15
    L116:                      ; Top of inner loop
    	fadd f11,f1,f0
    	addi r9,r9,10
    	stfd f0,0(r2)
    	stfd f0,8(r2)
    	stfd f0,16(r2)
    	stfd f0,24(r2)
    	cmplwi cr0,r9,99
    	stfd f0,32(r2)
    	stfd f0,40(r2)
    	stfd f0,48(r2)
    	stfd f0,56(r2)
    	stfd f0,64(r2)
    	stfd f0,72(r2)
    	addi r2,r2,80
    	fadd f10,f11,f0
    	fadd f9,f10,f0
    	fadd f8,f9,f0
    	fadd f7,f8,f0
    	fadd f6,f7,f0
    	fadd f5,f6,f0
    	fadd f4,f5,f0
    	fadd f3,f4,f0
    	fadd f1,f3,f0
    	ble cr0,L116       ; Branch to top of inner loop
    	addi r11,r11,1
    	addi r10,r10,800
    	cmplwi cr1,r11,99
    	ble cr1,L117       ; Branch to top of outer loop

    Under -Os the inner loop is much tighter, with only one store and add pair per iteration. A profiler can help you determine whether this runs faster than the -fast version.

    L41:                      ; Top of outer loop
    	stw r9,36(r30)
    	li r8,100
    	stw r10,32(r30)
    	mtctr r8
    	lfd f0,32(r30)
    	add r2,r11,r0
    	fsub f0,f0,f13
    L46:                      ; Top of inner loop
    	stfd f0,0(r2)     ; Store followed by add
    	fadd f1,f1,f0
    	addi r2,r2,8
    	bdnz L46          ; Branch to top of inner loop
    	addi r9,r9,1
    	addi r11,r11,800
    	cmplwi cr7,r9,99
    	ble+ cr7,L41      ; Branch to top of outer loop
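
The constant 6262 that appears in the Level 2 and -fast listings can be checked at the source level. The loop body runs 101 times (for i equal to 100 down to 0) and adds 31 * 2 = 62 each time, so the result is 101 * 62 = 6262. The sketch below places the original function next to a hand-folded equivalent; doWhileWithReturnFolded is a hypothetical name, and the (void) cast is added here only to keep the otherwise unused variable c from drawing a warning.

```c
/* The function from the Level 2 example, with unchanged behavior. */
unsigned int doWhileWithReturn( void )
{
   unsigned int i = 100;
   unsigned int result = 0;
   unsigned int a = 31, b = 2, c = 99;
   do {
      result += a * b;
      c = a * b;
   } while ( i-- > 0 );
   c = a * b;
   (void) c;            /* c is never read; the optimizer drops it */
   return result;
}

/* What the optimized listings effectively return. */
unsigned int doWhileWithReturnFolded( void )
{
   return 101u * ( 31u * 2u );   /* 101 iterations of 62 */
}
```

Both functions return 6262, which is why the optimizer is free to replace the arithmetic with a single load of that constant.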

Specific optimizations

  • Loop unrolling

    Expand a loop to include two or more iterations before checking the loop conditional. Use the flag -funroll-loops; the general loop optimizer is enabled by -floop-optimize.

  • Inlining functions

    Functions with a line count below a certain threshold may have their instruction sequence substituted for the corresponding function call. Since function calls involve overhead for stack frame setup, this may result in faster (though longer) code. Turn on inlining using -finline-functions.

  • Strength reduction

    Replace expensive operations with simpler operations. Use -fstrength-reduce.

    For example, this loop:

       for ( i = 0; i < 1000; i++ )
          sum += i * 5;

    generates this unoptimized code:

    	li r0,0
    	stw r0,40(r30)   ; sum
    	li r0,0
    	stw r0,32(r30)   ; i
    	...              ; Top of loop
    	lwz r0,32(r30)   ; Load i into r0
    	mulli r2,r0,5    ; Multiply r0 by 5, place result in r2
    	lwz r0,40(r30)   ; Load sum
    	add r0,r0,r2     ; Add r2 to sum
    	stw r0,40(r30)   ; Store new sum
    	...              ; Update i and branch to top of loop

    This optimized version replaces the multiplication in the loop with an add:

    	li r3,0        ; sum
    	li r2,0        ; i
    	add r3,r3,r2   ; Update sum
    	addi r2,r2,5   ; Add 5 to r2 each iteration: 5 + 5 + 5 ...
    	bdnz L30
  • Dead code elimination

    Remove unused code from the final image, reducing its size. This is currently an experimental feature in GCC: use the flag -fssa-dce.

  • Common subexpression elimination

    Replace multiple references to the same expression with the result of that expression (calculated once). Several variants exist, but the basic version is -fgcse.

  • Loop invariants

    An expression whose value does not change between loop iterations may be moved out of the loop. The flag -floop-optimize handles this and other loop optimizations.

  • Instruction scheduling

    Scheduling for a particular processor may result in faster execution on such a system, though running the same code on other processors may be less efficient.

  • Cross-module inlining

    In this mode the compiler has the entire application (all compilation units) in view when determining what and where to inline. Cross module inlining is enabled by two things: -fast and the inclusion of all compilation units on a single compiler invocation command line.

  • Feedback-directed optimization under -fast

    With Feedback Directed Optimization (FDO) the compiler uses a runtime profile of the application in order to make inlining and hot/cold code location decisions. It does this via a three step process:

    1. build the app using -fast with -fcreate-profile specified;
    2. run the app with a model set of data;
    3. rebuild the app using -fast with -fuse-profile to generate the FDO based optimization.

Additional optimization settings are described in the GCC manual.
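
Two of the transformations listed above, strength reduction and loop unrolling, can be written out by hand to show what the compiler does on your behalf. This is a source-level sketch only, using the loop from the strength reduction example; the function names are hypothetical and the compiler's actual output may differ.

```c
/* Original form: one multiplication per iteration. */
unsigned int SumTimesFive( void )
{
    unsigned int i, sum = 0;
    for ( i = 0; i < 1000; i++ )
        sum += i * 5;
    return sum;
}

/* Strength reduction by hand: a running term that grows by 5
   replaces the multiplication i * 5. */
unsigned int SumTimesFiveReduced( void )
{
    unsigned int i, term = 0, sum = 0;
    for ( i = 0; i < 1000; i++ ) {
        sum += term;
        term += 5;
    }
    return sum;
}

/* Loop unrolling by hand: four iterations per loop test. This
   works without a cleanup loop because 1000 is divisible by 4. */
unsigned int SumTimesFiveUnrolled( void )
{
    unsigned int i, sum = 0;
    for ( i = 0; i < 1000; i += 4 ) {
        sum += i * 5;
        sum += ( i + 1 ) * 5;
        sum += ( i + 2 ) * 5;
        sum += ( i + 3 ) * 5;
    }
    return sum;
}
```

All three functions return 2497500, which is 5 times the sum of the integers from 0 through 999. In practice it is usually better to write the plain loop and let the compiler apply these transformations.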

For More Information

This article touched on the fundamentals of GCC, optimization, make, and GDB. The following resources provide additional information about these tools on Mac OS X:

Also try these non-ADC resources:

  • Programming with GNU Software and other titles from O'Reilly & Associates, Inc.
  • Efficient Memory Programming, originally published by McGraw-Hill but now available as an e-book, discusses code design techniques that can be applied by the programmer in order to improve the compiler's chances of effectively optimizing the code.
  • Various compiler books discuss code optimization algorithms and implementation.

If you want to keep up with GCC on other platforms try the GCC home page. Keep in mind that Apple-specific builds and docs are not available on this site.

Posted: 2004-07-12