The goals here are to compare how long various versions of gcc take to compile the output of the Gambit Scheme->C compiler applied to the programs in the Gambit benchmark suite, and to compare the run times of the resulting benchmarks.
The Gambit benchmark compiler is no longer run by default as part of the Gambit benchmark suite because compiling the C code generated by the Gambit Scheme->C compiler for this program can take ridiculous amounts of space with some versions of gcc. Here are time reports using -ftime-report for compiling compiler.i, used as a test case in GCC PR 39157, with the same options as listed under "Methodology" below and various compilers:
Compiler | User time | System time | Wall time |
---|---|---|---|
gcc-4.1.2 | 910.89 | 5.67 | 916.66 |
gcc-4.2.4 | 316.49 | 5.82 | 322.33 |
gcc-4.3.3 | 217.94 | 4.77 | 222.73 |
gcc-4.4.1 20090522 | 205.93 | 4.50 | 210.45 |
gcc-4.5.0 20090521 | 335.10 | 23.22 | 825.20 |
gcc-4.5.0 -fno-forward-propagate | 260.56 | 4.67 | 265.38 |
For gcc-4.5.0 20090521, there is a large discrepancy between CPU time and wall time for df reaching defs, expand, and especially forward prop; this is because the machine I'm using has 8GB of RAM and some of the passes require > 9GB of RAM, causing a lot of swapping. I also have a detailed time and memory usage report for gcc-4.5.0 on May 15, 2009 (on a different machine) compiling this test case (where it allocates > 60GB of bitmaps over the course of the compilation). When -fno-forward-propagate was added to the options line for gcc-4.5.0, the compile time was reduced substantially and, more importantly, the memory requirements were reduced to about 1360 MB.
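One rough way to see the effect of swapping in the table above is to compare CPU time (user plus system) with wall time for each row. The following is a minimal sketch, not part of the original measurements; the name `cpu_fraction` is made up for illustration, and the numbers are copied from the table.

```python
# (user, system, wall) times copied from the -ftime-report table above.
REPORTS = {
    "gcc-4.1.2": (910.89, 5.67, 916.66),
    "gcc-4.2.4": (316.49, 5.82, 322.33),
    "gcc-4.3.3": (217.94, 4.77, 222.73),
    "gcc-4.4.1 20090522": (205.93, 4.50, 210.45),
    "gcc-4.5.0 20090521": (335.10, 23.22, 825.20),
    "gcc-4.5.0 -fno-forward-propagate": (260.56, 4.67, 265.38),
}


def cpu_fraction(user, system, wall):
    """Fraction of wall-clock time during which the compiler was on the CPU."""
    return (user + system) / wall


for compiler, (user, system, wall) in REPORTS.items():
    print(f"{compiler:34s} {cpu_fraction(user, system, wall):.2f}")
```

A fraction near 1 means the compiler was almost always running. A value well below 1, as for the gcc-4.5.0 20090521 row, is consistent with time spent waiting on swap.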
The runtime of operations on very large bignums is determined mainly by the performance of a single FFT routine embedded in the code for bignum multiplication. (This is true for GMP and nearly every other fast bignum library, including Gambit's.) Thus, I'm interested in the speed of this routine, and filed PR 33928 when I noticed a slowdown. Here is a table that shows the runtime of the FFT routine when compiled by various versions of gcc with the options
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
There's also a line for gcc-4.5.0 with -fno-forward-propagate.
Compiler | Runtime |
---|---|
4.1.2 release | 192ms |
4.2.4 release | 156ms |
4.3.3 release | 184ms |
4.4.1 20090522 | 180ms |
4.5.0 20090521 | 168ms |
4.5.0 -fno-forward-propagate | 168ms |
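To make the slowdowns easier to compare, each runtime can be expressed relative to the fastest entry in the table. This is only an illustrative sketch; the dictionary below copies the table's values, and the variable names are my own.

```python
# FFT runtimes in milliseconds, copied from the table above.
FFT_MS = {
    "4.1.2 release": 192,
    "4.2.4 release": 156,
    "4.3.3 release": 184,
    "4.4.1 20090522": 180,
    "4.5.0 20090521": 168,
    "4.5.0 -fno-forward-propagate": 168,
}

fastest = min(FFT_MS.values())

# Ratio of each compiler's runtime to the fastest runtime (1.0 is best).
relative = {compiler: ms / fastest for compiler, ms in FFT_MS.items()}

for compiler, ratio in relative.items():
    print(f"{compiler:30s} {ratio:.2f}")
```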
Each benchmark was compiled once by each compiler and then run three times; the lowest runtime of the three is reported here. Each compiler was configured with the options: --enable-languages=c --enable-checking=release --disable-multilib
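The "lowest of three runs" rule can be sketched as follows. This is not the `bench` script used for the measurements; `best_of` is a hypothetical helper, and the timed command in the example is a placeholder rather than a real benchmark.

```python
import subprocess
import sys
import time


def best_of(run, runs=3):
    """Call run() the given number of times and return the lowest elapsed time in seconds."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        run()
        elapsed.append(time.perf_counter() - start)
    return min(elapsed)


if __name__ == "__main__":
    # Placeholder workload: start a Python interpreter that does nothing.
    command = [sys.executable, "-c", "pass"]
    print(best_of(lambda: subprocess.run(command, check=True)))
```

Taking the minimum rather than the mean reduces the influence of runs that were disturbed by other activity on the machine.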
The compiler options were chosen (a) to achieve the "best" performance at "reasonable" compile times and memory requirements, and (b) to work around this compiler problem that has not been fixed in 4.2.4 or 4.3.3.
Compiler | Options |
---|---|
4.1.2 release | -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp |
4.2.4 release | -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fno-move-loop-invariants |
4.3.3 release | -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fno-move-loop-invariants |
4.4.1 20090522 | -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp |
4.5.0 20090521 | -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp |
In the following table, the colors have the following meaning:
Color | Meaning |
---|---|
Dark blue | Minimum compile time |
Light blue | < 1.05 × minimum compile time |
Red | > 1.25 × minimum compile time |
Dark green | Minimum run time |
Light green | < 1.025 × minimum run time |
Red | > 1.10 × minimum run time |
White | Every other case |
So, for compile times, dark blue is best and light blue is good, and for run times, dark green is best and light green is good. Red is bad for both, and white is middling. Note that I assign less importance to compiler slowdowns than I do to runtime slowdowns.
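The color rule can be written down as a small function. This is a sketch of the thresholds as stated above, not the code that generated the table; the names `classify` and `classify_row` are hypothetical, and values that fall exactly on a threshold are treated as "every other case" because the legend does not say otherwise.

```python
# (light threshold, red threshold) as multiples of the row minimum, and the
# color names for (best, good), taken from the legend above.
RULES = {
    "compile": (1.05, 1.25, "dark blue", "light blue"),
    "run": (1.025, 1.10, "dark green", "light green"),
}


def classify(value, minimum, kind):
    """Return the legend color for one measurement, given the row minimum."""
    light, red, best_color, good_color = RULES[kind]
    if value == minimum:
        return best_color
    ratio = value / minimum
    if ratio < light:
        return good_color
    if ratio > red:
        return "red"
    return "white"


def classify_row(values, kind):
    """Classify every measurement of one kind in a benchmark row."""
    minimum = min(values)
    return [classify(v, minimum, kind) for v in values]


# Run times of the puzzle benchmark, copied from the results table.
print(classify_row([0.376, 0.388, 0.504, 0.412, 0.384, 0.376], "run"))
```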
Benchmark | 4.1.2 release compile time | 4.1.2 release run time | 4.2.4 release compile time | 4.2.4 release run time | 4.3.3 release compile time | 4.3.3 release run time | 4.4.1 20090522 compile time | 4.4.1 20090522 run time | 4.5.0 20090521 compile time | 4.5.0 20090521 run time | 4.5.0 20090521 -fno-forward-propagate compile time | 4.5.0 20090521 -fno-forward-propagate run time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
browse | 0.50 | 2.248 | 0.52 | 2.212 | 0.54 | 2.284 | 0.56 | 2.284 | 0.61 | 2.248 | 0.57 | 2.244 |
cpstak | 0.31 | 1.096 | 0.31 | 1.092 | 0.33 | 1.100 | 0.35 | 1.096 | 0.36 | 1.096 | 0.36 | 1.096 |
ctak | 0.31 | 1.076 | 0.31 | 0.984 | 0.34 | 1.064 | 0.33 | 1.012 | 0.34 | 1.000 | 0.34 | 0.996 |
dderiv | 0.38 | 1.140 | 0.40 | 1.156 | 0.41 | 1.252 | 0.41 | 1.308 | 0.45 | 1.196 | 0.45 | 1.212 |
deriv | 0.33 | 1.044 | 0.34 | 1.040 | 0.33 | 1.052 | 0.34 | 1.040 | 0.36 | 1.040 | 0.36 | 1.040 |
destruc | 0.34 | 0.816 | 0.36 | 0.764 | 0.39 | 0.844 | 0.39 | 0.776 | 0.42 | 0.780 | 0.40 | 0.776 |
diviter | 0.30 | 1.044 | 0.30 | 1.048 | 0.32 | 1.056 | 0.32 | 1.044 | 0.34 | 1.052 | 0.32 | 1.044 |
divrec | 0.30 | 1.236 | 0.32 | 1.240 | 0.32 | 1.244 | 0.33 | 1.236 | 0.34 | 1.240 | 0.33 | 1.240 |
puzzle | 0.39 | 0.376 | 0.39 | 0.388 | 0.43 | 0.504 | 0.45 | 0.412 | 0.48 | 0.384 | 0.46 | 0.376 |
takl | 0.32 | 0.608 | 0.32 | 0.632 | 0.34 | 0.648 | 0.34 | 0.688 | 0.36 | 0.632 | 0.36 | 0.632 |
trav1 | 0.42 | 0.384 | 0.43 | 0.392 | 0.46 | 0.460 | 0.46 | 0.460 | 0.48 | 0.396 | 0.47 | 0.380 |
trav2 | 0.42 | 1.016 | 0.41 | 0.880 | 0.44 | 0.732 | 0.47 | 0.920 | 0.50 | 0.748 | 0.46 | 0.704 |
triangl | 0.34 | 0.804 | 0.33 | 0.844 | 0.33 | 0.836 | 0.37 | 0.812 | 0.37 | 0.872 | 0.37 | 0.844 |
fft | 0.34 | 0.220 | 0.35 | 0.228 | 0.34 | 0.240 | 0.39 | 0.228 | 0.37 | 0.216 | 0.37 | 0.220 |
fib | 0.25 | 0.692 | 0.25 | 0.680 | 0.27 | 0.724 | 0.27 | 0.692 | 0.29 | 0.612 | 0.28 | 0.636 |
fibfp | 0.31 | 0.776 | 0.33 | 0.744 | 0.34 | 0.836 | 0.34 | 0.708 | 0.36 | 0.788 | 0.34 | 0.748 |
mbrot | 0.31 | 0.616 | 0.31 | 0.616 | 0.31 | 0.620 | 0.33 | 0.612 | 0.34 | 0.616 | 0.32 | 0.608 |
nucleic | 2.78 | 0.164 | 5.67 | 0.152 | 5.79 | 0.168 | 5.85 | 0.168 | 3.20 | 0.164 | 3.50 | 0.168 |
pnpoly | 0.29 | 0.208 | 0.29 | 0.204 | 0.28 | 0.216 | 0.31 | 0.224 | 0.33 | 0.200 | 0.32 | 0.220 |
sum | 0.25 | 0.232 | 0.25 | 0.184 | 0.26 | 0.200 | 0.25 | 0.204 | 0.28 | 0.220 | 0.27 | 0.192 |
sumfp | 0.28 | 2.680 | 0.27 | 2.672 | 0.30 | 2.756 | 0.31 | 2.664 | 0.32 | 2.668 | 0.32 | 2.652 |
tak | 0.28 | 0.720 | 0.28 | 0.772 | 0.29 | 0.768 | 0.29 | 0.736 | 0.30 | 0.704 | 0.29 | 0.748 |
tfib | 0.27 | 1.028 | 0.27 | 0.980 | 0.29 | 1.120 | 0.27 | 1.060 | 0.30 | 1.044 | 0.29 | 1.072 |
ack | 0.25 | 0.392 | 0.25 | 0.420 | 0.26 | 0.420 | 0.26 | 0.424 | 0.28 | 0.420 | 0.28 | 0.444 |
array1 | 0.25 | 0.272 | 0.27 | 0.284 | 0.28 | 0.284 | 0.28 | 0.240 | 0.31 | 0.232 | 0.29 | 0.236 |
cat | 0.25 | 0.836 | 0.25 | 0.828 | 0.25 | 0.860 | 0.28 | 0.868 | 0.29 | 0.848 | 0.29 | 0.904 |
string | 0.26 | 0.908 | 0.29 | 0.912 | 0.28 | 0.908 | 0.28 | 0.908 | 0.29 | 0.928 | 0.28 | 0.892 |
sum1 | 0.25 | 0.928 | 0.26 | 0.848 | 0.27 | 0.896 | 0.28 | 0.912 | 0.30 | 0.928 | 0.30 | 0.872 |
sumloop | 0.26 | 3.020 | 0.27 | 2.968 | 0.26 | 2.964 | 0.28 | 2.660 | 0.30 | 2.980 | 0.28 | 2.972 |
tail | 0.26 | 0.700 | 0.27 | 0.672 | 0.28 | 0.692 | 0.30 | 0.696 | 0.32 | 0.680 | 0.30 | 0.680 |
wc | 0.27 | 0.384 | 0.28 | 0.368 | 0.30 | 0.384 | 0.31 | 0.384 | 0.34 | 0.392 | 0.32 | 0.384 |
conform | 1.32 | 0.724 | 1.29 | 0.712 | 1.48 | 0.752 | 1.42 | 0.696 | 1.59 | 0.680 | 1.52 | 0.720 |
dynamic | 6.61 | 0.696 | 6.50 | 0.684 | 6.61 | 0.704 | 7.82 | 0.692 | 8.73 | 0.684 | 8.10 | 0.684 |
earley | 0.85 | 0.664 | 0.83 | 0.660 | 0.95 | 0.696 | 1.01 | 0.688 | 1.10 | 0.664 | 1.07 | 0.664 |
fibc | 0.29 | 0.756 | 0.27 | 0.704 | 0.31 | 0.812 | 0.30 | 0.748 | 0.31 | 0.748 | 0.30 | 0.724 |
graphs | 0.72 | 0.656 | 0.71 | 0.676 | 0.79 | 0.688 | 0.78 | 0.688 | 0.88 | 0.656 | 0.88 | 0.668 |
lattice | 0.59 | 1.612 | 0.59 | 1.832 | 0.65 | 1.660 | 0.66 | 1.604 | 0.71 | 1.768 | 0.70 | 1.856 |
matrix | 1.20 | 1.004 | 1.22 | 0.984 | 1.31 | 1.016 | 1.30 | 1.028 | 1.47 | 0.984 | 1.42 | 0.968 |
maze | 0.96 | 0.408 | 0.95 | 0.420 | 1.08 | 0.436 | 1.04 | 0.428 | 1.18 | 0.416 | 1.16 | 0.412 |
mazefun | 0.72 | 0.616 | 0.74 | 0.592 | 0.82 | 0.652 | 0.81 | 0.640 | 0.88 | 0.608 | 0.84 | 0.600 |
nqueens | 0.26 | 0.808 | 0.27 | 0.792 | 0.28 | 0.812 | 0.27 | 0.724 | 0.30 | 0.740 | 0.30 | 0.728 |
paraffins | 0.44 | 2.108 | 0.44 | 2.224 | 0.48 | 2.268 | 0.50 | 2.144 | 0.52 | 2.108 | 0.52 | 2.160 |
peval | 1.54 | 0.656 | 1.51 | 0.632 | 1.70 | 0.688 | 1.75 | 0.660 | 1.97 | 0.624 | 1.88 | 0.624 |
pi | 0.50 | 0.640 | 0.50 | 0.672 | 0.53 | 0.704 | 0.51 | 0.664 | 0.56 | 0.640 | 0.54 | 0.624 |
primes | 0.29 | 1.276 | 0.30 | 1.272 | 0.30 | 1.292 | 0.32 | 1.264 | 0.33 | 1.248 | 0.35 | 1.236 |
ray | 0.63 | 0.156 | 0.65 | 0.152 | 0.68 | 0.168 | 0.72 | 0.164 | 0.77 | 0.164 | 0.78 | 0.168 |
scheme | 2.75 | 1.116 | 2.62 | 0.924 | 2.70 | 0.944 | 2.76 | 1.128 | 3.12 | 1.112 | 2.94 | 0.864 |
simplex | 0.64 | 0.340 | 0.63 | 0.324 | 0.70 | 0.352 | 0.73 | 0.360 | 0.76 | 0.304 | 0.74 | 0.352 |
slatex | 3.82 | 0.336 | 3.65 | 0.336 | 3.83 | 0.352 | 3.80 | 0.344 | 4.67 | 0.340 | 4.34 | 0.332 |
perm9 | 0.34 | 0.752 | 0.34 | 0.772 | 0.36 | 0.848 | 0.38 | 0.792 | 0.38 | 0.792 | 0.38 | 0.776 |
nboyer | 0.74 | 1.076 | 0.74 | 1.068 | 0.80 | 1.072 | 0.82 | 1.076 | 0.90 | 1.064 | 0.88 | 1.040 |
sboyer | 0.75 | 0.824 | 0.75 | 0.788 | 0.78 | 0.828 | 0.82 | 0.832 | 0.89 | 0.820 | 0.86 | 0.788 |
gcbench | 0.46 | 1.984 | 0.46 | 1.888 | 0.50 | 1.904 | 0.51 | 1.836 | 0.52 | 1.812 | 0.51 | 1.816 |
Overall, considering runtime, compile-time, and compiler memory requirements of various benchmarks, it seems that it may not be a good idea to include -fforward-propagate in -O1.
The version of Gambit was
    heine:~/programs/gambc-v4_4_3-devel/bench> ../gsc/gsc -v
    v4.4.3 20090514200332 x86_64-unknown-linux-gnu
The benchmarks were run with the command
    heine:~/programs/gambc-v4_4_3-devel/bench> cat run-all-benches
    #! /bin/csh
    set files = 'gcc-4.1.2 gcc-4.2.4 gcc-4.3.3 gcc-4.4-branch gcc-mainline'
    foreach file ( $files )
      cd /home/lucier/programs/gambc-v4_4_3-devel; make mostlyclean ; ./configure CC=/pkgs/$file/bin/gcc --enable-single-host ; make -j 4
      cd /home/lucier/programs/gambc-v4_4_3-devel/bench; /bin/rm results.Gambit-C-r6rs-fixflo-unsafe; ./bench -r 3 -s r6rs-fixflo-unsafe gambit all; /bin/mv results.Gambit-C-r6rs-fixflo-unsafe results.Gambit-C-r6rs-fixflo-unsafe-$file
    end
The file bench was modified to run the benchmarks without installing Gambit:
    heine:~/programs/gambc-v4_4_3-devel/bench> rcsdiff bench
    ===================================================================
    RCS file: RCS/bench,v
    retrieving revision 1.1
    diff -r1.1 bench
    174c174
    < echo gsc $COMPOPTS $1.scm
    ---
    > echo /home/lucier/programs/gambc-v4_4_3-devel/gsc/gsc -:=/home/lucier/programs/gambc-v4_4_3-devel/ $COMPOPTS $1.scm
    175a176
    > ls -l ./$1.o1
    181c182
    < then gsi -:m10000,d- ./$1.o1
    ---
    > then /home/lucier/programs/gambc-v4_4_3-devel/gsi/gsi -:m10000,d-,=/home/lucier/programs/gambc-v4_4_3-devel/ ./$1.o1
The script gsc-cc-o.bat was modified to time the .c->.o compile separately:
    heine:~/programs/gambc-v4_4_3-devel> git diff
    diff --git a/bin/makefile.in b/bin/makefile.in
    index f8d96d1..aed46e8 100644
    --- a/bin/makefile.in
    +++ b/bin/makefile.in
    @@ -112,7 +112,7 @@ gsc-cc-o.bat: makefile
     echo "" >> gsc-cc-o.bat; \
     echo "cd \"\$${GSC_CC_O_C_FILENAME_DIR}\"" >> gsc-cc-o.bat; \
     echo "" >> gsc-cc-o.bat; \
    -echo "@GSC_CC_O@" >> gsc-cc-o.bat; \
    +echo "time @GSC_CC_O@" >> gsc-cc-o.bat; \
     chmod +x gsc-cc-o.bat; \
     else \
     echo "@echo off" > gsc-cc-o.bat; \
The compilers were configured with
    heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-4.1.2/bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../../gcc-4.1.2/configure --prefix=/pkgs/gcc-4.1.2 --enable-languages=c --enable-checking=release --disable-multilib
    Thread model: posix
    gcc version 4.1.2
    heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-4.2.4//bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../../gcc-4.2.4/configure --prefix=/pkgs/gcc-4.2.4 --enable-languages=c --enable-checking=release --disable-multilib
    Thread model: posix
    gcc version 4.2.4
    heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-4.3.3///bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../../gcc-4.3.3/configure --prefix=/pkgs/gcc-4.3.3 --enable-languages=c --enable-checking=release --disable-multilib
    Thread model: posix
    gcc version 4.3.3 (GCC)
    heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-4.4-branch////bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../../gcc-4.4-branch/configure --prefix=/pkgs/gcc-4.4-branch --enable-languages=c --enable-checking=release --disable-multilib
    Thread model: posix
    gcc version 4.4.1 20090522 (prerelease) (GCC)
    heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-mainline/////bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --disable-multilib --enable-checking=release
    Thread model: posix
    gcc version 4.5.0 20090521 (experimental) [trunk revision 147758] (GCC)
The machine and operating system were:
    heine:~/programs/gambc-v4_4_3-devel/bench> uname -a
    Linux heine.math.purdue.edu 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 x86_64 GNU/Linux
and the processor was
model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz