Compile-time and run-time comparison of Gambit Scheme programs compiled with different versions of gcc

Goals

The goals here are to compare how long various versions of gcc take to compile the output of the Gambit Scheme->C compiler applied to the programs in the Gambit benchmark suite, and to compare the run times of the resulting benchmark programs.

Preliminary benchmarks

CPU time and memory required to compile "compiler.i"

The "compiler" benchmark is no longer run by default as part of the Gambit benchmark suite because, with some versions of gcc, compiling the C code that the Gambit Scheme->C compiler generates for this program requires enormous amounts of memory. The following table gives the times reported by -ftime-report when various compilers compile compiler.i, the test case from GCC PR 39157, with the same options as listed under "Methodology" below:

Compiler	User time (s)	System time (s)	Wall time (s)
gcc-4.1.2	910.89	5.67	916.66
gcc-4.2.4	316.49	5.82	322.33
gcc-4.3.3	217.94	4.77	222.73
gcc-4.4.1 20090522	205.93	4.50	210.45
gcc-4.5.0 20090521	335.10	23.22	825.20
gcc-4.5.0 20090521 -fno-forward-propagate	260.56	4.67	265.38

For gcc-4.5.0 20090521, there is a large discrepancy between user time and wall time for the "df reaching defs", "expand", and especially "forward prop" passes. This is because the machine I'm using has 8GB of RAM and some of the passes require more than 9GB, causing heavy swapping. I also have a detailed time and memory usage report for gcc-4.5.0 on May 15, 2009 (on a different machine) compiling this test case, in which it allocates more than 60GB of bitmaps over the course of the compilation. Adding -fno-forward-propagate to the gcc-4.5.0 options reduced the user time substantially (from 335.10 s to 260.56 s, with wall time falling from 825.20 s to 265.38 s) and, more importantly, reduced the memory requirements to about 1360 MB.

Runtime for FFT code

The run time of operations on very large bignums is determined mainly by the performance of a single FFT routine embedded in the code for bignum multiplication. (This is true for GMP and nearly every other fast bignum library, including Gambit's.) So I'm interested in the speed of this routine, and I filed PR 33928 when I noticed a slowdown. The following table shows the run time of the FFT routine when compiled by various versions of gcc with the options

-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp

There's also a line for gcc-4.5.0 with -fno-forward-propagate.

Compiler	Runtime
4.1.2 release	192ms
4.2.4 release	156ms
4.3.3 release	184ms
4.4.1 20090522	180ms
4.5.0 20090521	168ms
4.5.0 -fno-forward-propagate	168ms

Methodology

Each benchmark was compiled once by each compiler and then run three times; the lowest runtime of the three is reported here. Each compiler was configured with the options: --enable-languages=c --enable-checking=release --disable-multilib
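The "run three times, report the lowest" rule can be sketched in C as follows. This is an illustration only, not how the Gambit bench script measures time: `best_of_runs` and `sample_workload` are hypothetical names, and the sketch measures processor time with the standard `clock()` function.

```c
#include <assert.h>
#include <time.h>

/* Run fn `runs` times and return the smallest processor time, in seconds,
   observed for a single run. */
static double best_of_runs(void (*fn)(void), int runs)
{
    assert(runs > 0);
    double best = 0.0;
    for (int r = 0; r < runs; r++) {
        clock_t start = clock();
        fn();
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (r == 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical stand-in for a compiled benchmark program. */
static void sample_workload(void)
{
    /* volatile forces every iteration to actually execute. */
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < 10000000UL; i++)
        acc += i * i;
}
```

For example, `best_of_runs(sample_workload, 3)` returns the fastest of three runs. Taking the minimum rather than the mean discards runs that were slowed by unrelated activity on the machine.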

The compiler options were chosen (a) to achieve the "best" performance at "reasonable" compile times and memory requirements, and (b) to work around a compiler problem that has not been fixed in 4.2.4 or 4.3.3; this is why -fno-move-loop-invariants appears only for those two versions.

Compiler	Options
4.1.2 release	-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
4.2.4 release	-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fno-move-loop-invariants
4.3.3 release	-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fno-move-loop-invariants
4.4.1 20090522	-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
4.5.0 20090521	-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp

Results

In the results table below, the cell colors have these meanings:

Color	Meaning
Dark blue	Minimum compile time
Light blue	< 1.05 × minimum compile time
Red	> 1.25 × minimum compile time
Dark green	Minimum run time
Light green	< 1.025 × minimum run time
Red	> 1.10 × minimum run time
White	Every other case

So, for compile times, dark blue is best and light blue is good; for run times, dark green is best and light green is good. Red is bad for both, and white is middling. I assign less importance to compiler slowdowns than to run-time slowdowns, which is why the compile-time thresholds are looser than the run-time thresholds.
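The color scheme amounts to a simple rule applied to each benchmark row. Here is a minimal C sketch of that rule, assuming the minimum is taken per benchmark across the six compiler columns; `classify`, `row_min`, and the threshold constants are hypothetical names, not part of any script used for this page.

```c
#include <assert.h>
#include <stddef.h>

enum rating { BEST, GOOD, MIDDLING, BAD };

/* Ratios relative to the row minimum: below `good` is good, above `bad` is bad. */
struct thresholds {
    double good;
    double bad;
};

static const struct thresholds COMPILE_TIME = {1.05, 1.25};
static const struct thresholds RUN_TIME = {1.025, 1.10};

/* Smallest of n > 0 values, e.g. one benchmark's times across all compilers. */
static double row_min(const double *v, size_t n)
{
    assert(n > 0);
    double m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] < m)
            m = v[i];
    return m;
}

/* Rate time t against the row minimum using the given thresholds. */
static enum rating classify(double t, double min, struct thresholds th)
{
    assert(min > 0.0 && t >= min);
    if (t == min)
        return BEST;          /* dark blue / dark green */
    if (t < th.good * min)
        return GOOD;          /* light blue / light green */
    if (t > th.bad * min)
        return BAD;           /* red */
    return MIDDLING;          /* white */
}
```

For example, the minimum nucleic compile time is 2.78 (gcc 4.1.2), so 3.20 (gcc 4.5.0) is middling and 5.67 (gcc 4.2.4) is bad.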

**Compile-time and run-time comparison of Gambit Scheme programs compiled with different versions of gcc**
Benchmark	4.1.2 release compile time	4.1.2 release run time	4.2.4 release compile time	4.2.4 release run time	4.3.3 release compile time	4.3.3 release run time	4.4.1 20090522 compile time	4.4.1 20090522 run time	4.5.0 20090521 compile time	4.5.0 20090521 run time	4.5.0 20090521 -fno-forward-propagate compile time	4.5.0 20090521 -fno-forward-propagate run time
browse	0.50	2.248	0.52	2.212	0.54	2.284	0.56	2.284	0.61	2.248	0.57	2.244
cpstak	0.31	1.096	0.31	1.092	0.33	1.100	0.35	1.096	0.36	1.096	0.36	1.096
ctak	0.31	1.076	0.31	0.984	0.34	1.064	0.33	1.012	0.34	1.000	0.34	0.996
dderiv	0.38	1.140	0.40	1.156	0.41	1.252	0.41	1.308	0.45	1.196	0.45	1.212
deriv	0.33	1.044	0.34	1.040	0.33	1.052	0.34	1.040	0.36	1.040	0.36	1.040
destruc	0.34	0.816	0.36	0.764	0.39	0.844	0.39	0.776	0.42	0.780	0.40	0.776
diviter	0.30	1.044	0.30	1.048	0.32	1.056	0.32	1.044	0.34	1.052	0.32	1.044
divrec	0.30	1.236	0.32	1.240	0.32	1.244	0.33	1.236	0.34	1.240	0.33	1.240
puzzle	0.39	0.376	0.39	0.388	0.43	0.504	0.45	0.412	0.48	0.384	0.46	0.376
takl	0.32	0.608	0.32	0.632	0.34	0.648	0.34	0.688	0.36	0.632	0.36	0.632
trav1	0.42	0.384	0.43	0.392	0.46	0.460	0.46	0.460	0.48	0.396	0.47	0.380
trav2	0.42	1.016	0.41	0.880	0.44	0.732	0.47	0.920	0.50	0.748	0.46	0.704
triangl	0.34	0.804	0.33	0.844	0.33	0.836	0.37	0.812	0.37	0.872	0.37	0.844
fft	0.34	0.220	0.35	0.228	0.34	0.240	0.39	0.228	0.37	0.216	0.37	0.220
fib	0.25	0.692	0.25	0.680	0.27	0.724	0.27	0.692	0.29	0.612	0.28	0.636
fibfp	0.31	0.776	0.33	0.744	0.34	0.836	0.34	0.708	0.36	0.788	0.34	0.748
mbrot	0.31	0.616	0.31	0.616	0.31	0.620	0.33	0.612	0.34	0.616	0.32	0.608
nucleic	2.78	0.164	5.67	0.152	5.79	0.168	5.85	0.168	3.20	0.164	3.50	0.168
pnpoly	0.29	0.208	0.29	0.204	0.28	0.216	0.31	0.224	0.33	0.200	0.32	0.220
sum	0.25	0.232	0.25	0.184	0.26	0.200	0.25	0.204	0.28	0.220	0.27	0.192
sumfp	0.28	2.680	0.27	2.672	0.30	2.756	0.31	2.664	0.32	2.668	0.32	2.652
tak	0.28	0.720	0.28	0.772	0.29	0.768	0.29	0.736	0.30	0.704	0.29	0.748
tfib	0.27	1.028	0.27	0.980	0.29	1.120	0.27	1.060	0.30	1.044	0.29	1.072
ack	0.25	0.392	0.25	0.420	0.26	0.420	0.26	0.424	0.28	0.420	0.28	0.444
array1	0.25	0.272	0.27	0.284	0.28	0.284	0.28	0.240	0.31	0.232	0.29	0.236
cat	0.25	0.836	0.25	0.828	0.25	0.860	0.28	0.868	0.29	0.848	0.29	0.904
string	0.26	0.908	0.29	0.912	0.28	0.908	0.28	0.908	0.29	0.928	0.28	0.892
sum1	0.25	0.928	0.26	0.848	0.27	0.896	0.28	0.912	0.30	0.928	0.30	0.872
sumloop	0.26	3.020	0.27	2.968	0.26	2.964	0.28	2.660	0.30	2.980	0.28	2.972
tail	0.26	0.700	0.27	0.672	0.28	0.692	0.30	0.696	0.32	0.680	0.30	0.680
wc	0.27	0.384	0.28	0.368	0.30	0.384	0.31	0.384	0.34	0.392	0.32	0.384
conform	1.32	0.724	1.29	0.712	1.48	0.752	1.42	0.696	1.59	0.680	1.52	0.720
dynamic	6.61	0.696	6.50	0.684	6.61	0.704	7.82	0.692	8.73	0.684	8.10	0.684
earley	0.85	0.664	0.83	0.660	0.95	0.696	1.01	0.688	1.10	0.664	1.07	0.664
fibc	0.29	0.756	0.27	0.704	0.31	0.812	0.30	0.748	0.31	0.748	0.30	0.724
graphs	0.72	0.656	0.71	0.676	0.79	0.688	0.78	0.688	0.88	0.656	0.88	0.668
lattice	0.59	1.612	0.59	1.832	0.65	1.660	0.66	1.604	0.71	1.768	0.70	1.856
matrix	1.20	1.004	1.22	0.984	1.31	1.016	1.30	1.028	1.47	0.984	1.42	0.968
maze	0.96	0.408	0.95	0.420	1.08	0.436	1.04	0.428	1.18	0.416	1.16	0.412
mazefun	0.72	0.616	0.74	0.592	0.82	0.652	0.81	0.640	0.88	0.608	0.84	0.600
nqueens	0.26	0.808	0.27	0.792	0.28	0.812	0.27	0.724	0.30	0.740	0.30	0.728
paraffins	0.44	2.108	0.44	2.224	0.48	2.268	0.50	2.144	0.52	2.108	0.52	2.160
peval	1.54	0.656	1.51	0.632	1.70	0.688	1.75	0.660	1.97	0.624	1.88	0.624
pi	0.50	0.640	0.50	0.672	0.53	0.704	0.51	0.664	0.56	0.640	0.54	0.624
primes	0.29	1.276	0.30	1.272	0.30	1.292	0.32	1.264	0.33	1.248	0.35	1.236
ray	0.63	0.156	0.65	0.152	0.68	0.168	0.72	0.164	0.77	0.164	0.78	0.168
scheme	2.75	1.116	2.62	0.924	2.70	0.944	2.76	1.128	3.12	1.112	2.94	0.864
simplex	0.64	0.340	0.63	0.324	0.70	0.352	0.73	0.360	0.76	0.304	0.74	0.352
slatex	3.82	0.336	3.65	0.336	3.83	0.352	3.80	0.344	4.67	0.340	4.34	0.332
perm9	0.34	0.752	0.34	0.772	0.36	0.848	0.38	0.792	0.38	0.792	0.38	0.776
nboyer	0.74	1.076	0.74	1.068	0.80	1.072	0.82	1.076	0.90	1.064	0.88	1.040
sboyer	0.75	0.824	0.75	0.788	0.78	0.828	0.82	0.832	0.89	0.820	0.86	0.788
gcbench	0.46	1.984	0.46	1.888	0.50	1.904	0.51	1.836	0.52	1.812	0.51	1.816

Conclusions

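Overall, considering the run times, compile times, and compiler memory requirements above, it may not be a good idea to include -fforward-propagate in -O1. With gcc-4.5.0 20090521, adding -fno-forward-propagate cut the memory needed to compile compiler.i to about 1360 MB, slightly reduced most compile times, and left the FFT run time unchanged at 168ms.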

Appendix: Detailed information about benchmark software and configuration

The version of Gambit was

heine:~/programs/gambc-v4_4_3-devel/bench> ../gsc/gsc -v
v4.4.3 20090514200332 x86_64-unknown-linux-gnu

The benchmarks were run with the following csh script:

heine:~/programs/gambc-v4_4_3-devel/bench> cat run-all-benches 
#! /bin/csh
set files = 'gcc-4.1.2  gcc-4.2.4	gcc-4.3.3  gcc-4.4-branch  gcc-mainline'
foreach file ( $files )
  cd /home/lucier/programs/gambc-v4_4_3-devel; make mostlyclean ; ./configure CC=/pkgs/$file/bin/gcc --enable-single-host ; make -j 4
  cd /home/lucier/programs/gambc-v4_4_3-devel/bench; /bin/rm results.Gambit-C-r6rs-fixflo-unsafe; ./bench -r 3 -s r6rs-fixflo-unsafe gambit all; /bin/mv results.Gambit-C-r6rs-fixflo-unsafe results.Gambit-C-r6rs-fixflo-unsafe-$file
end

The file bench was modified to run the benchmarks without installing Gambit:

heine:~/programs/gambc-v4_4_3-devel/bench> rcsdiff bench
===================================================================
RCS file: RCS/bench,v
retrieving revision 1.1
diff -r1.1 bench
174c174
<     echo gsc $COMPOPTS $1.scm
---
>     echo /home/lucier/programs/gambc-v4_4_3-devel/gsc/gsc -:=/home/lucier/programs/gambc-v4_4_3-devel/ $COMPOPTS $1.scm
175a176
>     ls -l ./$1.o1
181c182
<   then gsi -:m10000,d- ./$1.o1
---
>   then /home/lucier/programs/gambc-v4_4_3-devel/gsi/gsi -:m10000,d-,=/home/lucier/programs/gambc-v4_4_3-devel/ ./$1.o1

The makefile rule that generates the script gsc-cc-o.bat was modified so that the .c->.o compilation is timed separately:

heine:~/programs/gambc-v4_4_3-devel> git diff
diff --git a/bin/makefile.in b/bin/makefile.in
index f8d96d1..aed46e8 100644
--- a/bin/makefile.in
+++ b/bin/makefile.in
@@ -112,7 +112,7 @@ gsc-cc-o.bat: makefile
          echo "" >> gsc-cc-o.bat; \
          echo "cd \"\$${GSC_CC_O_C_FILENAME_DIR}\"" >> gsc-cc-o.bat; \
          echo "" >> gsc-cc-o.bat; \
-         echo "@GSC_CC_O@" >> gsc-cc-o.bat; \
+         echo "time @GSC_CC_O@" >> gsc-cc-o.bat; \
          chmod +x gsc-cc-o.bat; \
        else \
          echo "@echo off" > gsc-cc-o.bat; \

The compilers were configured with

heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-4.1.2/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.1.2/configure --prefix=/pkgs/gcc-4.1.2 --enable-languages=c --enable-checking=release --disable-multilib
Thread model: posix
gcc version 4.1.2
heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-4.2.4//bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.2.4/configure --prefix=/pkgs/gcc-4.2.4 --enable-languages=c --enable-checking=release --disable-multilib
Thread model: posix
gcc version 4.2.4
heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-4.3.3///bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.3.3/configure --prefix=/pkgs/gcc-4.3.3 --enable-languages=c --enable-checking=release --disable-multilib
Thread model: posix
gcc version 4.3.3 (GCC) 
heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-4.4-branch////bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.4-branch/configure --prefix=/pkgs/gcc-4.4-branch --enable-languages=c --enable-checking=release --disable-multilib
Thread model: posix
gcc version 4.4.1 20090522 (prerelease) (GCC) 
heine:~/programs/gambc-v4_4_3-devel/bench> /pkgs/gcc-mainline/////bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --disable-multilib --enable-checking=release
Thread model: posix
gcc version 4.5.0 20090521 (experimental) [trunk revision 147758] (GCC)

The machine and operating system were:

heine:~/programs/gambc-v4_4_3-devel/bench> uname -a
Linux heine.math.purdue.edu 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 x86_64 GNU/Linux

and the processor was

model name	: Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz