DistCC and CCache on Linux


After a project reaches a certain size, compiling the code becomes a task in its own right, and it is a source of wasted time the world over. Recently I started looking at tools to cut the time I spend waiting on code to compile, and I found a couple that help.

ccache

The first tool I found is called ccache. It works much like a web browser cache: it stores the output of every compile operation I run under my home directory, and when I run the same job again it returns the stored result instead of recompiling.

All one needs to do to get it working is either prefix every call to the compiler with ccache or, and this is what I recommend, put /usr/lib/ccache/bin at the front of the PATH environment variable. The directory differs between distributions; on Debian and Ubuntu, for example, it is /usr/lib/ccache. I have a line in my .zshenv file that does exactly that:

    export PATH="/usr/lib/ccache/bin:$PATH"
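
If the same shell config is shared between machines, some of which may not have ccache installed, a guarded version of that line can be handy. The following is a minimal sketch, not part of my actual setup: prepend_path is a hypothetical helper name, and it only touches PATH when the directory exists and is not already listed.

```shell
# Hypothetical helper: prepend a directory to PATH only if it exists
# and is not already present, so sourcing the file twice is harmless.
prepend_path() {
    _dir=$1
    [ -d "$_dir" ] || return 0          # skip machines without ccache
    case ":$PATH:" in
        *":$_dir:"*) ;;                 # already listed, nothing to do
        *) PATH="$_dir:$PATH" ;;
    esac
}

prepend_path /usr/lib/ccache/bin
export PATH
```

Afterwards, command -v gcc should print a path inside the ccache directory if the symlinks are being picked up.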

Be warned: the first build of a large project will take longer with ccache enabled than without it. Every compile is a cache miss at that point, so on top of running the compiler, ccache has to hash the inputs and store each result in its cache.

You can check the ccache statistics with the command ccache -s. Here’s an example:

    $ ccache -s
    cache directory                     /home/dholman/.ccache
    primary config                      /home/dholman/.ccache/ccache.conf
    secondary config      (readonly)    /etc/ccache.conf
    stats updated                       Fri Nov  6 13:31:48 2020
    cache hit (direct)                 16980
    cache hit (preprocessed)            2373
    cache miss                         76011
    cache hit rate                     20.29 %
    called for link                     2154
    called for preprocessing            2603
    multiple source files                  2
    compiler produced stdout               4
    compiler produced empty output       652
    compile failed                       657
    preprocessor error                  1541
    bad compiler arguments               260
    unsupported source language           10
    autoconf compile/link               3087
    unsupported code directive           123
    could not write to output file       110
    no input file                       2858
    cleanups performed                    65
    files in cache                      6387
    cache size                         586.9 MB
    max cache size                      20.0 GB
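
The hit rate in that listing is simply the two kinds of hits divided by hits plus misses. As a quick sanity check, here is a small sketch that recomputes it from the three counters above; hit_rate is a hypothetical helper, not a ccache command.

```shell
# Hypothetical helper: hit rate in percent from the three counters
# reported by `ccache -s` (direct hits, preprocessed hits, misses).
hit_rate() {
    awk -v d="$1" -v p="$2" -v m="$3" 'BEGIN {
        total = d + p + m
        if (total == 0) { print "0.00"; exit }   # avoid dividing by zero
        printf "%.2f\n", 100 * (d + p) / total
    }'
}

hit_rate 16980 2373 76011   # prints 20.29
```

Running ccache -z before a build zeroes the counters, which makes the rate for that one build easier to read.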

distcc

The next tool was a bit more fiddly to get working and may not work for everyone, so be warned. To get any benefit out of it, you also need a second computer on the same network as your development workstation.

Distcc, according to its website, is a distributed compiler. That's not quite accurate: it is a wrapper around the compiler already installed on my machine that hands compile jobs to other hosts. In my case I have a Dell R710 dual Xeon system running in the basement, and I have distcc set up as a daemon listening on TCP port 3632 on that system. Here's an excerpt from /etc/default/distcc allowing all hosts on my network to use it:

    #
    # Which networks/hosts should be allowed to connect to the daemon?
    # You can list multiple hosts/networks separated by spaces.
    # Networks have to be in CIDR notation, f.e. 192.168.1.0/24
    # Hosts are represented by a single IP Adress
    #
    # ALLOWEDNETS="127.0.0.1"

    ALLOWEDNETS="127.0.0.1 192.168.1.0/24"

The next step was to install distcc on my workstation and configure it to send jobs to the server downstairs:

    # --- /etc/distcc/hosts -----------------------
    # See the "Hosts Specification" section of
    # "man distcc" for the format of this file.
    #
    # By default, just test that it works in loopback mode.
    192.168.1.26/24,cpp,lzo
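
One thing that is easy to misread in that entry: unlike in ALLOWEDNETS, the /24 here is not a CIDR mask. In the hosts file the format is HOST/LIMIT,OPTIONS, so it means "send at most 24 jobs to this host at once". The sketch below just splits the entry apart to make that visible; parse_spec is a hypothetical helper, and it assumes the entry has both a limit and options, as mine does.

```shell
# Hypothetical helper: split a distcc host entry of the form
# HOST/LIMIT,OPTIONS into its parts (assumes both parts are present).
parse_spec() {
    _spec=$1
    _host=${_spec%%/*}      # everything before the first "/"
    _rest=${_spec#*/}       # everything after it: LIMIT,OPTIONS
    _limit=${_rest%%,*}     # job limit, up to the first ","
    _opts=${_rest#*,}       # remaining comma-separated options
    printf 'host=%s limit=%s options=%s\n' "$_host" "$_limit" "$_opts"
}

parse_spec "192.168.1.26/24,cpp,lzo"
# prints: host=192.168.1.26 limit=24 options=cpp,lzo
```

Of the two options, lzo turns on compression of the transfers and cpp marks the host for distcc's pump mode.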

With all of this configured I could then build a project with make -j<number of parallel jobs> CC=distcc. Here are a few results from some of my personal projects and other things.

Forge game engine:

    $ time make -j20 CC=distcc
    [ 23%] Building C object CMakeFiles/forge.dir/src/ui/button.c.o
    [ 23%] Building C object CMakeFiles/forge.dir/src/data/stack.c.o
    [ 23%] Building C object CMakeFiles/forge.dir/src/data/list.c.o
    [ 30%] Building C object CMakeFiles/forge.dir/src/ui/spinner.c.o
    [ 38%] Building C object CMakeFiles/forge.dir/src/ui/text.c.o
    [ 46%] Building C object CMakeFiles/forge.dir/src/engine.c.o
    [ 53%] Building C object CMakeFiles/forge.dir/src/ui/rect.c.o
    [ 61%] Building C object CMakeFiles/forge.dir/src/graphics.c.o
    [ 69%] Building C object CMakeFiles/forge.dir/src/input.c.o
    [ 76%] Building C object CMakeFiles/forge.dir/src/entity.c.o
    [ 92%] Building C object CMakeFiles/forge.dir/src/sprite.c.o
    [ 92%] Building C object CMakeFiles/forge.dir/src/tmx.c.o
    [100%] Linking C shared library libforge.so
    [100%] Built target forge
    make -j20 CC=distcc  0.03s user 0.02s system 97% cpu 0.046 total

Linux kernel with default config:

    $ time make -j20 CC=distcc
    ...
    Kernel: arch/x86/boot/bzImage is ready  (#1)
    make -j20 CC=distcc  463.45s user 116.23s system 120% cpu 8:01.29 total

Astute readers might notice that these results look too good for distcc alone, and they would be right. Because I set ccache to be called before the compiler, distcc is invoked from within ccache, so only cache misses are sent over the network. I do not recommend doing it the other way around, and neither do the developers of ccache and distcc. I also recommend against routing every call to the compiler through distcc, as shipping source code over TCP/IP like this can be very slow on sub-gigabit networks. One other thing to note is that the compiler versions on both machines must match, or all calls to distcc will fail.

One can look at what distcc is currently doing behind the scenes with distccmon-text <refresh seconds>, or if a GUI is preferred, with distccmon-gnome.
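
For anyone who would rather make the ccache-first ordering explicit instead of relying on PATH, ccache has a setting for exactly this: CCACHE_PREFIX names a command that ccache puts in front of the real compiler whenever it has to run it. The following is a sketch of that alternative, not the setup I used for the timings above, and it assumes gcc is the real compiler.

```shell
# Warn early if either tool is missing; the build would otherwise
# fail later with a less obvious error.
for tool in ccache distcc; do
    command -v "$tool" >/dev/null 2>&1 ||
        echo "warning: $tool not found in PATH" >&2
done

# On a cache miss, ccache runs "distcc <compiler> ..." instead of
# calling the compiler directly.
export CCACHE_PREFIX="distcc"

# Then build through ccache as usual, for example:
#   make -j20 CC="ccache gcc"
```

With this in place, cache hits never leave the workstation and only the misses are handed to distcc.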
