Integrating Testwell CTC++ with Bazel

9 November 2023 — by Johan Herland, Mark Karpov

While we were improving and optimizing a Tweag client's Bazel build system, the client gave us a side quest: enable their CI pipeline to produce regular and complete coverage reports with Testwell CTC++, a code coverage tool from Verifysoft. This article explains how we integrated Testwell CTC++ coverage testing into their Bazel build system.

Integrating coverage testing with Testwell CTC++ into a project that builds with Bazel has its own unique challenges. We navigated the combination of these two tools to provide a solution that was a good fit for our client.

The mismatch between Bazel’s built-in coverage support and CTC++

As described in its documentation, Bazel does include some support for code coverage reports. However, this support is relatively limited and inflexible, and only covers the GCOV/LCOV formats (see this Bazel issue for more details). Testwell CTC++, on the other hand, uses its own format, and takes a different overall approach to coverage analysis than the GNU tool suite.

CTC++ runs as a wrapper around your usual C/C++ compiler and linker: Where you’d usually invoke $CC $cc_args to compile, and $LD $ld_args to link, you instead invoke ctc $ctc_opts $CC $cc_args to compile and ctc $ctc_opts $LD $ld_args to link.

In the compile step, CTC++ will preprocess/instrument the code being compiled with its own macros. In addition, it will write a MON.sym file containing details about its instrumentation. (This one file will accumulate instrumentation details for all compilation units.)

The link step simply supplies a -lctc on the linker command line to provide the implementation of the various instrumentation calls made by the macros inserted in the compile step.

When the code (for example a suite of unit tests) is later run, the instrumentation will perform coverage counts/analysis, and write the collected stats to a MON.dat file. As with MON.sym, the single MON.dat file will accumulate coverage stats for all runs involving executables that were compiled with ctc. The MON.sym and MON.dat files provide the basis for generating a coverage report with the ctcreport tool.¹
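
Outside of Bazel, the workflow described above can be sketched as a short dry run. The script below only prints the commands instead of executing them, so it runs without a CTC++ installation. The file names foo_test.c and foo_test, the use of gcc, and the option set "-n MON.sym" are illustrative assumptions, not a recommended configuration.

```shell
#!/bin/sh
# Dry-run sketch of a CTC++ workflow: prints each command instead of running it.
# Assumptions for illustration: gcc as compiler/linker driver, "-n MON.sym" as
# the only ctc option, and hypothetical file names passed in by the caller.
CC=gcc
CTC_OPTS="-n MON.sym"

ctc_workflow() {
    src=$1              # C source file, e.g. foo_test.c
    exe=$2              # name of the resulting test executable
    obj="${src%.c}.o"   # object file name derived from the source name

    # 1. Compile: ctc instruments the code and records details in MON.sym.
    echo "ctc $CTC_OPTS $CC -c $src -o $obj"
    # 2. Link: ctc adds its runtime library to the link line.
    echo "ctc $CTC_OPTS $CC $obj -o $exe"
    # 3. Run: the instrumented executable writes coverage stats to MON.dat.
    echo "./$exe"
    # 4. Report: ctcreport combines MON.sym and MON.dat into a report.
    echo "ctcreport MON.sym MON.dat -o report"
}

ctc_workflow foo_test.c foo_test
```

Printing the commands keeps the sketch independent of any particular CTC++ version; replacing each echo with the command itself would turn it into a real (non-Bazel) coverage run.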

Integrating CTC++ into Bazel

We quickly gave up on trying to integrate CTC++ into Bazel’s existing coverage functionality for a few reasons:

It seemed infeasible to either convince Bazel to expect a different coverage format, or to convince CTC++ to produce lcov output.
CTC++ requires replacing the compiler/linker executable with the ctc wrapper, and this was not catered for by the existing coverage support in Bazel.
There is no support in Bazel to handle the extra output files (MON.sym/MON.dat) that CTC++ generates.

Instead, we focused on how to configure the build to run CTC++ in the way it wants to be run: as a wrapper around the compiler/linker.

We ended up writing a custom Bazel toolchain generator for CTC++ (source available below). This is a Bazel repository rule that copies the @local_config_cc toolchain rules (the rules that are otherwise used to build the unit test suite), but replaces its compiler/linker commands with our own wrapper script.

Our wrapper script is a small shell script that sets up the environment and options required by CTC++ and invokes ctc $ctc_opts $real_cc $@, which forwards all the compiler/linker arguments from Bazel onto the ctc command-line. More details about this wrapper script below.

Bazel’s mantra of “{ Fast, Correct } — Choose two” relies on having complete knowledge of the build graph. This includes explicitly stating the inputs and outputs of all build steps. With our new toolchain that invokes CTC++, we have in effect added a new output file to every compilation step (MON.sym), and another output file to every unit test step (MON.dat). Moreover — unless we configure CTC++ otherwise — these files are shared outputs across all compilation/test steps,² something that we cannot easily encode into Bazel’s build graph.³

In the end we did not instrument Bazel to keep track of the MON.sym/MON.dat files. Instead, we rely on CTC++ coverage builds and test runs to always be performed from a clean source tree, with no reuse of build artifacts. In our case this works out well, since CTC++ coverage builds/runs are ultimately run automatically as part of CI (where we can ensure these conditions are being met). We do not expect developers to combine CTC++ with incremental rebuilds in their local source trees.

Navigating the Bazel sandbox

It is worth explaining some complications introduced by the sandbox in which Bazel runs its build steps. Our project builds on Linux, and we thus use the linux-sandbox flavor of sandboxing provided by Bazel. This includes using Linux namespaces to isolate each build step, both from other build steps and from the surrounding system. This is a powerful mechanism that helps achieve multiple objectives:

Making sure all build inputs are completely specified in Bazel, by cutting off access to anything that is not explicitly declared. This also helps improve the reproducibility of build artifacts, and thus the reusability/cacheability of intermediate build products.
Preventing unwanted details from the surrounding system from leaking into the build product (e.g. hostnames, timestamps, dependencies on system libraries, etc.)
Preventing the build from polluting the surrounding system (e.g. writing files in places where they don’t belong)

However, when throwing CTC++ into the mix, there are certain concessions we have to make in order to make everything work together:

Keeping MON.sym/MON.dat outside the sandbox

In order for the writes to MON.sym/MON.dat to be reflected outside the sandbox, we direct them to a shared directory outside the sandbox, and additionally pass this directory to Bazel’s --sandbox_writable_path option in order for these writes to be allowed inside the restricted sandbox environment.
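
As a rough sketch, the two settings involved can be derived from a single directory name. The helper below is hypothetical (its name and the example path are not from the project). It only prints the Bazel flags, using CTC_RESULTS_DIR as the environment variable through which a compiler wrapper can learn where to write.

```shell
#!/bin/sh
# Print the Bazel flags that (a) tell build actions where the shared results
# directory is and (b) make that directory writable from inside the sandbox.
# The function name and the example path are hypothetical.
ctc_results_flags() {
    results_dir=$1
    # This sketch insists on an absolute path: build actions run in different
    # working directories, so a relative path would not name one shared place.
    case $results_dir in
        /*) ;;
        *) echo "error: results dir must be absolute: $results_dir" >&2
           return 1 ;;
    esac
    echo "--action_env=CTC_RESULTS_DIR=$results_dir" \
         "--sandbox_writable_path=$results_dir"
}

ctc_results_flags /tmp/ctc_results
```

The same directory appears in both flags on purpose: one tells the tool where to write, the other tells the sandbox to allow it.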

Access to the CTC++ license server

CTC++ also needs to verify that it’s running with a valid license, and this requires access to a license server, meaning that we must be able to access the network from within the Bazel sandbox.

Preventing corruption of CTC++ temporary files across sandboxes

While CTC++ is running, it makes use of some temporary files. These are put in /tmp by default, and while CTC++ takes care to use the current process ID (PID) when naming these files to prevent concurrent CTC++ processes from using the same filenames, there is an unfortunate interplay with Bazel’s sandboxing.

Bazel’s sandbox includes a PID namespace to ensure that processes inside one sandbox cannot “see” processes in other sandboxes. However, this namespace causes the PIDs inside each sandbox (as seen from inside the sandbox) to restart their numbering from 1. Thus, the build process inside each sandbox has a rather predictable and repeatable PID, which means that CTC++ processes sometimes end up choosing the same name for their temporary files.

When combined with the fact that Bazel by default does not isolate the /tmp directory between sandboxes, this caused some temporary files to get corrupted by simultaneous writes from different CTC++ processes, which then led to corrupted data being copied into the MON.sym/MON.dat files.

The solution we landed on here was to direct CTC++ (via the TMP_DIRECTORY configuration option) to write its temporary files inside the sandbox (instead of in /tmp) which immediately resolved the corruption.
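
The collision and the fix can be illustrated with a small sketch. The naming scheme ctc_<PID>.tmp below is a made-up stand-in (the real CTC++ temporary file names are not shown in this article), and the sandbox directory names are hypothetical as well.

```shell
#!/bin/sh
# Hypothetical stand-in for a PID-based temp file naming scheme.
tmp_name() {
    tmp_dir=$1   # directory that holds temporary files
    pid=$2       # PID as seen by the process itself
    echo "$tmp_dir/ctc_$pid.tmp"
}

# Two sandboxes with separate PID namespaces: both compiler processes may see
# the same small PID (here: 2), while /tmp is shared between them.
a=$(tmp_name /tmp 2)
b=$(tmp_name /tmp 2)
[ "$a" = "$b" ] && echo "shared /tmp: both processes use $a"

# With a per-sandbox temp directory, equal PIDs no longer lead to equal names.
a=$(tmp_name /sandbox/1/execroot 2)
b=$(tmp_name /sandbox/2/execroot 2)
[ "$a" != "$b" ] && echo "per-sandbox directory: names differ"
```

The PID alone stops being a unique key once each sandbox has its own PID namespace, so the directory has to provide the uniqueness instead.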

Mapping sandbox paths to source tree paths

Another complication introduced by Bazel’s sandbox is that the source file paths that are recorded inside MON.sym/MON.dat reference the sandbox directories that are created and deleted by Bazel before/after each build step. Fortunately, the ctcreport tool comes with a -map-source-identification option that allows the sandbox paths to be mapped back into persistent/real source paths. Thus CTC++ quite elegantly allows us to build the source code from sandboxes with highly variable (but predictable) names, while still allowing all source file references to be resolved back to their canonical location and be successfully found at report generation time.
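
To make the idea concrete, here is a minimal sketch of such a mapping written with sed. It is not how ctcreport implements -map-source-identification. It only assumes that recorded paths contain an execroot/<project name>/ component, and the example paths are invented.

```shell
#!/bin/sh
# Map a source path recorded inside a sandbox back to the workspace.
# Assumption: the recorded path contains ".../execroot/<project>/"; everything
# up to and including that component is replaced by the workspace root.
# Note: paths containing '#' would need a different sed delimiter.
map_source_path() {
    recorded=$1    # path as recorded during the sandboxed build
    project=$2     # project (workspace) name
    workspace=$3   # persistent source tree root
    printf '%s\n' "$recorded" | sed "s#^.*/execroot/$project/#$workspace/#"
}

map_source_path \
    /home/ci/.cache/bazel/sandbox/linux-sandbox/42/execroot/project_name/src/foo.cc \
    project_name \
    /home/ci/project
# prints /home/ci/project/src/foo.cc
```

Whatever sandbox number appears in the recorded path, the result is the same persistent location, which is what report generation needs in order to find the sources.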

The solution

Here follows a short overview of the most important code additions we made to perform the integration outlined above:

The Bazel toolchain generator for CTC++

This is a Bazel repository rule. Its job is to:

Download, extract and install the CTC++ tool suite for Linux.⁴
Prepare our compiler/linker wrapper script.
Copy the @local_config_cc toolchain rules, and replace the compiler/linker command with our wrapper script.⁵

def _impl(repository_ctx):
    repository_ctx.download_and_extract(
        repository_ctx.attr.url,
        sha256 = repository_ctx.attr.sha256,
        output = ".",  # unpacks into ctclinux/ subdir
    )

    ctc_install_dir = repository_ctx.path("ctc_installed")
    repository_ctx.execute(["mkdir", ctc_install_dir])
    repository_ctx.execute([
        "make",
        "install",
        "-C",
        str(repository_ctx.path("ctclinux")),
        "prefix={}".format(ctc_install_dir),
        "FLEXLICFILE={}".format(repository_ctx.attr.license),
    ])
    repository_ctx.execute(["rm", "-rf", "ctclinux"])

    # Write the absolute path to our CTC++ installation into the wrapper script.
    # Embedding absolute paths to the Bazel output base is inhermetic, can cause
    # cache misses, and should not be done in most cases. In this specific case
    # it is okay because CTC++ coverage builds are uncached.
    repository_ctx.template(
        "ctc_wrapper.sh",
        Label("//bazel/toolchains/ctc_coverage:ctc_wrapper.sh.tpl"),
        {
            "%{CTC_INSTALL_DIR}%": str(ctc_install_dir),
        },
        executable = True,
    )

    # Create a copy of the @local_config_cc toolchain that invokes our CTC++
    # wrapper script instead of the compiler/linker.
    wrapper_path = repository_ctx.path("ctc_wrapper.sh")
    vanilla_cc_toolchain_path = repository_ctx.path(Label("@local_config_cc//:BUILD")).dirname
    for x in vanilla_cc_toolchain_path.readdir():
        repository_ctx.execute(["cp", "-rL", x, "."])

    # Bazel uses gcc (not ld) to link, so we don't need to wrap ld
    repository_ctx.execute(
        ["sed", "-i", "-e", "s#/usr/bin/gcc#{}#".format(wrapper_path), "BUILD"],
    )

ctc_toolchain = repository_rule(
    implementation = _impl,
    attrs = {
        "license": attr.string(mandatory = True),
        "sha256": attr.string(mandatory = True),
        "url": attr.string(mandatory = True),
    },
)

This repository rule is then invoked from the project’s WORKSPACE file with a statement similar to this:

ctc_toolchain(
    name = "ctc_toolchain",
    license = "...",
    sha256 = "...",
    url = "https://some_internal_server/path/to/ctclinux-v10.0.1-x64.zip",
)

Configuring Bazel to build with CTC++

With this addition to .bazelrc, we can enable CTC++ coverage testing by passing --config=ctc_coverage on the command line:

# Flags for coverage testing with CTC++
# Disable build caches, we want CTC++ coverage to build from scratch always
build:ctc_coverage --disk_cache= --remote_cache=
build:ctc_coverage --noremote_accept_cached
build:ctc_coverage --noincompatible_remote_results_ignore_disk
build:ctc_coverage --noremote_upload_local_results
# Use the ctc_toolchain for all "target" builds
build:ctc_coverage --crosstool_top=@ctc_toolchain//:toolchain
# But skip CTC++ for building host tools (tools needed during the build)
build:ctc_coverage --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
# Where to place the MON.sym/MON.dat files, and ensure sandbox allows this
build:ctc_coverage --action_env=CTC_RESULTS_DIR=/path/to/ctc_results
build:ctc_coverage --sandbox_writable_path=/path/to/ctc_results

The wrapper script for invoking CTC++

This is the ctc_wrapper.sh.tpl template that is expanded into the ctc_wrapper.sh wrapper script in the toolchain generator above. This script ends up being invoked every time Bazel wants to compile or link something with the ctc_toolchain introduced above:

#!/bin/sh

# Environment variables:
# CTC_INSTALL_DIR: Where is CTC++ installed?
# CTC_RESULTS_DIR: Where to write CTC++ MON.sym and MON.dat files
CTC_INSTALL_DIR=%{CTC_INSTALL_DIR}%

export CTCHOME="$CTC_INSTALL_DIR/lib/ctc"

sym_file="$CTC_RESULTS_DIR/MON.sym"

# -C MAX_CMDLINE_LENGTH=256000 - Support long compiler/linker command lines from Bazel.
# -C TMP_DIRECTORY=. - Make sure CTC++ temp files are written inside the sandbox
# More CTC++ options like -i and -C EXCLUDE=... should go here as well:
ctc_opts="-C MAX_CMDLINE_LENGTH=256000 -C TMP_DIRECTORY=. -n $sym_file"

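# $ctc_opts is left unquoted on purpose so that it splits into separate options;
# "$@" is quoted so that each argument from Bazel is forwarded unchanged.
exec "$CTC_INSTALL_DIR/bin/ctc" $ctc_opts /usr/bin/gcc "$@"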
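
Because the wrapper forwards every argument it receives, quoting matters as soon as an argument contains whitespace. The following sketch (helper names are hypothetical) shows how unquoted $@ and quoted "$@" differ when forwarding arguments:

```shell
#!/bin/sh
# Report how many arguments the callee actually receives.
count_args() { echo "$#"; }

# Unquoted: each argument is split again on whitespace.
forward_unquoted() { count_args $@; }

# Quoted: each argument is passed on exactly as received.
forward_quoted() { count_args "$@"; }

forward_unquoted '-DGREETING=hello world' foo.c   # prints 3
forward_quoted '-DGREETING=hello world' foo.c     # prints 2
```

For a compiler wrapper, the quoted form is the one that keeps defines and paths with spaces intact.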

Bringing it all together in a single entry point

Finally, here is a convenience script for performing a full build/test run with CTC++ and generating an HTML coverage report:

#!/usr/bin/env bash
#
# Usage: ./ctc_report.sh [TEST_TARGET...]
#
# If no target is given, default to all tests, aka. //...
#
# Objective:
# - Build and run tests in _one_ `bazel test` invocation
# - Generate test report in ./ctc_results/report/index.html
# - Package all of ./ctc_results into a .tar.gz file

TARGETS="${@-//...}"
BAZEL_ARGS="--build_tests_only --keep_going --config=ctc_coverage"
WORKSPACE="$(pwd)"
CTC_INSTALL_DIR=$(bazel query '@ctc_toolchain//:BUILD' --output location | cut -d: -f1 | sed 's#BUILD#ctc_installed#')

if [ ! -f "$WORKSPACE/WORKSPACE" ]; then
    echo "ERROR: Must be run from project root!"
    exit 1
fi

# Clean slate:
bazel clean
rm -rf ctc_results && mkdir ctc_results

export CTCHOME="$CTC_INSTALL_DIR/lib/ctc"

echo "* Running: bazel test $BAZEL_ARGS -- $TARGETS"
bazel test $BAZEL_ARGS -- $TARGETS 2>&1 | tee ctc_results/bazel.log

echo "* Finished building/running tests, generating report"
(
    cd ctc_results &&
    "$CTC_INSTALL_DIR/bin/ctcreport" \
        MON.sym MON.dat \
        -measures mcdc,d,s \
        -nsb \
        -map-source-identification "project_name,$WORKSPACE" \
        -D "ProjectName=project_name" \
        -exclude-files "..." \
        -shorten-path "$WORKSPACE/" \
        -o report
)

echo "* Creating ctc_results.tar.gz archive"
tar czf ctc_results.tar.gz ctc_results

echo "* Done. Report is found at ctc_results/report/index.html inside archive"

Conclusion

The solution presented here might not win any beauty contests, and it comes with some inherent limitations (e.g. in terms of cacheability and lack of support for incremental builds) that surely fall short of build system ideals. What our client needs, however, is a CI pipeline that provides regular and complete coverage reports with actionable metrics, and we have certainly been able to provide that.

One important thing this integration has taught us is the flexibility afforded by being able to insert a toolchain wrapper that allows us to precisely control exactly how Bazel invokes the compiler and linker. We certainly could have hoped for it to be easier to introduce this wrapper (i.e. not having to copy + modify existing toolchain rules), but once the wrapper is in place, it provides the perfect place to observe and control the interaction between Bazel and CTC++.

More generally, we hope this story helps illustrate the importance of knowing your tools, including how to leverage their strengths and work around their weaknesses.

Thanks to Guillaume Desforges, Christopher Harrison, and Andreas Herrmann for their reviews of this article.

¹ Although it is possible to split instrumentation details/stats into more than a single MON.sym/MON.dat file pair, this requires setting/passing a different -n option on the ctc command line, which complicates our wrapper script with no clear benefit.
² Don’t worry, CTC++ uses flock() to protect against concurrent writes to these files when multiple compilations/tests are running concurrently.
³ If we could declare these extra outputs properly to Bazel, they would surely wreak havoc on any calculations Bazel does to reuse intermediate build products for faster incremental rebuilds…
⁴ This performs a make install, which is often frowned upon inside a repository rule, as it is a source of non-hermeticity. In this case, however, we’re installing a pre-built binary distribution, and the make install does little more than copying files from the extracted zip archive and writing a simple config file.
⁵ Here we make an assumption that the @local_config_cc toolchain has already been configured to invoke /usr/bin/gcc as its compiler. This assumption must be adjusted when applying this technique to another toolchain/project/codebase.

About the authors

Johan Herland

Johan is a Developer Productivity Engineer at Tweag. Originally from Western Norway, he is currently based in Delft, NL, and enjoys this opportunity to discover the Netherlands and the rest of continental Europe. Johan has almost twenty years of industry experience, mostly working with Linux and open source software within the embedded realm. He has a passion for designing and implementing elegant and useful solutions to challenging problems, and is always looking for underlying root causes to the problems that face software developers today. Outside of work, he enjoys playing jazz piano and cycling.

Mark Karpov

Mark is a build system expert with a particular focus on Bazel. As a consultant at Tweag he has worked with a number of large and well-known companies that use Bazel or decided to migrate to it. Other than build systems, Mark's background is in functional programming and in particular Haskell. His personal projects include high-profile Haskell libraries, tutorials, and a technical blog.


This article is licensed under a Creative Commons Attribution 4.0 International license.
