Tweag

nixtract 0.1.0

26 October 2023 — by Guillaume Desforges

Tweag is excited to announce nixtract 0.1.0, the first release of nixtract! This is our first step towards a broader effort to make Nix the best tool to tackle tomorrow’s challenges of the Software Supply Chain.

In order to understand why we need nixtract, let me tell you about the “secret” value of Nixpkgs.

Is it a bird? A plane? It’s a graph!

The Nix language allows you to define the “recipe” to build anything into a package, like the sources and the steps to make the package, but also the dependencies it needs to build. For instance we can define the recipe for a package that depends on zlib.

{ stdenv, fetchFromGitHub, zlib }:

stdenv.mkDerivation {
  pname = "mypackage";
  version = "1.0.0";

  src = fetchFromGitHub {
    owner = "someowner";
    repo = "somerepo";
    rev = ...;
    hash = ...;
  };

  buildInputs = [ zlib ];
}

Such a recipe is called a “derivation”. Behind the zlib variable we use here also lies a derivation, another recipe to build said library.

Looking at this expression, we can think of it as a graph. Derivations are the nodes, and they are connected when one depends on another.

[Figure: a graph with two nodes, mypackage and zlib, joined by an edge from mypackage to zlib]

This graph reads: mypackage depends on zlib.

This is what we call a “dependency graph”.

From one to many

Once we’ve written the recipe of a package, it would be a waste not to share it, wouldn’t it? That’s what people thought, and so they created Nixpkgs, a repository of Nix expressions that define packages, maintained by the Nix community.

Nixpkgs is a great endeavour, maintaining one of the largest collections of packages ever made, which we could use to draw a dependency graph!

Maintainers also add all sorts of metadata such as the name, the version, the license, the homepage, the description, and so on, which could be used to make an even richer graph.

This data is extremely valuable, and can be used for many purposes such as:

  • listing all the packages that a package depends on, both direct and transitive (what we call the closure of the package)
  • listing all licenses that are in the closure of a package
  • if plugged into a binary cache, it would help analyze the closure size and find the dependencies that have the most impact on it
  • if plugged into a CVE database, it would help flag all compromised packages
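
The first two uses can be sketched in a few lines of Python. This is only an illustration: the graph and the licenses below are a made-up toy example rather than data extracted from Nixpkgs, and `closure` is a hypothetical helper name.

```python
from collections import deque

# Toy dependency graph: each package maps to its direct dependencies.
# Names and licenses are illustrative, not real Nixpkgs data.
DEPENDENCIES = {
    "mypackage": ["zlib", "openssl"],
    "openssl": ["zlib"],
    "zlib": [],
}
LICENSES = {
    "mypackage": "MIT",
    "openssl": "Apache-2.0",
    "zlib": "Zlib",
}


def closure(graph, root):
    """Return every direct and transitive dependency of `root`."""
    seen = set()
    todo = deque(graph[root])
    while todo:
        package = todo.popleft()
        if package in seen:
            continue  # already visited through another path
        seen.add(package)
        todo.extend(graph.get(package, []))
    return seen


dependencies = closure(DEPENDENCIES, "mypackage")
print(sorted(dependencies))
print(sorted({LICENSES[package] for package in dependencies}))
```

Once the graph is available as plain data, each question in the list becomes a short traversal like this one.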

One could imagine a dashboard with all the packages used in a system, or even throughout a company’s systems, to quickly monitor, audit and act on the entire Software Supply Chain.

The information is here in Nixpkgs, but hard to use because it’s encoded in Nix expressions. nixtract’s goal is to make it accessible in a way that’s easier to consume.

Extracting data from Nix expressions

Let’s consider a small Nix expression.

{
  a = 1;
  b = { c = 2; };
  d = [ { e = 3; } { f = 4; } ];
}

We can explore this nested structure like so:

$ nix eval -f example.nix --json 'a'
1
$ nix eval -f example.nix --json 'b.c'
2

'a' and 'b.c' are called attribute paths, and describe how to reach a value. This is the mechanism that is used to access packages in Nixpkgs. For example, we can install packages through their attribute path:

$ nix-env -iA pythonPackages.requests

This first accesses the pythonPackages attribute of the usual pkgs attribute set. pythonPackages is an attribute set, and we can then access its requests attribute.
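
To make the idea of an attribute path concrete, here is a rough Python analogy, not how Nix itself resolves paths. The dict mirrors example.nix, and `get_attr_path` is a hypothetical helper that assumes attribute names contain no dots.

```python
from functools import reduce

# Python mirror of example.nix: attribute sets become dicts, lists stay lists.
EXAMPLE = {
    "a": 1,
    "b": {"c": 2},
    "d": [{"e": 3}, {"f": 4}],
}


def get_attr_path(value, attr_path):
    """Follow a dotted attribute path such as 'b.c' through nested dicts."""
    return reduce(lambda current, name: current[name], attr_path.split("."), value)


print(get_attr_path(EXAMPLE, "a"))
print(get_attr_path(EXAMPLE, "b.c"))
```

The lookup is a plain left-to-right walk: each name in the path selects one attribute of the value reached so far.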

One could be tempted to parse Nix expressions with a custom parser to extract the graph. Unfortunately, it gets a bit more complicated when a node has many children, represented as a list:

$ nix eval -f example.nix d --apply 'd: (builtins.elemAt d 0).e'
3

And it would quickly get too complex with Nix functions:

let
  names = [ "a" "b" "c" ];
  items = [ 0 1 2 ];
in
  builtins.listToAttrs (builtins.map (x: {name = builtins.elemAt names x; value = x;}) items)

This last expression builds a simple attribute set { a = 0; b = 1; c = 2; }, but you have to evaluate it to find that out.

All in all, the only strategy available to extract the graph data written in Nix code is to evaluate said Nix code.

Evaluating the entire Nixpkgs

Nixpkgs being Nix code, it can be imported and evaluated, but also manipulated in Nix code!

let
  pkgs = import <nixpkgs> {};
in
  builtins.attrNames pkgs

Here pkgs is an attribute set like any in Nix, so we can manipulate it to our heart’s content.

A first approach to extract the information from Nixpkgs was thus to use a Nix expression that returns the whole graph as JSON.

Our more regular readers have probably heard about this on this very blog about a year ago. At that time, however, we built the graph of top-level packages only. As it turns out, this is not sufficient for a complete analysis; what we need is the entire graph of derivations, even the ones not accessible at the top level.

So now, the idea is to recursively go down pkgs, find derivations, transform them into JSON-like attribute sets, and keep recursing into their dependencies.

For the curious, below is a small snippet demonstrating that idea. Don’t worry if it looks intimidating, you don’t need to understand it to read this blog post.

{ nixpkgs, system }:
let
  # recurse :: string -> any -> attrs | null
  recurse =
    name: value:
      if !(nixpkgs.lib.isDerivation value) then
        # recurse when possible
        if (value.recurseForDerivations or false)
        then
          builtins.mapAttrs recurse value
        else
          null
      # process derivation and recurse into inputs
      else
        {
          # path to the evaluated derivation file
          derivationPath = value.drvPath;
          # one could add any information

          # recurse into inputs
          buildInputs = map (recurse null) (value.buildInputs or [ ]);
          # do the same for other inputs...
        }
    ;
in
  # build the value that will be yielded
  map
    # go through a bit of processing to not have the children nodes written in the node
    (drv: {
      inherit (drv) derivationPath;
      buildInputs = map (inputDrv: inputDrv.derivationPath) (drv.buildInputs or []);
    })
    # process recursively
    (nixpkgs.lib.collect
      (x: (x.derivationPath or null) != null)
      (builtins.mapAttrs recurse nixpkgs.legacyPackages.${system})
    )

If you would like to learn more about it, you can check out the complete expression we used: nixpkgs-graph-explorer/core/extract/nixpkgs-graph.nix (see footnote 1). It handles the many tricks of evaluating Nixpkgs.

The idea would then be to evaluate it and get the output data.

$ nix eval -f nixpkgs-to-graph.nix --json

Unfortunately, that dream quickly came to an end when recursing over the entire Nixpkgs. However carefully we tailored the recursive expression, Nix kept filling up the RAM until our laptops crashed.

But if one Nix can’t do it all, maybe many Nix-s can!

Towards nixtract

nixtract uses not one, but two Nix expressions.

The first one focuses on finding all the top-level derivations that can be directly accessed from pkgs. It differs from the snippet shown earlier in that:

  • it does not recurse on the inputs of the derivation,
  • it outputs nothing more than the attribute path, the derivation path and the output path,
  • the output is streamed rather than batched.

$ TARGET_FLAKE_REF="nixpkgs" TARGET_SYSTEM="x86_64-linux" nix eval --json --file ./nixtract/find-attribute-paths.nix
trace: {"foundDrvs":[{"attributePath":"AMB-plugins.out","derivationPath":"/nix/store/p060rxjqna1pnmg63yz6j7ya1as1wk4k-AMB-plugins-0.8.1.drv","outputPath":"/nix/store/z15irhv9lbr3f86fgifzbhrnga2m42ji-AMB-plugins-0.8.1"}]}
trace: {"foundDrvs":[{"attributePath":"ArchiSteamFarm.out","derivationPath":"/nix/store/s57c44cqwn7sw7vc7b7zx0na4snvkh9c-archisteamfarm-5.4.4.5.drv","outputPath":"/nix/store/ka2wsngqkblcbl3awbcmhlpxllrddif4-archisteamfarm-5.4.4.5"}]}
...
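
The trace lines above are easy to consume from another program. Below is a minimal Python sketch of one way to do it; it is not nixtract's actual code. `parse_trace_line` is a hypothetical helper, and it assumes that every streamed result arrives on its own line with the `trace: ` prefix shown above.

```python
import json

TRACE_PREFIX = "trace: "

# One line copied from the sample output above.
SAMPLE_LINE = (
    'trace: {"foundDrvs":[{"attributePath":"AMB-plugins.out",'
    '"derivationPath":"/nix/store/p060rxjqna1pnmg63yz6j7ya1as1wk4k-AMB-plugins-0.8.1.drv",'
    '"outputPath":"/nix/store/z15irhv9lbr3f86fgifzbhrnga2m42ji-AMB-plugins-0.8.1"}]}'
)


def parse_trace_line(line):
    """Return the derivations found in one trace line, or [] for other lines."""
    line = line.strip()
    if not line.startswith(TRACE_PREFIX):
        return []
    try:
        payload = json.loads(line[len(TRACE_PREFIX):])
    except json.JSONDecodeError:
        return []  # a trace message that is not one of our JSON records
    if not isinstance(payload, dict):
        return []
    return payload.get("foundDrvs", [])


for found in parse_trace_line(SAMPLE_LINE):
    print(found["attributePath"], found["outputPath"])
```

Ignoring lines that do not parse keeps the reader robust against unrelated messages that Nix may print alongside the records.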

The second one focuses on following an attribute path and “describing” the derivation.

$ TARGET_FLAKE_REF="nixpkgs" TARGET_SYSTEM="x86_64-linux" TARGET_ATTRIBUTE_PATH="hello" nix eval --json --file ./nixtract/describe-derivation.nix
{"attributePath":"hello","buildInputs":[],"derivationPath":"/nix/store/r91xxmyayj90xlr0378r817ly09khpxg-hello-2.12.1.drv","name":"hello-2.12.1","nixpkgsMetadata":{"broken":false,"license":"GNU General Public License v3.0 or later","pname":"hello","version":"2.12.1"},"outputPath":"/nix/store/7syg3lif5ik17dsrwgk3s00s116q87by-hello-2.12.1","outputs":[{"name":"out","outputPath":"/nix/store/7syg3lif5ik17dsrwgk3s00s116q87by-hello-2.12.1"}],"parsedName":{"name":"hello","version":"2.12.1"}}

In Python, we keep a queue of attribute paths that we need to “describe”. It is fed by the first expression, run in one process, and we process each attribute path using the second expression, run in other processes.

But the first expression is not the only one contributing to the queue. When the second expression, the one that “describes”, visits a derivation that has inputs, these input derivations will be added by the Python script to the queue if they haven’t already been added. This ensures that not only top-level derivations are described, but all the derivations in the closure of each visited derivation are as well.

Instead of recursing everything at once in one Nix process that ends up gobbling up all our RAM, we spawn many small Nix processes that don’t explode. An extra benefit is that we can now parallelize the extraction and get the data faster.
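
The queue logic can be sketched as follows. This is a simplified, sequential illustration and not nixtract's implementation: the real tool runs the Nix expressions in separate processes, whereas here `describe` is a stand-in callable, the descriptions are made up, and using the output path to detect already-queued derivations is an assumption.

```python
from collections import deque


def describe_all(found, describe):
    """Yield the description of every derivation reachable from `found`.

    `found` holds (attribute_path, output_path) pairs, as the first expression
    would report them. `describe` stands in for running the second expression:
    it maps an attribute path to a dict with "output_path" and "build_inputs".
    """
    queue = deque()
    queued_outputs = set()  # assumption: output paths identify what was queued

    def enqueue(attribute_path, output_path):
        # Skip anything that was already queued, whatever path led to it.
        if output_path not in queued_outputs:
            queued_outputs.add(output_path)
            queue.append(attribute_path)

    for attribute_path, output_path in found:
        enqueue(attribute_path, output_path)

    while queue:
        description = describe(queue.popleft())
        yield description
        # Inputs of a described derivation feed the queue as well.
        for build_input in description["build_inputs"]:
            enqueue(build_input["attribute_path"], build_input["output_path"])


# Made-up descriptions: "a" and "b" both depend on the same output "/out/c".
FAKE_DESCRIPTIONS = {
    "a": {"output_path": "/out/a", "build_inputs": [
        {"attribute_path": "a.buildInputs.0", "output_path": "/out/c"}]},
    "b": {"output_path": "/out/b", "build_inputs": [
        {"attribute_path": "b.buildInputs.0", "output_path": "/out/c"}]},
    "a.buildInputs.0": {"output_path": "/out/c", "build_inputs": []},
    "b.buildInputs.0": {"output_path": "/out/c", "build_inputs": []},
}

found = [("a", "/out/a"), ("b", "/out/b")]
for description in describe_all(found, FAKE_DESCRIPTIONS.__getitem__):
    print(description["output_path"])
```

The shared dependency is described only once, even though two different attribute paths lead to it.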

And voilà!

All in all, you only use a single command line to start the magic:

$ nixtract -
{"attribute_path": "AMB-plugins.out", "derivation_path": "/nix/store/90bclfvqnc0mzn6vsd99bzl383vl2j04-AMB-plugins-0.8.1.drv", "output_path": "/nix/store/7n05ir19xx0086dk1wq1zm1gq8g1ymyr-AMB-plugins-0.8.1", "outputs": [{"name": "out", "output_path": "/nix/store/7n05ir19xx0086dk1wq1zm1gq8g1ymyr-AMB-plugins-0.8.1"}], "name": "AMB-plugins-0.8.1", "parsed_name": {"name": "AMB-plugins", "version": "0.8.1"}, "nixpkgs_metadata": {"pname": "AMB-plugins", "version": "0.8.1", "broken": false, "license": "GNU General Public License v2.0 or later"}, "build_inputs": [{"attribute_path": "AMB-plugins.out.buildInputs.0", "build_input_type": "build_input", "output_path": "/nix/store/ib5bh4b9fl40zamlx0zpfgw8ygaj01l5-ladspa.h-1.15"}]}
...

{"attribute_path": "CHOWTapeModel.out.buildInputs.2", "derivation_path": "/nix/store/pvdzfr30s1xc8qcvavmmkj1j51lpnifp-curl-8.3.0.drv", "output_path": "/nix/store/r4638iv5a646jz14rl02r6r9cqhggaky-curl-8.3.0-dev", "outputs": [{"name": "bin", "output_path": "/nix/store/yiip8b2ks99ixn5jf9jlx0bsxl5r2h2m-curl-8.3.0-bin"}, {"name": "dev", "output_path": "/nix/store/r4638iv5a646jz14rl02r6r9cqhggaky-curl-8.3.0-dev"}, {"name": "out", "output_path": "/nix/store/bhmynyjwzc2r6iqf7fhc3yzjcv3paiwa-curl-8.3.0"}, {"name": "man", "output_path": "/nix/store/c19b760rmwv0j57bgscxmic9qkylihzg-curl-8.3.0-man"}, {"name": "devdoc", "output_path": "/nix/store/ad15cgxydfspm6sqda4plrv2vf2kfib9-curl-8.3.0-devdoc"}, {"name": "debug", "output_path": "/nix/store/bv42bixmsj38cdfdrf08760pxc5z03pq-curl-8.3.0-debug"}], "name": "curl-8.3.0", "parsed_name": {"name": "curl", "version": "8.3.0"}, "nixpkgs_metadata": {"pname": "curl", "version": "8.3.0", "broken": false, "license": "curl License"}, "build_inputs": [{"attribute_path": "CHOWTapeModel.out.buildInputs.2.nativeBuildInputs.0", "build_input_type": "native_build_input", "output_path": "/nix/store/5daca24rn22c65ff25lc6z0g0imfphvr-pkg-config-wrapper-0.29.2"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.nativeBuildInputs.1", "build_input_type": "native_build_input", "output_path": "/nix/store/2j7b1ngdvqd0bidb6bn9icskwm6sq63v-perl-5.38.0"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.propagatedBuildInputs.0", "build_input_type": "propagated_build_input", "output_path": "/nix/store/xz08admjq3viprgxnb6gc9xvbbhdxq7h-brotli-1.1.0-dev"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.propagatedBuildInputs.1", "build_input_type": "propagated_build_input", "output_path": "/nix/store/s64na02z7fd3q6d04br9ivbdrqcmx5vj-libkrb5-1.20.1-dev"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.propagatedBuildInputs.2", "build_input_type": "propagated_build_input", "output_path": "/nix/store/46dnqv0sjm5g1sgd4r0rqp50sv7wr1ma-nghttp2-1.54.0-dev"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.propagatedBuildInputs.3", "build_input_type": "propagated_build_input", "output_path": "/nix/store/84d89lbdfdzy1wb363k3avl8qb8i6lcb-libidn2-2.3.4-dev"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.propagatedBuildInputs.4", "build_input_type": "propagated_build_input", "output_path": "/nix/store/lqsfdc4p9vax2h55csza9iw46zjpghl6-openssl-3.0.10-dev"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.propagatedBuildInputs.5", "build_input_type": "propagated_build_input", "output_path": "/nix/store/ifvv1iq1fq7j3z7ykz997gnba8a2yy5m-libssh2-1.11.0-dev"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.propagatedBuildInputs.6", "build_input_type": "propagated_build_input", "output_path": "/nix/store/9m8fx7phv5gr67b2yd5a43p287hg10g0-zlib-1.3-dev"}, {"attribute_path": "CHOWTapeModel.out.buildInputs.2.propagatedBuildInputs.7", "build_input_type": "propagated_build_input", "output_path": "/nix/store/zgflsh0x6c8lka6y0n2diiph6ahiglxh-zstd-1.5.5-dev"}]}
...

This command outputs the information about all derivations, top-level or any dependency, from a flake (by default nixpkgs from your flake registry).

What’s more, our processing is threaded so that results are streamed: the user does not have to wait for everything to be processed before starting to consume the data. This makes it convenient to pipe the output into other tools such as jq:

$ nixtract - | jq -r '.derivation_path'
/nix/store/90bclfvqnc0mzn6vsd99bzl383vl2j04-AMB-plugins-0.8.1.drv
/nix/store/vwz2w3arkprzxc28d0f621hbv8gq468f-ArchiSteamFarm-5.4.9.3.drv
...
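
The same stream can be turned into graph edges with a few lines of Python. This is a minimal sketch, assuming one JSON record per line with the field names shown in the output above; `read_edges` is a hypothetical helper, and the sample record is the AMB-plugins record from above, trimmed to the fields used here.

```python
import json

# The AMB-plugins record shown earlier, trimmed to the fields used below.
SAMPLE_LINES = [
    '{"attribute_path": "AMB-plugins.out", '
    '"output_path": "/nix/store/7n05ir19xx0086dk1wq1zm1gq8g1ymyr-AMB-plugins-0.8.1", '
    '"build_inputs": [{"attribute_path": "AMB-plugins.out.buildInputs.0", '
    '"build_input_type": "build_input", '
    '"output_path": "/nix/store/ib5bh4b9fl40zamlx0zpfgw8ygaj01l5-ladspa.h-1.15"}]}',
]


def read_edges(lines):
    """Yield (package, dependency) output-path pairs from JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        for build_input in record.get("build_inputs", []):
            yield record["output_path"], build_input["output_path"]


for package, dependency in read_edges(SAMPLE_LINES):
    print(f"{package} -> {dependency}")
```

Because `read_edges` is a generator over any iterable of lines, passing it `sys.stdin` instead of the sample list would let it process records as they arrive from a pipe.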

Not only can you extract the data from the nixpkgs flake for your current system, but you can extract the data from any flake and for any system:

$ nixtract --target-flake-ref 'github:nixos/nixpkgs/23.05' --target-system 'x86_64-linux' -
{"attribute_path": "AMB-plugins.out", "derivation_path": "/nix/store/ddw13azl05jp816s57zzkqlf7qa2c7xg-AMB-plugins-0.8.1.drv", "output_path": "/nix/store/sbkj5l8ynhywr1fqszxr91g043gvxjc1-AMB-plugins-0.8.1", "outputs": [{"name": "out", "output_path": "/nix/store/sbkj5l8ynhywr1fqszxr91g043gvxjc1-AMB-plugins-0.8.1"}], "name": "AMB-plugins-0.8.1", "parsed_name": {"name": "AMB-plugins", "version": "0.8.1"}, "nixpkgs_metadata": {"pname": "AMB-plugins", "version": "0.8.1", "broken": false, "license": "GNU General Public License v2.0 or later"}, "build_inputs": [{"attribute_path": "AMB-plugins.out.buildInputs.0", "build_input_type": "build_input", "output_path": "/nix/store/wkxygwfn22g8fjs6alq9j1x2jiaw679k-ladspa.h-1.15"}]}
{"attribute_path": "AMB-plugins.out.buildInputs.0", "derivation_path": "/nix/store/3j3lzndm8ky97hc59s3bfdf6ln38wg2h-ladspa.h-1.15.drv", "output_path": "/nix/store/wkxygwfn22g8fjs6alq9j1x2jiaw679k-ladspa.h-1.15", "outputs": [{"name": "out", "output_path": "/nix/store/wkxygwfn22g8fjs6alq9j1x2jiaw679k-ladspa.h-1.15"}], "name": "ladspa.h-1.15", "parsed_name": {"name": "ladspa.h", "version": "1.15"}, "nixpkgs_metadata": {"pname": "ladspa.h", "version": "1.15", "broken": false, "license": "GNU Library General Public License v2"}, "build_inputs": []}
...

This is just the beginning

We have released nixtract 0.1.0: check out the README and try it now!

The data extracted by nixtract is somewhat basic, but you can expect it to extract more metadata from derivations in the future.

nixtract focuses on extracting the data, which is not enough on its own. We hope to combine nixtract with other tools so that anyone is able to scan the whole Nixpkgs graph as they need. It would also be exciting to combine this source of data with others, like binary caches and CVE databases, to enable even more use cases.

We hope that nixtract becomes a foundational brick that sparks new efforts towards better understanding Nixpkgs and improving Software Supply Chain management.


  1. As I am writing this blog post, a coworker has pointed out to me that we basically ended up reimplementing nix-eval-jobs here. We could consider using it directly.

About the authors

Guillaume Desforges

Guillaume is a versatile engineer based in Paris, with fluency in machine learning, data engineering, web development and functional programming.

This article is licensed under a Creative Commons Attribution 4.0 International license.