Tweag

Source filtering with file sets

28 November 2023 — by Silvan Mosberger

Sponsored by Antithesis (distributed systems reliability testing experts), I’ve developed a new library to filter local files in Nix which I’d like to introduce!

This post requires some familiarity with Nix and its language. If you don’t know what Nix is yet, take a look at it first. It’s pretty neat.

In this post we’re going to look at what source filtering is, why it’s useful, why a new library was needed for it, and the basics of the new library.

This post won’t teach you much about the new library itself; that’s what the official tutorial and the reference documentation are for. But if you’d like to know some background and motivation, please read on!

Why filter sources

You most likely have come across this pattern:

stdenv.mkDerivation {
  src = ./.;
  # stuff..
}

This is the basis for a Nix expression to build a project in a local directory. There’s a lot of magic to make this work, but we’ll focus on just a few aspects:

  • Bar some exceptions, attributes passed to stdenv.mkDerivation are automatically turned into environment variables that are available to the derivation builder.
  • The relative path expression ./. is turned into a string of the form "/nix/store/<hash>-<name>" by hashing the contents of the directory that the Nix file is in, and adding that directory to the Nix store.

We then end up with a store derivation whose environment variables include src = "/nix/store/<hash>-<name>". To get more info on how all of this works, see the documentation on derivations.
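
To see the second point in isolation, here is a minimal sketch that can be evaluated on its own, for example with nix-instantiate --eval --strict. The file name and attribute names are only illustrative, and the sketch assumes evaluation outside of pure evaluation mode.

```nix
# coercion.nix (hypothetical file name)
let
  src = ./.;
in
{
  # Interpolating a path value copies the directory into the Nix store
  # and yields a string of the form "/nix/store/<hash>-<name>"
  inStore = "${src}";

  # toString only renders the absolute local path as a string;
  # nothing is copied to the store
  local = toString src;
}
```

The difference between these two conversions matters again further down, where toString is used inside a filter function.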

This generally does work, with the big caveat that all files in the local directory are copied into the Nix store and become a dependency of the derivation. This means that:

  • Changing any file will cause the resulting derivation to change, making Nix unable to reuse previous build results.

    For example, if you just format your Nix files, Nix will have to build the project’s derivation again!

  • If you have secret files in the local directory, they get added to the Nix store too, making them readable by any user on the system!

    Note: If you use the experimental new nix CLI with Flakes and Git in a current (2.18.1) version of Nix, only files tracked by Git will be available to Nix, so this generally won’t be a problem.

    But be careful: If you don’t use Git, the entire directory is always copied to the Nix store! Furthermore, experimental features may change over time.

The hardcore way to filter sources

To address this, Nix comes with builtins.path, which allows controlling how paths get added to the Nix store:

builtins.path {
  # The path to add to the store
  path = ./.;
  # A function determining whether to include specific paths under ./.
  filter =
    # A path under ./.
    path:
    # The path type, either "directory", "regular", "symlink" or "unknown"
    type:
    # The return value of this function indicates
    # whether the path should be included or not.
    # In this case we always return true, meaning everything is included
    true;
}

While this interface looks straightforward, it’s notoriously tricky to get it to do what you want.

Let’s give it a try and start with something trivial, say only including a single file in the current directory:

# default.nix
builtins.path {
  path = ./.;
  filter =
    path: type:
    path == ./file;
}

$ touch file

$ tree "$(nix eval --raw -f.)"
/nix/store/dg5zq00kxabc3lfg03bnrfwax1ndgn6s-filter

0 directories, 0 files

It doesn’t work: we just get an empty directory! The problem is that the filter function is called with strings rather than path values, and the == operator returns false when comparing a path with a string, even if both refer to the same file.
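
The type mismatch can be checked directly. This is a small sketch using a made-up absolute path; the file doesn’t need to exist for the comparison to be evaluated.

```nix
# /home/user/file is a hypothetical path, used only for illustration
{
  pathType = builtins.typeOf /home/user/file;      # "path"
  stringType = builtins.typeOf "/home/user/file";  # "string"

  # A path and a string are never equal,
  # even when they spell out the same location
  equal = /home/user/file == "/home/user/file";    # false
}
```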

To fix this, we can use the builtin toString function, which converts path values to strings:

# default.nix
builtins.path {
  path = ./.;
  filter =
    path: type:
    path == toString ./file;
}

$ tree "$(nix eval --raw -f.)"
/nix/store/2myzf03ca2ch3lc40p7frvqqbvm5nm2m-filter
└── file

1 directory, 1 file

Great, this works! Now let’s try the same for dir/file:

# default.nix
builtins.path {
  path = ./.;
  filter =
    path: type:
    path == toString ./dir/file;
}

$ mkdir dir && touch dir/file

$ tree "$(nix eval --raw -f.)"
/nix/store/dg5zq00kxabc3lfg03bnrfwax1ndgn6s-filter

0 directories, 0 files

Apparently this doesn’t work for nested directories.

The problem now is that the filter function first gets called on dir itself. And because ./dir != ./dir/file, it returns false, therefore excluding dir entirely.

To fix this we need to make sure the function recurses into the directories, which we can do by checking for type == "directory":

# default.nix
builtins.path {
  path = ./.;
  filter =
    path: type:
    # Return true for all directories
    type == "directory"
    # But also for the file we want to include
    || path == toString ./dir/file;
}

But what if there is another directory we don’t care about?

$ mkdir another-dir

$ tree "$(nix eval --raw -f.)"
/nix/store/wj0y4f4x5llzz8kj48jd0gvszddp3jr0-filter
├── another-dir
└── dir
    └── file

3 directories, 1 file

This worked, dir/file is there. But so is another-dir, although it doesn’t even contain any files!

We could go on like that, but I think you get the gist: This function is tricky to use!
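
For the curious, here is a sketch of one possible next step: only include a directory if the wanted file lies somewhere beneath it. The hasPrefix helper is defined here for illustration and is not part of builtins.path.

```nix
# default.nix
let
  wanted = toString ./dir/file;

  # Illustrative helper: whether the string `prefix` is a prefix of `str`
  hasPrefix = prefix: str:
    builtins.substring 0 (builtins.stringLength prefix) str == prefix;
in
builtins.path {
  path = ./.;
  filter =
    path: type:
    # Include the file itself
    path == wanted
    # Include a directory only if the wanted file is inside of it
    || (type == "directory" && hasPrefix (path + "/") wanted);
}
```

With this filter another-dir is no longer included, but the amount of string manipulation needed for a single file shows how quickly things get out of hand.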

Introducing the file set library

Let’s compare this to the new file set library:

{ lib ? import (fetchTarball "channel:nixos-23.11" + "/lib") }:
lib.fileset.toSource {
  root = ./.;
  fileset = ./dir/file;
}

$ tree "$(nix eval --raw -f. outPath)"
/nix/store/csgp388b3zqxp2av01gjncy9sadxib9q-source
└── dir
    └── file

2 directories, 1 file

This is much more straightforward and does what you’d expect. But where is the file set here?

The key here is that when the file set library expects to get a file set, but gets a path, it implicitly turns the path into a file set. So to the library, ./dir/file is a file set containing just the single file ./dir/file, while ./dir would be a file set containing all files in the directory ./dir.
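
As a small sketch of that coercion, the same expression with a directory instead of a file would select everything under ./dir (assuming the dir directory from the earlier examples exists):

```nix
# default.nix
{ lib ? import (fetchTarball "channel:nixos-23.11" + "/lib") }:
lib.fileset.toSource {
  root = ./.;
  # A directory path is implicitly turned into
  # the file set of all files beneath it
  fileset = ./dir;
}
```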

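The real power of this library, however, comes from the fact that file sets behave just like mathematical sets, and it comes with some core functions to support that:

  • lib.fileset.union and lib.fileset.unions combine file sets.
  • lib.fileset.intersection keeps only the files that are in both file sets.
  • lib.fileset.difference removes the files of the second file set from the first.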
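
To give a rough idea of how the set functions lib.fileset.unions, lib.fileset.difference and lib.fileset.intersection combine, here is a sketch. The paths ./Makefile, ./src, ./tests and ./src/docs are hypothetical, and since the library throws an error for paths that don’t exist, they would need to be present for this to evaluate.

```nix
# default.nix
{ lib ? import (fetchTarball "channel:nixos-23.11" + "/lib") }:
let
  fs = lib.fileset;

  # Union: everything needed for a build (hypothetical paths)
  buildFiles = fs.unions [
    ./Makefile
    ./src
    ./tests
  ];

  # Difference: the same files, except the documentation under src
  withoutDocs = fs.difference buildFiles ./src/docs;

  # Intersection: of those, only the files that are also under ./src
  onlySrc = fs.intersection withoutDocs ./src;
in
fs.toSource {
  root = ./.;
  fileset = onlySrc;
}
```

Each intermediate value is an ordinary file set, so any of them could be passed to toSource instead.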

Some other notable features are:

  • Files are never added to the store unless explicitly requested with lib.fileset.toSource.
  • Maximum laziness: Directories are never recursed into more than necessary.
  • Actionable error messages in case something doesn’t look right.
  • Minimal assumptions: The library only relies on stable Nix features. It even works correctly with possible future changes to the behavior of Nix paths.

But this is not the best place to teach you about the library. For that, head over to the official tutorial instead, or check out the reference documentation!

The file set library is going to be included in the upcoming NixOS 23.11 release. If you encounter any problems using it or are missing some feature, let me know in this tracking issue.

Comparison

For completeness, we also need to look at previous related efforts and see how they compare to this library:

  • builtins.fetchGit ./. allows creating a store path from all Git-tracked files, so it’s very similar to lib.fileset.gitTracked. However, it’s tricky to further restrict or extend the set of selected files, since the above filter-based approach wouldn’t work on store paths without some changes.

  • lib.sources.cleanSourceWith is a simple wrapper around builtins.path. While it has the same filter-based interface, it improves over builtins.path by being chainable, allowing the set of included files to be further restricted.

  • lib.sources.cleanSource uses cleanSourceWith underneath to set filter to a reasonable default, filtering out some of the most common unwanted files automatically. The file set library doesn’t yet have a good replacement for this, but there is lib.fileset.fromSource, which you can use to convert any lib.sources-based value to a file set.

  • lib.sources.sourceByRegex and lib.sources.sourceFilesBySuffices are also functions built on top of cleanSourceWith, and as such can be chained with each other. While sourceFilesBySuffices is not bad, the interface of sourceByRegex is rather clunky and error-prone. Furthermore, it’s hard to add more files to the result.

  • gitignore.nix and pkgs.nix-gitignore allow you to filter files based on Git’s .gitignore files, which is closely related to filtering by Git-tracked files. The file set library doesn’t replace these functions, but it can be used as a more composable foundation.

  • nix-filter is a third-party wrapper around lib.sources. It offers a nicer interface, but suffers from some unclear semantics and composability issues. The file set library should serve as an improved replacement.

  • Source combinators was a previous attempt to create a composable interface for handling source files. It was a bit tricky to use and never merged, but this is in fact the work that inspired the new file set library!
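
As a sketch of how some of these approaches can be combined with file sets, the following intersects the Git-tracked files with what lib.sources.cleanSource would keep. It assumes that ./. is the top-level directory of a local Git checkout, that evaluation is not in pure mode, and that the lib version in use provides gitTracked and fromSource.

```nix
# default.nix
{ lib ? import (fetchTarball "channel:nixos-23.11" + "/lib") }:
let
  fs = lib.fileset;
in
fs.toSource {
  root = ./.;
  fileset = fs.intersection
    # All files tracked by Git (assumes ./. contains the .git directory)
    (fs.gitTracked ./.)
    # All files kept by the lib.sources-based default filter
    (fs.fromSource (lib.sources.cleanSource ./.));
}
```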

Conclusion

We’ve seen that filtering sources can improve your Nix experience by avoiding unnecessary derivation rebuilds. While it was possible to filter sources before using the builtins.path function and other approaches, there are many pitfalls. The lib.fileset library in comparison makes source filtering a breeze.

In addition to a huge thanks to Antithesis as the main sponsor of this work, I’d also like to thank Robert Hensing from Hercules CI and Valentin Gagarin from Tweag for all the help they’ve given me during reviews!

About the authors
Silvan Mosberger

Silvan is an active member of the Nix community, specializing in NixOS modules and deployment, Nix internals and state-of-the-art improvements. For writing actual programs he preferably uses Haskell, but can also use Bash if need be.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.
