Assumptions for Liquid Haskell in the large

22 June 2023 — by Facundo Domínguez

For a while now, Tweag has committed in improving various aspects of Liquid Haskell (LH), a tool that gives the Haskell programmer both the ability to express properties about programs and to verify that they meet these expectations.

In this post we present a specific improvement that I integrated recently, which cuts down the maintenance cost to use LH when introducing assumptions about functions coming from large or multiple packages.

Problem: How to reason about dependencies

Let us suppose that we are trying to verify that the following function yields a non-empty list.

module M where

{-@ consPlus1 :: Int -> [Int] -> { v:[Int] | len v > 0 } @-}
consPlus1 :: Int -> [Int] -> [Int]
consPlus1 x xs = map (+1) (x:xs)

The definition starts with a comment dedicated to LH, delimited by {-@ ... @-}. It contains a type signature for consPlus1 with refinement types. A refinement type denotes a subtype of another type, characterized by a given predicate, like len v > 0. In this case the predicate of the result uses a function len that yields the length of a list in the logic. This predicate states that consPlus1 yields a list with at least one element.

This function depends on the map function from the base library. LH can infer that consPlus1 yields a non-empty list, if it knows that map yields a non-empty list. Proving that map yields a non-empty list requires inspecting its definition. However, the definition of map is not in the current package, it comes from the base dependency.

One option to reach a complete proof would be to add LH annotations to the base package, proving there that map yields non-empty lists when fed non-empty lists. This would require either convincing the maintainers of base to add these annotations, or creating a copy of base with the LH annotations and proofs that we need. Both options are problematic.

If the maintainers of base accept our LH annotations, they will be expected to keep them up to date, as base and LH evolve. Moreover, they will need to figure out the most relevant set of annotations and proofs worth maintaining among user inquiries.

If we copy the base package and our project starts depending on a modified dependency, we will be responsible for updating our copy when the original base package is modified. Moreover, if other projects start using their own copy of base, then we will end up with a bunch of copies with different annotations each, which then makes composing the projects a challenge.

The rest of the post is devoted to explain what LH did initially to improve upon this situation, and how I further improved it later on.

Idea: Separating annotations and definitions

Proving that map yields some property of interest does often require its definition. But if we are satisfied with just assuming that the property holds without an actual proof, then we could tell so to LH at the place where we need to use such an assumption, without any need to modify the dependency itself.

Here’s how we could write our assumption about map.

{-@ assume map :: (a -> b) -> xs:[a] -> { v:[a] | len xs = len v } @-}

The assumption starts with the assume keyword, indicating that the predicate in the return type is not to be proved. Then follows an assumed type signature for map with refinement types. The parameters of the function can have a name, like xs, that can be used in the predicates of later arguments and in the result. The predicate of the return type states that the input and the output have the same length.

This works fine if we need to use the assumption only once, but if the assumption is needed at multiple places, it is inconvenient to copy the assumption at each usage site.

Could we put the assumption at some place where LH could find it when needed? This is a problem similar to the one in Python, where authors might want to add type signatures to the interfaces of preexisting libraries in dependencies. The first answer to this question was known as wrapper packages and they were introduced at the time LH became a GHC plugin in 2020.

Old solution: Wrapper packages

A wrapper package is a package that reexports all the definitions from an existing package, and adds assumptions about them. In the case of base, we would have a wrapper package liquid-base that contained modules using the same names of modules in base. A minimal wrapper package for our running example would be this:

{-# LANGUAGE QualifiedImports #-}
module GHC.Base(module X) where
import "base" GHC.Base as X

{-@ assume map :: (a -> b) -> { xs:[a] | len xs > 0 } -> { v:[a] | len v > 0 } @-}

The module liquid-base:GHC.Base is a drop-in replacement for base:GHC.Base that can introduce assumptions about functions in GHC.Base. More generally, the liquid-base package is a drop-in replacement for the base package, because it would define similar reexporting modules for every module in base.

When we are verifying a program that depends on base, we remove base from the dependencies and use liquid-base instead. What’s good about this approach is that we do not need to change the import declarations in our program to get the new assumptions, and the relevant assumptions are in scope at the call site of every function. Thus our dependencies in a hypothetical file our_package.cabal would be:

...
build-depends:
  liquid-base

A disadvantage of this approach, however, is that the programmer must manually track that the right version of liquid-base is used to replace base, and so on for every dependency on which assumptions need to be introduced.

Another disadvantage is that liquid-base must really reexport definitions from all modules in base even if we only need to add a single assumption for map. Otherwise, liquid-base could not be a replacement for base. This imposes a maintenance overhead that is proportional to the number of dependencies and their sizes. This overhead needs to be paid when first creating a wrapper package and then again when updating it to accommodate changes in the wrapped package.

Lastly, the wrapper packages pollute and obfuscate the dependencies of a project. A package that depends on base doesn’t show it explicitly since it depends instead on liquid-base. The reader has to know somehow or trust that liquid-base really exports the same Haskell definitions as base.

In order to address the above problems, I proposed and implemented an alternative mechanism to share assumptions.

New solution: Packages with assumptions

Ideally, LH would be able to find assumptions without defining wrapper packages and without changing the original source code.

Since LH already allowed to put assumptions anywhere, the remaining problem was how to make LH find these assumptions. The user could straightforwardly add import declarations of the modules containing the assumptions, but it would be preferable to avoid modifying the original source code as much as possible. The new mechanism relies on a naming convention for modules that hold assumptions.

If the module we are verifying imports GHC.Base, then with the new mechanism LH will look for assumptions in a module named GHC.Base_LHAssumptions, if it exists.

module GHC.Base_LHAssumptions where
import GHC.Base
{-@ assume GHC.Base.map :: (a -> b) -> { xs:[a] | len xs > 0 } -> { v:[a] | len v > 0 } @-}

And so on for the rest of the imports: if module X is imported, then extra assumptions will be read from modules named X_LHAssumptions if they exists in any dependency of the current package.

We can thus provide a package liquid-base-assumptions, which essentially contains modules with assumptions. E.g.

GHC.Base_LHAssumptions
GHC.List_LHAssumptions
Prelude_LHAssumptions
...

Now, when we want to verify a package that depends on base, we add a dependency on liquid-base-assumptions to allow LH to find the _LHAssumptions modules that it contains. Dependencies could look as follows in our_package.cabal:

...
build-depends:
  base,
  liquid-base-assumptions

If a module doesn’t export a function that needs assumptions, there is no need to define a module _LHAssumptions, and certainly no need to define modules that just reexport definitions from elsewhere, which is precisely the overhead we wanted to avoid.

An additional simplification is that we can save the user the trouble of picking a version of the assumptions that matches the base version in use. This is because the version of base is mostly dictated by the version of GHC that the project uses, and LH knows which GHC version is in use when it is built. More generally, LH can guess the assumptions to use for every package whose version is tied to the compiler (i.e. it is a boot package). Because of this, the _LHAssumptions modules corresponding to boot packages now come included in the liquidhaskell package. Then the dependencies in our_package.cabal become again

...
build-depends:
  base

Final remarks

The mechanism based on the naming convention was integrated recently in LH and can be used from the latest version in the github repository. Because dozens of reexporting modules were eliminated, the time needed to build and test LH has nearly halved.

The implementation passes the test suite of LH, and now it is time to experiment a bit with larger examples. For instance, it is plausible that situations will arise where assumptions need to be modified, and provisions will need to be taken to inhibit the implicit loading of _LHAssumptions modules or override some of the imported assumptions. The solution in Python, which is similar to the one in LH, does anticipate cases like these.

Special thanks to Niki Vazou and Ranjit Jhala for helping me navigate the sometimes tricky design space to arrive at the current solution. Suggestions to improve the management of assumptions are welcome, so please, don’t hesitate to reach out and share your thoughts.

About the authors

Facundo DomínguezFacundo is a software engineer supporting development and research projects at Tweag. Prior to joining Tweag, he worked in academia and in industry, on a varied assortment of domains, with an overarching interest in programming languages.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Coverage-guided fuzzing of Haskell programs for cheap Packaging Topiary in OPAM →