libmamba vs classic#

The libmamba solver attempts to be a drop-in replacement for the classic solver; however, there are some differences which could not be avoided. These are the three primary reasons:

  • Fundamental differences between the underlying solver algorithms

  • Underlying implementation details

  • Conscious decisions made by developers to improve overall user experience

Should I use conda-libmamba-solver?#

Use conda-libmamba-solver if:

  • You want a faster solver with low-memory footprint.

  • Some of your environments do not solve quick enough, or at all, with classic.

  • You are okay with slightly different (but metadata-compliant) solutions.

Note

Users most often find alternative solutions surprising when they request packages with very few restraints. If the given solution is not fully satisfying, try to restrict your request a bit more.

For example, if you run conda install scipy and do not get the latest version, try using a more explicit command: conda install scipy=X.Y.

The classic solver could be important to you if:

  • Backwards compatibility is important (i.e. environments must be solved exactly as they always have been)

  • The intentional deviations from the classic solver (see below) are not acceptable and you prefer the old behavior

These reasons could be especially important if you continue to use long lived environments that were initially created with the classic solver.

Important

You can always use --solver=classic to re-enable the classic solver temporarily for specific operations, even after setting libmamba as default.

Intentional deviations from classic#

With the release of conda-libmamba-solver, we took the opportunity to improve some aspects of the solver experience that were not possible to change in classic due to backwards compatibility restraints. The main ones are:

  • conda-libmamba-solver does not use current_repodata.json by default. Instead, it always uses the full repodata.json files.

  • conda-libmamba-solver does not retry with --freeze-installed by default. Instead, it has a tighter retry logic that progressively relaxes the constraints on the conflicting packages.

  • conda-libmamba-solver does not allow the user to override the configured pinned specs by specifying incompatible constraints in the CLI. Instead, it will error early. To override pinned specs, it needs to be done explicitly in the relevant configuration file(s) (e.g. temporarily commenting out the pin spec, or modifying the pin for a more recent version). Note that compatible CLI specs are still allowed, and will be used to select the best solution. For example, having a pinned spec for python=3.8 will not prevent you from requesting python=3.8.10, but python=3.9 will be rejected.

  • conda-libmamba-solver provides a way to hard-lock a given package to its currently installed version. To do so, specify only the name of the package as a pinned spec. Once installed, the solver will prevent any modifications to the package. Use with care, since this can be a source of conflicts. Adequately constrained pins are a more flexible alternative.

Technical differences between libmamba and classic#

We know conda-libmamba-solver brings a faster solver to conda, but why is that? And, why couldn’t the classic solver just become faster?

Note

The following sections provide deeper technical details about the reasons, both at the implementation and algorithmic level.

If you don’t care about that much detail, just know that:

  • Deep within, both classic and conda-libmamba rely on C-based code to solve the SAT problem. However, classic uses Python objects to define and manage the SAT clauses, which incurs a large overhead.

    libmamba lets libsolv do the heavy lifting, operating in C++ and C, respectively. conda-libmamba-solver tries to delegate to the libmamba and libsolv compiled libraries as soon as possible to minimize the Python overhead.

  • classic has a more involved retry-logic than can incur in more time-consuming solver attempts, especially for existing environments.

  • Both options use SAT solvers, but they invoke them differently. classic uses a multistep, multi-objective optimization scheme, which resembles a global optimization scheme.

    libsolv opts for a backtracking alternative, closer to a local optimization scheme. This can result in libmamba choosing a different member of the whole solution ensemble.

Implementation differences#

Let’s first analyze how both solvers are implemented.

The classic solver logic is distributed across several abstraction layers in conda.

  • conda.cli.install:

    This module contains the base implementation for conda [env] install|remove|update|create. It eventually delegates to the Solver class, after some preparation tasks. This module can run up to 4 solver attempts by default: use current_repodata.json first, or retry with repodata.json, and with and without the --freeze-installed flag.

  • conda.core.solve.Solver:

    This class provides a three-function API that interfaces with the Transaction system. Almost of all the logic falls under the Solver.solve_final_state() method.

    At this step, classic downloads the channels metadata, collects information about the target environment and applies the command-line instructions provided by the user.

    The end result is a list of MatchSpec objects; in other words, a list of constraints that underlying solver must use to best select the needed packages from the channels.

  • conda.resolve.Resolve:

    This class receives the MatchSpec instructions from the higher level Solver class and transforms them into SAT clauses, as implemented in the conda.common.logic.Clauses and conda.common._logic.Clauses classes.

    Resolve.solve() is the method that governs the algorithmic details of “solving the environment”.

  • conda.common._logic._SatSolver:

    Provides the parent class for all three SAT solver wrappers implemented as part of the classic logic (PycoSat, PyCryptoSat, PySat).

    The default one is PycoSat, but you can change it with the sat_solver option in your configuration.

  • conda.common._logic._PycoSatSolver:

    This class wraps the pycosat bindings to picosat, the underlying C library that actually solves the SAT clauses problem.

For conda-libmamba-solver, we initially tried to provide an implementation at the _SatSolver level, but libsolv (and hence libmamba) didn’t expose a SAT-based API. We ended up with an implementation a bit higher up in the abstraction tree:

  • conda.cli.install:

    We always ignore current_repodata.json and implement the --freeze-installed attempts closer to the solver so we don’t have to re-run the preparation steps.

  • conda_libmamba_solver.solver.LibmambaSolver:

    A conda.core.solve.Solver subclass that completely replaces the Solver.solve_final_state() method. We used this opportunity to refactor some of the pre-solver logic (spread across different layers in classic) into a solver-agnostic module (conda_libmamba_solver.state) with nicer-to-work-with helper objects. Our subclass instantiates the libmamba objects.

  • libmamba.api.Solver:

    The libmamba.api Python module is generated by pybind11 bindings to the underlying libmamba C++ library. Some of the objects we rely on are api.Solver (interfaces with libsolv), api.Context (reimplementation of conda.base.context) and the api.{Pool,Repo} stack (handles the channel metadata and target environment state).

  • libsolv:

    libmamba relies on this C project directly to handle the solving steps. The conda-specific logic is implemented in the conda.c file.

The implementation details reveal some of the reasons for the performance differences:

  • classic uses many Python layers before it finally reaches the compiled code (picosat):

    • Tens of MatchSpec objects reflect the input state: installed packages, system constraints and user-requested packages

    • The channel index (repodata files) results in tens of thousands of PackageRecord objects

    • The SAT clauses end up being expressed as tens or hundreds of thousands of logic.Clauses and _logic.Clauses objects.

    • The optimization algorithm in Resolve.solve() invokes picosat several times, switching between Python and C contexts very often, recreating Clauses as necessary.

  • conda-libmamba-solver, in contrast, switches to C++ pretty early in the abstraction tree.

Algorithmic details#

Retry logic#

classic tries hard to minimally modify your environment, so by default, the flag --freeze-installed will be applied.

This means all your installed packages will be constrained to their current installed version. If the SAT solver couldn’t find a solution, then classic will analyze which packages are causing the conflict.

If the conflicting packages were not explicitly requested by the user (in the current or previous operations in the target environment), their version constraint will be relaxed and a new solving attempt will be made.

If, despite the progressive constraint relaxation, the SAT solver cannot find a solution, the Solver class will raise an exception to the conda.cli.install module. This will trigger a second round of attempts, without --freeze-installed. In simplified Python:

for repodata in ("current_repodata.json", "repodata.json"):
    solver = Solver(repodata_fn=repodata)
    for should_freeze in (True, False):
        success = solver.solve(freeze_installed=True)
        if success:
            break
else:
    raise SolverError()


class Solver:
    "Super simplified version. Actual implementation is spread across many layers"

    def solve(self, *args, **kwargs):
        index = download_channel(channels, repodata_fn)
        constraints = collect_metadata(target_environment, user_requested_packages)
        while True:
            sat_solver = SATSolver(index)
            clauses = sat_solver.build_clauses(constraints)  # expensive!
            success = sat_solver.solve(clauses)  # multi-step optimization
            if success:
                return True
            else:
                conflicts = sat_solver.find_conflicts()
                initial_constraints = constraints.copy()
                constraints.update(conflicts)
                if initial_constraints == constraints:
                    return False

A similar retry logic is implemented in conda_libmamba_solver, but libsolv gives us the conflicting packages as part of the solving attempt result for free, which allows us to iterate faster.

We don’t need a separate attempt to disable --freeze-installed because our retry logic handles conflicts and frozen packages in the same way.

Additionally, this retry logic can also be disabled or reduced with an environment variable for extreme cases (very large environments). We also ignore current_repodata.json altogether. All of these changes make the overall logic simpler and faster, which compounds on top of the lightning-fast libmamba implementation.

SAT algorithms#

Given a set of MatchSpec objects, classic will apply a multistep, multi-objective optimization strategy that invokes the actual SAT solver several times:

  • conda.resolve.Resolve.solve() will optimize several objective metrics. In no particular order, some of these rules are:

    • Maximize the versions and build numbers of required packages

    • Minimize the number of track_features

    • Prefer non-noarch over noarch if both are available

    • Minimize the number of necessary upgrades and/or removals

  • conda.common._logic.Clauses.minimize():

    This is used for each step above, and involves a series of SAT calls per minimization. All of these calls involve, at some point, passing Python objects over to the C context, which incurs some overhead.

In contrast, libmamba delegates fully to libsolv, which has its own logic for conda-specific problems.

You can read more about it in the mamba-org/mamba documentation, but the most important part:

  • libsolv is a backtracking SAT solver, inspired by minisat. This means that it explores “branches” of a solution until it finds one that satisfies the input constraints.

    If we understand classic’s approach as a global-like optimization strategy, and one could say libsolv’s better resembles a local optimization approach.

  • This means that in the presence of several compatible solutions, libsolv might choose one that is different to the one proposed by classic.

Note

Tip Large conda-forge migrations often rely on multiple coexisting build variants to ease the transition (e.g. openssl v1 to v3). This introduces several alternative branches libsolv can end up exploring and selecting, perhaps with surprising results.

Being more explicit about the requested packages usually helps get obtaining the expected solution; e.g. if you want to install scipy=1.0 (the latest version), express that explicitly: conda install scipy=1.0 instead of conda install scipy.

Index reduction#

classic prunes the channel metadata (internally referred to as the “index”) in every Resolve.solve() call. This reduces the search space by excluding packages that won’t ever be needed by the current set of input constraints. Conversely, this performance optimization step can longer and longer the larger the index gets.

In libsolv, pruning is part of the filtering, sorting and selection mechanism that informs the solver (see policy.c and selection.c). It runs in C, using memory-efficient data structures.

IO differences#

conda-libmamba-solver uses the same IO stack as conda classic. In the past, we relied on libmamba’s IO for repodata fetching, but this is not the case anymore.

Practical examples of solver differences#

Python 3.11 + very old Pydantic#

Case study inspired by issue #115

The following environment file will give different solutions with classic and conda-libmamba-solver.

name: gmso
channels:
  - conda-forge
dependencies:
  - numpy
  - sympy
  - unyt <=2.8
  - boltons
  - lxml
  - pydantic <1.9.0
  - networkx
  - ele >=0.2.0
  - forcefield-utilities
  • classic: python 3.10 + pydantic 1.8.2

  • conda-libmamba-solver: python 3.11 + pydantic 0.18.2

This is an example of an underspecified input. There’s no python dependency (or version) listed in the environment file, so the solver has to figure it out. The solver doesn’t necessarily know which dependency is more “important”. classic will prioritize getting a more recent pydantic at the expense of an older python, and conda-libmamba-solver will prefer having python 3.11, even if it means going all the way down to pydantic 0.18.2 (which was packaged as noarch) and thus compatible with any Python version.

cudatoolkit present in a cpuonly environment#

Originally reported in issue #131

This is an example of a known limitation in how libsolv processes the track_features metadata. libsolv will only “see” the first level of track_features, which down-prioritize packages. If you depend on 2nd-order dependencies to track prioritized variants (which conda classic successfully processes), you will get mixed results. This can be solved at the packaging level, where all the variants rely on the package mutex directly, instead of relying on packages that depend on the mutex.

More information#

If you want to read (even more) about this, please check the following resources: