# `libmamba` vs `classic`
The `libmamba` solver attempts to be a drop-in replacement for the `classic`
solver; however, there are some differences which could not be avoided.
These are the three primary reasons:

- Fundamental differences between the underlying solver algorithms
- Underlying implementation details
- Conscious decisions made by developers to improve the overall user experience
## Should I use `conda-libmamba-solver`?
Use `conda-libmamba-solver` if:

- You want a faster solver with a low memory footprint.
- Some of your environments do not solve quickly enough, or at all, with `classic`.
- You are okay with slightly different (but metadata-compliant) solutions.
**Note:** Users most often find alternative solutions surprising when they request
packages with very few constraints. If the given solution is not fully satisfying,
try to restrict your request a bit more. For example, if you run `conda install scipy`
and do not get the latest version, try using a more explicit command:
`conda install scipy=X.Y`.
The `classic` solver could be important to you if:

- Backwards compatibility is important (i.e. environments must be solved exactly as they always have been)
- The intentional deviations from the `classic` solver (see below) are not acceptable and you prefer the old behavior

These reasons could be especially important if you continue to use long-lived
environments that were initially created with the `classic` solver.
**Important:** You can always use `--solver=classic` to re-enable the `classic`
solver temporarily for specific operations, even after setting `libmamba` as the default.
## Intentional deviations from `classic`
With the release of `conda-libmamba-solver`, we took the opportunity to improve some
aspects of the solver experience that were not possible to change in `classic` due to
backwards-compatibility constraints. The main ones are:

- `conda-libmamba-solver` does not use `current_repodata.json` by default. Instead, it
  always uses the full `repodata.json` files. You can still use the reduced file by
  setting `--repodata-fn current_repodata.json` explicitly.
- `conda-libmamba-solver` does not retry with `--freeze-installed` by default. Instead,
  it has a tighter retry logic that progressively relaxes the constraints on the
  conflicting packages.
- `conda-libmamba-solver` does not allow the user to override the configured pinned
  specs by specifying incompatible constraints on the CLI. Instead, it errors early.
  To override pinned specs, do so explicitly in the relevant configuration file(s)
  (e.g. temporarily comment out the pin spec, or modify the pin to a more recent
  version). Note that compatible CLI specs are still allowed and will be used to
  select the best solution. For example, a pinned spec for `python=3.8` will not
  prevent you from requesting `python=3.8.10`, but `python=3.9` will be rejected
  (a toy illustration follows this list).
- `conda-libmamba-solver` provides a way to hard-lock a given package to its currently
  installed version. To do so, specify only the name of the package as a pinned spec.
  Once installed, the solver will prevent any modifications to that package. Use with
  care, since this can be a source of conflicts; adequately constrained pins are a more
  flexible alternative.
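To make the "compatible CLI spec" idea concrete, here is a toy check on version
strings. This is only a sketch of the concept, not the solver's actual spec-merging
logic, and `compatible_with_pin` is a made-up helper:

```python
# Toy illustration of "CLI spec compatible with a pin"; not conda's real logic.
def compatible_with_pin(pinned_version: str, requested_version: str) -> bool:
    """Treat the request as compatible if it equals or refines the pinned version."""
    return requested_version == pinned_version or requested_version.startswith(
        pinned_version + "."
    )


print(compatible_with_pin("3.8", "3.8.10"))  # True: refines the pin, allowed
print(compatible_with_pin("3.8", "3.9"))     # False: contradicts the pin, rejected
```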
## Technical differences between `libmamba` and `classic`
We know `conda-libmamba-solver` brings a faster solver to `conda`, but why is that?
And why couldn't the `classic` solver just become faster?

**Note:** The following sections provide deeper technical details about the reasons,
both at the implementation and algorithmic level. If you don't care about that much
detail, just know that:
- Deep within, both `classic` and `conda-libmamba` rely on C-based code to solve the
  SAT problem. However, `classic` uses Python objects to define and manage the SAT
  clauses, which incurs a large overhead. `libmamba` lets `libsolv` do the heavy
  lifting, operating in C++ and C, respectively.
- `conda-libmamba-solver` tries to delegate to the `libmamba` and `libsolv` compiled
  libraries as soon as possible to minimize the Python overhead.
- `classic` has a more involved retry logic that can incur more time-consuming solver
  attempts, especially for existing environments.
- Both options use SAT solvers, but they invoke them differently. `classic` uses a
  multistep, multi-objective optimization scheme, which resembles a global optimization
  scheme. `libsolv` opts for a backtracking alternative, closer to a local optimization
  scheme. This can result in `libmamba` choosing a different member of the whole
  solution ensemble.
### Implementation differences
Let's first analyze how both solvers are implemented.
The `classic` solver logic is distributed across several abstraction layers in `conda`:
- `conda.cli.install`: This module contains the base implementation for
  `conda [env] install|remove|update|create`. It eventually delegates to the `Solver`
  class, after some preparation tasks. This module can run up to four solver attempts
  by default: use `current_repodata.json` first, or retry with `repodata.json`, and
  with and without the `--freeze-installed` flag.
- `conda.core.solve.Solver`: This class provides a three-function API that interfaces
  with the `Transaction` system. Almost all of the logic falls under the
  `Solver.solve_final_state()` method. At this step, `classic` downloads the channels
  metadata, collects information about the target environment, and applies the
  command-line instructions provided by the user. The end result is a list of
  `MatchSpec` objects; in other words, a list of constraints that the underlying solver
  must use to best select the needed packages from the channels.
- `conda.resolve.Resolve`: This class receives the `MatchSpec` instructions from the
  higher-level `Solver` class and transforms them into SAT clauses, as implemented in
  the `conda.common.logic.Clauses` and `conda.common._logic.Clauses` classes.
  `Resolve.solve()` is the method that governs the algorithmic details of "solving the
  environment".
- `conda.common._logic._SatSolver`: Provides the parent class for all three SAT solver
  wrappers implemented as part of the `classic` logic (PycoSat, PyCryptoSat, PySat).
  The default one is `PycoSat`, but you can change it with the `sat_solver` option in
  your configuration.
- `conda.common._logic._PycoSatSolver`: This class wraps the `pycosat` bindings to
  `picosat`, the underlying C library that actually solves the SAT clauses problem.
For `conda-libmamba-solver`, we initially tried to provide an implementation at the
`_SatSolver` level, but `libsolv` (and hence `libmamba`) didn't expose a SAT-based API.
We ended up with an implementation a bit higher up in the abstraction tree:
- `conda.cli.install`: We always ignore `current_repodata.json` and implement the
  `--freeze-installed` attempts closer to the solver so we don't have to re-run the
  preparation steps.
- `conda_libmamba_solver.solver.LibMambaSolver`: A `conda.core.solve.Solver` subclass
  that completely replaces the `Solver.solve_final_state()` method. We used this
  opportunity to refactor some of the pre-solver logic (spread across different layers
  in `classic`) into a solver-agnostic module (`conda_libmamba_solver.state`) with
  nicer-to-work-with helper objects. Our subclass instantiates the `libmamba` objects.
- `libmambapy.Solver`: The `libmambapy` Python package is generated by `pybind11`
  bindings to the underlying `libmamba` C++ library. Some of the objects we rely on are
  `Solver` (interfaces with `libsolv`) and the `Database` object (handles the channel
  metadata and target environment state).
- `libsolv`: `libmamba` relies on this C project directly to handle the solving steps.
  The conda-specific logic is implemented in the `conda.c` file.
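The difference in abstraction depth can be summarized with a purely schematic sketch.
These pseudo-calls (`to_match_specs`, `build_python_clauses`, `pycosat_solve`,
`libsolv_solve`) are made up for illustration and do not reflect real signatures:

```python
# Schematic-only comparison of where each solver crosses into compiled code.
# These are illustrative pseudo-calls, not real conda / libmamba functions.

def classic_flow(user_specs):
    specs = to_match_specs(user_specs)      # Python: conda.core.solve.Solver
    clauses = build_python_clauses(specs)   # Python: conda.resolve + conda.common._logic
    return pycosat_solve(clauses)           # C reached only at the last step (picosat)


def libmamba_flow(user_specs):
    specs = to_match_specs(user_specs)      # Python: thin conda-side preparation
    return libsolv_solve(specs)             # C++/C: clauses built and solved by libsolv
```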
The implementation details reveal some of the reasons for the performance differences:

- `classic` uses many Python layers before it finally reaches the compiled code
  (`picosat`):
  - Tens of `MatchSpec` objects reflect the input state: installed packages, system
    constraints and user-requested packages.
  - The channel index (repodata files) results in tens of thousands of `PackageRecord`
    objects.
  - The SAT clauses end up being expressed as tens or hundreds of thousands of
    `logic.Clauses` and `_logic.Clauses` objects.
  - The optimization algorithm in `Resolve.solve()` invokes `picosat` several times,
    switching between Python and C contexts very often, recreating `Clauses` as
    necessary.
- `conda-libmamba-solver`, in contrast, switches to C++ pretty early in the abstraction
  tree:
  - Only the `MatchSpec` objects are created in the Python layer, only to be
    immediately forwarded to their C++ counterparts.
  - The SAT clauses are built and handled by `libsolv`, using a very memory-efficient
    approach based on 32-bit integers only. This allows the SAT problem to be treated
    in milliseconds.
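As a rough, hypothetical illustration of why the encoding matters (neither snippet is
real conda or libsolv code), compare representing the same clauses as Python objects
versus a single packed integer array:

```python
# Illustrative only: contrasts two encodings of the same CNF clause set.
from array import array


# "classic"-style: one Python object per clause, with per-object allocation
# overhead and attribute access on every use.
class Clause:
    def __init__(self, literals):
        self.literals = list(literals)  # e.g. [1, -2] means (x1 OR NOT x2)


clauses_as_objects = [Clause([1, -2]), Clause([2, 3]), Clause([-1, -3])]

# "libsolv"-style: the same clauses packed into one array of 32-bit integers,
# with 0 as a clause separator: a single contiguous allocation that is cheap
# to iterate over and trivial to share with C code.
clauses_as_ints = array("i", [1, -2, 0, 2, 3, 0, -1, -3, 0])

print(len(clauses_as_objects), "Python objects vs", len(clauses_as_ints), "packed ints")
```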
### Algorithmic details
#### Retry logic
`classic` tries hard to minimally modify your environment, so by default the
`--freeze-installed` flag will be applied. This means all your installed packages will
be constrained to their currently installed version. If the SAT solver cannot find a
solution, `classic` will analyze which packages are causing the conflict. If the
conflicting packages were not explicitly requested by the user (in the current or
previous operations in the target environment), their version constraint will be
relaxed and a new solving attempt will be made. If, despite the progressive constraint
relaxation, the SAT solver still cannot find a solution, the `Solver` class will raise
an exception to the `conda.cli.install` module. This will trigger a second round of
attempts, without `--freeze-installed`. In simplified Python:
```python
for repodata in ("current_repodata.json", "repodata.json"):
    solver = Solver(repodata_fn=repodata)
    for should_freeze in (True, False):
        success = solver.solve(freeze_installed=should_freeze)
        if success:
            break
    if success:
        break
else:
    raise SolverError()


class Solver:
    "Super simplified version. Actual implementation is spread across many layers"

    def solve(self, *args, **kwargs):
        index = download_channel(channels, repodata_fn)
        constraints = collect_metadata(target_environment, user_requested_packages)
        while True:
            sat_solver = SATSolver(index)
            clauses = sat_solver.build_clauses(constraints)  # expensive!
            success = sat_solver.solve(clauses)  # multi-step optimization
            if success:
                return True
            else:
                conflicts = sat_solver.find_conflicts()
                initial_constraints = constraints.copy()
                constraints.update(conflicts)
                if initial_constraints == constraints:
                    return False
```
A similar retry logic is implemented in `conda_libmamba_solver`, but `libsolv` gives us
the conflicting packages as part of the solving attempt result for free, which allows
us to iterate faster. We don't need a separate attempt to disable `--freeze-installed`
because our retry logic handles conflicts and frozen packages in the same way.
Additionally, this retry logic can be disabled or reduced with an environment variable
for extreme cases (very large environments). This makes the overall logic simpler and
faster, which compounds on top of the lightning-fast `libmamba` implementation.
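For comparison, here is an equally simplified, hypothetical sketch of that single-loop
retry logic. The helpers `attempt_solve` and `relax` are made up for illustration and
are not part of the real `conda-libmamba-solver` or `libmamba` API:

```python
# Hypothetical sketch, mirroring the simplified "classic" example above.
def solve_with_retries(constraints, frozen, max_attempts=10):
    for _ in range(max_attempts):
        # One attempt returns both the outcome and, on failure, the conflicting
        # packages -- no separate conflict-analysis pass is needed.
        success, conflicts = attempt_solve(constraints, frozen)
        if success:
            return True
        if not conflicts:
            return False
        # Conflicting packages are relaxed whether they were frozen (installed)
        # or explicitly constrained; a single loop covers both cases.
        frozen = frozen - conflicts
        constraints = relax(constraints, conflicts)
    return False
```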
#### SAT algorithms
Given a set of `MatchSpec` objects, `classic` will apply a multistep, multi-objective
optimization strategy that invokes the actual SAT solver several times:

- `conda.resolve.Resolve.solve()` will optimize several objective metrics. In no
  particular order, some of these rules are:
  - Maximize the versions and build numbers of required packages
  - Minimize the number of `track_features`
  - Prefer non-`noarch` over `noarch` if both are available
  - Minimize the number of necessary upgrades and/or removals
- `conda.common._logic.Clauses.minimize()` is used for each step above, and involves a
  series of SAT calls per minimization. All of these calls involve, at some point,
  passing Python objects over to the C context, which incurs some overhead.
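The general pattern of "optimize by repeated SAT calls" looks roughly like the sketch
below. This is deliberately simplified and is not conda's actual `Clauses.minimize()`
code; `sat_solve` and `encode_bound` are stand-ins for the real machinery:

```python
# Simplified sketch of minimizing one objective via repeated SAT calls.
# `sat_solve(clauses)` stands in for any SAT solver call that returns a model
# (truth assignment) or None if the clauses are unsatisfiable.
def minimize_objective(base_clauses, encode_bound, upper_bound):
    """Find the smallest bound k for which the clauses stay satisfiable.

    `encode_bound(k)` is assumed to return extra clauses that force the
    objective (e.g. "number of downgrades") to be at most k.
    """
    best_model = None
    low, high = 0, upper_bound
    while low <= high:
        k = (low + high) // 2
        model = sat_solve(base_clauses + encode_bound(k))  # one SAT call per step
        if model is not None:
            best_model, high = model, k - 1  # feasible: try a tighter bound
        else:
            low = k + 1  # infeasible: relax the bound
    return best_model
```

Every such minimization step means more round trips between the Python objects and the
C solver, which is part of the overhead described above.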
In contrast, `libmamba` delegates fully to `libsolv`, which has its own logic for
conda-specific problems. You can read more about it in the `mamba-org/mamba`
documentation, but the most important points are:

- `libsolv` is a backtracking SAT solver, inspired by minisat. This means that it
  explores "branches" of a solution until it finds one that satisfies the input
  constraints.
- If we understand `classic`'s approach as a global-like optimization strategy, one
  could say `libsolv`'s better resembles a local optimization approach.
- This means that in the presence of several compatible solutions, `libsolv` might
  choose one that is different from the one proposed by `classic`.
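As a toy illustration of what "backtracking" means here (a minimal DPLL-style search
over integer-encoded clauses, nothing like libsolv's actual, highly optimized
implementation):

```python
# Minimal backtracking SAT sketch; clauses use the usual integer encoding,
# e.g. [1, -2] means (x1 OR NOT x2).
def backtrack_solve(clauses, assignment=None):
    assignment = dict(assignment or {})

    def evaluate(clause):
        unknown = False
        for literal in clause:
            value = assignment.get(abs(literal))
            if value is None:
                unknown = True
            elif value == (literal > 0):
                return True  # clause already satisfied
        return None if unknown else False  # None: undecided, False: violated

    states = [evaluate(clause) for clause in clauses]
    if any(state is False for state in states):
        return None  # conflict: give up on this branch (backtrack)
    if all(state is True for state in states):
        return assignment  # every clause satisfied: a solution was found
    # Pick the next unassigned variable and explore both branches in turn.
    variable = next(abs(l) for c in clauses for l in c if abs(l) not in assignment)
    for choice in (True, False):
        result = backtrack_solve(clauses, {**assignment, variable: choice})
        if result is not None:
            return result
    return None  # both branches failed: propagate the backtrack upwards


# (x1 OR x2) AND (NOT x1 OR x2) AND (NOT x2 OR x3)
print(backtrack_solve([[1, 2], [-1, 2], [-2, 3]]))
```

Which of several valid assignments such a search returns first depends on the order in
which branches are explored, which is why `libsolv` may legitimately pick a different
member of the solution ensemble than `classic`.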
**Tip:** Large conda-forge migrations often rely on multiple coexisting build variants
to ease the transition (e.g. `openssl` v1 to v3). This introduces several alternative
branches `libsolv` can end up exploring and selecting, perhaps with surprising results.
Being more explicit about the requested packages usually helps obtain the expected
solution; e.g. if you want to install `scipy=1.0` (the latest version), express that
explicitly: `conda install scipy=1.0` instead of `conda install scipy`.
#### Index reduction
`classic` prunes the channel metadata (internally referred to as the "index") in every
`Resolve.solve()` call. This reduces the search space by excluding packages that won't
ever be needed by the current set of input constraints. On the other hand, this
performance optimization step itself takes longer and longer as the index grows. In
`libsolv`, pruning is part of the filtering, sorting and selection mechanism that
informs the solver (see `policy.c` and `selection.c`). It runs in C, using
memory-efficient data structures.
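Conceptually, index reduction keeps only the packages reachable from the requested
specs through dependency names. The sketch below is a simplified illustration of that
idea, not conda's actual pruning code:

```python
# Simplified sketch of index reduction; not the actual conda implementation.
# `index` maps package name -> list of records; each record lists dependency names.
from collections import deque


def reduce_index(index, requested_names):
    """Keep only the packages reachable from the requested names via dependencies."""
    keep, queue = set(), deque(requested_names)
    while queue:
        name = queue.popleft()
        if name in keep or name not in index:
            continue
        keep.add(name)
        for record in index[name]:
            queue.extend(record["depends"])
    return {name: records for name, records in index.items() if name in keep}


# Tiny toy index: requesting "scipy" keeps scipy, numpy and python, and drops
# the unrelated "rust" entry from the search space.
toy_index = {
    "scipy": [{"depends": ["numpy", "python"]}],
    "numpy": [{"depends": ["python"]}],
    "python": [{"depends": []}],
    "rust": [{"depends": []}],
}
print(sorted(reduce_index(toy_index, ["scipy"])))
```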
### IO differences
`conda-libmamba-solver` uses the same IO stack as `conda` classic. In the past, we
relied on `libmamba`'s IO for repodata fetching, but this is not the case anymore.
## Practical examples of solver differences
### Python 3.11 + very old Pydantic

Case study inspired by issue #115.

The following environment file will give different solutions with `classic` and
`conda-libmamba-solver`:
```yaml
name: gmso
channels:
  - conda-forge
dependencies:
  - numpy
  - sympy
  - unyt <=2.8
  - boltons
  - lxml
  - pydantic <1.9.0
  - networkx
  - ele >=0.2.0
  - forcefield-utilities
```
- `classic`: `python 3.10` + `pydantic 1.8.2`
- `conda-libmamba-solver`: `python 3.11` + `pydantic 0.18.2`
This is an example of an underspecified input. There's no `python` dependency (or
version) listed in the environment file, so the solver has to figure it out. The solver
doesn't necessarily know which dependency is more "important". `classic` will prioritize
getting a more recent `pydantic` at the expense of an older `python`, while
`conda-libmamba-solver` will prefer having `python 3.11`, even if it means going all the
way down to `pydantic 0.18.2` (which was packaged as `noarch` and is thus compatible
with any Python version).
### `cudatoolkit` present in a `cpuonly` environment

Originally reported in issue #131.
This is an example of a known limitation in how `libsolv` processes the
`track_features` metadata. `libsolv` will only "see" the first level of
`track_features`, which down-prioritize packages. If you depend on 2nd-order
dependencies to track prioritized variants (which conda `classic` successfully
processes), you will get mixed results. This can be solved at the packaging level,
where all the variants rely on the package mutex directly, instead of relying on
packages that depend on the mutex.
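As a toy, hypothetical model of that difference (made-up package names and scoring; not
libsolv's or conda's real prioritization code), compare counting only a package's own
`track_features` with propagating the count through its dependencies:

```python
# Toy model of track_features penalties; not real libsolv or conda behavior.
packages = {
    "gpu-mutex": {"track_features": ["gpu"], "depends": []},
    "somepkg-gpu": {"track_features": [], "depends": ["gpu-mutex"]},
    "somepkg-cpu": {"track_features": [], "depends": []},
}


def first_level_penalty(name):
    """Count only the package's own track_features."""
    return len(packages[name]["track_features"])


def propagated_penalty(name):
    """Also count track_features reachable through the dependency chain."""
    pkg = packages[name]
    return len(pkg["track_features"]) + sum(
        propagated_penalty(dep) for dep in pkg["depends"]
    )


for variant in ("somepkg-gpu", "somepkg-cpu"):
    print(variant, first_level_penalty(variant), propagated_penalty(variant))
# With first-level counting both variants score 0, so the GPU variant is not
# down-prioritized; with propagation it scores 1 and the CPU variant wins.
```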
## More information
If you want to read (even more) about this, please check the following resources: