Theory of Operation
To track whether a package is indexed in the cache or not, conda-index uses a
table named stat, with a compound primary key (stage, path). Think of
packages moving from “upstream” to “downstream” by being duplicated in the
stat table for each stage.
The main stages are 'fs', which is called the upstream stage, and 'indexed'.
'fs' means that the artifact is on the filesystem. 'indexed' means that the
entry already exists in the database (same filename, same timestamp, same hash),
and its package metadata has been extracted to the index_json etc. tables.
Paths in 'fs' but not in 'indexed' need to be unpacked to have their
metadata added to the database. Paths in 'indexed' but not in 'fs' will be
ignored and left out of repodata.json.
First, conda-index adds all files in a subdir to the 'fs' upstream stage. Each
package has an entry ('fs', path, mtime, size, ...). This involves a
listdir() and stat() for each file in the index.
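The scan described above can be sketched roughly as follows. This is a minimal illustration, not conda-index's actual code: the function name `save_fs_state_sketch`, the reduced `stat` schema (only `stage`, `path`, `mtime`, `size`), and the file-extension filter are all assumptions made for the example.

```python
import os
import sqlite3
import tempfile


def save_fs_state_sketch(conn, subdir_path):
    """Record every package file in subdir_path under the 'fs' stage.

    Hypothetical helper; the real schema has more columns than shown here.
    """
    rows = []
    with os.scandir(subdir_path) as entries:  # one directory listing
        for entry in entries:
            if entry.name.endswith((".conda", ".tar.bz2")):
                st = entry.stat()  # one stat() per package file
                rows.append(("fs", entry.name, int(st.st_mtime), st.st_size))
    with conn:  # commit the new snapshot as a single transaction
        conn.execute("DELETE FROM stat WHERE stage = 'fs'")
        conn.executemany(
            "INSERT INTO stat (stage, path, mtime, size) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stat (stage TEXT NOT NULL, path TEXT NOT NULL, "
    "mtime INTEGER, size INTEGER, PRIMARY KEY (stage, path))"
)

# Build a throwaway "subdir" containing two packages and one unrelated file.
with tempfile.TemporaryDirectory() as subdir:
    for name in ("a-1.0-0.conda", "b-2.0-0.tar.bz2", "notes.txt"):
        with open(os.path.join(subdir, name), "wb") as f:
            f.write(b"x" * 10)
    count = save_fs_state_sketch(conn, subdir)

fs_paths = [
    row[0]
    for row in conn.execute(
        "SELECT path FROM stat WHERE stage = 'fs' ORDER BY path"
    )
]
```

Replacing the whole 'fs' snapshot on each run keeps the upstream stage equal to what is currently on disk, so packages that were deleted from the directory drop out of the stage automatically.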
Next, conda-index looks for all changed_packages(): paths in the upstream
(fs) stage that are either missing from the indexed stage or have a different
size or mtime there.
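One way to express that comparison is a self-join on the stat table. The query below is a sketch under the same simplified schema assumption (`stage`, `path`, `mtime`, `size`); the query conda-index actually runs may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stat (stage TEXT NOT NULL, path TEXT NOT NULL, "
    "mtime INTEGER, size INTEGER, PRIMARY KEY (stage, path))"
)
conn.executemany(
    "INSERT INTO stat (stage, path, mtime, size) VALUES (?, ?, ?, ?)",
    [
        ("fs", "same-1.0-0.conda", 100, 10),
        ("indexed", "same-1.0-0.conda", 100, 10),  # unchanged
        ("fs", "new-1.0-0.conda", 200, 20),  # never indexed
        ("fs", "touched-1.0-0.conda", 300, 30),
        ("indexed", "touched-1.0-0.conda", 250, 30),  # mtime differs
        ("indexed", "gone-1.0-0.conda", 50, 5),  # no longer on disk
    ],
)

# Upstream rows with no matching 'indexed' row, or with a different
# mtime or size, still need their metadata extracted.
CHANGED_SQL = """
SELECT up.path
FROM stat AS up
LEFT JOIN stat AS done
  ON done.path = up.path AND done.stage = 'indexed'
WHERE up.stage = 'fs'
  AND (done.path IS NULL
       OR done.mtime != up.mtime
       OR done.size != up.size)
ORDER BY up.path
"""

changed = [row[0] for row in conn.execute(CHANGED_SQL)]
```

Here `changed` contains only the new and the touched package. The package that exists only in 'indexed' is not selected, matching the rule that such paths are ignored.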
The changed_packages() are examined one by one, and their metadata is stored
as json in various tables in conda-index’s database.
Finally, a join between the upstream stage, usually 'fs', and the
index_json table yields repodata_from_packages.json without any repodata
patches.
SELECT path, index_json
FROM stat JOIN index_json
USING (path)
WHERE stat.stage = :upstream_stage
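Run from Python, the query above might be used like this. The table layouts and sample records are assumptions for illustration; only the SELECT statement itself comes from the text.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stat (stage TEXT NOT NULL, path TEXT NOT NULL,
                       PRIMARY KEY (stage, path));
    CREATE TABLE index_json (path TEXT PRIMARY KEY, index_json TEXT);
    """
)
conn.executemany(
    "INSERT INTO stat (stage, path) VALUES (?, ?)",
    [
        ("fs", "a-1.0-0.conda"),
        ("indexed", "a-1.0-0.conda"),
        ("indexed", "old-0.1-0.conda"),  # indexed earlier, no longer in 'fs'
    ],
)
conn.executemany(
    "INSERT INTO index_json (path, index_json) VALUES (?, ?)",
    [
        ("a-1.0-0.conda", json.dumps({"name": "a", "version": "1.0"})),
        ("old-0.1-0.conda", json.dumps({"name": "old", "version": "0.1"})),
    ],
)

rows = conn.execute(
    "SELECT path, index_json FROM stat JOIN index_json USING (path) "
    "WHERE stat.stage = :upstream_stage",
    {"upstream_stage": "fs"},
)

# Map each filename to its decoded metadata record.
packages = {path: json.loads(record) for path, record in rows}
```

The cached metadata for `old-0.1-0.conda` stays in the database, but because that path has no row in the upstream stage it does not appear in `packages`.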
The steps to create repodata.json, including any repodata patches, and to
create current_repodata.json with only the latest versions of each package,
are similar to pre-sqlite3 conda-index. The raw repodata_from_packages.json is
loaded, each record is sent through a patch function (if provided) that can
modify or exclude that record, and the result is serialized as repodata.json.
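The patching step can be pictured with the sketch below. The helper `apply_patch_function`, the `(filename, record)` callback signature, and the convention of returning `None` to exclude a record are hypothetical choices for this example, not conda-index's API.

```python
import json


def apply_patch_function(repodata, patch_function=None):
    """Return a copy of repodata with each package record patched.

    patch_function(filename, record) returns a (possibly modified) record,
    or None to exclude the record. Hypothetical convention for this sketch.
    """
    patched = {}
    for filename, record in repodata["packages"].items():
        if patch_function is not None:
            record = patch_function(filename, dict(record))  # patch a copy
        if record is not None:
            patched[filename] = record
    return {**repodata, "packages": patched}


def example_patch(filename, record):
    if record["name"] == "broken":
        return None  # exclude this record entirely
    if record["name"] == "a":
        record["depends"] = ["python >=3.8"]  # modify this record
    return record


repodata_from_packages = {
    "packages": {
        "a-1.0-0.tar.bz2": {"name": "a", "version": "1.0", "depends": []},
        "broken-0.1-0.tar.bz2": {"name": "broken", "version": "0.1"},
    }
}

repodata = apply_patch_function(repodata_from_packages, example_patch)
serialized = json.dumps(repodata, indent=2, sort_keys=True)
```

Patching a copy of each record leaves the raw `repodata_from_packages` data untouched, so both files can be written from the same run.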
The other cached metadata tables are used to create channeldata.json, an
optional file that aggregates packages from every subdir into a channel listing.
Advanced Techniques
Other techniques are possible but generally require using the conda-index API
and are not available from the command line interface.
“Metadata only” stage
Sometimes it is useful to create an index without unpacking real packages from
the local filesystem; for example, when translating .whl package metadata to
conda repodata. As of version 0.12.0, conda-index adds a md or metadata
stage to support this mode. The md stage doesn’t participate in
changed_packages() or conda-index’s package extraction pipeline. Instead,
the user inserts stat table entries and metadata into conda-index’s database
either directly or by using conda-index APIs. Then, the output query is
changed to
SELECT path, index_json
FROM stat JOIN index_json
USING (path)
WHERE stat.stage IN ('fs', 'md')
When it’s time to output repodata, packages that are in the fs or md stage,
and also have a row in index_json, are included.
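A sketch of the "insert directly" route follows. The helper `add_metadata_only`, the two-column table layouts, and the sample package names are assumptions for illustration; conda-index's own APIs for this may look different.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stat (stage TEXT NOT NULL, path TEXT NOT NULL,
                       PRIMARY KEY (stage, path));
    CREATE TABLE index_json (path TEXT PRIMARY KEY, index_json TEXT);
    """
)


def add_metadata_only(conn, path, record):
    """Register a package in the 'md' stage without any file on disk."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO stat (stage, path) VALUES ('md', ?)",
            (path,),
        )
        conn.execute(
            "INSERT OR REPLACE INTO index_json (path, index_json) VALUES (?, ?)",
            (path, json.dumps(record)),
        )


# Metadata translated from some other source, e.g. a wheel (made-up record).
add_metadata_only(
    conn, "wheelpkg-1.0-py3_0.conda", {"name": "wheelpkg", "version": "1.0"}
)

# An 'md' row with no index_json row is left out by the join.
with conn:
    conn.execute(
        "INSERT INTO stat (stage, path) VALUES ('md', 'nometa-1.0-0.conda')"
    )

rows = conn.execute(
    "SELECT path, index_json FROM stat JOIN index_json USING (path) "
    "WHERE stat.stage IN ('fs', 'md')"
)
included = {path: json.loads(record) for path, record in rows}
```

If the same path were present in both the 'fs' and 'md' stages, the join would return it twice; collecting the rows into a dictionary keyed by path collapses such duplicates.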
Other Techniques
It is possible to index without calling stat() on each package, or without
even having all packages stored on the indexing machine. This can be done by
subclassing CondaIndexCache() and replacing the save_fs_state() and
changed_packages() methods.
Advanced users can use the CLI or the API to run conda_index on a partial
local package repository. It is possible to add a few local packages to a much
larger index instead of keeping every package on the machine running
conda-index.
For example, by running python -m conda_index --db postgresql --update-only [DIR], conda-index will add or update packages in [DIR] to repodata, while
keeping already-indexed packages in the output repodata.json. The output
repodata can then be copied to a server that has every package.
If --update-only is used, the stat table must be altered to remove packages
from repodata.json, e.g. DELETE FROM stat WHERE path = '<prefix>/<subdir>/package.conda' AND stage = 'fs'.
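The removal could be scripted along these lines. The sketch uses an in-memory SQLite database as a stand-in for the real one, and the helper `remove_from_upstream` and the sample paths are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stat (stage TEXT NOT NULL, path TEXT NOT NULL, "
    "PRIMARY KEY (stage, path))"
)
conn.executemany(
    "INSERT INTO stat (stage, path) VALUES ('fs', ?)",
    [("noarch/keep-1.0-0.conda",), ("noarch/drop-1.0-0.conda",)],
)


def remove_from_upstream(conn, path, stage="fs"):
    """Delete one package from the given stage; return the rows removed."""
    with conn:
        cursor = conn.execute(
            "DELETE FROM stat WHERE path = ? AND stage = ?", (path, stage)
        )
    return cursor.rowcount


removed = remove_from_upstream(conn, "noarch/drop-1.0-0.conda")
remaining = [
    row[0] for row in conn.execute("SELECT path FROM stat WHERE stage = 'fs'")
]
```

Checking the returned row count is a cheap way to catch a mistyped path, since a DELETE that matches nothing succeeds silently.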
When using this option, care must be taken to never run conda-index without
--update-only or all the “missing” packages will be dropped from the index.