use conda.api instead of parallel calls to the conda binary #4775

keewis · 2021-01-07T01:45:39Z

Our min_deps_check.py script currently uses the conda binary to analyze the package metadata. This is not very efficient because conda downloads and parses repodata.json on each call. To speed up the script, we run 8 of these calls (plus the processing of their results) in parallel using a ThreadPoolExecutor, but that increases memory consumption (and my old laptop does not have that much memory).

conda provides the conda.api.SubdirData.query_all function, which caches the parsed repodata.json files between calls. Using that, my laptop can complete

python ci/min_deps_check.py ci/requirements/py36-bare-minimum.yml
python ci/min_deps_check.py ci/requirements/py36-min-all-deps.yml

in about 30 seconds, even though the packages are analyzed sequentially.
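As a rough illustration of the sequential approach described above, the sketch below queries conda.api.SubdirData.query_all once per package and reduces the returned records to the earliest release date per major.minor version. The helper names (first_release_dates, query_conda), the channel list, and the assumption that each record's timestamp is in seconds since the epoch are mine, not taken from the PR; the actual script may differ.

```python
from datetime import datetime, timezone
from types import SimpleNamespace  # only used to build fake records for testing


def first_release_dates(records):
    """Map (major, minor) -> earliest release date among ``records``.

    Each record needs ``version`` and ``timestamp`` attributes.
    Records without a release date (timestamp of 0) or without a
    numeric major.minor version are skipped.
    """
    out = {}
    for record in records:
        if not record.timestamp:
            continue
        try:
            major, minor = map(int, record.version.split(".")[:2])
        except ValueError:
            continue
        # Assumption: the timestamp is in seconds since the epoch.
        released = datetime.fromtimestamp(record.timestamp, tz=timezone.utc)
        key = (major, minor)
        if key not in out or released < out[key]:
            out[key] = released
    return out


def query_conda(pkg, channels=("conda-forge", "defaults")):
    """Hypothetical wrapper: look up ``pkg`` through conda's Python API."""
    import conda.api  # imported lazily; requires conda installed as a library

    records = conda.api.SubdirData.query_all(pkg, channels=list(channels))
    return first_release_dates(records)
```

Calling query_conda("numpy") and then query_conda("pandas") in the same process would reuse the repodata that conda has already parsed, which is where the speed-up over separate conda binary invocations is expected to come from.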

The documentation states that this function may be changed without warning between minor versions, so we would have to pin conda to an x.y version to be able to use it.

Edit: the documentation does have that warning, but it also says:

There are 3 supported public modules. We support:

import conda.cli.python_api

import conda.api

import conda.exports

The first 2 should have very long-term stability. The third is guaranteed to be stable throughout the lifetime of a feature release series--i.e. minor version number.

so I guess we don't have to pin?

cc @crusaderky

By default, the upstream dev CI is disabled on pull request and push events. You can override this behavior per commit by adding a [test-upstream] tag to the first line of the commit message.

keewis · 2021-01-08T01:58:38Z

the new code should generate the same report as the old code now, so this should be ready for review.

The tool says we can bump a few libraries (for example we could bump numpy to 1.17 in a few days), but I guess we should resolve #4179 first.

mathause · 2021-01-08T12:29:08Z

LGTM

shoyer

This is great, thanks!

crusaderky

This looks great

* upstream/master: (342 commits)
  fix decode for scale/ offset list (pydata#4802)
  Expand user dir paths (~) in open_mfdataset and to_zarr. (pydata#4795)
  add a version info step to the upstream-dev CI (pydata#4815)
  fix the ci trigger action (pydata#4805)
  scatter plot by order of the first appearance of hue (pydata#4723)
  don't skip the scheduled CI (pydata#4806)
  coords: retain str dtype (pydata#4759)
  Fix interval labels with units (pydata#4794)
  Always force dask arrays to float in missing.interp_func (pydata#4771)
  Print number of variables in repr (pydata#4762)
  install conda as a library in the minimum dependency check CI (pydata#4792)
  Migrate CI from azure pipelines to GitHub Actions (pydata#4730)
  use conda.api instead of parallel calls to the conda binary (pydata#4775)
  Speed up missing._get_interpolator (pydata#4776)
  Remove special case in guess_engines (pydata#4777)
  improve typing of OrderedSet (pydata#4774)
  CI: ignore some warnings (pydata#4773)
  DOC: update hyperlink for xskillscore (pydata#4778)
  drop support for python 3.6 (pydata#4720)
  Trigger upstream CI on cron schedule (by default) (pydata#4729)
  ...

use conda.api instead of parallel calls to the conda binary

8eae768

keewis marked this pull request as ready for review January 7, 2021 14:05

keewis changed the title ~~WIP: use conda.api instead of parallel calls to the conda binary~~ use conda.api instead of parallel calls to the conda binary Jan 7, 2021

keewis added 4 commits January 8, 2021 00:06

Merge branch 'master' into refactor-min_deps_check

df25aa3

don't select releases without release dates

904483a

update the format to be wide enough to fit matplotlib-base

897d9fd

don't verify the version using the filename

95f83fc

keewis requested a review from crusaderky January 8, 2021 14:19

filter invalid / missing dates before retrieving the metadata

3a7d729

shoyer approved these changes Jan 10, 2021

crusaderky approved these changes Jan 11, 2021

crusaderky merged commit db6f4be into pydata:master Jan 11, 2021

keewis deleted the refactor-min_deps_check branch January 11, 2021 11:46

keewis mentioned this pull request Jan 11, 2021

install conda as a library in the minimum dependency check CI #4792

Merged
