Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download requirements in parallel #3981

Closed
wants to merge 6 commits into from

Conversation

thedrow
Copy link

@thedrow thedrow commented Sep 20, 2016

This is an attempt to resolve #825.
It uses multiple threads to prepare the requirements.
I have vendored the concurrent.futures backport to 2.7 for that purpose.
There are multiple issues here that needs to be resolved before this PR can be merged:

  • We need to provide an option to set the number of parallel jobs. Do we want to keep the old behaviour as well or simply have one extra thread that will perform the download if parallelism is not desired?
  • We need to find a way to clean up correctly when SIGINT is issued. Since the progress bar is initialized in a different thread we cannot set the signal handler as InterruptibleMixin did before. That functionality is currently commented out and needs to be somehow restored before we merge this.
  • The UI obviously doesn't work well when downloading multiple dependencies in parallel.
  • The futures library does not use relative imports. I edited the vendored code inline but I'm assuming that it will fail in CI.

Is it faster? I don't know. I tried installing Django CMS with it and this PR takes ~10 seconds less to install. That's just a quick test. We probably need to test this better somehow.

The results

pip 8.1.2:

time pip install django-cms --no-cache-dir
Collecting django-cms
  Downloading django_cms-3.4.0-py2.py3-none-any.whl (2.9MB)
    100% |████████████████████████████████| 2.9MB 885kB/s
Collecting django-treebeard>=4.0.1 (from django-cms)
  Downloading django-treebeard-4.0.1.tar.gz (93kB)
    100% |████████████████████████████████| 102kB 633kB/s
Collecting django-sekizai>=0.7 (from django-cms)
  Downloading django_sekizai-0.10.0-py2.py3-none-any.whl
Collecting django-classy-tags>=0.7.2 (from django-cms)
  Downloading django_classy_tags-0.8.0-py2.py3-none-any.whl
Collecting djangocms-admin-style>=1.0 (from django-cms)
  Downloading djangocms-admin-style-1.2.4.tar.gz (1.6MB)
    100% |████████████████████████████████| 1.6MB 561kB/s
Collecting django-formtools>=1.0 (from django-cms)
  Downloading django_formtools-1.0-py2.py3-none-any.whl (132kB)
    100% |████████████████████████████████| 133kB 1.1MB/s
Collecting Django<1.10,>=1.8 (from django-cms)
  Downloading Django-1.9.9-py2.py3-none-any.whl (6.6MB)
    100% |████████████████████████████████| 6.6MB 1.1MB/s
Installing collected packages: Django, django-treebeard, django-classy-tags, django-sekizai, djangocms-admin-style, django-formtools, django-cms
  Running setup.py install for django-treebeard ... done
  Running setup.py install for djangocms-admin-style ... done
Successfully installed Django-1.9.9 django-classy-tags-0.8.0 django-cms-3.4.0 django-formtools-1.0 django-sekizai-0.10.0 django-treebeard-4.0.1 djangocms-admin-style-1.2.4

real    0m42.124s
user    0m6.329s
sys 0m6.614s

This PR:

time pip install django-cms --no-cache-dir
Collecting django-cms
  Downloading django_cms-3.4.0-py2.py3-none-any.whl (2.9MB)
    100% |████████████████████████████████| 2.9MB 10.2MB/s
Collecting django-formtools>=1.0 (from django-cms)
Collecting django-classy-tags>=0.7.2 (from django-cms)
Collecting django-treebeard>=4.0.1 (from django-cms)
Collecting djangocms-admin-style>=1.0 (from django-cms)
Collecting Django<1.10,>=1.8 (from django-cms)
Collecting django-sekizai>=0.7 (from django-cms)
  Downloading Django-1.9.9-py2.py3-none-any.whl (6.6MB)
    4% |█▍                              | 286kB 668kB/s eta 0:00:10  Downloading django_formtools-1.0-py2.py3-none-any.whl (132kB)
  Downloading django-treebeard-4.0.1.tar.gz (93kB)
    7% |██▌                             | 10kB 2.4MB/s eta 0:00:01  Downloading djangocms-admin-style-1.2.4.tar.gz (1.6MB)
    10% |███▌                            | 10kB 1.6MB/s eta 0:00:01  Downloading django_sekizai-0.10.0-py2.py3-none-any.whl
    5% |█▉                              | 389kB 594kB/s eta 0:00:11  Downloading django_classy_tags-0.8.0-py2.py3-none-any.whl
    100% |████████████████████████████████| 133kB 265kB/s
    100% |████████████████████████████████| 102kB 189kB/s
    100% |████████████████████████████████| 1.6MB 689kB/s
    100% |████████████████████████████████| 6.6MB 1.3MB/s
Installing collected packages: Django, django-treebeard, django-formtools, django-classy-tags, djangocms-admin-style, django-sekizai, django-cms
  Running setup.py install for django-treebeard ... done
  Running setup.py install for djangocms-admin-style ... done
Successfully installed Django-1.9.9 django-classy-tags-0.8.0 django-cms-3.4.0 django-formtools-1.0 django-sekizai-0.10.0 django-treebeard-4.0.1 djangocms-admin-style-1.2.4

real    0m31.346s
user    0m6.092s
sys 0m6.219s

@dstufft
Copy link
Member

dstufft commented Sep 20, 2016

I tried to give this a quick look, but for some reason it appears you've made a bunch of random whitespace changes throughout the files which makes it harder to review.

@thedrow
Copy link
Author

thedrow commented Sep 20, 2016

@dstufft I tried making the pep8 check happier. I'll revert that for now.

@thedrow
Copy link
Author

thedrow commented Oct 1, 2016

@dstufft Any chance to get this reviewed?

@jacobsvante
Copy link

Wow, good job! We had a huge speed up for one of our projects. Went from 85 to 52 seconds on average without any flags. So almost 40% faster for us there. When adding the flag --no-cache-dir the install was almost 90 seconds faster, from 3:27 to 2:00 minutes! 57% faster!

In total 120 dependencies, with many of them being Flask and SQLAlchemy related.

@thedrow
Copy link
Author

thedrow commented Oct 1, 2016

The problem is that the UI needs to be rethought in order to work with installing in parallel.
@jmagnusson any idea why installation is slower with caching enabled?
Is it because reading from the disk is a blocking operation?
How can we fix that?

@jacobsvante
Copy link

@thedrow It's not slower with caching enabled, quite the opposite. With --no-cache-dir it took 120 seconds. Without this setting – i.e. using pip's cache – the time went down to 52 seconds.

But yeah, the UI needs to be rethought for this feature. Perhaps we can just use a single progress bar for the download progress of all dependencies, with text on the right of it explaining what's happening (and this will shift back and forth with info like Collecting X==1.0 (from Y==0.1) / Using cached X-1.0.x.y.z.whl as they want to be printed). If a warning occurs during download the progress bar is pushed down a row in the terminal. I'm thinking something like npm install:

npm WARN deprecated lodash.isarray@4.0.0: This package is deprecated. Use Array.isArray.
npm WARN deprecated lodash@1.0.2: lodash@<3.0.0 is no longer maintained. Upgrade to lodash@^4.0.0.
⸨░░░░░░░░░░░       ⸩ ⠙ fetchMetadata: verb cache add spec readable-stream@^2.0.4
pip WARN requested SQLAlchemy-Utils==0.31.8 from https://github.com/jmagnusson/sqlalchemy-utils/archive/81da7bb4e.zip#egg=SQLAlchemy-Utils==0.32.10, but installing version 0.32.9
⸨░░░░░░░░░░░       ⸩ ⠙ collect: phonenumbers==7.5.2 (from pkg==1.0.0)

@dstufft
Copy link
Member

dstufft commented Oct 5, 2016

I have been thinking about doing something like this for awhile, and not just for downloads, however I hadn't yet gotten around to it so thanks for contributing!

One thing that might be useful here, I had looked at possibly replacing our current progress bar library with tqdm which supports nested progress bars so we could do something like:

Downloading 10% |███▌
    Django-1.0.tar.gz 30% |█████
    six-1.9.tar.gz 100% |█████
....

You can see more examples of this on https://pypi.org/project/tqdm/. It's not a slam dunk though, one big issue would be we'd need a solution for tqdm/tqdm#228.

@thedrow
Copy link
Author

thedrow commented Oct 7, 2016

@dstufft Do you mind if the UI will just be dots for the MVP? That way the feature would land earlier.
How do I implement this?

@jacobsvante
Copy link

jacobsvante commented Oct 10, 2016

@dstufft What other parts are you imagining parallelism for? Would the installation part be helped by it?

@thedrow Looks pretty easy to implement nested progress bars with tqdm, going by the example here: https://pypi.org/project/tqdm/#nested-progress-bars. I'm just wondering what it will look like when there's more dependencies than can fit in one screen though...?

@thedrow
Copy link
Author

thedrow commented Oct 10, 2016

@jmagnusson I'd love for some help. I'm not really available to finish all of this including UI work.

@jacobsvante
Copy link

jacobsvante commented Oct 12, 2016

I know the feeling and right now it's just like that over here too. I'll keep this PR in mind for when my schedule clears up. Great work so far though!

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Apr 1, 2017
@thedrow
Copy link
Author

thedrow commented Apr 2, 2017

@dstufft I'm gonna close this for now as I don't have time to finish it. I hope that the basic approach will aid others.

@thedrow thedrow closed this Apr 2, 2017
@ghost ghost mentioned this pull request Aug 30, 2017
@casperdcl casperdcl mentioned this pull request Apr 8, 2018
8 tasks
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 3, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 3, 2019
@pradyunsg pradyunsg removed the needs rebase or merge PR has conflicts with current master label Apr 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallel downloads
5 participants