Support git cloning in parallel #70
Labels: content discovery, enhancement, performance
The `scan` command can currently clone Git repositories automatically when invoked with the `--git-url`, `--github-user`, or `--github-org` arguments. This runs sequentially, and when you cast a wide net (e.g., end up indirectly specifying 1000 repositories), cloning the input repositories takes the majority of the total time.

Cloning a single Git repo at a time doesn't appear to be network-, CPU-, or memory-bound on any system I've used. The bottleneck seems to be the remote server that we are cloning from.
It would be better if Nosey Parker could clone Git repositories in parallel, perhaps with a default limit of 4 at a time.
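
To make the request concrete, here is a rough sketch of bounded-parallel cloning. It is illustrative Python, not Nosey Parker's code: the names `git_clone`, `clone_all`, and `max_parallel`, and the `repo-<n>` destination layout, are all hypothetical. It assumes the `git` CLI is on `PATH` and simply shells out to it from a fixed-size thread pool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def git_clone(url: str, dest: Path) -> None:
    """Clone one repository by shelling out to the `git` CLI (assumed on PATH)."""
    subprocess.run(["git", "clone", "--quiet", url, str(dest)], check=True)


def clone_all(
    urls: Iterable[str],
    dest_root: Path,
    max_parallel: int = 4,  # hypothetical default, matching the limit suggested above
    clone: Callable[[str, Path], None] = git_clone,
) -> Dict[str, Optional[BaseException]]:
    """Clone every URL with at most `max_parallel` clones in flight.

    Returns a mapping of URL -> None on success, or the exception that was raised.
    One failed clone does not stop the others.
    """
    results: Dict[str, Optional[BaseException]] = {}
    # The pool size is the concurrency limit: extra jobs queue until a worker frees up.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {
            pool.submit(clone, url, dest_root / f"repo-{i}"): url
            for i, url in enumerate(urls)
        }
        for fut in as_completed(futures):
            # Future.exception() is None when the clone finished without raising.
            results[futures[fut]] = fut.exception()
    return results
```

Since the work is mostly waiting on the remote server, threads that block on a subprocess should be enough here. Keeping the limit configurable would let users lower it if a server starts rate-limiting.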