Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of git status check in cargo package. #9478

Merged
merged 1 commit into from
May 11, 2021

Conversation

ehuss
Copy link
Contributor

@ehuss ehuss commented May 11, 2021

The check for a dirty repository during packaging/publishing is quite slow. It was calling status_file for every packaged file, which is very expensive. I have a directory that had about 10,000 untracked files. Previously, cargo would hang for over 2 minutes without any output. With this PR, it finishes in 0.3 seconds.

The solution here is to collect the status information once, and then compare the package list against it.

One subtle point is that it does not use recurse_untracked_dirs, and instead relies on a primitive starts_with comparison, which I believe should be equivalent.

This still includes an inefficient n^2 algorithm, but I am too lazy to make a better approach.

I'm moderately confident this is pretty much the same as before (at least, all the scenarios I could think of).

@rust-highfive
Copy link

r? @Eh2406

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 11, 2021
@alexcrichton
Copy link
Member

@bors: r+

Seems like a nice win to me!

@bors
Copy link
Collaborator

bors commented May 11, 2021

📌 Commit a200640 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 11, 2021
@bors
Copy link
Collaborator

bors commented May 11, 2021

⌛ Testing commit a200640 with merge f99f965...

@bors
Copy link
Collaborator

bors commented May 11, 2021

☀️ Test successful - checks-actions
Approved by: alexcrichton
Pushing f99f965 to master...

@bors bors merged commit f99f965 into rust-lang:master May 11, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request May 12, 2021
Update cargo

8 commits in e51522ab3db23b0d8f1de54eb1f0113924896331..070e459c2d8b79c5b2ac5218064e7603329c92ae
2021-05-07 21:29:52 +0000 to 2021-05-11 18:12:23 +0000
- Fix rustdoc warnings (rust-lang/cargo#9468)
- Improve performance of git status check in `cargo package`. (rust-lang/cargo#9478)
- Link to the new rustc tests chapter. (rust-lang/cargo#9477)
- Bump index cache version to deal with semver metadata version mismatch. (rust-lang/cargo#9476)
- Fix Url::into_string deprecation warning (rust-lang/cargo#9475)
- Fix rust-lang/cargo#4482 and rust-lang/cargo#9449: set Fossil ignore and clean settings locally (rust-lang/cargo#9469)
- Improve two error messages (rust-lang/cargo#9472)
- Fix `cargo install` with a semver metadata version. (rust-lang/cargo#9467)
@ehuss ehuss added this to the 1.54.0 milestone Feb 6, 2022
@Byron
Copy link
Member

Byron commented Jun 13, 2022

Thanks for the performance improvement, I wasn't even aware :).

I'm moderately confident this is pretty much the same as before (at least, all the scenarios I could think of).

@ehuss, I think I have found a probably not too uncommon case where this isn't necessarily the case. The culprit, I believe, is this line…:

        status_opts
            .exclude_submodules(true)
                                    .include_ignored(true).include_untracked(true);

…which can cause the publish to fail because it's alarmed by untracked files which are ignored and wouldn't be published anyway.

As a concrete example, here is the publish attempt of a gitoxide crate:

[TRACE] Will run "cargo" "publish" "--manifest-path" "/Users/byron/dev/github.com/Byron/gitoxide/git-worktree/Cargo.toml" "--package" "git-worktree"
    Updating crates.io index
error: 2 files in the working directory contain changes that were not yet committed into git:

tests/fixtures/generated-do-not-edit/make_ignorecase_collisions/444887939-unix/link-to-X
tests/fixtures/generated-do-not-edit/make_ignorecase_collisions/444887939-unix/x

to proceed despite this and include the uncommitted changes, pass the `--allow-dirty` flag
Error: Could not successfully execute 'cargo publish'.

(note that tests/* is in git-worktree)

The top-level .gitignore file ignores the entire directory of generated files with **/generated-do-not-edit/ and the include directory in git-worktree/Cargo.toml does not include the tests either with include = ["src/**/*", "CHANGELOG.md"].

Now I wonder if a solution would be to pass all supposedly dirty files through the include/exclude filter coming from Cargo.toml?

@ehuss
Copy link
Contributor Author

ehuss commented Jun 13, 2022

I can't reproduce the problem. If you'd like, I recommend opening an issue with a reproduction (minimal if possible). Ignored files that aren't published should be excluded from the intersection here. Perhaps it is possible there is some kind of path normalization issue or something.

@Byron
Copy link
Member

Byron commented Jun 13, 2022

Thanks for the swift reply!

If I read the exclusion check correctly, then src_files does indeed include the files in the test directory, like they are pre-include/exclude, which is why they match with what's considered dirty/untracked.

There is something peculiar about these paths though:

lrwxr-xr-x 1 byron staff 13 Jun 11:42 tests/fixtures/generated-do-not-edit/make_ignorecase_collisions/444887939-unix/link-to-X -> X
➜  git-worktree git:(main) l tests/fixtures/generated-do-not-edit/make_ignorecase_collisions/444887939-unix/x
lrwxr-xr-x 1 byron staff 13 Jun 11:42 tests/fixtures/generated-do-not-edit/make_ignorecase_collisions/444887939-unix/x -> X

So it refers to itself and does so in a case-insensitive manner. Maybe this is why these files have not been excluded in the first place. In any case, it's probably enough for me to cook up a test that reproduces the issue. Thanks for your help thus far!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants