Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commit-graph: Write incremental files #184

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented May 3, 2019

This version is now ready for review.

The commit-graph is a valuable performance feature for repos with large commit histories, but suffers from the same problem as git repack: it rewrites the entire file every time. This can be slow when there are millions of commits, especially after we stopped reading from the commit-graph file during a write in 43d3561 (commit-graph write: don't die if the existing graph is corrupt).

Instead, create a "chain" of commit-graphs in the .git/objects/info/commit-graphs folder with name graph-{hash}.graph. The list of hashes is given by the commit-graph-chain file, and also in a "base graph chunk" in the commit-graph format. As we read a chain, we can verify that the hashes match the trailing hash of each commit-graph we read along the way and each hash below a level is expected by that graph file.

When writing, we don't always want to add a new level to the stack. This would eventually result in performance degradation, especially when searching for a commit (before we know its graph position). We decide to merge levels of the stack when the new commits we will write is less than half of the commits in the level above. This can be tweaked by the --size-multiple and --max-commits options.

The performance is necessarily amortized across multiple writes, so I tested by writing commit-graphs from the (non-rc) tags in the Linux repo. My test included 72 tags, and wrote everything reachable from the tag using --stdin-commits. Here are the overall perf numbers:

write --stdin-commits:         8m 12s
write --stdin-commits --split:    28s
write --split && verify --shallow: 60s

Updates in V3:

  • git commit-graph verify now works on commit-graph chains. We do a simple test to check the behavior of a new --shallow option.

  • When someone writes a flat commit-graph, we now expire the old chain according to the expire time.

  • The "max commits" limit is no longer enabled by default, but instead is enabled by a --max-commits=<n> option. Ignored if n=0.

Updates in V4:

Johannes pointed out some test failures on the Windows platform. We found that the tests were not running on Windows in the gitgitgadget PR builds, which is now resolved.

  • We need to close commit-graphs recursively down the chain. This prevented an unlink() from working because of an open handle.

  • Creating the alternates file used a path-specification that didn't work on Windows.

  • Renaming a file to the same name failed, but is probably related to the unlink() error mentioned above.

Updates in V5:

  • Responding to multiple items of feedback. Thanks Philip, Junio, and Ramsay!

  • Used the test coverage report to find holes in the test coverage. While adding tests, I found a bug in octopus merges. The fix is in the rewrite of "deduplicate_commits()" as "sort_and_scan_merged_commits()" and covered by the new tests.

Updates in V6:

  • Rebased onto ds/close-object-store and resolved conflicts around close_commit_graph().

  • Updated path normalization to be resilient to double-slashes and trailing slashes.

  • Added a prepare_alt_odb() call in load_commit_graph_one() for cross-alternate graph loads during 'verify' subcommands.

Thanks,
-Stolee

[1] git@43d3561
commit-graph write: don't die if the existing graph is corrupt

Cc: peff@peff.net, avarab@gmail.com, git@jeffhostetler.com, jrnieder@google.com, steadmon@google.com, johannes.schindelin@gmx.de, philipoakley@iee.org

@derrickstolee derrickstolee force-pushed the graph/incremental branch 4 times, most recently from 6ce0285 to 3c52385 Compare May 8, 2019 12:46
@derrickstolee derrickstolee changed the title Commit-graph: Write incremental files [RFC] Commit-graph: Write incremental files May 8, 2019
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented May 8, 2019

Submitted as pull.184.git.gitgitgadget@gmail.com

@gitgitgadget
Copy link

gitgitgadget bot commented May 8, 2019

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, May 08 2019, Derrick Stolee via GitGitGadget wrote:

> This patch series is marked as RFC quality because it is missing some key
> features and tests, but hopefully starts a concrete discussion of how the
> incremental commit-graph writes can work. If this is a good direction, then
> it would replace ds/commit-graph-format-v2.

I have some comments on 12/17 that I'll put there.

I think it would be best to start by submitting 1-11 for inclusion so we
can get minor cleanups/refactoring out of the way. I've only skimmed
those patches, but they seem to be obviously correct, although the diff
move detection (and with -w) doesn't help much with them.

This next bit sounds petty, but I honestly don't mean it that way :)

One minor thing I want to note is 04/17. The change itself I 100% agree
on (in-tree docs are bad places for TODO lists), but the commit message
*still* says that a "verify" is just as slow as "write", even though I
noted a ~40% difference in [1].

Do I care about that tiny isolated thing? Nope. But I *do* think it's
indicative of a general thing that could be improved in these RFC
iterations that I found a bit frustrating in reading through
it. I.e. you're getting some of the "C[comments]", but then there's
re-rolled patches that don't address those things.

What we say in the commit message for 4/17 obviously doesn't matter much
at all. But there's other outstanding feedback on the last iteration
that from reading this one still mostly/entirely applies.

So I'll just leave this reply at "I have a lot of comments", but that
they're still sitting there.

1. https://public-inbox.org/git/87o94mql0a.fsf@evledraar.gmail.com/

builtin/fetch.c Outdated Show resolved Hide resolved
@derrickstolee derrickstolee changed the base branch from ab/commit-graph-fixes to ds/commit-graph-write-refactor May 22, 2019 18:17
@derrickstolee derrickstolee force-pushed the graph/incremental branch 2 times, most recently from a891a13 to 72fc0a1 Compare May 22, 2019 18:55
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented May 22, 2019

Submitted as pull.184.v2.git.gitgitgadget@gmail.com

@derrickstolee derrickstolee changed the title [RFC] Commit-graph: Write incremental files Commit-graph: Write incremental files Jun 3, 2019
@derrickstolee derrickstolee marked this pull request as ready for review June 3, 2019 15:19
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 3, 2019

Submitted as pull.184.v3.git.gitgitgadget@gmail.com

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 5, 2019

This patch series was integrated into pu via git@176fcc2.

@gitgitgadget gitgitgadget bot added the pu label Jun 5, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 18, 2019

Submitted as pull.184.v6.git.gitgitgadget@gmail.com

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 20, 2019

This patch series was integrated into pu via git@3e97402.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 20, 2019

This patch series was integrated into pu via git@28e1fac.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 21, 2019

This patch series was integrated into pu via git@fcb5eb2.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 24, 2019

This patch series was integrated into pu via git@55a58a6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 26, 2019

This patch series was integrated into pu via git@34048e9.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 27, 2019

This patch series was integrated into pu via git@bbcd30c.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@67a44b2.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@a389eb0.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@adf5366.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 1, 2019

This patch series was integrated into pu via git@76acd85.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 2, 2019

This patch series was integrated into pu via git@eff1a8c.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into pu via git@4ba9254.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into next via git@5dee5ed.

@gitgitgadget gitgitgadget bot added the next label Jul 3, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into pu via git@63a594c.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 9, 2019

This patch series was integrated into pu via git@0371b27.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 10, 2019

This patch series was integrated into pu via git@21fb063.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 12, 2019

This patch series was integrated into pu via git@d10b1d6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 19, 2019

This patch series was integrated into pu via git@92b1ea6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 19, 2019

This patch series was integrated into next via git@92b1ea6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 19, 2019

This patch series was integrated into master via git@92b1ea6.

@gitgitgadget gitgitgadget bot added the master label Jul 19, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 19, 2019

Closed via 92b1ea6.

@gitgitgadget gitgitgadget bot closed this Jul 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant