Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse Index: Integrate with merge, cherry-pick, rebase, and revert #1019

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions builtin/merge.c
Original file line number Diff line number Diff line change
Expand Up @@ -1276,6 +1276,9 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
if (argc == 2 && !strcmp(argv[1], "-h"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Allow 'git merge' to operate without expanding a sparse index, at least
> not immediately. The index still will be expanded in a few cases:
>
> 1. If the merge strategy is 'recursive', then we enable
>    command_requires_full_index at the start of the merge_recursive()
>    method. We expect sparse-index users to also have the 'ort' strategy
>    enabled.

What about `resolve`, `octopus`, `subtree` (which technically could be
implemented via either recursive or ort, such fun...) or a
user-defined strategy?

`resolve` and `octopus` would absolutely need a full index.  `subtree`
would if implemented via merge-recursive, and not if implemented via
merge-ort.

I'm not sure what to assume about user-defined strategies; I guess for
safety reasons and backward compatibility, we should always expand?
Or maybe there are no backward compatibility concerns since no one who
uses a sparse-index will attempt to use any pre-existing external
merge strategies (are there even any known ones in the wild or is this
still a theoretical capability?), and we can assume they will only use
ones written in the future?  Hmm...

> 2. If the merge results in a conflicted file, then we expand the index
>    before updating the working tree. The loop that iterates over the
>    worktree replaces index entries and tracks 'origintal_cache_nr' which
>    can become completely wrong if the index expands in the middle of the
>    operation. This safety valve is important before that loop starts. A
>    later change will focus this to only expand if we indeed have a
>    conflict outside of the sparse-checkout cone.

From reading the patch below, this is specific to ort, but that wasn't
clear on reading the commit message.

> Some test updates are required, including a mistaken 'git checkout -b'
> that did not specify the base branch, causing merges to be fast-forward
> merges.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/merge.c                          | 3 +++
>  merge-ort.c                              | 8 ++++++++
>  merge-recursive.c                        | 3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 8 ++++++--
>  4 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/merge.c b/builtin/merge.c
> index 22f23990b37..926de328fbb 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -1276,6 +1276,9 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
>         if (argc == 2 && !strcmp(argv[1], "-h"))
>                 usage_with_options(builtin_merge_usage, builtin_merge_options);
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         /*
>          * Check if we are _not_ on a detached HEAD, i.e. if there is a
>          * current branch.
> diff --git a/merge-ort.c b/merge-ort.c
> index 6eb910d6f0c..8e754b769e1 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -4058,6 +4058,14 @@ static int record_conflicted_index_entries(struct merge_options *opt)
>         if (strmap_empty(&opt->priv->conflicted))
>                 return 0;
>
> +       /*
> +        * We are in a conflicted state. These conflicts might be inside
> +        * sparse-directory entries, so expand the index preemtively.

s/preemtively/preemptively/

> +        * Also, we set original_cache_nr below, but that might change if
> +        * index_name_pos() calls ask for paths within sparse directories.
> +        */
> +       ensure_full_index(index);
> +

This seems somewhat pessimistic; what if all the conflicts are within
the sparse-checkout?  Having conflicts contains within the
sparse-checkout seems likely, since we'd only get conflicts for files
modified by both sides of history, and sparse-checkouts are used when
users aren't going to modify files outside the sparse-checkout.

>         /* If any entries have skip_worktree set, we'll have to check 'em out */
>         state.force = 1;
>         state.quiet = 1;
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 3355d50e8ad..1f563cd6874 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -3750,6 +3750,9 @@ int merge_recursive(struct merge_options *opt,
>         assert(opt->ancestor == NULL ||
>                !strcmp(opt->ancestor, "constructed merge base"));
>
> +       prepare_repo_settings(opt->repo);
> +       opt->repo->settings.command_requires_full_index = 1;
> +
>         if (merge_start(opt, repo_get_commit_tree(opt->repo, h1)))
>                 return -1;
>         clean = merge_recursive_internal(opt, h1, h2, merge_bases, result);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 3e01e70fa0b..781ebd9a656 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -52,7 +52,7 @@ test_expect_success 'setup' '
>                 git checkout -b base &&
>                 for dir in folder1 folder2 deep
>                 do
> -                       git checkout -b update-$dir &&
> +                       git checkout -b update-$dir base &&
>                         echo "updated $dir" >$dir/a &&
>                         git commit -a -m "update $dir" || return 1
>                 done &&
> @@ -652,7 +652,11 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/extra.txt &&
>         ensure_not_expanded add extra.txt &&
>         echo >>sparse-index/untracked.txt &&
> -       ensure_not_expanded add .
> +       ensure_not_expanded add . &&
> +
> +       ensure_not_expanded checkout -f update-deep &&
> +       ensure_not_expanded merge -s ort -m merge update-folder1 &&
> +       ensure_not_expanded merge -s ort -m merge update-folder2

Can we just set GIT_TEST_MERGE_ALGORITHM=ort at the beginning of the
test file and then avoid repeating `-s ort`?


>  '
>
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> --
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Aug 24, 2021 at 2:52 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Allow 'git merge' to operate without expanding a sparse index, at least
> not immediately. The index still will be expanded in a few cases:
>
> 1. If the merge strategy is 'recursive', then we enable
>    command_requires_full_index at the start of the merge_recursive()
>    method. We expect sparse-index users to also have the 'ort' strategy
>    enabled.
>
> 2. With the 'ort' strategy, if the merge results in a conflicted file,
>    then we expand the index before updating the working tree. The loop
>    that iterates over the worktree replaces index entries and tracks
>    'origintal_cache_nr' which can become completely wrong if the index
>    expands in the middle of the operation. This safety valve is
>    important before that loop starts. A later change will focus this
>    to only expand if we indeed have a conflict outside of the
>    sparse-checkout cone.
>
> 3. Other merge strategies are executed as a 'git merge-X' subcommand,
>    and those strategies are currently protected with the
>    'command_requires_full_index' guard.

Oh, indeed, it appears ag/merge-strategies-in-c didn't complete but
was discarded, as per the July 6 "What's cooking in git.git" email.
Well, that certainly makes things easier for you; thanks for
mentioning them in this item #3.

>
> Some test updates are required, including a mistaken 'git checkout -b'
> that did not specify the base branch, causing merges to be fast-forward
> merges.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/merge.c                          |  3 +++
>  merge-ort.c                              |  8 ++++++++
>  merge-recursive.c                        |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++++--
>  4 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/merge.c b/builtin/merge.c
> index 22f23990b37..926de328fbb 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -1276,6 +1276,9 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
>         if (argc == 2 && !strcmp(argv[1], "-h"))
>                 usage_with_options(builtin_merge_usage, builtin_merge_options);
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         /*
>          * Check if we are _not_ on a detached HEAD, i.e. if there is a
>          * current branch.
> diff --git a/merge-ort.c b/merge-ort.c
> index 6eb910d6f0c..8e754b769e1 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -4058,6 +4058,14 @@ static int record_conflicted_index_entries(struct merge_options *opt)
>         if (strmap_empty(&opt->priv->conflicted))
>                 return 0;
>
> +       /*
> +        * We are in a conflicted state. These conflicts might be inside
> +        * sparse-directory entries, so expand the index preemtively.

Same typo I pointed out in v1.

> +        * Also, we set original_cache_nr below, but that might change if
> +        * index_name_pos() calls ask for paths within sparse directories.
> +        */
> +       ensure_full_index(index);
> +
>         /* If any entries have skip_worktree set, we'll have to check 'em out */
>         state.force = 1;
>         state.quiet = 1;
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 3355d50e8ad..1f563cd6874 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -3750,6 +3750,9 @@ int merge_recursive(struct merge_options *opt,
>         assert(opt->ancestor == NULL ||
>                !strcmp(opt->ancestor, "constructed merge base"));
>
> +       prepare_repo_settings(opt->repo);
> +       opt->repo->settings.command_requires_full_index = 1;
> +
>         if (merge_start(opt, repo_get_commit_tree(opt->repo, h1)))
>                 return -1;
>         clean = merge_recursive_internal(opt, h1, h2, merge_bases, result);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index ddc86bb4152..dc56252865c 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -47,7 +47,7 @@ test_expect_success 'setup' '
>                 git checkout -b base &&
>                 for dir in folder1 folder2 deep
>                 do
> -                       git checkout -b update-$dir &&
> +                       git checkout -b update-$dir base &&
>                         echo "updated $dir" >$dir/a &&
>                         git commit -a -m "update $dir" || return 1
>                 done &&
> @@ -647,7 +647,15 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/extra.txt &&
>         ensure_not_expanded add extra.txt &&
>         echo >>sparse-index/untracked.txt &&
> -       ensure_not_expanded add .
> +       ensure_not_expanded add . &&
> +
> +       ensure_not_expanded checkout -f update-deep &&
> +       (
> +               sane_unset GIT_TEST_MERGE_ALGORITHM &&
> +               git -C sparse-index config pull.twohead ort &&
> +               ensure_not_expanded merge -m merge update-folder1 &&
> +               ensure_not_expanded merge -m merge update-folder2
> +       )
>  '

Should you use test_config rather than git config here?

More importantly, why the subshell and unsetting of
GIT_TEST_MERGE_ALGORITHM and the special worrying about pull.twohead?
Wouldn't it be simpler to just set GIT_TEST_MERGE_ALGORITHM=ort,
perhaps at the beginning of the file?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/27/2021 6:43 PM, Elijah Newren wrote:
> On Tue, Aug 24, 2021 at 2:52 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>> +       /*
>> +        * We are in a conflicted state. These conflicts might be inside
>> +        * sparse-directory entries, so expand the index preemtively.
> 
> Same typo I pointed out in v1.

Sorry, this comment is edited in a later patch and I fixed it there. Will
fix here, too.

>> +       ensure_not_expanded checkout -f update-deep &&
>> +       (
>> +               sane_unset GIT_TEST_MERGE_ALGORITHM &&
>> +               git -C sparse-index config pull.twohead ort &&
>> +               ensure_not_expanded merge -m merge update-folder1 &&
>> +               ensure_not_expanded merge -m merge update-folder2
>> +       )
>>  '
> 
> Should you use test_config rather than git config here?

That's a better pattern. It's not technically _required_ for these
tests because the repositories are completely rewritten at the start
of each new test, but it's best to be a good example.

> More importantly, why the subshell and unsetting of
> GIT_TEST_MERGE_ALGORITHM and the special worrying about pull.twohead?
> Wouldn't it be simpler to just set GIT_TEST_MERGE_ALGORITHM=ort,
> perhaps at the beginning of the file?
 
I don't set GIT_TEST_MERGE_ALGORITHM at the beginning of the file so
the rest of the tests are covered with both 'ort' and 'recursive' in
the CI test suite.

Using the config more carefully matches how I expect the 'ort'
strategy to be specified in practice (very temporarily, as it will
soon be the default).

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/30/2021 1:18 PM, Derrick Stolee wrote:
> On 8/27/2021 6:43 PM, Elijah Newren wrote:
>> On Tue, Aug 24, 2021 at 2:52 PM Derrick Stolee via GitGitGadget
>>> +       ensure_not_expanded checkout -f update-deep &&
>>> +       (
>>> +               sane_unset GIT_TEST_MERGE_ALGORITHM &&
>>> +               git -C sparse-index config pull.twohead ort &&
>>> +               ensure_not_expanded merge -m merge update-folder1 &&
>>> +               ensure_not_expanded merge -m merge update-folder2
>>> +       )
>>>  '
>>
>> Should you use test_config rather than git config here?
> 
> That's a better pattern. It's not technically _required_ for these
> tests because the repositories are completely rewritten at the start
> of each new test, but it's best to be a good example.

Actually, test_config runs test_when_finished, and that results in
the following message and failure on macOS and Windows:

	BUG 'test_when_finished does nothing in a subshell'

So, I'll leave this as-is.

Thanks,
-Stolee

usage_with_options(builtin_merge_usage, builtin_merge_options);

prepare_repo_settings(the_repository);
the_repository->settings.command_requires_full_index = 0;

/*
* Check if we are _not_ on a detached HEAD, i.e. if there is a
* current branch.
Expand Down
6 changes: 6 additions & 0 deletions builtin/rebase.c
Original file line number Diff line number Diff line change
Expand Up @@ -559,6 +559,9 @@ int cmd_rebase__interactive(int argc, const char **argv, const char *prefix)
argc = parse_options(argc, argv, prefix, options,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The hard work was already done with 'git merge' and the ORT strategy.
> Just add extra tests to see that we get the expected results in the
> non-conflict cases.

:-)

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/rebase.c                         |  6 ++++
>  builtin/revert.c                         |  3 ++
>  t/t1092-sparse-checkout-compatibility.sh | 41 ++++++++++++++++++++++--
>  3 files changed, 47 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 33e09619005..27433d7c5a2 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -559,6 +559,9 @@ int cmd_rebase__interactive(int argc, const char **argv, const char *prefix)
>         argc = parse_options(argc, argv, prefix, options,
>                         builtin_rebase_interactive_usage, PARSE_OPT_KEEP_ARGV0);
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         if (!is_null_oid(&squash_onto))
>                 opts.squash_onto = &squash_onto;
>
> @@ -1430,6 +1433,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>                 usage_with_options(builtin_rebase_usage,
>                                    builtin_rebase_options);
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +

While rebase can use either merge-recursive or merge-ort under the
covers, this change is okay because merge-recursive.c now has code to
expand the index unconditionally, and merge-ort does so only
conditionally in the cases needed.  Right?  (Same for change below to
revert.c)

>         options.allow_empty_message = 1;
>         git_config(rebase_config, &options);
>         /* options.gpg_sign_opt will be either "-S" or NULL */
> diff --git a/builtin/revert.c b/builtin/revert.c
> index 237f2f18d4c..6c4c22691bd 100644
> --- a/builtin/revert.c
> +++ b/builtin/revert.c
> @@ -136,6 +136,9 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
>                         PARSE_OPT_KEEP_ARGV0 |
>                         PARSE_OPT_KEEP_UNKNOWN);
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         /* implies allow_empty */
>         if (opts->keep_redundant_commits)
>                 opts->allow_empty = 1;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index a52d2edda54..c047b95b121 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -532,6 +532,38 @@ test_expect_success 'merge with conflict outside cone' '
>         test_all_match git rev-parse HEAD^{tree}
>  '
>
> +test_expect_success 'cherry-pick/rebase with conflict outside cone' '
> +       init_repos &&
> +
> +       for OPERATION in cherry-pick rebase
> +       do
> +               test_all_match git checkout -B tip &&
> +               test_all_match git reset --hard merge-left &&
> +               test_all_match git status --porcelain=v2 &&
> +               test_all_match test_must_fail git $OPERATION merge-right &&
> +               test_all_match git status --porcelain=v2 &&
> +
> +               # Resolve the conflict in different ways:
> +               # 1. Revert to the base
> +               test_all_match git checkout base -- deep/deeper2/a &&
> +               test_all_match git status --porcelain=v2 &&
> +
> +               # 2. Add the file with conflict markers
> +               test_all_match git add folder1/a &&
> +               test_all_match git status --porcelain=v2 &&
> +
> +               # 3. Rename the file to another sparse filename and
> +               #    accept conflict markers as resolved content.
> +               run_on_all mv folder2/a folder2/z &&
> +               test_all_match git add folder2 &&
> +               test_all_match git status --porcelain=v2 &&
> +
> +               test_all_match git $OPERATION --continue &&
> +               test_all_match git status --porcelain=v2 &&
> +               test_all_match git rev-parse HEAD^{tree} || return 1
> +       done
> +'
> +
>  test_expect_success 'merge with outside renames' '
>         init_repos &&
>
> @@ -670,9 +702,12 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/untracked.txt &&
>         ensure_not_expanded add . &&
>
> -       ensure_not_expanded checkout -f update-deep &&
> -       ensure_not_expanded merge -s ort -m merge update-folder1 &&
> -       ensure_not_expanded merge -s ort -m merge update-folder2
> +       for OPERATION in "merge -s ort -m merge" cherry-pick rebase
> +       do
> +               ensure_not_expanded checkout -f -B temp update-deep &&
> +               ensure_not_expanded $OPERATION update-folder1 &&
> +               ensure_not_expanded $OPERATION update-folder2 || return 1
> +       done

Won't this fail on linux-gcc in the automated testing, since it uses
GIT_TEST_MERGE_ALGORITHM=recursive and you didn't override the merge
strategy for cherry-pick and rebase?

To fix, you'd either need to put a GIT_TEST_MERGE_ALGORITHM=ort at the
beginning of the file, or add `--strategy ort` options to the
cherry-pick and rebase commands.


>  '
>
>  test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
> --
> gitgitgadget

builtin_rebase_interactive_usage, PARSE_OPT_KEEP_ARGV0);

prepare_repo_settings(the_repository);
the_repository->settings.command_requires_full_index = 0;

if (!is_null_oid(&squash_onto))
opts.squash_onto = &squash_onto;

Expand Down Expand Up @@ -1430,6 +1433,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
usage_with_options(builtin_rebase_usage,
builtin_rebase_options);

prepare_repo_settings(the_repository);
the_repository->settings.command_requires_full_index = 0;

options.allow_empty_message = 1;
git_config(rebase_config, &options);
/* options.gpg_sign_opt will be either "-S" or NULL */
Expand Down
3 changes: 3 additions & 0 deletions builtin/revert.c
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,9 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
PARSE_OPT_KEEP_ARGV0 |
PARSE_OPT_KEEP_UNKNOWN);

prepare_repo_settings(the_repository);
the_repository->settings.command_requires_full_index = 0;

/* implies allow_empty */
if (opts->keep_redundant_commits)
opts->allow_empty = 1;
Expand Down
8 changes: 8 additions & 0 deletions diff.c
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
#include "parse-options.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The diff_populate_filespec() method is used to describe the diff after a
> merge operation is complete, especially when a conflict appears. In
> order to avoid expanding a sparse index, the reuse_worktree_file() needs
> to be adapted to ignore files that are outside of the sparse-checkout
> cone. The file names and OIDs used for this check come from the merged
> tree in the case of the ORT strategy, not the index, hence the ability
> to look into these paths without having already expanded the index.

I'm confused; I thought the diffstat was only shown if the merge was
successful, in which case there would be no conflicts appearing.

Also, I'm not that familiar with the general diff machinery (just the
rename detection parts), but...if the diffstat only shows when the
merge is successful, wouldn't we be comparing two REVS (ORIG_HEAD to
HEAD)?  Why would we make use of the working tree at all in such a
case?  And, wouldn't using the working tree be dangerous...what if
there was a merge performed with a dirty working tree?

On a bit of a tangent, I know diffcore-rename.c calls into
diff_populate_filespec() as well, and I have some code doing merges in
a bare repository (where there obviously is no index).  It seemed to
be working, but given this commit message, now I'm wondering if I've
missed something fundamental either in that implementation or there's
something amiss in this patch.  Or both.  Maybe I need to dig into
diff_populate_filespec() more, but it seems you already have.  Any
pointers to orient me on why your changes are right here (and, if you
know, why diffcore-rename.c should or shouldn't be using
diff_populate_filespec() in a bare repo)?


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  diff.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/diff.c b/diff.c
> index a8113f17070..c457cfa0e59 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -26,6 +26,7 @@
>  #include "parse-options.h"
>  #include "help.h"
>  #include "promisor-remote.h"
> +#include "dir.h"
>
>  #ifdef NO_FAST_WORKING_DIRECTORY
>  #define FAST_WORKING_DIRECTORY 0
> @@ -3900,6 +3901,13 @@ static int reuse_worktree_file(struct index_state *istate,
>         if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid))
>                 return 0;
>
> +       /*
> +        * If this path does not match our sparse-checkout definition,
> +        * then the file will not be in the working directory.
> +        */
> +       if (!path_in_sparse_checkout(name, istate))
> +               return 0;
> +
>         /*
>          * Similarly, if we'd have to convert the file contents anyway, that
>          * makes the optimization not worthwhile.
> --
> gitgitgadget
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/20/2021 5:32 PM, Elijah Newren wrote:
> On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The diff_populate_filespec() method is used to describe the diff after a
>> merge operation is complete, especially when a conflict appears. In
>> order to avoid expanding a sparse index, the reuse_worktree_file() needs
>> to be adapted to ignore files that are outside of the sparse-checkout
>> cone. The file names and OIDs used for this check come from the merged
>> tree in the case of the ORT strategy, not the index, hence the ability
>> to look into these paths without having already expanded the index.
> 
> I'm confused; I thought the diffstat was only shown if the merge was
> successful, in which case there would be no conflicts appearing.

That's my mistake. I'll edit the message accordingly.
 
> Also, I'm not that familiar with the general diff machinery (just the
> rename detection parts), but...if the diffstat only shows when the
> merge is successful, wouldn't we be comparing two REVS (ORIG_HEAD to
> HEAD)?  Why would we make use of the working tree at all in such a
> case?  And, wouldn't using the working tree be dangerous...what if
> there was a merge performed with a dirty working tree?
> 
> On a bit of a tangent, I know diffcore-rename.c calls into
> diff_populate_filespec() as well, and I have some code doing merges in
> a bare repository (where there obviously is no index).  It seemed to
> be working, but given this commit message, now I'm wondering if I've
> missed something fundamental either in that implementation or there's
> something amiss in this patch.  Or both.  Maybe I need to dig into
> diff_populate_filespec() more, but it seems you already have.  Any
> pointers to orient me on why your changes are right here (and, if you
> know, why diffcore-rename.c should or shouldn't be using
> diff_populate_filespec() in a bare repo)?

I think the cases you are thinking about are covered by this
condition before the one I'm inserting:

	/* We want to avoid the working directory if our caller
	 * doesn't need the data in a normal file, this system
	 * is rather slow with its stat/open/mmap/close syscalls,
	 * and the object is contained in a pack file.  The pack
	 * is probably already open and will be faster to obtain
	 * the data through than the working directory.  Loose
	 * objects however would tend to be slower as they need
	 * to be individually opened and inflated.
	 */
	if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid))
		return 0;

or after:

	/*
	 * Similarly, if we'd have to convert the file contents anyway, that
	 * makes the optimization not worthwhile.
	 */
	if (!want_file && would_convert_to_git(istate, name))
		return 0;

(This makes me think that I should move my new condition further down
so these two can be linked by context.)

Sounds like this is just an optimization, so it is fine to opt out of it
if we think the optimization isn't necessary. Outside of the sparse-checkout
cone qualifies.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Aug 24, 2021 at 11:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/20/2021 5:32 PM, Elijah Newren wrote:
> > On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> The diff_populate_filespec() method is used to describe the diff after a
> >> merge operation is complete, especially when a conflict appears. In
> >> order to avoid expanding a sparse index, the reuse_worktree_file() needs
> >> to be adapted to ignore files that are outside of the sparse-checkout
> >> cone. The file names and OIDs used for this check come from the merged
> >> tree in the case of the ORT strategy, not the index, hence the ability
> >> to look into these paths without having already expanded the index.
> >
> > I'm confused; I thought the diffstat was only shown if the merge was
> > successful, in which case there would be no conflicts appearing.
>
> That's my mistake. I'll edit the message accordingly.
>
> > Also, I'm not that familiar with the general diff machinery (just the
> > rename detection parts), but...if the diffstat only shows when the
> > merge is successful, wouldn't we be comparing two REVS (ORIG_HEAD to
> > HEAD)?  Why would we make use of the working tree at all in such a
> > case?  And, wouldn't using the working tree be dangerous...what if
> > there was a merge performed with a dirty working tree?
> >
> > On a bit of a tangent, I know diffcore-rename.c calls into
> > diff_populate_filespec() as well, and I have some code doing merges in
> > a bare repository (where there obviously is no index).  It seemed to
> > be working, but given this commit message, now I'm wondering if I've
> > missed something fundamental either in that implementation or there's
> > something amiss in this patch.  Or both.  Maybe I need to dig into
> > diff_populate_filespec() more, but it seems you already have.  Any
> > pointers to orient me on why your changes are right here (and, if you
> > know, why diffcore-rename.c should or shouldn't be using
> > diff_populate_filespec() in a bare repo)?
>
> I think the cases you are thinking about are covered by this
> condition before the one I'm inserting:
>
>         /* We want to avoid the working directory if our caller
>          * doesn't need the data in a normal file, this system
>          * is rather slow with its stat/open/mmap/close syscalls,
>          * and the object is contained in a pack file.  The pack
>          * is probably already open and will be faster to obtain
>          * the data through than the working directory.  Loose
>          * objects however would tend to be slower as they need
>          * to be individually opened and inflated.
>          */
>         if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid))
>                 return 0;
>
> or after:
>
>         /*
>          * Similarly, if we'd have to convert the file contents anyway, that
>          * makes the optimization not worthwhile.
>          */
>         if (!want_file && would_convert_to_git(istate, name))
>                 return 0;
>
> (This makes me think that I should move my new condition further down
> so these two can be linked by context.)
>
> Sounds like this is just an optimization, so it is fine to opt out of it
> if we think the optimization isn't necessary. Outside of the sparse-checkout
> cone qualifies.

Ah, this is very helpful.  Thanks for digging up these details.

#include "help.h"
#include "promisor-remote.h"
#include "dir.h"

#ifdef NO_FAST_WORKING_DIRECTORY
#define FAST_WORKING_DIRECTORY 0
Expand Down Expand Up @@ -3907,6 +3908,13 @@ static int reuse_worktree_file(struct index_state *istate,
if (!want_file && would_convert_to_git(istate, name))
return 0;

/*
* If this path does not match our sparse-checkout definition,
* then the file will not be in the working directory.
*/
if (!path_in_sparse_checkout(name, istate))
return 0;

len = strlen(name);
pos = index_name_pos(istate, name, len);
if (pos < 0)
Expand Down
15 changes: 15 additions & 0 deletions merge-ort.c
Original file line number Diff line number Diff line change
Expand Up @@ -4058,6 +4058,21 @@ static int record_conflicted_index_entries(struct merge_options *opt)
if (strmap_empty(&opt->priv->conflicted))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Aug 24, 2021 at 2:52 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Merge conflicts happen often enough to want to avoid expanding a sparse
> index when they happen, as long as those conflicts are within the
> sparse-checkout cone. If a conflict exists outside of the
> sparse-checkout cone, then we still need to expand before iterating over
> the index entries. This is critical to do in advance because of how the
> original_cache_nr is tracked to allow inserting and replacing cache
> entries.
>
> Iterate over the conflicted files and check if any paths are outside of
> the sparse-checkout cone. If so, then expand the full index.
>
> Add a test that demonstrates that we do not expand the index, even when
> we hit a conflict within the sparse-checkout cone.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  merge-ort.c                              | 13 +++++++---
>  t/t1092-sparse-checkout-compatibility.sh | 30 ++++++++++++++++++++++--
>  2 files changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 8e754b769e1..805f7c41397 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -4060,11 +4060,18 @@ static int record_conflicted_index_entries(struct merge_options *opt)
>
>         /*
>          * We are in a conflicted state. These conflicts might be inside
> -        * sparse-directory entries, so expand the index preemtively.
> -        * Also, we set original_cache_nr below, but that might change if
> +        * sparse-directory entries, so check if any entries are outside
> +        * of the sparse-checkout cone preemptively.
> +        *
> +        * We set original_cache_nr below, but that might change if
>          * index_name_pos() calls ask for paths within sparse directories.
>          */
> -       ensure_full_index(index);
> +       strmap_for_each_entry(&opt->priv->conflicted, &iter, e) {
> +               if (!path_in_sparse_checkout(e->key, index)) {
> +                       ensure_full_index(index);
> +                       break;
> +               }
> +       }
>
>         /* If any entries have skip_worktree set, we'll have to check 'em out */
>         state.force = 1;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index dc56252865c..38afdf689a2 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -617,8 +617,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
>  ensure_not_expanded () {
>         rm -f trace2.txt &&
>         echo >>sparse-index/untracked.txt &&
> -       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -               git -C sparse-index "$@" &&
> +
> +       if test "$1" = "!"
> +       then
> +               shift &&
> +               test_must_fail env \
> +                       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +                       git -C sparse-index "$@" || return 1
> +       else
> +               GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +                       git -C sparse-index "$@" || return 1
> +       fi &&
>         test_region ! index ensure_full_index trace2.txt
>  }
>
> @@ -658,6 +667,23 @@ test_expect_success 'sparse-index is not expanded' '
>         )
>  '
>
> +test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
> +       init_repos &&
> +
> +       for side in right left
> +       do
> +               git -C sparse-index checkout -b expand-$side base &&
> +               echo $side >sparse-index/deep/a &&
> +               git -C sparse-index commit -a -m "$side" || return 1
> +       done &&
> +
> +       (
> +               sane_unset GIT_TEST_MERGE_ALGORITHM &&
> +               git -C sparse-index config pull.twohead ort &&
> +               ensure_not_expanded ! merge -m merged expand-right
> +       )

These last five lines could just be replaced with the fourth, if you
just set GIT_TEST_MERGE_ALGORITHM=ort at the beginning of the file.

Are you worrying about testing with recursive in some of the testcases?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/27/2021 6:47 PM, Elijah Newren wrote:
> On Tue, Aug 24, 2021 at 2:52 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>> +       (
>> +               sane_unset GIT_TEST_MERGE_ALGORITHM &&
>> +               git -C sparse-index config pull.twohead ort &&
>> +               ensure_not_expanded ! merge -m merged expand-right
>> +       )
> 
> These last five lines could just be replaced with the fourth, if you
> just set GIT_TEST_MERGE_ALGORITHM=ort at the beginning of the file.
> 
> Are you worrying about testing with recursive in some of the testcases?

Yes. In fact, since I _stopped_ setting GIT_TEST_MERGE_ALGORITHM at the
start of the file since the previous version, I found and fixed a bug
which forms the new Patch 5 (sequencer: ensure full index if not ORT
strategy).

Thanks,
-Stolee

return 0;

/*
* We are in a conflicted state. These conflicts might be inside
* sparse-directory entries, so check if any entries are outside
* of the sparse-checkout cone preemptively.
*
* We set original_cache_nr below, but that might change if
* index_name_pos() calls ask for paths within sparse directories.
*/
strmap_for_each_entry(&opt->priv->conflicted, &iter, e) {
if (!path_in_sparse_checkout(e->key, index)) {
ensure_full_index(index);
break;
}
}

/* If any entries have skip_worktree set, we'll have to check 'em out */
state.force = 1;
state.quiet = 1;
Expand Down
3 changes: 3 additions & 0 deletions merge-recursive.c
Original file line number Diff line number Diff line change
Expand Up @@ -3750,6 +3750,9 @@ int merge_recursive(struct merge_options *opt,
assert(opt->ancestor == NULL ||
!strcmp(opt->ancestor, "constructed merge base"));

prepare_repo_settings(opt->repo);
opt->repo->settings.command_requires_full_index = 1;

if (merge_start(opt, repo_get_commit_tree(opt->repo, h1)))
return -1;
clean = merge_recursive_internal(opt, h1, h2, merge_bases, result);
Expand Down
9 changes: 9 additions & 0 deletions sequencer.c
Original file line number Diff line number Diff line change
Expand Up @@ -652,6 +652,7 @@ static int do_recursive_merge(struct repository *r,
merge_switch_to_result(&o, head_tree, &result, 1, show_output);
clean = result.clean;
} else {
ensure_full_index(r->index);
clean = merge_trees(&o, head_tree, next_tree, base_tree);
if (is_rebase_i(opts) && clean <= 0)
fputs(o.obuf.buf, stdout);
Expand Down Expand Up @@ -2346,13 +2347,21 @@ static int read_and_refresh_cache(struct repository *r,
_(action_name(opts)));
}
refresh_index(r->index, REFRESH_QUIET|REFRESH_UNMERGED, NULL, NULL, NULL);

if (index_fd >= 0) {
if (write_locked_index(r->index, &index_lock,
COMMIT_LOCK | SKIP_IF_UNCHANGED)) {
return error(_("git %s: failed to refresh the index"),
_(action_name(opts)));
}
}

/*
* If we are resolving merges in any way other than "ort", then
* expand the sparse index.
*/
if (opts->strategy && strcmp(opts->strategy, "ort"))
ensure_full_index(r->index);
return 0;
}

Expand Down
92 changes: 82 additions & 10 deletions t/t1092-sparse-checkout-compatibility.sh
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ test_expect_success 'setup' '
git checkout -b base &&
for dir in folder1 folder2 deep
do
git checkout -b update-$dir &&
git checkout -b update-$dir base &&
echo "updated $dir" >$dir/a &&
git commit -a -m "update $dir" || return 1
done &&
Expand Down Expand Up @@ -481,14 +481,17 @@ test_expect_success 'checkout and reset (mixed) [sparse]' '
test_sparse_match git reset update-folder2
'

test_expect_success 'merge' '
test_expect_success 'merge, cherry-pick, and rebase' '
init_repos &&

test_all_match git checkout -b merge update-deep &&
test_all_match git merge -m "folder1" update-folder1 &&
test_all_match git rev-parse HEAD^{tree} &&
test_all_match git merge -m "folder2" update-folder2 &&
test_all_match git rev-parse HEAD^{tree}
for OPERATION in "merge -m merge" cherry-pick rebase
do
test_all_match git checkout -B temp update-deep &&
test_all_match git $OPERATION update-folder1 &&
test_all_match git rev-parse HEAD^{tree} &&
test_all_match git $OPERATION update-folder2 &&
test_all_match git rev-parse HEAD^{tree} || return 1
done
'

# NEEDSWORK: This test is documenting current behavior, but that
Expand Down Expand Up @@ -524,6 +527,38 @@ test_expect_success 'merge with conflict outside cone' '
test_all_match git rev-parse HEAD^{tree}
'

test_expect_success 'cherry-pick/rebase with conflict outside cone' '
init_repos &&

for OPERATION in cherry-pick rebase
do
test_all_match git checkout -B tip &&
test_all_match git reset --hard merge-left &&
test_all_match git status --porcelain=v2 &&
test_all_match test_must_fail git $OPERATION merge-right &&
test_all_match git status --porcelain=v2 &&

# Resolve the conflict in different ways:
# 1. Revert to the base
test_all_match git checkout base -- deep/deeper2/a &&
test_all_match git status --porcelain=v2 &&

# 2. Add the file with conflict markers
test_all_match git add folder1/a &&
test_all_match git status --porcelain=v2 &&

# 3. Rename the file to another sparse filename and
# accept conflict markers as resolved content.
run_on_all mv folder2/a folder2/z &&
test_all_match git add folder2 &&
test_all_match git status --porcelain=v2 &&

test_all_match git $OPERATION --continue &&
test_all_match git status --porcelain=v2 &&
test_all_match git rev-parse HEAD^{tree} || return 1
done
'

test_expect_success 'merge with outside renames' '
init_repos &&

Expand Down Expand Up @@ -617,8 +652,17 @@ test_expect_success 'sparse-index is expanded and converted back' '
ensure_not_expanded () {
rm -f trace2.txt &&
echo >>sparse-index/untracked.txt &&
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
git -C sparse-index "$@" &&

if test "$1" = "!"
then
shift &&
test_must_fail env \
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
git -C sparse-index "$@" || return 1
else
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
git -C sparse-index "$@" || return 1
fi &&
test_region ! index ensure_full_index trace2.txt
}

Expand Down Expand Up @@ -647,7 +691,35 @@ test_expect_success 'sparse-index is not expanded' '
echo >>sparse-index/extra.txt &&
ensure_not_expanded add extra.txt &&
echo >>sparse-index/untracked.txt &&
ensure_not_expanded add .
ensure_not_expanded add . &&

ensure_not_expanded checkout -f update-deep &&
test_config -C sparse-index pull.twohead ort &&
(
sane_unset GIT_TEST_MERGE_ALGORITHM &&
for OPERATION in "merge -m merge" cherry-pick rebase
do
ensure_not_expanded merge -m merge update-folder1 &&
ensure_not_expanded merge -m merge update-folder2 || return 1
done
)
'

test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
init_repos &&

for side in right left
do
git -C sparse-index checkout -b expand-$side base &&
echo $side >sparse-index/deep/a &&
git -C sparse-index commit -a -m "$side" || return 1
done &&

(
sane_unset GIT_TEST_MERGE_ALGORITHM &&
git -C sparse-index config pull.twohead ort &&
ensure_not_expanded ! merge -m merged expand-right
)
'

# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
Expand Down