Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scalar: two downstream improvements #1569

Closed

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Aug 9, 2023

While updating git-for-windows/git and microsoft/git for the 2.42.0 release window, a few patches that we've been running in those forks for a while came to light as something that would be beneficial to the core Git project. Here are some that are focused on the 'scalar' command.

  • Patch 1 adds a --no-src option to scalar clone to appease users who want to use scalar but object to the creation of the src directory.
  • Patches 2 and 3 help when scalar reconfigure -a fails. Patch 2 is a possibly helpful change on its own for other uses in the future.

Updates in V3

  • Several commit message edits.
  • An important case that was dropped in v2's patch 2 is reintroduced (even though it is modified in patch 3).
  • An error message is added for corrupt Git repositories.

Updates in V2

Thanks, Junio, for the helpful review!

  • In Patch 1, the '--[no-]src' documentation is tightened and the tests check the contents of the repository worktree.
  • In Patch 2, the commit message is reworded to be more clear about positive values of the enum.
  • In Patch 2, the GIT_DIR_NONE option of the enum is never returned, so it does not need to exist. A case in scalar.c referenced it, so it is removed as part of the patch (though that case was removed later by patch 3 anyway).
  • In Patch 2, the discover_git_directory() wrapper is updated to return -1 instead of 1, as it did before this patch.
  • In Patch 3, the 'failed' variable is renamed to 'succeeded' and the cases that update the value are swapped. The return code is set to -1 for any error instead of having a custom value based on the return from error() or error_errno().

Thanks,
-Stolee

cc: gitster@pobox.com
cc: johannes.schindelin@gmx.de
cc: Oswald Buddenhagen oswald.buddenhagen@gmx.de

@derrickstolee derrickstolee self-assigned this Aug 9, 2023
@derrickstolee derrickstolee changed the title scalar: add --[no-]src option scalar: two downstream improvements Aug 9, 2023
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 14, 2023

Submitted as pull.1569.git.1692025937.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1569/derrickstolee/scalar-no-src-v1

To fetch this version to local tag pr-1569/derrickstolee/scalar-no-src-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1569/derrickstolee/scalar-no-src-v1

@@ -8,7 +8,8 @@ scalar - A tool for managing large Git repositories
SYNOPSIS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> Some users have strong aversions to Scalar's opinion that the repository
> should be in a 'src' directory, even though it creates a clean slate for
> placing build outputs in adjacent directories.
>
> The --no-src option allows users to opt-out of the default behavior.
>
> While adding options, make sure the usage output by 'scalar clone -h'
> reports the same as the SYNOPSIS line in Documentation/scalar.txt.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/scalar.txt |  8 +++++++-
>  scalar.c                 | 11 +++++++++--
>  t/t9211-scalar-clone.sh  |  8 ++++++++
>  3 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/scalar.txt b/Documentation/scalar.txt
> index f33436c7f65..cd65b3e230d 100644
> --- a/Documentation/scalar.txt
> +++ b/Documentation/scalar.txt
> @@ -8,7 +8,8 @@ scalar - A tool for managing large Git repositories
>  SYNOPSIS
>  --------
>  [verse]
> -scalar clone [--single-branch] [--branch <main-branch>] [--full-clone] <url> [<enlistment>]
> +scalar clone [--single-branch] [--branch <main-branch>] [--full-clone]
> +	[--[no-]src] <url> [<enlistment>]
>  scalar list
>  scalar register [<enlistment>]
>  scalar unregister [<enlistment>]
> @@ -80,6 +81,11 @@ remote-tracking branch for the branch this option was used for the initial
>  cloning. If the HEAD at the remote did not point at any branch when
>  `--single-branch` clone was made, no remote-tracking branch is created.
>  
> +--[no-]src::
> +	Specify if the repository should be created within a `src` directory
> +	within `<enlistment>`. This is the default behavior, so use
> +	`--no-src` to opt-out of the creation of the `src` directory.

While there is nothing incorrect in the above per-se, and the first
half of the description is perfectly good, but I find the latter
half places too much stress on the existence of the "src" directory.
As a mere mortal end-user, what is more important is not the
presence of an extra directory, but the fact that everything I have
is now moved one level down in the directory hierarchy to "src/"
directory.

	This is the default behavior; use `--no-src` to place the
	root of the working tree of the repository directly at
	`<enlistment>`.

or something along that line would have been easier to understand
for me.  It is not the creation of `src`, but that everything is
moved into it, is what some users may find unusual.

> +test_expect_success '`scalar clone --no-src`' '
> +	scalar clone --src "file://$(pwd)/to-clone" with-src &&
> +	scalar clone --no-src "file://$(pwd)/to-clone" without-src &&
> +
> +	test_path_is_dir with-src/src &&
> +	test_path_is_missing without-src/src
> +'

And another thing that may be interesting, from the above point of
view, is to compare these two:

	(cd with-src/src && ls ?*) >with &&
	(cd without && ls ?*) >without &&
	test_cmp with without

Both output should look something like

    cron.txt
    first.t
    second.t
    third.t

and the earlier confusion point I raised was that

	(cd with-src && ls ?*)

would not look like

    cron.txt
    first.t
    second.t
    src/
    third.t

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/14/2023 12:02 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>> +--[no-]src::
>> +	Specify if the repository should be created within a `src` directory
>> +	within `<enlistment>`. This is the default behavior, so use
>> +	`--no-src` to opt-out of the creation of the `src` directory.
> 
> While there is nothing incorrect in the above per-se, and the first
> half of the description is perfectly good, but I find the latter
> half places too much stress on the existence of the "src" directory.
> As a mere mortal end-user, what is more important is not the
> presence of an extra directory, but the fact that everything I have
> is now moved one level down in the directory hierarchy to "src/"
> directory.
> 
> 	This is the default behavior; use `--no-src` to place the
> 	root of the working tree of the repository directly at
> 	`<enlistment>`.
> 
> or something along that line would have been easier to understand
> for me.  It is not the creation of `src`, but that everything is
> moved into it, is what some users may find unusual.

Your confusion makes sense. Focusing on the location of the cloned
repository is a good way to focus the option for the reader.
 
>> +test_expect_success '`scalar clone --no-src`' '
>> +	scalar clone --src "file://$(pwd)/to-clone" with-src &&
>> +	scalar clone --no-src "file://$(pwd)/to-clone" without-src &&
>> +
>> +	test_path_is_dir with-src/src &&
>> +	test_path_is_missing without-src/src
>> +'
> 
> And another thing that may be interesting, from the above point of
> view, is to compare these two:
> 
> 	(cd with-src/src && ls ?*) >with &&
> 	(cd without && ls ?*) >without &&
> 	test_cmp with without

Good idea.

Thanks,
-Stolee

@@ -1221,19 +1221,6 @@ static const char *allowed_bare_repo_to_string(
return NULL;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> There are many reasons why discovering a Git directory may fail. In
> particular, 8959555cee7 (setup_git_directory(): add an owner check for
> the top-level directory, 2022-03-02) added ownership checks as a
> security precaution.
>
> Callers attempting to set up a Git directory may want to inform the user
> about the reason for the failure. For that, expose the enum
> discovery_result from within setup.c and into cache.h where
> discover_git_directory() is defined.
>
> I initially wanted to change the return type of discover_git_directory()
> to be this enum, but several callers rely upon the "zero means success".
> The two problems with this are:
>
> 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive
>    results are errors.

True. discover_git_directory() already knows that negative return
values from setup_git_directory_gently_1() signal errors while 0 or
positive are OK.

> 2. There are multiple successful states, so some positive results are
>    successful.

Makes it sound as if some positive results are not successes, but is
that really the case?

> Instead of updating all callers immediately, add a new method,
> discover_git_directory_reason(), and convert discover_git_directory() to
> be a thin shim on top of it.

It makes sense to insulate callers who only want to know if the
discovery was successful or not (there only are two existing callers
anyway) from the details.  And turning a thin wrapper to the new API
that gives richer return codes is the way to go.  Nicely designed.

> Because there are extra checks that discover_git_directory_reason() does
> after setup_git_directory_gently_1(), there are other modes that can be
> returned for failure states. Add these modes to the enum, but be sure to
> explicitly add them as BUG() states in the switch of
> setup_git_directory_gently().

Good.

> -enum discovery_result {
> -	GIT_DIR_NONE = 0,
> -	GIT_DIR_EXPLICIT,
> -	GIT_DIR_DISCOVERED,
> -	GIT_DIR_BARE,
> -	/* these are errors */
> -	GIT_DIR_HIT_CEILING = -1,
> -	GIT_DIR_HIT_MOUNT_POINT = -2,
> -	GIT_DIR_INVALID_GITFILE = -3,
> -	GIT_DIR_INVALID_OWNERSHIP = -4,
> -	GIT_DIR_DISALLOWED_BARE = -5,
> -};

So we promote this discovery_result, that was private implementation
detail inside the setup code, to a public interface.  Is GIT_DIR_
prefix still appropriate, or would it make more sense to have a
common substring derived from the word DISCOVERY in them?

> @@ -1385,21 +1372,22 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>  	}
>  }
>  
> -int discover_git_directory(struct strbuf *commondir,
> -			   struct strbuf *gitdir)
> +enum discovery_result discover_git_directory_reason(struct strbuf *commondir,
> +						    struct strbuf *gitdir)
>  {
>  	struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
>  	size_t gitdir_offset = gitdir->len, cwd_len;
>  	size_t commondir_offset = commondir->len;
>  	struct repository_format candidate = REPOSITORY_FORMAT_INIT;
> +	enum discovery_result result;
>  
>  	if (strbuf_getcwd(&dir))
> -		return -1;
> +		return GIT_DIR_CWD_FAILURE;

Makes sense.

>  	cwd_len = dir.len;
> -	if (setup_git_directory_gently_1(&dir, gitdir, NULL, 0) <= 0) {
> +	if ((result = setup_git_directory_gently_1(&dir, gitdir, NULL, 0)) <= 0) {

Can we split this into two simple statements?

	result = setup_git_directory_gently_1(...);
	if (result <= 0) {

>  		strbuf_release(&dir);
> -		return -1;
> +		return result;
>  	}

OK.

> @@ -1429,11 +1417,11 @@ int discover_git_directory(struct strbuf *commondir,
>  		strbuf_setlen(commondir, commondir_offset);
>  		strbuf_setlen(gitdir, gitdir_offset);
>  		clear_repository_format(&candidate);
> -		return -1;
> +		return GIT_DIR_INVALID_FORMAT;

OK, so this thing is new.  Earlier we thought we found a good
GIT_DIR but it turns out the repository is something we cannot use.
Over time we may acquire such a "now we found it, is it really good?"
sanity checks, but for now, this is the only case that turns what
gently_1() thought was good into a bad one.  OK.

>  	}
>  
>  	clear_repository_format(&candidate);
> -	return 0;
> +	return result;

And it makes perfect sense that everybody who passed such a "post
discovery check" are OK and we return the result from gently_1().

> @@ -1516,9 +1504,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  		*nongit_ok = 1;
>  		break;
>  	case GIT_DIR_NONE:
> +	case GIT_DIR_CWD_FAILURE:
> +	case GIT_DIR_INVALID_FORMAT:
>  		/*
>  		 * As a safeguard against setup_git_directory_gently_1 returning
> -		 * this value, fallthrough to BUG. Otherwise it is possible to
> +		 * these values, fallthrough to BUG. Otherwise it is possible to
>  		 * set startup_info->have_repository to 1 when we did nothing to
>  		 * find a repository.
>  		 */

OK.

Not a new problem, but does anybody explicitly or implicitly return
DIR_NONE?  I didn't find any codepath that does so.  Presumably it
may have been arranged in the hope that a code structured like so:

	enum discovery_result res = GIT_DIR_NONE;

	if (some complex condition)
		res = ...;
	else if (another complex condition)
		res = ...;

	... sometime later ...
	if (res <= 0)
		we found a bad one

would ensure that "res" untouched by setup_git_directory_gently_1()
is still an error, but I am not sure if it is effective, given that
nobody uses GIT_DIR_NONE to assign or initialize anything.  And the
same effect can be had by leaving 'res' uninitialized---the compilers
are our friend.

Not a part of this review, but I wonder if it makes sense for us to
get rid of DIR_NONE.

> -int discover_git_directory(struct strbuf *commondir,
> -			   struct strbuf *gitdir);
> +static inline int discover_git_directory(struct strbuf *commondir,
> +					 struct strbuf *gitdir)
> +{
> +	return discover_git_directory_reason(commondir, gitdir) <= 0;
> +}

The _reason() thing is more or less like setup_git_directory_gently_1()
in that positives are success and everything else is an error.  And
the point of keeping this thin wrapper as discover_git_directory() is
to insulate the existing callers that discover_git_directory() returns
-1 for failure, while returning 0 for success.

So this wrapper smells very wrong.  It now flips the polarity of the
error return into a positive 1, no?  That does not achieve the goal
to insulate the callers from the change in implementation.

Other than that, as you wrote in the cover letter, I think it is an
excellent move to have an interface to expose details of what
discovery has found.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

>> 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive
>>    results are errors.
>
> True. discover_git_directory() already knows that negative return
> values from setup_git_directory_gently_1() signal errors while 0 or
> positive are OK.

NOnononono.  negative are not.  0 is not returned, so if we saw one,
it would be an error.  And positives are OK.

Sorry for the confusion.

@@ -657,6 +664,7 @@ static int cmd_reconfigure(int argc, const char **argv)
git_config(get_scalar_repos, &scalar_repos);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/scalar.c b/scalar.c
> index 938bb73f3ce..7d87d7ea724 100644
> --- a/scalar.c
> +++ b/scalar.c
> @@ -664,6 +664,7 @@ static int cmd_reconfigure(int argc, const char **argv)
>  	git_config(get_scalar_repos, &scalar_repos);
>  
>  	for (i = 0; i < scalar_repos.nr; i++) {
> +		int failed = 0;
>  		const char *dir = scalar_repos.items[i].string;

OK.  You need a variable that lets you tell if the repository this
round of the loop dealt with was good, and do not want to abort the
loop, so you cannot reuse the "res" outside of the loop.  Makes sort
of sense.

I wonder if it makes it simpler to initialize "failed" to true
and clear it when you see it succeeded, though.

> @@ -674,30 +675,51 @@ static int cmd_reconfigure(int argc, const char **argv)
>  
>  			if (errno != ENOENT) {
>  				warning_errno(_("could not switch to '%s'"), dir);
> +				failed = -1;
> +				goto loop_end;

Such a change lets you drop this assignment ...

>  			}
>  
>  			strbuf_addstr(&buf, dir);
>  			if (remove_deleted_enlistment(&buf))
> +				failed = error(_("could not remove stale "
> +						 "scalar.repo '%s'"), dir);

... and this one, but ...

>  			else
> -				warning(_("removing stale scalar.repo '%s'"),
> +				warning(_("removed stale scalar.repo '%s'"),
>  					dir);

... you'd need to drop, i.e. "failed = 0", here while you warn.  It
is a nice touch to update the message, by the way.

>  			strbuf_release(&buf);
> +			goto loop_end;
> +		}
> +
> +		switch (discover_git_directory_reason(&commondir, &gitdir)) {
> +		case GIT_DIR_INVALID_OWNERSHIP:
> +			warning(_("repository at '%s' has different owner"), dir);
> +			failed = -1;
> +			goto loop_end;
> +
> +		case GIT_DIR_DISCOVERED:
> +			break;

... and you'd need to drop i.e. "failed = 0", here, too. but
other assignments in the switch can go.

> +		default:
> +			warning(_("repository not found in '%s'"), dir);
> +			failed = -1;
> +			break;
> +		}
> +
> +		git_config_clear();
> +
> +		the_repository = &r;
> +		r.commondir = commondir.buf;
> +		r.gitdir = gitdir.buf;
> +
> +		if (set_recommended_config(1) < 0)
> +			failed = -1;

And the polarity of the check and assignment here needs flipping.

> +loop_end:
> +		if (failed) {
> +			res = failed;

This assignment is a bit misleading, as if the value in "failed"
actually matters, when it does not.  It is merely a "did we not
succeed this round, 0 or non-zero?" boolean.  It would have been
easier to see what was going on by saying

	if (failed) {
		res = -1;

here, I would think.

> +			warning(_("to unregister this repository from Scalar, run\n"
> +				  "\tgit config --global --unset --fixed-value scalar.repo \"%s\""),
> +				dir);
>  		}
>  	}

Other than that, this step nicely justifies why the previous step
[PATCH 2/3] is a good thing to do.

Thanks.

@derrickstolee derrickstolee force-pushed the scalar-no-src branch 2 times, most recently from 002b7c6 to 7ac7311 Compare August 22, 2023 16:42
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2023

Submitted as pull.1569.v2.git.1692725056.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1569/derrickstolee/scalar-no-src-v2

To fetch this version to local tag pr-1569/derrickstolee/scalar-no-src-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1569/derrickstolee/scalar-no-src-v2

scalar.c Outdated
@@ -679,9 +686,6 @@ static int cmd_reconfigure(int argc, const char **argv)
warning(_("removing stale scalar.repo '%s'"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> There are many reasons why discovering a Git directory may fail. In
> particular, 8959555cee7 (setup_git_directory(): add an owner check for
> the top-level directory, 2022-03-02) added ownership checks as a
> security precaution.
>
> Callers attempting to set up a Git directory may want to inform the user
> about the reason for the failure. For that, expose the enum
> discovery_result from within setup.c and into cache.h where
> discover_git_directory() is defined.
>
> I initially wanted to change the return type of discover_git_directory()
> to be this enum, but several callers rely upon the "zero means success".
> The two problems with this are:
>
> 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive
>    results are errors.
>
> 2. There are multiple successful states; positive results are
>    successful.
>
> It is worth noting that GIT_DIR_NONE is not returned, so we remove this
> option from the enum. We must be careful to keep the successful reasons
> as positive values, so they are given explicit positive values.
> Further, a use in scalar.c was previously impossible, so it is removed.
>
> Instead of updating all callers immediately, add a new method,
> discover_git_directory_reason(), and convert discover_git_directory() to
> be a thin shim on top of it.
>
> One thing that is important to note is that discover_git_directory()
> previously returned -1 on error, so let's continue that into the future.
> There is only one caller (in scalar.c) that depends on that signedness
> instead of a non-zero check, so clean that up, too.
>
> Because there are extra checks that discover_git_directory_reason() does
> after setup_git_directory_gently_1(), there are other modes that can be
> returned for failure states. Add these modes to the enum, but be sure to
> explicitly add them as BUG() states in the switch of
> setup_git_directory_gently().
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  scalar.c |  3 ---
>  setup.c  | 34 ++++++++++++----------------------
>  setup.h  | 35 ++++++++++++++++++++++++++++++++---
>  3 files changed, 44 insertions(+), 28 deletions(-)
>
> diff --git a/scalar.c b/scalar.c
> index 938bb73f3ce..02a38e845e1 100644
> --- a/scalar.c
> +++ b/scalar.c
> @@ -686,9 +686,6 @@ static int cmd_reconfigure(int argc, const char **argv)
>  				warning(_("removing stale scalar.repo '%s'"),
>  					dir);
>  			strbuf_release(&buf);
> -		} else if (discover_git_directory(&commondir, &gitdir) < 0) {
> -			warning_errno(_("git repository gone in '%s'"), dir);
> -			res = -1;
>  		} else {
>  			git_config_clear();
>  

In the original before this series, and also after applying [PATCH
3/3], the reconfiguration is a three-step process:

 - what if we cannot go to that directory?
 - what if the directory is not a usable repository?
 - now we are in a usable repository, let's reconfigure.

and this seems to lose the second step tentatively?  It is
resurrected in a more enhanced form in [PATCH 3/3] so it may not be
a huge deal, but it looks like it is not intended lossage.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/22/2023 3:30 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>

>> It is worth noting that GIT_DIR_NONE is not returned, so we remove this
>> option from the enum. We must be careful to keep the successful reasons
>> as positive values, so they are given explicit positive values.
>> Further, a use in scalar.c was previously impossible, so it is removed.

(Relevant bit from the message.)

>> diff --git a/scalar.c b/scalar.c
>> index 938bb73f3ce..02a38e845e1 100644
>> --- a/scalar.c
>> +++ b/scalar.c
>> @@ -686,9 +686,6 @@ static int cmd_reconfigure(int argc, const char **argv)
>>  				warning(_("removing stale scalar.repo '%s'"),
>>  					dir);
>>  			strbuf_release(&buf);
>> -		} else if (discover_git_directory(&commondir, &gitdir) < 0) {
>> -			warning_errno(_("git repository gone in '%s'"), dir);
>> -			res = -1;
>>  		} else {
>>  			git_config_clear();
>>  
> 
> In the original before this series, and also after applying [PATCH
> 3/3], the reconfiguration is a three-step process:
> 
>  - what if we cannot go to that directory?
>  - what if the directory is not a usable repository?
>  - now we are in a usable repository, let's reconfigure.
> 
> and this seems to lose the second step tentatively?  It is
> resurrected in a more enhanced form in [PATCH 3/3] so it may not be
> a huge deal, but it looks like it is not intended lossage.

I know what happened here. At some point during my edits, this line was
changed to "else if (!discover_git_directory(...))" which became an
unreachable case.

So, based on the intermediate patch that did not survive, it made sense,
but you are right that this hunk of this patch is behaving badly. Good
eye!

Thanks,
-Stolee

@@ -657,6 +664,7 @@ static int cmd_reconfigure(int argc, const char **argv)
git_config(get_scalar_repos, &scalar_repos);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -664,6 +664,7 @@ static int cmd_reconfigure(int argc, const char **argv)
>  	git_config(get_scalar_repos, &scalar_repos);
>  
>  	for (i = 0; i < scalar_repos.nr; i++) {
> +		int succeeded = 0;
>  		const char *dir = scalar_repos.items[i].string;
>  
>  		strbuf_reset(&commondir);
> @@ -674,27 +675,51 @@ static int cmd_reconfigure(int argc, const char **argv)
>  
>  			if (errno != ENOENT) {
>  				warning_errno(_("could not switch to '%s'"), dir);
> -				res = -1;
> -				continue;
> +				goto loop_end;

This is after seeing chdir(dir) failed.  If the user manually
removed the enlisted directory, ENOENT would be one of the most
likely errors.  If the user dropped a file to the place after it was
vacated, we may get ENOTDIR, which is also not so bad.

In any case, is it desirable to keep the enlistment still configured
by jumping to loop_end in these "other" error conditions?  If the
reason why we cannot chdir() into it is because of some tentative
glitch that may resolve by itself, retaining the enlistment data may
have value, because it can be reused without the user having to
recreate the enlistment when the "tentatively unavailable" directory
comes back online, I guess, but how realistic would such an error
be?

>  			}
>  
>  			strbuf_addstr(&buf, dir);
>  			if (remove_deleted_enlistment(&buf))
> -				res = error(_("could not remove stale "
> -					      "scalar.repo '%s'"), dir);
> -			else
> -				warning(_("removing stale scalar.repo '%s'"),
> +				error(_("could not remove stale "
> +					"scalar.repo '%s'"), dir);
> +			else {
> +				warning(_("removed stale scalar.repo '%s'"),
>  					dir);
> +				succeeded = 1;
> +			}
>  			strbuf_release(&buf);
> -		} else {
> -			git_config_clear();
> +			goto loop_end;
> +		}

So the above is "what if we fail to chdir()", which looked sensible.
Then comes the "what if we don't have a usable repository there?", which
was lost in [PATCH 2/3].

> +		switch (discover_git_directory_reason(&commondir, &gitdir)) {
> +		case GIT_DIR_INVALID_OWNERSHIP:
> +			warning(_("repository at '%s' has different owner"), dir);
> +			goto loop_end;
> +
> +		case GIT_DIR_DISCOVERED:
> +			succeeded = 1;
> +			break;
> +
> +		default:
> +			warning(_("repository not found in '%s'"), dir);
> +			break;

Among the error cases, INVALID_OWNERSHIP is one of the possibilities
that merits specialized message to the end-user.  I wonder if others
also deserve to be explained, though.

 - HIT_CEILING and HIT_MOUNT_POINT will happen when there is no
   usable repository between "dir" and the specified ceiling.

 - INVALID_GITFILE and INVALID_FORMAT are signs of some repository
   corruption.

 - DISALLOWED_BARE is unlikely to happen in the scalar context.

> +		}
> +
> +		git_config_clear();
> +
> +		the_repository = &r;
> +		r.commondir = commondir.buf;
> +		r.gitdir = gitdir.buf;
>  
> -			the_repository = &r;
> -			r.commondir = commondir.buf;
> -			r.gitdir = gitdir.buf;
> +		if (set_recommended_config(1) >= 0)
> +			succeeded = 1;
>  
> -			if (set_recommended_config(1) < 0)
> -				res = -1;
> +loop_end:
> +		if (!succeeded) {
> +			res = -1;
> +			warning(_("to unregister this repository from Scalar, run\n"
> +				  "\tgit config --global --unset --fixed-value scalar.repo \"%s\""),
> +				dir);

Ah, OK.  So the strategy is to punt on accepting the responsibility
for removing an inaccessible directory; rather, we just report that
we had trouble chdir() and let the user decide.  Which makes sense.

Thanks.

>  		}
>  	}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/22/2023 3:45 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>> +		switch (discover_git_directory_reason(&commondir, &gitdir)) {
>> +		case GIT_DIR_INVALID_OWNERSHIP:
>> +			warning(_("repository at '%s' has different owner"), dir);
>> +			goto loop_end;
>> +
>> +		case GIT_DIR_DISCOVERED:
>> +			succeeded = 1;
>> +			break;
>> +
>> +		default:
>> +			warning(_("repository not found in '%s'"), dir);
>> +			break;
> 
> Among the error cases, INVALID_OWNERSHIP is one of the possibilities
> that merits specialized message to the end-user.  I wonder if others
> also deserve to be explained, though.

The specific choice of GIT_DIR_INVALID_OWNERSHIP is singled out
because it's a new-ish reason and is the most confusing to users
when things fail for this reason.
 
>  - HIT_CEILING and HIT_MOUNT_POINT will happen when there is no
>    usable repository between "dir" and the specified ceiling.

These are basically "didn't find a Git repo" but there are different
reasons why Git stopped looking. I'm not sure there is something more
valuable to indicate here than the "repository not found" message
that already exists.

>  - INVALID_GITFILE and INVALID_FORMAT are signs of some repository
>    corruption.

I can add a message for this kind of error, which seems helpful to
point out to a user.

>  - DISALLOWED_BARE is unlikely to happen in the scalar context.

I agree.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <derrickstolee@github.com> writes:

> On 8/22/2023 3:45 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>> +		switch (discover_git_directory_reason(&commondir, &gitdir)) {
>>> +		case GIT_DIR_INVALID_OWNERSHIP:
>>> +			warning(_("repository at '%s' has different owner"), dir);
>>> +			goto loop_end;
>>> +
>>> +		case GIT_DIR_DISCOVERED:
>>> +			succeeded = 1;
>>> +			break;
>>> +
>>> +		default:
>>> +			warning(_("repository not found in '%s'"), dir);
>>> +			break;
>> 
>> Among the error cases, INVALID_OWNERSHIP is one of the possibilities
>> that merits specialized message to the end-user.  I wonder if others
>> also deserve to be explained, though.
>
> The specific choice of GIT_DIR_INVALID_OWNERSHIP is singled out
> because it's a new-ish reason and is the most confusing to users
> when things fail for this reason.
>  
>>  - HIT_CEILING and HIT_MOUNT_POINT will happen when there is no
>>    usable repository between "dir" and the specified ceiling.
>
> These are basically "didn't find a Git repo" but there are different
> reasons why Git stopped looking. I'm not sure there is something more
> valuable to indicate here than the "repository not found" message
> that already exists.

OK.  I just know that "not found" will be greeted by "stupid Git, if
you go one level up, there is a .git/ directory!", now we have many
users than we used to have and people forget what they configured.

But I think it would apply much less to users who see "repository
not found" in the "scalar reconfigure" than ones who manually
created a repository, and then forgot that they shuffled the disks
around with cross mounting.  So I agree with you that it is not
essential to mention these reasons in this codepath (and
setup_git_directory() does have a reasonable message for at least
the mount-point case that covers the more general case).

>>  - INVALID_GITFILE and INVALID_FORMAT are signs of some repository
>>    corruption.
>
> I can add a message for this kind of error, which seems helpful to
> point out to a user.

Maybe.  We can do that in a follow-up topic separately.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2023

This branch is now known as ds/scalar-updates.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2023

This patch series was integrated into seen via git@f2823a1.

@gitgitgadget gitgitgadget bot added the seen label Aug 22, 2023
@@ -8,7 +8,8 @@ scalar - A tool for managing large Git repositories
SYNOPSIS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Oswald Buddenhagen wrote (reply to this):

just nitpicking the commit message:

On Tue, Aug 22, 2023 at 05:24:13PM +0000, Derrick Stolee via GitGitGadget wrote:
>From: Derrick Stolee <derrickstolee@github.com>
>
>Some users have strong aversions to Scalar's opinion that the repository
>should be in a 'src' directory, even though

>it
>
i'd use "this" here.

>creates a clean slate for
>placing build

>outputs
>
"artifacts" maybe.

>in adjacent directories.
>

>The
>
i'd insert "new" here.

>--no-src option allows users to

>opt-out
>
pedantically, there should be no dash here, as it's a regular verbal phrase. the dash should be used when it's turned into a noun, "to provide an opt-out". at least i think so ...

>of the default behavior.
>
>While adding options, make sure the usage output by 'scalar clone -h'
>reports the same as the SYNOPSIS line in Documentation/scalar.txt.
>

regards

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 23, 2023

User Oswald Buddenhagen <oswald.buddenhagen@gmx.de> has been added to the cc: list.

scalar.c Outdated
@@ -679,9 +686,6 @@ static int cmd_reconfigure(int argc, const char **argv)
warning(_("removing stale scalar.repo '%s'"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Oswald Buddenhagen wrote (reply to this):

On Tue, Aug 22, 2023 at 05:24:14PM +0000, Derrick Stolee via GitGitGadget wrote:
>From: Derrick Stolee <derrickstolee@github.com>
>
>There are many reasons why discovering a Git directory may fail. In
>particular, 8959555cee7 (setup_git_directory(): add an owner check for
>the top-level directory, 2022-03-02) added ownership checks as a
>security precaution.
>
>Callers attempting to set up a Git directory may want to inform the user
>about the reason for the failure. For that, expose the enum
>discovery_result from within setup.c

>and
>
"by moving it"

>into cache.h where
>discover_git_directory() is defined.
>
>I initially wanted to change the return type of discover_git_directory()
>to be this enum, but several callers rely upon the "zero means success".
>The two problems with this are:
>
>1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive
>   results are errors.
>
>2. There are multiple successful states; positive results are
>   successful.
>
>It is worth noting that GIT_DIR_NONE is not returned, so we remove this
>option from the enum. We must be careful to keep the successful reasons
>as positive values, so they are given explicit positive values.

>Further, a use in scalar.c was previously impossible, so it is removed.
>
i have no clue wha this means. what is "it"?

>Instead of updating all callers immediately, add a new method,
>discover_git_directory_reason(), and convert discover_git_directory() to
>be a thin shim on top of it.
>
is this really worth it, given that there are just two callers, and the adjustment is trivial?
if you insist, note that the function name can be easily misread as "discover_git_(directory_reason)", which is unhelpful. i typically use an _impl suffix in such "thin wrapper" scenarios.

regards

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 24, 2023

This patch series was integrated into seen via git@6065f8c.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 24, 2023

This patch series was integrated into seen via git@fe67b79.

Some users have strong aversions to Scalar's opinion that the repository
should be in a 'src' directory, even though this creates a clean slate
for placing build artifacts in adjacent directories.

The new --no-src option allows users to opt out of the default behavior.

While adding options, make sure the usage output by 'scalar clone -h'
reports the same as the SYNOPSIS line in Documentation/scalar.txt.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
There are many reasons why discovering a Git directory may fail. In
particular, 8959555 (setup_git_directory(): add an owner check for
the top-level directory, 2022-03-02) added ownership checks as a
security precaution.

Callers attempting to set up a Git directory may want to inform the user
about the reason for the failure. For that, expose the enum
discovery_result from within setup.c and move it into cache.h where
discover_git_directory() is defined.

I initially wanted to change the return type of discover_git_directory()
to be this enum, but several callers rely upon the "zero means success".
The two problems with this are:

1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive
   results are errors.

2. There are multiple successful states; positive results are
   successful.

It is worth noting that GIT_DIR_NONE is not returned, so we remove this
option from the enum. We must be careful to keep the successful reasons
as positive values, so they are given explicit positive values.

Instead of updating all callers immediately, add a new method,
discover_git_directory_reason(), and convert discover_git_directory() to
be a thin shim on top of it.

One thing that is important to note is that discover_git_directory()
previously returned -1 on error, so let's continue that into the future.
There is only one caller (in scalar.c) that depends on that signedness
instead of a non-zero check, so clean that up, too.

Because there are extra checks that discover_git_directory_reason() does
after setup_git_directory_gently_1(), there are other modes that can be
returned for failure states. Add these modes to the enum, but be sure to
explicitly add them as BUG() states in the switch of
setup_git_directory_gently().

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
When running 'scalar reconfigure -a', Scalar has warning messages about
the repository missing (or not containing a .git directory). Failures
can also happen while trying to modify the repository-local config for
that repository.

These warnings may seem confusing to users who don't understand what
they mean or how to stop them.

Add a warning that instructs the user how to remove the warning in
future installations.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2023

This patch series was integrated into seen via git@adaefe1.

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 28, 2023

Submitted as pull.1569.v3.git.1693230746.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1569/derrickstolee/scalar-no-src-v3

To fetch this version to local tag pr-1569/derrickstolee/scalar-no-src-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1569/derrickstolee/scalar-no-src-v3

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 28, 2023

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> While updating git-for-windows/git and microsoft/git for the 2.42.0 release
> window, a few patches that we've been running in those forks for a while
> came to light as something that would be beneficial to the core Git project.
> Here are some that are focused on the 'scalar' command.
>
>  * Patch 1 adds a --no-src option to scalar clone to appease users who want
>    to use scalar but object to the creation of the src directory.
>  * Patches 2 and 3 help when scalar reconfigure -a fails. Patch 2 is a
>    possibly helpful change on its own for other uses in the future.
>
>
> Updates in V3
> =============
>
>  * Several commit message edits.
>  * An important case that was dropped in v2's patch 2 is reintroduced (even
>    though it is modified in patch 3).
>  * An error message is added for corrupt Git repositories.

Thanks.  The updated series look good.  Especially the updates to
the commit message all looked excellent.

Will queue.  Let's merge it down to 'next'.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 28, 2023

This patch series was integrated into seen via git@975c983.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 28, 2023

This patch series was integrated into next via git@093e6bc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant