Skip to content

Commit

Permalink
Merge pull request #180: Sparse-checkout builtin: upstreamable version
Browse files Browse the repository at this point in the history
I'm creating this PR for a fourth time (see #163, #171, and #178 for earlier versions). This version is tracking my progress to create something that can be sent as an RFC upstream, but also can be used to start the sparse feature branch. This is based on v2.23.0.

Note: I currently have conflicts with the virtual filesystem feature, and I'll resolve those with a merge commit when I'm ready. I'm just creating this for tracking progress at the moment, but can also be a place for early feedback.

* git sparse-checkout disable to disable the sparse-checkout feature and return to a full checkout.
* git sparse-checkout init --cone, to initialize in cone mode.
* git sparse-checkout add when core.sparseCheckout=cone.

This includes performance improvements with the hashset-based matching algorithm, but I'll leave further enhancements as smaller steps on top. Here are a few things I want to try:

Track the maximum depth of a prefix pattern, so we know to not run hashes for deeper paths.
Use the "known exclude" bit in cone mode to stop hashing paths we know will not be included.
Use the "known include" bit in cone mode to stop hashing paths we know will be included. This is more difficult than "known exclude" because we need to distinguish directories from files when doing path matches so we don't give a directory a "known include" when it isn't a recursive pattern match.
  • Loading branch information
derrickstolee committed Aug 22, 2019
2 parents 66bf815 + 181fb3b commit 8d3da92
Show file tree
Hide file tree
Showing 18 changed files with 975 additions and 19 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,7 @@
/git-show-branch
/git-show-index
/git-show-ref
/git-sparse-checkout
/git-stage
/git-stash
/git-status
Expand Down
7 changes: 5 additions & 2 deletions Documentation/config/core.txt
Original file line number Diff line number Diff line change
Expand Up @@ -660,8 +660,11 @@ core.gvfs::
--

core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
Enable "sparse checkout" feature. If "false", then sparse-checkout
is disabled. If "true", then sparse-checkout is enabled with the full
.gitignore pattern set. If "cone", then sparse-checkout is enabled with
a restricted pattern set. See linkgit:git-sparse-checkout[1] for more
information.

core.abbrev::
Set the length object names are abbreviated to. If
Expand Down
8 changes: 7 additions & 1 deletion Documentation/git-clone.txt
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ SYNOPSIS
[--dissociate] [--separate-git-dir <git dir>]
[--depth <depth>] [--[no-]single-branch] [--no-tags]
[--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
[--[no-]remote-submodules] [--jobs <n>] [--] <repository>
[--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
[<directory>]

DESCRIPTION
Expand Down Expand Up @@ -156,6 +156,12 @@ objects from the source repository into a pack in the cloned repository.
used, neither remote-tracking branches nor the related
configuration variables are created.

--sparse::
Initialize the sparse-checkout file so the working
directory starts with only the files in the root
of the repository. The sparse-checkout file can be
modified to grow the working directory as needed.

--mirror::
Set up a mirror of the source repository. This implies `--bare`.
Compared to `--bare`, `--mirror` not only maps local branches of the
Expand Down
2 changes: 1 addition & 1 deletion Documentation/git-read-tree.txt
Original file line number Diff line number Diff line change
Expand Up @@ -436,7 +436,7 @@ support.
SEE ALSO
--------
linkgit:git-write-tree[1]; linkgit:git-ls-files[1];
linkgit:gitignore[5]
linkgit:gitignore[5]; linkgit:git-sparse-checkout[1];

GIT
---
Expand Down
146 changes: 146 additions & 0 deletions Documentation/git-sparse-checkout.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
git-sparse-checkout(1)
=======================

NAME
----
git-sparse-checkout - Initialize and modify the sparse-checkout
configuration, which reduces the checkout to a set of directories
given by a list of prefixes.


SYNOPSIS
--------
[verse]
'git sparse-checkout <subcommand> [options]'


DESCRIPTION
-----------

Initialize and modify the sparse-checkout configuration, which reduces
the checkout to a set of directories given by a list of prefixes.


COMMANDS
--------
'list'::
Provide a list of the contents in the sparse-checkout file.

'init'::
Enable the `core.sparseCheckout` setting. If the
sparse-checkout file does not exist, then populate it with
patterns that match every file in the root directory and
no other directories, then will remove all directories tracked
by Git. Add patterns to the sparse-checkout file to
repopulate the working directory.

'add'::
Add a set of patterns to the sparse-checkout file, as given over
stdin. Updates the working directory to match the new patterns.

'disable'::
Remove the sparse-checkout file, set `core.sparseCheckout` to
`false`, and restore the working directory to include all files.

SPARSE CHECKOUT
----------------

"Sparse checkout" allows populating the working directory sparsely.
It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
Git whether a file in the working directory is worth looking at. If
the skip-worktree bit is set, then the file is ignored in the working
directory. Git will not populate the contents of those files, which
makes a sparse checkout helpful when working in a repository with many
files, but only a few are important to the current user.

The `$GIT_DIR/info/sparse-checkout` file is used to define the
skip-worktree reference bitmap. When Git updates the working
directory, it resets the skip-worktree bit in the index based on this
file. If an entry
matches a pattern in this file, skip-worktree will not be set on
that entry. Otherwise, skip-worktree will be set.

Then it compares the new skip-worktree value with the previous one. If
skip-worktree turns from set to unset, it will add the corresponding
file back. If it turns from unset to set, that file will be removed.

To repopulate the working directory with all files, use the
`git sparse-checkout disable` command.

Sparse checkout support in 'git read-tree' and similar commands is
disabled by default. You need to set `core.sparseCheckout` to `true`
in order to have sparse checkout support.

## FULL PATTERN SET

By default, the sparse-checkout file uses the same syntax as `.gitignore`
files.

While `$GIT_DIR/info/sparse-checkout` is usually used to specify what
files are in, you can also specify what files are _not_ in, using
negate patterns. For example, to remove the file `unwanted`:

----------------
/*
!unwanted
----------------

## CONE PATTERN SET

The full pattern set allows for arbitrary pattern matches and complicated
inclusion/exclusion rules. These can result in O(N*M) pattern matches when
updating the index, where N is the number of patterns and M is the number
of paths in the index. To combat this performance issue, a more restricted
pattern set is allowed when `core.spareCheckout` is set to `cone`.

The accepted patterns in the cone pattern set are:

1. *Recursive:* All paths inside a directory are included.

2. *Parent:* All files immediately inside a directory are included.

In addition to the above two patterns, we also expect that all files in the
root directory are included. If a recursive pattern is added, then all
leading directories are added as parent patterns.

By default, when running `git sparse-checkout init`, the root directory is
added as a parent pattern. At this point, the sparse-checkout file contains
the following patterns:

```
/*
!/*/*
```

This says "include everything in root, but nothing two levels below root."
If we then add the folder `A/B/C` as a recursive pattern, the folders `A` and
`A/B` are added as parent patterns. The resulting sparse-checkout file is
now

```
/*
!/*/*
/A/*
!/A/*/*
/A/B/*
!/A/B/*/*
/A/B/C/*
```

Here, order matters, so the negative patterns are overridden by the positive
patterns that appear lower in the file.

If `core.sparseCheckout=cone`, then Git will parse the sparse-checkout file
expecting patterns of these types. Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-
based algorithms to compute inclusion in the sparse-checkout.

SEE ALSO
--------

linkgit:git-read-tree[1]
linkgit:gitignore[5]

GIT
---
Part of the linkgit:git[1] suite
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -1136,6 +1136,7 @@ BUILTIN_OBJS += builtin/shortlog.o
BUILTIN_OBJS += builtin/show-branch.o
BUILTIN_OBJS += builtin/show-index.o
BUILTIN_OBJS += builtin/show-ref.o
BUILTIN_OBJS += builtin/sparse-checkout.o
BUILTIN_OBJS += builtin/stash.o
BUILTIN_OBJS += builtin/stripspace.o
BUILTIN_OBJS += builtin/submodule--helper.o
Expand Down
1 change: 1 addition & 0 deletions builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix);
int cmd_show(int argc, const char **argv, const char *prefix);
int cmd_show_branch(int argc, const char **argv, const char *prefix);
int cmd_show_index(int argc, const char **argv, const char *prefix);
int cmd_sparse_checkout(int argc, const char **argv, const char *prefix);
int cmd_status(int argc, const char **argv, const char *prefix);
int cmd_stash(int argc, const char **argv, const char *prefix);
int cmd_stripspace(int argc, const char **argv, const char *prefix);
Expand Down
27 changes: 27 additions & 0 deletions builtin/clone.c
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ static const char *real_git_dir;
static char *option_upload_pack = "git-upload-pack";
static int option_verbosity;
static int option_progress = -1;
static int option_sparse_checkout;
static enum transport_family family;
static struct string_list option_config = STRING_LIST_INIT_NODUP;
static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
Expand Down Expand Up @@ -147,6 +148,8 @@ static struct option builtin_clone_options[] = {
OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_BOOL(0, "remote-submodules", &option_remote_submodules,
N_("any cloned submodules will use their remote-tracking branch")),
OPT_BOOL(0, "sparse", &option_sparse_checkout,
N_("initialize sparse-checkout file to include only files at root")),
OPT_END()
};

Expand Down Expand Up @@ -734,6 +737,27 @@ static void update_head(const struct ref *our, const struct ref *remote,
}
}

static int git_sparse_checkout_init(const char *repo)
{
struct argv_array argv = ARGV_ARRAY_INIT;
int result = 0;
argv_array_pushl(&argv, "-C", repo, "sparse-checkout", "init", NULL);

/*
* We must apply the setting in the current process
* for the later checkout to use the sparse-checkout file.
*/
core_sparse_checkout = SPARSE_CHECKOUT_FULL;

if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
error(_("failed to initialize sparse-checkout"));
result = 1;
}

argv_array_clear(&argv);
return result;
}

static int checkout(int submodule_progress)
{
struct object_id oid;
Expand Down Expand Up @@ -1108,6 +1132,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
if (option_required_reference.nr || option_optional_reference.nr)
setup_reference();

if (option_sparse_checkout && git_sparse_checkout_init(repo))
return 1;

remote = remote_get(option_origin);

strbuf_addf(&default_refspec, "+%s*:%s*", src_ref_prefix,
Expand Down
2 changes: 1 addition & 1 deletion builtin/reset.c
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
* directory so that they will have the right content and the next
* status call will show modified or untracked files correctly.
*/
if (core_apply_sparse_checkout && !file_exists(two->path))
if (core_sparse_checkout && !file_exists(two->path))
{
pos = cache_name_pos(two->path, strlen(two->path));
if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) && (is_missing || !was_missing))
Expand Down
Loading

0 comments on commit 8d3da92

Please sign in to comment.