Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add caching to git describe #684

Merged
merged 17 commits into from
Feb 9, 2023

Conversation

yiweny
Copy link
Contributor

@yiweny yiweny commented Jan 30, 2023

Before this PR

gitVersion() or versionDetails() does not have caching, every time it is called, it will create a new 'VersionDetails' object and call the expensive git describe operation to get the version. This causes long configuration time in a lot of the repos where version gitVersion() is called under allprojects.

After this PR

Build service is used so if we call gitVersion() on all projects, we will not call the expensive git describe many times. Instead, the build service will have a concurrent hash map that stores the created VersionDetails object with a specific prefix as key the first time it is called. If gitVersion() is previously computed, it will return the cached value the next time it is called.

Possible downsides?

Build service is only supported from Gradle 6. This means this plugin will no longer support Gradle 5.

@changelog-app
Copy link

changelog-app bot commented Jan 30, 2023

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

  1. Caching is added to gitVersion() or versionDetails() to avoid long configuration time caused by repeatedly calling git describe on subprojects.
  2. The implementation uses BuildService which is introduced in Gradle 6.1. Therefore, any version before Gradle 6.1 won't be supported.

Check the box to generate changelog(s)

  • Generate changelog entry

@yiweny yiweny changed the title add caching to git describe [WIP] add caching to git describe Jan 30, 2023
@yiweny yiweny closed this Feb 3, 2023
@yiweny yiweny reopened this Feb 6, 2023
@yiweny yiweny changed the base branch from develop to yyuan/remove-dep-to-gradle5 February 6, 2023 21:11
@yiweny yiweny changed the title [WIP] add caching to git describe add caching to git describe Feb 7, 2023
@yiweny yiweny marked this pull request as ready for review February 7, 2023 00:42
@yiweny yiweny requested review from CRogers and bjlaub February 7, 2023 00:42
.getSharedServices()
.registerIfAbsent("GitVersionCacheService", GitVersionCacheService.class, spec -> {
// Provide some parameters
spec.getMaxParallelUsages().set(maxParallelUsage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set a max parallel usages at all? I think provided we make the class thread safe, it should be ok to allow max parallel access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this line.


final Git git = gitRepo(project);
if (project.getRootProject() == project) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be best to leave this in the root plugin, if possible. Just to separate out the code a bit and avoid this if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added back the GitVersionRootPlugin. The only reason why I deleted the root plugin is that we'd have to have the following in both plugins

        Provider<GitVersionCacheService> serviceProvider = project.getGradle()
                .getSharedServices()
                .registerIfAbsent("GitVersionCacheService", GitVersionCacheService.class, _spec -> {});

Git git = gitRepo(gitDir);
VersionDetails versionDetails =
TimingVersionDetails.wrap(timer, new VersionDetailsImpl(git, GitVersionArgs.fromGroovyClosure(args)));
versionDetailsMap.put(key, versionDetails);
Copy link
Contributor

@CRogers CRogers Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not thread safe. I'd make the map a ConcurrentHashMap and use .computeIfAbsent:

versionDetailsMap.computeIfAbsent(key, _ignored -> {
    // calculate version details for key
});

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have guava as a dependency here - perhaps just a LoadingCache?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cache that expires is unnecessary complexity for this use case imo, the number of items in this map should never grow too large, and evicting entries is probably not what we want.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah well, the checkstyle config bans guava caches in favor of Caffeine anyway :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I resolved by using the concurrent hash map instead of hash map here.

public final String getGitVersion(String project, Object args) {
File gitDir = getRootGitDir(new File(project));
GitVersionArgs gitVersionArgs = GitVersionArgs.fromGroovyClosure(args);
String key = gitDir.toString() + "|" + gitVersionArgs.getPrefix();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does git dir need to be part of the key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I am thinking of the scenario where different projects have different work tree. I can remove this if it is considered an extreme edge case not likely to happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ I removed the gitDir part in the key. If you think the above case might happen and it is necessary to keep the worktree as part of the key, please let me know.

@@ -0,0 +1,5 @@
type: improvement
improvement:
description: add caching to git describe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a better description here. I'd make it a break and say it removes support for Gradle 5 (and say what is the new minimum version we support). I'd also a bit more detail, saying how the pattern of calling gitversion() in each project used to lead to extremely long configuration times, however these are now cached so it's faster.

if (versionDetailsMap.containsKey(key)) {
return versionDetailsMap.get(key).getVersion();
}
Git git = gitRepo(gitDir);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could just be pushed down into VersionDetailsImpl? For a couple of reasons:

  1. It seems odd to initialize a JGit object in this class just to inject it into the VersionDetailsImpl instance
  2. If we ever do decide to eliminate JGit, it would be good to have the usage limited in scope to make refactoring easier

On the other hand, I wonder if the Git class is thread-safe? The JGit javadoc doesn't really specify and a quick search indicates... maybe? If it is, then I wonder if it would make sense to cache a single instance here in this class instead of creating one instance per VersionDetailsImpl.

My preference would probably be to move this inside VersionDetailsImpl and instead pass gitDir as a constructor arg; a quick glance at VersionDetailsImpl seems to indicate that it is using both JGit and Native Git to compute the description, which seems a little broken (see e.g.: https://github.com/palantir/gradle-git-version/blob/develop/src/main/java/com/palantir/gradle/gitversion/VersionDetailsImpl.java#L54-L60)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by passing gitDir to the constructor rather than Git object. @bjlaub Do you want me to delete the Native git part as well?

@yiweny yiweny force-pushed the yyuan/remove-dep-to-gradle5 branch from 8d686b5 to 0aa0914 Compare February 7, 2023 19:53
final Git git = gitRepo(project);
Provider<GitVersionCacheService> serviceProvider = project.getGradle()
.getSharedServices()
.registerIfAbsent("GitVersionCacheService", GitVersionCacheService.class, _spec -> {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make a static method to get the Provider<GitVersionCacheService> from a Project which is shared by both plugins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could even put it on GitVersionCacheService

private static final Logger log = LoggerFactory.getLogger(GitVersionCacheService.class);

private final Timer timer = new Timer();
private final Map<GitVersionArgs, VersionDetails> versionDetailsMap = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a ConcurrentMap to make it obvious it's the concurrent version.

called under allprojects. \n\n## After this PR\nBuild service is used so if we
call `gitVersion()` on all projects, we will not call the expensive `git describe`
many times. Instead, the build service will have a concurrent hash map that stores
the created `VersionDetails` object with a specific prefix as key the first time
Copy link
Contributor

@CRogers CRogers Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sentence is too much detail for an end user - I'd leave out any details of how the caching happens (build service, hashmaps, prefixes etc) and just say it's now cached.

@@ -0,0 +1,16 @@
type: break
break:
description: "## Before this PR\n`gitVersion()` or `versionDetails()` does not have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also leave out the "Before this PR" bits and just have a couple of bullet points, one for the change and one describing the break.

@@ -1,126 +0,0 @@
/*
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, what's the reason for removing this? If there's a potential future where we remove jgit, don't we want to keep this code around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense. I deleted because it won't be used if we do not call Native git.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it won't be used if we do not call Native git.

not sure i follow - don't we always try native git first, and then fall back to jgit if it fails?

@yiweny yiweny force-pushed the yyuan/add-build-service branch 2 times, most recently from 71d0fe6 to 7b734e5 Compare February 8, 2023 19:00
private static final Logger log = LoggerFactory.getLogger(GitVersionCacheService.class);

private final Timer timer = new Timer();
private final ConcurrentMap<GitVersionArgs, VersionDetails> versionDetailsMap = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out on a project and was wondering why it doesn't appear to produce a speedup on repos with large numbers of projects and calling gitversion() on each. I think this is the reason - GitVersionArgs does not implement hashCode() or equals() so a new entry is created on the map each time. I'd either just use String here (probably easiest), or make a new immutables type containing the string prefix and immutables will generate hashCode() and equals() for us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched back to using the concatenation of gitDir path and prefix as the key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not use the gitDir as part of the key, we'd fail the test subproject version test where the two projects have different worktree.

Copy link
Contributor

@CRogers CRogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@yiweny yiweny merged commit ce40722 into yyuan/remove-dep-to-gradle5 Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants