URGENT: reduce size of repository or remove from GitHub by 2024-06-24 #19

Open
tcatapano opened this issue Jun 19, 2024 · 0 comments
Labels
bug Something isn't working

This also applies to https://github.com/plazi/treatments-rdf.

Per GitHub Support:

We've noticed that the repositories have grown to about 4 GB and 7 GB respectively, which is close to our recommended 5 GB maximum. However, due to their structure and contents, they have logical sizes of 33 GB and 46 GB for some Git operations, and are putting a lot of stress on our servers.

Git and GitHub are optimized to provide version control and code collaboration, which means each push of data to our servers triggers computation on our end to apply necessary metadata and structure things efficiently for that purpose.

That means there are many use cases, such as backups of non-text files, or database dumps, that are unsuitable for Git, and an inefficient strain on our infrastructure.

You can read more about this here:

https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#repository-size-limits

We have a tool called git-sizer that you might be interested in, that analyzes the various size metrics of a repository:

https://github.com/github/git-sizer

You can read more about it in this blog post:

https://github.blog/2018-03-05-measuring-the-many-sizes-of-a-git-repository/
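As a side note to the Support message: besides git-sizer, a quick way to see which files dominate the history is to list the largest blobs with plain Git commands. The Python sketch below is only an illustration, not something Support suggested. The function names `parse_batch_check` and `largest_blobs` are hypothetical, and it assumes `git` is on the PATH and that `repo_dir` points at a local clone. Sizes are uncompressed object sizes, so they will not match the packed on-disk size.

```python
import subprocess


def parse_batch_check(output):
    """Parse '<type> <size> <path>' lines; return (size, path) for blobs, largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2 or parts[0] != "blob":
            continue  # skip commits, trees, tags and malformed lines
        path = parts[2] if len(parts) == 3 else ""
        blobs.append((int(parts[1]), path))
    blobs.sort(key=lambda item: item[0], reverse=True)
    return blobs


def largest_blobs(repo_dir, top=20):
    """Return the `top` largest blobs reachable from any ref (hypothetical helper)."""
    # List every object reachable from all refs as '<sha> <path>'.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    # Ask Git for the type and size of each object; %(rest) echoes the path back.
    checked = subprocess.run(
        ["git", "cat-file", "--batch-check=%(objecttype) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(checked)[:top]
```

Calling `largest_blobs("path/to/clone")` should give a rough picture of which generated outputs would need to move elsewhere before any history rewrite.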

As a result of their size and contents, these repos are consuming a non-trivial amount of resources and have the potential to negatively affect the GitHub service for other users.

We need to ask you to either bring down their size and make the contents more manageable, or if the project needs to stay in its current state, move them off GitHub entirely.

You could bring down the size by rewriting repository history so that only the code to perform the operations is stored in each repo, and all of the output of the operations is stored elsewhere. By the looks of it the output does not need to be version controlled, and should be stored in an Amazon S3 bucket, or similar service, that is optimized for cloud storage purposes. This third party site offers some ideas for other hosting providers:

https://www.techradar.com/news/the-best-cloud-storage

and

One thing to be aware of - if you're trying to rewrite all the repo's history in one go, it's possible you'll hit our 2 GB push limit, and will need to break your push up into smaller chunks. This help article has some tips for doing that.
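The link to that help article did not survive in this issue, so what follows is not taken from it. One common approach to staying under a per-push limit is to push the history in batches of commits, oldest first. The Python sketch below makes several assumptions: `git` is on the PATH, the target is a new or empty remote branch, and a batch of 1000 commits happens to stay under 2 GB (the limit is on data size, not commit count, so the step may need tuning). The names `chunk_tips` and `push_in_chunks` are hypothetical.

```python
import subprocess


def chunk_tips(commits, step):
    """Return every `step`-th commit (oldest first) plus the final commit."""
    if step < 1:
        raise ValueError("step must be >= 1")
    tips = commits[step - 1::step]
    # Make sure the branch tip itself is always pushed last.
    if commits and (not tips or tips[-1] != commits[-1]):
        tips.append(commits[-1])
    return tips


def push_in_chunks(repo_dir, remote, branch, step=1000):
    """Push `branch` to `remote` in batches of roughly `step` commits (sketch)."""
    # Commit hashes of the branch, oldest first.
    commits = subprocess.run(
        ["git", "rev-list", "--reverse", branch],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.split()
    for tip in chunk_tips(commits, step):
        # Each push advances the remote branch to an intermediate commit.
        subprocess.run(
            ["git", "push", remote, f"{tip}:refs/heads/{branch}"],
            cwd=repo_dir, check=True,
        )
```

If a single batch is still rejected, lowering `step` makes each push smaller at the cost of more round trips.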
