25. 8. 2020 • 5 min read
Validated: Sep 2024
Repositories grow over time as new code is added. Sometimes, however, large files that should not be part of the project are committed as well. Keeping these files in your code base is bad practice, because everyone else then has to download and work with an unnecessarily large repository.
Most common cases of unnecessary files:
Because git keeps the whole history of changes, it’s not enough to just delete these files from the repository: git still stores every file that was previously committed. This may cause your repository to be too big for analyses. Codeac sets this limit at 200 MB to keep its performance as high as possible.
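To see which files are inflating the history, a small helper along these lines may be useful. It is only a sketch and not part of the original walkthrough: the function name largest_blobs is hypothetical, and it assumes git, sed, sort and head are available. It lists the largest blobs reachable from any ref, including files that have since been deleted from the working tree.

```shell
# List the N largest blobs anywhere in the history of the current repository.
# Output: one line per blob, "<size in bytes> <path>", largest first.
# largest_blobs is a hypothetical helper name used for illustration.
largest_blobs() {
  local count="${1:-10}"
  # Enumerate every object reachable from any ref, together with its path.
  git rev-list --objects --all |
    # Print the type, size and path of each object.
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    # Keep blobs only and drop the leading "blob " column.
    sed -n 's/^blob //p' |
    # Sort numerically by size, largest first.
    sort -rn |
    head -n "$count"
}
```

Run it from inside a clone, for example `largest_blobs 20`, to get the twenty largest files that were ever committed.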
There are many ways to reduce the size of your repository and clean its history. Let's dive into the two main ones:
The BFG Repo-Cleaner is a simple and fast tool written in Scala for cleaning bad data out of your GIT repository history. Here is how to get started:
git clone --mirror https://github.com/tony/my-big-project.git
The --mirror flag will clone a bare repository and your normal files won't be visible. However, it is a full copy of the GIT database of your repo, and at this point you should make a backup of it to ensure you don't lose anything.
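One simple way to take that backup is to copy the whole mirror directory before touching it. The sketch below assumes the clone lives in a local directory; the function name backup_mirror and the timestamped naming scheme are hypothetical choices, not something the tools require.

```shell
# Copy a mirror clone to a timestamped sibling directory and print the
# path of the copy. backup_mirror is a hypothetical helper name.
backup_mirror() {
  local repo="${1%/}"   # drop a trailing slash, if any
  local backup="${repo}.backup-$(date +%Y%m%d-%H%M%S)"
  cp -R "$repo" "$backup" && printf '%s\n' "$backup"
}
```

For example, `backup_mirror my-big-project.git` leaves the original directory untouched and creates a copy next to it that you can restore from if the rewrite goes wrong.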
java -jar bfg.jar --strip-blobs-bigger-than 10M my-big-project.git
The BFG will update your commits and all branches and tags so they are clean, but it doesn't physically delete the unwanted stuff. Make sure your history has been updated, and then use the standard git gc command to strip out the unwanted dirty data, which GIT will now recognise as surplus to requirements:
cd my-big-project.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
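To confirm that the cleanup actually shrank the repository, you can compare its on-disk object size before and after. The sketch below relies on `git count-objects -v`, which reports loose objects as `size` and packed objects as `size-pack`, both in KiB; the function name repo_size_kib is a hypothetical helper.

```shell
# Print the total size of a repository's object store in KiB
# (loose objects plus packfiles). repo_size_kib is a hypothetical name.
repo_size_kib() {
  local repo="${1:-.}"
  git -C "$repo" count-objects -v |
    # Sum the "size:" and "size-pack:" lines; skip "size-garbage:".
    awk '/^size(-pack)?:/ { total += $2 } END { print total + 0 }'
}
```

Calling `repo_size_kib my-big-project.git` once before running the cleaner and once after `git gc` gives two numbers you can compare directly.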
Once you're happy with the updated state of your repo, push it back up (note that because your clone command used the --mirror flag, this push will update all refs on your remote server):
git push
Rewriting repository history is a destructive operation. Make sure to have your repository backed up.
At this point, you're ready for everyone to ditch their old copies of the repository and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have a dirty history that you don't want to risk pushing back into your newly cleaned repo.
An alternative way to clean large files from your repository is to use git-filter-repo. It is a versatile tool for altering GIT history. Install git-filter-repo using a supported package manager or from source.
Clone a fresh copy of the repository using --bare and --mirror.
It is a full copy of the GIT database of your repo, and at this point you should make a backup of it to ensure you don't lose anything.
git filter-repo --strip-blobs-bigger-than 10M
Rewriting repository history is a destructive operation. Make sure to have your repository backed up.
Once large files have been removed, it is a best practice for everyone using the repository to make a new clone; otherwise, if someone does a force push, they will push the large files again and you’ll be back to where you started.
For more information, see the GitHub documentation “Removing files from repository’s history” and “Removing sensitive data from a repository”, the Bitbucket documentation “Reduce repository size”, and the GitLab documentation “Reduce repository size”.