Best Practices for Setting Up Git Repositories: A Comprehensive Guide
So, you're creating a new project and need to set up a fresh git repository...
Open source projects have a standard set of components engineers expect to see in a repo.
Inclusion of all standard components is a positive signal of the repo maintainer's experience and professionalism.
Exclusion of any of these standard components may provoke a negative bias from the reader. Engineers often navigate away from projects without these components immediately, because they lack signals of the project's health and reliability.
This guide will teach you how to:
- Set up a new, local repository
- Include standard components industry engineers expect to see
- Curate an open, collaborative environment (with proper guard rails)
- Consider continuous integration, release strategies, testing, linting, and more!
Afterwards, you will have a standardized repository to store your project files that looks like this example and be familiar with industry best practices.
Let us begin.
For this guide I will be using env vars to ease copy-paste.
Please export the name of your repository and your GitHub username to take full advantage of this guide's contents:
export MY_AWESOME_REPO=<repo-name> && echo $MY_AWESOME_REPO
export GIT_USERNAME=<github-username> && echo $GIT_USERNAME
I am assuming you are using a bash-like shell and plan to use GitHub as an online repo host.
Initializing a New Local Repo
"A library with a thousand downloads begins with a single commit." - idk, probably Torvalds
Repositories are essentially directories where every change made is tracked and stored for posterity.
Every project should use source control. Without source control, chaos reigns.
git is the industry-favorite tool for source control.
Other tools exist (e.g., Fossil, SVN, CVS, and Mercurial); however, they are rarely seen in the wild.
Run the following commands to create a new directory, navigate to it, and create an empty git repo:
mkdir "$MY_AWESOME_REPO"
cd "$MY_AWESOME_REPO"
git init
Congratulations! You now have an empty git repo ready to use for source control excellence!
Configuration and tracking of git history is stored in the .git/ directory, which has the following structure:
.
└── .git
    ├── HEAD
    ├── config
    ├── description
    ├── hooks
    ├── info
    ├── objects
    └── refs
If you have not already, please configure your name and email, as git requires it.
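If your identity is not configured yet, a minimal sketch looks like the following. The name and email are placeholders of my own choosing (not values from this guide); --global applies them to every repository for your user account.

```shell
# Placeholder identity: swap in your own name and email.
git config --global user.name "Ada Example"
git config --global user.email "ada@example.com"

# Print the values back to confirm they were stored.
git config --global user.name
git config --global user.email
```

Drop --global and run the same commands inside a repository if you want a different identity for just that repo.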
Herein, we assume the reader knows basic git operations, such as add, commit, and diff.
This guide is not a git tutorial; rather, it is a compilation of best practices and steps for setting up new repos.
Industry is shifting towards naming the default git branch main.
To change your current branch's name from master to main and set the default branch name used by future git init invocations, run the following:
git branch -m main
git config --global init.defaultBranch main
Add a LICENSE
"Lawsuits are never a fashionable thing to wear." - someone who always uses a LICENSE
Software licensing is too often considered a second priority to functionality and overlooked by new coders.
However, one never knows how their code will be used - possibly years later - once it is publicly available. Protection from liability is an essential facet of open source software and trivial to implement.
All you need to do is:
- Copy the following MIT License
- Paste it in a LICENSE.md file
- Set the [year] [fullname] fields
- Never worry about it again! (Maybe you'll update the [year] occasionally)
MIT License
Copyright (c) [year] [fullname]
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
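Filling in the two placeholder fields can be scripted. The sketch below is only an illustration: the author name is hypothetical, the two-line file stands in for the full license text, and everything happens in a scratch directory so no real file is overwritten.

```shell
# Hypothetical author name; replace with your own.
FULL_NAME="Ada Example"
YEAR="$(date +%Y)"

# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Stand-in for the pasted license text (only the copyright line matters here).
printf 'MIT License\nCopyright (c) [year] [fullname]\n' > LICENSE.md

# Replace both placeholders. Writing to a temp file first avoids the
# in-place flag, which differs between GNU sed and BSD sed.
sed -e "s/\[year\]/${YEAR}/" -e "s/\[fullname\]/${FULL_NAME}/" LICENSE.md > LICENSE.md.tmp
mv LICENSE.md.tmp LICENSE.md

cat LICENSE.md
```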
I prefer the MIT License for a variety of reasons; however, there are a multitude of available licenses - some of which may be more appropriate for your project!
For instance, the "viral" GNU GPLv3 License is useful if you want any project taking a dependency on yours to use the same license.
Please take some time to familiarize yourself with the common set of licenses used by open source projects.
Add a README
"Where's your README?" - a frustrated engineer
Projects without a README file are persona non grata in the open source world.
Trust is the coin of the realm, and engineers do not trust projects without a README!
Repo hosting websites (such as GitHub) believe in the README file to such a degree that they automatically render READMEs in every repo.
Everyone expects you to have one; READMEs are built into the basic muscle memory of the organs of industry.
One of the first few commits in a new repository should create a new README.md file.
I typically use the following basic template:
# <REPO_NAME>
<BRIEF_DESCRIPTION_OF_INTENT>
Simple scaffolding - such as the template above - is sufficient until you have built out more functionality in your project.
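If you would rather generate that scaffold from the shell, something like this should do it. Treat it as a sketch: the description text and the fallback repo name are placeholders I made up, and it writes into a scratch directory rather than your real repo.

```shell
# Fall back to a placeholder name if the variable from the start of the guide is unset.
: "${MY_AWESOME_REPO:=my-awesome-repo}"

cd "$(mktemp -d)"   # scratch directory; run in your repo root instead

# Hypothetical one-line description; write your own.
cat > README.md <<EOF
# ${MY_AWESOME_REPO}

A short description of what this project is for.
EOF

cat README.md
```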
As your project grows, folk should be able to understand what your project is about and how to get started using it by reading the README.
If a reader needs to dig around in other files to understand the basics of how to use your project, then those files should be linked to in the README.
In a world of growing complexity, the simplest thing you can do to make your project more accessible is having a banger README.
Add a .gitignore File
"One man's trash is still a bad idea to commit." - ancient proverb
Certain files should never be included in source control, such as temporary files, secrets, or build assets.
Luckily, git consults a special .gitignore file which provides a list of files/patterns to ignore.
Projects will often have their own unique requirements and special cases to ignore. Typically, I start off new projects with this simple template, then build the list over time:
# common
lib/
tmp/
temp/
# neovim
*.swp
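To check that the patterns behave the way you expect, git check-ignore reports which rule (if any) matches a path. The sketch below recreates the template in a scratch repo; the file names are made up for illustration.

```shell
# Scratch repository so the experiment does not touch a real project.
cd "$(mktemp -d)" && git init -q

# The same template as above.
printf '# common\nlib/\ntmp/\ntemp/\n# neovim\n*.swp\n' > .gitignore

# Hypothetical files: two that should be ignored, one that should not.
mkdir -p tmp src
touch tmp/cache.bin src/main.c src/.main.c.swp

# -v prints the .gitignore line responsible for each ignored path.
git check-ignore -v tmp/cache.bin src/.main.c.swp

# A non-zero exit status means the path is not ignored.
git check-ignore -q src/main.c || echo "src/main.c is not ignored"
```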
Add a CONTRIBUTING Guide
"How can I help?" - a blessed soul
Folk love to contribute! Community contributions are the bedrock upon which open source projects rest.
Standards for contribution are typically kept in a special CONTRIBUTING.md file.
This is the first place members of the community will look if they want to modify some aspect of your repo.
Feel free to copy the standard template I use - and modify it to fit your needs.
If you expect your project to grow in the number of contributors, then please consider making a code of conduct! Humans are complex; it's always a good idea to have a shared standard for how we treat one another when emotions run high.
Create Remote Repo (GitHub)
"git != GitHub" - every git workshop ever
At some point, you will want others to be able to view your project (and to have an online backup). Online repository hosting services are plentiful and free to use, so there's no reason you shouldn't back up an open source project online.
GitHub is a popular hosting service, and the one we will be using for this guide.
Create an online repo on GitHub to host your code now.
(I'm assuming you're using the same name you set previously in your $MY_AWESOME_REPO env var.)
Authentication from the git CLI to GitHub can be done in a variety of ways.
SSH is a standard, secure way to attest your identity and is preferred over HTTPS authentication in industry.
This guide uses SSH authentication. To set up SSH locally:
- Generate a new SSH key
- Start an ssh-agent session to bind keys to
- Add your SSH key to your session
Replace zsh with bash or whatever shell you're using!
ssh-keygen -t ed25519 -C "your_email@example.com"
ssh-agent zsh
ssh-add
Next, add your public key to your GitHub profile, then you're all set!
Assuming you've committed your previous changes, it is now time to set your remote and push!
git remote add origin git@github.com:${GIT_USERNAME}/${MY_AWESOME_REPO}.git
git push origin main
Now, navigate back to the online repo you created on GitHub to see your code!
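If the push is rejected or goes somewhere unexpected, it helps to confirm what origin actually points at. Here is a small sketch in a scratch repo; the fallback username and repo name are placeholders, not real accounts.

```shell
# Fall back to placeholder values if the variables from the start of the guide are unset.
: "${GIT_USERNAME:=your-username}"
: "${MY_AWESOME_REPO:=my-awesome-repo}"

# Scratch repository; in practice you would already be inside your project.
cd "$(mktemp -d)" && git init -q

git remote add origin "git@github.com:${GIT_USERNAME}/${MY_AWESOME_REPO}.git"

# Print the URL that pushes to origin will use.
git remote get-url origin
```

On the first push you may also want git push -u origin main, which records origin/main as the upstream so that later plain git push and git pull calls know where to go.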
Branch Protection Rules
"YOLO Considered Harmful" - Gen Z Dijkstra
Branch protection rules enforce specific methods engineers may use to modify important branches (such as main).
Enforcement of simple contribution rules is an excellent way to help keep code quality high and foster a collaborative development environment. Without branch protection rules, merging can become a frightening, error-prone process.
Set up a branch protection rule on main now.
I recommend two options to always be enabled for rules on main branches:
- Require a pull request before merging
- Require status checks to pass before merging
Pull requests (PRs) allow for splitting changes into related chunks and putting them up for review (by yourself and others).
Protecting main by requiring pull requests is the most effective method to avoid merge thrashing among a set of collaborators.
Batching changes into discrete units helps engineers organize their workflows and have greater visibility into how a repo changes over time. History of which pull requests were merged when is an essential asset which pays dividends for posterity.
Please familiarize yourself with how to create a pull request and merge a pull request.
Every engineering team in my industry experience (and a majority of open source projects) uses a pull request process. Thus, knowledge of how to make and review pull requests is prudent.
Trunk-based development is the de facto industry standard of how to collaborate using git.
Every engineer should know and follow trunk based development practices, whether they are developing alone or on a team.
Decide on the merging strategy you want to use for your project. Several methods of merging exist, with their own pros and cons. Take some time to familiarize yourself with the options available and understand their impact.
When working alone, I prefer to rebase branches locally then use GitHub's Rebase and merge.
Use of Rebase and merge maintains a linear git history and results in more "green squares" in your contribution history. n____n
When working in an enterprise setting, I prefer to rebase branches locally then use GitHub's Squash and merge.
Squash commits are nice to reduce the commit log noise (i.e., main will have thousands of commits instead of tens of thousands of commits).
Rebasing locally first will help maintain a linear git history.
The curious may consult further reading on linear history (my preferred approach), as there exist pros and cons to this strategy.
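To make "rebase locally" concrete, here is a self-contained sketch in a scratch repo. The branch name feature, the file names, and the committer identity are all made up for the demonstration; only the sequence of git commands matters.

```shell
# Scratch repository with a throwaway identity.
cd "$(mktemp -d)" && git init -q
git checkout -q -b main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > app.txt && git add app.txt && git commit -q -m "Initial commit"

# Start a feature branch and commit on it.
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Add feature"

# Meanwhile, main moves ahead.
git checkout -q main
echo "v2" > app.txt && git commit -q -am "Update app"

# Replay the feature commit on top of the latest main.
git checkout -q feature
git rebase -q main

# The graph is a straight line: feature sits directly on top of main.
git log --oneline --graph
```

Against a real remote, the equivalent is usually git fetch origin followed by git rebase origin/main. If the branch was already pushed, the rebased version needs git push --force-with-lease, since its commits have been rewritten.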
Status checks for pull requests are usually defined via continuous integration pipelines, which we will explore next.
Scaffold Continuous Integration (CI)
Continuous integration (CI) is a standard development practice which contributes to the health and maintainability of software projects.
Here, we will set up a minimalistic scaffolding for a CI pipeline and use it as a status check on PRs. If the CI pipeline does not pass on a PR, then the PR should not be merged.
CI pipelines often build, lint, and test changes before passing.
However, in the absence of code, we will create a simple pipeline which only performs a checkout of the repo.
We will use GitHub Actions to create a CI pipeline. Please create a pipeline file:
mkdir -p .github/workflows
touch .github/workflows/ci.yml
Next, copy the following into the new file.
name: CI
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2
Commit, push, and navigate to the Actions tab in your GitHub repository to see the pipeline run.
As you add code and processes to build, lint, and test that code, you should update your CI pipe to reflect those processes. Ensure the CI pipeline enforces the standards you want to see in your codebase!
You should add a status badge to your README!
Everyone likes seeing a passing CI status badge linked to the workflow.
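GitHub serves a badge image for each workflow at the workflow's URL plus /badge.svg. The sketch below builds the markdown line for the ci.yml workflow from this guide and appends it to a README in a scratch directory; the fallback username and repo name are placeholders.

```shell
# Fall back to placeholder values if the variables from the start of the guide are unset.
: "${GIT_USERNAME:=your-username}"
: "${MY_AWESOME_REPO:=my-awesome-repo}"

# URL of the workflow file created earlier in this guide.
WORKFLOW_URL="https://github.com/${GIT_USERNAME}/${MY_AWESOME_REPO}/actions/workflows/ci.yml"

cd "$(mktemp -d)"   # scratch directory; run in your repo root instead

# Single quotes keep the "!" literal in interactive shells.
printf '[![CI](%s/badge.svg)](%s)\n' "${WORKFLOW_URL}" "${WORKFLOW_URL}" >> README.md

cat README.md
```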
Decide on a Release Strategy (CD)
"SHIP IT" - gangster squirrel
Continuous delivery (CD) is a method for shipping your project once it is ready for release.
For example, this website is deployed using this CD pipeline manifest.
Every time I push to a branch whose name matches the release/* pattern, a new version of the spacekatt-io website is automagically built, tested, and published.
Release Flow is my preferred release strategy. (I'll leave it to the reader to decide if it counts as "CD".) Other release strategies exist, and one should consider what their release strategy should be from a project's inception.
As CD requires shippable code, setting up a CD pipeline is beyond the scope of this guide. However, please do create a GitHub Action manifest for a CD pipeline as soon as you start deploying code.
How you release a project will often depend on whether your project is a library or an application.
Libraries are often published to a package manager (using semantic versioning), such as NPM for Node.js packages. Applications are usually deployed to the cloud (e.g., on Azure, GCP, or AWS) over a set infrastructure defined as a declarative manifest (e.g., Terraform or Bicep to implement infrastructure as code).
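For libraries that follow semantic versioning, a release is commonly marked in git with an annotated tag. Here is a minimal sketch in a scratch repo; the version number, tag message, and committer identity are hypothetical.

```shell
# Scratch repository with a throwaway identity.
cd "$(mktemp -d)" && git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

# Annotated tag following the MAJOR.MINOR.PATCH scheme.
git tag -a v1.0.0 -m "First stable release"

# List version tags and show which one the current commit corresponds to.
git tag --list 'v*'
git describe --tags
```

Tags are not pushed by default; git push origin v1.0.0 publishes one, and a CD pipeline can then be triggered by the tag.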
Further Considerations
"Oh god, there's more?!" - the reader
Engineering fundamentals are essential factors to keep in mind while developing any project. My favorite guide to standard engineering practices is the CSE Engineering Fundamentals Checklist.
To start off your engineering fundamentals journey, please consider the following:
- Linting
  - Linting is an automated process which enforces a standard coding style and format
  - Linters make your code look pretty!
  - Linters reduce the amount of time you waste formatting your code.
  - Engineers bickering about formatting on PRs is a waste of everyone's time; linters eliminate the need for any formatting bikeshedding
  - Consult language-specific style guides to help you set linter configs
- Testing
  - Testing helps ensure your code does what you say it does!
  - Testing helps prevent regression of functionality as changes accumulate
  - Automated unit testing should be included in CI pipelines. PRs should not be merged unless all the tests pass
  - Everyone loves seeing projects with high unit test code coverage
- Documentation
  - Document design/architecture decisions so folk know what the heck you are doing!
  - I like keeping docs in a docs/ directory
  - The README should point folk to your documentation
- Onboarding Guides
  - If you expect the group of people who are developing a project to grow, then you should create a guide to onboard those folk to the project
  - Well written onboarding docs help save time and make your repo more accessible for contributions
Linting and testing are essential components of any runnable project; they should be among the very first things you set up once you have running code.
Remember, every project has a different set of requirements. As you build out your project, research what practices are idiomatic for the domain/language you are developing in.
Conclusion
Standard components and practices introduced in this guide are language-agnostic and apply broadly to version control repositories. Engineers across industry expect (by matter of experience and practice) to see all of the elements described above.
After following this guide, your initial repo should look like the sample repo and have the following directory structure:
.
├── .github
│   └── workflows
│       └── ci.yml
├── .gitignore
├── CONTRIBUTING.md
├── LICENSE.md
└── README.md
Thank you for investing in your learning by taking the time to familiarize yourself with git repository best practices.