Brief Overview of Git


TL;DR

Create a new Git repository:

$ git init

or copy a Git repository:

$ git clone <URL>

Submit changes:

$ git add <path> [<path> ...]
$ git commit
$ git push

Get and apply changes:

$ git pull

Git is a distributed version control system (DVCS) designed to handle everything from small to very large projects with speed and efficiency. It is a widely adopted version control system (VCS), and the tool we have chosen within SNO+ for all code development, with hosting on GitHub. If you are unfamiliar with version control systems, refer to What is Version Control?.

Git has detailed documentation and a free book (Pro Git), available online in many formats. Andy Mastbaum has written a fantastic overview of Git for SNO+ (SNO+-doc-1462), which is suggested reading; otherwise, use this page for reference.

Your identity

Git has many configuration options, and you are welcome to tweak your setup as you see fit. The first thing you should do when using Git is to set your identity; it is important to know who committed and pushed the changes to the codebase.

$ git config --global user.name "John Doe"
$ git config --global user.email j.doe@lancaster.ac.uk
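
The same settings can also be made for a single repository by leaving out --global; a value set this way is stored in that repository only and takes precedence over the global one. The following is a minimal sketch in a scratch repository; the name and address are placeholders.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository for the demonstration
git init -q "$repo"
cd "$repo"

# Without --global, the setting is written to this repository's .git/config.
git config user.name "Jane Doe"               # placeholder identity
git config user.email "j.doe@example.org"     # placeholder address

# Read the values back to check what Git will record in commits made here.
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "$name <$email>"
```

This is handy if you contribute to one project under a different email address from the rest of your work.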

Creating a Git Repository

To make your current working directory into a Git repository, you only need to run,

$ git init

which creates the Git directory, .git, where all the metadata and the object database for your project reside. Do not delete this directory, or change anything within it, unless you know what you are doing. It also creates and sets the current branch to master.

If you have a repository already set up somewhere else (another computer, GitHub, etc.) that you would like a copy of, you clone it,

$ git clone <URL>

which automatically copies the repository at URL, checks out its master branch, and sets up URL as the remote repository origin; for example,

$ git clone git@github.com:snoplus/rat.git

will make a copy (clone / local fork) of the SNO+ RAT repository.
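
To see what a clone sets up without needing network access, the sketch below clones a small local repository; a filesystem path is used in place of the URL. The directory names source and copy are made up for the illustration.

```shell
set -eu
work=$(mktemp -d)                 # scratch directory
cd "$work"

# A stand-in for the remote: a small repository with one commit.
git init -q source
git -C source symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git -C source config user.name "Jane Doe"           # placeholder identity
git -C source config user.email "j.doe@example.org"
echo "hello" > source/README
git -C source add README
git -C source commit -q -m "Add README"

# Clone it; the history and the working files are both copied.
git clone -q source copy

# The clone remembers where it came from as the remote named origin.
git -C copy remote -v
origin_url=$(git -C copy remote get-url origin)
git -C copy log --oneline
```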

Three States

It is important to know that Git has three main states that your files can reside in: committed, modified, and staged.

  • committed

    The data is safely stored in your local database

  • modified

    You have changed the file, but have not committed it to your database yet

  • staged

    You have marked a modified file in its current version to be in the next commit (snapshot)

The working directory is a single snapshot from your database. When you check out a snapshot, the files are pulled out of the compressed database in the Git directory and placed on disk for use and modification.

The staging area is a file within your Git directory (.git/index); it stores information about what will go into your next commit.

A simple workflow would be:

  1. You modify your files in your working directory

    $ vim changed-file.cxx
    
  2. Stage the files that you have modified, which adds the files to the staging area

    $ git add changed-file.cxx
    
  3. Commit the changes, which takes the files as they were when staged, and stores that snapshot permanently in your database

    $ git commit -m "Fix issue #42"
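
The three states can be watched as a file moves through the workflow above. This is a minimal sketch in a scratch repository; git status --short prints one line per file, with the first column describing the staging area and the second the working directory.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"

echo "int main() { return 0; }" > changed-file.cxx
git add changed-file.cxx
git commit -q -m "Add changed-file.cxx"
committed=$(git status --short)   # empty: everything is committed

echo "// a comment" >> changed-file.cxx
modified=$(git status --short)    # " M changed-file.cxx": modified, not staged

git add changed-file.cxx
staged=$(git status --short)      # "M  changed-file.cxx": staged for the next commit

git commit -q -m "Comment changed-file.cxx"
printf 'committed: [%s]\nmodified:  [%s]\nstaged:    [%s]\n' \
    "$committed" "$modified" "$staged"
```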
    

Just running git commit will open your editor for a commit message. By default, this is the editor set using the EDITOR environment variable. If you wish to override this editor for Git only, you can set it via:

$ git config --global core.editor <editor-command>

Please write good commit messages; each should be a record of what was changed and why. How to Write a Git Commit Message is a blog post which summarises the preferred method of writing commit messages.

In short, treat the commit message as an email, with the first line being the subject:

  1. Separate subject from body with a blank line
  2. Limit the subject line to 50 characters
  3. Capitalise the subject line
  4. Do not end the subject line with a period
  5. Use the imperative mood in the subject line
  6. Wrap the body at 72 characters
  7. Use the body to explain what and why rather than how
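
As an illustration of these rules, the sketch below commits with a message that follows them. The file and the message text are invented for the example; -F - simply makes git commit read the message from standard input instead of opening an editor.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"

echo "placeholder" > lookup.cxx   # hypothetical file
git add lookup.cxx

# Subject: short, capitalised, imperative, no trailing period.
# Body: separated by a blank line, wrapped at 72 characters, says what and why.
git commit -q -F - <<'EOF'
Fix off-by-one error in bin lookup

The lookup previously excluded the final bin, so values on the upper
edge were dropped. Include the upper edge so that the lookup matches
the documented range.
EOF

subject=$(git log -1 --format=%s)
echo "${#subject} characters: $subject"
```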

Status and Changes

  • git status

    Shows a summary of what files have been added, modified, staged, and any untracked files (usually new files). It even gives the relevant commands to add and remove files from the staging area, in case you have forgotten.

  • git diff [--staged] [<ref>] [<path> ...]

    Shows the changes in your working directory relative to ref. If ref is omitted, the working directory is compared with the staging area, so only the unstaged changes are shown. If --staged is given, it shows the staged changes relative to ref (HEAD if omitted) instead. If paths are given, only the changes to those paths are returned.

  • git diff <ref1> <ref2> [<path> ...]

    Shows a summary of all the changes between ref1 and ref2. If paths are given, only the changes to those paths are returned.

  • git log

    Shows the commit history of the repository

Here, refs are reference pointers to commits: a hash, a tag name, a branch name, or a special identifier such as HEAD.
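
The different forms of git diff are easiest to tell apart when a file has both staged and unstaged changes. The sketch below sets that up in a scratch repository and picks out the added lines from each diff; the file name notes.txt is made up.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"

printf 'one\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

printf 'two\n' >> notes.txt
git add notes.txt                 # "two" is now staged
printf 'three\n' >> notes.txt     # "three" is modified but not staged

# Added lines start with a single "+"; the "+++" header line is skipped.
unstaged=$(git diff | grep '^+[a-z]')            # working directory vs staging area
staged=$(git diff --staged | grep '^+[a-z]')     # staging area vs HEAD
both=$(git diff HEAD | grep -c '^+[a-z]')        # working directory vs HEAD

echo "unstaged: $unstaged, staged: $staged, lines added since HEAD: $both"
```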

Undoing Things

A common mistake is to forget a file, or to have a typo, in your most recent commit; you can correct the last commit with the --amend option. Just make the relevant changes, stage them, and:

$ git commit --amend

However, if you have already pushed your commits to a server, it is suggested that you do not edit the history in any way. There is an advanced command (which will not be covered here), git rebase, which gives the user the ability to change anything in the history. Do not use this, unless you are knowledgeable, and are certain the changes you make will not affect any commits but the ones purely local to your version.
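
A minimal sketch of amending a local, unpushed commit to include a forgotten file follows. The file names are invented, and --no-edit keeps the existing commit message rather than opening the editor. Note that the amended commit gets a new hash, which is why this should be avoided once the commit has been pushed.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"

echo "main" > main.cxx
echo "helper" > helper.cxx
git add main.cxx
git commit -q -m "Add main and helper"   # helper.cxx was forgotten
before=$(git rev-parse HEAD)

git add helper.cxx
git commit -q --amend --no-edit          # fold the file into the last commit
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)       # still a single commit
files=$(git ls-tree -r --name-only HEAD) # both files are in it
echo "$count commit containing:"; echo "$files"
```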

If you have staged a file by mistake, you can remove it with a quick:

$ git reset HEAD <file>

If you have modified a file and decide that was not what you wanted, you can just check out the file again; however, note that the changes you made will be lost, so make sure you want to do this.

$ git checkout -- <file>

Anything that has been committed in Git can almost always be recovered, even if you believe you have lost it; however, if you lose something you haven’t committed, it is lost forever — commit early, commit often!
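
The two undo commands above can be tried safely in a scratch repository. In this sketch (the file name is made up) an unwanted edit is first unstaged, which keeps the edit in the working directory, and then discarded.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"

echo "original" > data.txt
git add data.txt
git commit -q -m "Add data"

echo "mistake" >> data.txt
git add data.txt                       # staged by mistake

git reset -q HEAD data.txt             # unstage; the edit is still on disk
after_reset=$(git status --short)      # " M data.txt"

git checkout -- data.txt               # discard the edit for good
after_checkout=$(git status --short)   # empty
content=$(cat data.txt)                # back to the committed version
echo "$content"
```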

Working with Remote Repositories

Remote repositories are versions of your project hosted somewhere else; usually these are your fork of the project hosted on some service like GitHub, the original/upstream project, or a collaborator’s version that you may wish to use something from.

We use remotes a lot in SNO+; it is the way we update our code to the upstream version, or submit our changes for others to use. In SNO+ a remote named origin usually refers to our own fork on GitHub, whereas upstream usually refers to the SNO+ version on GitHub.

To list what remotes you currently have set up, use git remote; you might find git remote -v more useful, as it lists the URLs of the remote repositories as well.

Adding Remotes

Adding a remote is fairly simple, providing you know the URL of the repository.

$ git remote add <remote-name> <url>

For example, to add the upstream development version of SNO+ RAT to your local version:

$ git remote add upstream git@github.com:snoplus/rat.git

Fetching and Pulling from Remotes

To get the data from a remote you need to fetch it,

$ git fetch [--all | <remote-name>]

if remote-name is omitted, then the remote tracked by the current branch is implied (origin if there is none); if --all is given, all remotes are fetched. Once you have the data fetched, you can then merge it into your repository.

This is a common practice, and if you have a local branch tracking a remote branch, you can simply run:

$ git pull

Which is shorthand for:

$ git fetch && git merge FETCH_HEAD
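
The two steps can be followed separately in the sketch below, which uses a local repository as a stand-in for the remote so that no network access is needed. The directory names are made up; after the clone, a new commit is added to the stand-in remote and then fetched and merged.

```shell
set -eu
work=$(mktemp -d)                 # scratch directory
cd "$work"

# Stand-in for the hosted repository.
git init -q hosted
git -C hosted symbolic-ref HEAD refs/heads/master
git -C hosted config user.name "Jane Doe"          # placeholder identity
git -C hosted config user.email "j.doe@example.org"
echo "one" > hosted/file.txt
git -C hosted add file.txt
git -C hosted commit -q -m "Add file"

git clone -q hosted mine          # master in the clone tracks origin/master

# Somebody adds a commit to the hosted repository after the clone.
echo "two" >> hosted/file.txt
git -C hosted commit -q -a -m "Extend file"

cd mine
git fetch -q origin                                  # download only; files untouched
behind=$(git rev-list --count HEAD..origin/master)   # commits not yet merged
git merge -q origin/master                           # bring them into master
echo "merged $behind new commit(s)"
```

Running git pull in place of the last fetch and merge would give the same end result here.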

Pushing to Remotes

$ git push [<remote-name>] [<branch-name>]
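
As a sketch of pushing, the example below uses a local bare repository (one without working files) as a stand-in for a hosted one. The -u option of git push additionally records the pushed branch as the upstream of the local branch, so that later pushes and pulls need no arguments. All names are made up.

```shell
set -eu
work=$(mktemp -d)                 # scratch directory
cd "$work"
git init -q --bare hosted.git     # stand-in for a repository on a server

git init -q local
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"
echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"

git remote add origin "$work/hosted.git"
git push -q -u origin master      # first push: name the remote and the branch

echo "two" >> file.txt
git commit -q -a -m "Extend file"
git push -q                       # later pushes use the recorded upstream

pushed=$(git -C "$work/hosted.git" rev-parse master)
echo "hosted master is at $pushed"
```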

Renaming and Removing Remotes

Sometimes you want to rename the remote,

$ git remote rename <old-remote-name> <new-remote-name>

this is suggested if you are using snoing to install rat-dev, as it sets origin to the SNO+ version, rather than your fork:

$ git remote rename origin upstream
$ git remote add origin git@github.com:USERNAME/rat.git

Once you are done with a remote, you can delete it, with either the short or long form:

$ git remote rm <remote-name>
$ git remote remove <remote-name>
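
The remote commands only edit the local configuration, so they can be tried in a scratch repository without contacting any server. A minimal sketch, with USERNAME left as the same placeholder used above:

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q

git remote add origin git@github.com:snoplus/rat.git
git remote rename origin upstream
git remote add origin git@github.com:USERNAME/rat.git   # USERNAME is a placeholder
git remote -v                     # lists origin and upstream with their URLs

upstream_url=$(git remote get-url upstream)

git remote remove origin
remaining=$(git remote)           # only upstream is left
echo "remaining remote: $remaining"
```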

Hash

You should be aware that everything is check-summed using a SHA-1 hash in your database; thus everything in Git can be referenced with a 40-character string of hexadecimal characters (0-9, a-f). It is unlikely that you will need all 40 characters for a unique reference; usually 7 characters is more than sufficient, e.g., using d0bc7a2 rather than d0bc7a204e36b47e8855d0f1e511c8c97d259323.
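
Git can produce the abbreviated form for you. In the sketch below, git rev-parse prints the full hash of a commit, --short prints a unique abbreviation, and the abbreviation resolves back to the same commit.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"
echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"

full=$(git rev-parse HEAD)            # full hash of the latest commit
short=$(git rev-parse --short HEAD)   # shortest unique prefix (at least 7 characters)
resolved=$(git rev-parse "$short")    # the prefix names the same commit
echo "$short -> $resolved"
```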

Tagging

Tags are a useful tool for referring to a specific point in the history of your repository; commonly used for versioning. A tag can be used like any other reference, e.g., be checked out,

$ git checkout 5.3.2

however, it is usually better to check out the tag into a new branch.

  • git tag lists available tags
  • git tag -a <tag> [<ref>] will create a tag, tag, for ref, or HEAD if ref is omitted
  • git show <tag> will print a summary of the tag, tag

Tags are not automatically shared when pushing, so you need to pass the --tags option, e.g.,

$ git push origin --tags
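
The sketch below creates an annotated tag and then starts a branch from it, so that any further commits have a branch to live on. The version number follows the example above; the branch name and tag message are made up, and -m supplies the tag message without opening the editor.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"
git tag -a 5.3.2 -m "Version 5.3.2"     # tag the current commit (HEAD)

echo "two" >> file.txt
git commit -q -a -m "Extend file"       # development continues after the tag

tags=$(git tag)                         # list the available tags
git checkout -q -b fix-5.3.2 5.3.2      # new branch starting at the tagged commit
current=$(git rev-parse --abbrev-ref HEAD)
echo "on $current, file has $(wc -l < file.txt) line(s)"
```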

Branches

A lot of other VCSs store information as a list of file-based changes over time (diffs), whereas Git stores information as a list of snapshots of the state of the files, without storing unchanged files again. By treating data like this, Git is like a mini filesystem; this gives rise to cheap and simple branching and merging.

The term ‘branch’ is used as you can think of the Git commit history as an ancestry tree; each commit is a node, and a branch would indicate a divergence in the code.

In terms of the trivial local version control system, you would have one directory filled with timestamped subdirectory copies of different versions of the codebase (commits); the parent directory to these timestamped subdirectories would be akin to a branch.

Let’s say you want to make a new feature, which would involve a lot of refactoring; you want to keep the code in its current state, as you want to continue developing it. You would make a copy of that parent directory and continue editing both simultaneously; one master directory (branch), and another feature directory (branch).

This analogy should give you a feel of what branches are; another copy of your code for parallel development. In Git they are a little different to the above explanation, but are akin to it.

You start with a master branch, but wish to make some feature, whilst keeping the working version you have on master,

$ git branch <new-branch> [<ref>]

will create a new branch named new-branch, based off of ref, or HEAD if ref is omitted,

$ git checkout -b <new-branch> [<ref>]

would also change to the new branch, and is equivalent to:

$ git branch <new-branch> [<ref>]
$ git checkout <new-branch>

  • git branch will list all local branches
  • git branch -d <branch> will delete a local branch, branch, if it has been merged
  • git branch -D <branch> will delete a local branch, branch even if it hasn’t been merged
  • git checkout <branch> will change the current branch to branch if it exists; if it doesn’t, but the remote branch origin/branch exists, then it is created and set to track origin/branch
  • git branch -u <upstream-branch> [<branch>] will set the upstream branch of branch (or current branch if omitted) to upstream-branch.
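
A minimal sketch of the branch commands in a scratch repository follows; the branch names feature and refactor are made up. The final -d succeeds because refactor holds no commits that master lacks.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Jane Doe"           # placeholder identity
git config user.email "j.doe@example.org"
echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"

git branch feature                # create a branch, but stay on master
git checkout -q -b refactor       # create another branch and switch to it
git branch                        # lists feature, master, and refactor (current, starred)
current=$(git rev-parse --abbrev-ref HEAD)

git checkout -q master
git branch -d refactor            # delete the merged branch
branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "$branches"
```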

Merging

Once you have decided your feature is completed and you want to merge those changes into your master branch,

$ git checkout master
$ git merge <branch-name>

this also works with remote branches.
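
The sketch below merges a feature branch into master after both have gained a commit, in which case Git records a merge commit with two parents. File and branch names are made up, and --no-edit accepts the default merge message without opening the editor.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"
echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q master
echo "master work" > master.txt   # master moves on in the meantime
git add master.txt
git commit -q -m "Add master work"

git merge -q --no-edit feature    # combine the two lines of development
parents=$(git log -1 --format=%P | wc -w)   # a merge commit has two parents
ls
```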

Conflicts

Development occurs fast; you should stay up-to-date and merge in upstream changes often to ensure there are minimal conflicts and incompatible changes.

$ git fetch upstream
$ git merge upstream/master

It is still possible that you will get the occasional conflict, even if you try to stay as up-to-date as you can.

When merging, Git will inform you there is a conflict, and where. This occurs when the same part of the same file has been changed differently in both branches. git status will give you more information, and instructions on what to do.

The merge keeps both versions; navigating to the file, you’ll see the conflicting code in the form:

<<<<<<<
version in the current branch (yours)
=======
version in the merging branch (theirs)
>>>>>>>

As shown, between <<<<<<< and ======= is the code in your branch, the current branch; whereas the code between ======= and >>>>>>> is the code in their branch, the branch you are merging into the current branch.

Edit this block of code (removing the <<<<<<<, =======, and >>>>>>> lines) to reflect the resolution that is correct; then stage the file and commit.
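
A conflict can be provoked and resolved in a scratch repository to see the whole cycle. In this sketch two branches change the same line of a made-up file; the resolution chosen (keeping the feature branch's value) is arbitrary. The merge command is expected to fail, hence the || true.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"
printf 'colour = red\n' > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git checkout -q -b feature
printf 'colour = blue\n' > settings.txt
git commit -q -a -m "Use blue"

git checkout -q master
printf 'colour = green\n' > settings.txt
git commit -q -a -m "Use green"

git merge feature || true                       # stops: conflict in settings.txt
conflicted=$(git status --short)                # "UU settings.txt"
markers=$(grep -c '^<<<<<<<' settings.txt)      # one conflicting block
cat settings.txt                                # both versions, between the markers

printf 'colour = blue\n' > settings.txt         # resolve by hand, markers removed
git add settings.txt
git commit -q -m "Merge feature, keeping blue"  # concludes the merge
```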

Squashing

An untidy commit history can be hard to follow. If you have lots of small, trivial changes, you can commit them together as one squashed commit, which combines your changes and gives you an opportunity to give a meaningful commit message, summarising all the changes.

To merge your squashed changes on top of the current upstream master, in its own branch:

$ git fetch upstream
$ git checkout -b feature-squash upstream/master
$ git merge --squash feature
$ git commit
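
The effect can be checked in a scratch repository. In this sketch the local master branch stands in for upstream/master, and three small commits on a made-up feature branch end up as a single commit on feature-squash with the same file contents.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Doe"   # placeholder identity
git config user.email "j.doe@example.org"
echo "base" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

git checkout -q -b feature
for word in one two three; do     # three small, trivial commits
    echo "$word" >> notes.txt
    git commit -q -a -m "Add $word"
done

# master stands in for upstream/master in this local sketch.
git checkout -q -b feature-squash master
git merge -q --squash feature     # stage the combined changes; nothing is committed yet
git commit -q -m "Extend notes with three entries"

small=$(git rev-list --count master..feature)             # 3
squashed=$(git rev-list --count master..feature-squash)   # 1
echo "$small commits squashed into $squashed"
```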