What Are VCSs?
Version control systems (VCSs) are tools used to track changes to a folder and its contents in a series of snapshots. They also maintain metadata like who created each snapshot, messages associated with each snapshot, and so on.
While other VCSs exist, Git is the de facto standard for version control.
Data Model
Git has a well-thought-out data model that supports maintaining history, branching, and collaboration.
Snapshots
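Git models the history of a collection of files and folders within some top-level directory as a series of snapshots. In Git terminology:
- File: a “blob”, which is just a sequence of bytes
- Directory: a “tree”, which maps names to blobs or other trees (so directories can contain other directories)
- Snapshot: the top-level tree being tracked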
History: Relating snapshots
In Git, a history is a directed acyclic graph (DAG) of snapshots. This means that each snapshot in Git refers to a set of “parents”, the snapshots that preceded it. Moreover, a snapshot might descend from multiple parents due to combining (merging) two parallel branches of development.
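As a rough illustration of this graph structure, the sketch below stores a small history as a mapping from each snapshot to its parents and walks the parent links. The snapshot names and the helper named ancestors are made up for this example; they are not part of Git.

```python
# Hypothetical history: each snapshot maps to the list of its parents.
# "merge" has two parents because it combines two parallel branches.
history = {
    "root": [],
    "feature": ["root"],
    "bugfix": ["root"],
    "merge": ["feature", "bugfix"],
}


def ancestors(snapshot: str) -> set[str]:
    """Return every snapshot reachable by following parent links."""
    seen: set[str] = set()
    stack = list(history[snapshot])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(history[current])
    return seen


print(sorted(ancestors("merge")))  # ['bugfix', 'feature', 'root']
print(len(history["merge"]) > 1)   # True: a merge has several parents
```

Because links only ever point from a snapshot to older snapshots, following them can never loop back, which is what makes the graph acyclic.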
Data model: As pseudocode
It is instructive to see Git’s data model written down in pseudocode. Note that Git calls a snapshot, together with its parents and metadata, a “commit”.
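// a file is a bunch of bytes
type blob = array<byte>
// a directory maps names to files and directories
type tree = map<string, tree | blob>
// a commit has parents, metadata, and the top-level tree
type commit = struct {
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}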
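To make the pseudocode concrete, here is one possible Python rendering using dataclasses. This is only a sketch: the class names mirror the pseudocode, while the author, messages, and file names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Union

Blob = bytes  # a file is just a bunch of bytes


@dataclass
class Tree:
    # Maps a name to a nested directory (Tree) or a file (Blob).
    entries: dict[str, Union["Tree", bytes]] = field(default_factory=dict)


@dataclass
class Commit:
    parents: list["Commit"]
    author: str
    message: str
    snapshot: Tree


# Hypothetical two-commit history for a tiny project.
first = Commit(
    parents=[],
    author="alice",
    message="Add greeting",
    snapshot=Tree({"hello.txt": b"hello\n"}),
)
second = Commit(
    parents=[first],
    author="alice",
    message="Add docs folder",
    snapshot=Tree({
        "hello.txt": b"hello\n",
        "docs": Tree({"readme.txt": b"Read me\n"}),
    }),
)

print(second.parents[0].message)        # Add greeting
print(sorted(second.snapshot.entries))  # ['docs', 'hello.txt']
```

Each commit carries a complete top-level tree rather than a list of changes, which matches the snapshot-based model described above.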
Objects and Content-Addressing
An “object” is a blob, tree, or commit. In Git’s data store, all objects are content-addressed by their SHA-1 hash.
type object = blob | tree | commit
objects = map<string, object>
def store(object):
    id = sha1(object)
    objects[id] = object
def load(id):
    return objects[id]
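A small runnable version of this store, restricted to blobs, might look like the following. It assumes the blob format Git uses when hashing (a header of the form "blob <size>" followed by a NUL byte, then the content); trees and commits are serialized differently and are left out, as is the compression that Git applies on disk.

```python
import hashlib

objects: dict[str, bytes] = {}


def store(data: bytes) -> str:
    """Store a blob and return its ID, hashing it the way Git hashes blobs."""
    # Git hashes a short header ("blob <size>\0") followed by the content.
    payload = b"blob " + str(len(data)).encode() + b"\0" + data
    object_id = hashlib.sha1(payload).hexdigest()
    objects[object_id] = data
    return object_id


def load(object_id: str) -> bytes:
    return objects[object_id]


first = store(b"hello\n")
second = store(b"hello\n")   # same content, so same ID and a single entry
other = store(b"goodbye\n")

print(first == second, len(objects))  # True 2
print(load(other))                    # b'goodbye\n'
print(store(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
```

Since the ID is derived from the content, storing identical content twice costs nothing extra, and any change to the content yields a different ID.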
References
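All snapshots can be identified by their SHA-1 hashes, but 40-character hexadecimal strings are inconvenient for humans to remember. Git’s solution is human-readable names for SHA-1 hashes, called “references”. References are mutable pointers to commits: unlike objects, which never change once stored, a reference can be updated to point to a new commit. For example, the master reference usually points to the latest commit in the main branch of development. “Where we currently are” in the history is itself a special reference called “HEAD”.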
references = map<string, string>
def update_reference(name, id):
    references[name] = id
def read_reference(name):
    return references[name]
def load_reference(name_or_id):
    if name_or_id in references:
        return load(references[name_or_id])
    else:
        return load(name_or_id)
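The sketch below shows why mutability matters: creating a commit moves the current branch, while other references stay where they were. It is a simplification under stated assumptions: the object IDs are made-up short strings, HEAD always names a branch (no detached HEAD), and the helpers commit and resolve are hypothetical.

```python
# Hypothetical object IDs; real ones are 40-character SHA-1 hex strings.
references: dict[str, str] = {"master": "a1b2c3"}
head = "master"  # HEAD names the branch we are currently on


def commit(new_id: str) -> None:
    """Creating a commit moves the current branch; old commits are untouched."""
    references[head] = new_id


def resolve(name: str) -> str:
    """Resolve HEAD or a branch name to a commit ID."""
    return references[head] if name == "HEAD" else references[name]


references["feature"] = resolve("HEAD")  # a new branch is just another pointer
commit("d4e5f6")                          # master moves, feature stays put

print(resolve("HEAD"))     # d4e5f6
print(resolve("feature"))  # a1b2c3
```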
Repositories
A Git repository is the data in objects and references, the two maps defined above.
Staging Area
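Creating a snapshot from the entire current state of the working directory is not always what you want. For example, imagine a scenario where you have debugging print statements added all over your code, along with a bugfix; you want to commit the bugfix while discarding all the print statements.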
Git accommodates such scenarios by allowing you to specify which modifications should be included in the next snapshot through a mechanism called the “staging area”.
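The idea can be sketched as a separate map that sits between the working directory and the history. The file names and contents below are invented, and the sketch stages whole files only, whereas Git can also stage individual changes within a file.

```python
# Hypothetical working directory: file name -> current contents.
working_dir = {
    "parser.py": "fixed off-by-one bug",
    "main.py": "print('debug')",
}
staging_area: dict[str, str] = {}
commits: list[dict[str, str]] = []


def add(filename: str) -> None:
    """Copy the file's current contents into the staging area."""
    staging_area[filename] = working_dir[filename]


def commit() -> None:
    """Snapshot the staging area, not the working directory."""
    commits.append(dict(staging_area))


add("parser.py")  # stage only the bugfix
commit()

print(commits[-1])               # {'parser.py': 'fixed off-by-one bug'}
print("main.py" in commits[-1])  # False: the debug print was never staged
```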
Command-Line Interface
Basics
- git help <command>: get help for a command
- git init: create a new Git repo, with data stored in the .git directory
- git status: show the current branch, staged changes, and unstaged or untracked files
- git add <filename>: add files to the staging area
- git commit: create a new commit
    - Write good commit messages!
    - More reasons to write good commit messages!
- git log: show a flattened log of history
- git log --all --graph --decorate: visualize history as a DAG
- git diff <filename>: show changes you made relative to the staging area
- git diff <revision> <filename>: show differences in a file between snapshots
- git checkout <revision>: update HEAD (and switch the current branch, if <revision> is a branch name)
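Conceptually, git diff <revision> <filename> compares the contents of one file in two snapshots. The sketch below imitates that with Python's difflib module on two hypothetical snapshots; it only illustrates the idea and is not how Git computes diffs internally.

```python
import difflib

# Two hypothetical snapshots: file name -> contents.
old_snapshot = {"notes.txt": "hello\nworld\n"}
new_snapshot = {"notes.txt": "hello\nthere\nworld\n"}


def diff(filename: str, old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return a unified diff of one file between two snapshots."""
    return list(difflib.unified_diff(
        old[filename].splitlines(keepends=True),
        new[filename].splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    ))


changes = diff("notes.txt", old_snapshot, new_snapshot)
print("".join(changes), end="")
```

Lines starting with "+" in the output were added in the newer snapshot, and lines starting with "-" were removed.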
Branching and Merging
- git branch: show branches
- git branch <name>: create a branch
- git checkout -b <name>: create a branch and switch to it
- git merge <revision>: merge into current branch
- git mergetool: use a fancy tool to help resolve merge conflicts
- git rebase: rebase set of patches onto a new base
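In terms of the data model, a merge either moves the current branch forward (a "fast-forward") or creates a commit with two parents. The sketch below shows only that graph-level decision, on a made-up history; it ignores file contents and conflicts entirely, and the function names are hypothetical.

```python
# Hypothetical history: commit -> parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],   # tip of master
    "D": ["B"],   # tip of feature
}


def ancestors(commit: str) -> set[str]:
    """Return the commit itself plus everything reachable through parents."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge(current: str, other: str, merge_id: str) -> str:
    """Return the commit the current branch points to after merging."""
    if other in ancestors(current):
        return current                     # already up to date
    if current in ancestors(other):
        return other                       # fast-forward: move the pointer
    parents[merge_id] = [current, other]   # otherwise record a merge commit
    return merge_id


print(merge("B", "D", "M1"))  # D  (fast-forward, no new commit)
print(merge("C", "D", "M2"))  # M2 (new commit with parents C and D)
print(parents["M2"])          # ['C', 'D']
```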
Remotes
- git remote: list remotes
- git remote add <name> <url>: add a remote
- git push <remote> <local branch>:<remote branch>: send objects to remote, and update remote reference
- git branch --set-upstream-to=<remote>/<remote branch>: set up correspondence between local and remote branch
- git fetch: retrieve objects/references from a remote
- git pull: same as git fetch; git merge
- git clone: download repository from remote
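Since a repository is just objects and references, fetching can be pictured as copying the objects you lack and recording where the remote's branches point. The sketch below uses two in-memory dictionaries as stand-in repositories with invented IDs; the remote name "origin" is the conventional default, and everything else is an assumption for illustration.

```python
# Hypothetical repositories: each has an object store and references.
remote = {
    "objects": {"a1": "first commit", "b2": "second commit"},
    "references": {"master": "b2"},
}
local = {
    "objects": {"a1": "first commit"},
    "references": {"master": "a1"},
}


def fetch(local: dict, remote: dict, name: str = "origin") -> None:
    """Copy missing objects and record where the remote's branches point."""
    for object_id, data in remote["objects"].items():
        local["objects"].setdefault(object_id, data)
    for branch, object_id in remote["references"].items():
        local["references"][f"{name}/{branch}"] = object_id


fetch(local, remote)

print(sorted(local["objects"]))              # ['a1', 'b2']
print(local["references"]["origin/master"])  # b2
print(local["references"]["master"])         # a1 (unchanged until a merge)
```

The local master reference is left alone here, which is why git pull is described as a fetch followed by a merge.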
Advanced Git
- git config: Git is highly customizable
- git clone --depth=1: shallow clone, without entire version history
- git add -p: interactive staging
- git rebase -i: interactive rebasing
- git blame: show who last edited which line
- git stash: temporarily remove modifications to working directory
- git bisect: binary search history
- .gitignore: specify intentionally untracked files to ignore
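The idea behind git bisect can be sketched as an ordinary binary search. This version assumes a strictly linear history with a known good start and a known bad end, which is simpler than what Git handles (it searches a DAG); the commit names and the is_bad check are invented for the example.

```python
from typing import Callable


def bisect(history: list[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit, assuming history[0] is good,
    history[-1] is bad, and every commit after the first bad one is bad."""
    good, bad = 0, len(history) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(history[middle]):
            bad = middle
        else:
            good = middle
    return history[bad]


# Hypothetical linear history, oldest first; the bug appeared in "c5".
history = [f"c{i}" for i in range(1, 9)]
tested: list[str] = []


def is_bad(commit: str) -> bool:
    tested.append(commit)
    return int(commit[1:]) >= 5


print(bisect(history, is_bad))  # c5
print(tested)                   # ['c4', 'c6', 'c5']
```

Only three of the eight commits had to be tested, which is the point of searching history this way instead of checking every commit in order.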