Tuesday, February 2, 2016

Git & GitHub Concepts

This article explains how Git works, what role GitHub plays, and how the two are connected. The aim is to give you, the reader, a clear enough understanding of how Git and GitHub work that the various Git commands start making sense.

1. Git

In this section you will see how Git works by itself, insulated from the rest of the world. This understanding is critical for the more general situation in which Git is used by many people, possibly working from different geographical locations.

1.1 Repository

Git is a program used for source code version management. On Mac machines it is usually available out of the box or through Apple's command line developer tools; on Ubuntu Linux it is installed by running "sudo apt-get install git", and other systems offer similar packages.

To get Git to do something useful, the first thing you have to have is a Git repository. In the standalone Git system, this is achieved by the command:

git init

You have to execute this command in the directory containing the files that you want to manage using Git. The command creates a ".git" subdirectory in that directory, and this ".git" subdirectory is the Git repository.

As an example, suppose you are a Java programmer with a directory called "workspace" that contains all your Java projects. To put all of those projects under Git version control, you would do:

cd workspace
git init
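The effect of "git init" can be checked with a minimal sketch like the one below. It assumes a throwaway directory created with mktemp, and the file Hello.java is a hypothetical stand-in for your projects; neither comes from a real workspace.

```shell
set -e                               # stop at the first failing command
demo=$(mktemp -d)                    # throwaway parent directory (assumption)
mkdir "$demo/workspace"
cd "$demo/workspace"
echo 'class Hello {}' > Hello.java   # hypothetical project file

git init                             # creates the repository
ls -a                                # the listing now includes .git
git rev-parse --git-dir              # asks Git where the repository is: .git
```

Note that at this point the repository exists but is empty: no snapshot of Hello.java has been taken yet.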

1.2 Commits & Heads

The key to understanding Git is to have a conceptual picture of what resides in the ".git" subdirectory, which is the Git repository. For the purposes of this article, the Git repository holds two types of objects: commits and heads.

A Commit object represents a snapshot of what is in the directory containing files managed by the Git repository. As the files managed by a Git repository change, new snapshots can be taken by making more Commit objects in the repository. The command to create a Commit object is:

git commit [and some options here]

Over a period of time, the Git repository will appear as below:

C1 <--- C2 <--- C3 <--- C4

Each Commit object points back to its parent, which is the snapshot taken just before it. In the picture above, C1 is the first commit and C4 is the latest.
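The chain of commits can be reproduced with a minimal sketch. It assumes a throwaway repository, a hypothetical file notes.txt and a placeholder identity; "git add" is the step that selects what goes into the next snapshot.

```shell
set -e                                   # stop at the first failing command
demo=$(mktemp -d)                        # throwaway directory (assumption)
cd "$demo"
git init -q
git config user.name "Demo User"         # placeholder identity, needed to commit
git config user.email "demo@example.com"

for n in 1 2 3 4; do
    echo "version $n" > notes.txt        # change the hypothetical file
    git add notes.txt                    # select the change for the next snapshot
    git commit -q -m "C$n"               # record the snapshot as a Commit object
done

git log --format='%s  parent: %p'        # newest first; the C1 line shows no parent
count=$(git rev-list --count HEAD)       # number of commits reachable from HEAD
echo "$count commits"
```

Each line of the log shows a commit message followed by the abbreviated ID of its parent, which is the backward arrow in the picture.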

A Head object is a reference to a Commit object. For example:

C1 <--- C2 <--- C3 <--- C4
                        |
                      master

By default, when a new repository is created, the latest Commit object is referred to by a Head object called "master" (newer Git setups may name it "main" instead). There is also a special name, HEAD, that the Git software understands. HEAD normally refers to the currently active Head object and, through it, to the currently active Commit object.

C1 <--- C2 <--- C3 <--- C4
                        |
                      master
                        |
                       HEAD

The HEAD can be changed by using the command:

git checkout [more options here]
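The relationship between master, HEAD and "git checkout" can be inspected with a minimal sketch. It assumes a throwaway repository with two commits; the head name "experiment" and the file notes.txt are hypothetical, and the first head is explicitly named "master" so that the output matches this article on any Git version.

```shell
set -e                                   # stop at the first failing command
demo=$(mktemp -d)                        # throwaway directory (assumption)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master  # name the first head "master", as in this article
git config user.name "Demo User"         # placeholder identity, needed to commit
git config user.email "demo@example.com"
for n in 1 2; do
    echo "version $n" > notes.txt
    git add notes.txt
    git commit -q -m "C$n"
done

git symbolic-ref HEAD                    # refs/heads/master: HEAD refers to the head "master"
git rev-parse master HEAD                # the same commit ID twice: both lead to C2

git checkout -q -b experiment HEAD~1     # create a head at C1 and make it the active one
git symbolic-ref --short HEAD            # experiment
git log -1 --format=%s HEAD              # C1
git log -1 --format=%s master            # C2: master itself did not move
```

The checkout changed only what HEAD refers to; the Commit objects and the master head stayed where they were.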

2. GitHub

You have so far seen a Git repository sitting in a standalone system, isolated from the rest of the world, because the concepts are easiest to see in that context. Now you move into a more real-world situation where the same Git repository is shared by many people.

2.1 Remote Repository

GitHub provides a facility to create a "remote" Git repository that will be shared by different people. To create such a repository, you will need a GitHub account (which is free for public repositories) and you will need to log in to github.com. Once you are logged in, you can create a new empty repository that can be used by many different people.

2.2 Cloning

The way to use the repository on GitHub is to create a replica of it on your local file system. The command to do this is:

git clone <remote_repo_url>

The remote_repo_url is the URL shown for the repository in the GitHub interface. Use the "https" URL provided by GitHub.

So, if you were a Java programmer in a team that maintained its projects in a directory called workspace, you would execute the following commands:

cd workspace
git clone https://github.com/xyz/def

This would create a replica of the remote Git repository on your local file system, in a new subdirectory named after the repository (here "def").
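Cloning can be tried without a GitHub account in a minimal sketch. It assumes a local "bare" repository (remote.git) as a stand-in for the GitHub repository and a hypothetical "seed" repository that puts one commit into it; "git push" is used only for that seeding and is not covered in this article. With GitHub, the https URL takes the place of the local path.

```shell
set -e                                   # stop at the first failing command
demo=$(mktemp -d)                        # throwaway directory (assumption)
cd "$demo"

# Stand-in for the GitHub repository: a local bare repository holding one commit.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master  # name the first head "master", as in this article
git config user.name "Demo User"         # placeholder identity, needed to commit
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "C1"
git push -q "$demo/remote.git" master    # put C1 into the stand-in remote

# The part that corresponds to the commands above: clone into the workspace directory.
mkdir "$demo/workspace"
cd "$demo/workspace"
git clone -q "$demo/remote.git" def      # with GitHub, the https URL goes here
cd def
git remote                               # origin: the name Git gives the source of the clone
git log --format=%s                      # C1: the history came along with the clone
```

The clone is a complete repository in its own right: it has its own ".git" subdirectory and the full commit history.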

2.3 Tracking & Remote Tracking

It is at this point that several new things come into play that start complicating the simple picture of the standalone Git repository.

Here is what happens:

The Git software runs on your system alone. So, it has to keep track of two sets of things: the state of the repository on the remote location as well as the state of the repository on the local system. For this purpose, there are two sets of objects (actually Head objects) maintained.

Here is an example:

The remote repository has the structure:

                      master
                        |
                       HEAD
                        |
C1 <--- C2 <--- C3 <--- C4

Then the clone process creates the following repository structure on the local system:

                  origin/master
                        |
                   origin/HEAD
                        |
C1 <--- C2 <--- C3 <--- C4
                        |
                      master
                        |
                       HEAD

The heads origin/master and origin/HEAD are called remote tracking branches, while master and HEAD are called tracking branches. In this context the term branch means a reference to a Commit object and is therefore synonymous with a Head object. The term has a more general meaning, which comes into play with branching and merging operations.

The remote tracking branch is a read-only view of the remote repository: it records the state the remote repository was in the last time it was contacted. To update the remote tracking branches, the Git operation is:

git fetch

This operation downloads any new commits from the remote repository to the local one and updates the remote tracking objects origin/master and origin/HEAD. It does not move the local master.
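The behaviour of "git fetch" can be observed in a minimal sketch. It assumes a local bare repository (remote.git) as a stand-in for GitHub, a hypothetical "teammate" repository that pushes commits to it, and a hypothetical helper function named commit; none of these names come from a real project.

```shell
set -e                                   # stop at the first failing command
demo=$(mktemp -d)                        # throwaway directory (assumption)
cd "$demo"
commit() {                               # hypothetical helper: one commit with message $1
    echo "$1" > notes.txt
    git add notes.txt
    git -c user.name="Demo User" -c user.email="demo@example.com" commit -q -m "$1"
}

# Stand-in for GitHub: a local bare repository whose first head is "master".
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# A teammate's repository, used here only to put commits into the remote.
git init -q teammate
cd teammate
git symbolic-ref HEAD refs/heads/master
git remote add origin "$demo/remote.git"
commit C1
git push -q origin master

cd "$demo"
git clone -q remote.git mine             # your replica: master and origin/master both at C1
(cd teammate && commit C2 && git push -q origin master)   # the remote moves on to C2

cd mine
git fetch -q                             # download new commits, update origin/master
git log -1 --format=%s origin/master     # C2
git log -1 --format=%s master            # C1: the local head is untouched
```

After the fetch, the new commit is present in the local repository, but only the remote tracking branch refers to it.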

The tracking branch, in contrast, is moved by local commits. So, typically, over a period of time, the remote tracking branches and the tracking branches come to point to different Commit objects. For example, after a fetch that brings in a new commit C5 from the remote repository:

                          origin/master
                                |
                           origin/HEAD
                                |
C1 <--- C2 <--- C3 <--- C4 <--- C5
                        |
                      master
                        |
                       HEAD
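The opposite case, where a local commit moves master while origin/master stays put, can be checked with a minimal sketch. It assumes a local bare repository (remote.git) as a stand-in for GitHub and a hypothetical local repository named "mine"; "git push" is used only to seed the stand-in remote.

```shell
set -e                                   # stop at the first failing command
demo=$(mktemp -d)                        # throwaway directory (assumption)
cd "$demo"
git init -q --bare remote.git            # stand-in for the GitHub repository
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/master  # name the first head "master", as in this article
git config user.name "Demo User"         # placeholder identity, needed to commit
git config user.email "demo@example.com"
git remote add origin "$demo/remote.git"

echo one > notes.txt
git add notes.txt
git commit -q -m "C1"
git push -q origin master                # remote and origin/master are now at C1

echo two > notes.txt
git add notes.txt
git commit -q -m "C2"                    # local commit: only master moves

# Two numbers: commits only in master, then commits only in origin/master.
ahead_behind=$(git rev-list --left-right --count master...origin/master)
echo "$ahead_behind"                     # 1 and 0, separated by a tab
```

The two counts are a compact way to see how far the tracking branch and the remote tracking branch have drifted apart in each direction.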



To be continued...