Tech Tidbits

This article provides an insight into how Git works, what role GitHub plays, and how the two are connected. The aim is to give you, the reader, a clear understanding of how Git and GitHub work so that the various Git commands start making sense.

1. Git

In this section you will look at how Git works by itself, insulated from the rest of the world. This understanding is critical once Git is used in the more general situation of many people working together, possibly from different geographical locations.

1.1 Repository

Git is a program used for source code version management. It is usually available on Mac machines; on Ubuntu Linux it is installed by running "sudo apt-get install git", and similarly on other systems.

To get Git to do something useful, the first thing you have to have is a Git repository. In the standalone Git system, this is achieved by the command:

git init

You have to execute this command in the directory containing the files that you want to manage using Git. The command creates a ".git" subdirectory in that directory, and this ".git" subdirectory is the Git repository.

As an example, suppose you are a Java programmer with a directory called "workspace" that contains all your Java projects. To put all of them under Git version control, you would do:

cd workspace
git init

1.2 Commits & Heads

The key to understanding Git is to have a conceptual picture of what resides in the ".git" subdirectory, which is the Git repository. For the purposes of this article, the Git repository holds two types of objects: commits and heads.

A Commit object represents a snapshot of what is in the directory containing files managed by the Git repository. As the files managed by a Git repository change, new snapshots can be taken by making more Commit objects in the repository. The command to create a Commit object is:

git commit [and some options here]

Over a period of time, the Git repository will appear as below:

C1 <--- C2 <--- C3 <--- C4

Each Commit object is linked to its parent, which is the snapshot taken at the previous point in time. The arrows in the diagram point from a commit to its parent.

A Head object is a reference to a Commit object. For example:

C1 <--- C2 <--- C3 <--- C4
                        |
                      master

By default, when a new repository is created, the latest Commit object is referred to by a Head object called "master" (newer Git setups and GitHub may call it "main" instead). There is also a special name, HEAD, that the Git software understands. HEAD represents the Head object that points to the currently active Commit object.

C1 <--- C2 <--- C3 <--- C4
                        |
                      master
                        |
                       HEAD

The HEAD can be changed by using the command:

git checkout [more options here]
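
The picture of commits and heads can be tried out with a small model. The sketch below is only an illustration, not how Git is implemented: the Repo class and its method names are made up for this example, and real Git stores hashed objects on disk rather than names in a dictionary.

```python
# Toy model of a Git repository: commits point to parents, heads point to commits.
# The class and method names are hypothetical; this is not Git's real data format.

class Repo:
    def __init__(self):
        self.parents = {}               # commit name -> parent commit name (or None)
        self.heads = {"master": None}   # head name -> commit name
        self.HEAD = "master"            # name of the currently active head

    def commit(self, name):
        """Take a new snapshot whose parent is the currently active commit."""
        self.parents[name] = self.heads[self.HEAD]
        self.heads[self.HEAD] = name    # the active head moves to the new commit

    def checkout(self, head_name):
        """Make another existing head the active one."""
        if head_name not in self.heads:
            raise KeyError(head_name)
        self.HEAD = head_name

    def history(self):
        """Walk the parent links from the active commit back to the first one."""
        commit, chain = self.heads[self.HEAD], []
        while commit is not None:
            chain.append(commit)
            commit = self.parents[commit]
        return chain


repo = Repo()
for name in ["C1", "C2", "C3", "C4"]:
    repo.commit(name)

print(repo.heads["master"])   # C4
print(repo.history())         # ['C4', 'C3', 'C2', 'C1']
```

Each call to commit() adds one more link to the chain and moves "master" along with it, which is why the head always ends up at the latest commit in the diagrams above.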

2. GitHub

So far you have seen a Git repository sitting in a standalone system, isolated from the rest of the world, because the basic concepts are easiest to see in that context. Now we move to a more real-world situation in which the same Git repository is shared by many people.

2.1 Remote Repository

GitHub provides a facility to create a "remote" Git repository that will be shared by different people. To create such a repository, you will need to get a GitHub account (which is free for public repositories) and log in to github.com. Once you are logged in, you can create a new empty repository that can be used by many different people.

2.2 Cloning

The way to use the repository on GitHub is to create a replica of it on your local file system. The command to do this is:

git clone <remote_repo_url>

The remote_repo_url is the URL shown for the repository in the GitHub interface. Use the "https" URL provided by GitHub.

So, if you were a Java programmer in a team that kept its projects in a directory called workspace, you would execute the following commands:

cd workspace
git clone https://github.com/xyz/def

This creates a replica of the remote Git repository on your local file system, in a new subdirectory named after the repository (here, "def").

2.3 Tracking & Remote Tracking

It is at this point that several new things come into play that start complicating the simple picture of the standalone Git repository.

Here is what happens:

The Git software runs on your system alone. So, it has to keep track of two sets of things: the state of the repository on the remote location as well as the state of the repository on the local system. For this purpose, there are two sets of objects (actually Head objects) maintained.

Here is an example:

The remote repository has the structure:

                      master
                        |
                       HEAD
                        |
C1 <--- C2 <--- C3 <--- C4

Then the clone process creates the following repository structure on the local system:

                  origin / master
                        |
                   origin / HEAD
                        |
C1 <--- C2 <--- C3 <--- C4
                        |
                      master
                        |
                       HEAD

The heads origin/master and origin/HEAD are called remote tracking branches while the master and HEAD are called tracking branches. The term branch in this context is a reference to a Commit object and therefore synonymous with a Head object. The term branch has a more general meaning which comes into play with branching and merging operations.

The remote tracking branch is a read-only view into the repository and it tracks the state of the remote repository. In order to update the remote tracking branch, the Git operation is:

git fetch

This operation gets all the commits that the local repository does not yet have from the remote one and updates the remote tracking objects origin/master and origin/HEAD.

The tracking branch is affected and modified by local commits. So, typically, over a period of time, the remote tracking branches and tracking branches point to different commit objects:

                  origin / master
                        |
                   origin / HEAD
                        |
C1 <--- C2 <--- C3 <--- C4 <--- C5
                                |
                              master
                                |
                               HEAD
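
The way the two sets of heads drift apart can be sketched with a small model. This is a simplification under stated assumptions: each repository is reduced to a dictionary of head names, the function names are made up for the example, and the commit "C6" is a hypothetical commit that a teammate has sent to the remote repository.

```python
# Toy model: a repository is reduced to a dict of head name -> commit name.
remote = {"master": "C4"}
local = {"master": "C4", "origin/master": "C4"}   # state right after "git clone"


def local_commit(local_refs, new_commit):
    """A local commit moves only the tracking branch."""
    local_refs["master"] = new_commit


def fetch(local_refs, remote_refs):
    """A fetch moves only the remote tracking branches."""
    for name, commit in remote_refs.items():
        local_refs["origin/" + name] = commit


local_commit(local, "C5")
print(local)   # {'master': 'C5', 'origin/master': 'C4'}

remote["master"] = "C6"   # hypothetical: a teammate's commit reaches the remote
fetch(local, remote)
print(local)   # {'master': 'C5', 'origin/master': 'C6'}
```

Neither operation touches the other set of heads: committing never moves origin/master, and fetching never moves master.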

To be continued...

There are many articles on using SSH, and they deal with the commands you have to run. It has been difficult, however, to find articles that give an "under the hood" look at what is happening and connect the commands to it. This article is an attempt to plug that gap.

1. The Need for SSH

Ever since the advent of the internet, the exposure of sensitive data over the public network has been a concern. With more and more systems hosted on servers, a secure mechanism for accessing data on those servers remotely is ever more important. The Secure Shell (SSH) protocol provides a solution to this problem.

2. Public and Private Keys

In order to make sense of a significant part of the subject matter of SSH, an understanding of public and private keys is essential. After all, the "secure" in SSH implies the use of these keys. Here is a simplified explanation:

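Think of a private key as a special key that can only be used to lock a box, and a public key as one that can only be used to open it. The public key is duplicated and distributed freely to anyone to whom I want to send the contents of the box. The private key is jealously guarded.

So, if there is a server to which I want to send some information, I put the information in the box and lock it with my private key. When the server receives the box, it uses my public key to open it and get access to the contents. Because only my private key could have locked the box, the server also knows that the box came from me. Note that anyone holding the public key can open such a box, so this direction proves identity rather than keeping a secret. For secrecy the roles are reversed: the box is locked with the recipient's public key and can be opened only with the recipient's private key, which is what the next section relies on.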
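
The lock-and-open relationship between the two keys can be tried out with deliberately tiny numbers. The sketch below uses textbook RSA arithmetic with two small primes chosen for this example; it is far too small to be secure and leaves out everything a real implementation needs, so treat it only as an illustration. It shows both directions: locking with the private key (which proves who locked the box) and locking with the public key (which keeps the contents secret).

```python
# Toy RSA with tiny primes. Far too small to be secure; for illustration only.
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent  -> the public key is (e, n)
d = 2753                       # private exponent -> the private key is (d, n)
assert (e * d) % phi == 1      # the two exponents undo each other


def apply_key(number, exponent):
    """'Lock' or 'open' a number (which must be smaller than n) with one key."""
    return pow(number, exponent, n)


message = 65

# Direction 1: lock with the private key, open with the public key (proves origin).
locked = apply_key(message, d)
assert apply_key(locked, e) == message

# Direction 2: lock with the public key, open with the private key (keeps a secret).
secret = apply_key(message, e)
assert apply_key(secret, d) == message

print("both directions recover", message)
```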

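3. The Encrypted Session

Before any user information or passwords are exchanged, SSH sets up an encrypted session so that all information exchanged between client and server is encrypted. The steps below follow the original version 1 of the protocol; version 2 negotiates the shared key differently (with a Diffie-Hellman exchange), but the goal is the same. Here is how it happens: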

The server has the following:
a) A 1024-bit host key pair (i.e. private and public keys)
b) A 768-bit server key pair

The client has the following:
a) A list of servers (referred to as hosts) that it has connected to in the past (in the ~/.ssh/known_hosts file)
b) A randomly generated 256-bit key

Establishing the session is as follows:
1. The server sends the host and server public keys to the client.
2. The client checks the host key against its stored list and warns in case of a discrepancy.
3. The client uses both public keys from the server to encrypt the random key it has generated.
4. The client sends the encrypted random key to the server, which decrypts it using its private keys.
5. Now both the client and server have a random key that no one else has access to.
6. All subsequent traffic between the client and server is encrypted using this key.
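
The client's side of steps 2 and 3 above can be sketched as follows. This is a simplified model, not the code of any SSH client: a dictionary stands in for the ~/.ssh/known_hosts file, the keys are placeholder byte strings, the function names are made up for the example, and a new host is recorded automatically (a real client would normally ask the user first).

```python
import hashlib
import secrets


def fingerprint(public_key_bytes):
    """Short, comparable digest of a public key."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def check_host(known_hosts, host, public_key_bytes):
    """Compare the key a server presents with the one remembered for that host."""
    presented = fingerprint(public_key_bytes)
    remembered = known_hosts.get(host)
    if remembered is None:
        known_hosts[host] = presented      # first contact: remember the key
        return "new host"
    if remembered == presented:
        return "match"
    return "WARNING: key changed"


known_hosts = {}                            # stand-in for ~/.ssh/known_hosts
print(check_host(known_hosts, "csu.dub.ef", b"host-key-A"))   # new host
print(check_host(known_hosts, "csu.dub.ef", b"host-key-A"))   # match
print(check_host(known_hosts, "csu.dub.ef", b"host-key-B"))   # WARNING: key changed

session_key = secrets.token_bytes(32)       # the client's random 256-bit key
print(len(session_key) * 8)                 # 256
```

The warning in the third call is the situation step 2 guards against: a server that presents a different key from the one seen earlier may not be the server you think it is.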

4. Authentication

4.1 Method 1: Password

Since the channel is encrypted, the password can be sent from the client to the server without concern for it being sniffed.

There are two issues with using passwords for authentication. Often, passwords are difficult to remember, especially when SSH will typically be used to connect with many servers, each having a different password. The other issue is that a keylogging program could be used to get access to a password.

For these reasons, there is another mechanism for authentication with SSH.

4.2 Method 2: RSA Keys

On most Unix-based systems there is a very useful program called ssh-keygen that generates public-private key pairs (Windows-based systems use PuTTYgen). The user (who will use the ssh client) runs ssh-keygen to generate a personal key pair.

Any host (server) that the user wants to connect to must have access to the public key that the user has generated. The server has an authorized_keys file (~/.ssh/authorized_keys) into which the user pastes their public key.

The advantage of this mechanism is that the password is never exchanged between the client and server.

5. Logging In

In this section, we are assuming that the ssh server is running on the host that you are connecting to.

5.1 Using Password Authentication

Logging in using password authentication is very simple. In a terminal window, type the following command (excluding the $ sign):

$ ssh user@host
e.g. $ ssh bob@csu.dub.ef

The <user> in the above command is a valid user on the host that you are connecting to. The system will prompt for a password. On supplying the correct password, you will be logged in to the secure shell.

5.2 Authentication Using RSA Keys

In this case, the complication is in the setup process. Once the setup is done, the command is exactly the same, except that you will not be prompted for a password:

$ ssh user@host

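If the private key is kept in a file other than the default, for example keyfile.pem, name it with the -i option (ssh -i keyfile.pem user@host). That file contains the private key for the user whose corresponding public key is installed on the server.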

Setting up for RSA Authentication:

1. Generate your personal key pair:
$ ssh-keygen -t rsa

You will be prompted for the file in which to store the key pair and for a passphrase. To keep things simple, accept the defaults by pressing "Enter". The public key will be saved in ~/.ssh/id_rsa.pub and the private key in ~/.ssh/id_rsa.

2. Copy your public key to the server

$ cat ~/.ssh/id_rsa.pub

This should print a long line that wraps in the terminal window. Copy this line to the server and paste it on a new line at the end of ~/.ssh/authorized_keys. If the file doesn't already exist, create it and then paste the public key into it.

3. At this point you are ready to connect to the server over ssh using the RSA key for authentication. Use the command below and you will be logged in without being prompted for a password.

$ ssh user@host
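
Step 2 above is usually done by hand, but the same bookkeeping can be sketched in a few lines. The function below is a hypothetical helper written for this article, not part of any SSH tool. It appends a key on its own line, skips it if it is already present, and sets restrictive permissions on the directory and file. The demo runs in a throwaway directory and uses a placeholder key, so the real ~/.ssh is not touched.

```python
import os
import tempfile


def install_public_key(ssh_dir, public_key_line):
    """Append a public key to authorized_keys inside ssh_dir, creating both if needed."""
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)                       # drwx------
    path = os.path.join(ssh_dir, "authorized_keys")
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    key = public_key_line.strip()
    if key not in content.splitlines():
        with open(path, "a") as f:
            if content and not content.endswith("\n"):
                f.write("\n")                      # make sure the key starts on a new line
            f.write(key + "\n")
    os.chmod(path, 0o600)                          # -rw-------
    return path


# Demo in a throwaway directory so the real ~/.ssh is not touched.
demo_dir = os.path.join(tempfile.mkdtemp(), ".ssh")
demo_key = "ssh-rsa AAAAexamplekeydata bob@laptop"   # placeholder, not a real key
path = install_public_key(demo_dir, demo_key)
install_public_key(demo_dir, demo_key)               # second call adds nothing
with open(path) as f:
    print(f.read().count(demo_key))                  # 1
```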

5.3 Troubleshooting: Authentication Using RSA Keys

Following the above steps may not be enough to get RSA authentication working. There are file and directory permissions that have to be set before it will work.

Server:
~ (/home/username) : Permissions: drwxr-xr-x
~/.ssh : Permissions: drwx------
~/.ssh/authorized_keys : Permissions: -rw-------

Client:
~ (/home/username) : Permissions: drwxrwxrwx (or more restrictive)
~/.ssh : Permissions: drwx------
~/.ssh/id_rsa : Permissions: -rw-------

If this still does not work, you need to work on the server for further troubleshooting:

In /etc/ssh/sshd_config ensure that the following line is uncommented:
AuthorizedKeysFile .ssh/authorized_keys

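If this does not work, look into the following log on the server to get further insight into the problem (the path below is the usual one on Debian and Ubuntu; Red Hat based systems use /var/log/secure):

/var/log/auth.log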

A typical message reported in this log that points you in the right direction is:

Authentication refused: bad ownership or modes for directory /home/user/.ssh
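
A message like this usually means that the mode bits on the named directory are more permissive than the ones listed earlier in this section. As a rough sketch, the check can be scripted: the two helper functions below are made up for this example and only look at the permission bits (not at ownership), and the demo uses a throwaway directory rather than a real ~/.ssh. The "no access for group or others" test applies to ~/.ssh and the key files; the home directory itself is allowed to be readable by others, as the listing shows.

```python
import os
import stat
import tempfile


def mode_string(path):
    """Return the permissions in the same form as 'ls -l', e.g. drwx------."""
    return stat.filemode(os.stat(path).st_mode)


def too_open(path):
    """True if group or others have any permission on the path."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 != 0


# Demo on a throwaway directory standing in for ~/.ssh.
ssh_dir = os.path.join(tempfile.mkdtemp(), ".ssh")
os.mkdir(ssh_dir)

os.chmod(ssh_dir, 0o755)
print(mode_string(ssh_dir), too_open(ssh_dir))   # drwxr-xr-x True

os.chmod(ssh_dir, 0o700)                          # the fix: chmod 700 ~/.ssh
print(mode_string(ssh_dir), too_open(ssh_dir))   # drwx------ False
```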