Discovering Hamming Codes

July 18th, 2010 admin Posted in Information, Tidbits, Tutorials | No Comments »

Digital data, transmitted over a communication medium (wireless, optical fiber, copper wire), or stored in some storage medium (such as computer memory or hard disk), is prone to bit-flips and errors. For example, if the message “10110101000101010″ means “BILL JOHN” and communication channel noise flips a bit, the message received may be “10010101000101010″, meaning, “KILL JOHN”. Now, that could create a problem. The problem also exists in data that is sitting untouched on a digital storage medium. Have you ever noticed that if you open some photo file on your computer, after years of storage, they develop strange colors and often do not display fully? This could be due to some bit errors in the stored 1s and 0s that represent the image file data.

One way to avoid (or, at least drastically reduce) this problem is to either send or store multiple copies of the information. Say you send the message “BILL JOHN” 3 times. One of the times the error converts this to “KILL JOHN”, but two of the times the message transmits successfully. Then you know that most likely the intended message was “BILL JOHN”. Similarly, your bank probably stores your account information of multiple storage locations to avoid this and other problems (such as, catastrophic data loss due to a fire).  However, redundancy is not very efficient, and may be overkill. Further, it may not be always possible to apply (say the communication takes a long time and repeat communication is not possible).

Another solution is to add a small amount of extra information to the message being communicated or stored to validate the correctness of the data. This is, in effect, redundancy, but is often a more space-efficient form of it. Its purpose is primarily to detect errors in the message data, not to handle catastrophic destruction of data (which is the primary purpose of full redundancy). The most common way to detect errors in transmitted data is by the use of parity bits. The data is first divided into blocks of bits, say 7 or 15-bit blocks. Next, for each block of bits a parity bit is added, thus making the block size 8 or 16 bits. For example, if the original data is “10110101000101010″, and the block size is 7 +1 parity bit, then the data is first extended to make it a multiple of 7. So, say the message is changed to “000010110101000101010″. Then, it is broken into 7-bit blocks – “0000101″, “1010100″, “0101010″. And finally, for each 7-bit block one more bit is added. Say, it is added to the right end. The parity bit is dependent on the data bits. The parity bit is set to a 1 or a 0 depending on the number of 1s in the message. If there are an odd number of 1s in the message, the parity bit is set to 1 to make the total parity of the 8-bit block even. I am assuming even parity in this example; the parity may also be set up to be odd. Going back to the example, if there are already an even number of 1s in the 7-bit data block the parity bit is set to 0. The recipient knows what the block size is, where the parity bit appears, and what kind of parity (Even or Odd Parity) is being used. This is part of a pre-established communication agreement. For example, “0000101″ has 2 1s, that is, an even number of 1s. The parity bit is set to 0 and appended to these 7 bits to make the transmission block, “00001010“. Upon transmission, say there is a bit flip and the message becomes “00101010″. The receiver can then figure out that the parity, which is supposed to be even is now odd, and therefore indicates an error. The receiver can then request a retransmission. Notice that the parity bit itself may be transmitted in error, and that triggers an error as well (even though in reality the data bits were all transmitted fine). The size of the block is set to be small enough so that the probability of a bit flip is very very low. More importantly, the probability of two flips is infinitesimally small. Notice that if there are two flips, however, a single bit parity scheme as described above will not work. There is then a need for a more involved parity scheme. If the channel or storage medium has a propensity for unidirectional flips (that is, say it can only flip a 1 to a 0, bit never a 0 to a 1), then a counter may be maintained along with the data to count the number of 1s in the data block. If the number of 1s changes, then the receiver can detect an error. It is a little tricky once you realize that the number of 1s counted must include the number of 1s in the counter itself! The counter bits need to be transmitted as well, and are subject to bit flips as well.

In some situations, there may not be a second copy to rerequest the data upon detecting the error. Or in case of a communication channel, it may take too long to request a replay. And regardless of the application, it seems mathematically challenging to come up with a way to send a message such that not only is the error detected, it is also corrected by the receiver. This was the challenge that Richard Hamming took up an solved in an ingenious way. Here I would like to think through the same problem and come with a solution, which will then help us think through some of Hamming’s thoughts in the 1930s and 40s.  We will stick to the single bit error scenario. We can always reduce the block size sufficiently that only a single bit error can occur with any meaningful probability.

The first insight that led to the discovery of the Hamming code seems to be that some extra meta data (just like parity) needs to be transmitted in addition to the main data. In the presence of this extra data the receiver must be able to identify a bit flip in any bit in the transferred message (including a bit flip in the meta bits).  How do we uniquely identify a bit that flipped? We need to identify the position of the bit that flipped. In a message with d bits of data and p bits of meta data, the total number of transmitted bits is d+p. To identify a unique position where the flip may have occurred, we need log2(d+p) bits. The genius of Hamming was in recognizing that the bits being transmitted should be grouped into several groups such that each bit was a member of a unique set of groups. Further, the binary representation of the bit positions was the simplest way to identify the groups.

Say the message were 10 bits long. That is, d+p=10. The 10 positions can be represented as:

0001 – message bit 1
0010 – message bit 2
0011 – message bit 3
0100 – message bit 4
0101 – message bit 5
0110 – message bit 6
0111 – message bit 7
1000 – message bit 8
1001 – message bit 9
1010 – message bit 10

Notice that we start from 0001, and not 0000. This is because we need to make sure all numbers have 1 (or 0). This means to represent 8 positions (d+p=8), we actually need 4 bits. So the number of groups = log2(d+p+1). Further, notice that the binary representation of every message-bit-position has 1s in unique power-of-2 positions. This is obvious. After all, that is how we represent the positions uniquely. But the insight Hamming was able to draw was that each of the representations (left column above) with a 1 in a certain power-of-2 position could be clubbed together into a group. That is, in the example above, message bits with a 1 in the 2^0 position could be grouped into Club 0. Message bits with a 1 in the 2^1 position could be grouped into Club 1. Notice that some of the numbers which were members of Club 1 were also members Club 0 (for example message bit 3). Similarly, Club 2 and Club 3 could be formed. Further, exactly 4 clubs were needed to identify every bit (including the meta bits uniquely).

0 | 0 | 0 | 1 | – message bit 1  – Club 0
0 | 0 | 1 |0 | – message bit 2 – Club 1
0 | 0 | 1 | 1 | – message bit 3 – Club 0,1
0 | 1 | 0 | 0 | – message bit 4 – Club 2
0 | 1 | 0 | 1 | – message bit 5 – Club 0, 2
0 | 1 | 1 | 0 | – message bit 6 – Club 1, 2
0 | 1 | 1 | 1 | – message bit 7 – Club 0, 1, 2
1 | 0 | 0 | 0 | – message bit 8 – Club 3
1 | 0 | 0 | 1 | – message bit 9 – Club 0, 3
1 | 0 | 1 | 0 | – message bit 10 – Club 1, 3

Notice that no two bit-position-representations can belong to the same set of Clubs. After all, every bit position is represented uniquely in binary, and a 1 in the binary representation corresponds to a “key” to a certain club. No two binary representations are the same and so no two bit-positions can belong to the same club combination.

The groups/clubs thus formed are:

Club 0 – message bits 1, 3, 5, 7, 9
Club 1 – message bits 2, 3, 6, 7, 10
Club 2 – message bits 4, 5, 6, 7
Club 3 – message bits 8, 9, 10

Now, Hamming must have realized that he has created clubs with careful membership such that a unique member could be located if we knew which clubs he belonged to. So, a flipping bit can be located if that flip can identify all the clubs the bit belongs to. The clubs must start with some common property, and the flip must change that property for each club that it belongs to. This would help identify the clubs in which the flipped bit has membership, and by that unique combination, we can identify the flipped bit.

That unique property that Hamming gave each club was parity. Since no two clubs has the same membership, each club needed at least 1 unique parity bit. Since we want to keep the number of parity bits as small as possible to keep the message overhead as small as possible, the smallest solution would need as many parity bits as clubs. And we know that for transmitting d+n bits, we need log2(d+p+1) bits to represent the positions and, therefore, log2(d+p+1) clubs. This leads to the following observation:

The minimum overhead bits/parity bits = log2(d+p+1). But we assumed that the number of parity bits = p.

p >= log2(d+p+1)

Now we know how many groups, which bits make up the groups and the minimum number of parity bits. In the above example with d+p=10, p=4. Thus for 6 data bits we need to transmit 4 parity bits. But we must recognize that the parity bits cannot be simply lumped together into any casually selected positions. This is because without careful selection we may not have at least 1 parity bit per group. For example if the first 4 bit positions of a 10 bit message are used for parity, there would be no parity bits in Club 3. If we instead used the last 4 bits in a 10-bit message for parity, each club would have at least one parity bit, but all 3 of Club 3’s members would be parity bits. This creates a dependency on filling out the parity bits on the sender side. First parity bit 7 would have to be established based on Club 2 which has 3 data bits and only 1 parity bit. Then the remaining parity bits in Club 0 and Club 1 would have to be established, followed finally by the parity for Club 3. Hamming realized that it would be much easier to ensure that one parity bit belonged to 1 club, no more no less. All the members of a club are data bits except the one parity. This helps speed up setting the parity bit – there is no dependency and fixed order in which the parity bits need to be established. The bit positions in the message which belong t0 only 1 club are positions with a single 1 and all else 0s. These are also the power-of-two bit positions. Bit position 1, 2, 4 and 8. By making these bits parity bits, there is another advantage. At the receiving end, once parity is established for the various clubs, if it turns out that club 2 and club 1 are not maintaining parity, then to identify the bit position which flipped all that is needed is to calculate 2^2 + 2^1 = 6. Bit at position 6 is the only bit which is a member of Club 2 and 1 and no other club.

References:
I found this video of a class, held in University of New South Wales, Australia, and taught by Professor Richard Buckland, to be very useful in helping me understand how Hamming Codes work. The Hamming Code material starts at minute 8:00. http://www.youtube.com/watch?v=kE7V7UI4jpk

Leave a Reply