Monday, September 14, 2015

Working with binary in Go (golang).

I have recently been working with raw IP packet generation in Go and came across a bit of code in the "golang.org/x/net/ipv4/header.go" file that I believe could use some explanation.  It uses a bit of binary manipulation, which some people find confusing.  The code in question is on line 71 and can be found here.  Let's look at this line and walk through it piece by piece.

b[0] = byte(Version<<4 | (hdrlen >> 2 & 0x0f))

First off, in an IPv4 header, the first byte of data is actually split into two 4-bit fields.  The first 4-bit field is the IP version and the second 4-bit field is the header length, counted in 32-bit words (RFC 791 calls it the Internet Header Length, or IHL).  The smallest header a valid IPv4 packet can have is five 32-bit words, or 20 bytes.

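Second, network byte order is big-endian, meaning the most significant byte of a multi-byte value is sent first.  Strictly speaking that describes the order of bytes, not bits, but header diagrams also write each byte with its most significant bit on the left, and that is the layout I mean by "binary big-endian" below, thus: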

128 (decimal) = 1000 0000 (binary big-endian)
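
Byte order only becomes visible with values wider than one byte.  Here is a minimal sketch using the standard encoding/binary package; the helper name bigEndian16 is made up for this example and does not appear in the ipv4 package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bigEndian16 is a hypothetical helper that lays a 16-bit value out in
// network byte order (most significant byte first).
func bigEndian16(v uint16) [2]byte {
	var b [2]byte
	binary.BigEndian.PutUint16(b[:], v)
	return b
}

func main() {
	b := bigEndian16(0x1234)
	fmt.Printf("%#x %#x\n", b[0], b[1]) // 0x12 0x34

	// A single byte has no byte order; %08b simply prints the most
	// significant bit first.
	fmt.Printf("%08b\n", 128) // 10000000
}
```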

Let's look at the first part of the code so we can understand this expression:

Version<<4

To figure this out we first need to see that on line #25 the Version constant is defined to be 4.  Thus this expression expands to 4<<4.  Shifting left by one bit doubles a value, so shifting left by 4 is 4 multiplied by 2 four times, or 4*2*2*2*2.  This gives a value of

64 in decimal or 0100 0000 in binary big-endian

Now if you were to look at just the 4 leftmost bits of that byte, you would see 0100.  Read on its own, that is 4, which is the IP version.  The point of the shift is to set the right bits in this byte, so that when you split the byte into two 4-bit blocks, the values are in the right places.
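
To see the shift in action, here is a small sketch you can run.  The helper name shiftVersion is hypothetical and only exists for this illustration.

```go
package main

import "fmt"

// shiftVersion is a hypothetical helper that moves a version number
// into the upper 4 bits of a byte.
func shiftVersion(version int) byte {
	return byte(version << 4)
}

func main() {
	v := shiftVersion(4)
	fmt.Printf("%08b = %d\n", v, v)  // 01000000 = 64
	fmt.Printf("%04b = %d\n", v>>4, v>>4) // 0100 = 4, the upper 4 bits on their own
}
```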

Now let's look at the second part of the expression, which is actually two operations:

(hdrlen >> 2 & 0x0f)

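We see on line #69 that hdrlen is computed, and has a minimum value of 20.  In Go, ">>" and "&" have the same precedence and are evaluated left to right, so the shift happens first.  For the case where hdrlen equals 20, the first part of this expression expands to 20 >> 2.  Shifting right by one bit halves a value, so this is 20 divided by 2 twice, or 20/2/2.  In other words, it converts a length in bytes into a count of 32-bit words.  This gives a value of: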

5 decimal or 0000 0101 in binary big-endian

Now we have the "&" operator, which is a bitwise AND.  The bitwise AND keeps only the bits that are 1 in both values.  Let's look at both sides of the bitwise AND in binary big-endian.

hdrlen >> 2 = 5 = 0000 0101
0x0f (hex) = 15 decimal = 0000 1111

Now let's line up the binary and do the bitwise AND operation, which leaves just the bits where both values are 1.  This is a mask that makes sure we ONLY populate the 4 rightmost bits.  Remember, the 4 leftmost bits belong to the version, not the header length, and we do not want to accidentally set them here.

0000 0101 (header length)
0000 1111 (mask)
==========&
0000 0101
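
The shift and the mask can be tried out together.  This is only a sketch: headerWords is a hypothetical name, and the value 64 is not a legal IPv4 header length (the maximum is 60 bytes); it is included just to show what the mask protects against.

```go
package main

import "fmt"

// headerWords is a hypothetical helper: it converts a header length in
// bytes to 32-bit words and keeps only the low 4 bits.
func headerWords(hdrlen int) byte {
	// >> and & share a precedence level in Go and group left to right,
	// so this is (hdrlen >> 2) & 0x0f.
	return byte(hdrlen >> 2 & 0x0f)
}

func main() {
	for _, n := range []int{20, 60, 64} {
		fmt.Printf("hdrlen %2d -> %08b\n", n, headerWords(n))
	}
	// hdrlen 20 -> 00000101
	// hdrlen 60 -> 00001111
	// hdrlen 64 -> 00000000  (16 needs a fifth bit; the mask drops it
	//                         instead of letting it spill into the version)
}
```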

The last thing to do is join the two values together.  Remember the code was:

b[0] = byte(Version<<4 | (hdrlen >> 2 & 0x0f))

To do this we use the "|" operator, which is a bitwise OR.  The bitwise OR gives us all of the bits that have a value of 1 in either of the two values.  The first value is the shifted Version from above and the second value is the header length that we just computed.  Let's line them up in binary and compute the bitwise OR.

0100 0000 (version)
0000 0101 (header length)
========|
0100 0101

Thus b[0], the first byte of the slice, is:

0100 0101 binary big-endian or 69 decimal or 0x45 hex
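
Putting the pieces together, here is a runnable sketch of the whole expression.  The Version constant is redeclared locally to mirror the one in the ipv4 package, and firstHeaderByte is a hypothetical wrapper added for this example.

```go
package main

import "fmt"

// Version is declared here to mirror the constant in
// golang.org/x/net/ipv4 (4 for IPv4).
const Version = 4

// firstHeaderByte is a hypothetical wrapper around the expression from
// header.go: version in the upper 4 bits, header length in 32-bit words
// in the lower 4 bits.
func firstHeaderByte(hdrlen int) byte {
	return byte(Version<<4 | (hdrlen >> 2 & 0x0f))
}

func main() {
	b := make([]byte, 20)
	b[0] = firstHeaderByte(20)
	fmt.Printf("%08b %d %#x\n", b[0], b[0], b[0]) // 01000101 69 0x45
}
```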

You can see that if you split that byte into two 4-bit fields you would have:

0100 = IP version 4
0101 = a header of five 32-bit words (20 bytes in size)
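
Going the other way uses the same two tools, a shift and a mask.  This is a minimal sketch of reading the fields back out; splitFirstByte is a hypothetical name, not a function from the ipv4 package.

```go
package main

import "fmt"

// splitFirstByte is a hypothetical helper that reverses the packing:
// it returns the IP version and the header length in bytes.
func splitFirstByte(b byte) (version, hdrlen int) {
	version = int(b >> 4)     // upper 4 bits
	hdrlen = int(b&0x0f) << 2 // lower 4 bits, words converted back to bytes
	return version, hdrlen
}

func main() {
	version, hdrlen := splitFirstByte(0x45)
	fmt.Println(version, hdrlen) // 4 20
}
```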

So all of this binary manipulation was done to get the right bits into the right parts of the byte, so that we can treat a single byte of data as two separate 4-bit fields.