3.6 From Bytes to Integers

The bytes we have been working with are unsigned whole numbers (or integers) in the range 0..255. We saw how poke sees the contents of the files as a sequence of bytes, and how each byte can be addressed using an offset. Mapping bytes using the map operator @ gives us these values, which are denoted in poke with literals like 10UB or 0x0aUB.

This very limited range of values has consequences when it comes to doing arithmetic with bytes. Suppose for example we wanted to calculate the average of the first three byte values stored in foo.o. We could do something like:

(poke) var a0 = byte @ 0#B
(poke) var a1 = byte @ 1#B
(poke) var a2 = byte @ 2#B
(poke) a0
0x7fUB
(poke) a1
0x45UB
(poke) a2
0x4cUB
(poke) (a0 + a1 + a2) / 3UB
5UB

That is obviously the wrong answer: the three values add up to 272 (0x110), so the average should be 90. What happened? Let’s do it step by step. First, we add the first two bytes:

(poke) a0 + a1
0xc4UB

That is all right: 0xc4 is 0x7f plus 0x45. But now let’s add the third byte:

(poke) a0 + a1 + a2
0x10UB

That’s no good. Adding the value of the third byte (0x4c) overflows the range of valid values for a byte: the result no longer fits in 0..255, and only its low part (0x10) is kept. The calculation went bananas at this point.
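The wrap-around can be reproduced outside poke. The following Python sketch is only an illustration: it assumes that byte arithmetic keeps results modulo 256, which is consistent with the 0x10 printed above, and the helper name add_bytes is hypothetical.

```python
# Model 8-bit unsigned arithmetic: results are kept modulo 2**8.
BYTE_MODULUS = 1 << 8  # 256 distinct values, 0..255


def add_bytes(*values):
    """Add byte values, wrapping around at 256 after each addition."""
    total = 0
    for v in values:
        total = (total + v) % BYTE_MODULUS
    return total


a0, a1, a2 = 0x7F, 0x45, 0x4C  # the three byte values shown above

wrapped_sum = add_bytes(a0, a1, a2)  # 272 wraps around to 16 (0x10)
wrapped_avg = wrapped_sum // 3       # 5, the wrong answer
true_sum = a0 + a1 + a2              # 272, does not fit in a byte
true_avg = true_sum // 3             # 90, the integer average we wanted

print(hex(wrapped_sum), wrapped_avg, true_sum, true_avg)
```

The first two bytes still add up to 0xc4 because 196 is below 256; it is only the third addition that crosses the limit.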

Another obvious problem is that we surely will want to store integers bigger than 255 in our files. Clearly we need a way to encode them somehow, and since all we have in a file are bytes, the integers will have to be composed of them.

Integers bigger than 255 can be encoded by interpreting consecutive byte values in a certain way. First, let’s consider a single byte. If we print a byte value using binary rather than decimal or hexadecimal, we will observe that eight bits are what it takes to encode the numbers between 0 and 0xff (255) using a natural binary encoding:

(poke) .set obase 2
(poke) 0UB
0b00000000UB
(poke) 0xFFUB
0b11111111UB

This is the reason why people say bytes are “composed” of eight bits, or that the width of a byte is eight bits. But this way of talking doesn’t really reflect the view that the operating system has of devices like files or memory buffers: both disk and memory controllers provide and consume bytes, i.e. little unsigned numbers in the range 0..255. At that level, bytes are indivisible. We will see later that poke provides ways to work on the “sub-byte” level, but that is just really an artifact to make our life easier: underneath, all that goes in and out are bytes.

Anyhow, if we were to “concatenate” the binary representation of two consecutive bytes, we would end up with a much bigger range of possible numbers, in the range 0b00000000_00000000..0b11111111_11111111 (3), or 0x0000..0xffff in hexadecimal. poke provides a bit-concatenation operator ::: that does exactly that:

(poke) 0x1UB
0b00000001UB
(poke) 0x1UB ::: 0x1UB
0b0000000100000001UH

Note how the suffix of the resulting number is now UH. This indicates that the number is no longer a byte value: it is too big for that. The H in this new suffix means “half”, which is a traditional name for an integer that is encoded using two bytes, or 16 bits.

So, using our method of encoding bigger numbers by concatenating bytes, what would be the “half” integer composed of the two bytes at the beginning of foo.o?

(poke) .set obase 16
(poke) (byte @ 0#B):::(byte @ 1#B)
0x7f45UH
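Arithmetically, concatenating two bytes amounts to shifting the first one eight bits to the left and filling the freed bits with the second one. The Python sketch below illustrates that reading of the ::: results shown above; the function name concat_bytes is hypothetical and is not part of poke.

```python
def concat_bytes(high, low):
    """Place `high` in the upper eight bits and `low` in the lower eight."""
    if not (0 <= high <= 0xFF and 0 <= low <= 0xFF):
        raise ValueError("both operands must be byte values in 0..255")
    return (high << 8) | low  # same as high * 256 + low


half = concat_bytes(0x7F, 0x45)         # the first two bytes of foo.o
ones = concat_bytes(0x01, 0x01)

print(hex(half))                        # 0x7f45
print(format(ones, "016b"))             # 0000000100000001
```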

Now, let’s go back to the syntax we used to map a byte value. In the invocation of the map operator byte @ 0#B the operand at the left (in this case byte) tells the operator what kind of value to map. This is called a type specifier; byte is the type specifier for a single byte value, and byte[3] is the type specifier for a group of three byte values arranged in an array.

As it happens, byte is a synonym for another slightly more interesting type specifier: uint<8>. You can probably infer the meaning already: a byte is an unsigned integer which is 8 bits wide. We can of course use this alternate specifier in a mapping operation, achieving exactly the same result as if we were using byte:

(poke) uint<8> @ 0#B
0x7fUB

You may be wondering: is it possible to use a similar type specifier for mapping bigger integers, like these “halves” that are composed of two bytes? Yes, it is indeed possible:

(poke) uint<16> @ 0#B
0x7f45UH

Mapping a 16-bit unsigned integer at offset 0 gives us an unsigned “half” value, as expected.

You can easily build bigger and bigger numbers by concatenating more and more bytes. Three bytes? Sure:

(poke) uint<24> @ 0#B
(uint<24>) 0x7f454c

Note that in this case poke uses a prefix instead of a suffix to indicate that the given value is 24 bits long. This is because only a limited number of suffixes (which are more concise and more readable than the prefix form) are available, corresponding to common or typical widths.

Four bytes?

(poke) uint<32> @ 0#B
0x7f454c46U
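The four mappings above all start at offset 0 and differ only in how many bytes they consume. The Python sketch below mimics that, assuming the byte at the lowest offset ends up in the most significant position, as the outputs above suggest. The helper map_uint is hypothetical, and the header value holds just the four bytes seen in this section.

```python
header = bytes([0x7F, 0x45, 0x4C, 0x46])  # first four bytes of foo.o


def map_uint(data, width_bits, offset=0):
    """Read an unsigned integer of `width_bits` bits starting at byte `offset`.

    Assumes a whole number of bytes, with the first byte most significant.
    """
    if width_bits % 8 != 0:
        raise ValueError("this sketch only handles whole bytes")
    nbytes = width_bits // 8
    chunk = data[offset:offset + nbytes]
    if len(chunk) != nbytes:
        raise ValueError("not enough bytes at this offset")
    return int.from_bytes(chunk, "big")


results = {width: map_uint(header, width) for width in (8, 16, 24, 32)}
for width, value in results.items():
    print(f"uint<{width}> @ 0#B -> {value:#x}")
```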

Certain integer widths are so often used that easier-to-type synonyms for their type specifiers are provided. We already know byte for uint<8>. Similarly, ushort is a synonym for uint<16>, uint is a synonym for uint<32> and ulong is a synonym for uint<64>. Try them!
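Each extra byte multiplies the number of representable values by 256. As a rough illustration, the Python sketch below computes the largest value for each of the synonyms just listed; the table WIDTHS and the function max_unsigned are names made up for this example.

```python
# Widths, in bits, of the type specifier synonyms named above.
WIDTHS = {"byte": 8, "ushort": 16, "uint": 32, "ulong": 64}


def max_unsigned(width_bits):
    """Largest value of an unsigned integer that is `width_bits` bits wide."""
    return (1 << width_bits) - 1


for name, width in WIDTHS.items():
    print(f"{name:<6} uint<{width}>  0..{max_unsigned(width):#x}")
```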

GNU poke supports integers up to eight bytes wide, i.e. up to 64 bits. This may change in the future, as we are planning to support arbitrarily large integers.


Footnotes

(3)

poke allows underscore characters _ to be inserted anywhere in number literals. The only purpose of these characters is to improve readability, and they are totally ignored by poke, i.e. they do not alter the value of the number.