3.9 Weird Integers

Up to this point we have been playing with integers that are built using a whole number of bytes. However, we have seen that the type specifier for an integer has the form int<N> or uint<N> for signed and unsigned variants, where N is the width of the integer, in bits. We have used bit-sizes that are multiple of 8, which is the size of a byte. So, why is this so? Why is N not measured in bytes instead?

The reason is that poke is not limited to integers composed of a whole number of bytes. You can actually have integers having any number of bits, between 1 and 64. So yes, int<3> is a type specifier for signed 3-bit integers, and uint<17> is a type specifier for unsigned 17-bit integers.

We call integers like this weird integers.

The vast majority of programming languages do not provide any support for weird integers. In the few cases they do, it is often in a very limited and specific way, like bitmap fields in C structs. Such constructions are often vague, obscure, and often their semantics depend on the specific implementation of the language, and/or the characteristics of the system where you run your program.

In poke, on the contrary, weird numbers are first class citizens, and they don’t differ in any way from “normal” integers composed of a whole number of bytes. Their interpretation is also well defined, and they keep the same semantics regardless of the characteristics of the computer on which poke is running.

3.9.1 Incomplete Bytes

Let’s consider first weird numbers that span for more than one byte. For example, an unsigned integer of 12 bits. Let’s visualize the written form of this number, i.e. the sequence of its constituent bytes as they appear in the underlying IO space:

  byte 0  |  byte 1
+---------+----+----+
|::::::::::::::|    |
+---------+----+----+
|   uint<12>   |

All right, the first byte is used in its entirely, but only half of the second byte is used to conform the value of the number. The other half of the second byte has no influence of the value of the 12 bits number.

Now, we talk about the “second half of the byte”, but what do that means exactly? We know that bytes in memory and files (bytes in IO spaces) are indivisible at the system level: bytes are read and written one at a time, as little integers in the range 0..255. However, we can create the useful fiction that each byte is composed by bits, which are the digits in the binary representation of the byte value.

So, we can look at a byte as composed of a group of eight bits, like this:

           byte
+-------------------------+
| b7 b6 b5 b4 b3 b2 b1 b0 |
+-------------------------+

Note how we decided to number the bits in descending order from left to right. This is because these bits correspond to the base of the polynomial equivalent to the binary value of the byte, i.e. the value of the byte is b7*2^7+b6*2^6+b5*2^5+b4*2^4+b3*2^3+b2*2^2+b1*2^1+b0*2^0. In other words: at the bit level poke always uses a big endian interpretation, and the bit that “comes first” in this imaginary stream of bits is the most significant bit in the binary representation of the number. Please note that this is just a convention, set by the poke authors: the opposite could have been chosen, but it would have been a bit confusing, as we would have to picture binary numbers in reverse order!

With this new way of looking at bytes, we can now visualize what we mean exactly with the “first half” and “second half” of the trailing byte, in our 12 bits unsigned number:

           byte 0         |           byte 1
+-------------------------+-------------+-------------+
| a7 a6 a5 a4 a3 a2 a1 a0   b7 b6 b5 b4 :             |
+-------------------------+-------------+-------------+
|                uint<12>               |

Thus the first half of byte 1 is the sequence of bits b7 b6 b5 b4. The second half, which is not pictured since it doesn’t contribute to the value of the number, would be b3 b2 b1 b0.

So what would be the value of the 12-bit integer? Exactly like with non-weird numbers, this depends on the current selected endianness, which determines the ordering of bytes.

If the current endianness is big, then byte 0 provides the most significant bits of the result number, and the used portion of byte 1 provides the least significant bits of the result number:

0b a7 a6 a5 a4 a3 a2 a1 a0 b7 b6 b5 b4

However, if the current selected endianness is little, then the used portion of byte 1 provides the most significant bits of the result number, and byte 0 provides the least significant bits of the result number:

0b b7 b6 b5 b4 a7 a6 a5 a4 a3 a2 a1 a0

Let’s see this in action. Let’s take a look to the value of the first two bytes in foo.o, in binary:

(poke) .set obase 2
(poke) byte @ 0#B
0b01111111UB
(poke) byte @ 1#B
0b01000101UB

Looking at these bytes as sequences of bits, we have:

        byte @ 0#B       |        byte @ 1#B
+-------------------------+-------------+-------------+
|  0  1  1  1  1  1  1  1    0  1  0  0 :  0  1  0  1
+-------------------------+-------------+-------------+
|                uint<12>               |

Let’s map our weird number at offset 0 bytes, using big endian:

(poke) .set endian big
(poke) uint<12> @ 0#B
(uint<12>) 0b011111110100

That matches what we explained before: the most significant bits of the unsigned 12 bits number come from the byte at offset 0, i.e. 01111111, whereas the least significant bits come from the byte at offset 1, i.e. 0100.

Now let’s map it using little endian:

(poke) uint<12> @ 0#B
(uint<12>) 0b010001111111

This time the most significant bits of the unsigned 12 bits number come from the byte at offset 1, i.e. 0100, whereas the least significant bits come from the byte at offset 0, i.e. 01111111.

An important thing to note is that non-weird numbers, i.e. numbers built with a whole number of bytes, are basically a particular case of weird numbers where the last byte in the written form (in the IO space) provides all its bits. The rules are exactly the same in all cases, which makes it easy to obtain predictable and natural results when building integers using poke.

3.9.2 Quantum Bytenics

The second kind of weird numbers are integers using less than 8 bits. These “sub-byte” numbers do not use all the bits of their containing byte. Consider for example the written form of an unsigned integer of size 5 bits:

    byte
+-----+----+
|:::::|    |
+-----+----+
uint<5>

Now let’s view the byte as a sequence of bits:

             byte
+----------------+----------+
| b7 b6 b5 b4 b3 |          |
+----------------+----------+
|    uint<5>     |

What is the value of this number? Applying the general rules for building integers from bytes, we can easily see that regardless of the current endianness the value, in binary, is:

0b b7 b6 b5 b4 b3

Let’s see this in poke:

(poke) .set obase 2
(poke) .set endian big
(poke) byte @ 0#B
0b01111111UB
(poke) uint<5> @ 0#B
(uint<5>) 0b01111
(poke) .set endian little
(poke) uint<5> @ 0#B
(uint<5>) 0b01111

3.9.3 Signed Weird Numbers

In the section discussing negative integers, we saw how the difference between a signed number and an unsigned number is basically a different interpretation of the most significant bit. Exactly the same applies to weird numbers.

Let’s summon our unsigned 12-bit integer at the beginning of the file foo.o:

(poke) .set endian big
(poke) uint<12> @ 0#B
(uint<12>) 0b011111110100

The most significant bit of the resulting value (not of its written form) indicates that this number would be positive if we were mapping the corresponding signed value. Let’s see:

(poke) int<12> @ 0#B
(int<12>) 0b010001111111
(poke) .set obase 10
(poke) int<12> @ 0#B
(int<12>) 1151

Let’s make it a bit more interesting, and change the value of the first byte in the file so we get a negative number:

(poke) .set obase 2
(poke) byte @ 0#B = 0b1111_1111
(poke) int<12> @ 0#B
(int<12>) 0b111111110100
(poke) .set obase 10
(poke) int<12> @ 0#B
(int<12>) -12

Now, let’s switch to little endian:

(poke) .set endian little
(poke) .set obase 2
(poke) int<12> @ 0#B
(int<12>) 0b010011111111
(poke) .set obase 10
(poke) int<12>  0#B
(int<12>) 1279