Up to this point we have been playing with integers that are built
using a whole number of bytes. However, we have seen that the type
specifier for an integer has the form int<N> or uint<N> for signed and
unsigned variants, where N is the width of the integer, in bits. We
have used bit-sizes that are multiples of 8, which is the size of a
byte. So, why is this so? Why is N not measured in bytes instead?
The reason is that poke is not limited to integers composed of a whole
number of bytes. You can actually have integers composed of any
number of bits, between 1 and 64. So yes, int<3> is a type specifier
for signed 3-bit integers, and uint<17> is a type specifier for
unsigned 17-bit integers.
We call integers like this weird integers.
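To get a feeling for how much such types can hold, here is a small Python sketch that computes the range of values for a given width. It is only a model, not poke code: the function name int_range is made up for this example, and the signed case assumes the usual two's complement encoding.

```python
def int_range(width, signed):
    """Return (lowest, highest) value of an integer that is `width` bits wide."""
    if not 1 <= width <= 64:
        raise ValueError("width must be between 1 and 64 bits")
    if signed:
        # Assumption: two's complement, so half of the range is negative.
        return -(1 << (width - 1)), (1 << (width - 1)) - 1
    return 0, (1 << width) - 1


print(int_range(3, signed=True))    # (-4, 3)
print(int_range(17, signed=False))  # (0, 131071)
```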
The vast majority of programming languages do not provide any support for weird integers. In the few cases where they do, it is usually in a very limited and specific way, like bit-fields in C structs. Such constructions tend to be vague and obscure, and their semantics often depend on the specific implementation of the language and/or on the characteristics of the system where you run your program.
In poke, on the contrary, weird numbers are first-class citizens, and they don’t differ in any way from “normal” integers, which are composed of a whole number of bytes. Their interpretation is also well defined, and they keep the same semantics regardless of the characteristics of the computer on which poke is running.
Let’s first consider weird numbers that span more than one byte. For example, an unsigned integer of 12 bits. Let’s visualize the written form of this number, i.e. the sequence of its constituent bytes as they appear in the underlying IO space:
  byte 0  | byte 1
+---------+----+----+
|::::::::::::::|    |
+---------+----+----+
|   uint<12>   |
All right, the first byte is used in its entirety, but only half of the second byte is used to form the value of the number. The other half of the second byte has no influence on the value of the 12-bit number.
Now, we talk about the “second half of the byte”, but what does that
mean exactly? We know that bytes in memory and files (bytes in IO
spaces) are indivisible at the system level: bytes are read and
written one at a time, as little integers in the range 0..255.
However, we can create the useful fiction that each byte is composed
of bits, which are the digits in the binary representation of
the byte value.
So, we can look at a byte as composed of a group of eight bits, like this:
           byte
+-------------------------+
| b7 b6 b5 b4 b3 b2 b1 b0 |
+-------------------------+
Note how we decided to number the bits in descending order from left
to right. This is because the number of each bit is the exponent of
the power of two that the bit multiplies in the polynomial equivalent
to the binary value of the byte, i.e. the value of the byte is
b7*2^7+b6*2^6+b5*2^5+b4*2^4+b3*2^3+b2*2^2+b1*2^1+b0*2^0. In
other words: at the bit level poke always uses a big endian
interpretation, and the bit that “comes first” in this imaginary
stream of bits is the most significant bit in the binary
representation of the number. Please note that this is just a
convention imposed by the poke authors: the opposite could have been
chosen, but it would have been a bit confusing, as we would have to
picture binary numbers in reverse order!
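The bit numbering above can be checked with a few lines of Python. This is just an illustrative sketch: the helper names byte_bits and bits_value are invented for this example, and the code only models the convention, it is not how poke is implemented.

```python
def byte_bits(value):
    """Return the bits of a byte as the list [b7, b6, ..., b0]."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds values in the range 0..255")
    return [(value >> exp) & 1 for exp in range(7, -1, -1)]


def bits_value(bits):
    """Evaluate b7*2^7 + b6*2^6 + ... + b0*2^0 for bits = [b7, ..., b0]."""
    return sum(bit << exp for bit, exp in zip(bits, range(7, -1, -1)))


bits = byte_bits(0x7F)
print(bits)              # [0, 1, 1, 1, 1, 1, 1, 1]
print(bits_value(bits))  # 127
```

The first element of the list is b7, the most significant bit, which is the bit that “comes first” in the imaginary stream.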
With this new way of looking at bytes, we can now visualize exactly what we mean by the “first half” and “second half” of the trailing byte in our 12-bit unsigned number:
          byte 0          |          byte 1
+-------------------------+-------------+-------------+
| a7 a6 a5 a4 a3 a2 a1 a0   b7 b6 b5 b4 :             |
+-------------------------+-------------+-------------+
|               uint<12>                |
Thus the first half of byte 1 is the sequence of bits b7 b6 b5 b4.
The second half, which is not pictured since it doesn’t contribute to
the value of the number, would be b3 b2 b1 b0.
So what would be the value of the 12-bit integer? Exactly like with non-weird numbers, this depends on the currently selected endianness, which determines the ordering of bytes.
If the current endianness is big, then byte 0 provides the most
significant bits of the resulting number, and the used portion of
byte 1 provides the least significant bits of the resulting number:
0b a7 a6 a5 a4 a3 a2 a1 a0 b7 b6 b5 b4
However, if the currently selected endianness is little, then the used
portion of byte 1 provides the most significant bits of the resulting
number, and byte 0 provides the least significant bits of the
resulting number:
0b b7 b6 b5 b4 a7 a6 a5 a4 a3 a2 a1 a0
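The two layouts above boil down to a couple of shifts. The following Python sketch models them for the 12-bit case only; the function name uint12 and the sample bytes 0xAB and 0xCD are chosen just for this illustration.

```python
def uint12(byte0, byte1, endian):
    """Combine two bytes into an unsigned 12-bit value, as described above."""
    used = byte1 >> 4  # b7 b6 b5 b4, the used half of byte 1
    if endian == "big":
        # byte 0 provides the most significant bits.
        return (byte0 << 4) | used
    if endian == "little":
        # The used half of byte 1 provides the most significant bits.
        return (used << 8) | byte0
    raise ValueError("endian must be 'big' or 'little'")


print(hex(uint12(0xAB, 0xCD, "big")))     # 0xabc
print(hex(uint12(0xAB, 0xCD, "little")))  # 0xcab
```

Note that the low half of byte 1 (0xD in the example) never shows up in the result.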
Let’s see this in action. Let’s take a look at the value of the first two bytes in foo.o, in binary:
(poke) .set obase 2
(poke) byte @ 0#B
0b01111111UB
(poke) byte @ 1#B
0b01000101UB
Looking at these bytes as sequences of bits, we have:
        byte @ 0#B        |        byte @ 1#B
+-------------------------+-------------+-------------+
|  0  1  1  1  1  1  1  1    0  1  0  0 :  0  1  0  1 |
+-------------------------+-------------+-------------+
|               uint<12>                |
Let’s map our weird number at offset 0 bytes, using big endian:
(poke) .set endian big
(poke) uint<12> @ 0#B
0b011111110100 as uint<12>
That matches what we explained before: the most significant bits of
the unsigned 12-bit number come from the byte at offset 0,
i.e. 01111111, whereas the least significant bits come from the
byte at offset 1, i.e. 0100.
Now let’s map it using little endian:
(poke) .set endian little
(poke) uint<12> @ 0#B
0b010001111111 as uint<12>
This time the most significant bits of the unsigned 12-bit number
come from the byte at offset 1, i.e. 0100, whereas the least
significant bits come from the byte at offset 0, i.e. 01111111.
An important thing to note is that non-weird numbers, i.e. numbers built with a whole number of bytes, are basically a particular case of weird numbers where the last byte in the written form (in the IO space) provides all its bits. The rules are exactly the same in all cases, which makes it easy to obtain predictable and natural results when building integers using poke.
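To make the “particular case” idea concrete, here is a Python sketch of a single routine that handles any width from 1 to 64 bits at a byte-aligned offset. It is a model and not poke's actual implementation: the name map_uint is invented, and the little endian handling of widths spanning more than two bytes is an assumption that simply generalizes the 12-bit case discussed above.

```python
def map_uint(data, width, endian):
    """Model an unsigned `width`-bit integer stored at the start of `data`."""
    if not 1 <= width <= 64:
        raise ValueError("width must be between 1 and 64 bits")
    full, rem = divmod(width, 8)  # whole bytes, bits used in the last byte
    nbytes = full + (1 if rem else 0)
    if len(data) < nbytes:
        raise ValueError("not enough bytes")
    if endian == "big":
        # Take the bytes in order and drop the unused trailing bits.
        value = int.from_bytes(data[:nbytes], "big")
        return value >> (8 * nbytes - width)
    if endian == "little":
        # Whole bytes are combined in reverse order; the used portion of
        # the last byte, if any, provides the most significant bits.
        value = int.from_bytes(data[:full], "little")
        if rem:
            used = data[full] >> (8 - rem)
            value |= used << (8 * full)
        return value
    raise ValueError("endian must be 'big' or 'little'")


data = bytes([0x7F, 0x45])  # the first two bytes of foo.o shown earlier
print(format(map_uint(data, 12, "big"), "012b"))     # 011111110100
print(format(map_uint(data, 12, "little"), "012b"))  # 010001111111
print(hex(map_uint(data, 16, "big")))                # 0x7f45
print(hex(map_uint(data, 16, "little")))             # 0x457f
```

With a width of 16 the remainder is zero, the last byte contributes all of its bits, and the routine produces the familiar whole-byte results without any special casing.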
The second kind of weird numbers are integers using fewer than 8 bits. These “sub-byte” numbers do not use all the bits of their containing byte. Consider for example the written form of an unsigned integer of size 5 bits:
    byte
+-----+----+
|:::::|    |
+-----+----+
uint<5>
Now let’s view the byte as a sequence of bits:
            byte
+----------------+----------+
| b7 b6 b5 b4 b3 |          |
+----------------+----------+
|    uint<5>     |
What is the value of this number? Applying the general rules for building integers from bytes, we can easily see that regardless of the current endianness the value, in binary, is:
0b b7 b6 b5 b4 b3
Let’s see this in poke:
(poke) .set obase 2
(poke) .set endian big
(poke) byte @ 0#B
0b01111111UB
(poke) uint<5> @ 0#B
0b01111 as uint<5>
(poke) .set endian little
(poke) uint<5> @ 0#B
0b01111 as uint<5>
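The sub-byte case is simple enough to model in one line of Python: keep the most significant bits of the byte and discard the rest. The function name uint_sub_byte is hypothetical, and the sketch only covers integers stored at the start of a byte, as in the session above.

```python
def uint_sub_byte(byte, width):
    """Return the unsigned value of the `width` leading bits of `byte`."""
    if not 1 <= width <= 8:
        raise ValueError("width must be between 1 and 8 bits")
    # Only one byte is involved, so there is no byte order to choose:
    # the result is the same for big and little endian.
    return byte >> (8 - width)


print(format(uint_sub_byte(0b01111111, 5), "05b"))  # 01111
```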
In the section discussing negative integers, we saw how the difference between a signed number and an unsigned number is basically a different interpretation of the most significant bit. Exactly the same applies to weird numbers.
Let’s summon our unsigned 12-bit integer at the beginning of the file foo.o:
(poke) .set endian big
(poke) uint<12> @ 0#B
0b011111110100 as uint<12>
The most significant bit of the resulting value (not of its written form) indicates that this number would be positive if we were mapping the corresponding signed value. Let’s see:
(poke) int<12> @ 0#B
0b011111110100 as int<12>
(poke) .set obase 10
(poke) int<12> @ 0#B
2036 as int<12>
Let’s make it a bit more interesting, and change the value of the first byte in the file so we get a negative number:
(poke) .set obase 2
(poke) byte @ 0#B = 0b1111_1111
(poke) int<12> @ 0#B
0b111111110100 as int<12>
(poke) .set obase 10
(poke) int<12> @ 0#B
-12 as int<12>
Now, let’s switch to little endian:
(poke) .set endian little
(poke) .set obase 2
(poke) int<12> @ 0#B
0b010011111111 as int<12>
(poke) .set obase 10
(poke) int<12> @ 0#B
1279 as int<12>
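The signed results above can be reproduced with a short Python sketch that reinterprets an unsigned bit pattern of a given width as a signed number. It assumes two's complement encoding, and the function name to_signed is made up for this example.

```python
def to_signed(value, width):
    """Reinterpret the unsigned `width`-bit pattern `value` as signed."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    sign_bit = 1 << (width - 1)
    # Assumption: two's complement. If the most significant bit is set,
    # the pattern stands for value - 2^width.
    return value - (1 << width) if value & sign_bit else value


print(to_signed(0b111111110100, 12))  # -12
print(to_signed(0b010011111111, 12))  # 1279
```

The same 16 bits in the file give a negative number in big endian and a positive one in little endian, because a different bit ends up as the most significant bit of the resulting value.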