3.10 Unaligned Integers

We mentioned above that the data stored in computers, which we edit with poke, is arranged as a sequence of bytes. The entities we edit with poke (which we call IO devices) are presented to us as IO spaces. Up to now, we have accessed the IO space in terms of bytes, in commands like dump :from 32#B and in expressions like 2UB + byte @ 0#B. We also said that mapped integers are built from bytes read from the IO space.

However, the IO space that poke offers to us is actually a space of bits, not a space of bytes, and the poke values are mapped on this space of bits. The following figure shows this:

poke values      |        uint<16> @ 2#b         |
-----------      |                               |
IO space     |b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|
-----------  |               |               |               |
IO device    |     byte0     |     byte1     |     byte2     |

The main consequence, visible in the figure above, is that mapping operations can use offsets that are not aligned to bytes. To specify an offset in bits rather than in bytes, use the #b suffix instead of #B: lowercase b means bits, and uppercase B means bytes.

Let’s map an unaligned 16-bit unsigned integer in foo.o:

(poke) dump :from 0#B :size 3#B
76543210  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789ABCDEF
00000000: 7f45 4c                                  .EL
(poke) .set obase 2
(poke) byte @ 0#B
0b01111111UB
(poke) byte @ 1#B
0b01000101UB
(poke) byte @ 2#B
0b01001100UB
(poke) .set endian big
(poke) uint<16> @ 2#b
0b1111110100010101UH

Graphically:

poke values      |        uint<16> @ 2#b         |
-----------      |                               |
IO space     |0|1|1|1|1|1|1|1|0|1|0|0|0|1|0|1|0|1|0|0|1|1|0|0|
-----------  |               |               |               |
IO device    |     0x7f      |      0x45     |      0x4c     |
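
To double-check the arithmetic in the figure, here is a minimal Python sketch that models the big-endian mapping shown above. It is only an illustration of the bit selection, not how poke implements mapping; the helper name read_uint_be is hypothetical, and bit 0 is assumed to be the most significant bit of the first byte, as in the figure.

```python
def read_uint_be(data: bytes, bit_offset: int, nbits: int) -> int:
    """Read an nbits-wide unsigned big-endian integer starting at bit_offset.

    Bit 0 is the most significant bit of data[0], as in the figure.
    This is an illustrative model, not poke's implementation.
    """
    if bit_offset < 0 or nbits < 0 or bit_offset + nbits > 8 * len(data):
        raise ValueError("bit range falls outside the data")
    # View the whole buffer as one large big-endian number.
    whole = int.from_bytes(data, "big")
    # Shift out the bits that follow the field, then mask off the bits before it.
    trailing = 8 * len(data) - (bit_offset + nbits)
    return (whole >> trailing) & ((1 << nbits) - 1)


first_three = bytes([0x7F, 0x45, 0x4C])   # the three bytes shown by dump
value = read_uint_be(first_three, 2, 16)  # models uint<16> @ 2#b
print(f"0b{value:016b}")                  # prints 0b1111110100010101
```

The printed bit pattern matches the value poke reported for uint<16> @ 2#b with big endianness.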

These three levels of abstraction make it easy and natural to work with unaligned data. Imagine, for example, that you are poking packets in a bit-oriented network protocol. The packets will generally not be aligned to byte boundaries, yet the payload they carry contains integers of several sizes. Conventional binary editors and programming languages, which are almost always byte-oriented, would require us to “unpack” the network data into a different, byte-oriented representation before working with it. poke, by contrast, lets you map these integers directly, as if they were aligned to byte boundaries, and work with them.

However, when one tries to determine the correspondence between a given poke value and the underlying bytes in the IO device, things can get complicated. This is particularly true when we map what we called “weird numbers”, i.e. numbers with partial bytes. As we saw, the rules to build these numbers were expressed in terms of bytes.

In order to ease the visualization of the process used to build integer values (especially if they are weird numbers, i.e. integers with partial bytes) one can imagine an additional layer of “virtual bytes” above the space of bits provided by the IO space. Graphically:

poke values       |        uint<16> @ 2#b         |
-----------       |                               |
Virtual bytes     | virt. byte1   |  virt. byte2  |
-----------       |               |               |
IO space      |0|1|1|1|1|1|1|1|0|1|0|0|0|1|0|1|0|1|0|0|1|1|0|0|
-----------   |               |               |               |
IO device     |     0x7f      |      0x45     |      0x4c     |
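
The virtual bytes in the figure can be computed by hand with shifts. The following Python sketch is a model of that idea under the same assumptions as the figure (bit 0 is the most significant bit of the first device byte); the function name virtual_bytes is hypothetical and the code says nothing about poke's internals.

```python
def virtual_bytes(data: bytes, bit_offset: int, count: int) -> list:
    """Return `count` virtual bytes that start bit_offset bits into data.

    Illustrative model only: each virtual byte straddles at most two
    adjacent device bytes.
    """
    if bit_offset < 0 or count < 0 or bit_offset + 8 * count > 8 * len(data):
        raise ValueError("virtual bytes fall outside the data")
    first, shift = divmod(bit_offset, 8)
    result = []
    for i in range(first, first + count):
        if shift == 0:
            # Aligned case: the virtual byte is the device byte itself.
            result.append(data[i])
        else:
            # Low (8 - shift) bits of this byte become the high bits...
            high = (data[i] << shift) & 0xFF
            # ...and the top `shift` bits of the next byte fill the rest.
            low = data[i + 1] >> (8 - shift)
            result.append(high | low)
    return result


device = bytes([0x7F, 0x45, 0x4C])
vbytes = virtual_bytes(device, 2, 2)
print([f"{b:08b}" for b in vbytes])      # ['11111101', '00010101']
print(hex(int.from_bytes(bytes(vbytes), "big")))  # 0xfd15
```

Assembling the two virtual bytes in big-endian order gives 0b1111110100010101, the same value poke printed for uint<16> @ 2#b.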

It is very important to understand that the IO space is an abstraction provided by poke. The underlying file, memory buffer, or other device is actually a sequence of bytes; poke translates operations on integers, bits, bytes, etc. into the corresponding byte operations, and this translation is far from trivial. Fortunately, we can let poke do the dirty work for us and ignore that complexity.
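
To get a feeling for why this translation is not trivial, consider storing an unaligned integer: the bytes at both ends of the field are only partially covered, so their remaining bits must be preserved. The Python sketch below models one possible read-modify-write approach for a big-endian field. It is an assumption-laden illustration (the helper write_uint_be is hypothetical), not a description of what poke actually does.

```python
def write_uint_be(buf: bytearray, bit_offset: int, nbits: int, value: int) -> None:
    """Store value as an nbits-wide big-endian field at bit_offset in buf.

    Illustrative model only. Bits outside the field are left untouched.
    """
    if nbits <= 0 or bit_offset < 0 or bit_offset + nbits > 8 * len(buf):
        raise ValueError("bit range falls outside the buffer")
    if not 0 <= value < (1 << nbits):
        raise ValueError("value does not fit in the field")
    first = bit_offset // 8                # first byte touched
    last = (bit_offset + nbits + 7) // 8   # one past the last byte touched
    # Read the touched bytes as one big-endian number.
    window = int.from_bytes(buf[first:last], "big")
    # Number of bits in the window that follow the field.
    trailing = 8 * (last - first) - (bit_offset - 8 * first) - nbits
    mask = ((1 << nbits) - 1) << trailing
    # Clear the field, insert the new value, and write the bytes back.
    window = (window & ~mask) | (value << trailing)
    buf[first:last] = window.to_bytes(last - first, "big")


data = bytearray([0x7F, 0x45, 0x4C])
write_uint_be(data, 2, 16, 0)   # zero the 16 bits that start at bit 2
print(data.hex())               # prints 40000c
```

Only the middle byte is fully overwritten; the first byte keeps its two leading bits and the third byte keeps its six trailing bits.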