The bytes we have been working with are unsigned whole numbers (or
integers) in the range 0..255
. We saw how poke sees the
contents of the files as a sequence of bytes, and how each byte can be
addressed using an offset. Mapping bytes using the map operator
@
gives us these values, which are denoted in poke with
literals like 10UB
or 0x0aUB
.
This very limited range of values have consequences when it comes to do arithmetic with bytes. Suppose for example we wanted to calculate the average of the first three byte values stored in foo.o. We could do something like:
(poke) a0 = byte @ 0#B (poke) a1 = byte @ 1#B (poke) a2 = byte @ 2#B (poke) a0 0x7fUB (poke) a1 0x45UB (poke) a2 0x4cUB (poke) (a0 + a1 + a2) / 3UB 5UB
That is obviously the wrong answer. What happened? Let’s do it step by step. First, we add the first two bytes:
(poke) a0 + a1 0xc4UB
Which is all right. 0xc4 is 0x7f plus 0x45. But, let’s add now the third byte:
(poke) a0 + a1 + a2 0x10UB
That’s no good. Adding the value of the third byte (0x4c) we overflow the range of valid values for a byte value. The calculation went banana at this point.
Another obvious problem is that we surely will want to store integers bigger than 255 in our files. Clearly we need a way to encode them somehow, and since all we have in a file are bytes, the integers will have to be composed of them.
Integers bigger than 255 can be encoded by interpreting consecutive byte values in a certain way. First, let’s consider a single byte. If we print a byte value using binary rather than decimal or hexadecimal, we will observe that eight bits are what it takes to encode the numbers between 0 and 0xff (255) using a natural binary encoding:
(poke) .set obase 2 (poke) 0UB 0b00000000UB (poke) 0xFFUB 0b11111111UB
This is the reason why people say bytes are “composed” of eight
bits, or that the width of a byte is eight bits. But this way of
talking doesn’t really reflect the view that the operating system has
of devices like files or memory buffers: both disk and memory
controllers provide and consume bytes, i.e. little unsigned numbers in
the range 0..255
. At that level, bytes are indivisible. We
will see later that poke provides ways to work on the “sub-byte”
level, but that is just really an artifact to make our life easier:
underneath, all that goes in and out are bytes.
Anyhow, if we were to “concatenate” the binary representation of two
consecutive bytes, we would end with a much bigger range of possible
numbers, in the range
0b00000000_00000000..0b11111111_11111111
3, or 0x0000..0xffff
in hexadecimal. poke provides
a bit-concatenation operator :::
that does exactly that:
(poke) 0x1UB 0b00000001UB (poke) 0x1UB ::: 0x1UB 0b0000000100000001UH
Note how the suffix of the resulting number is now UH
. This
indicates that the number is no longer a byte value: it is too big for
that. The H
in this new suffix means “half”, and it is a
traditional way to call an integer that is encoded using two bytes, or
16 bits.
So, using our method of encoding bigger numbers concatenating bytes, what would be the “half” integer composed of two bytes at the beginning of foo.o?
(poke) .set obase 16 (poke) (byte @ 0#B):::(byte @ 1#B) 0x7f45UH
Now, let’s go back to the syntax we used to map a byte value. In the
invocation of the map operator byte @ 0#B
the operand at the
left (in this case byte
) tells the operator what kind of value
to map. This is called a type specifier; byte
is the
type specifier for a single byte value, and byte[3]
is the type
specifier for a group of three byte values arranged in an array.
As it happens, byte
is a synonym for another slightly more
interesting type specifier: uint<8>
. You can probably infer
the meaning already: a byte is an unsigned integer which is 8 bits
big. We can of course use this alternate specifier in a mapping
operation, achieving exactly the same result than if we were using
byte
:
(poke) uint<8> @ 0#B 0x7fUB
You may be wondering: is it possible to use a similar type specifier for mapping bigger integers, like these “halves” that are composed of two bytes? Yeah, it is indeed possible:
(poke) uint<16> @ 0#B 0x7f45UH
Mapping an unsigned integer of 16-bits at the offset 0 gives us an unsigned “half” value, as expected.
You can easily build bigger and bigger numbers concatenating more and more bytes. Three bytes? sure:
(poke) uint<24> @ 0#B 0x7f454c as uint<24>
Note that in this case poke uses a prefix instead of a suffix to indicate that the given value is 24-bits long. This is because only a limited number of suffixes (which are more concise and more readable than the prefix form) are available, corresponding to common or typical widths.
Four bytes?
(poke) uint<32> @ 0#B 0x7f454c46U
Certain integer widths are so often used that easier-to-type synonyms
for their type specifiers are provided. We already know byte
for uint<8>
. Similarly, ushort
is a synonym for
uint<16>
, uint
is a synonym for uint<32>
and
ulong
is a synonym for uint<64>
. Try them!
GNU poke supports integers up to eight bytes, i.e. up to 64-bits. This may change in the future, as we are planning to support arbitrarily large integers.
poke allows
to insert underscore characters _
anywhere in number literals.
The only purpose of these characters is to improve readability, and
they are totally ignored by poke, i.e. they do not alter the value of
the number.