When talking about whole numbers (integers) we should distinguish
between their value (such as 123) and their written form that we
would use when writing the number on a piece of paper, such as
123
.
The written form of a number is composed of digits, arranged in
certain order. We all know that the ordering of the digits in the
written form of a number is important: if we write 123
we are
referring to a different value than if we write 321
. The
mathematical reason for this is that depending on the position they
occupy in the written form, each digit contributes with a different
“weight” to the total value of the number. This is always the case,
regardless of the numerical base used to denote the number.
For example, the value of the number 123 (whose written form is
123
) is calculated as 1*10^2+2*10^1+3*10^0
. If we swap
the last two digits in the written form of the number, 132
, we
have 1*10^2+3*10^1+2*10^0
, which results in a different value:
132. When we consider other numerical bases, the bases in the
polynomial change accordingly, but the correspondence between written
form and value stands: for example, the value of 0x123 is calculated
as 1*16^2+2*16^1+3*16^0
.
The “higher” a digit is in the polynomial, the more
significant it is, i.e. the more weight it has on the value of the
number where it appears. In the written number 123
, for
example, the digit 1 is the most significant digit of the
number, and the digit 3 is the least significant digit.
This distinction between the written form of a number and its value is
very important. Just like in certain languages letters are read
right-to-left (Arabic) or even down-to-up (Japanese) we could
certainly conceive a language in which the digits of numbers were
arranged from right-to-left instead of left-to-right. In such a
language the written representation of 123 would be 321
, not
123
. In other words: the least significant digit would come
first, not last, in the written form of the number.
Now when it comes to store numbers in computers, rather than writing them on a paper, the role of the paper is played by the computer’s memory, be it ephemeral (like RAM) or persistent (like a spinning hard disk or a Flash memory), which is organized as a sequence of bytes. Since we are composing numbers with bytes, it makes sense to have each byte to play the role of a digit in the written form of the bigger number. Since bytes can have values from 0 to 255, the base is 256. But what is the “written form” for our byte-composed numbers?
In the last section we tried to compose bigger integers by
concatenating bytes together and interpreting the result. In doing
so, we assumed (quite naturally) that in the written form of the
resulting integer the bytes are ordered in the same order than they
appear in the file, i.e. we assume that the written form of the number
b1*256^2+b2*256^1+b3*256^0
would be b1b2b3
, where
b1
, b2
and b3
are bytes. In other words, given a
written form b1b2b3
, b1
would be the most significant
byte (digit) and b3
would be the least significant byte
(digit). In our world of IO spaces, the “written form” is the
disposition of the bytes in the IO space (file, memory buffer, etc)
being edited.
That interpretation of the written form is exactly what the bit-concatenation operator implements:
(poke) dump :from 0#B :size 3#B 76543210 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789ABCDEF 00000000: 7f45 4c .EL (poke) var b1 = byte @ 0#B (poke) var b2 = byte @ 1#B (poke) var b3 = byte @ 2#B (poke) b1:::b2:::b3 0x7f454c as uint<24>
However, much like in certain human languages the written form is read
from right to left, some computers also read numbers from right to
left in their “written form”. Actually, turns out that most
modern computers do it like that. This means that, in these
computers, given the written form b1b2b3
(i.e. given a file
where b1
comes first, followed by b2
and then b3
)
the most significant byte is b3
and the least significant byte
is b1
. Therefore, the value of the number would be
b3*256^2+b2*256^1+b3*256^0
.
So, given the written form of a bigger number b1b2b3
(i.e. some
ordering of bytes implied by the file they are stored in) there are at
least two ways to interpret them to calculate the value of the number.
When the written form is read from left to right, we talk about a
big endian interpretation. When the written form is read from
right to left, we talk about a little endian interpretation.
Given the first three bytes in foo.o
, we can determine the
value of the integer composed of these three bytes in both
interpretations:
(poke) b1:::b2:::b3 0x7f454c as uint<24> (poke) b3:::b2:::b1 0x4c457f as uint<24>
Remember how the type specifier byte
is just a synonym of
uint<8>
, and how we can use type specifiers like
uint<24>
and uint<32>
to map bigger integers? When we
do that, like in:
(poke) uint<24> @ 0#B 0x7f454c as uint<24>
Poke should somehow decide what kind of interpretation to use,
i.e. how to read the “written form” of the number. As you can see
from the example, poke uses the left-to-right interpretation, or
big-endian, by default. But you can change it using a new
dot-command: .set endian
:
(poke) .set endian little (poke) uint<24> @ 0#B 0x4c457f as uint<24>
The currently used interpretation (also called endianness) is shown if you invoke the dot-command without an argument4:
(poke) .set endian little
Different systems use different endianness. Into a given system, it is to be expected that most files will be encoded following the same conventions. Therefore poke provides you a way to set the endianness to whatever endianness is in the system. You do it this way:
(poke) .set endian host