As handy as the .set endian
dot-command may be, it is also
important to be able to change the current endianness programmatically
from a Poke program. For that purpose, the PKL compiler provides a
couple of built-in functions: get_endian
and set_endian
.
Their definitions, along with the specific supported values, look like:
var ENDIAN_LITTLE = 0; var ENDIAN_BIG = 1; fun get_endian = int: { ... } fun set_endian = (int endian) int: { ... }
Accessing the current endianness programmatically is especially useful in situations where the data being poked features a different structure, depending on the endianness.
A good (or bad) example of this is the way registers are encoded in eBPF instructions. eBPF is the in-kernel virtual machine of Linux, and features an ISA with ten general-purpose registers. eBPF instructions generally use two registers, namely the source register and the destination register. Each register is encoded using 4 bits, and the fields encoding registers are consecutive in the instructions.
Typical. However, for reasons we won’t be discussing here the order of the source and destination register fields is switched depending on the endianness.
In big-endian systems the order is:
dst:4 src:4
Whereas in little-endian systems the order is:
src:4 dst:4
In Poke, the obvious way of representing data whose structure depends on some condition is using an union. In this case, it could read like this:
type BPF_Insn_Regs = union { struct { BPF_Reg src; BPF_Reg dst; } le : get_endian == ENDIAN_LITTLE; struct { BPF_Reg dst; BPF_Reg src; } be; };
Note the call to the get_endian
function (which takes no
arguments and thus can be called Algol68-style, without specifying an
empty argument list) in the constraint of the union alternative. This
way, the register fields will have the right order corresponding to
the current endianness.
Nifty. However, there is an ever better way to denote the structure of these fields. This is it:
type BPF_Insn_Regs = struct { var little_p = (get_endian == ENDIAN_LITTLE); BPF_Reg src @ !little_p * 4#b; BPF_Reg dst @ little_p * 4#b; };
This version, where the ordering of the fields is implemented using
field labels, is not only more compact, but also has the virtue of not
requiring additional “intermediate” fields like le
and
be
above. It also shows how convenient can be to declare
variables inside structs.
Let’s see it in action:
(poke) BPF_Insn_Regs @ 1#B BPF_Insn_Regs {src=#<%r4>,dst=#<%r5>} (poke) .set endian big (poke) BPF_Insn_Regs @ 1#B BPF_Insn_Regs {src=#<%r5>,dst=#<%r4>}
Changing the current endianness in constraint expressions is useful
when dealing with binary formats that specify the endianness of the
data that follows using some sort of tag. This is the case of ELF,
for example.
The first few bytes in an ELF header conform what is known as the
e_ident
. One of these bytes is called ei_data
and its
value specifies the endianness of the data stored in the ELF file.
This is how we handle this in Poke:
fun elf_endian = (int endian) byte: { if (endian == ENDIAN_LITTLE) return ELFDATA2LSB; else return ELFDAT2MSB; } [...] type Elf64_Ehdr = struct { struct { byte[4] ei_mag : ei_mag[0] == 0x7fUB && ei_mag[1] == 'E' && ei_mag[2] == 'L' && ei_mag[3] == 'F'; byte ei_class; byte ei_data : (ei_data != ELFDATANONE && set_endian (elf_endian (ei_data))); byte ei_version; byte ei_osabi; byte ei_abiversion; byte[6] ei_pad; offset<byte,B> ei_nident; } e_ident; [...] };
Note how set_endian
returns an integer value… it is always
1
. This is to facilitate its usage in field constraint
expressions.