From Bytes to Characters (GNU poke Manual)

3.17 From Bytes to Characters

Character Literals
Classifying Characters
Non-printable Characters

3.17.1 Character Literals

poke has built-in support for ASCII and its simple encoding, in which each ASCII code is encoded using one byte. Let’s see:

(poke) 'a'
0x61UB

We presented poke with the character a, and it answered with the corresponding code in the ASCII character set, which is 0x61. In fact, 'a' and 0x61UB are just two ways of writing exactly the same byte value in poke:

(poke) 'a' == 0x61UB
1
(poke) 'a' + 1
0x62U

In order to make this more explicit, poke provides yet another synonym for the type specifier uint<8>: char.
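
As a small illustration of this synonymy, the following session maps the same byte twice, once as a byte and once as a char, and compares the two values. It is only a sketch: it assumes that an IO space is currently open and holds at least 24 bytes, and the offset 23#B is arbitrary.

```
(poke) (byte @ 23#B) == (char @ 23#B)
1
```

Whatever the IO space contains, both mappings read the same 8-bit unsigned value, so the comparison holds.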

3.17.2 Classifying Characters

When working with characters it is very useful to be acquainted with the ASCII character set, which is cleverly designed to make certain calculations on character codes easy. See Table of ASCII Codes for a table of ASCII codes in different numeration bases.

Consider for example the problem of determining whether a byte we map from an IO space is a digit. Looking at the ASCII table, we observe that the digits all have consecutive codes, so we can do:

(poke) var b = byte @ 23#B
(poke) b >= '0' && b <= '9'
1
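
If this check is needed more than once, it can be wrapped in a function. The following is a minimal sketch; the name is_digit is made up for this example and is not presented here as something poke provides.

```
/* Hypothetical helper: return 1 if C is the ASCII code of a decimal
   digit, 0 otherwise.  */
fun is_digit = (char c) int:
{
  return c >= '0' && c <= '9';
}
```

Once the function is defined, it can be applied to any byte value:

```
(poke) is_digit ('7')
1
(poke) is_digit ('x')
0
```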

Now that we know that b is a digit, how could we calculate its digit value? If we look at the ASCII table again, we will find that the character codes for digits are not only consecutive: they also appear in ascending order, 0, 1, … Therefore, subtracting the code of '0' yields the value of the digit:

(poke) b
0x37UB
(poke) b - '0'
7UB

b contains the ASCII code 0x37UB, which corresponds to the character 7, which is a digit.
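
The check and the subtraction can be combined so that the conversion is never applied to a byte that is not a digit. This is a sketch under assumptions: the name digit_value is hypothetical, and raising the standard exception E_inval on bad input is just one possible way of reporting the error.

```
/* Hypothetical helper: return the numeric value of the ASCII digit C.
   Raise E_inval if C is not the code of a decimal digit.  */
fun digit_value = (char c) uint<8>:
{
  if (c < '0' || c > '9')
    raise E_inval;

  /* Both operands are 8-bit unsigned, and so is the result.  */
  return c - '0';
}
```

With this definition, digit_value (b) yields the same value as b - '0' whenever b holds a digit, and raises an exception otherwise.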

How would we check whether b is a letter? Looking at the ASCII table, we find that lower-case letters are encoded consecutively, and the same applies to upper-case letters. This lets us repeat the same trick:

(poke) (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
0
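
The same expression can be packaged as a function. Again this is only a sketch, and is_letter is a name chosen for this example:

```
/* Hypothetical helper: return 1 if C is the ASCII code of a
   lower-case or upper-case letter, 0 otherwise.  */
fun is_letter = (char c) int:
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
```

```
(poke) is_letter ('q')
1
(poke) is_letter ('7')
0
```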

3.17.3 Non-printable Characters

Not all ASCII codes are printable using the glyphs that are usually supported in terminals. If you look at the table in Table of ASCII Codes, you will find codes for characters described as “start of text”, “vertical tab”, and so on.

These character codes, which are commonly known as non-printable characters, can be represented in poke using their octal codes:

(poke) '\002'
0x2UB

This is of course no different from using 2UB directly, but in some contexts the “character-like” notation may be desirable, to stress the fact that the byte value is used as an ASCII character.
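
The octal notation is also handy when testing for such codes. The sketch below assumes the usual ASCII layout, in which the non-printable codes are those below the space character plus the “delete” code 0x7f (octal 177); the name is_nonprintable is hypothetical.

```
/* Hypothetical helper: return 1 if C is an ASCII control code,
   i.e. a code below ' ' (0x20) or the "delete" code 0x7f.  */
fun is_nonprintable = (char c) int:
{
  return c < ' ' || c == '\177';
}
```

```
(poke) is_nonprintable ('\002')
1
(poke) is_nonprintable ('a')
0
```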

Some of the non-printable characters also have alternative notations. This includes new-line and horizontal tab:

(poke) '\n'
0xaUB
(poke) '\t'
0x9UB

These \ constructions in character literals are called escape sequences. See Characters for a complete list of allowed escapes in character literals.
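
Since an escape sequence is just another spelling of a byte value, the named escapes compare equal to the octal notation for the same codes. New-line is 0xa, which is 012 in octal, and horizontal tab is 0x9, which is 011 in octal:

```
(poke) '\n' == '\012'
1
(poke) '\t' == '\011'
1
```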