The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
catopen function is to hand all the message text to the translator and
let her/him translate them all. I.e., we must have a file with entries
which associate the set/message tuple with a specific translation. This
file format is specified in the X/Open standard and is as follows:
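Lines containing only whitespace characters, and empty lines, are
ignored. Lines whose first character is a $ followed by a whitespace
character are comments and are also ignored.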
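If a line starts with the sequence $set followed by a whitespace
character, an additional argument is required to follow. This argument
can be either a number, which selects the set with that number, or an
identifier, in which case the set is automatically assigned a number one
higher than the largest set number seen so far. How to use the symbolic
names is explained in section How to use the catgets interface. It is an
error if a symbol name appears more than once. All following messages
are placed in a set with this number.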
If a line starts with the sequence $delset followed by a whitespace
character, an additional argument is required to follow. This argument
can be either a number or the identifier of a previously defined set.
In both cases all messages in the specified set are removed and will
not appear in the output. If the set is later selected again with a
$set command, messages can be added to it again, and these messages
will appear in the output.
If a line starts with the sequence $quote, the quoting character used
for this input file is changed to the first non-whitespace character
following $quote. If no non-whitespace character is present before the
line ends, quoting is disabled.
By default no quoting character is used. In this mode strings are
terminated by the first unescaped line break. If a $quote sequence is
present, newlines need not be escaped. Instead a string is terminated
by the first unescaped appearance of the quote character.
A common use of this feature is to set the quote character to ". Then
any appearance of " in the strings must be escaped with a backslash
(i.e., \" must be written).
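The termination rules above can be sketched in C roughly as follows.
The function name extract_message_text and its interface are made up
for this illustration and are not part of gencat or the GNU C Library.
Escape sequences are only recognized here, not decoded, and the sketch
assumes that a text which does not begin with the quote character is
treated as unquoted.

```c
#include <stddef.h>

/* Hypothetical helper: copy the message text starting at SRC into DST
   (capacity CAP bytes).  QUOTE is the active quote character, or 0 if
   quoting is disabled.  Returns the number of bytes copied, or -1 if
   the text does not fit or a quoted text is never closed.  */
long
extract_message_text (const char *src, int quote, char *dst, size_t cap)
{
  size_t n = 0;
  int quoted = 0;

  /* Assumption of this sketch: only a text that begins with the quote
     character is treated as quoted.  */
  if (quote != 0 && *src == quote)
    {
      quoted = 1;
      ++src;
    }

  while (*src != '\0')
    {
      if (*src == '\\' && src[1] != '\0')
        {
          /* An escaped character never terminates the text.  The
             backslash and the character are copied unchanged.  */
          if (n + 2 >= cap)
            return -1;
          dst[n++] = *src++;
          dst[n++] = *src++;
          continue;
        }

      /* Quoted text ends at the quote character, unquoted text at the
         first unescaped line break.  */
      if (quoted ? *src == quote : *src == '\n')
        break;

      if (n + 1 >= cap)
        return -1;
      dst[n++] = *src++;
    }

  if (quoted && *src != quote)
    return -1;

  dst[n] = '\0';
  return (long) n;
}
```

Called with QUOTE set to 0 the function stops at the first unescaped
newline. Called with QUOTE set to " it copies embedded newlines and
stops at the closing quote.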
Any other line must start with a number or an identifier, followed by
the text of the message, which is added to the currently selected set.
If the start of the line is a number the message number is obvious. It
is an error if the same message number already appeared for this set.
If the leading token is an identifier the message number is assigned
automatically. The value is the current maximum message number for this
set plus one. It is an error if the identifier was already used for a
message in this set. It is OK to reuse the identifier for a message in
another set. How to use the symbolic identifiers will be explained
below (see How to use the catgets interface). There is one limitation
on the identifier: it must not be Set. The reason will be explained
below.
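The numbering rules for messages can be sketched in C as shown here.
This is only an illustration under simplifying assumptions: the type
message_set, the function assign_message_number and the fixed table
sizes are invented for the example and do not reflect how gencat is
implemented.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MESSAGES 64   /* Arbitrary limits chosen for the sketch.  */
#define MAX_NAME 32

/* Hypothetical bookkeeping for one set.  */
struct message_set
{
  int numbers[MAX_MESSAGES];
  char names[MAX_MESSAGES][MAX_NAME];  /* "" for numbered messages.  */
  size_t count;
  int max_number;
};

/* Return the message number for the leading TOKEN of a message line,
   or -1 if the line is in error.  */
int
assign_message_number (struct message_set *set, const char *token)
{
  int number;
  const char *name = "";

  if (token[0] == '\0' || set->count == MAX_MESSAGES
      || strlen (token) >= MAX_NAME)
    return -1;

  if (isdigit ((unsigned char) token[0]))
    {
      /* A number is used as is, but must be new in this set.  */
      number = atoi (token);
      for (size_t i = 0; i < set->count; ++i)
        if (set->numbers[i] == number)
          return -1;
    }
  else
    {
      /* The identifier Set is reserved, and an identifier must not be
         used twice in the same set.  */
      if (strcmp (token, "Set") == 0)
        return -1;
      for (size_t i = 0; i < set->count; ++i)
        if (strcmp (set->names[i], token) == 0)
          return -1;
      number = set->max_number + 1;
      name = token;
    }

  set->numbers[set->count] = number;
  strcpy (set->names[set->count], name);
  ++set->count;
  if (number > set->max_number)
    set->max_number = number;
  return number;
}
```

Because every set has its own table, the same identifier can be used in
two different sets without conflict.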
The text of the messages can contain escape sequences. The usual
escape sequences known from the ISO C language are recognized (\n, \t,
\v, \b, \r, \f, \\, and \nnn, where nnn is the octal coding of a
character code).
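A minimal sketch of how these escape sequences could be decoded is
given below. The function decode_escapes is hypothetical and not taken
from the gencat sources. It also assumes that a backslash followed by
any other character (such as the quote character) simply stands for
that character.

```c
#include <stddef.h>

/* Hypothetical helper: decode the escape sequences in SRC into DST,
   which must provide room for strlen (SRC) + 1 bytes.  Returns the
   number of bytes written, not counting the terminating NUL.  */
size_t
decode_escapes (const char *src, char *dst)
{
  size_t n = 0;

  while (*src != '\0')
    {
      if (*src != '\\' || src[1] == '\0')
        {
          dst[n++] = *src++;
          continue;
        }

      ++src;                    /* Skip the backslash.  */
      switch (*src)
        {
        case 'n': dst[n++] = '\n'; ++src; break;
        case 't': dst[n++] = '\t'; ++src; break;
        case 'v': dst[n++] = '\v'; ++src; break;
        case 'b': dst[n++] = '\b'; ++src; break;
        case 'r': dst[n++] = '\r'; ++src; break;
        case 'f': dst[n++] = '\f'; ++src; break;
        case '\\': dst[n++] = '\\'; ++src; break;
        default:
          if (*src >= '0' && *src <= '7')
            {
              /* Up to three octal digits form one character code.  */
              unsigned int value = 0;
              int digits = 0;
              while (digits < 3 && *src >= '0' && *src <= '7')
                {
                  value = value * 8 + (unsigned int) (*src - '0');
                  ++src;
                  ++digits;
                }
              dst[n++] = (char) (unsigned char) value;
            }
          else
            /* Assumption: any other escaped character stands for
               itself, e.g. an escaped quote character.  */
            dst[n++] = *src++;
          break;
        }
    }

  dst[n] = '\0';
  return n;
}
```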
Important: The handling of identifiers instead of numbers for the set
and messages is a GNU extension. Systems strictly following the X/Open
specification do not have this feature.
An example for a message catalog file is this:
    $ This is a leading comment.
    $quote "
    $set SetOne
    1 Message with ID 1.
    two " Message with ID \"two\", which gets the value 2 assigned"
    $set SetTwo
    $ Since the last set got the number 1 assigned this set has number 2.
    4000 "The numbers can be arbitrary, they need not start at one."
This small example shows various aspects:
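The first line and the line following $set SetTwo are comments since
they start with $ followed by a whitespace.
The quoting character is set to ". Otherwise the quotes in the message
definition would have to be omitted and in this case the message with
the identifier two would lose its leading whitespace.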
While this file format is pretty easy it is not the best possible for
use in a running program. The catopen function would have to parse the
file and handle syntactic errors gracefully. This is not so easy and
the whole process is pretty slow. Therefore the catgets functions
expect the data in another, more compact and ready-to-use file format.
There is a special program gencat which produces files in this format
and which is explained in detail in the next section.
Files in this other format are not human readable. To be easy for
programs to use, the format is binary. But the format is byte order
independent so translation files can be shared by systems of arbitrary
architecture (as long as they use the GNU C Library).
Details about the binary file format are not important to know since
these files are always created by the gencat program. The sources of
the GNU C Library also provide the sources for the gencat program and
so the interested reader can look through these source files to learn
about the file format.
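To show where the compiled file ends up being used, here is a small
sketch that reads one message from a catalog through the standard
catopen, catgets and catclose functions. The wrapper fetch_message and
the catalog path used with it are invented for this example. The
message is copied into a caller-supplied buffer because the string
returned by catgets should not be used after the catalog is closed.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical wrapper: copy message NUMBER of set SET from the
   compiled catalog CATALOG into BUF (SIZE bytes).  FALLBACK is used if
   the catalog cannot be opened or the message does not exist.  */
void
fetch_message (const char *catalog, int set, int number,
               const char *fallback, char *buf, size_t size)
{
  nl_catd cat = catopen (catalog, 0);

  if (cat == (nl_catd) -1)
    {
      /* No usable catalog: fall back to the built-in text.  */
      snprintf (buf, size, "%s", fallback);
      return;
    }

  /* catgets itself returns FALLBACK if the message is missing.  */
  snprintf (buf, size, "%s", catgets (cat, set, number, fallback));
  catclose (cat);
}
```

With the example file from above compiled into a catalog, for instance
with a command along the lines of gencat -o sample.cat sample.msg (both
file names are placeholders), a call such as
fetch_message ("./sample.cat", 1, 2, "default text", buf, sizeof buf)
would be expected to yield the message with the identifier two.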