12.6 Streams in Internationalized Applications

ISO C90 introduced the new type wchar_t to allow handling larger character sets. What was missing was a possibility to output strings of wchar_t directly. One had to convert them into multibyte strings using mbstowcs (there was no mbsrtowcs yet) and then use the normal stream functions. While this is doable it is very cumbersome since performing the conversions is not trivial and greatly increases program complexity and size.

The Unix standard early on (I think in XPG4.2) introduced two additional format specifiers for the printf and scanf families of functions. Printing and reading of single wide characters was made possible using the %C specifier and wide character strings can be handled with %S. These modifiers behave just like %c and %s only that they expect the corresponding argument to have the wide character type and that the wide character and string are transformed into/from multibyte strings before being used.

This was a beginning but it is still not good enough. Not always is it desirable to use printf and scanf. The other, smaller and faster functions cannot handle wide characters. Second, it is not possible to have a format string for printf and scanf consisting of wide characters. The result is that format strings would have to be generated if they have to contain non-basic characters.

In the Amendment 1 to ISO C90 a whole new set of functions was added to solve the problem. Most of the stream functions got a counterpart which take a wide character or wide character string instead of a character or string respectively. The new functions operate on the same streams (like stdout). This is different from the model of the C++ runtime library where separate streams for wide and normal I/O are used.

Being able to use the same stream for wide and normal operations comes with a restriction: a stream can be used either for wide operations or for normal operations. Once it is decided there is no way back. Only a call to freopen or freopen64 can reset the orientation. The orientation can be decided in three ways:

It is important to never mix the use of wide and not wide operations on a stream. There are no diagnostics issued. The application behavior will simply be strange or the application will simply crash. The fwide function can help avoid this.

Function: int fwide (FILE *stream, int mode)

Preliminary: | MT-Safe | AS-Unsafe corrupt | AC-Unsafe lock | See POSIX Safety Concepts.

The fwide function can be used to set and query the state of the orientation of the stream stream. If the mode parameter has a positive value the streams get wide oriented, for negative values narrow oriented. It is not possible to overwrite previous orientations with fwide. I.e., if the stream stream was already oriented before the call nothing is done.

If mode is zero the current orientation state is queried and nothing is changed.

The fwide function returns a negative value, zero, or a positive value if the stream is narrow, not at all, or wide oriented respectively.

This function was introduced in Amendment 1 to ISO C90 and is declared in wchar.h.

It is generally a good idea to orient a stream as early as possible. This can prevent surprise especially for the standard streams stdin, stdout, and stderr. If some library function in some situations uses one of these streams and this use orients the stream in a different way the rest of the application expects it one might end up with hard to reproduce errors. Remember that no errors are signal if the streams are used incorrectly. Leaving a stream unoriented after creation is normally only necessary for library functions which create streams which can be used in different contexts.

When writing code which uses streams and which can be used in different contexts it is important to query the orientation of the stream before using it (unless the rules of the library interface demand a specific orientation). The following little, silly function illustrates this.

void
print_f (FILE *fp)
{
  if (fwide (fp, 0) > 0)
    /* Positive return value means wide orientation.  */
    fputwc (L'f', fp);
  else
    fputc ('f', fp);
}

Note that in this case the function print_f decides about the orientation of the stream if it was unoriented before (will not happen if the advice above is followed).

The encoding used for the wchar_t values is unspecified and the user must not make any assumptions about it. For I/O of wchar_t values this means that it is impossible to write these values directly to the stream. This is not what follows from the ISO C locale model either. What happens instead is that the bytes read from or written to the underlying media are first converted into the internal encoding chosen by the implementation for wchar_t. The external encoding is determined by the LC_CTYPE category of the current locale or by the ‘ccs’ part of the mode specification given to fopen, fopen64, freopen, or freopen64. How and when the conversion happens is unspecified and it happens invisibly to the user.

Since a stream is created in the unoriented state it has at that point no conversion associated with it. The conversion which will be used is determined by the LC_CTYPE category selected at the time the stream is oriented. If the locales are changed at the runtime this might produce surprising results unless one pays attention. This is just another good reason to orient the stream explicitly as soon as possible, perhaps with a call to fwide.