Common Desktop Environment: Internationalization Programmer's Guide > Chapter 3 Internationalization and Distributed Networks
Interchange Concepts
This section describes how 8-bit user names and 8-bit data can be communicated over a network by communications utilities, such as ftp and mail, or by interclient communication between desktop clients. There are three primary considerations for communicating data:
If the remote host uses the same code set as the local host, the following is true:
If the remote host's code set is different from that of the local host, the following two cases may apply. The conversion needed is dependent on the specific protocol used.
In a network environment, the code sets of the communicating systems and the communication protocols determine how user-specified data must be transformed so that it can be sent to the remote system in a meaningful way. The user data (not user names) may need to be converted from the sender's code set to the receiver's code set, or 8-bit data may need to be transformed into a 7-bit form to conform to a protocol. A uniform interface is needed to accomplish this. The following examples illustrate the iconv interface by showing how to use iconv_open(), iconv(), and iconv_close(). To perform a conversion, call iconv_open() to obtain a conversion descriptor, then call iconv() with that descriptor; iconv_close() releases the descriptor when the conversion is finished. The terms 7-bit interchange and 8-bit interchange refer to any interchange encoding used for 7-bit and 8-bit data, respectively.
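The iconv_open(), iconv(), iconv_close() sequence described above can be sketched as follows. This is a minimal illustration, not the guide's own code: the helper name convert_text() is hypothetical, and the code set names "ISO-8859-1" and "UTF-8" used in the usage comment are assumptions, since the names accepted by iconv_open() vary between implementations.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from code set
 * `from` to code set `to`, writing a NUL-terminated result into `out`.
 * Returns the number of bytes written (excluding the NUL), or -1 on failure
 * (unknown code set name, invalid input, or output buffer too small). */
static long convert_text(const char *from, const char *to,
                         const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL && outsize > 0);

    /* Note the argument order: destination code set first, then source. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv() takes char **, not const */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;    /* reserve room for the trailing NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1) {
        /* A NULL input asks iconv() to emit any pending shift sequence,
         * which matters when the destination encoding is stateful. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return (long)(outp - out);
}

/* Usage (code set names are assumptions; check your platform's list):
 *
 *     char buf[16];
 *     long n = convert_text("ISO-8859-1", "UTF-8", "caf\xE9", buf, sizeof buf);
 */
```

Returning -1 for every failure keeps the sketch short; a real utility would likely inspect errno to distinguish an unsupported conversion (EINVAL from iconv_open()) from an undersized output buffer (E2BIG from iconv()).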
The locale_codeset refers to the code set being used locally by the application. Note that while the nl_langinfo(CODESET) function may be used to obtain the code set associated with the current locale, it is implementation-dependent whether the name it returns matches any conversion name accepted by iconv_open(). Table 3-1 outlines how iconv can be used to perform conversions for various conditions. Specific protocols may dictate other conversions needed.
Table 3-1 Using iconv to Perform Conversions
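Because the name returned by nl_langinfo(CODESET) is not guaranteed to be a valid conversion name, an application may want to probe for a converter before relying on it. The sketch below shows one way to do that; the helper names locale_codeset_name() and converter_exists() are hypothetical, and the interchange code set passed by the caller is an assumption about what the platform supports.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return the code set name of the current locale,
 * after adopting the locale from the user's environment. */
static const char *locale_codeset_name(void)
{
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: return 1 if iconv_open() accepts a conversion from
 * `from` to `to`, and 0 otherwise. The descriptor is closed immediately;
 * this only probes for availability. */
static int converter_exists(const char *from, const char *to)
{
    assert(from != NULL && to != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);
    return 1;
}

/* Usage: fall back to an application-defined name table when the locale's
 * code set name is not recognized by iconv_open().
 *
 *     const char *local = locale_codeset_name();
 *     if (!converter_exists(local, "UTF-8")) {
 *         ... map `local` to a name this platform's iconv understands ...
 *     }
 */
```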
Code sets can be classified into two categories: stateful encodings and stateless encodings. A stateful encoding uses sequences of control codes, such as shift-in/shift-out, to change the character set associated with specific code values. For instance, under compound text, the control sequence "ESC$(B" can be used to indicate the start of Japanese 16-bit data in a data stream of characters, and "ESC(B" can be used to indicate the end of this double-byte character data and the start of 8-bit ASCII data. Under this stateful encoding, the byte value 0x43 cannot be interpreted without knowing the shift state. The EBCDIC Asian code sets use shift-out/shift-in controls to switch to double-byte and single-byte encoding, respectively. Converters that convert stateful encodings to other code sets tend to be more complex because of the extra processing needed. Stateless code sets are those that can be classified as one of two types:
The term multibyte code sets is also used to refer to any code set that needs one or more bytes to encode a character; multibyte code sets are considered stateless.
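To illustrate why a byte such as 0x43 cannot be interpreted in a stateful encoding without knowing the shift state, the following sketch scans a byte stream and tracks the state using the two control sequences quoted above ("ESC$(B" and "ESC(B"). It is a simplified illustration, not a compound text parser: the function name count_chars() is hypothetical, and the sketch assumes these are the only two control sequences present in the stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum shift_state { SINGLE_BYTE, DOUBLE_BYTE };

/* Hypothetical helper: count the single-byte and double-byte characters in
 * buf, switching state on ESC $ ( B (start of double-byte data) and
 * ESC ( B (return to single-byte data). Returns 0 on success, or -1 if the
 * stream ends in the middle of a double-byte character. */
static int count_chars(const unsigned char *buf, size_t len,
                       size_t *singles, size_t *doubles)
{
    static const unsigned char to_double[] = { 0x1B, '$', '(', 'B' };
    static const unsigned char to_single[] = { 0x1B, '(', 'B' };
    enum shift_state state = SINGLE_BYTE;
    size_t i = 0;

    assert(buf != NULL && singles != NULL && doubles != NULL);
    *singles = 0;
    *doubles = 0;

    while (i < len) {
        if (len - i >= sizeof to_double &&
            memcmp(buf + i, to_double, sizeof to_double) == 0) {
            state = DOUBLE_BYTE;             /* control sequence, not data */
            i += sizeof to_double;
        } else if (len - i >= sizeof to_single &&
                   memcmp(buf + i, to_single, sizeof to_single) == 0) {
            state = SINGLE_BYTE;             /* control sequence, not data */
            i += sizeof to_single;
        } else if (state == DOUBLE_BYTE) {
            if (len - i < 2)
                return -1;                   /* truncated double-byte char */
            ++*doubles;                      /* 0x43 here is half a character */
            i += 2;
        } else {
            ++*singles;                      /* 0x43 here is the letter 'C' */
            i += 1;
        }
    }
    return 0;
}
```

In the stream `C ESC$(B C C ESC(B C`, the same byte value 0x43 appears four times but yields only three characters: two single-byte characters and one double-byte character. A stateless code set needs no such bookkeeping, which is why its converters are simpler.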