HP-UX 11.0 - 11i Internationalization Features White Paper > Chapter 2 Encoding Characters

Unicode 2.1 Support [11.0 patch, 11i v1]

HP-UX provides system-level support for the Unicode 2.1/ISO 10646 character set. Hewlett-Packard’s support for Unicode provides the basis for enabling heterogeneous interoperability for all locales.

ISO 10646 is an international standard that defines a single encoding covering all the world’s characters. Unicode 2.1 is the companion specification to ISO 10646. HP-UX Unicode support conforms to the existing X/Open (The Open Group), POSIX, ISO C, and other relevant UNIX-based standards.

HP-UX 11.0 supports Unicode/ISO 10646 by using the UTF-8 (UCS Transformation Format, 8-bit) representation for persistent storage. UTF-8 is an industry-recognized multibyte representation of Unicode built from 8-bit units. This representation allows data to be transmitted over 8-bit networking protocols and to be stored and retrieved safely within a historically byte-oriented operating system such as HP-UX.
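To make the byte-oriented nature of UTF-8 concrete, the following is a minimal sketch of an encoder in C. The function name utf8_encode is hypothetical and is not an HP-UX interface. The sketch also assumes the current definition of UTF-8, which stops at four bytes and U+10FFFF; the definition in force for Unicode 2.1 allowed longer sequences.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as UTF-8 (hypothetical helper, not an HP-UX API).
 * Writes 1 to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx plus three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For instance, the euro sign (U+20AC) encodes to the three bytes E2 82 AC. Every byte of a multibyte sequence has its high bit set, so ASCII bytes never appear inside one, which is why byte-oriented tools and protocols can carry the data unchanged.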

For internal processing, HP-UX uses the four-octet (32-bit) canonical form specified in ISO 10646. This matches the existing HP-UX wchar_t implementation, which is based on a 32-bit representation.
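The relationship between the stored UTF-8 bytes and the 32-bit processing form can be sketched as a small decoder. This is an illustration only: utf8_to_ucs4 is a hypothetical name, and an application would normally rely on the standard mbstowcs() function under a utf8 locale rather than decode by hand.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a NUL-terminated UTF-8 string into 32-bit code points
 * (hypothetical helper). Stores at most max values in out and returns
 * the number stored, or (size_t)-1 on a malformed sequence.
 * For brevity it does not reject overlong encodings or surrogates.
 */
size_t utf8_to_ucs4(const char *s, uint32_t *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p != 0 && n < max) {
        uint32_t cp;
        int extra;                         /* continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;            /* stray continuation or bad lead */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)       /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
            p++;
        }
        out[n++] = cp;
    }
    return n;
}
```

Once decoded, every character occupies exactly one 32-bit element, so indexing and comparison work per character rather than per byte.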

Full system-level support is available for all locales provided in the release.

For more information on the Unicode features of the Asian System Environment, refer to the /usr/share/doc/ASX-UTF8 directory.

The following tables display a select subset of locale binaries that are provided for 32-bit application processing:

Table 2-13 Base utf8 Locales for 32-bit Application Processing

Locale        Description
C.utf8        C UTF-8
univ.utf8     universal

 

Table 2-14 European utf8 Locales for 32-bit Application Processing

Locale        Language (Region)
fr_CA.utf8    French (Canada)
fr_FR.utf8    French (France)
de_DE.utf8    German (Germany)
it_IT.utf8    Italian (Italy)
es_ES.utf8    Spanish (Spain)
sv_SE.utf8    Swedish (Sweden)

 

Table 2-15 Asian utf8 Locales for 32-bit Application Processing

Locale        Language (Region)
ja_JP.utf8    Japanese (Japan)
ko_KR.utf8    Korean (Korea)
zh_CN.utf8    Simplified Chinese (China)
zh_HK.utf8    Traditional Chinese (Hong Kong)
zh_TW.utf8    Traditional Chinese (Taiwan)

 

To enable Unicode support in applications, set the LANG environment variable (or the relevant LC_* variables) to the desired utf8 locale, for example LANG=fr_FR.utf8.
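On the application side, the locale named in the environment takes effect only after the program calls setlocale(). The following is a minimal sketch, assuming the variable in question is one of the standard locale variables (LANG, LC_ALL, or an individual LC_* category); codeset_for_locale is a hypothetical helper name.

```c
#include <locale.h>
#include <langinfo.h>
#include <stddef.h>

/*
 * Switch the process to the named locale and return the name of its
 * codeset, or NULL if the locale is not available on this system.
 * Pass "" to adopt whatever the environment variables request.
 * (Hypothetical helper built on the standard setlocale() and
 * nl_langinfo() calls.)
 */
const char *codeset_for_locale(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return NULL;                /* locale not installed or misspelled */
    return nl_langinfo(CODESET);
}
```

A program started with a utf8 locale in its environment would call codeset_for_locale("") early in main() and could check the result before relying on multibyte behavior. The exact codeset string returned is system-specific, so it is best compared against values observed on the target system rather than assumed.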

Locales are installed based on the language filesets already installed on the target system. For example, if the system uses International German, the German Unicode locale (de_DE.utf8) is installed.

Source files for all supported locales (34 in total) are also supplied, for use by 64-bit or 32-bit applications.

To build Unicode locales, use the localedef command; refer to the localedef(1M) man page. The kernel parameters MAXDSIZ, MAXTSIZ, and SHMMAX must each be set to at least 100 MB so that localedef has an adequate swap space allowance to compile these locales successfully.

Unicode Euro Enhancement

This release provides expanded Unicode support to align the character repertoire with the ISO 8859-15 locales that are being provided for euro support. This support ensures full interoperability with the newly added support for the ISO 8859-15 codeset.

Specific enhancements are provided to allow euro display and input capabilities through Xlib and new fonts.

Size Requirement

Unicode support requires additional disk space depending on the language used.

The following tables provide the size requirements for specific languages. The base Unicode offering installed on all systems is approximately 10 MB.

Table 2-16 Unicode European Locales and Localized Files

Language                    Size
French & French Canadian    8.4 MB
German                      4.2 MB
Italian                     4.2 MB
Spanish                     4.2 MB
Swedish                     4.2 MB

 

Table 2-17 Unicode Asian Locales and Localized Files

Language (Region)                  Size
Japanese (Japan)                   3.4 MB
Korean (Korea)                     2.4 MB
Simplified Chinese (China)         2.5 MB
Traditional Chinese (Hong Kong)    1.7 MB
Traditional Chinese (Taiwan)       4.2 MB

 

Performance

Applications using Unicode support should see performance comparable to that of other multibyte codesets. Applications moving from a single-byte codeset to Unicode will see some performance impact on certain character-based operations.
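One example of such a character-based operation is locating the Nth character of a string. In a single-byte codeset this is simple pointer arithmetic (s + n), while in UTF-8 the string must be scanned from the start. The following sketch illustrates the difference; utf8_char_at is a hypothetical helper and assumes well-formed input.

```c
#include <stddef.h>

/*
 * Return a pointer to the first byte of the character at the given
 * zero-based index in a NUL-terminated UTF-8 string, or NULL if the
 * string has fewer characters. Continuation bytes (10xxxxxx) are
 * skipped, so the cost grows with the index. Assumes well-formed UTF-8.
 */
const char *utf8_char_at(const char *s, size_t index)
{
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {   /* ASCII or lead byte */
            if (index == 0)
                return s;
            index--;
        }
    }
    return NULL;
}
```

Applications that perform many such lookups can avoid the repeated scan by converting to the 32-bit wide-character form once and indexing that instead.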

Streams PTY Driver [11i v1]

UTF-8 is supported on the Streams PTY driver line discipline (ldterm) module. The user does not interact with the Streams PTY driver directly; it runs underneath the dtterm window. The Streams PTY driver is responsible for providing a UTF-8 communication channel while dtterm is responsible for processing the UTF-8 code and displaying the characters on the screen.

Refer to the eucset(1) and ldterm(7) man pages and the lp(1) model script for details.

© 2001-2003, 2005 Hewlett-Packard Development Company, L.P.