HP-UX 11.0 - 11i Internationalization Features White Paper, Chapter 2: Encoding Characters

Converting Between Encodings

Corrected Character Mappings in iconv(1) and iconv(3C) [11.0 patch, 11i v1]

This release contains defect fixes for incorrect character mappings in the HP-UX iconv converters for Simplified Chinese, Traditional Chinese, Japanese, and Korean.

Corrected character converter mappings allow for improved interoperability when sending or receiving converted character data to and from Unicode-aware systems.

Correction for Simplified Chinese

A patch corrects an incorrect character mapping that occurs when converting between hp15CN and Unicode (UCS2)/UTF-8 for Simplified Chinese.

Specifically, the Simplified Chinese character Double Vertical Line (hp15CN 0xA1AC) was mapped incorrectly when converting between hp15CN and UCS2/UTF-8: it was mapped to the Parallel To character (U+2225), which is a different character, instead of to Double Vertical Line (U+2016).

Table 2-19 “Changes in iconv Tables for Simplified Chinese” summarizes the change applied to the iconv tables:

Table 2-19 Changes in iconv Tables for Simplified Chinese

hp15CN    Incorrect UCS2    Correct UCS2    Character Name
0xA1CE    -                 0x2225          Parallel To
0xA1AC    0x2225            0x2016          Double Vertical Line
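
The table also suggests why the old mapping hurt round trips: reading the blank Incorrect UCS2 cell for 0xA1CE as "already correct, unchanged", both hp15CN codes reached U+2225 before the patch, so the reverse ucs2=hp15CN table could not tell them apart. The following Python sketch models only these two rows; the dictionary names are illustrative and are not the HP-UX table files.

```python
from collections import defaultdict

# Rows of Table 2-19 only. Assumption: the blank "Incorrect UCS2" cell for
# 0xA1CE means that entry was already correct and did not change.
OLD_HP15CN_TO_UCS2 = {0xA1CE: 0x2225, 0xA1AC: 0x2225}
NEW_HP15CN_TO_UCS2 = {0xA1CE: 0x2225, 0xA1AC: 0x2016}


def collisions(table):
    """Return {ucs2: [codes]} for UCS2 values reached from more than one code."""
    reached_from = defaultdict(list)
    for code, ucs2 in sorted(table.items()):
        reached_from[ucs2].append(code)
    return {ucs2: codes for ucs2, codes in reached_from.items() if len(codes) > 1}


if __name__ == "__main__":
    for label, table in (("old", OLD_HP15CN_TO_UCS2), ("new", NEW_HP15CN_TO_UCS2)):
        found = collisions(table)
        print(label, {hex(u): [hex(c) for c in codes] for u, codes in found.items()})
```

A non-empty result means the reverse conversion cannot restore the original code for every character.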

 

The hp15CN=ucs2 and ucs2=hp15CN iconv converter tables are affected. These tables are shared by both UCS2 and UTF-8 conversions.
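
Sharing one table between UCS2 and UTF-8 works because UTF-8 is only a different encoding of the same code point; the UTF-8 columns in the mapping tables of this section follow mechanically from the UCS2 columns. A small Python sketch (the function name is illustrative) that reproduces those UTF-8 values:

```python
def ucs2_to_utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a UCS2 value, written in the tables' 0x... style."""
    # UCS2 covers 0x0000-0xFFFF; surrogate values are not characters on their own.
    if not 0 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a UCS2 character: {code_point:#x}")
    return "0x" + chr(code_point).encode("utf-8").hex()


if __name__ == "__main__":
    # Old and new values for Double Vertical Line (Simplified Chinese)
    print(ucs2_to_utf8_hex(0x2225), ucs2_to_utf8_hex(0x2016))
```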

No compatibility problems are anticipated. However, if compatibility concerns arise for persistent data stored in Unicode (UCS2) or UTF-8 on an HP-UX system, a simple conversion script can search the data for each occurrence of an incorrect value and replace it with the correct value, based on the mapping in Table 2-20 “Mapping Between Old and New Unicode Characters for Simplified Chinese”.

Table 2-20 Mapping Between Old and New Unicode Characters for Simplified Chinese

Old UCS2    New UCS2    Old UTF-8    New UTF-8    Character Name
0x2225      0x2016      0xe288a5     0xe28096     Double Vertical Line
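
Such a script can be as small as the Python sketch below, which decodes stored UTF-8, replaces the old code point, and re-encodes. One caution follows from Table 2-19: U+2225 is also the correct value for hp15CN 0xA1CE (Parallel To), so a blind replacement would also change genuine Parallel To characters. It is safer to apply it only where the U+2225 is known to have come from 0xA1AC, or to review each match. The names here are illustrative.

```python
# Old -> new value from Table 2-20 (Simplified Chinese).
SC_OLD_TO_NEW = {0x2225: 0x2016}


def remap_utf8(data: bytes, old_to_new=SC_OLD_TO_NEW) -> bytes:
    """Decode UTF-8, replace each old code point with its new value, re-encode."""
    text = data.decode("utf-8")
    # str.translate accepts a dict mapping code points to code points.
    return text.translate(old_to_new).encode("utf-8")


if __name__ == "__main__":
    print(remap_utf8(bytes.fromhex("e288a5")).hex())
```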

 

Correction for Traditional Chinese

A patch corrects several incorrect character mappings that occur when converting between Big-5/EUC and Unicode (UCS2)/UTF-8 for Traditional Chinese.

For conversions between big5 and UCS2/UTF-8, the Ideographic Space character (big5 0xA140) was missing from the Unicode conversion table:

Table 2-21 Changes in iconv Tables for big5/Unicode

big5      Incorrect UCS2    Correct UCS2    Character Name
0xA140    (missing)         0x3000          Ideographic Space

 

The following table summarizes the changes applied for conversions between eucTW and UCS2.

Table 2-22 Changes in iconv Tables for eucTW/Unicode

eucTW     Incorrect UCS2    Correct UCS2    Character Name
0xa1a6    0x30fb            0x2022          Bullet
0xa1b7    0x2014            0x2013          EN Dash
0xa1b9    0x2013            0x2014          EM Dash
0xa1b6    0xfe31            0xff5c          Fullwidth Vertical Line
0xa1b8    0xfe32            0xfe31          Presentation Form Vertical EN Dash
0xa1ea    0x2032            0x2035          Reversed Prime
0xa1eb    0x2035            0x2032          Prime
0xa2b9    0x2264            0x2266          Less-than Over Equal to
0xa2ba    0x2265            0x2267          Greater-than Over Equal to
0xa2c2    0xfe66            0xfe65          Small Greater-than
0xa2c3    0xfe65            0xfe66          Small Equals Sign
0xa2de    0xff5c            0x2223          Divides
0xa2e1    0xfe67            0xff0f          Full-width Solidus
0xa2e4    0xffe5            0x00a5          Yen Sign
0xa2e6    0xffe0            0x00a2          Cent Sign
0xa2e7    0xffe1            0x00a3          Pound Sign

 

The iconv conversions between eucTW and UCS2 or UTF-8 may be affected.

Conversions between Big-5 and UCS2/UTF-8 are not directly affected, because the only change is the addition of a missing table entry.

The eucTW=ucs2, ucs2=eucTW, big5=ucs2 and ucs2=big5 iconv converter tables are affected. These tables are shared by both UCS2 and UTF-8 conversions.

No compatibility problems are anticipated. However, if compatibility concerns arise for persistent data stored in Unicode (UCS2) or UTF-8 on an HP-UX system, a simple conversion script can search the data for each occurrence of an incorrect value and replace it with the correct value, based on the mappings in Table 2-23 “Mapping Between Old and New Unicode Characters for Traditional Chinese”.

Table 2-23 Mapping Between Old and New Unicode Characters for Traditional Chinese

Old UCS2    New UCS2    Old UTF-8    New UTF-8    Character Name
0x30fb      0x2022      0xe383bb     0xe280a2     Bullet
0x2014      0x2013      0xe28094     0xe28093     EN Dash
0x2013      0x2014      0xe28093     0xe28094     EM Dash
0xfe31      0xff5c      0xefb8b1     0xefbd9c     Fullwidth Vertical Line
0xfe32      0xfe31      0xefb8b2     0xefb8b1     Presentation Form Vertical EN Dash
0x2032      0x2035      0xe280b2     0xe280b5     Reversed Prime
0x2035      0x2032      0xe280b5     0xe280b2     Prime
0x2264      0x2266      0xe289a4     0xe289a6     Less-than Over Equal to
0x2265      0x2267      0xe289a5     0xe289a7     Greater-than Over Equal to
0xfe66      0xfe65      0xefb9a6     0xefb9a5     Small Greater-than
0xfe65      0xfe66      0xefb9a5     0xefb9a6     Small Equals Sign
0xff5c      0x2223      0xefbd9c     0xe288a3     Divides
0xfe67      0xff0f      0xefb9a7     0xefbc8f     Full-width Solidus
0xffe5      0x00a5      0xefbfa5     0xc2a5       Yen Sign
0xffe0      0x00a2      0xefbfa0     0xc2a2       Cent Sign
0xffe1      0x00a3      0xefbfa1     0xc2a3       Pound Sign
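
Unlike the Simplified Chinese case, several rows of this table swap values (U+2013 and U+2014, U+2032 and U+2035, U+FE65 and U+FE66) or chain into one another (U+FE32 to U+FE31, U+FE31 to U+FF5C, U+FF5C to U+2223). A script that runs one search-and-replace per row in sequence would therefore corrupt data, because a later replacement rewrites the output of an earlier one. The replacement has to look at each character exactly once. A Python sketch of both approaches, the sequential one included only for contrast; the function names are illustrative:

```python
# Old -> new values from Table 2-23 (Traditional Chinese, eucTW).
TC_OLD_TO_NEW = {
    0x30FB: 0x2022,  # Bullet
    0x2014: 0x2013,  # EN Dash
    0x2013: 0x2014,  # EM Dash
    0xFE31: 0xFF5C,  # Fullwidth Vertical Line
    0xFE32: 0xFE31,  # Presentation Form Vertical EN Dash
    0x2032: 0x2035,  # Reversed Prime
    0x2035: 0x2032,  # Prime
    0x2264: 0x2266,  # Less-than Over Equal to
    0x2265: 0x2267,  # Greater-than Over Equal to
    0xFE66: 0xFE65,  # Small Greater-than
    0xFE65: 0xFE66,  # Small Equals Sign
    0xFF5C: 0x2223,  # Divides
    0xFE67: 0xFF0F,  # Full-width Solidus
    0xFFE5: 0x00A5,  # Yen Sign
    0xFFE0: 0x00A2,  # Cent Sign
    0xFFE1: 0x00A3,  # Pound Sign
}


def remap_single_pass(text: str) -> str:
    # str.translate visits each character once, so swapped pairs are
    # exchanged and chained values are not rewritten a second time.
    return text.translate(TC_OLD_TO_NEW)


def remap_sequential(text: str) -> str:
    # WRONG for this table; shown only to illustrate the pitfall.
    for old, new in TC_OLD_TO_NEW.items():
        text = text.replace(chr(old), chr(new))
    return text


if __name__ == "__main__":
    sample = "\u2013\u2014\ufe31"
    print([hex(ord(c)) for c in remap_single_pass(sample)])
    print([hex(ord(c)) for c in remap_sequential(sample)])
```

For stored UTF-8, decode first and re-encode afterwards, for example `remap_single_pass(data.decode("utf-8")).encode("utf-8")`. Because characters such as U+2013 and U+2014 are common, it is prudent to apply the remapping only to data known to have been produced by the old eucTW converter.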

 

Correction for Japanese

A patch corrects four incorrect Japanese character mappings in conversions between Shift-JIS/EUC and Unicode (UCS2)/UTF-8.

The following table summarizes the changes applied.

Table 2-24 Changes in iconv Tables for Japanese

sjis      eucJP       Incorrect UCS2    Correct UCS2    Character Name
0x8150    0xA1B1      0xFFE3            0x203E          Overline
0x815C    0xA1BD      0x2015            0x2014          Em Dash
0x818F    0xA1EF      0xFFE5            0x00A5          Yen Sign
N/A       0x8FA2B7    0x02DC            0xFF5E          Full-width Tilde

 

The affected iconv conversions are those between sjis and UCS2 or UTF-8 and those between eucJP and UCS2 or UTF-8.

The sjis=ucs2, ucs2=sjis, eucJP=ucs2 and ucs2=eucJP iconv conversion tables are affected. These tables are shared by both UCS2 and UTF-8 conversions.

No compatibility problems are anticipated. However, if compatibility concerns arise for persistent data stored in Unicode (UCS2) or UTF-8 on an HP-UX system, a simple conversion script can search the data for each occurrence of an incorrect value and replace it with the correct value, based on the mappings in Table 2-25 “Mapping Between Old and New Unicode Characters for Japanese”.

Table 2-25 Mapping Between Old and New Unicode Characters for Japanese

Old UCS2    New UCS2    Old UTF-8    New UTF-8    Character Name
0xFFE3      0x203E      0xefbfa3     0xe280be     Overline
0x2015      0x2014      0xe28095     0xe28094     Em Dash
0xFFE5      0x00A5      0xefbfa5     0xc2a5       Yen Sign
0x02DC      0xFF5E      0xcb9c       0xefbd9e     Full-width Tilde
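
For data stored as UCS2 rather than UTF-8, the search can work directly on 16-bit code units. The Python sketch below applies Table 2-25 under stated assumptions: the data is big-endian UCS2 with no byte-order mark (for little-endian data, use "<" instead of ">" in the format strings). The function name is illustrative.

```python
import struct

# Old -> new values from Table 2-25 (Japanese).
JA_OLD_TO_NEW = {
    0xFFE3: 0x203E,  # Overline
    0x2015: 0x2014,  # Em Dash
    0xFFE5: 0x00A5,  # Yen Sign
    0x02DC: 0xFF5E,  # Full-width Tilde
}


def remap_ucs2_be(data: bytes, old_to_new=JA_OLD_TO_NEW) -> bytes:
    """Replace old UCS2 values in big-endian UCS2 data (assumed: no BOM)."""
    if len(data) % 2:
        raise ValueError("UCS2 data must contain whole 16-bit code units")
    count = len(data) // 2
    units = struct.unpack(f">{count}H", data)  # unsigned 16-bit, big-endian
    # Each unit is looked up once, so the replacement is a single pass.
    return struct.pack(f">{count}H", *(old_to_new.get(u, u) for u in units))


if __name__ == "__main__":
    print(remap_ucs2_be(bytes.fromhex("0041ffe5")).hex())
```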

 

Correction for Korean

A patch provides a defect fix to address standards nonconformance for Korean Unicode (UCS2)/UTF-8 character mappings.

The Korean iconv converter tables supplied before this fix do not conform to the Unicode 2.1 and ISO 10646 (with 1997 amendments) standards or to the Korean national standard KSC-5700. All of these standards organizations consider the old mappings obsolete.

The fix provides a set of standards-conformant iconv converter tables for converting between eucKR and Unicode/UTF-8. Specifically, the obsolete region 0x3d2e-0x4dff has been remapped to the 0xac00-0xd7ff region that Unicode 2.1 specifies for Hangul.
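
Data converted with the old tables can be spotted by code points in the obsolete region. The Python sketch below reports them, using the 0x3d2e-0x4dff bounds stated above. Part of this range has since been assigned to other characters (for example, CJK Unified Ideographs Extension A), so a match is a hint to check the data, not proof that it came from the old table. The names are illustrative.

```python
# Bounds of the obsolete Hangul region, as stated in the text.
OBSOLETE_HANGUL_FIRST = 0x3D2E
OBSOLETE_HANGUL_LAST = 0x4DFF


def find_obsolete_hangul(text: str) -> list:
    """Return (index, code point) pairs for characters in the obsolete region."""
    return [
        (index, ord(ch))
        for index, ch in enumerate(text)
        if OBSOLETE_HANGUL_FIRST <= ord(ch) <= OBSOLETE_HANGUL_LAST
    ]


if __name__ == "__main__":
    # One obsolete-region value and one Unicode 2.1 Hangul syllable (U+AC00)
    print(find_obsolete_hangul("a\u3d2eb\uac00"))
```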

Without this fix, Korean Unicode data cannot be shared with any other system that conforms to the Unicode 2.1/ISO 10646/KSC-5700 standards.

All iconv conversions between eucKR and UCS2 or UTF-8 are affected.

The iconv conversion tables affected by this modification are eucKR=ucs2 and ucs2=eucKR. These tables are shared by both UCS2 and UTF-8 conversions.

No compatibility problems are anticipated. However, if compatibility concerns arise with regard to persistent data stored either in Unicode (UCS2) or UTF-8 on an HP-UX system, it is recommended that the previously installed ucs2=eucKR table be saved and renamed prior to installation of this fix. Persistent data can then be converted back to eucKR using this old table and then reconverted to the correct Unicode/UTF-8 representation.
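
The recovery procedure is a round trip: the saved old table turns the stored Unicode back into the legacy codeset, and the new table converts it forward again. The Korean tables are not reproduced in this paper, so the Python sketch below borrows three eucTW rows of Table 2-22 purely as stand-in data; `old_forward` and `new_forward` are illustrative dictionaries, not the iconv table files. The reverse step works only if the old table was one-to-one; where two legacy codes shared one Unicode value, the original code cannot be recovered.

```python
# Stand-in data: three eucTW rows of Table 2-22 (legacy code -> UCS2).
old_forward = {0xA1B7: 0x2014, 0xA1B9: 0x2013, 0xA2DE: 0xFF5C}
new_forward = {0xA1B7: 0x2013, 0xA1B9: 0x2014, 0xA2DE: 0x2223}

# Invert the saved old table: UCS2 -> legacy code.
old_reverse = {ucs2: code for code, ucs2 in old_forward.items()}
if len(old_reverse) != len(old_forward):
    raise ValueError("old table is not one-to-one; the round trip is ambiguous")


def reconvert(ucs2_values):
    """Map old-table output back to legacy codes, then forward with the new table."""
    legacy = [old_reverse[u] for u in ucs2_values]  # KeyError: not old-table output
    return [new_forward[c] for c in legacy]


if __name__ == "__main__":
    print([hex(u) for u in reconvert([0x2014, 0x2013, 0xFF5C])])
```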

Japan Vendor Council iconv Converters [11i v1.6]

New iconv converters have been introduced to allow for greater interoperability of data sharing within Japanese computing environments. The following items are related to this change:

  • New Japanese iconv conversion tables are now available that conform to the Open Group Japanese Vendor Council (TOG/JVC) CDE/Motif Technical working group recommendations that ensure the interoperability of ISO 10646/Unicode/JIS X0221 within Japanese computing environments.

    NOTE: JIS X0221 is the Japanese national standard equivalent to ISO-10646:1993 and Unicode 2.1.

    These tables convert HP’s EUC and Shift-JIS (SJIS) characters to and from Unicode and UTF-8 using either the strict JIS X0201 interpretation or the Microsoft mappings. The JIS X0221 Extended mappings are already supported in HP-UX. The new converters require 0.6 MB of additional space.

    Table 2-26 “New iconv Converters” lists the new converters. Each of the four bidirectional conversions is provided as a pair of tables, eight tables in all, supporting the alternate mappings in round-trip conversions between EUC or Shift-JIS (SJIS) and Unicode or UTF-8.

    Table 2-26 New iconv Converters

    Conversion            Interpretation
    eucJP0201 <-> ucs2    JIS X0201
    eucJPMS <-> ucs2      Microsoft
    sjis0201 <-> ucs2     JIS X0201
    sjisMS <-> ucs2       Microsoft

    Table 2-27 “Mapping Variances Between SJIS and Unicode Mappings” illustrates the mapping variances between converter tables when converting between SJIS and different interpretations of Unicode character mappings.

    NOTE: The JIS X0221 Extended (Unix) mappings are already available on HP-UX; this table refers to them as SJIS.

    Table 2-27 Mapping Variances Between SJIS and Unicode Mappings

    SJIS      sjis0201 UCS2    SJIS UCS2    sjisMS UCS2
    0x5C      0x00A5           0x005C       0x005C
    0x7E      0x203E           0x007E       0x007E
    0x8150    0xFFE3           0x203E       0xFFE3
    0x815F    0x005C           0xFF3C       0xFF3C
    0x818F    0xFFE5           0x00A5       0xFFE5
    0x815C    0x2014           0x2014       0x2015
    0x8160    0x301C           0x301C       0xFF5E
    0x8161    0x2016           0x2016       0x2225
    0x817C    0x2212           0x2212       0xFF0D
    0x8191    0x00A2           0x00A2       0xFFE0
    0x8192    0x00A3           0x00A3       0xFFE1
    0x81CA    0x00AC           0x00AC       0xFFE2

    Table 2-28 “Mapping Variances Between EUC and Unicode Mappings” illustrates the mapping variances between converter tables when converting between EUC and different interpretations of Unicode character mappings.

    NOTE: The JIS X0221 Extended (Unix) mappings are already available on HP-UX; this table refers to them as eucJP.

    Table 2-28 Mapping Variances Between EUC and Unicode Mappings

    EUC         eucJP0201 UCS2    eucJP UCS2    eucJPMS UCS2
    0x5C        0x00A5            0x005C        0x005C
    0x7E        0x203E            0x007E        0x007E
    0xA1B1      0xFFE3            0x203E        0xFFE3
    0xA1C0      0x005C            0xFF3C        0xFF3C
    0xA1EF      0xFFE5            0x00A5        0xFFE5
    0xA1BD      0x2014            0x2014        0x2015
    0xA1C1      0x301C            0x301C        0xFF5E
    0xA1C2      0x2016            0x2016        0x2225
    0xA1DD      0x2212            0x2212        0xFF0D
    0xA1F1      0x00A2            0x00A2        0xFFE0
    0xA1F2      0x00A3            0x00A3        0xFFE1
    0xA2CC      0x00AC            0x00AC        0xFFE2
    0x8FA2B7    0x007E            0xFF5E        0xFF5E
    0x8FA2C3    0x00A6            0x00A6        0xFFE4

  • The TOG/JVC has formalized three possible converter mappings for several characters. These mappings are:

    1. JIS X0221 based on strict JIS X0201 standard interpretation

    2. JIS X0221 Extended that allows for greater co-existence in current ASCII-based (such as Unix) environments

    3. JIS X0221 based on Microsoft Japanese mappings
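
The variance tables can be treated as data. The Python sketch below encodes the rows of Table 2-27 for the three interpretations and does two things: it checks which interpretations leave the ASCII bytes 0x5C and 0x7E unchanged (the property that lets the Extended mapping coexist with ASCII-based Unix environments), and it derives a remapping from Microsoft-interpretation Unicode to the Extended interpretation. The label "sjis" for the Extended column and all function names are assumptions for illustration. Only the listed rows are covered, and the remapping is meant for text known to come from sjisMS output, because characters such as U+FF5E can also arrive legitimately from other sources.

```python
# Rows of Table 2-27: SJIS code -> UCS2 under each interpretation.
INTERPRETATIONS = ("sjis0201", "sjis", "sjisMS")  # "sjis" = JIS X0221 Extended
SJIS_VARIANTS = {
    0x5C: (0x00A5, 0x005C, 0x005C),
    0x7E: (0x203E, 0x007E, 0x007E),
    0x8150: (0xFFE3, 0x203E, 0xFFE3),
    0x815F: (0x005C, 0xFF3C, 0xFF3C),
    0x818F: (0xFFE5, 0x00A5, 0xFFE5),
    0x815C: (0x2014, 0x2014, 0x2015),
    0x8160: (0x301C, 0x301C, 0xFF5E),
    0x8161: (0x2016, 0x2016, 0x2225),
    0x817C: (0x2212, 0x2212, 0xFF0D),
    0x8191: (0x00A2, 0x00A2, 0xFFE0),
    0x8192: (0x00A3, 0x00A3, 0xFFE1),
    0x81CA: (0x00AC, 0x00AC, 0xFFE2),
}


def column(name):
    """Return {SJIS code: UCS2} for one interpretation."""
    index = INTERPRETATIONS.index(name)
    return {code: row[index] for code, row in SJIS_VARIANTS.items()}


def ascii_transparent():
    """Interpretations that map SJIS 0x5C and 0x7E to the same ASCII code points."""
    return [
        name for name in INTERPRETATIONS
        if column(name)[0x5C] == 0x5C and column(name)[0x7E] == 0x7E
    ]


def remap_table(source, target):
    """UCS2 -> UCS2 translation dict covering only the rows that differ."""
    src, dst = column(source), column(target)
    return {src[code]: dst[code] for code in SJIS_VARIANTS if src[code] != dst[code]}


if __name__ == "__main__":
    print(ascii_transparent())
    ms_to_ext = remap_table("sjisMS", "sjis")
    print("\uff5e\u2015".translate(ms_to_ext) == "\u301c\u2014")
```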

iconv Enhancements for Greek Euro Support [11i v1.6]

See “Greek Euro Support [11i v1.6]” for detailed information on the iconv enhancements for Greek Euro Support.

Mainframe iconv Converters for Japanese characters [11.0 patch, 11i v1 patch, 11i v2]

Mainframe iconv converters between ShiftJIS/eucJP/UCS2 and NEC-JIPS/Hitachi-KEIS/Fujitsu-JEF were introduced in HP-UX 11i v1.0.

This release of the mainframe iconv conversion tables fixes numerous mapping errors for JIS standard characters in the basic part of those mainframe codesets. The detailed changes are described in MFConvChanges.jips, MFConvChanges.keis and MFConvChanges.jef in the /usr/share/doc directory.

In addition, this release of mainframe iconv conversion methods includes a fix to handle an incomplete shift sequence at the end of an input buffer.
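
An incomplete sequence at the end of a buffer is normal when input is read in chunks: the caller keeps the unconverted bytes and presents them again with the next chunk. Python's `codecs` incremental decoders perform this carry-over internally. The sketch below uses UTF-8, not a mainframe codeset, only to illustrate the behavior that a C program calling iconv(3C) has to implement itself when iconv(3C) reports an incomplete sequence; the function name is illustrative.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks, carrying partial sequences across chunk boundaries."""
    decoder = codecs.getincrementaldecoder(encoding)()
    # Trailing bytes of an incomplete sequence are buffered inside the decoder.
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True turns a still-incomplete trailing sequence into an error,
    # comparable to an incomplete sequence at the true end of the input.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


if __name__ == "__main__":
    # U+2014 (e2 80 94) is split across two chunks.
    print(decode_chunks([b"a\xe2\x80", b"\x94b"]) == "a\u2014b")
```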

Impact to the Customer

Customers who have already used the HP-UX 11i v1 version of the mainframe iconv converters will get different results from this version, because the mappings for JIS standard characters have been corrected. It is recommended that the previously installed tables be saved and renamed before this release is installed. Persistent data can then be converted back using the old tables and reconverted to the correct representation using the new tables.

If the last byte in the input buffer could be either a valid character or the start of an incomplete shift sequence, iconv(3C) returns (size_t)-1 and sets errno to EINVAL. If that byte is also the last byte of the input file, iconv(3C) cannot complete successfully unless dummy data, such as a NULL byte, is appended after it. The ambiguous bytes are 0x1a for jipsj, 0x3f for jipsec/jipsek, and 0x0a for keis7c/keis7k/keis8c/keis8k; each could be a control character or the start of a shift sequence.
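
A caller can apply the NULL-padding workaround only when it is needed by checking the last byte of the final buffer before passing it to iconv(3C). The Python sketch below uses the codeset names and byte values listed above; the function and dictionary names are illustrative. The appended NULL is converted as well, so, depending on the target codeset (an assumption not covered here), its counterpart may need to be removed from the output.

```python
# Final bytes the text identifies as ambiguous for each mainframe codeset:
# each may be a control character or the start of a shift sequence.
AMBIGUOUS_LAST_BYTE = {
    "jipsj": 0x1A,
    "jipsec": 0x3F,
    "jipsek": 0x3F,
    "keis7c": 0x0A,
    "keis7k": 0x0A,
    "keis8c": 0x0A,
    "keis8k": 0x0A,
}


def pad_final_buffer(codeset: str, data: bytes) -> bytes:
    """Append a NULL byte when the last input byte is ambiguous for this codeset."""
    ambiguous = AMBIGUOUS_LAST_BYTE.get(codeset)
    if data and ambiguous is not None and data[-1] == ambiguous:
        return data + b"\x00"
    return data


if __name__ == "__main__":
    print(pad_final_buffer("jipsj", b"\x41\x1a").hex())
```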

Compatibility

No compatibility problems are anticipated. However, if compatibility concerns arise for persistent data stored on an HP-UX system, it is recommended that the previously installed tables be saved and renamed before this release is installed. Persistent data can then be converted back using the old tables and reconverted to the correct representation using the new tables.

Performance

No performance issues.

Documentation

  • Japanese System Environment User's Manual (in Japanese language)

  • iconv(3C) manpage

Future Plans

The mainframe iconv converters will track future changes in Unicode standard support in HP-UX.

© 2001-2003, 2005 Hewlett-Packard Development Company, L.P.