Authors: Tom Bishop (firstname.lastname@example.org) and Richard Cook (email@example.com).
Updated August 26, 2007
Please Note: This is not a finalized specification. It is still at the "draft proposal" stage and may change.
The name of this UTF (UCS Transformation Format) is "UTF-G-32". UTF-G-32 extends UTF-32 to support over two billion characters, with code points up to U+7FFFFFFF.
UTF-G-32 is one of the encodings defined as part of UCS-G, which also includes similar extensions for UTF-8 and UTF-16. For general information about UCS-G, please see the UCS-G Specification.
UTF-G-32 is identical to the original UCS-4 encoding. The only thing that is new about it is the name. Unlike UTF-32, it explicitly supports code points greater than U+10FFFF. (If the reader is already familiar with UCS-4, the remainder of this UTF-G-32 specification is probably superfluous.)
UTF-G-32 preserves and extends useful properties of UTF-32 and UCS-4. For code points up to and including U+10FFFF, it is identical to UTF-32; across its whole range, up to U+7FFFFFFF, it is identical to the original UCS-4 encoding.
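The relationship to UTF-32 can be checked with a short Python sketch. It is only an illustration, not part of the proposal: the helper name `utf_g_32_unit` is hypothetical, and big-endian byte order is assumed because this draft does not state a byte order.

```python
import struct

def utf_g_32_unit(cp):
    """Hypothetical helper: one 32-bit unit holding the code point.

    Big-endian serialization is an assumption of this sketch.
    """
    return struct.pack(">I", cp)

# Up to U+10FFFF the result matches Python's built-in UTF-32 (big-endian) codec.
for cp in (0x41, 0x10FFFF):
    assert utf_g_32_unit(cp) == chr(cp).encode("utf-32-be")

# Beyond U+10FFFF, Python's chr() rejects the code point,
# while the 32-bit unit still holds it directly.
try:
    chr(0x110000)
    beyond_utf32 = None
except ValueError:
    beyond_utf32 = utf_g_32_unit(0x110000)

print(beyond_utf32.hex())
```

The last line shows the unit for U+110000, a code point that UTF-32 itself does not cover.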
UTF-G-32 employs thirty-two-bit code units. All codes are one unit in length.
Because every code is a single unit, a simple binary comparison of UTF-G-32 codes yields the same sort order as a numerical comparison of code points.
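A minimal Python sketch of this property follows. It assumes the units are serialized big-endian, which this draft does not specify, and the helper name `unit_bytes` is hypothetical.

```python
import struct

def unit_bytes(cp):
    """Hypothetical helper: the code point as one big-endian 32-bit unit."""
    return struct.pack(">I", cp)

code_points = [0x7FFFFFFF, 0x41, 0x110000, 0x10FFFF, 0x12345678]

# Numerical comparison of code points.
by_value = sorted(code_points)

# Binary (bytewise) comparison of the serialized units.
by_bytes = sorted(code_points, key=unit_bytes)

assert by_value == by_bytes
```

The byte order assumption matters here: if the same units were serialized little-endian, a bytewise comparison would no longer follow code point order, although comparing the units as thirty-two-bit integers still would.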
A code is a single unit whose value is simply the USV (the scalar value of the code point).
U+0041 = 00000041 (the code for the letter 'A')
U+10FFFF = 0010FFFF (the last UTF-32 code)
U+110000 = 00110000
U+12345678 = 12345678
U+7FFFFFFF = 7FFFFFFF (the last original UCS-4 code and the last UTF-G-32 code)
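The examples above can be reproduced with a small encoder and decoder. This is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, big-endian byte order is assumed, and the only validation performed is the range check against U+7FFFFFFF.

```python
import struct

MAX_CODE_POINT = 0x7FFFFFFF  # last UTF-G-32 code point, per the text above

def encode_utf_g_32(code_points):
    """Pack a list of code points into 32-bit units (big-endian assumed)."""
    for cp in code_points:
        if not 0 <= cp <= MAX_CODE_POINT:
            raise ValueError(f"code point out of range: {cp:#x}")
    return struct.pack(f">{len(code_points)}I", *code_points)

def decode_utf_g_32(data):
    """Unpack big-endian 32-bit units back into a list of code points."""
    if len(data) % 4 != 0:
        raise ValueError("length is not a multiple of four bytes")
    code_points = list(struct.unpack(f">{len(data) // 4}I", data))
    for cp in code_points:
        if cp > MAX_CODE_POINT:
            raise ValueError(f"unit out of range: {cp:#x}")
    return code_points

samples = [0x41, 0x10FFFF, 0x110000, 0x12345678, 0x7FFFFFFF]
for cp in samples:
    # Prints each code point next to its unit, e.g. "U+0041 = 00000041".
    print(f"U+{cp:04X} = {encode_utf_g_32([cp]).hex().upper()}")

# Encoding then decoding returns the original code points.
assert decode_utf_g_32(encode_utf_g_32(samples)) == samples
```

Because each code is one unit holding the scalar value, encoding and decoding amount to fixed-width integer packing with a range check.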