Gingerbread 2.3.2_r1
external/chromium/third_party/icu/patches/converters.patch.txt
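The `windows-932-2000.ucm` table added by this patch describes its own line format in its header comment: a Unicode scalar value, a codepage byte sequence, and a fallback indicator. As a rough illustration only (it is not part of the patch or of ICU), the Python sketch below parses one such mapping line, assuming the usual ICU `.ucm` layout `<UXXXX> \xNN... |f`. The names `Mapping`, `FALLBACK_KINDS` and `parse_charmap_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Meaning of the fallback indicator, paraphrasing the .ucm header comment.
FALLBACK_KINDS = {
    0: "exact 1-1 roundtrip mapping",
    1: "best fallback codepage byte sequence",
    2: "substitution character",
    3: "best reverse fallback Unicode scalar value",
}


class Mapping(NamedTuple):
    """One CHARMAP entry (hypothetical helper type)."""
    codepoint: int   # Unicode scalar value from the <UXXXX> column
    data: bytes      # codepage byte sequence
    flag: int        # fallback indicator, 0..3


# Assumed line layout: <UXXXX> \xNN[\xNN...] |f
_LINE = re.compile(
    r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])"
)


def parse_charmap_line(line: str) -> Optional[Mapping]:
    """Parse one mapping line; return None for comments, headers, etc."""
    text = line.strip()
    if text.startswith("+"):          # tolerate the diff's '+' prefix
        text = text[1:].lstrip()
    text = text.split("#", 1)[0].strip()   # drop any trailing comment
    match = _LINE.fullmatch(text)
    if match is None:
        return None
    hex_bytes = re.findall(r"\\x([0-9A-Fa-f]{2})", match.group(2))
    return Mapping(
        codepoint=int(match.group(1), 16),
        data=bytes(int(h, 16) for h in hex_bytes),
        flag=int(match.group(3)),
    )
```

Calling `parse_charmap_line(r"+<U0041> \x41 |0")` yields a `Mapping` whose `flag` can be looked up in `FALLBACK_KINDS`. Lines that are not mappings, such as `CHARMAP` or comment lines, return `None`.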
--- source/data/mappings/ucmlocal.mk 1969-12-31 16:00:00.000000000 -0800
+++ source/data/mappings/ucmlocal.mk 2009-12-02 13:12:20.156521000 -0800
@@ -0,0 +1,58 @@
+# Note: A number of encodings are handled with purely algorithmic converters,
+# without any mapping tables:
+# US-ASCII, ISO 8859-1, UTF-7/8/16/32, SCSU
+
+# Listed here:
+
+#   * ISO 8859-2..9,10,13,14,15,16
+#     - 8859-11 table is not included. It's rather treated as a synonym of
+#       Windows-874
+#   * Windows-125[0-8]
+#   * Simplified Chinese : GBK(Windows cp936), GB 18030
+#     - GB2312 table was removed and 4 aliases for GB2312 were added
+#       to GBK in convrtrs.txt to treat GB2312 as a synonym of GBK.
+#     - GB-HZ is supported now that it uses the GBK table.
+#   * Traditional Chinese : Big5 (Windows cp950), Big5HKSCS (no PUA)
+#   * Japanese : SJIS (Windows cp932), EUC-JP (ibm-954_P101-2007)
+#   * Korean : Windows-949
+#     - EUC-KR table was removed. It's different from Windows-949, but
+#       in practice EUC-KR and Windows-949 are treated synonymously.
+#     - ISO-2022-KR is now supported by with a one-line change
+#       in common/ucnv2022.c and other changes in convrtrs.txt to make it
+#       use the windows-949 table.
+#   * Thai : Windows-874
+#     - TIS-620 and ISO-8859-11 are treated as synonyms of Windows-874
+#       although they're not the same.
+#   * Mac encodings : MacRoman, MacCyrillic
+#   * Cyrillic : KOI8-R, KOI8-U
+#   * WebKit wants PC-Arabic (DOS 720 = IBM-864)
+#   * Three 'fake' tables to prevent Webkit from falling back to the default
+#     encoding when coming across ISO-2022-CN-(Ext).
+#
+#   * Missing
+#     - Armenian, Georgian : extremly rare
+#     - Mac encodings (other than Roman and Cyrillic) : extremly rare
+
+UCM_SOURCE_FILES=
+
+UCM_SOURCE_CORE=ibm-912_P100-1995.ucm ibm-913_P100-2000.ucm\
+ibm-914_P100-1995.ucm ibm-915_P100-1995.ucm\
+ibm-1089_P100-1995.ucm ibm-9005_X110-2007.ucm\
+ibm-5012_P100-1999.ucm ibm-920_P100-1995.ucm\
+iso-8859_10-1998.ucm\
+ibm-921_P100-1995.ucm iso-8859_14-1998.ucm ibm-923_P100-1998.ucm\
+iso-8859_16-2001.ucm\
+ibm-5346_P100-1998.ucm ibm-5347_P100-1998.ucm ibm-5348_P100-1997.ucm\
+ibm-5349_P100-1998.ucm ibm-5350_P100-1998.ucm ibm-9447_P100-2002.ucm\
+ibm-9448_X100-2005.ucm ibm-9449_P100-2002.ucm ibm-5354_P100-1998.ucm\
+windows-936-2000.ucm gb18030.ucm\
+windows-950-2000.ucm ibm-1375_P100-2007.ucm\
+ibm-943_P15A-2003.ucm google-euc_jp_mod.ucm\
+windows-949-2000.ucm\
+windows-874-2000.ucm ibm-874_P100-1995.ucm\
+macos-0_2-10.2.ucm macos-7_3-10.2.ucm\
+ibm-878_P100-1996.ucm ibm-1168_P100-2002.ucm\
+ibm-864_X110-1999.ucm\
+noop-cns-11643.ucm\
+noop-gb2312_gl.ucm\
+noop-iso-ir-165.ucm
--- source/data/mappings/convrtrs.txt 2009-08-04 10:53:44.000000000 -0700
+++ source/data/mappings/convrtrs.txt 2009-08-27 09:33:30.822570000 -0700
@@ -345,7 +345,7 @@
     ibm-367 { IBM* } IBM367 { IANA WINDOWS } # This is not truely ibm-367 because it's missing the fallbacks.

 # GB 18030 is partly algorithmic, using the MBCS converter
-gb18030 { IANA* } ibm-1392 { IBM* } windows-54936 { WINDOWS* } GB18030 { MIME* }
+gb18030 { IANA* } ibm-1392 { IBM* } windows-54936 { WINDOWS* } gb18030 { MIME* }

 # Table-based interchange codepages

@@ -482,15 +482,16 @@
     916 { JAVA }

 # Turkish
+# CHROME: ISO-8859-9 and its aliases are moved to windows-1254 per
+# HTML5.
 ibm-920_P100-1995 { UTR22* }
-    ibm-920 { IBM* JAVA }
-    ISO-8859-9 { MIME* IANA WINDOWS JAVA* }
-    latin5 { IANA WINDOWS JAVA }
-    csISOLatin5 { IANA JAVA }
-    iso-ir-148 { IANA WINDOWS JAVA }
-    ISO_8859-9:1989 { IANA* WINDOWS }
-    l5 { IANA WINDOWS JAVA }
-    8859_9 { JAVA }
+    ibm-920 { IBM* JAVA* }
+    ISO-8859-9
+    latin5
+    csISOLatin5
+    iso-ir-148
+    ISO_8859-9:1989
+    l5
     cp920 { JAVA }
     920 { JAVA }
     windows-28599 { WINDOWS* }
@@ -588,10 +589,6 @@
 ibm-33722_P12A_P12A-2004_U2 { UTR22* }
     ibm-33722 # Leave untagged because this isn't the default
     ibm-5050 # Leave untagged because this isn't the default, and yes this alias is correct
-    EUC-JP { IANA MIME* WINDOWS }
-    Extended_UNIX_Code_Packed_Format_for_Japanese { IANA* WINDOWS }
-    csEUCPkdFmtJapanese { IANA WINDOWS }
-    X-EUC-JP { WINDOWS } # Japan EUC. x-euc-jp is a MIME name
     windows-51932 { WINDOWS* }
     ibm-33722_VPUA
     IBM-eucJP
@@ -604,14 +601,17 @@
 # ibm-954 seems to be almost a superset of ibm-33722 and ibm-1350
 # ibm-1350 seems to be almost a superset of ibm-33722
 # ibm-954 contains more PUA characters than the others.
+# CHROME : Instead of ibm-33722_P*, we use our own EUC-JP converter
+# to match IE7 and Mozilla more closely.
+google-euc_jp_mod { UTR22* } # a modified version of EUC-JP that prefers 2-byte code points when converting from Unicode while recognizing both 2-byte and 3-byte code points when converting to Unicode.
+    EUC-JP { MIME* IANA JAVA* WINDOWS* }
+    Extended_UNIX_Code_Packed_Format_for_Japanese { IANA* JAVA WINDOWS }
+    csEUCPkdFmtJapanese { IANA JAVA WINDOWS }
+    X-EUC-JP { MIME JAVA WINDOWS } # Japan EUC. x-euc-jp is a MIME name
+    eucjis {JAVA}
+    ujis # Linux sometimes uses this name. This is an unfortunate generic and rarely used name. Its use is discouraged.
 ibm-954_P101-2007 { UTR22* }
     ibm-954 { IBM* }
-    EUC-JP { JAVA* } # Matches more closely with ibm-1350
-    Extended_UNIX_Code_Packed_Format_for_Japanese { JAVA }
-    csEUCPkdFmtJapanese { JAVA }
-    X-EUC-JP { JAVA } # Japan EUC. x-euc-jp is a MIME name
-    eucjis { JAVA }
-    ujis # Linux sometimes uses this name. This is an unfortunate generic and rarely used name. Its use is discouraged.
     # eucJP # This is closest to Solaris EUC-JP.

 # Here are various interpretations and extentions of Big5
@@ -645,33 +645,40 @@
 ibm-1386_P100-2001 { UTR22* }
     ibm-1386 { IBM* }
     cp1386
-    windows-936 # Alternate mapping. Leave untagged. This is the IBM interpretation of a Windows codepage.
+    #windows-936 # Alternate mapping. Leave untagged. This is the IBM interpretation of a Windows codepage.
     ibm-1386_VSUB_VPUA
+# CHROME: Added 4 GB2312 aliases and EUC-CN to Windows-936 to reflect the
+# reality of the web (GB2312 is treated synonymously with its
+# superset, Windows-936/GBK)
+# All the aliases listed for this converter (windows-936-2000)
+# are removed from the list of aliases for other simplified Chinese
+# converters above.
 windows-936-2000 { UTR22* }
-    GBK { IANA* WINDOWS JAVA* }
+    GB2312 { IANA* MIME* }
+    windows-936 { IANA WINDOWS* JAVA }
+    GBK { WINDOWS JAVA* }
     CP936 { IANA JAVA }
     MS936 { IANA } # In JDK 1.5, this goes to x-mswin-936. This is an IANA name split.
-    windows-936 { IANA WINDOWS* JAVA }
+    chinese { IANA }
+    iso-ir-58 { IANA }
+    gb2312-1980
+    EUC-CN
+    csGB2312 { IANA }
+    GB_2312-80 { IANA }

-# Java has two different tables for ibm-1383 and gb2312. We pick closest set for tagging.
+# Java has two different tables for ibm-1383 and gb2312. We pick closest set for tagging.
 ibm-1383_P110-1999 { UTR22* } # China EUC.
     ibm-1383 { IBM* JAVA }
-    GB2312 { IANA* MIME* }
-    csGB2312 { IANA }
     cp1383 { JAVA* }
     1383 { JAVA }
-    EUC-CN # According to other platforms, windows-20936 looks more like euc-cn. x-euc-cn is also a MIME name
+    #EUC-CN # According to other platforms, windows-20936 looks more like euc-cn. x-euc-cn is also a MIME name
     ibm-eucCN
     hp15CN # From HP-UX?
     ibm-1383_VPUA
     # gb # This is not an IANA name. gb in IANA means Great Britain.

 ibm-5478_P100-1995 { UTR22* } ibm-5478 { IBM* } # This gb_2312_80 DBCS mapping is needed by iso-2022.
-    GB_2312-80 { IANA* } # Windows maps this alias incorrectly
-    chinese { IANA }
-    iso-ir-58 { IANA }
-    csISO58GB231280 { IANA }
-    gb2312-1980
+    csISO58GB231280 { IANA* }
     GB2312.1980-0 # From X11R6

 ibm-964_P110-1999 { UTR22* } # Taiwan EUC. x-euc-tw is a MIME name
@@ -720,13 +727,8 @@
 # Java has both ibm-970 and EUC-KR as separate converters.
 ibm-970_P110_P110-2006_U2 { UTR22* }
     ibm-970 { IBM* JAVA }
-    EUC-KR { IANA* MIME* WINDOWS JAVA }
-    KS_C_5601-1987 { JAVA }
     windows-51949 { WINDOWS* }
-    csEUCKR { IANA WINDOWS } # x-euc-kr is also a MIME name
     ibm-eucKR { JAVA }
-    KSC_5601 { JAVA } # Needed by iso-2022
-    5601 { JAVA }
     cp970 { JAVA* }
     970 { JAVA }
     ibm-970_VPUA
@@ -738,16 +740,16 @@
 # ibm-1363 is almost a superset of ibm-970.
 ibm-1363_P11B-1998 { UTR22* }
     ibm-1363 # Leave untagged because this isn't the default
-    KS_C_5601-1987 { IANA* }
-    KS_C_5601-1989 { IANA }
-    KSC_5601 { IANA }
-    csKSC56011987 { IANA }
-    korean { IANA }
-    iso-ir-149 { IANA }
+    #KS_C_5601-1987 { IANA* }
+    #KS_C_5601-1989 { IANA }
+    #KSC_5601 { IANA }
+    #csKSC56011987 { IANA }
+    #korean { IANA }
+    #iso-ir-149 { IANA }
     cp1363 { MIME* }
-    5601
-    ksc
-    windows-949 # Alternate mapping. Leave untagged. This is the IBM interpretation of a Windows codepage.
+    #5601
+    #ksc
+    #windows-949 # Alternate mapping. Leave untagged. This is the IBM interpretation of a Windows codepage.
     ibm-1363_VSUB_VPUA
     # ks_x_1001:1992
     # ksc5601-1992
@@ -756,27 +758,41 @@
     ibm-1363 { IBM* }
     ibm-1363_VASCII_VSUB_VPUA

+#CHROME: Windows-949 is NOT EUC-KR, but a superset of EUC-KR with 8,822
+# additional Hangul syllables. However, the reality of the web
+# dictates that we make a compromise and make EUC-KR a synonym of
+# windows-949.
+# All the aliases listed for this converter (windows-949-2000)
+# are removed from the list of aliases for other Korean converters
+# above.
 windows-949-2000 { UTR22* }
-    windows-949 { JAVA* WINDOWS* }
-    KS_C_5601-1987 { WINDOWS }
-    KS_C_5601-1989 { WINDOWS }
-    KSC_5601 { MIME WINDOWS } # Needed by iso-2022
+    EUC-KR { IANA* MIME* WINDOWS }
+    windows-949 { JAVA* WINDOWS }
+    KS_C_5601-1987 { WINDOWS* IANA }
+    KS_C_5601-1989 { WINDOWS IANA }
+    KSC_5601 { IANA WINDOWS } # Needed by iso-2022
     csKSC56011987 { WINDOWS }
-    korean { WINDOWS }
-    iso-ir-149 { WINDOWS }
+    korean { IANA WINDOWS }
+    iso-ir-149 { IANA WINDOWS }
     ms949 { JAVA }
+    csEUCKR { IANA WINDOWS }
+    5601
+    x-windows-949 # Mozilla
+    x-UHC # Mozilla (Unified Hangul Code)

+#CHROME: TIS-620, ISO-8859-11 and Windows-874 are slightly different from
+# each other, but they're used as if they're identical on the web.
 windows-874-2000 { UTR22* } # Thai (w/ euro update)
-    TIS-620 { WINDOWS }
-    windows-874 { JAVA* WINDOWS* }
+    TIS-620 { IANA* WINDOWS MIME* }
+    windows-874 { JAVA* WINDOWS* MIME }
     MS874 { JAVA }
-    # iso-8859-11 { WINDOWS } # iso-8859-11 is similar to TIS-620. ibm-13162 is a closer match.
+    iso-8859-11 { IANA WINDOWS MIME } # iso-8859-11 is similar to TIS-620. ibm-13162 is a closer match.

 ibm-874_P100-1995 { UTR22* } # Thai PC (w/o euro update).
     ibm-874 { IBM* JAVA }
     ibm-9066 { IBM } # Yes ibm-874 == ibm-9066. ibm-1161 has the euro update.
     cp874 { JAVA* }
-    TIS-620 { IANA* JAVA } # This is actually separate from ibm-874, which is similar to this table
+    #TIS-620 { IANA* JAVA } # This is actually separate from ibm-874, which is similar to this table
     tis620.2533 { JAVA } # This is actually separate from ibm-874, which is similar to this table
     eucTH # eucTH is an unusual alias from Solaris. eucTH has fewer mappings than TIS620

@@ -820,7 +836,16 @@
 ibm-5347_P100-1998 { UTR22* } ibm-5347 { IBM* } windows-1251 { IANA* JAVA* WINDOWS* } cp1251 { WINDOWS JAVA } ANSI1251 # Windows Cyrillic (w/ euro update). ANSI1251 is from Solaris
 ibm-5348_P100-1997 { UTR22* } ibm-5348 { IBM* } windows-1252 { IANA* JAVA* WINDOWS* } cp1252 { JAVA } # Windows Latin1 (w/ euro update)
 ibm-5349_P100-1998 { UTR22* } ibm-5349 { IBM* } windows-1253 { IANA* JAVA* WINDOWS* } cp1253 { JAVA } # Windows Greek (w/ euro update)
-ibm-5350_P100-1998 { UTR22* } ibm-5350 { IBM* } windows-1254 { IANA* JAVA* WINDOWS* } cp1254 { JAVA } # Windows Turkish (w/ euro update)
+#CHROME : Make ISO-8859-9 an alias to windows-1254 per HTML5. Move
+#other IANA aliases for ISO-8859-9 as well.
+ibm-5350_P100-1998 { UTR22* } ibm-5350 { IBM* } windows-1254 { MIME* IANA* JAVA* WINDOWS* } cp1254 { JAVA } # Windows Turkish (w/ euro update)
+    ISO-8859-9 { MIME }
+    latin5 { IANA }
+    csISOLatin5 { IANA }
+    iso-ir-148 { IANA }
+    ISO_8859-9:1989 { IANA }
+    l5 { IANA }
+    8859_9 { JAVA }
 ibm-9447_P100-2002 { UTR22* } ibm-9447 { IBM* } windows-1255 { IANA* JAVA* WINDOWS* } cp1255 { JAVA } # Windows Hebrew (w/ euro update)
 ibm-9448_X100-2005 { UTR22* } ibm-9448 { IBM* } windows-1256 { IANA* JAVA* WINDOWS* } cp1256 { WINDOWS JAVA } # Windows Arabic (w/ euro update)
 ibm-9449_P100-2002 { UTR22* } ibm-9449 { IBM* } windows-1257 { IANA* JAVA* WINDOWS* } cp1257 { JAVA } # Windows Baltic (w/ euro update)
--- source/data/mappings/windows-932-2000.ucm 1969-12-31 16:00:00.000000000 -0800
+++ source/data/mappings/windows-932-2000.ucm 2009-08-05 13:21:17.750080000 -0700
@@ -0,0 +1,9932 @@
+# ***************************************************************************
+# *
+# * Copyright (C) 2001-2002, International Business Machines
+# * Corporation and others. All Rights Reserved.
+# *
+# ***************************************************************************
+#
+# File created on Dec 03 13:49 Pacific Standard Time 2002
+#
+# File created by genmucm tool.
+# from windows 2000 using IMultiLanguage 5.50.4522.1800
+#
+# Table Version : 1.0
+# The 1st column is the Unicode scalar value.
+# The 2nd column is the codepage byte sequence.
+# The 3rd column is the fallback indicator.
+# The fallback indicator can have one of the following values:
+# |0 for exact 1-1 roundtrip mapping
+# |1 for the best fallback codepage byte sequence.
+# |2 for the substitution character
+# |3 for the best reverse fallback Unicode scaler value
+#
+# Encoding description: Japanese (Shift-JIS)
+# Encoding name: shift_jis
+#
+<code_set_name> "windows-932-2000"
+<mb_cur_max> 2
+<mb_cur_min> 1
+<uconv_class> "MBCS"
+<subchar> \x3F
+<icu:charsetFamily> "ASCII"
+# Suggested ICU specific alias information
+#<icu:alias> "windows-932_VPUA"
+
+<icu:state> 0-80, 81-9f:1, a0-df, e0-fc:1, fd-ff
+<icu:state> 40-7e, 80-fc
+
+# The following was the generated state table.
+# This does not account for unassigned characters
+#<icu:state> 0-80, 81-84:1, 87-9f:1, a0-df, e0-ea:1, ed-ee:1, f0-fc:1, fd-ff
+#<icu:state> 40-7e, 80-fc
+#
+CHARMAP
+#
+#UNICODE 932
+#_______ _________
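+<U0000> \x00 |0
+<U0001> \x01 |0
+<U0002> \x02 |0
+<U0003> \x03 |0
+<U0004> \x04 |0
+<U0005> \x05 |0
+<U0006> \x06 |0
+<U0007> \x07 |0
+<U0008> \x08 |0
+<U0009> \x09 |0
+<U000A> \x0A |0
+<U000B> \x0B |0
+<U000C> \x0C |0
+<U000D> \x0D |0
+<U000E> \x0E |0
+<U000F> \x0F |0
+<U0010> \x10 |0
+<U0011> \x11 |0
+<U0012> \x12 |0
+<U0013> \x13 |0
+<U0014> \x14 |0
+<U0015> \x15 |0
+<U0016> \x16 |0
+<U0017> \x17 |0
+<U0018> \x18 |0
+<U0019> \x19 |0
+<U001A> \x1A |0
+<U001B> \x1B |0
+<U001C> \x1C |0
+<U001D> \x1D |0
+<U001E> \x1E |0
+<U001F> \x1F |0
+<U0020> \x20 |0
+<U0021> \x21 |0
+<U0022> \x22 |0
+<U0023> \x23 |0
+<U0024> \x24 |0
+<U0025> \x25 |0
+<U0026> \x26 |0
+<U0027> \x27 |0
+<U0028> \x28 |0
+<U0029> \x29 |0
+<U002A> \x2A |0
+<U002B> \x2B |0
+<U002C> \x2C |0
+<U002D> \x2D |0
+<U002E> \x2E |0
+<U002F> \x2F |0
+<U0030> \x30 |0
+<U0031> \x31 |0
+<U0032> \x32 |0
+<U0033> \x33 |0
+<U0034> \x34 |0
+<U0035> \x35 |0
+<U0036> \x36 |0
+<U0037> \x37 |0
+<U0038> \x38 |0
+<U0039> \x39 |0
+<U003A> \x3A |0
+<U003B> \x3B |0
+<U003C> \x3C |0
+<U003D> \x3D |0
+<U003E> \x3E |0
+<U003F> \x3F |0
+<U0040> \x40 |0
+<U0041> \x41 |0
+<U0042> \x42 |0
+<U0043> \x43 |0
+<U0044> \x44 |0
+<U0045> \x45 |0
+<U0046> \x46 |0
+<U0047> \x47 |0
+<U0048> \x48 |0
+<U0049> \x49 |0
+<U004A> \x4A |0
+<U004B> \x4B |0
+<U004C> \x4C |0
+<U004D> \x4D |0
+<U004E> \x4E |0
+<U004F> \x4F |0
+<U0050> \x50 |0
+<U0051> \x51 |0
+<U0052> \x52 |0
+<U0053> \x53 |0
+<U0054> \x54 |0
+<U0055> \x55 |0
+<U0056> \x56 |0
+<U0057> \x57 |0
+<U0058> \x58 |0
+<U0059> \x59 |0
+<U005A> \x5A |0
+<U005B> \x5B |0
+<U005C> \x5C |0
+<U005D> \x5D |0
+<U005E> \x5E |0
+<U005F> \x5F |0
+<U0060> \x60 |0
+<U0061> \x61 |0
+<U0062> \x62 |0
+<U0063> \x63 |0
+<U0064> \x64 |0
+<U0065> \x65 |0
+<U0066> \x66 |0
+<U0067> \x67 |0
+<U0068> \x68 |0
+<U0069> \x69 |0
+<U006A> \x6A |0
+<U006B> \x6B |0
+<U006C> \x6C |0
+<U006D> \x6D |0
+<U006E> \x6E |0
+<U006F> \x6F |0
+<U0070> \x70 |0
+<U0071> \x71 |0
+<U0072> \x72 |0
+<U0073> \x73 |0
+<U0074> \x74 |0
+<U0075> \x75 |0
+<U0076> \x76 |0
+<U0077> \x77 |0
+<U0078> \x78 |0
+<U0079> \x79 |0
+<U007A> \x7A |0
+<U007B> \x7B |0
+<U007C> \x7C |0
+<U007D> \x7D |0
+<U007E> \x7E |0
+<U007F> \x7F |0
+<U0080> \x80 |0
+<U00A1> \x21 |1
+<U00A2> \x81\x91 |1
+<U00A3> \x81\x92 |1
+<U00A5> \x5C |1
+<U00A6> \x7C |1
+<U00A7> \x81\x98 |0
+<U00A8> \x81\x4E |0
+<U00A9> \x63 |1
+<U00AA> \x61 |1
+<U00AB> \x81\xE1 |1
+<U00AC> \x81\xCA |1
+<U00AD> \x2D |1
+<U00AE> \x52 |1
+<U00AF> \x81\x50 |1
+<U00B0> \x81\x8B |0
+<U00B1> \x81\x7D |0
+<U00B2> \x32 |1
+<U00B3> \x33 |1
+<U00B4> \x81\x4C |0
+<U00B5> \x83\xCA |1
+<U00B6> \x81\xF7 |0
+<U00B7> \x81\x45 |1
+<U00B8> \x81\x43 |1
+<U00B9> \x31 |1
+<U00BA> \x6F |1
+<U00BB> \x81\xE2 |1
+<U00C0> \x41 |1
+<U00C1> \x41 |1
+<U00C2> \x41 |1
+<U00C3> \x41 |1
+<U00C4> \x41 |1
+<U00C5> \x41 |1
+<U00C6> \x41 |1
+<U00C7> \x43 |1
+<U00C8> \x45 |1
+<U00C9> \x45 |1
+<U00CA> \x45 |1
+<U00CB> \x45 |1
+<U00CC> \x49 |1
+<U00CD> \x49 |1
+<U00CE> \x49 |1
+<U00CF> \x49 |1
+<U00D0> \x44 |1
+<U00D1> \x4E |1
+<U00D2> \x4F |1
+<U00D3> \x4F |1
+<U00D4> \x4F |1
+<U00D5> \x4F |1
+<U00D6> \x4F |1
+<U00D7> \x81\x7E |0
+<U00D8> \x4F |1
+<U00D9> \x55 |1
+<U00DA> \x55 |1
+<U00DB> \x55 |1
+<U00DC> \x55 |1
+<U00DD> \x59 |1
+<U00DE> \x54 |1
+<U00DF> \x73 |1
+<U00E0> \x61 |1
+<U00E1> \x61 |1
+<U00E2> \x61 |1
+<U00E3> \x61 |1
+<U00E4> \x61 |1
+<U00E5> \x61 |1
+<U00E6> \x61 |1
+<U00E7> \x63 |1
+<U00E8> \x65 |1
+<U00E9> \x65 |1
+<U00EA> \x65 |1
+<U00EB> \x65 |1
+<U00EC> \x69 |1
+<U00ED> \x69 |1
+<U00EE> \x69 |1
+<U00EF> \x69 |1
+<U00F0> \x64 |1
+<U00F1> \x6E |1
+<U00F2> \x6F |1
+<U00F3> \x6F |1
+<U00F4> \x6F |1
+<U00F5> \x6F |1
+<U00F6> \x6F |1
+<U00F7> \x81\x80 |0
+<U00F8> \x6F |1
+<U00F9> \x75 |1
+<U00FA> \x75 |1
+<U00FB> \x75 |1
+<U00FC> \x75 |1
+<U00FD> \x79 |1
+<U00FE> \x74 |1
+<U00FF> \x79 |1
+<U0391> \x83\x9F |0
+<U0392> \x83\xA0 |0
+<U0393> \x83\xA1 |0
+<U0394> \x83\xA2 |0
+<U0395> \x83\xA3 |0
+<U0396> \x83\xA4 |0
+<U0397> \x83\xA5 |0
+<U0398> \x83\xA6 |0
+<U0399> \x83\xA7 |0
+<U039A> \x83\xA8 |0
+<U039B> \x83\xA9 |0
+<U039C> \x83\xAA |0
+<U039D> \x83\xAB |0
+<U039E> \x83\xAC |0
+<U039F> \x83\xAD |0
+<U03A0> \x83\xAE |0
+<U03A1> \x83\xAF |0
+<U03A3> \x83\xB0 |0
+<U03A4> \x83\xB1 |0
+<U03A5> \x83\xB2 |0
+<U03A6> \x83\xB3 |0
+<U03A7> \x83\xB4 |0
+<U03A8> \x83\xB5 |0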