From Wikipedia, the free encyclopedia
Big5
MIME / IANA: Big5
Alias(es): Big-5, 大五碼
Language(s): Traditional Chinese, English
Partial support: Simplified Chinese, Greek, Japanese, Russian, Bulgarian, some IPA letters for phonetic usage.[1]
Created by: Institute for Information Industry
Classification: Extended ASCII,[a][b] variable-width encoding, DBCS, CJK encoding
Extends: ASCII[b]
Extensions: Windows-950, Big5-HKSCS, numerous others
Other related encoding(s): CNS 11643
  a. ^ Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.
  b. ^ a b Big5 does not specify a single-byte component; however, ASCII (or an extension) is used in practice.

Big-5 or Big5 (Chinese: 大五碼) is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters.

The People's Republic of China (PRC), which uses simplified Chinese characters, uses the GB 18030 character set instead (though it can also substitute Big-5 or UTF-8).[citation needed]

Big5 gets its name from the consortium of five companies in Taiwan that developed it.[2]

Encoding

The original Big5 character set is sorted first by usage frequency, second by stroke count, lastly by Kangxi radical.

The original Big5 character set lacked many commonly used characters. To solve this problem, each vendor developed its own extension. The ETen extension became part of the current Big5 standard through popularity.

The structure of Big5 does not conform to the ISO 2022 standard, but rather bears a certain similarity to the Shift JIS encoding. It is a double-byte character set (DBCS) with the following structure:

First byte ("lead byte") 0x81 to 0xfe (or 0xa1 to 0xf9 for non-user-defined characters)
Second byte 0x40 to 0x7e, 0xa1 to 0xfe

(the prefix 0x signifying hexadecimal numbers).

Standard assignments (excluding vendor or user-defined extensions) do not use the bytes 0x7F through 0xA0, nor 0xFF, as either lead (first) or trail (second) bytes. Bytes 0xA1 through 0xFE are used for both lead and trail bytes for double-byte (Big5) codes. Bytes 0x40 through 0x7E are used as trail bytes following a lead byte, or for single-byte codes otherwise. If the second byte is not in either range, behavior is unspecified (i.e., varies from system to system). Additionally, certain variants of the Big5 character set, for example the HKSCS, use an expanded range for the lead byte, including values in the 0x81 to 0xA0 range (similar to Shift JIS), whereas others use reduced lead byte ranges (for instance, the Apple Macintosh variant uses 0xFD through 0xFF as single-byte codes, limiting the lead byte range to 0xA1 through 0xFC).[3]
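
The byte ranges described above can be expressed as a small validity check. The following Python sketch is illustrative only: the function names and the `standard_only` flag are assumptions made for this example, and the ranges are those of the generic structure given above, not of any particular vendor variant.

```python
# Byte ranges taken from the structure described above (generic Big5 only).
LEAD_FULL = range(0x81, 0xFF)      # 0x81..0xFE, includes user-defined zones
LEAD_STANDARD = range(0xA1, 0xFA)  # 0xA1..0xF9, non-user-defined characters
TRAIL_LOW = range(0x40, 0x7F)      # 0x40..0x7E
TRAIL_HIGH = range(0xA1, 0xFF)     # 0xA1..0xFE


def is_lead_byte(b: int, standard_only: bool = False) -> bool:
    """Return True if b may start a double-byte Big5 code."""
    return b in (LEAD_STANDARD if standard_only else LEAD_FULL)


def is_trail_byte(b: int) -> bool:
    """Return True if b may be the second byte of a Big5 code."""
    return b in TRAIL_LOW or b in TRAIL_HIGH


def is_big5_pair(lead: int, trail: int, standard_only: bool = False) -> bool:
    """Return True if (lead, trail) fits the double-byte structure."""
    return is_lead_byte(lead, standard_only) and is_trail_byte(trail)
```

A pair such as 0xA1 0x40 passes the check, while 0xA1 0x7F does not, because 0x7F falls in the gap between the two trail byte ranges.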

The numerical value of an individual Big5 code is frequently given as a 4-digit hexadecimal number, which describes the two bytes that comprise the Big5 code as if they were a big-endian representation of a 16-bit number. For example, the Big5 code for a full-width space, which consists of the bytes 0xa1 0x40, is usually written as 0xa140 or just A140.
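
This notation amounts to reading the byte pair as a big-endian 16-bit integer. A minimal Python sketch follows; the helper names `big5_code` and `format_big5_code` are hypothetical and exist only for this illustration.

```python
def big5_code(pair: bytes) -> int:
    """Interpret a two-byte Big5 code as a big-endian 16-bit number."""
    if len(pair) != 2:
        raise ValueError("a Big5 code is exactly two bytes")
    return int.from_bytes(pair, byteorder="big")


def format_big5_code(pair: bytes) -> str:
    """Render the code in the customary 4-digit hexadecimal form."""
    return f"{big5_code(pair):04X}"


full_width_space = bytes([0xA1, 0x40])
print(hex(big5_code(full_width_space)))    # 0xa140
print(format_big5_code(full_width_space))  # A140
```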

Strictly speaking, the Big5 encoding contains only DBCS characters. However, in practice, the Big5 codes are always used together with an unspecified, system-dependent single-byte character set (SBCS) (such as ASCII or code page 437), so that Big5-encoded text contains a mix of double-byte characters and single-byte characters. Bytes in the range 0x00 to 0x7f that are not part of a double-byte character are assumed to be single-byte characters. (For a more detailed description of this problem, please see the discussion on "The Matching SBCS" below.)
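
One way to picture this mix is a scanner that walks a byte string and separates double-byte codes from single bytes. The sketch below is an assumption-laden illustration, not a reference decoder: the function name `split_big5` and the labels "double", "single" and "unknown" are invented here, and treating a stray high byte as "unknown" is just one possible choice, since such behavior is system-dependent.

```python
from typing import Iterator, Tuple


def split_big5(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (kind, bytes) units from a mixed single/double-byte stream."""
    i = 0
    while i < len(data):
        b = data[i]
        # A lead byte followed by a valid trail byte forms one double-byte code.
        if 0x81 <= b <= 0xFE and i + 1 < len(data):
            t = data[i + 1]
            if 0x40 <= t <= 0x7E or 0xA1 <= t <= 0xFE:
                yield ("double", data[i:i + 2])
                i += 2
                continue
        # Bytes 0x00..0x7F outside a double-byte code are single-byte characters;
        # anything else is left unclassified in this sketch.
        kind = "single" if b <= 0x7F else "unknown"
        yield (kind, data[i:i + 1])
        i += 1


print(list(split_big5(b"A\xa1\x40B")))
# [('single', b'A'), ('double', b'\xa1@'), ('single', b'B')]
```

Note that a trail byte in the range 0x40 to 0x7E has the same value as an ASCII character, so the scanner has to process the stream from the start; a byte such as 0x5C cannot be classified in isolation.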

The meaning of non-ASCII single bytes outside the permitted values that are not part of a double-byte character varies from system to system. In old MS-DOS-based systems, they are likely to be displayed as 8-bit characters; in modern systems, they are likely to either give unpredictable results or generate an error.

A more detailed look at the organization

In the original Big5, the encoding is compartmentalized into different zones:

0x8140 to 0xA0FE Reserved for user-defined characters 造字
0xA140 to 0xA3BF "Graphical characters" 圖形碼
0xA3C0 to 0xA3FE Reserved, not for user-defined characters
0xA440 to 0xC67E Frequently used characters 常用字
0xC6A1 to 0xC8FE Reserved for user-defined characters
0xC940 to 0xF9D5 Less frequently used characters 次常用字
0xF9D6 to 0xFEFE Reserved for user-defined characters
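
The zone layout above lends itself to a simple lookup. The following Python sketch assumes the zone boundaries exactly as listed and the trail byte ranges given earlier; the table name `ZONES`, the function name `big5_zone` and the short zone labels are hypothetical.

```python
from typing import Optional

# (first code, last code, zone label), copied from the list above.
ZONES = [
    (0x8140, 0xA0FE, "user-defined"),
    (0xA140, 0xA3BF, "graphical characters"),
    (0xA3C0, 0xA3FE, "reserved"),
    (0xA440, 0xC67E, "frequently used characters"),
    (0xC6A1, 0xC8FE, "user-defined"),
    (0xC940, 0xF9D5, "less frequently used characters"),
    (0xF9D6, 0xFEFE, "user-defined"),
]


def big5_zone(code: int) -> Optional[str]:
    """Return the original Big5 zone for a 16-bit code, or None."""
    trail = code & 0xFF
    # Reject codes whose second byte is not a valid trail byte.
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return None
    for first, last, name in ZONES:
        if first <= code <= last:
            return name
    return None


print(big5_zone(0xA140))  # graphical characters
print(big5_zone(0xC94A))  # less frequently used characters
```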

The "graphical characters" actually comprise punctuation marks, partial punctuation marks (e.g., half of a dash, half of an ellipsis; see below), dingbats, foreign characters, and other special characters (e.g., presentational "full width" forms, digits for Suzhou numerals, zhuyin fuhao, etc.)

In most vendor extensions, extended characters are placed in the various zones reserved for user-defined characters, each of which is normally regarded as associated with the preceding zone. For example, additional "graphical characters" (e.g., punctuation marks) would be expected to be placed in the 0xa3c0–0xa3fe range, and additional logograms would be placed in either the 0xc6a1–0xc8fe or the 0xf9d6–0xfefe range. Sometimes, this is not possible due to the large number of extended characters to be added; for example, Cyrillic letters and Japanese kana have been placed in the zone associated with "frequently-used characters".

Duplicates

Big5 has encoded two duplicate characters: "兀" on 0xA461 (U+5140) and 0xC94A (U+FA0C), "嗀" on 0xDCD1 (U+55C0) and 0xDDFC (U+FA0D).

Some encoding mappings also map the three Suzhou numerals, "〸", "〹" and "〺", in the graphical section to ideograph characters (U+5341, U+5344 and U+5345 respectively)[4][5] instead of CJK Symbols and Punctuation (U+3038, U+3039 and U+303A respectively).[6][7]

What a Big5 code actually encodes

An individual Big5 code does not always represent a complete semantic unit. The Big5 codes of logograms are always logograms, but codes in the "graphical characters" section are not always complete "graphical characters". What Big5 encodes are particular graphical representations of characters or part of characters that happen to fit in the space taken by two monospaced ASCII characters. This is a property of CJK double-byte character sets, and is not a unique problem of Big5.

(The above might need some explanation by putting it in historical perspective, as it is theoretically incorrect: Back when text mode personal computing was still the norm, characters were normally represented as single bytes and each character takes one position on the screen. There was therefore a practical reason to insist that double-byte characters must take up two positions on the screen, namely that off-the-shelf, American-made software would then be usable without modification in a DBCS-based system. If a character can take an arbitrary number of screen positions, software that assumes that one byte of text takes one screen position would produce incorrect output. Of course, if a computer never had to deal with the text screen, the manufacturer would not enforce this artificial restriction; the Apple Macintosh is an example. Nevertheless, the encoding itself must be designed so that it works correctly on text-screen-based systems.)

To illustrate this point, consider the Big5 code 0xa14b (…). To English speakers this looks like an ellipsis and the Unicode standard identifies it as such; however, in Chinese, the ellipsis consists of six dots that fit in the space of two Chinese characters (……), so in fact there is no Big5 code for the Chinese ellipsis, and the Big5 code 0xa14b just represents half of a Chinese ellipsis. It represents only half of an ellipsis because the whole ellipsis should take the space of two Chinese characters, and in many DBCS systems one DBCS character must take exactly the space of one Chinese character.

Characters encoded in Big5 do not always represent things that can be readily used in plain text files; an example is the "citation mark" (0xa1ca, ﹋), which, when used, must be typeset under the title of a literary work. Another example is the Suzhou numerals, a form of scientific notation that requires the number to be laid out in a 2-D form consisting of at least two rows.

The Matching SBCS

In practice, Big5 cannot be used without a matching SBCS, mostly for compatibility reasons. However, as in the case of other CJK DBCS character sets, the SBCS to use has never been specified. Big5 has always been defined as a DBCS, though when used it must be paired with a suitable, unspecified SBCS and therefore used as what some people call an MBCS; nevertheless, Big5 by itself, as defined, is strictly a DBCS.

The SBCS to use being unspecified implies that the SBCS used can theoretically vary from system to system. Nowadays, ASCII is the only possible SBCS one would use. However, in old DOS-based systems, code page 437—with its extra special symbols in the control code area including position 127—was much more common. Yet, on a Macintosh system with the Chinese Language Kit, or on a Unix system running the cxterm terminal emulator, the SBCS paired with Big5 would not be code page 437.

Outside the valid range of Big5, the old DOS-based systems would routinely interpret things according to the SBCS that is paired with Big5 on that system. In such systems, characters 127 to 160, for example, were very likely not avoided because they would produce invalid Big5, but used because they would be valid characters in code page 437.

The modern characterization of Big5 as an MBCS consisting of the DBCS of Big5 plus the SBCS of ASCII is therefore historically incorrect and potentially flawed, as the choice of the matching SBCS was, and theoretically still is, quite independent of the flavour of Big5 being used.

History

The inability of ASCII to support large Chinese, Japanese and Korean (CJK) character sets led governments and industry to find creative solutions to enable their languages to be rendered on computers. A variety of ad hoc and usually proprietary input methods led to efforts to develop a standard system. As a result, Big5 encoding was defined by the Institute for Information Industry of Taiwan in 1984.

The name "Big5" recognizes that the standard emerged from a collaboration of five of Taiwan's largest IT firms.

Big5 was rapidly popularized in Taiwan and worldwide among Chinese who used the traditional Chinese character set through its adoption in several commercial software packages, notably the E-TEN Chinese DOS input system (ETen Chinese System). The Republic of China government declared Big5 its standard in the mid-1980s since it was, by then, the de facto standard for using traditional Chinese on computers.

Extensions

The original Big-5 includes only CJK logograms from the Charts of Standard Forms of Common National Characters (4808 characters) and Less-Than-Common National Characters (6343 characters), but not characters used in people's names, place names, dialects, chemistry, or biology, nor Japanese kana. As a result, many programs that support Big-5 include extensions to address these gaps.

The plethora of variations makes UTF-8 (or UTF-16, or the Chinese GB 18030 standard, which is also a full Unicode Transformation Format, i.e. not only for simplified Chinese) a more consistent choice of encoding for modern use.

Vendor extensions

ETen extensions

In the ETen (倚天) Chinese operating system, the following code points are added, to add support for some characters present in the IBM 5550's code page but absent from generic Big5:

In some versions of ETen, there are extra graphical symbols and simplified Chinese characters.

Microsoft code pages

Microsoft (微軟) created its own version of Big5 extension as code page 950 for use with Microsoft Windows, which supports the F9D6–F9FE code points from ETEN's extensions. In some versions of Windows, the euro currency symbol is mapped to Big-5 code point A3E1.

After installing Microsoft's HKSCS patch on top of traditional Chinese Windows (or any version of Windows 2000 and above with proper language pack), applications using code page 950 automatically use a hidden code page 951 table. The table supports all code points in HKSCS-2001, except for the compatibility code points specified by the standard.[8]

IBM code pages

In contrast to Microsoft's code page 950, IBM's CCSID 950 comprises single byte code page 1114 (CCSID 1114) and double byte code page 947 (CCSID 947).[9][10][11] It incorporates ETEN extensions for lead bytes 0xA3,[12] 0xC6,[13][14] 0xC7[15] and 0xC8,[13][16] while omitting those with lead byte 0xF9 (which Microsoft includes), mapping them instead to the Private Use Area as user-defined characters.[13][17] It also includes two non-ETEN extension regions with trail bytes 0x81–A0, i.e. outside the usual Big5 trail byte range but similar to the Big5+ trail byte range: area 5 has lead bytes 0xF2–F9 and contains IBM-selected characters, while area 9 has lead bytes 0x81–8C and is a user-defined region.[18]

IBM refers to the euro sign update of their Big-5 variant as CCSID 1370, which includes both single-byte (0x80) and double-byte (0xA3E1) euro signs.[19] It comprises single byte code page 1114 (CCSID 5210) and double byte code page 947 (CCSID 21427).[19][20][21] For better compatibility with Microsoft's variant in IBM Db2, IBM also defines the pure double-byte code page 1372[22] and the associated variable-width CCSID 1373, which corresponds to Microsoft's code page 950.[23]

IBM assigns CCSID 5471 to the HKSCS-2001 Big5 code page (with CPGID 1374 as CCSID 5470 as the double byte component),[24][25] CCSID 9567 to the HKSCS-2004 code page (with CPGID 1374 as CCSID 9566 as the double byte component),[26] and CCSID 13663 to the HKSCS-2008 code page (with CPGID 1374 as CCSID 13662 as the double byte component),[27] while CCSID 1375 is assigned to a growing HKSCS code page, currently equivalent to CCSID 13663.[28]

ChinaSea font

ChinaSea fonts (中國海字集)[29] are Traditional Chinese fonts made by ChinaSea. The fonts are rarely sold separately, but are bundled with other products, such as the Chinese version of Microsoft Office 97. The fonts support Japanese kana, kokuji, and other characters missing in Big-5. As a result, the ChinaSea extensions have become more popular than the government-supported extensions.[as of?] Some Hong Kong BBSes had used encodings in ChinaSea fonts before the introduction of HKSCS.

'Sakura' font

The 'Sakura' font (日和字集 Sakura Version) was developed in Hong Kong and is designed to be compatible with HKSCS. It adds support for kokuji and proprietary dingbats (including Doraemon) not found in HKSCS.

Unicode-at-on

Unicode-at-on (Unicode補完計畫), formerly BIG5 extension, extends BIG-5 by altering code page tables, but uses the ChinaSea extensions starting with version 2. However, with the bankruptcy of ChinaSea, late development, and the increasing popularity of HKSCS and Unicode (the project is not compatible with HKSCS), the success of this extension is limited at best.

Despite the problems, characters previously mapped to Unicode Private Use Area are remapped to the standardized equivalents when exporting characters to Unicode format.

OPG

The web sites of the Oriental Daily News and Sun Daily, belonging to the Oriental Press Group Limited (東方報業集團有限公司) in Hong Kong, used a downloadable font with a different Big-5 extension coding than the HKSCS.

Official extensions

Taiwan Ministry of Education font

The Taiwan Ministry of Education supplied its own font, the Taiwan Ministry of Education font (臺灣教育部造字檔) for use internally.

Taiwan Council of Agriculture font

The Executive Yuan introduced a 133-character custom font, the Taiwan Council of Agriculture font (臺灣農委會常用中文外字集), that includes 84 characters from the fish radical and 7 from the bird radical.

Big5+

The Chinese Foundation for Digitization Technology (中文數位化技術推廣委員會) introduced Big5+ in 1997, which used over 20000 code points to incorporate all CJK logograms in Unicode 1.1. However, the extra code points exceeded the original Big-5 definition (Big5+ uses high byte values 81-FE and low byte values 40-7E and 80-FE), preventing it from being installed on Microsoft Windows without new codepage files.
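
The effect of the wider low byte range can be checked with simple arithmetic. The Python sketch below compares the number of available two-byte combinations under the original Big5 structure (lead bytes 0x81 to 0xFE, trail bytes 0x40 to 0x7E and 0xA1 to 0xFE) with the Big5+ ranges quoted above; the variable names are illustrative, and the figures are raw capacities, not counts of assigned characters.

```python
def span(first: int, last: int) -> int:
    """Number of byte values in the inclusive range first..last."""
    return last - first + 1


lead_bytes = span(0x81, 0xFE)                           # 126
big5_trail = span(0x40, 0x7E) + span(0xA1, 0xFE)        # 63 + 94 = 157
big5plus_trail = span(0x40, 0x7E) + span(0x80, 0xFE)    # 63 + 127 = 190

big5_capacity = lead_bytes * big5_trail
big5plus_capacity = lead_bytes * big5plus_trail

print(big5_capacity)      # 19782
print(big5plus_capacity)  # 23940
```

Under these assumptions the original structure offers fewer than 20000 combinations, so a set using over 20000 code points cannot fit without the additional low byte values 0x80 to 0xA0.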

Big-5E

To allow Windows users to use custom fonts, the Chinese Foundation for Digitization Technology introduced Big-5E, which added 3954 characters (in three blocks of code points: 8E40-A0FE, 8140-86DF, 86E0-875C) and removed the Japanese kana from the ETEN extension. Unlike Big-5+, Big5E extends Big-5 within its original definition. Mac OS X 10.3 and later supports Big-5E in the fonts LiHei Pro (儷黑 Pro.ttf) and LiSong Pro (儷宋 Pro.ttf).
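
The figure of 3954 characters is consistent with the three blocks listed above, provided that only trail bytes from the original definition (0x40 to 0x7E and 0xA1 to 0xFE) are counted. The following Python sketch performs that count; the names `is_trail`, `count_codes` and `BLOCKS` are hypothetical.

```python
def is_trail(b: int) -> bool:
    """Trail byte ranges of the original Big5 definition."""
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


def count_codes(first: int, last: int) -> int:
    """Count 16-bit codes in first..last whose low byte is a valid trail byte."""
    return sum(1 for code in range(first, last + 1) if is_trail(code & 0xFF))


# The three Big-5E blocks quoted in the text above.
BLOCKS = [(0x8E40, 0xA0FE), (0x8140, 0x86DF), (0x86E0, 0x875C)]

total = sum(count_codes(first, last) for first, last in BLOCKS)
print(total)  # 3954
```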

Big5-2003

The Chinese Foundation for Digitization Technology made a Big5 definition and put it into CNS 11643 in note form, making it part of the official standard in Taiwan.

Big5-2003 incorporates all Big-5 characters introduced in the 1984 ETEN extensions (code points A3C0-A3E0, C6A1-C7F2, and F9D6-F9FE) and the Euro symbol. Cyrillic characters were not included because the authority claimed CNS 11643 does not include such characters.

CDP

Academia Sinica created a Chinese Data Processing font (漢字構形資料庫) in the late 1990s; its latest release, version 2.5, includes 112,533 characters, somewhat fewer than the Mojikyo fonts.

HKSCS

Hong Kong also adopted Big5 for character encoding. However, written Cantonese has its own characters not available in the normal Big5 character set. To solve this problem, the Hong Kong Government created two Big5 extensions: the Government Chinese Character Set (GCCS) in 1995 and the Hong Kong Supplementary Character Set (HKSCS) in 1999. The Hong Kong extensions were commonly distributed as a patch. They are still distributed as a patch by Microsoft, but a full Unicode font is also available from the Hong Kong Government's web site.

There are two encoding schemes of HKSCS: one encoding scheme is for the Big-5 coding standard and the other is for the ISO 10646 standard. Subsequent to the initial release, there are also HKSCS-2001 and HKSCS-2004. The HKSCS-2004 is aligned technically with the ISO/IEC 10646:2003 and its Amendment 1 published in April 2004 by the International Organization for Standardization (ISO).

HKSCS includes all the characters from the common ETen extension, plus some characters from simplified Chinese, place names, people's names, and Cantonese phrases (including profanity).

As of 2020, the most recent edition of HKSCS is HKSCS-2016; however, the last edition of HKSCS to encode all of its characters in Big5 was HKSCS-2008, while the characters added in more recent editions are mapped to ISO 10646 / Unicode only (as a CJK Unified Ideographs horizontal glyph extension where appropriate).[30] Similarly to Hong Kong's situation, some characters needed by Macao are included in neither Big5 nor HKSCS; hence, the Macao Supplementary Character Set (MSCS) was developed, comprising characters not found in Big5 or HKSCS. This set, however, is also not encoded in Big5. The first batch of 121 MSCS characters was submitted for inclusion in or mapping to Unicode in 2009,[31] and the first final version of MSCS was established in 2020.[30]

Kana and Cyrillic

There are two major Big5 extension layouts for encoding kana, Russian Cyrillic and list markers in the range 0xC6A1 through 0xC875. These are not compatible with one another.[32] They are compared in the table below.

The ETEN layout of kana and Cyrillic is also used by the HKSCS[33] (including HTML5)[34] and Unicode-At-On[35] variants, as well as by IBM's version of code page 950,[36][37][38] and the ETEN layout of the kana (with Cyrillic omitted) is also used by the Big5-2003 variant.[39] The published mapping files for Windows-950 include neither, and this Big5 range is mapped to the Private Use Area by the Windows-950 implementation from International Components for Unicode.[40] Python's built-in cp950 codec implementation uses the BIG5.TXT layout.[41] The classic Mac OS version includes neither layout.[3]

References

  1. ^ "Big5 (Traditional Chinese) character code table". Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  2. ^ "Character Sets". chinesemac.org. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  3. ^ a b Apple, Inc (2025-08-08) [2025-08-08]. Map (external version) from Mac OS Chinese Traditional encoding to Unicode 3.0 and later. Unicode Consortium. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  4. ^ "Unicode CP950 mapping file". Unicode. Unicode Consortium. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  5. ^ "Unicode Big5 mapping file". Unicode. Unicode Consortium. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  6. ^ "Mozilla 系列與 Big5 中文字碼(Big5-2003)". Mozilla 台湾社群 (in Chinese (Taiwan)). Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  7. ^ The ETEN mapping file provided by Mozilla Taiwan community maps the three characters to both the symbol and ideograph codepoint. "Mozilla 系列與 Big5 中文字碼(ETEN)". Mozilla 台湾社群 (in Chinese (Taiwan)). Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  8. ^ "狗爺語錄 » Blog Archive » What is Code Page 951 (CP951)?". Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  9. ^ "CCSID 950 information document". Archived from the original on 2025-08-08.
  10. ^ "CCSID 1114 information document". Archived from the original on 2025-08-08.
  11. ^ "CCSID 947 information document". Archived from the original on 2025-08-08.
  12. ^ "Lead byte A3: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  13. ^ a b c Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "Chinese Character Encoding for Internet Messages". Requests for Comments. IETF. doi:10.17487/rfc1922. RFC 1922. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  14. ^ "Lead byte C6: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  15. ^ "Lead byte C7: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  16. ^ "Lead byte C8: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  17. ^ "Lead byte F9: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  18. ^ "IBM Traditional Chinese Graphic Character Set for IBM BIG-5 Code" (PDF). IBM. 1999. C-H 3-3220-131 1999-04. Archived (PDF) from the original on 2025-08-08. Retrieved 2025-08-08.
  19. ^ a b "CCSID 1370 information document". Archived from the original on 2025-08-08.
  20. ^ "CCSID 5210 information document". Archived from the original on 2025-08-08.
  21. ^ "CCSID 21427 information document". Archived from the original on 2025-08-08.
  22. ^ "CPGID 01372: MS T-Chinese Big-5 (Special for DB2)". IBM Globalization - Code page identifiers. Archived from the original on 2025-08-08.
  23. ^ "ibm-1373_P100-2002". ICU Demonstration - Converter Explorer. International Components for Unicode. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  24. ^ "CCSID 5471: Mixed Big-5 ext for HKSCS-2001". IBM Globalization - Coded character set identifiers. IBM. Archived from the original on 2025-08-08.
  25. ^ International Components for Unicode (ICU), ibm-5471_P100-2006.ucm, 2025-08-08, archived from the original on 2025-08-08, retrieved 2025-08-08
  26. ^ "CCSID 9567: Mixed Big-5 ext for HKSCS-2004". IBM Globalization - Coded character set identifiers. IBM. Archived from the original on 2025-08-08.
  27. ^ "CCSID 13663: Mixed Big-5 ext for HKSCS-2008". IBM Globalization - Coded character set identifiers. IBM. Archived from the original on 2025-08-08.
  28. ^ "CCSID 1375: Mixed Big-5 ext for HKSCS". IBM Globalization - Coded character set identifiers. IBM. Archived from the original on 2025-08-08.
  29. ^ 黃國書. "Chinasea 1.0 中國海字集". ISU FTP. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  30. ^ a b Macao Special Administrative Region Government (2025-08-08). "Submission of Macao's Vertical Extension (UNC Characters), Horizontal Extension, and IVSes Registration for MSCS" (PDF). ISO/IEC JTC 1/SC 2/WG 2 IRGN 2430. Archived (PDF) from the original on 2025-08-08. Retrieved 2025-08-08.
  31. ^ Computer Chinese Characters Encoding Workgroup (2025-08-08). "Submission of Characters from Macao Information Systems Character Set" (PDF). ISO/IEC JTC 1/SC 2/WG 2 IRGN 1580. Archived from the original (PDF) on 2025-08-08.
  32. ^ Lunde, Ken (2025-08-08). "2.3.1: BIG FIVE". CJK.INF Version 2.1. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  33. ^ "Big5HKSCS-2004". Mozilla Taiwan. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  34. ^ van Kesteren, Anne. "big5". Encoding Standard. WHATWG. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  35. ^ "UAO 2.41 b2u". Mozilla Taiwan. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  36. ^ "Lead byte C6: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  37. ^ "Lead byte C7: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  38. ^ "Lead byte C8: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  39. ^ "Big5-2003 b2u". Mozilla Taiwan. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  40. ^ IBM; Unicode Consortium (2025-08-08). "windows-950-2000". International Components for Unicode. Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  41. ^ "Script showing output of cp950 codec for lead bytes 0xC6 and 0xC7". Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  42. ^ Unicode Consortium (2025-08-08) [2025-08-08]. BIG5 to Unicode table (complete). Archived from the original on 2025-08-08. Retrieved 2025-08-08.
  43. ^ "Big5-ETen vs Unicode mapping table". Mozilla Taiwan. 2025-08-08. Archived from the original on 2025-08-08. Retrieved 2025-08-08.