Character conversion data tables

Authors
Peter Krefting <peter@opera.com>
Date
2012-02-01
Circulation
Customer
Abstract
This document briefly describes the sizes of the conversion data tables that Opera uses to decode and encode various character encodings.

This document is believed to be accurate as of 2012-02-01.

Encoding Forward conversion Reverse conversion
UTF
UTF-7
UTF-8
UTF-16
UTF-32
ISO 8859 series
ISO 8859-1
ISO 8859-2 256 bytes 384 bytes
ISO 8859-3 256 bytes 363 bytes
ISO 8859-4 256 bytes 384 bytes
ISO 8859-5 256 bytes 384 bytes
ISO 8859-6 256 bytes 249 bytes
ISO 8859-7 256 bytes 375 bytes
ISO 8859-8 256 bytes 276 bytes
ISO 8859-9 256 bytes 384 bytes
ISO 8859-10 256 bytes 384 bytes
ISO 8859-11 256 bytes 291 bytes
ISO 8859-13 256 bytes 384 bytes
ISO 8859-14 256 bytes 384 bytes
ISO 8859-15 256 bytes 384 bytes
ISO 8859-16 256 bytes 384 bytes
Windows code pages
Codepage 1250 256 bytes 369 bytes
Codepage 1251 256 bytes 381 bytes
Codepage 1252 256 bytes (see note) 369 bytes
Codepage 1253 256 bytes 333 bytes
Codepage 1254 256 bytes 363 bytes
Codepage 1255 256 bytes 315 bytes
Codepage 1256 256 bytes 384 bytes
Codepage 1257 256 bytes 348 bytes
Codepage 1258 256 bytes 357 bytes
Multi-byte Chinese encodings
Big 5 33998 bytes 41802 + 2816 bytes
↳ Big 5 HKSCS (see note) 336 + 12976 + 6772 bytes (see note)
EUC-TW 52452 bytes 41804 + 1812 bytes
ISO 2022-CN (see note) (see note)
GBK/GB2312 48132 bytes 41804 + 3880 bytes
HZ-GB2312
↳ GB 18030 824 bytes (see note) (see note)
Multi-byte Japanese encodings
ISO 2022-JP 512 + 22372 bytes (see note)
+ 22372 bytes (JIS-0212, see note)
41794 + 2576 bytes (normal)
41794 + 3584 bytes (IMode)
+ 41794 + 1080 bytes (JIS-0212, see note)
ISO 2022-JP-1
EUC-JP
Shift-JIS
Multi-byte Korean encodings
EUC-KR 35260 bytes 22344 + 23504 bytes
ISO 2022-KR
Other encodings
IBM code page 866 256 bytes 384 bytes
KOI8-R 256 bytes 384 bytes
KOI8-U 256 bytes 384 bytes
VISCII 512 bytes 402 bytes
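
The single-byte rows follow a regular pattern. The table does not describe the layout, so the following is only an unconfirmed reading of the numbers. If "forward" means decoding to Unicode, then 256 bytes matches 128 16-bit entries, one per byte value 0x80 to 0xFF. Most reverse figures then match 3 bytes (a 16-bit code point plus one byte) for each character the encoding assigns in that range. For example, ISO 8859-3 leaves 7 of the 128 positions unassigned, and (128 - 7) × 3 = 363 bytes. VISCII also assigns characters to six C0 control positions, and it fits the same pattern: 256 × 2 = 512 bytes forward and (128 + 6) × 3 = 402 bytes reverse. Not every row fits; the ISO 8859-11 figure of 291 bytes, for one, does not. The sketch below builds tables in this hypothetical layout from Python's own codecs, not from Opera's data. The function names and the 0xFFFD marker for unassigned bytes are illustrative assumptions.

```python
from __future__ import annotations

import bisect
import struct

UNASSIGNED = 0xFFFD  # assumption: marker for bytes the encoding leaves undefined


def build_forward_table(codec: str) -> list[int]:
    """Hypothetical forward (decode) table: one code point per byte 0x80-0xFF."""
    table = []
    for byte in range(0x80, 0x100):
        try:
            table.append(ord(bytes([byte]).decode(codec)))
        except UnicodeDecodeError:
            table.append(UNASSIGNED)
    return table


def build_reverse_table(forward: list[int]) -> list[tuple[int, int]]:
    """Hypothetical reverse (encode) table: sorted (code point, byte) pairs,
    one per assigned byte, so lookups can use binary search."""
    return sorted((cp, 0x80 + i) for i, cp in enumerate(forward) if cp != UNASSIGNED)


def decode_byte(forward: list[int], byte: int) -> int:
    """Assumes bytes below 0x80 are ASCII; the rest are looked up by index."""
    return byte if byte < 0x80 else forward[byte - 0x80]


def encode_char(reverse: list[tuple[int, int]], ch: str) -> int | None:
    """Return the byte for ch, or None if the encoding cannot represent it."""
    cp = ord(ch)
    if cp < 0x80:
        return cp
    # (cp,) sorts before every (cp, byte) pair, so this finds the first match.
    i = bisect.bisect_left(reverse, (cp,))
    if i < len(reverse) and reverse[i][0] == cp:
        return reverse[i][1]
    return None


def table_sizes(codec: str) -> tuple[int, int]:
    """Pack both tables little-endian and return their sizes in bytes."""
    forward = build_forward_table(codec)
    reverse = build_reverse_table(forward)
    fwd_blob = struct.pack(f"<{len(forward)}H", *forward)  # 2 bytes per entry
    rev_blob = b"".join(struct.pack("<HB", cp, b) for cp, b in reverse)  # 3 bytes per entry
    return len(fwd_blob), len(rev_blob)


if __name__ == "__main__":
    for codec in ("iso8859_2", "iso8859_3", "iso8859_6", "cp1252", "koi8_r"):
        print(codec, table_sizes(codec))
```

Under CPython, the loop reports a forward size of 256 bytes for each codec. The reverse sizes equal the figures listed above for those five encodings. A sorted pair array is one plausible choice for the reverse direction because it stores only the characters an encoding actually has. Whether Opera's tables are organised this way is not stated in this document.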

Notes: