since 2013-01-14
A companion note to python_unicodedata.
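U+FF5E (FULLWIDTH TILDE) cannot be encoded as iso-2022-jp, although it round-trips through cp932, and U+301C (WAVE DASH) round-trips through iso-2022-jp: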
>>> unichr(0xff5e).encode('cp932').decode('cp932')
u'\uff5e'
>>> unichr(0x301c).encode('iso-2022-jp').decode('iso-2022-jp')
u'\u301c'
>>> unichr(0xff5e).encode('iso-2022-jp').decode('iso-2022-jp')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'iso2022_jp' codec can't encode character u'\uff5e' in position 0: illegal multibyte sequence
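One common workaround is to map U+FF5E to U+301C before encoding. The following is a minimal sketch, assuming Python 3 (where `unichr` is `chr` and plain string literals are Unicode). The helper name `encode_iso2022jp` and the table `FULLWIDTH_TO_JIS` are hypothetical and not part of the original note, and whether this substitution is acceptable depends on the application.

```python
# Hypothetical mapping table (an assumption, not from the original note):
# replace FULLWIDTH TILDE with WAVE DASH, which iso-2022-jp can encode.
FULLWIDTH_TO_JIS = {0xFF5E: 0x301C}


def encode_iso2022jp(text):
    """Translate U+FF5E to U+301C, then encode as iso-2022-jp."""
    return text.translate(FULLWIDTH_TO_JIS).encode('iso-2022-jp')


encoded = encode_iso2022jp('\uff5e')
print(repr(encoded.decode('iso-2022-jp')))
```

Note that the round trip returns U+301C, not the original U+FF5E, so the substitution is lossy.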
With the 'ignore' error handler, the character is silently dropped.
>>> unichr(0xff5e).encode('iso-2022-jp', 'ignore').decode('iso-2022-jp')
u''
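To see how other standard error handlers compare with 'ignore', a small sketch follows. It assumes Python 3, and `roundtrip` is a hypothetical helper name introduced only for illustration.

```python
def roundtrip(text, errors, encoding='iso-2022-jp'):
    """Encode with the given error handler, then decode the result again."""
    return text.encode(encoding, errors).decode(encoding)


# Compare a few of the standard codec error handlers on U+FF5E.
for handler in ('ignore', 'replace', 'backslashreplace'):
    print(handler, repr(roundtrip('\uff5e', handler)))
```

'ignore' yields an empty string, as in the session above, while 'replace' substitutes '?', so the loss is at least visible in the output.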
U+0304 COMBINING MACRON also raises an error:
>>> unichr(0x0304).encode('iso-2022-jp').decode('iso-2022-jp')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'iso2022_jp' codec can't encode character u'\u0304' in position 0: illegal multibyte sequence
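Rather than hitting these errors one at a time, it can help to list every character a codec rejects before encoding. This is a rough sketch, assuming Python 3; `unencodable` is a hypothetical helper, and it tests each character in isolation, which is an assumption that suits simple checks like the ones in this note.

```python
import unicodedata


def unencodable(text, encoding='iso-2022-jp'):
    """Return (character, Unicode name) pairs the codec cannot encode."""
    rejected = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Fall back to a placeholder for characters without a name.
            rejected.append((ch, unicodedata.name(ch, '<unnamed>')))
    return rejected


print(unencodable('a\u0304\u301c'))
```

Here only the combining macron is reported; the letter `a` and U+301C both encode without error.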
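Non-BMP characters also raise an error. On a narrow Python 2 build the message reports only the high surrogate (u'\ud859') of U+2667E: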
>>> u'\U0002667e'.encode('iso-2022-jp').decode('iso-2022-jp')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'iso2022_jp' codec can't encode character u'\ud859' in position 0: illegal multibyte sequence
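The exception object itself carries the failing position, so the offending character can be recovered programmatically. The sketch below assumes Python 3.3 or later, where strings are indexed by code point and a non-BMP character is a single element; `first_unencodable` is a hypothetical helper name.

```python
def first_unencodable(text, encoding='iso-2022-jp'):
    """Return (index, character) for the first encoding failure, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.object is the input string; exc.start is the failing index.
        return exc.start, exc.object[exc.start]
    return None


print(first_unencodable('ab\U0002667e'))
```

Under that assumption the result contains the whole character U+2667E, not a lone surrogate as in the Python 2 session above.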