=================
Support
=================

**If you are running:**

- Python >=2.7,<3.5: Unsupported
- Python 3.5: charset-normalizer < 2.1
- Python 3.6: charset-normalizer < 3.1

Upgrade your Python interpreter as soon as possible.

-------------------
Supported Encodings
-------------------

Here is a list of the supported encodings and supported languages as of the latest update. This list may change
depending on your Python version.

Charset Normalizer is able to detect any of these encodings. This list is NOT static and depends heavily on which codecs
your current CPython version ships with. See https://docs.python.org/3/library/codecs.html#standard-encodings

===============  ===============================================================================================================================
IANA Code Page   Aliases
===============  ===============================================================================================================================
ascii            646, ansi_x3.4_1968, ansi_x3_4_1968, ansi_x3.4_1986, cp367, csascii, ibm367, iso646_us, iso_646.irv_1991, iso_ir_6, us, us_ascii
big5             big5_tw, csbig5, x_mac_trad_chinese
big5hkscs        big5_hkscs, hkscs
cp037            037, csibm037, ebcdic_cp_ca, ebcdic_cp_nl, ebcdic_cp_us, ebcdic_cp_wt, ibm037, ibm039
cp1026           1026, csibm1026, ibm1026
cp1125           1125, ibm1125, cp866u, ruscii
cp1140           1140, ibm1140
cp1250           1250, windows_1250
cp1251           1251, windows_1251
cp1252           1252, windows_1252
cp1253           1253, windows_1253
cp1254           1254, windows_1254
cp1255           1255, windows_1255
cp1256           1256, windows_1256
cp1257           1257, windows_1257
cp1258           1258, windows_1258
cp273            273, ibm273, csibm273
cp424            424, csibm424, ebcdic_cp_he, ibm424
cp437            437, cspc8codepage437, ibm437
cp500            500, csibm500, ebcdic_cp_be, ebcdic_cp_ch, ibm500
cp775            775, cspc775baltic, ibm775
cp850            850, cspc850multilingual, ibm850
cp852            852, cspcp852, ibm852
cp855            855, csibm855, ibm855
cp857            857, csibm857, ibm857
cp858            858, csibm858, ibm858
cp860            860, csibm860, ibm860
cp861            861, cp_is, csibm861, ibm861
cp862            862, cspc862latinhebrew, ibm862
cp863            863, csibm863, ibm863
cp864            864, csibm864, ibm864
cp865            865, csibm865, ibm865
cp866            866, csibm866, ibm866
cp869            869, cp_gr, csibm869, ibm869
cp932            932, ms932, mskanji, ms_kanji
cp949            949, ms949, uhc
cp950            950, ms950
euc_jis_2004     jisx0213, eucjis2004, euc_jis2004
euc_jisx0213     eucjisx0213
euc_jp           eucjp, ujis, u_jis
euc_kr           euckr, korean, ksc5601, ks_c_5601, ks_c_5601_1987, ksx1001, ks_x_1001, x_mac_korean
gb18030          gb18030_2000
gb2312           chinese, csiso58gb231280, euc_cn, euccn, eucgb2312_cn, gb2312_1980, gb2312_80, iso_ir_58, x_mac_simp_chinese
gbk              936, cp936, ms936
hp_roman8        roman8, r8, csHPRoman8
hz               hzgb, hz_gb, hz_gb_2312
iso2022_jp       csiso2022jp, iso2022jp, iso_2022_jp
iso2022_jp_1     iso2022jp_1, iso_2022_jp_1
iso2022_jp_2     iso2022jp_2, iso_2022_jp_2
iso2022_jp_3     iso2022jp_3, iso_2022_jp_3
iso2022_jp_ext   iso2022jp_ext, iso_2022_jp_ext
iso2022_kr       csiso2022kr, iso2022kr, iso_2022_kr
iso8859_10       csisolatin6, iso_8859_10, iso_8859_10_1992, iso_ir_157, l6, latin6
iso8859_11       thai, iso_8859_11, iso_8859_11_2001
iso8859_13       iso_8859_13, l7, latin7
iso8859_14       iso_8859_14, iso_8859_14_1998, iso_celtic, iso_ir_199, l8, latin8
iso8859_15       iso_8859_15, l9, latin9
iso8859_16       iso_8859_16, iso_8859_16_2001, iso_ir_226, l10, latin10
iso8859_2        csisolatin2, iso_8859_2, iso_8859_2_1987, iso_ir_101, l2, latin2
iso8859_3        csisolatin3, iso_8859_3, iso_8859_3_1988, iso_ir_109, l3, latin3
iso8859_4        csisolatin4, iso_8859_4, iso_8859_4_1988, iso_ir_110, l4, latin4
iso8859_5        csisolatincyrillic, cyrillic, iso_8859_5, iso_8859_5_1988, iso_ir_144
iso8859_6        arabic, asmo_708, csisolatinarabic, ecma_114, iso_8859_6, iso_8859_6_1987, iso_ir_127
iso8859_7        csisolatingreek, ecma_118, elot_928, greek, greek8, iso_8859_7, iso_8859_7_1987, iso_ir_126
iso8859_8        csisolatinhebrew, hebrew, iso_8859_8, iso_8859_8_1988, iso_ir_138
iso8859_9        csisolatin5, iso_8859_9, iso_8859_9_1989, iso_ir_148, l5, latin5
iso2022_jp_2004  iso_2022_jp_2004, iso2022jp_2004
johab            cp1361, ms1361
koi8_r           cskoi8r
kz1048           kz_1048, rk1048, strk1048_2002
latin_1          8859, cp819, csisolatin1, ibm819, iso8859, iso8859_1, iso_8859_1, iso_8859_1_1987, iso_ir_100, l1, latin, latin1
mac_cyrillic     maccyrillic
mac_greek        macgreek
mac_iceland      maciceland
mac_latin2       maccentraleurope, maclatin2
mac_roman        macintosh, macroman
mac_turkish      macturkish
ptcp154          csptcp154, pt154, cp154, cyrillic_asian
shift_jis        csshiftjis, shiftjis, sjis, s_jis, x_mac_japanese
shift_jis_2004   shiftjis2004, sjis_2004, s_jis_2004
shift_jisx0213   shiftjisx0213, sjisx0213, s_jisx0213
tis_620          tis620, tis_620_0, tis_620_2529_0, tis_620_2529_1, iso_ir_166
utf_16           u16, utf16
utf_16_be        unicodebigunmarked, utf_16be
utf_16_le        unicodelittleunmarked, utf_16le
utf_32           u32, utf32
utf_32_be        utf_32be
utf_32_le        utf_32le
utf_8            u8, utf, utf8, utf8_ucs2, utf8_ucs4 (+utf_8_sig)
utf_7*           u7, unicode-1-1-utf-7
cp720            N.A.
cp737            N.A.
cp856            N.A.
cp874            N.A.
cp875            N.A.
cp1006           N.A.
koi8_t           N.A.
koi8_u           N.A.
===============  ===============================================================================================================================

*: Only if a SIG/mark is found.

-------------------
Supported Languages
-------------------

These languages can be detected in your content. All of them are specified in ``./charset_normalizer/assets/__init__.py``.

| English,
| German,
| French,
| Dutch,
| Italian,
| Polish,
| Spanish,
| Russian,
| Japanese,
| Portuguese,
| Swedish,
| Chinese,
| Ukrainian,
| Norwegian,
| Finnish,
| Vietnamese,
| Czech,
| Hungarian,
| Korean,
| Indonesian,
| Turkish,
| Romanian,
| Farsi,
| Arabic,
| Danish,
| Serbian,
| Lithuanian,
| Slovene,
| Slovak,
| Malay,
| Hebrew,
| Bulgarian,
| Croatian,
| Hindi,
| Estonian,
| Thai,
| Greek,
| Tamil.

----------------------------
Incomplete Sequence / Stream
----------------------------

Incomplete sequences and streams are not (yet) officially supported. If you feed the detector an incomplete byte sequence
(e.g. a truncated multi-byte sequence), it will most likely fail to return a proper result.
If you are purposely feeding only part of your payload for performance reasons, you can stop doing so, as this package is fairly optimized.

We are working on a dedicated way to handle streams.
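
As a rough illustration of why a truncated payload is a problem, the sketch below uses only Python's standard ``codecs`` module, not charset-normalizer's own API. The helper name ``ends_on_character_boundary`` is hypothetical, and UTF-8 is assumed as the encoding purely for the example. It shows how cutting a payload in the middle of a multi-byte character leaves bytes that cannot be decoded on their own.

```python
import codecs


def ends_on_character_boundary(payload: bytes, encoding: str = "utf_8") -> bool:
    """Return True if `payload` does not stop in the middle of a character.

    Hypothetical helper for illustration; not part of charset-normalizer.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        # final=False lets the decoder buffer a trailing partial character
        # instead of raising an error on it.
        decoder.decode(payload, final=False)
    except UnicodeDecodeError:
        # The bytes are invalid for this encoding, not merely truncated.
        return False
    # getstate() returns (pending_bytes, flags); leftover bytes mean the
    # payload was cut inside a multi-byte sequence.
    pending, _flags = decoder.getstate()
    return pending == b""


payload = "prix: 10 €".encode("utf_8")  # "€" takes three bytes in UTF-8
truncated = payload[:-1]                 # cut inside the "€" sequence

print(ends_on_character_boundary(payload))    # True
print(ends_on_character_boundary(truncated))  # False

try:
    truncated.decode("utf_8")
except UnicodeDecodeError as exc:
    # Strict decoding of the truncated bytes fails outright.
    print("strict decoding fails:", exc.reason)
```

If you must slice a payload before detection, a check like this can help you avoid cutting inside a character, but it only works when you already have a candidate encoding in mind. Passing the full payload remains the simpler option.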