Handling Result

When you run detection on a buffer, bytes or a file, you can assign the return value to a variable and inspect it in detail.

from charset_normalizer import from_bytes

my_byte_str = 'Bсеки човек има право на образование.'.encode('cp1251')

# Assign the return value so the result can be inspected afterwards
result = from_bytes(my_byte_str).best()

print(result.encoding)  # cp1251

Using CharsetMatch

Here, result is a CharsetMatch object or None.
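Because the result may be None, it is worth guarding against that case before touching any attribute. The following is a minimal sketch, not part of the library: the helper name `describe` and its fallback message are hypothetical and chosen only for illustration.

```python
from charset_normalizer import from_bytes


def describe(payload: bytes) -> str:
    """Summarize the best match, or report that nothing plausible was found.

    `describe` is a hypothetical helper, not a charset_normalizer function.
    """
    result = from_bytes(payload).best()
    if result is None:
        # best() found no plausible match for this payload
        return "no plausible encoding found"
    # str(result) gives the payload decoded with the detected encoding
    return f"{result.encoding}: {result}"


summary = describe("Всеки човек има право на образование.".encode("cp1251"))
print(summary)
```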

class charset_normalizer.CharsetMatch(payload: bytes | bytearray, guessed_encoding: str, mean_mess_ratio: float, has_sig_or_bom: bool, languages: List[Tuple[str, float]], decoded_payload: str | None = None, preemptive_declaration: str | None = None)

property could_be_from_charset: list[str]: The complete list of encodings that decode the payload to exactly the same str and could therefore be the originating encoding. This list includes the encoding available in the ‘encoding’ property.

property encoding_aliases: list[str]: An encoding is often known by several names; this list helps when, for example, you search for IBM855 while it is listed as CP855.
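A short sketch of how `could_be_from_charset` and `encoding_aliases` might be used together, assuming the same Cyrillic sample payload as earlier on this page. The variable names are illustrative, and the exact lists returned depend on the installed version and on the payload, so no output is shown.

```python
from charset_normalizer import from_bytes

payload = "Всеки човек има право на образование.".encode("cp1251")
result = from_bytes(payload).best()

if result is not None:
    # Every encoding that yields the same decoded str, detected one included
    candidates = result.could_be_from_charset
    # Alternative names under which the detected encoding is known
    aliases = result.encoding_aliases

    print("detected:", result.encoding)
    print("equivalent candidates:", candidates)
    print("aliases:", aliases)
```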

property fingerprint: int: A hash fingerprint of the decoded payload, used for deduplication.

property language: str: The most probable language found in the decoded sequence. If none was detected or inferred, the property returns “Unknown”.

property languages: list[str]: The complete list of possible languages found in the decoded sequence. Rarely useful. The returned list may be empty even if the ‘language’ property returns something other than ‘Unknown’.
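The difference between `language` and `languages` can be seen by printing both. This is only a sketch on an assumed sample payload; which language is reported, and whether the list is empty, depends on the input and the library version.

```python
from charset_normalizer import from_bytes

payload = "Всеки човек има право на образование.".encode("cp1251")
result = from_bytes(payload).best()

if result is not None:
    # Single most probable language, or "Unknown" if nothing was inferred
    print("language:", result.language)
    # Full list of candidate languages; may be empty
    print("languages:", result.languages)
```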

output(encoding: str = 'utf_8') → bytes: Returns the payload re-encoded with the given target encoding, UTF-8 by default. Any errors are simply ignored by the encoder, NOT replaced.
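As an illustration of `output()`, the sketch below converts a cp1251 payload to UTF-8 bytes and compares it with the decoded text. The sample payload and variable names are assumptions made for this example.

```python
from charset_normalizer import from_bytes

payload = "Всеки човек има право на образование.".encode("cp1251")
result = from_bytes(payload).best()

if result is not None:
    # Re-encode the decoded text; the default target encoding is UTF-8
    utf8_bytes = result.output()
    # Decoding those bytes gives back the same text as str(result)
    print(utf8_bytes.decode("utf_8") == str(result))
    # The original bytes remain available, untouched, through `raw`
    print(result.raw == payload)
```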

property raw: bytes | bytearray: The original, untouched bytes.