Handling Result
When you run a search on a buffer, bytes, or a file, you can assign the return value and fully exploit it.
    from charset_normalizer import from_bytes

    my_byte_str = 'Bсеки човек има право на образование.'.encode('cp1251')

    # Assign the return value so we can fully exploit the result
    result = from_bytes(my_byte_str).best()

    print(result.encoding)  # cp1251
Using CharsetMatch
Here, result is a CharsetMatch object or None.
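Because the result may be None, it is worth guarding before touching any attribute. The following is a minimal sketch, assuming the charset_normalizer package is installed; the helper name `describe` and its fallback message are hypothetical and only serve the illustration.

```python
from typing import Optional

from charset_normalizer import CharsetMatch, from_bytes


def describe(payload: bytes) -> str:
    """Return 'encoding: decoded text' for the best match, or a fallback message."""
    result: Optional[CharsetMatch] = from_bytes(payload).best()
    if result is None:
        # No candidate encoding was retained for this payload.
        return "no suitable encoding found"
    # str(result) gives the decoded text of the match.
    return f"{result.encoding}: {str(result)}"


sample = "Bсеки човек има право на образование.".encode("cp1251")
print(describe(sample))
```

Returning a plain string keeps the None check in one place, so callers never have to dereference a missing match.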
- class charset_normalizer.CharsetMatch(payload: bytes, guessed_encoding: str, mean_mess_ratio: float, has_sig_or_bom: bool, languages: List[Tuple[str, float]], decoded_payload: str | None = None, preemptive_declaration: str | None = None)
- property could_be_from_charset: List[str]
The complete list of encodings that output the exact same str result and therefore could be the originating encoding. This list includes the encoding available in the ‘encoding’ property.
- property encoding_aliases: List[str]
Encodings are known by many names. This property can help when searching for IBM855 while it is listed as CP855.
- property fingerprint: str
Retrieve the unique SHA256 computed from the transformed (re-encoded) payload, not the original one.
- property language: str
Most probable language found in decoded sequence. If none were detected or inferred, the property will return “Unknown”.
- property languages: List[str]
Return the complete list of possible languages found in the decoded sequence. Usually not very useful. The returned list may be empty even if the ‘language’ property returns something other than ‘Unknown’.
- output(encoding: str = 'utf_8') -> bytes
Get the payload re-encoded as bytes using the given target encoding. Defaults to UTF-8. Any errors are simply ignored by the encoder, not replaced.
- property raw: bytes
Original untouched bytes.
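The properties listed above can be explored together on a single match. This is a hedged sketch rather than a reference usage: it assumes charset_normalizer is installed, reuses the sample string from the top of this page, and prints values instead of asserting them, since detection is heuristic.

```python
import hashlib

from charset_normalizer import from_bytes

payload = "Bсеки човек има право на образование.".encode("cp1251")
result = from_bytes(payload).best()

if result is not None:
    # The detected encoding is itself part of the list of encodings
    # that decode the payload to the same text.
    print(result.encoding in result.could_be_from_charset)

    # Other names under which the detected encoding is known.
    print(result.encoding_aliases)

    # Most probable language, then the full list (which may be empty).
    print(result.language, result.languages)

    # `raw` holds the untouched input bytes.
    print(result.raw == payload)

    # `output()` re-encodes the decoded text, UTF-8 by default.
    utf8_bytes = result.output()
    print(utf8_bytes.decode("utf_8") == str(result))

    # The fingerprint is computed from the re-encoded payload, so it is
    # compared here against a digest of the raw bytes for contrast.
    raw_digest = hashlib.sha256(result.raw).hexdigest()
    print(result.fingerprint, raw_digest)
```

Comparing `fingerprint` with a digest of `raw` is a quick way to see that the two refer to different byte sequences whenever the source encoding is not the output encoding.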