Handling Result
When you run a detection on a buffer, a bytes sequence, or a file, assign the return value to a variable so you can make full use of the result.
    from charset_normalizer import from_bytes

    my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

    # Assign return value so we can fully exploit result
    result = from_bytes(my_byte_str).best()

    print(result.encoding)  # gb18030
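The same pattern applies to the other two input kinds mentioned above. The following is a minimal sketch, assuming the library's `from_fp` (binary file-like object) and `from_path` (file path) entry points alongside `from_bytes`. The temporary file and the `encodings_found` dictionary are illustrative only and not part of the library.

```python
import io
import os
import tempfile

from charset_normalizer import from_bytes, from_fp, from_path

payload = "我没有埋怨,磋砣的只是一些时间。".encode("gb18030")

# 1. Raw bytes.
match_from_bytes = from_bytes(payload).best()

# 2. A binary buffer (any file-like object opened in binary mode).
match_from_buffer = from_fp(io.BytesIO(payload)).best()

# 3. A file on disk; a temporary file stands in for a real document here.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(payload)
    sample_path = handle.name
try:
    match_from_file = from_path(sample_path).best()
finally:
    os.remove(sample_path)

# best() may return None, so guard before reading .encoding.
encodings_found = {
    label: (match.encoding if match is not None else None)
    for label, match in (
        ("bytes", match_from_bytes),
        ("buffer", match_from_buffer),
        ("file", match_from_file),
    )
}
print(encodings_found)
```

All three calls inspect the same bytes, so they should agree on the detected encoding.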
Using CharsetMatch
Here, result is a CharsetMatch object or None.
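Because the result can be None, it is worth checking before reading any attribute. The helper below is a hypothetical sketch, not part of the library: `decode_or_fallback` and its `fallback` parameter are names invented for illustration. It relies on `str()` of a CharsetMatch returning the decoded text.

```python
from charset_normalizer import from_bytes


def decode_or_fallback(payload: bytes, fallback: str = "utf_8") -> str:
    """Hypothetical helper: decode with the detected encoding, else fall back."""
    result = from_bytes(payload).best()
    if result is None:
        # No plausible encoding was found; decode leniently instead of raising.
        return payload.decode(fallback, errors="replace")
    # str() on a CharsetMatch yields the decoded text.
    return str(result)


text = decode_or_fallback("Bonjour, ça va très bien aujourd'hui ?".encode("utf_8"))
print(text)
```

Whether a lenient fallback is acceptable depends on the application; raising an error may be the better choice when wrong text is worse than no text.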
- class charset_normalizer.CharsetMatch(payload: bytes, guessed_encoding: str, mean_mess_ratio: float, has_sig_or_bom: bool, languages: List[Tuple[str, float]], decoded_payload: str | None = None)
- best() -> CharsetMatch
Kept for backward-compatibility (BC) reasons. Will be removed in 3.0.
- property chaos_secondary_pass: float
Checks the chaos of the decoded text once more, this time using the full content. Use with caution: this can be very slow. Notice: Will be removed in 3.0.
- property coherence_non_latin: float
Coherence ratio of the first non-Latin language detected, if any. Notice: Will be removed in 3.0.
- property could_be_from_charset: List[str]
The complete list of encodings that produce the exact same str result and could therefore be the originating encoding. This list includes the encoding available in the ‘encoding’ property.
- property encoding_aliases: List[str]
An encoding is often known by several names. This property can help when searching for IBM855 where it is listed as CP855.
- property fingerprint: str
Retrieve the unique SHA256 fingerprint, computed from the transformed (re-encoded) payload, not the original one.
- first() -> CharsetMatch
Kept for backward-compatibility (BC) reasons. Will be removed in 3.0.
- property language: str
The most probable language found in the decoded sequence. If none was detected or inferred, the property returns “Unknown”.
- property languages: List[str]
Returns the complete list of possible languages found in the decoded sequence. Usually not very useful. The returned list may be empty even if the ‘language’ property returns something other than ‘Unknown’.
- output(encoding: str = 'utf_8') -> bytes
Get the payload re-encoded as bytes using the given target encoding. Defaults to UTF-8. Any encoding errors are simply ignored by the encoder, not replaced.
- property raw: bytes
The original, untouched bytes.
- property w_counter: Counter
Word counter instance for the decoded text. Notice: Will be removed in 3.0.
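Several of the properties above rest on ideas that can be illustrated with the Python standard library alone. The sketch below is not the library's implementation; every helper name (`same_decoding_candidates`, `canonical_name`, `fingerprint_of`) is hypothetical. It assumes the fingerprint is a SHA256 hex digest of the UTF-8 re-encoded text, in line with the default target of output().

```python
import codecs
import hashlib
from typing import List


def same_decoding_candidates(payload: bytes, reference: str, candidates: List[str]) -> List[str]:
    """Idea behind could_be_from_charset: encodings that yield the same str."""
    expected = payload.decode(reference)
    matches = []
    for name in candidates:
        try:
            if payload.decode(name) == expected:
                matches.append(name)
        except (UnicodeDecodeError, LookupError):
            # Undecodable payload or unknown codec name: not a candidate.
            continue
    return matches


def canonical_name(alias: str) -> str:
    """Idea behind encoding_aliases: many names resolve to one codec."""
    return codecs.lookup(alias).name


def fingerprint_of(payload: bytes, source_encoding: str) -> str:
    """Idea behind fingerprint: hash the re-encoded text, not the raw bytes."""
    reencoded = payload.decode(source_encoding).encode("utf_8")
    return hashlib.sha256(reencoded).hexdigest()


raw = "café".encode("latin_1")

# latin_1 and cp1252 agree on this payload; utf_8 cannot decode it; cp855 differs.
print(same_decoding_candidates(raw, "latin_1", ["latin_1", "cp1252", "utf_8", "cp855"]))

# IBM855 and CP855 resolve to the same codec.
print(canonical_name("IBM855") == canonical_name("CP855"))

# Same text, different source bytes, same fingerprint.
print(fingerprint_of(raw, "latin_1") == fingerprint_of("café".encode("utf_8"), "utf_8"))

# "ignore" drops unencodable characters, while "replace" substitutes a marker.
print("café".encode("ascii", "ignore"), "café".encode("ascii", "replace"))
```

The last line shows the distinction that the output() description draws between ignoring and replacing characters the target encoding cannot represent.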