Miscellaneous¶
Convert to str¶
Any CharsetNormalizerMatch
object can be transformed to exploitable str
variable.
my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030') # Assign return value so we can fully exploit result result = CnM.from_bytes( my_byte_str ).best().first() # This should print '我没有埋怨,磋砣的只是一些时间。' print(str(result))
Expect UnicodeDecodeError¶
This package also offer you the possibility to reconfigure the way UnicodeDecodeError
is raised.
Charset Normalizer offer the possibility to extend the actual message inside it to provide a clue about what
encoding it should actually be.
import charset_normalizer # Nothing else is needed my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030') my_byte_str.decode('utf_8') # raise UnicodeDecodeError # UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte; you may want to consider gb18030 codec for this sequence. # instead of # UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte
Here, the addition is “you may want to consider gb18030 codec for this sequence.”.
Is does not work when using try
.. except
block.