Installation¶
This installs a package that can be used from Python (import charset_normalizer
).
To install for all users on the system, administrator rights (root) may be required.
Using PIP¶
Charset Normalizer can be installed from pip:
pip install charset-normalizer
You may retrieve the latest unicodedata backport as follow:
pip install charset-normalizer[unicode_backport]
From git via master¶
You can install from dev-master branch using git:
git clone https://github.com/Ousret/charset_normalizer.git
cd charset_normalizer/
python setup.py install
Basic Usage¶
The new way¶
You may want to get right to it.
from charset_normalizer import from_bytes, from_path
# This is going to print out your sequence once properly decoded
print(
str(
from_bytes(
my_byte_str
).best()
)
)
# You could also want the same from a file
print(
str(
from_path(
'./data/sample.1.ar.srt'
).best()
)
)
Backward compatibility¶
If you were used to python chardet, we are providing the very same detect()
method as chardet.
This function is mostly backward-compatible with Chardet. The migration should be painless.
from charset_normalizer import detect # This will behave exactly the same as python chardet result = detect(my_byte_str) if result['encoding'] is not None: print('got', result['encoding'], 'as detected encoding')
You may upgrade your code with ease.
CTRL + R from chardet import detect
to from charset_normalizer import detect
.