Handling Result
When you run a search on a buffer, bytes, or a file, you can assign the return value and make full use of it.
    from charset_normalizer import from_bytes

    my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

    # Assign the return value so we can fully exploit the result
    result = from_bytes(my_byte_str).best()

    print(result.encoding)  # gb18030
Using CharsetMatch
Here, result is a CharsetMatch object or None.
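Because the result may be None, it is safer to check it before reading any property. The following is a minimal sketch, not the library's prescribed pattern: the helper name `decode_or_default` and its `default` parameter are hypothetical, introduced only for illustration, and it relies on `str(result)` returning the decoded text.

```python
from charset_normalizer import from_bytes


def decode_or_default(payload: bytes, default: str = "") -> str:
    """Hypothetical helper: decoded text, or `default` when no match was found."""
    result = from_bytes(payload).best()
    if result is None:
        # No plausible encoding was found for this payload.
        return default
    # str() on a CharsetMatch gives the decoded text.
    return str(result)


text = decode_or_default("Bonjour, où êtes-vous ?".encode("utf_8"))
print(text)
```

Returning a default keeps the calling code free of None checks; raising an exception instead would be an equally reasonable choice.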
class charset_normalizer.CharsetMatch(payload: bytes, guessed_encoding: str, mean_mess_ratio: float, has_sig_or_bom: bool, languages: List[Tuple[str, float]], decoded_payload: Optional[str] = None)
best() → charset_normalizer.models.CharsetMatch
Kept for backward-compatibility (BC) reasons. Will be removed in 3.0.
property chaos_secondary_pass
Checks the decoded text for chaos once more, this time using the full content. Use with caution: this can be very slow. Note: will be removed in 3.0.
property coherence_non_latin
Coherence ratio of the first non-Latin language detected, if any. Note: will be removed in 3.0.
property could_be_from_charset
The complete list of encodings that produce the exact same str result and could therefore be the originating encoding. This list includes the encoding available in the 'encoding' property.
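To see why several encodings can be equally valid origins, the idea can be reproduced with the standard library alone. This is only an illustrative sketch, not how charset_normalizer computes the property: the candidate list and the helper `decodes_to` are assumptions made for the example.

```python
from typing import Optional

# The same bytes decoded with several single-byte encodings.
payload = "café".encode("latin_1")  # b'caf\xe9'
candidates = ["latin_1", "cp1252", "iso8859_15", "utf_8"]


def decodes_to(data: bytes, encoding: str) -> Optional[str]:
    """Hypothetical helper: decoded text, or None if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


reference = decodes_to(payload, "latin_1")
# Keep every encoding that yields the exact same str as the reference.
equivalent = [enc for enc in candidates if decodes_to(payload, enc) == reference]
print(equivalent)  # ['latin_1', 'cp1252', 'iso8859_15']
```

Here utf_8 is rejected because the byte sequence is not valid UTF-8, while the three single-byte encodings agree on every byte in the payload.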
property encoding_aliases
An encoding is often known by several names; this property can help when searching for IBM855 while it is listed as CP855.
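The IBM855/CP855 case can be checked with Python's own codec registry. This sketch uses only the standard library (`codecs.lookup` and `encodings.aliases`), so it shows the aliasing idea rather than the exact list the property returns; the helper name `canonical_name` is hypothetical.

```python
import codecs
from encodings.aliases import aliases


def canonical_name(label: str) -> str:
    """Hypothetical helper: resolve any known label to Python's codec name."""
    return codecs.lookup(label).name


# Both labels resolve to the same codec.
same_codec = canonical_name("IBM855") == canonical_name("CP855")

# Aliases that the standard library maps to cp855.
known_aliases = sorted(alias for alias, target in aliases.items() if target == "cp855")
print(same_codec, known_aliases)
```

Comparing canonical names rather than raw labels avoids missing a match only because a source used a different spelling.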
property fingerprint
Retrieve the unique SHA256 computed from the transformed (re-encoded) payload, not from the original one.
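The distinction between hashing the re-encoded payload and hashing the original bytes can be sketched with `hashlib`. This assumes the re-encoding target is UTF-8 and that the digest is taken as a hex string; both are assumptions for illustration, not a statement of the library's exact procedure.

```python
import hashlib

original = "我没有埋怨".encode("gb18030")

# Assumption: "transformed" means decoded, then re-encoded as UTF-8.
reencoded = original.decode("gb18030").encode("utf_8")

digest_reencoded = hashlib.sha256(reencoded).hexdigest()
digest_original = hashlib.sha256(original).hexdigest()

# The two digests differ because the underlying bytes differ.
print(digest_reencoded != digest_original)
```

Hashing the re-encoded form means two files with the same text but different source encodings would share one fingerprint.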
first() → charset_normalizer.models.CharsetMatch
Kept for backward-compatibility (BC) reasons. Will be removed in 3.0.
property language
Most probable language found in the decoded sequence. If none was detected or inferred, the property returns "Unknown".
property languages
Return the complete list of possible languages found in the decoded sequence. Usually not very useful. The returned list may be empty even if the 'language' property returns something other than 'Unknown'.
output(encoding: str = 'utf_8') → bytes
Return the payload re-encoded as bytes using the given target encoding (UTF-8 by default). Any encoding errors are simply ignored by the encoder, not replaced.
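The difference between ignoring and replacing errors matters when the target encoding cannot represent every character. The sketch below uses plain `str.encode` from the standard library to show both behaviours; it illustrates the error-handling policy described above rather than calling `output()` itself.

```python
text = "naïve café"

# "ignore" silently drops characters the target encoding cannot represent.
ignored = text.encode("ascii", errors="ignore")

# "replace" substitutes a placeholder instead, which keeps the length visible.
replaced = text.encode("ascii", errors="replace")

print(ignored)   # b'nave caf'
print(replaced)  # b'na?ve caf?'
```

With the ignore policy, data loss leaves no trace in the output, so comparing lengths before and after re-encoding is one way to notice it when targeting a narrow encoding.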
property raw
Original untouched bytes.
property w_counter
Word counter instance on the decoded text. Note: will be removed in 3.0.