Admin Tool
Editing file: universaldetector.cpython-39.opt-1.pyc
a ������=b�0����������������������@���s����d�Z�ddlZddlZddlZddlmZ�ddlmZmZm Z �ddl mZ�ddlm Z �ddlmZ�dd lmZ�G�d d��de�ZdS�)a�� Module containing the UniversalDetector detector class, which is the primary class a user of ``chardet`` should use. :author: Mark Pilgrim (initial port to Python) :author: Shy Shalom (original C code) :author: Dan Blanchard (major refactoring for 3.0) :author: Ian Cordasco �����N����)�CharSetGroupProber)� InputState�LanguageFilter�ProbingState)�EscCharSetProber)�Latin1Prober)�MBCSGroupProber)�SBCSGroupProberc���������������� ���@���sn���e�Zd�ZdZdZe�d�Ze�d�Ze�d�Z dddd d ddd d�Z ejfdd�Z dd��Zdd��Zdd��ZdS�)�UniversalDetectoraq�� The ``UniversalDetector`` class underlies the ``chardet.detect`` function and coordinates all of the different charset probers. To get a ``dict`` containing an encoding and its confidence, you can simply run: .. code:: u = UniversalDetector() u.feed(some_bytes) u.close() detected = u.result g�������?s���[�-�]s���(|~{)s���[�-�]zWindows-1252zWindows-1250zWindows-1251zWindows-1256zWindows-1253zWindows-1255zWindows-1254zWindows-1257)z iso-8859-1z iso-8859-2z iso-8859-5z iso-8859-6z iso-8859-7z iso-8859-8z iso-8859-9ziso-8859-13c�����������������C���sN���d�|�_�g�|�_d�|�_d�|�_d�|�_d�|�_d�|�_||�_t� t �|�_d�|�_|�� ���d�S�)N)�_esc_charset_prober�_charset_probers�result�done� _got_data�_input_state� _last_char�lang_filter�loggingZ getLogger�__name__�logger�_has_win_bytes�reset)�selfr�����r����=/usr/lib/python3.9/site-packages/chardet/universaldetector.py�__init__Q���s����zUniversalDetector.__init__c�����������������C���sV���dddd�|�_�d|�_d|�_d|�_tj|�_d|�_|�jr>|�j� ���|�j D�]}|� ���qDdS�)z� Reset the UniversalDetector and all of its probers back to their initial states. This is called by ``__init__``, so you only need to call this directly in between analyses of different documents. 
N�����������encoding� confidence�languageF�����)r���r���r���r���r���� PURE_ASCIIr���r���r���r���r ���)r����proberr���r���r���r���^���s���� zUniversalDetector.resetc�����������������C���s>��|�j�r dS�t|�sdS�t|t�s(t|�}|�js�|�tj�rJdddd�|�_nv|�tj tj f�rldddd�|�_nT|�d�r�dddd�|�_n:|�d �r�d ddd�|�_n |�tjtjf�r�dddd�|�_d|�_|�jd �dur�d|�_�dS�|�j tjk�r.|�j�|��rtj|�_ n*|�j tjk�r.|�j�|�j|���r.tj|�_ |dd��|�_|�j tjk�r�|�j�s^t|�j�|�_|�j�|�tjk�r:|�jj|�j���|�jjd�|�_d|�_�n�|�j tjk�r:|�j�s�t |�j�g|�_|�jt!j"@��r�|�j�#t$����|�j�#t%����|�jD�]:}|�|�tjk�r�|j|���|jd�|�_d|�_���q&�q�|�j&�|��r:d|�_'dS�)a��� Takes a chunk of a document and feeds it through all of the relevant charset probers. After calling ``feed``, you can check the value of the ``done`` attribute to see if you need to continue feeding the ``UniversalDetector`` more data, or if it has made a prediction (in the ``result`` attribute). .. note:: You should always call ``close`` when you're done feeding in your document if ``done`` is not already ``True``. Nz UTF-8-SIG��������?��r���zUTF-32s�������zX-ISO-10646-UCS-4-3412s�������zX-ISO-10646-UCS-4-2143zUTF-16Tr������)(r����len� isinstance� bytearrayr���� startswith�codecs�BOM_UTF8r����BOM_UTF32_LE�BOM_UTF32_BE�BOM_LE�BOM_BEr���r���r#����HIGH_BYTE_DETECTOR�search� HIGH_BYTE�ESC_DETECTORr���Z ESC_ASCIIr���r���r����feedr���ZFOUND_IT�charset_name�get_confidencer!���r ���r ���r���ZNON_CJK�appendr ���r����WIN_BYTE_DETECTORr���)r���Zbyte_strr$���r���r���r���r6���o���s����� � �� � � � �� � zUniversalDetector.feedc����������� ��� ���C���st��|�j�r|�jS�d|�_�|�js&|�j�d��n�|�jtjkrBdddd�|�_n�|�jtjkr�d}d}d}|�j D�]"}|sjq`|� ��}||kr`|}|}q`|r�||�jkr�|j}|j� ��}|� ��}|�d �r�|�jr�|�j�||�}|||jd�|�_|�j���tjk�rn|�jd �du��rn|�j�d��|�j D�]`}|�s�qt|t��rP|jD�] }|�j�d|j|j|� �����q,n|�j�d|j|j|� �����q|�jS�) z� Stop analyzing the current document and come up with a final prediction. :returns: The ``result`` attribute, a ``dict`` with the keys `encoding`, `confidence`, and `language`. 
Tzno data received!�asciir%���r&���r���Nr���ziso-8859r���z no probers hit minimum thresholdz%s %s confidence = %s)r���r���r���r����debugr���r���r#���r4���r ���r8����MINIMUM_THRESHOLDr7����lowerr+���r����ISO_WIN_MAP�getr!���ZgetEffectiveLevelr����DEBUGr)���r���Zprobers) r���Zprober_confidenceZmax_prober_confidenceZ max_proberr$���r7���Zlower_charset_namer ���Zgroup_proberr���r���r����close����sj���� � �� � �zUniversalDetector.closeN)r���� __module__�__qualname__�__doc__r=����re�compiler2���r5���r:���r?���r���ZALLr���r���r6���rB���r���r���r���r���r���3���s$��� � mr���)rE���r,���r���rF���Zcharsetgroupproberr���Zenumsr���r���r���Z escproberr���Zlatin1proberr���Zmbcsgroupproberr ���Zsbcsgroupproberr ����objectr���r���r���r���r����<module>���s���