Admin Tool
Edit file: html5parser.cpython-39.opt-1.pyc
[Binary file: compiled CPython 3.9 bytecode. Only the embedded strings are readable; they are listed below.]

Source path: /usr/lib64/python3.9/site-packages/lxml/html/html5parser.py

Module docstring:
    An interface to html5lib that mimics the lxml.html interface.

Imports named in the bytecode:
    sys, string
    html5lib: HTMLParser, XHTMLParser (the XHTMLParser import is wrapped in try/except ImportError)
    html5lib.treebuilders.etree_lxml: TreeBuilder
    lxml: etree
    lxml.html: Element, XHTML_NAMESPACE, _contains_block_level_tag
    urlopen: from urllib2, falling back to urllib.request
    urlparse: falling back to urllib.parse

class HTMLParser
    __init__(self, strict=False, **kwargs)
    An html5lib HTML parser with lxml as tree.

class XHTMLParser
    __init__(self, strict=False, **kwargs)
    An html5lib XHTML Parser with lxml as tree.

_find_tag(tree, tag)
    (no docstring)

document_fromstring(html, guess_charset=None, parser=None)
    Parse a whole document from a string.

    If `guess_charset` is true, or if the input is not Unicode but a
    byte string, the `chardet` library will perform charset guessing
    on the string.

fragments_fromstring(html, no_leading_text=False, guess_charset=None, parser=None)
    Parses several HTML elements, returning a list of elements.

    The first item in the list may be a string. If no_leading_text is true,
    then it will be an error if there is leading text, and it will always be
    a list of only elements.

    If `guess_charset` is true, the `chardet` library will perform charset
    guessing on the string.

    Error message: "There is leading text: %r"

fragment_fromstring(html, create_parent=False, guess_charset=None, parser=None)
    Parses a single HTML element; it is an error if there is more than
    one element, or if anything but whitespace precedes or follows the
    element.

    If 'create_parent' is true (or is a tag name) then a parent node
    will be created to encapsulate the HTML in a single element. In
    this case, leading or trailing text is allowed.

    If `guess_charset` is true, the `chardet` library will perform charset
    guessing on the string.

    Error messages: "No elements found", "Multiple elements found",
    "Element followed by text: %r"

fromstring(html, guess_charset=None, parser=None)
    Parse the html, returning a single element/document.

    This tries to minimally parse the chunk of text, without knowing if it
    is a fragment or a document.

    'base_url' will set the document's base_url attribute (and the tree's
    docinfo.URL)

    If `guess_charset` is true, or if the input is not Unicode but a
    byte string, the `chardet` library will perform charset guessing
    on the string.

parse(filename_url_or_file, guess_charset=None, parser=None)
    Parse a filename, URL, or file-like object into an HTML document
    tree. Note: this returns a tree, not an element. Use
    ``parse(...).getroot()`` to get the document root.

    If ``guess_charset`` is true, the ``useChardet`` option is passed into
    html5lib to enable character detection. This option is on by default
    when parsing from URLs, off by default when parsing from file(-like)
    objects (which tend to return Unicode more often than not), and on by
    default when parsing from a file path (which is read in binary mode).

_looks_like_url(str)
    (no docstring)

Module-level names: html_parser, xhtml_parser