Admin Tools

Editing file: html5parser.cpython-39.opt-1.pyc

������a�!����������������������@���sh��d�Z�ddlZddlZddlmZ�ddlmZ�ddlm	Z	�ddl
mZmZm
Z
�zeZW�n�eyn���eefZY�n0�zddlmZ�W�n�ey����ddlmZ�Y�n0�zddlmZ�W�n�ey����ddlmZ�Y�n0�G�d	d
��d
e�ZzddlmZ�W�n�e�y���Y�n0�G�dd
��d
e�Ze��Zdd��Zddd�Zddd�Zddd�Zd dd�Z d!dd�Z!dd��Z"e��Z#dS�)"z?
An interface to html5lib that mimics the lxml.html interface.
�����N)�
HTMLParser)�TreeBuilder)�etree)�Element�XHTML_NAMESPACE�_contains_block_level_tag)�urlopen)�urlparsec�������������������@���s���e�Zd�ZdZddd�ZdS�)r���z*An html5lib HTML parser with lxml as tree.Fc�����������������K���s���t�j|�f|td�|���d�S��N)�strict�tree)�_HTMLParser�__init__r�����selfr����kwargs��r����;/usr/lib64/python3.9/site-packages/lxml/html/html5parser.pyr������s����zHTMLParser.__init__N)F��__name__�
__module__�__qualname__�__doc__r���r���r���r���r���r������s���r���)�XHTMLParserc�������������������@���s���e�Zd�ZdZddd�ZdS�)r���z+An html5lib XHTML Parser with lxml as tree.Fc�����������������K���s���t�j|�f|td�|���d�S�r
���)�_XHTMLParserr���r���r���r���r���r���r���*���s����zXHTMLParser.__init__N)Fr���r���r���r���r���r���'���s���r���c�����������������C���s(���|���|�}|d�ur|S�|���dt|f��S�)Nz{%s}%s)�findr���)r����tag�elemr���r���r����	_find_tag0���s����
r���c�����������������C���s^���t�|�t�std��|du�rt}i�}|du�r8t�|�t�r8d}|durH||d<�|j|�fi�|�����S�)z�
    Parse a whole document into a string.

If `guess_charset` is true, or if the input is not Unicode but a
    byte string, the `chardet` library will perform charset guessing
    on the string.
    �string requiredNT�
useChardet)�
isinstance�_strings�	TypeError�html_parser�bytes�parseZgetroot)�html�
guess_charset�parser�optionsr���r���r����document_fromstring7���s����
r+���Fc�����������������C���s����t�|�t�std��|du�rt}i�}|du�r8t�|�t�r8d}|durH||d<�|j|�dfi�|��}|r�t�|d�t�r�|r�|d����r�t�d|d����|d=�|S�)a`��Parses several HTML elements, returning a list of elements.

The first item in the list may be a string.  If no_leading_text is true,
    then it will be an error if there is leading text, and it will always be
    a list of only elements.

If `guess_charset` is true, the `chardet` library will perform charset
    guessing on the string.
    r���NFr ����divr���zThere is leading text: %r)	r!���r"���r#���r$���r%���Z
parseFragment�stripr����ParserError)r'����no_leading_textr(���r)���r*���Zchildrenr���r���r����fragments_fromstringO���s$����
�r0���c�����������������C���s����t�|�t�std��t|�}t|�|||�d�}|rvt�|t�s>d}t|�}|rrt�|d�t�rh|d�|_|d=�|�|��|S�|s�t�	d��t
|�dkr�t�	d��|d�}|jr�|j���r�t�	d|j���d	|_|S�)
a���Parses a single HTML element; it is an error if there is more than
    one element, or if anything but whitespace precedes or follows the
    element.

If 'create_parent' is true (or is a tag name) then a parent node
    will be created to encapsulate the HTML in a single element.  In
    this case, leading or trailing text is allowed.

If `guess_charset` is true, the `chardet` library will perform charset
    guessing on the string.
    r���)r(���r)���r/���r,���r���zNo elements found����zMultiple elements foundzElement followed by text: %rN)
r!���r"���r#����boolr0���r����text�extendr���r.����len�tailr-���)r'���Z
create_parentr(���r)���Zaccept_leading_text�elementsZnew_root�resultr���r���r����fragment_fromstringq���s4����
�

r9���c�����������������C���s����t�|�t�std��t|�||d�}|�dd��}t�|t�rB|�dd�}|������}|�d�sb|�d�rf|S�t	|d	�}t
|�r||S�t	|d
�}t
|�dkr�|jr�|j���s�|d�j
r�|d�j
���s�|d
�S�t|�r�d|_nd|_|S�)a���Parse the html, returning a single element/document.

This tries to minimally parse the chunk of text, without knowing if it
    is a fragment or a document.

'base_url' will set the document's base_url attribute (and the tree's
    docinfo.URL)

If `guess_charset` is true, or if the input is not Unicode but a
    byte string, the `chardet` library will perform charset guessing
    on the string.
    r���)r)���r(���N�2����ascii�replacez<htmlz	<!doctype�head�bodyr1������r���r,����span)r!���r"���r#���r+���r%����decode�lstrip�lower�
startswithr���r5���r3���r-���r6���r���r���)r'���r(���r)����doc�startr=���r>���r���r���r����
fromstring����s2����
�

��rG���c�����������������C���s~���|du�rt�}t|�t�s(|�}|du�r\d}n4t|��rFt|��}|du�r\d}nt|�d�}|du�r\d}i�}|rl||d<�|j|fi�|��S�)a*��Parse a filename, URL, or file-like object into an HTML document
    tree.  Note: this returns a tree, not an element.  Use
    ``parse(...).getroot()`` to get the document root.

If ``guess_charset`` is true, the ``useChardet`` option is passed into
    html5lib to enable character detection.  This option is on by default
    when parsing from URLs, off by default when parsing from file(-like)
    objects (which tend to return Unicode more often than not), and on by
    default when parsing from a file path (which is read in binary mode).
    NFT�rbr ���)r$���r!���r"����_looks_like_urlr����openr&���)Zfilename_url_or_filer(���r)����fpr*���r���r���r���r&�������s"����

r&���c�����������������C���s@���t�|��d�}|sdS�tjdkr8|tjv�r8t|�dkr8dS�dS�d�S�)Nr���F�win32r1���T)r	����sys�platform�string�
ascii_lettersr5���)�str�schemer���r���r���rI�������s����
�
�rI���)NN)FNN)FNN)NN)NN)$r���rM���rO���Zhtml5libr���r
���Z html5lib.treebuilders.etree_lxmlr���Zlxmlr���Z	lxml.htmlr���r���r���Z
basestringr"����	NameErrorr%���rQ���Zurllib2r����ImportErrorZurllib.requestr	����urllib.parser���r���Zxhtml_parserr���r+���r0���r9���rG���r&���rI���r$���r���r���r���r����<module>���sJ���
���
"���
,
6
$