Module htmldata :: Class URLMatch

Class URLMatch


A matched URL inside an HTML document or stylesheet.

A list of URLMatch objects is returned by urlextract.
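
As a rough illustration of how a list of such objects might be consumed, the sketch below filters matches by the instance variables documented on this page. MatchStandIn and image_urls are hypothetical names introduced only for this example; the stand-in mirrors the documented attributes so the snippet runs without htmldata installed, and the index values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchStandIn:
    """Hypothetical stand-in carrying the attributes documented for URLMatch."""
    url: str
    start: int
    end: int
    in_html: bool
    in_css: bool
    tag_attr: Optional[str] = None
    tag_attrs: Optional[dict] = None
    tag_index: Optional[int] = None
    tag_name: Optional[str] = None


def image_urls(matches):
    """Return the URLs that occur in the src attribute of an img tag."""
    return [m.url for m in matches
            if m.in_html and m.tag_name == 'img' and m.tag_attr == 'src']


# Illustrative data only; start/end values are arbitrary.
matches = [
    MatchStandIn('http://X/a.png', 10, 24, True, False,
                 'src', {'src': 'http://X/a.png', 'alt': 'Img'}, 0, 'img'),
    MatchStandIn('http://X/page', 40, 53, True, False,
                 'href', {'href': 'http://X/page'}, 1, 'a'),
    # A stylesheet match: the tag_* attributes stay None.
    MatchStandIn('http://X/bg.png', 80, 95, False, True),
]

print(image_urls(matches))
```

With a real list returned by urlextract, the same attribute tests should apply, since they rely only on the instance variables listed below.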

Method Summary
  __init__(self, doc, start, end, siteurl, in_html, in_css, tag_attr, tag_attrs, tag_index, tag_name)
Create a URLMatch object.

Instance Variable Summary
  end: End character index.
  in_css: True if URL occurs within a stylesheet.
  in_html: True if URL occurs within an HTML tag.
  start: Starting character index.
  tag_attr: Specific tag attribute in which URL occurs.
  tag_attrs: Dictionary of all tag attributes and values.
  tag_index: Index of the tag in the list that would be generated by a call to tagextract.
  tag_name: HTML tag name in which URL occurs.
  url: URL extracted.

Method Details

__init__(self, doc, start, end, siteurl, in_html, in_css, tag_attr=None, tag_attrs=None, tag_index=None, tag_name=None)
(Constructor)

Create a URLMatch object.
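
The page does not describe how the constructor arguments map onto the instance variables. The sketch below shows one plausible reading and is not the htmldata implementation: it assumes that url is the slice doc[start:end] and that a siteurl other than None is used to resolve relative URLs. URLMatchSketch is a hypothetical name.

```python
from urllib.parse import urljoin


class URLMatchSketch:
    """Hypothetical sketch with the same constructor signature as URLMatch."""

    def __init__(self, doc, start, end, siteurl, in_html, in_css,
                 tag_attr=None, tag_attrs=None, tag_index=None, tag_name=None):
        self.start = start
        self.end = end
        # Assumption: the URL text is the slice doc[start:end] (end exclusive).
        self.url = doc[start:end]
        # Assumption: a siteurl other than None resolves relative URLs.
        if siteurl is not None:
            self.url = urljoin(siteurl, self.url)
        self.in_html = in_html
        self.in_css = in_css
        self.tag_attr = tag_attr
        self.tag_attrs = tag_attrs
        self.tag_index = tag_index
        self.tag_name = tag_name


doc = '<a href="page.html">x</a>'
start = doc.index('page.html')
m = URLMatchSketch(doc, start, start + len('page.html'),
                   'http://example.com/dir/', True, False,
                   tag_attr='href', tag_attrs={'href': 'page.html'},
                   tag_index=0, tag_name='a')
print(m.url)
```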

Instance Variable Details

end

End character index.

in_css

True if URL occurs within a stylesheet.

in_html

True if URL occurs within an HTML tag.

start

Starting character index.

tag_attr

Specific tag attribute in which URL occurs.

Example: 'href'. None if the URL does not occur within an HTML tag.

tag_attrs

Dictionary of all tag attributes and values.

Example: {'src':'http://X','alt':'Img'}. None if the URL does not occur within an HTML tag.

tag_index

Index of the tag in the list that would be generated by a call to tagextract.

tag_name

HTML tag name in which URL occurs.

Example: 'img'. None if the URL does not occur within an HTML tag.

url

URL extracted.
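
The start and end indices make it possible to rewrite URLs in place. The following is a minimal sketch, assuming that end is an exclusive index so that doc[start:end] is the URL text; replace_urls is a hypothetical helper, not part of htmldata, and it takes plain (start, end) pairs such as those read from each match.

```python
def replace_urls(doc, spans, rewrite):
    """Return doc with the text of each (start, end) span passed through rewrite.

    spans   -- list of (start, end) pairs, end exclusive (an assumption)
    rewrite -- function mapping the old URL text to the new URL text
    """
    # Work from the last span to the first so earlier indices stay valid.
    for start, end in sorted(spans, reverse=True):
        doc = doc[:start] + rewrite(doc[start:end]) + doc[end:]
    return doc


doc = '<img src="a.png"><a href="b.html">b</a>'

# Build the spans by hand here; with real matches they would come from
# (m.start, m.end) for each match.
spans = []
for text in ('a.png', 'b.html'):
    s = doc.index(text)
    spans.append((s, s + len(text)))

result = replace_urls(doc, spans, lambda u: 'http://example.com/' + u)
print(result)
```

Processing the spans in reverse order avoids recomputing offsets after each substitution, because a replacement only shifts the text that follows it.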