1.1.1   2007-10-26
 * Handle non-space whitespace characters (such as tabs) inside tags
   the same as spaces.  (Previously, parsing would fail on tags such
   as <a\thref=foo>.)
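
   A minimal sketch of the idea, not the module's actual code: the
   helper parse_tag() and the pattern _ATTR are hypothetical names
   used only for illustration.  It splits a simple tag on any run of
   whitespace, so a tab or newline separates attributes just as a
   space does.

```python
import re

# Hypothetical pattern: name=value pairs, optionally separated by any whitespace.
_ATTR = re.compile(r'''([^\s=]+)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)''')

def parse_tag(tag):
    """Split a simple tag such as '<a href=foo>' into (name, attrs)."""
    body = tag.strip()[1:-1]        # drop the surrounding "<" and ">"
    parts = body.split(None, 1)     # split on any whitespace run, not just " "
    name = parts[0].lower()
    attrs = {}
    if len(parts) > 1:
        for key, value in _ATTR.findall(parts[1]):
            # Strip one pair of matching quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in '"\'':
                value = value[1:-1]
            attrs[key.lower()] = value
    return name, attrs
```

   With this sketch, parse_tag('<a\thref=foo>') and
   parse_tag('<a href=foo>') both give ('a', {'href': 'foo'}).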

1.1.0   2005-09-15
 * Short version: Removed some exceptions that were raised on
   malformed input, and improved handling of malformed input overall.

 * Long version: (skip if you don't care about malformed input).

   Performed round-trip parsing (urlextract => urljoin, tagextract =>
   tagjoin) of some Fortune 500 Web pages, and roughly 6000 pages from
   a "free for all links" site.  Malformed HTML would previously cause
   exceptions to be raised, but this is undesirable, since we really
   want to act like a browser, and take our "best guess" at parsing
   the HTML in malformed and ambiguous cases.

   Got rid of the exceptions.  There are new algorithms intended for
   recovering from malformed quotes and ">" within a quoted value.  I
   looked in detail at five of the sites that were previously raising
   errors.  The algorithms seem to be working (i.e. document parsing
   continues, similarly to what a human might do, instead of
   considering the entire rest of the document as plaintext).
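
   To illustrate the kind of recovery meant here, the following is
   one possible heuristic, written as a sketch; it is an assumption
   for illustration and not necessarily what the module does.  The
   hypothetical function find_tag_end() skips any ">" inside a quoted
   value, but if the quote is never closed it falls back to the first
   ">" it saw, so the rest of the document is not swallowed.

```python
def find_tag_end(doc, start):
    """Return the index of the '>' that ends the tag opening at doc[start].

    Returns -1 if the document contains no '>' after start.
    """
    quote = None        # quote character we are currently inside, if any
    first_gt = -1       # first '>' seen inside a quoted value
    for i in range(start + 1, len(doc)):
        ch = doc[i]
        if quote:
            if ch == quote:
                quote = None
            elif ch == '>' and first_gt < 0:
                first_gt = i    # remember it in case the quote never closes
        elif ch in '"\'':
            quote = ch
        elif ch == '>':
            return i
    # The quote never closed: best guess is the first '>' inside it.
    return first_gt
```

   This sketch is deliberately naive: a stray quote that happens to
   be closed much later in the document would still extend the tag
   that far.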

   I plan to build a catalogue of the malformed documents and
   heuristically tweak the algorithm to "do well" on them.  In the
   end, there will probably be two sets of unit tests, one for correct
   (or close enough to be unambiguous) documents, and one for
   malformed documents.  Of course, the former set of unit tests must
   always pass.

   Feel free to report either kind of bug (the correctness bugs, or
   the malformed input pseudobugs).  I guess I could modify this
   module to use a real parser and the Mozilla grammar DTDs, but
   that's a lot of work, and thus it only seems worthwhile if *this*
   code is plagued with bugs...

   - Connelly Barnes

1.0.9   2005-09-15
 * Better mime type handling for urlextract().
 * Duck typing, so string-like objects can be passed in.
 * Naive Unicode tests.
 * The function tagjoin() handles HTML attribute values that contain
   single quotes or double quotes, but not both (if a value contains
   both, an error is raised).
   - Connelly Barnes
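
   The quoting rule above can be sketched as follows.  This is an
   illustration only: quote_attr() is a hypothetical helper, and the
   use of ValueError is an assumption, since this changelog does not
   say which error tagjoin() raises.

```python
def quote_attr(value):
    """Wrap an attribute value in a quote character it does not contain."""
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # Both quote types are present, so neither can delimit the value.
    # ValueError is an assumed error type for this sketch.
    raise ValueError("attribute value contains both quote types: %r" % value)
```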

1.0.8   2005-07-14
 * Fixed parsing of single-quoted attribute values in HTML
   (e.g. <a href='url'>).
   - Connelly Barnes

1.0.7   2005-04-26
 * Fixed bug where duplicate matches would be returned in
   urlextract().  This would cause urljoin() to fail.
   - Connelly Barnes

1.0.6   2005-02-06
 * urlextract() finds URLs inside style="..." tag attributes.
   - Connelly Barnes

1.0.5   2005-02-06
 * Correctly parses tags with no whitespace between attributes,
   like <a href="a"target="_top">.
 * urlextract() handles @import statements in CSS.
   - Connelly Barnes
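
   As a rough illustration of what finding @import URLs involves,
   here is a small sketch using the standard re module.  The names
   _IMPORT and import_urls() are hypothetical and this is not the
   pattern urlextract() itself uses; it covers only the common
   quoted and url(...) forms.

```python
import re

# Hypothetical pattern: @import "x", @import 'x', @import url(x), @import url("x").
_IMPORT = re.compile(r'''@import\s+(?:url\(\s*)?["']?([^"'()\s;]+)''',
                     re.IGNORECASE)

def import_urls(css):
    """Return the URLs named by @import statements in a CSS string."""
    return _IMPORT.findall(css)
```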

1.0.4   2004-12-10
 * Python 2.0-2.4 compatibility.
   - Connelly Barnes

1.0.3   2004-12-10
 * Python 2.2 compatibility.
 * Fixed XHTML parsing (which didn't work correctly with
   <!DOCTYPE...> and <?xml...?> directives).
 * Added rules for XML directives.
 * Changed comments so that a comment <!-- comment -->
   becomes [('!-- comment --', {})] after being parsed
   by tagextract(), instead of
   [('!--', {}), ' comment ', ('--', {})].
   - Connelly Barnes
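
   The comment format described above can be illustrated with a
   short sketch.  It is not the code of tagextract(): split_items()
   and _TOKEN are hypothetical names, and ordinary tags are kept
   whole with an empty attribute dict purely to keep the example
   short.  The point is that a comment is matched as one unit, so
   markup inside it is never treated as a tag.

```python
import re

# Try the comment alternative first so "<!-- <b> -->" is one token.
_TOKEN = re.compile(r'<!--.*?-->|<[^>]*>', re.DOTALL)

def split_items(doc):
    """Split doc into plain strings and (tag_text, {}) tuples."""
    items = []
    pos = 0
    for m in _TOKEN.finditer(doc):
        if m.start() > pos:
            items.append(doc[pos:m.start()])    # text before the tag
        items.append((m.group()[1:-1], {}))     # drop "<" and ">"
        pos = m.end()
    if pos < len(doc):
        items.append(doc[pos:])                 # trailing text
    return items
```

   For example, split_items('<!-- comment -->') gives
   [('!-- comment --', {})].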

1.0.2   2004-10-07
 * Stopped accidentally parsing HTML tags inside comments.
 * Fixed HTML decoding %ff bug.
 * Fixed characters being dropped inside HTML comments.
 * Changed interface for HTML <-> Data structure
   (use tagextract() and tagjoin() now).
 * Added URL extraction and modification functions
   (urlextract() and urljoin()).
   - Connelly Barnes

1.0.1   2004-10-01
 * Fixed bug in parsing a tag inside a comment.
   - Connelly Barnes

1.0.0   2004-09-30
 * Initial release.
   - Connelly Barnes