Ticket #24 (closed defect: fixed)
Replace BeautifulSoup
| Reported by: | mitsuhiko | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | Zine 0.1 |
| Component: | general | Version: | |
| Keywords: | | Cc: | |
Description
We have to find a replacement for BeautifulSoup. It should provide the same semantics as the current version, or at least similar ones.
There are three parsers that deal with HTML: the plain HTML parser, which just parses stuff into good-looking fragments; the simplehtml parser, which additionally escapes <pre> blocks; and the autop parser, which adds paragraphs automatically like WordPress does.
All that functionality should still be provided after switching away from BeautifulSoup.
BeautifulSoup is problematic for several reasons: its NavigableString leaks memory, it monkeypatches sgmllib, its source code is pretty weird, and there are some issues with the way it handles tag soup.
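For reference, here is a rough sketch of the autop-style behaviour that any replacement would have to keep. It is not the existing parser, only an illustration using the standard library: the function name `autop`, the `BLOCK_TAGS` list and the choice to turn single newlines into `<br />` are all assumptions. It also does not handle blank lines inside a `<pre>` block, which a real implementation would need to.

```python
import re

# Hypothetical list of block-level tags that must not be wrapped in <p>.
BLOCK_TAGS = ("pre", "ul", "ol", "blockquote", "div", "table",
              "h1", "h2", "h3", "h4", "h5", "h6", "p")

# Matches a chunk that starts with one of the block-level opening tags.
_block_re = re.compile(r"<(?:%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def autop(text):
    """Wrap blank-line-separated chunks in <p> tags.

    Chunks that already start with a block-level tag are left alone.
    Single newlines inside a paragraph become <br /> (an assumption).
    """
    out = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        chunk = chunk.strip()
        if not chunk:
            continue
        if _block_re.match(chunk):
            # Already a block element, pass it through unchanged.
            out.append(chunk)
        else:
            out.append("<p>%s</p>" % chunk.replace("\n", "<br />\n"))
    return "\n".join(out)
```

Calling `autop("Hello\n\n<pre>code</pre>")` leaves the `<pre>` chunk untouched and wraps only `Hello` in a paragraph.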
Change History
Any idea what you want to replace it with?