Python: convert HTML to text and mimic formatting
I'm learning BeautifulSoup and have found many "html2text" solutions, but the one I'm looking for should mimic the formatting:
<ul>
<li>one</li>
<li>two</li>
</ul>

would become

* one
* two

and

some text
<blockquote>
more magnificent text here
</blockquote>
final text

to

some text
more magnificent text here
final text
I'm reading the docs, but I'm not seeing anything straightforward. Any help? I'm open to using something other than BeautifulSoup.
Take a look at Aaron Swartz's html2text script (it can be installed with pip install html2text). Note that the output is valid Markdown. If for some reason that doesn't suit you, some rather trivial tweaks should get you the exact output in your question:
In [1]: import html2text

In [2]: h1 = """<ul>
   ...: <li>one</li>
   ...: <li>two</li>
   ...: </ul>"""

In [3]: print(html2text.html2text(h1))
  * one
  * two

In [4]: h2 = """<p>some text
   ...: <blockquote>
   ...: more magnificent text here
   ...: </blockquote>
   ...: final text</p>"""

In [5]: print(html2text.html2text(h2))
some text

> more magnificent text here

final text
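
As for the "rather trivial tweaks", one possible sketch is a small post-processing step over the Markdown string. The helper name tidy_markdown is hypothetical (it is not part of html2text), and it assumes the only markers to remove are list indentation, blockquote prefixes, and blank lines, as in the two examples above:

```python
import re


def tidy_markdown(md: str) -> str:
    """Hypothetical helper: simplify html2text-style Markdown output.

    Assumption: only leading indentation, "> " blockquote markers and
    blank lines need to go; list bullets ("* ") are kept as they are.
    """
    cleaned = []
    for line in md.splitlines():
        line = line.strip()
        # Remove one or more leading blockquote markers ("> ", "> > ", ...).
        line = re.sub(r"^(>\s*)+", "", line)
        if line:  # drop lines that are empty after cleaning
            cleaned.append(line)
    return "\n".join(cleaned)


# Intended usage (requires html2text to be installed):
#   import html2text
#   print(tidy_markdown(html2text.html2text(h2)))
```

Because the helper works on plain strings, it can be tried on the output shown above without touching the HTML again. Anything beyond these two cases (nested lists, links, emphasis) would need further rules.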