Python: convert HTML to text and mimic formatting


I'm learning BeautifulSoup and have found many "html2text" solutions, but the one I'm looking for should mimic the formatting:

<ul>
<li>one</li>
<li>two</li>
</ul>

would become

* one
* two

and

some text
<blockquote>
more magnificent text here
</blockquote>
final text

to

some text

    more magnificent text here

final text

I'm reading the docs, but I'm not seeing anything straightforward. Any help? I'm open to using something other than BeautifulSoup.

Take a look at Aaron Swartz's html2text script (it can be installed with pip install html2text). Note that its output is valid Markdown. If for some reason that doesn't suit you, some rather trivial tweaks should get you the exact output in the question:

In [1]: import html2text

In [2]: h1 = """<ul>
   ...: <li>one</li>
   ...: <li>two</li>
   ...: </ul>"""

In [3]: print(html2text.html2text(h1))
  * one
  * two

In [4]: h2 = """<p>some text
   ...: <blockquote>
   ...: more magnificent text here
   ...: </blockquote>
   ...: final text</p>"""

In [5]: print(html2text.html2text(h2))
some text

> more magnificent text here

final text
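As a rough illustration of the kind of tweak meant here, the Markdown that html2text returns can be post-processed with a few string operations. The sketch below is not part of html2text: `markdown_to_plain` is a hypothetical helper name, and it assumes the output looks like the session above (list bullets indented by spaces, blockquote lines starting with `>`). Exact spacing may differ between html2text versions, so treat it as a starting point.

```python
import re


def markdown_to_plain(md: str) -> str:
    """Rewrite html2text-style Markdown into the layout asked for in the question.

    Hypothetical helper: assumes bullets look like '  * item' and
    blockquote lines start with '>'.
    """
    out = []
    for line in md.splitlines():
        if line.startswith(">"):
            # Replace the blockquote marker(s) with a four-space indent.
            text = line.lstrip("> ")
            out.append(("    " + text).rstrip())
        else:
            # Drop the leading indent that precedes a list bullet.
            out.append(re.sub(r"^\s+(?=\* )", "", line).rstrip())
    # Trim leading/trailing blank lines but keep any leading indent.
    return "\n".join(out).strip("\n")


# Sample strings copied from the session output above.
list_md = "  * one\n  * two\n\n"
quote_md = "some text\n\n> more magnificent text here\n\nfinal text\n\n"

print(markdown_to_plain(list_md))
print(markdown_to_plain(quote_md))
```

In real use you would pass the result of `html2text.html2text(html)` to the helper instead of the hard-coded sample strings.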
