jQuery-like HTML parsing in Python?

Is there any Python library that allows me to parse an HTML document similar to what jQuery does?

That is, I’d like to use CSS selector syntax to grab an arbitrary set of nodes from the document and read their content, attributes, and so on.

The only Python HTML parsing library I’ve used before is BeautifulSoup, and although it works fine, I keep thinking my parsing would go faster if I had jQuery syntax available. :D

Posted by Roy Tang under notes at 16 Jun 2010 7:12am #jquery #python #questions #stackoverflow #software development
Also on: stackexchange / 3

Comments

Ignacio Vazquez-Abrams

2010-06-16 07:14:40

The lxml library supports CSS selectors.

systempuntoout

2010-06-16 07:32:01

If you are fluent with BeautifulSoup, you could just add soupselect to your libs.
Soupselect is a CSS selector extension for BeautifulSoup.

Usage:

>>> from BeautifulSoup import BeautifulSoup as Soup
>>> from soupselect import select
>>> import urllib
>>> soup = Soup(urllib.urlopen('http://slashdot.org/'))
>>> select(soup, 'div.title h3')
[<h3><span><a href='//science.slashdot.org/'>Science</a>:</span></h3>,
 <h3><a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek</a></h3>,
...]
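
The session above targets Python 2 and the old `BeautifulSoup` module. As a hedged sketch of the same idea on Python 3: the newer `bs4` package (BeautifulSoup 4) ships its own `select()` method, so it is used here in place of soupselect. The sample markup is invented so the example does not need network access.

```python
# Requires the third-party beautifulsoup4 package (imported as bs4).
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this sketch.
SAMPLE = """
<div class="title"><h3><span><a href="//science.example/">Science</a>:</span></h3></div>
<div class="title"><h3><a href="//example.org/trek">Star Trek</a></h3></div>
<div class="sidebar"><h3>Not selected</h3></div>
"""

# "html.parser" is the parser bundled with the standard library.
soup = BeautifulSoup(SAMPLE, "html.parser")

# select() takes a CSS selector and returns the matching tags as a list.
headings = soup.select("div.title h3")
titles = [h.get_text(strip=True) for h in headings]

# Attribute selectors work as well; each result behaves like a dict of attributes.
hrefs = [a["href"] for a in soup.select("div.title h3 a[href]")]

print(titles)  # ['Science:', 'Star Trek']
print(hrefs)   # ['//science.example/', '//example.org/trek']
```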

Luke Stanley

2011-05-10 23:19:20

Consider PyQuery:

http://packages.python.org/pyquery/

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> import urllib
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url='http://google.com/')
>>> d = pq(url='http://google.com/', opener=lambda url: urllib.urlopen(url).read())
>>> d = pq(filename=path_to_html_file)
>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> p.html()
'Hello world !'
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> p.html()
u'you know <a href="http://python.org/">Python</a> rocks'
>>> p.text()
'you know Python rocks'
