Python and Screen Scraping
I wanted to do a quick test on a website using Python. I knew about BeautifulSoup, but I wanted the power of jQuery, so I found pyquery.
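To give a sense of what the jQuery-style API looks like, here is a minimal sketch. It assumes pyquery is already installed; the HTML snippet and the link_targets helper are made up for illustration and are not from any real site.

```python
# Sketch only: the markup and the helper name are hypothetical.
try:
    from pyquery import PyQuery
except ImportError:  # pyquery (or lxml underneath it) is not installed
    PyQuery = None

HTML = (
    "<div>"
    "<ul class='nav'>"
    "<li><a href='/home'>Home</a></li>"
    "<li><a href='/about'>About</a></li>"
    "</ul>"
    "<a href='/other'>Other</a>"
    "</div>"
)


def link_targets(html, selector="ul.nav a"):
    """Return the href of every element matched by a CSS selector."""
    if PyQuery is None:
        raise RuntimeError("pyquery is not installed")
    doc = PyQuery(html)
    # doc(selector) works like $(selector) in jQuery; items() yields
    # one PyQuery object per matched element.
    return [a.attr("href") for a in doc(selector).items()]


if __name__ == "__main__":
    if PyQuery is not None:
        print(link_targets(HTML))
```

The selector string is the same one you would pass to $() in jQuery, which is the appeal over walking the tree by hand.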
I found some instructions to get started and noticed some people complaining about how difficult it is to install. Hmm, I wonder why?
It only depends on lxml, which in turn needs easy_install, which needs setuptools. Oh, that’s why people complain. Oh well, let’s try.
- Download setuptools-0.6c11-py2.6.egg, because my version of Python is 2.6
- Run the setuptools egg as if it were a shell script. Apparently this installs easy_install
- Now I can install lxml with easy_install
$ sudo easy_install lxml
This failed for me. So I’m going to make sure I have libxml2 and libxslt. First I install libxml2-dev
$ sudo apt-get install libxml2-dev
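With this many moving parts, it helps to check after each attempt which pieces of the chain are actually importable. The following is a small sketch, not part of any of these packages; the check_imports helper is a hypothetical name, and it only tells you whether an import succeeds, not why it fails.

```python
def check_imports(names):
    """Try to import each module name and report success or failure."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # The dependency chain from this post, innermost first.
    for name in ("setuptools", "lxml", "pyquery"):
        ok = check_imports([name])[name]
        print("%-10s %s" % (name, "ok" if ok else "MISSING"))
```

If lxml shows as missing, pyquery will too, since pyquery is built on top of it.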
Then I need to build libxml2 from source
wget ftp://xmlsoft.org/libxml2/libxml2-sources-2.7.6.tar.gz
tar -xvzf libxml2-sources-2.7.6.tar.gz
cd libxml2-2.7.6/
./configure --prefix=/usr/local/libxml2
make
sudo make install
Next, build libxslt
wget ftp://xmlsoft.org/libxslt/libxslt-1.1.26.tar.gz
tar -xvzf libxslt-1.1.26.tar.gz
cd libxslt-1.1.26
./configure --prefix=/usr/local/libxslt --with-libxml-prefix=/usr/local/libxml2/
make
sudo make install
Still errors… I’m stopping
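One possible thing to check at this point (an assumption, not a confirmed cause of the errors above): as far as I know, the lxml build locates the C libraries by running the xml2-config and xslt-config scripts, and with the custom --prefix values used above those scripts end up outside the default PATH. The sketch below looks for them on PATH and in the two prefix bin directories; find_tool is a hypothetical helper written for this post.

```python
import os

# bin directories implied by the --prefix values used in the builds above.
PREFIX_BINS = ["/usr/local/libxml2/bin", "/usr/local/libxslt/bin"]


def find_tool(name, extra_dirs=()):
    """Return the first executable called `name` found in extra_dirs or PATH."""
    dirs = list(extra_dirs) + os.environ.get("PATH", "").split(os.pathsep)
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for tool in ("xml2-config", "xslt-config"):
        on_path = find_tool(tool)
        anywhere = find_tool(tool, PREFIX_BINS)
        print("%s: on PATH = %s, including prefixes = %s" % (tool, on_path, anywhere))
```

If a script only turns up under one of the prefixes, adding that bin directory to PATH before re-running easy_install lxml is a reasonable next experiment.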