UPDATE on 18th February 2014:
This Python module now has its own GitHub repository!
The plugin can extract
- All links
- Link titles
- The description/caption below the links
and has the following features:
- Advanced proxy support for SOCKS4/4a/5 and HTTP proxies
- Multithreading
- XPath parsing
- Supports almost all search parameters
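To give a feel for what XPath-based extraction of links, titles and captions can look like, here is a minimal sketch. It is not the module's actual code: the markup, the class names `result` and `snippet`, and the function `extract_results` are all made up for illustration, and it uses the limited XPath subset of the standard library's `xml.etree.ElementTree`, which only handles well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a results page.
# The class names "result" and "snippet" are assumptions for this sketch.
SAMPLE_PAGE = """
<html><body>
  <div class="result">
    <a href="http://example.com/a">First title</a>
    <span class="snippet">First caption.</span>
  </div>
  <div class="result">
    <a href="http://example.org/b">Second title</a>
    <span class="snippet">Second caption.</span>
  </div>
</body></html>
"""


def extract_results(markup):
    """Return a list of dicts with the link, title and caption of each result."""
    root = ET.fromstring(markup.strip())
    results = []
    # Select every result container anywhere in the document.
    for block in root.findall(".//div[@class='result']"):
        anchor = block.find("./a")
        snippet = block.find("./span[@class='snippet']")
        if anchor is None:
            continue  # skip containers without a link
        results.append({
            "link": anchor.get("href"),
            "title": (anchor.text or "").strip(),
            "snippet": (snippet.text or "").strip() if snippet is not None else "",
        })
    return results


if __name__ == "__main__":
    for result in extract_results(SAMPLE_PAGE):
        print(result["link"], "|", result["title"], "|", result["snippet"])
```

Real result pages are rarely well-formed XML, so a forgiving HTML parser with full XPath support would be the more realistic choice in practice.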
Please note that this is by no means a final version! Heavy structural changes will be implemented in the near future (I'll experiment with asynchronous networking, for instance). But I will always host a working version on this site, together with instructions on how to use it, so that visitors can always use the script!
1. Edit (07.01.2013):
- Switched from urllib to requests
- Added a random User-Agent for every new search
- Cleaned up the code
- Laid the foundation for combining the module with proxychains
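The random User-Agent idea from the list above could be sketched roughly as follows. The User-Agent strings and the helper `build_headers` are illustrative assumptions, not the list or the function the module really uses.

```python
import random

# Illustrative User-Agent strings; the module's real list is not shown here.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:26.0) Gecko/20100101 Firefox/26.0",
]


def build_headers(rng=random):
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


if __name__ == "__main__":
    # Each call may pick a different User-Agent, one per new search.
    print(build_headers())
```

With requests, the resulting dict can be passed along with each new search via the `headers=` argument of `requests.get`.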
Original Blog Post
Sample output after searching for 'cats are not cute' (sorry) with 100 results per page on 3 consecutive pages: results.txt
I have always needed a fast, reliable Python module for querying the Google search engine. The Google API is rubbish, because it gives you at most 36 results. This is completely unacceptable!
So, I looked further and found http://code.google …