GoogleScraper - A simple module to scrape and extract links from Google.

What does GoogleScraper do?

GoogleScraper parses Google search engine results easily and efficiently. It lets you programmatically extract all found links, their titles and descriptions, and the total number of results for your query, so your application can do whatever it wants with them (probably some SEO-related research).

There are many use cases:

  • Quickly harvest masses of Google dorks.
  • Use it as an SEO tool.
  • Discover trends.
  • Compile lists of sites to feed your own database.
  • Many more use cases...

GoogleScraper is implemented with the following techniques/software:

  • Written in Python 3.3
  • Uses multithreading/asynchronous I/O (via Twisted).
  • Supports parallel Google scraping with multiple IP addresses.
  • Provides proxy support using SocksiPy:
    • Socks5
    • Socks4
    • HTTP proxy
  • Supports additional Google search features.
  • Includes exhaustive research of similar projects!

Example Usage

import GoogleScraper
import urllib.parse

if __name__ == '__main__':

    results = GoogleScraper.scrape('HOly shit', number_pages=1)
    for link_title, link_snippet, link_url in results['results']:
        # You can access all parts of each result URL like this:
        # link_url.scheme => URL scheme specifier (Ex: 'http')
        # link_url.netloc => Network location part (Ex: '')
        # link_url.path => Hierarchical path (Ex: 'help/Python.html')
        # link_url.params => Parameters for last path element
        # link_url.query => Query component
        print(urllib.parse.unquote(link_url.geturl())) # This reassembles the parts of the URL into the whole thing

# How many urls did we get?

# How many hits has google found with our keyword?
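
The attribute names in the comments above (scheme, netloc, path, params, query, geturl) match those of the objects returned by urllib.parse.urlparse in the standard library. Assuming link_url is such an object, the following minimal sketch shows what each part contains. The URL is a made-up placeholder and not real GoogleScraper output.

```python
from urllib.parse import urlparse, unquote

# Hypothetical result URL, used only for illustration.
raw = "http://www.example.com/help/Python.html;v=1?q=a%20b#top"
link_url = urlparse(raw)

print(link_url.scheme)   # URL scheme specifier
print(link_url.netloc)   # network location part
print(link_url.path)     # hierarchical path
print(link_url.params)   # parameters for the last path element
print(link_url.query)    # query component, still percent-encoded

# geturl() reassembles the parts; unquote() decodes %20 and similar escapes.
full_url = unquote(link_url.geturl())
print(full_url)
```

Note that unquote() is applied only when printing. The parsed parts themselves stay percent-encoded, which is usually what you want if you intend to request the URL again.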

Example Output

Direct command line usage

To use it as a CLI tool, run it like this:

python -p 1 -n 25 -q 'inurl:".php?id=555"'

But be aware that Google may quickly flag you as an abuser if you use such Google dorks.

Maybe try a SOCKS proxy then (but don't bet on Tor). [This is just an example; this proxy will probably no longer work by the time you read this.]

python -p 1 -n 25 -q 'i hate google' --proxy=""


If you feel like contacting me, send me an email. You can find my contact information on my blog.

To-do list (As of 25.12.2013)

  • Figure out whether to use threads or asynchronous I/O for multiple connections.
  • Determine whether it is possible to use one Google search session with multiple connections that are independent of each other (each with a different IP address).
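
To make the first open question more concrete, here is a rough sketch of the two options side by side. It is not GoogleScraper code: fetch_blocking and fetch_async are hypothetical stand-ins that only sleep instead of making a request, and the asyncio variant needs a newer Python than the 3.3 listed above (asyncio.run was added in 3.7). GoogleScraper itself uses Twisted, which is not shown here.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

QUERIES = ["query one", "query two", "query three"]  # placeholder keywords


def fetch_blocking(query):
    """Hypothetical stand-in for a blocking HTTP request."""
    time.sleep(0.1)
    return "results for " + query


async def fetch_async(query):
    """Hypothetical stand-in for a non-blocking HTTP request."""
    await asyncio.sleep(0.1)
    return "results for " + query


def run_threaded(queries):
    # One worker thread per query; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(fetch_blocking, queries))


async def run_async(queries):
    # All coroutines share one thread; gather() returns results in input order.
    return await asyncio.gather(*(fetch_async(q) for q in queries))


threaded = run_threaded(QUERIES)
asynced = asyncio.run(run_async(QUERIES))
print(threaded)
print(asynced)
```

Both variants wait on all requests at the same time rather than one after another. The thread version works with any blocking HTTP library, while the asynchronous version needs a client written for the event loop.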

Stable version

This is a development repository, but you can always find a working script here.