Scraping and Extracting Links from any major Search Engine like Google, Yandex, Baidu, Bing and DuckDuckGo

Posted on Wed 12 November 2014 in Meta • Tagged with Scraping, Baidu, Extracting, Google, Programming, Python, Searchengine, Bing, Meta

Prelude

It's been quite a while since I last worked on my projects. Recently, though, I had some motivation and energy left, which is quite nice considering my full-time university week and a programming job on the side.

I have a little project on GitHub that I have worked on every now and again over the last year or so. Recently it got a little bit bigger (I have 115 GitHub stars now, which I never imagined I would achieve) and I receive up to 2 mails with job offers ...


GoogleScraper.py - A simple Python module to parse Google search results.

Posted on Sun 06 January 2013 in Programming • Tagged with Google, Scraping, Programming, Security

UPDATE on 18th February 2014:

This Python module now has its own GitHub repository!

The plugin can extract

  • All links
  • Link titles
  • The description/caption below the links

and has the following features:

  • Advanced proxy support for SOCKS4/4a/5 and HTTP proxies
  • Multithreading
  • XPath parsing
  • Supports almost all search parameters

Please note that this is by no means a final version! Heavy structural changes will be implemented in the near future (I'll experiment with asynchronous networking, for instance). But on this site, I will always host a working ...
