incolumitas.com

Hello, dear readers,

I get a lot of mail with questions about GoogleScraper. I really appreciate it, but at some point I simply cannot answer all of it anymore. Over the last few weeks I didn't have much time (or motivation, I must admit) to put into GoogleScraper.

The reason is that I am still uncomfortable with the architecture of GoogleScraper. There are basically two ways to use the tool:

As a command line tool
From another program over the API (programming approach)

Furthermore, GoogleScraper can run in three very different modes:

http mode
selenium mode, which can be further divided by browser: Firefox, Chrome and PhantomJS
asynchronous mode

In my opinion, selenium mode is the hardest to work with (very buggy and complex to program in). All of this leads to a complex software architecture, mainly because the two ways of using the tool (CLI tool and API) have different priorities when it comes to handling exceptions.

The CLI tool should be VERY robust and should do everything it can to continue scraping with the remaining resources (proxies, RAM when lots of selenium instances become an issue, network bandwidth, ...), because the user cannot handle these problems himself when he calls GoogleScraper …
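To make the exception-handling tension concrete, here is a minimal Python sketch. It assumes a hypothetical `scrape()` function, a hypothetical `ProxyError` exception and a stand-in `fake_fetch()`; none of these are GoogleScraper's actual API. The same loop takes a `robust` flag: in CLI style it drops a failing proxy and carries on with the remaining ones, recording what could not be scraped; in API style it re-raises the first error so the calling program decides what to do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ProxyError(Exception):
    """Hypothetical error raised by a fetch function when a proxy fails."""


@dataclass
class ScrapeResult:
    results: Dict[str, str] = field(default_factory=dict)
    failed_keywords: List[str] = field(default_factory=list)
    dead_proxies: List[str] = field(default_factory=list)


def scrape(keywords: List[str], proxies: List[str],
           fetch: Callable[[str, str], str], robust: bool) -> ScrapeResult:
    """Hypothetical scraping loop that tries proxies in order.

    robust=True  (CLI style): drop a failing proxy, retry the keyword with
                 the remaining proxies and never raise; keywords that cannot
                 be scraped are recorded in failed_keywords.
    robust=False (API style): re-raise the first error to the caller.
    """
    out = ScrapeResult()
    pool = list(proxies)
    for kw in keywords:
        while True:
            if not pool:
                if robust:
                    # No resources left: record the failure and move on.
                    out.failed_keywords.append(kw)
                    break
                raise RuntimeError("no working proxies left")
            proxy = pool[0]
            try:
                out.results[kw] = fetch(kw, proxy)
                break
            except ProxyError:
                if not robust:
                    raise  # let the calling program handle it
                # CLI style: discard the dead proxy and retry this keyword.
                pool.pop(0)
                out.dead_proxies.append(proxy)
    return out


def fake_fetch(keyword: str, proxy: str) -> str:
    """Stand-in for a real HTTP/selenium request; the proxy "bad" always fails."""
    if proxy == "bad":
        raise ProxyError(proxy)
    return f"results for {keyword} via {proxy}"


if __name__ == "__main__":
    # CLI style: keeps going with the remaining proxy.
    print(scrape(["a", "b"], ["bad", "good"], fake_fetch, robust=True))
    # API style: the ProxyError reaches the caller.
    try:
        scrape(["a"], ["bad", "good"], fake_fetch, robust=False)
    except ProxyError as exc:
        print("caller got:", repr(exc))
```

Keeping the policy in one flag rather than two separate code paths is one way to contain that complexity, though a real scraper would also have to deal with RAM, bandwidth and browser crashes, which this sketch ignores.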

Discontinuation of GoogleScraper

GoogleScraper Tutorial - How to scrape 1000 keywords with Google

A lot of work to do for GoogleScraper in the future and request for comments!