Scraping with puppeteer and headless chrome deployed to AWS Lambda

Posted on August 31, 2019 in Scraping • Tagged with puppeteer, web scraping, AWS lambda, headless chrome • 4 min read

In this blog post, we demonstrate how to deploy a web scraping function built on puppeteer and headless Chrome to AWS Lambda.


Continue reading

Struktur: A completely new approach to web scraping

Posted on July 15, 2019 in Scraping • Tagged with puppeteer, web scraping, CSS selectors, XPath queries • 7 min read

I will show an alternative approach to web scraping that does not rely on CSS selectors or XPath queries. Instead, it makes use of the fact that most web pages visually render the information of interest in a coherent, structured way. This technique requires a remotely controllable web browser, such as one driven by puppeteer, that is capable of rendering web pages visually.


Continue reading

Breaking Google's Recaptcha

Posted on March 01, 2019 in Scraping • Tagged with puppeteer, recaptcha, scraping • 5 min read

A captcha is a mechanism to distinguish human users from automated programs (bots). Many service providers on the Internet have a major incentive to prevent bots from (ab)using their systems.


Continue reading

Scraping search engines in 2019

Posted on February 04, 2019 in Scraping • Tagged with puppeteer, scraping, modern • 4 min read

Modern scraping is now mostly done with real browsers, configured to behave like human users.


Continue reading

Tutorial: Youtube scraping with puppeteer

Posted on October 29, 2018 in Scraping • Tagged with Youtube, Video, Scraping • 4 min read

How to scrape YouTube videos using puppeteer.


Continue reading

Scraping Amazon Reviews using Headless Chrome Browser and Python3

Posted on October 03, 2018 in Scraping • Tagged with Amazon, Reviews, Scraping • 2 min read

A tutorial that teaches how to scrape Amazon reviews.


Continue reading