Posted on May 20, 2021 in Scraping • Tagged with web scraping, crawling, puppeteer, playwright, CDP • 10 min read
In this blog post I explain why it is best to avoid puppeteer and playwright for web scraping.
Posted on March 01, 2021 in Scraping • Tagged with web scraping, crawling, puppeteer, playwright • 13 min read
In this blog post, I talk about my several years of experience with web scraping and the common mistakes I made along the way. The more I dive into web scraping, the more I realize how easy it is to make wrong decisions when scraping a site. For that reason, I compiled a list of seven common web scraping mistakes.
Posted on September 30, 2019 in Scraping, Crawling • Tagged with puppeteer, web scraping, headless chrome, marketing • 6 min read
In this blog post, I explain how a lack of perfect information about the market allows a clever middleman to connect market supply with market demand by scraping advertisements and crawling for leads.
Posted on September 17, 2019 in Scraping • Tagged with puppeteer, web scraping, headless chrome, 1 million, queue, architecture • 5 min read
Scraping one million keywords is not an easy task. There are proxy problems, big data problems and reliability issues. In this blog post, I share the most valuable insights.
Posted on August 31, 2019 in Scraping • Tagged with puppeteer, web scraping, AWS lambda, headless chrome • 4 min read
In this blog post, we demonstrate how a web scraping function is deployed to the AWS cloud with puppeteer and headless chrome.
Posted on July 15, 2019 in Scraping • Tagged with puppeteer, web scraping, CSS selectors, XPath queries • 7 min read
I will show an alternative approach to web scraping that does not use CSS selectors or XPath queries. We make use of the fact that most web pages visually render the information of interest in a coherent, structured way. This technique requires a remotely controllable web browser that is capable of rendering web pages visually, such as headless Chrome driven by puppeteer.