In this blog post, I explain why it is best to avoid Puppeteer and Playwright for web scraping.
Posted on May 20, 2021 in Scraping • Tagged with web scraping, crawling, puppeteer, playwright, CDP • 10 min read
In this blog post, I explain why it is best to avoid Puppeteer and Playwright for web scraping.
Posted on March 01, 2021 in Scraping • Tagged with web scraping, crawling, puppeteer, playwright • 13 min read
In this blog post, I talk about my several years of experience with web scraping and the common mistakes I made along the way. The more I dive into web scraping, the more I realize how easy it is to make wrong decisions when scraping a site. For that reason, I compiled a list of seven common web scraping mistakes.
Posted on May 18, 2020 in Crawling • Tagged with Crawling, Distributed Computing, Cloud, Web Bots • 6 min read
In this blog article, I introduce my most recent project: a distributed crawling infrastructure that lets you crawl any website with either a low-level HTTP library or a fully fledged Chrome browser configured to evade bot detection.
This introduction is split into three separate blog articles, because the topic is too large to cover in a single one.
<meta>
tags …