Detecting scraping services

Posted on March 11, 2021 in Scraping • Tagged with detecting, scraping, security, fingerprint • 13 min read

In this blog post, I demonstrate how to detect several scraping services: luminati.io, ScrapingBee, scraperapi.com, scrapingrobot.com, and scrapfly.io.



7 Common Mistakes in Professional Scraping

Posted on March 01, 2021 in Scraping • Tagged with web scraping, crawling, puppeteer, playwright • 13 min read

In this blog post, I talk about my several years of experience with web scraping and the common mistakes I made along the way. The more I dive into web scraping, the more I realize how easy it is to make wrong decisions when scraping a site. For that reason, I compiled a list of seven common web scraping mistakes.



Why does this Website know that I am sitting on the Toilet?

Posted on February 05, 2021 in Security • Tagged with JavaScript, deviceorientation, devicemotion • 3 min read

Android mobile devices expose device orientation and device motion data to any website. This data is sensitive and should not be shared with websites without explicit user consent.



Headful Google Chrome with Xvfb on AWS Lambda Container

Posted on January 23, 2021 in Tutorials • Tagged with AWS Lambda, Xvfb, Docker, Container • 4 min read

The following write-up is an attempt to launch headful Google Chrome with Xvfb in an AWS Lambda container.



Browser Red Pills: Why are you browsing my website from AWS Lambda?

Posted on January 17, 2021 in Security • Tagged with red pill, Bot, Advanced Bots, JavaScript, Puppeteer, Playwright • 6 min read

Advanced bots use modern browsers and automation frameworks such as Puppeteer and Playwright. It is becoming increasingly hard to distinguish bots from real human traffic, so new detection methods are required.



Browser based Port Scanning with JavaScript

Posted on January 10, 2021 in Security • Tagged with browser, port scanning, JavaScript • 10 min read

This article develops various techniques for port scanning from within the browser, using modern JavaScript.

