Breaking the Google Audio reCAPTCHA with Google's own Speech to Text API

Video Demonstration and Explanation

In the YouTube video below, I explain the technical details behind solving the Google Audio reCAPTCHA. I even make some remarks about the philosophical implications of AI becoming smarter than humans in certain fields. I am in no way qualified to talk about such matters though ;)

Link to GitHub repository

Introduction

This blog article builds on the fantastic research by the authors of the uncaptcha2 repository. The original scientific uncaptcha paper proposes a method to solve Google's Audio reCAPTCHA with Google's own Speech-to-Text API.

Yes, you read that correctly: it is possible to solve the audio version of reCAPTCHA v2 with Google's own Speech-to-Text API.

Even worse: reCAPTCHA v2 is still used in the new reCAPTCHA v3 as a fall-back mechanism.

Homer being Homer: no description needed. (Source: https://i.giphy.com/media/xT5LMzIK1AdZJ4cYW4/giphy.webp)

Since uncaptcha2 was released on January 18, 2019, its proof-of-concept code does not work anymore (as the authors correctly predicted).

This blog post attempts to keep the proof of concept up to date and working.

How does it work?

Everyone knows and hates reCAPTCHA. It looks like this:

For the inclusion of visually impaired people, there is also an audio version of reCAPTCHA.

The idea of the attack is very simple: you grab the MP3 file of the audio reCAPTCHA and submit it to Google's own Speech-to-Text API.

Google will return the correct answer in over 97% (* Edit: 91%) of all cases.

* The figure of 91% comes from the original uncaptcha2 repository. I have not run statistically significant tests with the current bot, but based on intuition, the success rate seems to be above 90% when you rotate IP addresses and browser fingerprints.
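To make the idea concrete, here is a minimal sketch of the "submit the audio to Speech-to-Text" step. It uses only the Python standard library and the public REST endpoint of Google Cloud Speech-to-Text. Everything else is an assumption for illustration and not taken from the actual bot: the helper names (build_request_body, extract_transcript, transcribe), the use of an API key, and the choice of FLAC as the encoding. In particular, this sketch assumes the downloaded audio has already been converted to FLAC, since whether MP3 is accepted directly depends on the API version.

```python
import base64
import json
import urllib.request

# Public REST endpoint of Google Cloud Speech-to-Text (v1).
SPEECH_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_request_body(audio_bytes: bytes, encoding: str = "FLAC",
                       language_code: str = "en-US") -> dict:
    """Build the JSON body for a synchronous recognize call.

    Hypothetical helper: FLAC is an assumption of this sketch, not
    necessarily the format the original bot sends.
    """
    return {
        "config": {"encoding": encoding, "languageCode": language_code},
        # The API expects the raw audio as a base64 string.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def extract_transcript(response: dict) -> str:
    """Join the top alternative of every result into one answer string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in parts if p)


def transcribe(audio_bytes: bytes, api_key: str) -> str:
    """Send the audio to the API and return the transcript (needs network)."""
    body = json.dumps(build_request_body(audio_bytes)).encode("utf-8")
    request = urllib.request.Request(
        f"{SPEECH_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_transcript(json.load(reply))
```

The string returned by transcribe would then be typed into the answer field of the audio challenge. Splitting the request building and response parsing into separate functions keeps both testable without a network connection or credentials.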

Proof of Concept

All mouse movements are done by the bot. Movements are randomized to some degree.
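One simple way to randomize mouse movements is sketched below. This is not the bot's actual algorithm, only an illustration of the general idea: the hypothetical function random_mouse_path bends the straight line between two points with a randomly placed control point (a quadratic Bezier curve), so that no two paths look identical. The parameter names and default values are assumptions.

```python
import random


def random_mouse_path(start, end, steps=20, jitter=80.0, rng=None):
    """Return a list of (x, y) points from start to end along a random curve.

    Hypothetical helper for illustration: a quadratic Bezier curve whose
    control point is placed randomly around the midpoint of the segment.
    """
    rng = rng or random.Random()
    (x0, y0), (x2, y2) = start, end
    # A random control point near the midpoint bends the path.
    cx = (x0 + x2) / 2 + rng.uniform(-jitter, jitter)
    cy = (y0 + y2) / 2 + rng.uniform(-jitter, jitter)
    path = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        path.append((round(x), round(y)))
    return path


if __name__ == "__main__":
    # Each run prints a slightly different path between the same endpoints.
    for point in random_mouse_path((0, 0), (300, 200)):
        print(point)
```

The resulting points could be fed one by one to whatever mouse automation tool drives the browser, ideally with small random delays between steps.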

Conclusion

We do live in astonishing times.