Scraping and Extracting Links from any major Search Engine like Google, Yandex, Baidu, Bing and Duckduckgo

Posted on November 12, 2014 in Meta • Tagged with Scraping, Baidu, Extracting, Google, Programming, Python, Searchengine, Bing, Meta • 7 min read

Prelude

It's been quite a while since I worked on my projects. But recently I had some motivation and energy left, which is quite nice considering my full time university week and a programming job besides.

I have a little project on GitHub that I worked on every now and again in the last year or so. Recently it got a little bit bigger (I have 115 github stars now, would've never imagined that I ever achieve this) and I receive up to 2 mails with job offers every week (Sorry if I cannot accept any request :( ).

But unfortunately my progress with this project is not as good as I want it to be (that's probably a quite common feeling under us programmers). It's not a problem of missing ideas and features that I want to implement, the hard part is to extend the project without blowing legacy code up. GoogleScraper has grown evolutionary and I am waisting a lot of time to understand my old code. Mostly it's much better to just erease whole modules and reimplement things completely anew. This is essentially what I made with the parsing module.

Parsing SERP pages with many search engines

So I …


Continue reading

Using the Python cryptography module with custom passwords

Posted on October 19, 2014 in Cryptography • Tagged with Cryptography, Programming, Uncategorized • 1 min read

Hey all

I recently discovered a quite cute crypto module for Python. It is divided in two logical security layers. The first (Fernet) can be used by cryptology unaware programmers in a way that makes it unlikely to introduce any security flaws. The seconds layer (called Hazmat) allows access to all kinds of cryptographical primitives, such as HMACS and asymmetric encryption functions.

The Problem

Normally you don't want to use primitives, because it is tricky to do correct (event for advanced programmers). But unfortunately the secure and simple API functionality Fernet:

>>> from cryptography.fernet import Fernet
>>> key = Fernet.generate_key()
>>> f = Fernet(key)
>>> token = f.encrypt(b"my deep dark secret")
>>> token
'...'
>>> f.decrypt(token)
'my deep dark secret

suffers from the huge inconvenience that you need to store (or imagine:remember!) a 32 byte key in order to decrypt the tokens that Fernet outputs.
It would be much more convenient to just pass a password to Fernet which in turn makes a 32 byte, Base 64 encoded encryption token out of it. Of course your own
password is much less secure then 32 bytes from os.urandom(32), but at least it is somehow usable.

So I came up …


Continue reading

Lichess.org chess bot!

Posted on April 23, 2014 in Uncategorized • Tagged with Uncategorized, Programming, Chess • 4 min read

22.05.2014: Updated the bot, should work better now

Hi everyone!

I was in a coding mood during Easter and decided to write a small chess bot with selenium and stockfish engine to cheat a bit on lichess.org.

I think the code is pretty self explanatory and I won't discuss it in depth here. You can tweak the config, the comments should explain what the parameters do.

The config is in the beginning of the code, so modify it there. You should maybe modify it to use your username and password. Make sure that you download stockfish and install it. Then supply the correct path in the 'stockfish_binary' parameter.

As always: Have fun!

Some open issues:

  • Sometimes the last move fails because the bot won't to start a new game before it can checkmate
  • Promoting doesn't work yet :/

Here is the code:

__author__ = 'nikolai'
__date__ = 'Easter 2014'

config = {
    'username' : 'probably_a_spider', # the login username
    'password' : 'somepwd', # the login password
    'stockfish_binary' : '/home/nikolai/PycharmProjects/LichessBot/stockfish-dd-src/src/stockfish', # the path to your local stockfish binary
    #Set to true if the bot should play forever
    'pwn_forever' : True, # if the bot should play endlessly
    'min_per_side …

Continue reading

Socks 5 client support for twisted

Posted on February 05, 2014 in Programming • Tagged with Python, Twisted, Socks5, Programming, Security • 5 min read

I recently forked twisted-socks to add SOCKS 5 support for my GoogleScraper in order to scraper Google pages asynchronously. Obviously I needed SOCKS5 support to anonymize the parallel requests such that I can scrape more pages simultaneously.

I tested the code for SOCKS4 and SOCKS4a with a local TOR proxy and twistd -n socks and the SOCKS5 protocol with the dante socks proxy server on my VPS. So I guess the basic functionality should be working by now. GSSAPI (Kerberos) support is planned.

Here is the socksclient code, which is also available on my github repository:

# Copyright (c) 2011-2013, The Tor Project
# See LICENSE for the license.

# Updated on 25.01.14-28.01.14 to add SOCKS 5 support.
# Cleaned some parts of the code and abstracted quite a bit to handle the most important SOCKS5
# functionality like
# - username/password authentication
# - gssapi authentication (planned)
# - CONNECT command (the normal case, there are others: UDP ASSOCIATE and BIND, but they aren't as important. Maybe I will add them
#   in the future. If anyone wants to implement them, the basic structure is already here and the SOCKSv5ClientProtocol should be
#   rather easy extensible (how the actual connection, listening for incoming connections (BIND) and …

Continue reading

The art of cheating: Making a chess.com chess bot following an unusual approach!

Posted on January 26, 2014 in C • Tagged with C, Chess.com, Cheating, Firefox, Hooking, Chess, Lowlevel, Programming, Security • 21 min read

Table of contents

  1. Preface: Giving first insight into the idea and why I think that hooking into a browser is a good idea.
  2. Many different ways to make browser game bots: Discussion various techniques to write HTTP/WebSocket bots
  3. How does chess.com internally look like?: Investigation of the client side behavior of chess.com
  4. How the bot works: Explaining how my shared library hooks firefox network functions
  5. Conclusion: Summary of my discoveries
  6. Demo Video and another, better demo video: You might only watch that video, but make sure you read the explanation on the very bottom of this blog post!
  7. You may find the sources to the shared library (so) on my github account.

Preface

Usually I don't have good ideas in forms of flashes of genius. On the contrary, I think that many endeavors and interesting projects might be reasonable if realized, but often so, there's a huge amount of work involved and too many variables and strategic decisions in the process that could eventually render the project a failure. What I try to say: A mediocre idea well engineered might be a good product. But a good idea badly implemented and designed is usually just bad in …


Continue reading

Exploiting wordpress plugins through admin options (No 3. — Easy Media Gallery stored XSS)

Posted on December 17, 2013 in Php • Tagged with Vulnerablity, Websecurity, Exploit, Stored, Php, Programming, Security, Xss, Wordpress, Easy-media-gallery • 12 min read

Preface

This post is about general security weaknesses in wordpress plugins, that allow malicious attackers to gain code execution access on the web server (which is quite often the user www-data). To outline the problem shortly: Often, wordpress plugins need a administration form to handle settings and options. These options are meant to be exclusively alterable by the admin of the wordpress site. But unfortunately, lots of wordpress plugins suffer from a very dangerous combination of CSRF and stored XSS vulnerabilities, that wrapped up in a social engineering approach, may break the site.

I have done some research in the past about such attacks. You can read about a stored xss in flash album gallery plugin as well as my findings about a similar flaw in the wp members plugin.

How does the attack vector look like?

First we need to understand how administration menus are created in wordpress, because these forms are the point where data flows into a application. You can learn more about the underlying concept on wordpress codex.

But the crucial point to understand is, that they all consist of forms, independently of the fact that you can pack your options under a predefined and already …


Continue reading