GoogleScraper.py - A simple Python module to parse Google search results.

Posted on January 06, 2013 in Programming • Tagged with Google, Scraping, Programming, Security • 14 min read

UPDATE on 18th February 2014:

This Python module now has its own GitHub repository!

The plugin can extract

  • All links
  • Link titles
  • The description/caption below the links

and has the following features:

  • Advanced proxy support for SOCKS4/4a/5 and HTTP proxies
  • Multithreading
  • XPath parsing
  • Supports almost all search parameters

Please note that this is by no means a final version! Heavy structural changes will follow in the near future (for instance, I'll experiment with asynchronous networking). On this site, however, I will always host a working version with usage instructions, so that visitors can always use the script!

1. Edit (07.01.2013):

  • Using requests instead of urllib
  • Added random User Agents for every new search.
  • Cleaned the code
  • Implemented foundation to combine with proxychains

Original Blog Post

Sample output after searching for 'cats are not cute' (sorry) with 100 results per page on 3 ascending pages: results.txt

I have always needed a fast and reliable Python module to query the Google search engine. The Google API is rubbish, because it returns at most 36 results. This is completely unacceptable!

So, I looked further and found http://code.google …


Continue reading

Web safe Base64 Encode/Decode in C

Posted on October 29, 2012 in Programming • Tagged with C, Programming • 2 min read

A short while ago I needed to implement a small web safe base64 en/decoder and couldn't find a good, compact example anywhere on the internet, so I decided to write my own quick and dirty one. I hope this little demonstration code helps somebody...

I used the Pelles C compiler to build this program, but I am optimistic that it works with every common C compiler, since it's quite close to the C11 standard.
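To make "web safe" concrete: this variant swaps the `+` and `/` of standard base64 for `-` and `_` (the URL- and filename-safe alphabet described in RFC 4648), so the output can be placed in a URL without escaping. Here is a minimal sketch of an encoder along those lines. The helper name `websafe_b64_encode` is hypothetical and not part of the program in this post, and the `=` padding is an assumption based on the `B64_PAD_CHAR` define used here.

```c
#include <stdlib.h>

/* URL- and filename-safe alphabet: '-' and '_' replace '+' and '/'. */
static const char websafe_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Hypothetical helper (not the post's Base64Encode).
 * Returns a NUL-terminated, '='-padded string that the caller must free,
 * or NULL if the allocation fails. */
char *websafe_b64_encode(const unsigned char *in, size_t len)
{
    /* Every group of up to 3 input bytes becomes 4 output characters. */
    size_t out_len = 4 * ((len + 2) / 3);
    char *out = malloc(out_len + 1);
    size_t i, o = 0;
    unsigned long v;

    if (out == NULL)
        return NULL;

    /* Full 3-byte groups: split 24 bits into four 6-bit indices. */
    for (i = 0; i + 2 < len; i += 3) {
        v = ((unsigned long)in[i] << 16) |
            ((unsigned long)in[i + 1] << 8) |
            (unsigned long)in[i + 2];
        out[o++] = websafe_alphabet[(v >> 18) & 0x3F];
        out[o++] = websafe_alphabet[(v >> 12) & 0x3F];
        out[o++] = websafe_alphabet[(v >> 6) & 0x3F];
        out[o++] = websafe_alphabet[v & 0x3F];
    }

    if (len - i == 1) {
        /* One byte left: two characters plus two padding characters. */
        v = (unsigned long)in[i] << 16;
        out[o++] = websafe_alphabet[(v >> 18) & 0x3F];
        out[o++] = websafe_alphabet[(v >> 12) & 0x3F];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) {
        /* Two bytes left: three characters plus one padding character. */
        v = ((unsigned long)in[i] << 16) | ((unsigned long)in[i + 1] << 8);
        out[o++] = websafe_alphabet[(v >> 18) & 0x3F];
        out[o++] = websafe_alphabet[(v >> 12) & 0x3F];
        out[o++] = websafe_alphabet[(v >> 6) & 0x3F];
        out[o++] = '=';
    }

    out[o] = '\0';
    return out;
}
```

With this sketch, the two bytes `0xfb 0xff` encode to `-_8=`, where the standard alphabet would give `+/8=`.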

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* calloc, free, exit */
#include <string.h>  /* strlen */

#define MAX_B64_PADDING 0x2
#define B64_PAD_CHAR "="

char * Base64Encode(char *input, unsigned int inputLen);
char * Base64Decode(char *input, unsigned int inputLen);
static unsigned char GetIndexByChar(unsigned char c);

static char *b64alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

int main(int argc, char **argv) {

    if (argc != 2) {
        printf("Usage: %s StringToEncode\n", argv[0]);
        exit(EXIT_FAILURE);
    }
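    printf("String \"%s\" to: ", argv[1]);
    /* Base64Encode returns a heap buffer that the caller must free. */
    char *encoded = Base64Encode(argv[1], (unsigned int)strlen(argv[1]));
    if (encoded == NULL) {
        exit(EXIT_FAILURE);
    }
    printf("%s\n", encoded);
    free(encoded);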

    exit(EXIT_SUCCESS);
}

/* Caller has to free the returned base64 encoded string ! */
char *
Base64Encode(char *input, unsigned int inputLen)
{
    char *encodedBuf;
    int fillBytes, i, k, base64StrLen;
    unsigned char a0, a1, a2, a3;
    /* Make sure there is no overflow: 4 output characters per 3 input
       bytes (rounded up), plus one byte for the terminating NUL. */
    base64StrLen = 4 * ((inputLen + 2) / 3) + 1;

    encodedBuf = calloc(base64StrLen, sizeof(char …

Continue reading