incolumitas.com

GoogleScraper.py - A simple Python module to parse Google search results.

Posted on January 06, 2013 in Programming • Tagged with Google, Scraping, Programming, Security • 14 min read

UPDATE on 18th February 2014:

This Python module now has its own GitHub repository!

The plugin can extract

All links
Link titles
The description/caption below the links

and has the following features:

Advanced proxy support for SOCKS4/4a/5 and HTTP PROXY
Multithreading
XPATH parsing
Supports almost all search parameters

Please note that this is by no means a final version! Heavy structural changes are coming in the near future (I'll experiment with asynchronous networking, for instance). However, I will always host a working version on this site, along with instructions on how to use it, so that visitors can always use the script.

1. Edit (07.01.2013):

Switched from urllib to requests
Added random User Agents for every new search
Cleaned up the code
Laid the foundation for combining the module with proxychains

Original Blog Post

Sample output after searching for 'cats are not cute' (sorry) with 100 results per page on 3 ascending pages: results.txt

I have always needed a fast and reliable Python module to query the Google search engine. The Google API is rubbish, because it gives you at most 36 results. This is completely unacceptable!

So, I looked further and found http://code.google …

Web safe Base64 Encode/Decode in C

Posted on October 29, 2012 in Programming • Tagged with C, Programming • 2 min read

A short while ago I needed to implement a small web-safe Base64 encoder/decoder and couldn't find a good, compact example anywhere on the internet, so I decided to write my own quick and dirty one. I hope this little demonstration code helps somebody...

I built this program with the Pelles C compiler, but I am optimistic that it works with every common C compiler, since the code stays quite close to the C11 standard.

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* calloc, exit */
#include <string.h>  /* strlen */

#define MAX_B64_PADDING 0x2
#define B64_PAD_CHAR "="

char * Base64Encode(char *input, unsigned int inputLen);
char * Base64Decode(char *input, unsigned int inputLen);
static unsigned char GetIndexByChar(unsigned char c);

static char *b64alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

int main(int argc, char **argv) {

    if (argc != 2) {
        printf("Usage: %s StringToEncode\n", argv[0]);
        exit(EXIT_FAILURE);
    }
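    char *encoded = Base64Encode(argv[1], (unsigned int)strlen(argv[1]));
    if (encoded == NULL) {
        exit(EXIT_FAILURE);
    }
    printf("String \"%s\" to: %s\n", argv[1], encoded);
    free(encoded); /* the caller owns the returned buffer */

    exit(EXIT_SUCCESS);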
}

/* Caller has to free the returned base64 encoded string ! */
char *
Base64Encode(char *input, unsigned int inputLen)
{
    char *encodedBuf;
    int fillBytes, i, k, base64StrLen;
    unsigned char a0, a1, a2, a3;
    /* Make sure there is no overflow: every started group of 3 input
       bytes becomes 4 output characters, plus 1 byte for the NUL. */
    base64StrLen = (int)(4 * ((inputLen + 2) / 3) + 1);

    encodedBuf = calloc(base64StrLen, sizeof(char));
    if (encodedBuf == NULL) {
        printf("calloc() failed …
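
For orientation, here is a minimal, self-contained sketch of web-safe Base64 encoding. It is not the Base64Encode() from this post: the names SketchBase64UrlEncode and kUrlAlphabet, the exact-size buffer calculation and the decision to keep '=' padding are all assumptions made for illustration. The only difference from standard Base64 is the alphabet, where '-' and '_' take the place of '+' and '/'.

```c
#include <stdlib.h>

/* Web-safe alphabet: '-' and '_' replace '+' and '/'. */
static const char kUrlAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Hypothetical helper for illustration, not the post's Base64Encode().
 * Returns a NUL-terminated, '='-padded string that the caller must
 * free(), or NULL if the allocation fails. */
char *SketchBase64UrlEncode(const unsigned char *in, size_t len)
{
    /* Every started group of 3 input bytes yields 4 output characters. */
    size_t outLen = 4 * ((len + 2) / 3);
    char *out = malloc(outLen + 1); /* +1 for the terminating NUL */
    size_t i = 0, o = 0;

    if (out == NULL)
        return NULL;

    /* Full 3-byte groups: pack 24 bits, emit four 6-bit indices. */
    while (i + 3 <= len) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) |
                          (unsigned long)in[i + 2];
        out[o++] = kUrlAlphabet[(v >> 18) & 0x3F];
        out[o++] = kUrlAlphabet[(v >> 12) & 0x3F];
        out[o++] = kUrlAlphabet[(v >> 6) & 0x3F];
        out[o++] = kUrlAlphabet[v & 0x3F];
        i += 3;
    }

    if (len - i == 1) {
        /* One leftover byte: two characters plus two pad characters. */
        unsigned long v = (unsigned long)in[i] << 16;
        out[o++] = kUrlAlphabet[(v >> 18) & 0x3F];
        out[o++] = kUrlAlphabet[(v >> 12) & 0x3F];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) {
        /* Two leftover bytes: three characters plus one pad character. */
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8);
        out[o++] = kUrlAlphabet[(v >> 18) & 0x3F];
        out[o++] = kUrlAlphabet[(v >> 12) & 0x3F];
        out[o++] = kUrlAlphabet[(v >> 6) & 0x3F];
        out[o++] = '=';
    }

    out[o] = '\0';
    return out;
}
```

Calling SketchBase64UrlEncode((const unsigned char *)"Man", 3) yields "TWFu", and the two bytes 0xFB 0xFF yield "-_8=" where standard Base64 would produce "+/8=". Sizing the buffer from the 3-to-4 ratio rather than from a rough percentage keeps it correct for very short inputs too.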
