Scraping and Extracting Links from any major Search Engine like Google, Yandex, Baidu, Bing and DuckDuckGo

Posted on November 12, 2014 in Meta • Tagged with Scraping, Baidu, Extracting, Google, Programming, Python, Searchengine, Bing, Meta • 7 min read

Prelude

It's been quite a while since I last worked on my projects. But recently I had some motivation and energy left, which is quite nice considering my full-time university week and a programming job on the side.

I have a little project on GitHub that I have worked on every now and again over the last year or so. Recently it got a little bit bigger (I have 115 GitHub stars now, I would never have imagined that I'd ever achieve this) and I receive up to two emails with job offers every week (sorry if I cannot accept a request :( ).

But unfortunately my progress with this project is not as good as I want it to be (that's probably a quite common feeling among us programmers). It's not a problem of missing ideas or features that I want to implement; the hard part is extending the project without blowing up legacy code. GoogleScraper has grown organically and I am wasting a lot of time trying to understand my old code. Mostly it's much better to just erase whole modules and reimplement things from scratch. This is essentially what I did with the parsing module.

Parsing SERP pages with many search engines

So I …


Continue reading

Let's begin this...

Posted on July 01, 2012 in Meta • Tagged with Meta • 1 min read

Hey World!

Before you leave!

This blog and homepage are under construction. Because I'm currently implementing my own little WordPress theme, and given the rather embarrassing circumstance that my design knowledge is pretty ...eehhm... basic, you'd better stay patient until you see the production version of this blog... That may take several weeks.

I am a 21-year-old German programmer and hopefully, at some point in the future, a freelance security consultant. In the foreseeable future, I'll post audit sessions, papers and tutorials on this blog. I consider myself a whitehat, so don't expect illicit stuff from me. My favourite programming languages are Python and C. Everything I do for myself that can be done painlessly in one of these languages will be done in one of them.

Maybe you're asking yourself what incolumitas.com means?

Well, it's the Latin translation for safety, and since every good domain name is already taken, I just switched to a different language (Latin, if you're curious) and a not so common word for safety. The domain name matches my needs perfectly, because it points directly to the main intention of this site:

Offering security services.

You can hire me. Don't …


Continue reading