Probabilistic data structures to estimate cardinalities and frequencies of massive streams

Posted on Mi 20 Juli 2016 in BigData • Tagged with LogLog-Count, Count-Min-Sketch, Linear Count, Big Data, Stream Processing

In the following blog post we will introduce three different Big Data algorithms. More specifically, we will learn about probabilistic data structures that allow us to estimate cardinalities and frequencies of elements that originate from a massive stream of data. This blog post is heavily inspired by a the well written article on probabilistic data structures for web analytics and data mining. I will not cover the mathematics behind those data structures, the beforementioned blog post does that much better. And if not, then you should probably consult the original …

Continue reading