Disco Tip – Crunching web server logs

At my day job we use Disco, a Python + Erlang based Map-Reduce framework, to crunch our web server and application logs and generate useful data. Each web server produces a couple of GB of log data per day, which adds up to a lot of data that needs to be processed on a daily basis. Since the files are big, it was easier for us to perform all the filtering needed to find the rows of interest in the “map” function.
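
To make the idea of filtering inside the "map" step concrete, here is a minimal sketch in plain Python. It is not our production job: the log format (an Apache-style access log), the `filter_map` name, and the `status` and `path_prefix` parameters are all assumptions made for illustration. The function takes a line and a params object and yields key/value pairs, which is the general shape of a map function, but check the Disco documentation for the exact signature your version expects.

```python
import re

# Assumed Apache-style access log prefix; adjust to your real log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def filter_map(line, params):
    """Map step: emit (path, 1) only for rows of interest, drop the rest.

    `params` is a hypothetical dict with the keys "status" and "path_prefix".
    """
    match = LOG_LINE.match(line)
    if match is None:
        return  # malformed or unrelated row: emit nothing
    if match.group("status") != params["status"]:
        return  # wrong HTTP status: emit nothing
    if not match.group("path").startswith(params["path_prefix"]):
        return  # path outside the area we care about
    yield match.group("path"), 1
```

Because uninteresting rows are dropped before anything is emitted, the later stages only ever see the small subset of data that matters, which is the main reason to filter this early when the input files are large.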