nsq-to-gs – Streaming NSQ messages directly to Google Cloud Storage


In addition to my previously published (very early) project to stream NSQ messages directly to BigQuery, I am happy to present a modified version of nsq-to-s3 that supports streaming NSQ messages directly to Google Cloud Storage.

Grab it while it’s hot from the nsq-to-gs repo.

I do see a future for a merged version of these two projects that supports both S3 and Google Cloud Storage, but this will have to do for now.


The current version has the same functionality as the latest nsq-to-s3 version and was adapted to support Google Cloud Storage with minor modifications (such as the default path and filename formats).
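To give a feel for what the “flush buffered messages to an object” step looks like on the Google Cloud Storage side, here is a minimal sketch using the cloud.google.com/go/storage client. The bucket name and the path/filename format are illustrative only (not the tool’s actual defaults), and the tool itself may use a different client library:

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        "cloud.google.com/go/storage"
    )

    // flushToGCS writes a batch of buffered NSQ message bodies to a single
    // Google Cloud Storage object, one message per line.
    func flushToGCS(ctx context.Context, bucket *storage.BucketHandle, messages [][]byte) error {
        // Hypothetical path/filename format - not the tool's actual default.
        objName := fmt.Sprintf("nsq-archive/%s.log", time.Now().UTC().Format("2006-01-02T15-04-05"))
        w := bucket.Object(objName).NewWriter(ctx)
        for _, msg := range messages {
            if _, err := w.Write(msg); err != nil {
                w.Close()
                return err
            }
            if _, err := w.Write([]byte("\n")); err != nil {
                w.Close()
                return err
            }
        }
        // Close commits the object; the data is not durable until it returns.
        return w.Close()
    }

    func main() {
        ctx := context.Background()
        client, err := storage.NewClient(ctx) // uses Application Default Credentials
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        bucket := client.Bucket("my-nsq-archive") // hypothetical bucket name
        if err := flushToGCS(ctx, bucket, [][]byte{[]byte(`{"event":"example"}`)}); err != nil {
            log.Fatal(err)
        }
    }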

nsq-to-bigquery – Stream messages from NSQ directly to Google BigQuery


In the spirit of nsq-to-XXX such as nsq-to-http and nsq-to-file – I bring you the very first version of nsq-to-bigquery.

nsq-to-bigquery, as the name suggests, streams data from an NSQ channel into Google’s BigQuery using the streaming API, providing a very effective way to stream data that can then be further analysed and aggregated with BigQuery’s excellent performance.

This is a (very) initial version so it has some limitations and assumptions.

Limitations / Assumptions

  • The BigQuery table MUST exist prior to streaming the data
  • The NSQ message being sent MUST be a valid JSON string
  • The JSON format MUST be a simple flat dictionary (keys with simple values; a value can’t be another dictionary or list)
  • The JSON format MUST match the format of the BigQuery table
  • At the moment there is no support for batching, so each message issues an API call to BigQuery with a single row of data (see the sketch after this list)
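To make the flow concrete, here is a minimal sketch of the idea: an NSQ handler that unmarshals a flat JSON message and sends it to BigQuery with a single streaming-insert call. It uses the go-nsq and cloud.google.com/go/bigquery client libraries, and the project, dataset, table, topic and channel names are hypothetical – the actual project may use different libraries and options:

    package main

    import (
        "context"
        "encoding/json"
        "log"

        "cloud.google.com/go/bigquery"
        "github.com/nsqio/go-nsq"
    )

    // flatRow holds one flat JSON message; it implements bigquery.ValueSaver
    // so it can be handed straight to the streaming inserter.
    type flatRow map[string]bigquery.Value

    func (r flatRow) Save() (map[string]bigquery.Value, string, error) {
        return r, "", nil // empty insert ID: no best-effort de-duplication
    }

    func main() {
        ctx := context.Background()

        // Hypothetical project/dataset/table names; the table must already exist.
        client, err := bigquery.NewClient(ctx, "my-project")
        if err != nil {
            log.Fatal(err)
        }
        inserter := client.Dataset("my_dataset").Table("my_table").Inserter()

        consumer, err := nsq.NewConsumer("events", "bigquery", nsq.NewConfig())
        if err != nil {
            log.Fatal(err)
        }
        consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
            // The message body must be a flat JSON object matching the table schema.
            var row flatRow
            if err := json.Unmarshal(m.Body, &row); err != nil {
                return err // malformed message: NSQ will requeue it
            }
            // One streaming-insert API call per message (no batching yet).
            return inserter.Put(ctx, row)
        }))

        if err := consumer.ConnectToNSQD("127.0.0.1:4150"); err != nil {
            log.Fatal(err)
        }
        <-consumer.StopChan
    }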

Planned Features:

  • Support batching, with flushing triggered by X number of rows or Y amount of time passed since the last flush
  • Flushing will happen in parallel with receiving messages, so there is almost no delay (see the sketch below)
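As a rough illustration of the planned approach (not the actual implementation), the sketch below extends the previous one: the NSQ handler only enqueues rows on a channel, while a separate goroutine flushes to BigQuery whenever either a row limit or a time limit is reached, so inserts never block message handling. It reuses the flatRow type and inserter from the earlier sketch and additionally needs the time package; the limits are placeholders:

    // batcher accumulates rows and flushes them to BigQuery when either the
    // row limit or the time limit is reached. Limits and names are illustrative.
    type batcher struct {
        rows chan flatRow
    }

    func newBatcher(ctx context.Context, inserter *bigquery.Inserter, maxRows int, maxWait time.Duration) *batcher {
        b := &batcher{rows: make(chan flatRow, maxRows)}
        go func() {
            pending := make([]flatRow, 0, maxRows)
            ticker := time.NewTicker(maxWait)
            defer ticker.Stop()

            flush := func() {
                if len(pending) == 0 {
                    return
                }
                // A slice of ValueSavers goes out as one multi-row insert call.
                if err := inserter.Put(ctx, pending); err != nil {
                    log.Printf("flush failed: %v", err)
                }
                pending = pending[:0]
            }

            for {
                select {
                case row := <-b.rows:
                    pending = append(pending, row)
                    if len(pending) >= maxRows {
                        flush() // flush on X rows
                    }
                case <-ticker.C:
                    flush() // periodic flush after Y time
                case <-ctx.Done():
                    flush()
                    return
                }
            }
        }()
        return b
    }

    // Add is called from the NSQ handler; it only enqueues and never waits
    // on BigQuery, so receiving and flushing run in parallel.
    func (b *batcher) Add(row flatRow) { b.rows <- row }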

Stay tuned on the github repo for more news.

gonionoo – Go wrapper for the Tor Network Status Protocol – OnionOO

I’ve been running a Tor exit node in the Netherlands since August 2013. I believe in the cause of Tor, and it was only a matter of time before I started adding code in some form or another.

gonionoo is a Go wrapper for OnionOO – the Tor Network Status protocol – and is the first step in a slightly larger project I’ve been planning ever since I became a Tor exit node operator.

The OnionOO API has lots of interesting data on the Tor network. You can see it visualized as part of the Atlas project.
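As a taste of the kind of data OnionOO exposes, here is a small sketch that queries the public details endpoint directly over HTTP (independently of gonionoo’s own API, which may look different) and prints a few fields for matching relays:

    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    // detailsDoc is a minimal subset of an OnionOO "details" document;
    // field names follow the protocol's JSON keys, many more fields exist.
    type detailsDoc struct {
        Relays []struct {
            Nickname    string `json:"nickname"`
            Fingerprint string `json:"fingerprint"`
            Running     bool   `json:"running"`
            Country     string `json:"country"`
        } `json:"relays"`
    }

    func main() {
        // Ask the public OnionOO endpoint for relays matching a search term.
        resp, err := http.Get("https://onionoo.torproject.org/details?search=moria1&limit=5")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        var doc detailsDoc
        if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
            log.Fatal(err)
        }
        for _, r := range doc.Relays {
            fmt.Printf("%s (%s) running=%v country=%s\n", r.Nickname, r.Fingerprint, r.Running, r.Country)
        }
    }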

New programming languages force you to re-think a problem in a fresh way (or: why we need new programming languages. Always.)

Whenever a new programming language appears, some claim it’s the best thing since sliced bread (tm – not mine ;-) ), while others claim it’s the worst thing that can happen and that you can implement everything the language provides in programming language X (assign X to your favorite low-level programming language and append a suitable library).

After seeing Google’s new Go programming language I must say I’m excited. Not because it’s from Google and it got a huge buzz around the net. I am excited about the fact that people decided to think differently before they went on and created Go.

I’m reading Masterminds of Programming: Conversations with the Creators of Major Programming Languages (a good read for any programming language fanatic), which is a set of interviews with various programming language creators, and it’s very interesting to see the thoughts and processes behind a couple of the most widely used programming languages (and even some not-so-widely-used ones).

In a recent interview Brad Fitzpatrick (of LiveJournal fame and now a Google employee) was asked:

You’ve done a lot of work in Perl, which is a pretty high-level language. How low do you think programmers need to go – do programmers still need to know assembly and how chips work?

To which he replied:

… I see people that are really smart – I would say they’re good programmers – but say they only know Java. The way they think about solving things is always within the space they know. They don’t think end-to-end as much. I think it’s really important to know the whole stack even if you don’t operate within the whole stack.

I subscribe to Brad’s point of view because a) you need to know your stack from end to end – from the metal in your servers (i.e. server configuration) and the operating system internals to the data structures used in your code – and b) you need to know more than one programming language to open up your mind to different ways of implementing a solution to a problem.

Perl has regular expressions baked into the language, making every Perl developer think in pattern matching when performing string operations instead of writing tedious find-and-replace code. Of course you can always use various find and replace methods, but the power and way of thinking of compiled pattern matching makes it much more accessible, powerful and useful.

Python has lists and dictionaries (using a VERY efficient hashtable implementation, at least in CPython) baked into the language because lists and dictionaries are very powerful data structures that can be used in a lot of solutions to problems.

One of Go’s baked-in features is concurrency support in the form of goroutines. Goroutines make the use of multi-core systems very easy, without the complexities that exist in multi-process or multi-threaded programming, such as synchronization. This feature actually shares some ancestry with Erlang (which itself has a very unique syntax and vocabulary for scalable functional programming).
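For example, here is a toy fan-out in Go: a handful of goroutines pull work off a channel and push results back, and the runtime spreads them across cores with no explicit thread or lock management in the worker code:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        jobs := make(chan int)
        results := make(chan int)

        // Start a few workers; each one ranges over the jobs channel.
        var wg sync.WaitGroup
        for w := 0; w < 4; w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for n := range jobs {
                    results <- n * n // pretend this is real work
                }
            }()
        }

        // Feed the workers, then signal there is no more work.
        go func() {
            for i := 1; i <= 10; i++ {
                jobs <- i
            }
            close(jobs)
        }()

        // Close the results channel once all workers are done.
        go func() {
            wg.Wait()
            close(results)
        }()

        for r := range results {
            fmt.Println(r)
        }
    }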

Every programming language brings something new to the table and a new way of looking at things and solving problems. That’s why it’s so special :-)