nsq-to-gs – Streaming NSQ messages directly to Google Cloud Storage


In addition to my previously published (very early) project to stream NSQ messages directly to BigQuery, I am happy to present a modified version of nsq-to-s3 that supports streaming NSQ messages directly to Google Cloud Storage.

Grab it while it's hot from the nsq-to-gs repo.

I do see a future for a merged version of these two projects that supports both S3 and Google Cloud Storage, but this will have to be enough for now.


The current version has the same functionality as the latest nsq-to-s3 version and was adapted to support Google Storage with minor modifications (such as the default path and filename formats).

nsq-to-bigquery – Stream messages from NSQ directly to Google BigQuery


In the spirit of nsq-to-XXX such as nsq-to-http and nsq-to-file – I bring you the very first version of nsq-to-bigquery.

nsq-to-bigquery, as the name suggests, streams data from an NSQ channel into Google’s BigQuery using the Streaming API. It provides a very effective means of streaming in data that BigQuery can then analyse and aggregate with its excellent performance.

This is a (very) initial version so it has some limitations and assumptions.

Limitations / Assumptions

  • The BigQuery table MUST exist prior to streaming the data
  • The NSQ message being sent MUST be a valid JSON string
  • The JSON format MUST be a simple flat dictionary (keys with simple values; a value can’t be another dictionary or a list)
  • The JSON format MUST match the format of the BigQuery table
  • At the moment there is no support for batching so each message will issue an API call to BigQuery with a single line of data
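
To make the flat-dictionary rule concrete, here is a minimal Python sketch of the kind of check a message would have to pass. It is not taken from nsq-to-bigquery; the function name parse_flat_message and the list of accepted value types are assumptions made for illustration only, and it does not check the row against the BigQuery table schema.

```python
import json

# Hypothetical helper, not part of nsq-to-bigquery.
# Values considered "simple": strings, numbers, booleans and null.
SIMPLE_TYPES = (str, int, float, bool, type(None))


def parse_flat_message(body):
    """Decode an NSQ message body and ensure it is a flat JSON object.

    Returns the decoded dict, or raises ValueError when the body is not
    valid JSON, is not an object, or contains a nested dict or list.
    """
    try:
        row = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(row, dict):
        raise ValueError("top-level JSON value must be an object")
    for key, value in row.items():
        if not isinstance(value, SIMPLE_TYPES):
            raise ValueError(f"field {key!r} holds a nested value")
    return row
```

A message such as {"user": "eran", "count": 3} passes, while {"user": {"name": "eran"}} is rejected because the value is another dictionary.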

Planned Features:

  • Support batching with flushing based on X number of rows or Y amount of time passed since last flush
  • Flushing will happen in parallel with receiving information so there is almost no delay
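
The batching idea above could look roughly like the following Python sketch. It is only an illustration of "flush after X rows or Y seconds", not the project's implementation (which does not exist yet); the class name RowBatcher, the default thresholds and the flush callback are all assumptions. It also flushes synchronously, so the planned parallel flushing is not shown.

```python
import time


class RowBatcher:
    """Buffers rows and flushes on a row-count or elapsed-time threshold.

    Hypothetical sketch: `flush` is any callable that receives a list of
    rows (for example, one that issues a single streaming insert).
    """

    def __init__(self, flush, max_rows=500, max_age_seconds=5.0,
                 clock=time.monotonic):
        self._flush = flush
        self._max_rows = max_rows
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._rows = []
        self._last_flush = clock()

    def add(self, row):
        """Buffer one row and flush if either threshold has been reached."""
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._last_flush >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Hand the buffered rows to the callback and reset the timer."""
        rows, self._rows = self._rows, []
        self._last_flush = self._clock()
        if rows:
            self._flush(rows)
```

Note that in this sketch the age check only runs when a new row arrives, so a quiet channel would also need a periodic timer that calls flush().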

Stay tuned on the github repo for more news.

MongoDB Replica-Set Aware Backup Script

I’ve created a nice little bash script to take MongoDB backups that is replica-set aware.

It will only take a backup from a replica, so if you have the classic master, replica, arbiter configuration you can set up the script via cron on both the (current) master and the replica, and the backup will only run on the replica.
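
The decision itself is small. The script is written in bash, so the following Python sketch is only an illustration of the logic, under the assumption that the node's role is read from the reply of MongoDB's isMaster command (whose ismaster and secondary fields are used here). The function name should_run_backup is made up, and the actual script may detect the role differently.

```python
def should_run_backup(is_master_reply):
    """Return True only when the node reports itself as a secondary.

    `is_master_reply` is the document returned by MongoDB's isMaster
    command, passed in as a plain dict. The current master and arbiters
    both yield False, so cron can safely run on every node.
    """
    is_secondary = bool(is_master_reply.get("secondary"))
    return is_secondary and not is_master_reply.get("ismaster")
```

With this check in place, the same cron entry can be installed on both data-bearing nodes, and the backup keeps working after a failover swaps their roles.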

It will then tar.gz the backup and upload it to Google Storage. It can be easily adapted to upload the backup to S3 using s3cmd or the aws cli (aws-cli).

Cross posted at Forecast:Cloudy (my cloud blog).

Don Dodge, Google and Developers Evangelism

I was just reading over at TechCrunch about Google quickly hiring Don Dodge after he was let go from Microsoft. It seems Don will be doing what he used to do at Microsoft – Developer Evangelism (good for him, and Google!).

I’m very happy to see that Google is putting their stock options and cash where their mouth is to evangelize their APIs, platforms (Android, AppEngine) and tools to developers.

A while back I wrote about Google’s lack of outreach to the Israeli developer community. It is still very visible in Israel, from the job listings as well as from various events and conventions, that Microsoft technology dominates the Israeli high-tech software scene.

I do hope that hiring Don Dodge, and continuing to release tools, SDKs, platforms and even languages such as the new Go programming language, will create the diversification that every monopolized field needs.

I just hope that Google will start to do more than just very simple and shallow Dev Days and will start reaching out to the community, specifically in Israel. I would like to see a Google I/O event in Israel and maybe a couple of smaller events that dig down into code and details in a more intimate setting with fewer people.

In general I would expect Google to start evangelizing in other countries and to have evangelists in every country where they have an office. I would suggest that Google learn a bit from MSDN as well as from the Microsoft Most Valuable Professional (MVP) program. These are among the best examples of building a community around core leaders who can drive the community, as well as Google, straight up.

Google is still light years from reaching the well-oiled, well-organized Microsoft evangelism machine, and I hope Don and others will be able to make big leaps to close that gap.

New programming languages force you to re-think a problem in a fresh way (or why we need new programming languages. Always.)

Whenever a new programming language appears, some claim it's the best thing since sliced bread (tm – not mine ;-) ), while others claim it's the worst thing that can happen and that you can implement everything the language provides in programming language X (assign X to your favorite low level programming language and append a suitable library).

After seeing Google’s new Go programming language I must say I’m excited. Not because it's from Google and it got a huge buzz around the net. I am excited about the fact that people decided to think differently before they went on and created Go.

I’m reading Masterminds of Programming: Conversations with the Creators of Major Programming Languages (a good read for any programming language fanatic), which is a set of interviews with the creators of various programming languages. It's very interesting to see the thoughts and processes behind a couple of the most widely used programming languages (and even some not-so-widely-used ones).

In a recent interview Brad Fitzpatrick (of LiveJournal fame and now a Google employee) was asked:

You’ve done a lot of work in Perl, which is a pretty high-level language. How low do you think programmers need to go – do programmers still need to know assembly and how chips work?

To which he replied:

… I see people that are really smart – I would say they’re good programmers – but say they only know Java. The way they think about solving things is always within the space they know. They don’t think ends-to-ends as much. I think it’s really important to know the whole stack even if you don’t operate within the whole stack.

I subscribe to Brad’s point of view because a) you need to know your stack from end to end, from the metal in your servers (i.e. server configuration) and the operating system internals to the data structures used in your code, and b) you need to know more than one programming language to open up your mind to different ways of implementing a solution to a problem.

Perl has regular expressions baked into the language, which makes every Perl developer think in terms of pattern matching when performing string operations instead of writing tedious code to find and replace strings. Of course you can always use various find and replace methods, but having compiled pattern matching in the language, and the way of thinking that comes with it, makes string handling much more accessible, powerful and useful.
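
As a small illustration of that way of thinking, here is a sketch in Python rather than Perl (Python's re module is a library, not part of the syntax, so this only approximates the Perl experience). The pattern and the function name to_iso_dates are invented for the example: a single compiled pattern with capture groups replaces what would otherwise be a hand-written loop of find, slice and concatenate calls.

```python
import re

# Compiled once and reused; the three groups capture day, month and year.
DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def to_iso_dates(text):
    """Rewrite every DD/MM/YYYY date in `text` as YYYY-MM-DD."""
    # Backreferences reorder the captured groups in the replacement.
    return DATE.sub(r"\3-\2-\1", text)
```

Calling to_iso_dates("Released on 02/11/2008.") returns "Released on 2008-11-02.".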

Python has lists and dictionaries (using a VERY efficient hashtable implementation, at least in CPython) baked into the language because lists and dictionaries are very powerful data structures that can be used in a lot of solutions to problems.

One of Go’s baked-in features is concurrency support in the form of goroutines. Goroutines make it very easy to use multi-core systems while avoiding much of the complexity that exists in multi-process or multi-threaded programming, such as explicit synchronization. This feature actually shares some ancestry with Erlang (which itself has a very unique syntax and vocabulary for scalable functional programming).

Every programming language brings something new to the table and a new way of looking at things and solving problems. That’s why it's so special :-)

Google AppEngine – Python – issubclass() arg 1 must be a class

If you are getting the error “issubclass() arg 1 must be a class” with the Google App Engine SDK for Python on Linux, it's probably because you are running Python 2.6 (this will probably happen to you on Ubuntu 9.04, where 2.6 is the default).

Just run the dev server under python 2.5 (i.e. python2.5 dev_appserver.py)

Google Developer Day 2008 Israel (yes, it’s in Israel)

About a year and a half ago I wrote about Google Israel’s position in the Israeli development community (actually, the lack of one) and argued that a company like Google should be more involved.

This was written around the time the 2007 Google Developer Day happened in more than 10 places around the world but not in Israel.

I opened my Email this morning and to my surprise I found an invitation to the Google Developer Day 2008 in Israel.

It seems there is a good schedule and a very interesting cast of lecturers. Some of the lecturers are Israeli Googlers while others are Googlers from Europe and the USA.

While most of it revolves around Google technologies (GData and the APIs, AppEngine, V8 JavaScript engine) or Google sponsored initiatives (OpenSocial) it’s a good start for a conversation between the Israeli development community and Google Israel (or Google in general for that matter).

I hope this is a first step in Google’s involvement in the Israeli development community, one that will lead to a more diverse and engaged community.

The event will take place on November 2nd at the Avenue convention center (near Airport City). Currently registration requires an invitation.

I’ve already registered, and unless something changes my schedule I will be there. If you have also registered and know me (or don’t know me yet), feel free to drop by and say hi.

Google Reader Search is here!

I fired up Google Reader this morning and to my surprise I found a search box:

Google Reader now has search

This is one of the last missing features I wanted Google Reader to have.

I actually have a friend that didn’t want to switch from a desktop feed reader until Google Reader added search. Now he can safely move to it :-)

You can limit your search to all items in all of your feeds, all starred items, all shared items or items from a specific folder. I couldn’t make the search work with some of the search operators I’m familiar with from Gmail, like “from:XXX” and “label:XXX”, which I think are very important.

I even used the Google Blog Search syntax of “inpostauthor:Eran” to find all posts written by Eran, but it doesn’t seem to work.

I would have expected Google Reader search to use the Google Blog Search engine underneath and just add extra limits for searches like “All shared items”, in which case it would search only that specific set of items. Perhaps it does use it, but without some of the query syntax features.

Oh well, I hope the Google Reader search will converge with the syntax of Google Blog Search to make the search feature complete.

All in all this is a great and long requested feature. Great job Google Reader team!

Google Apps for your Domain, DNS, CNAME and Security

I’ve recently started to use Google Apps for your Domain to host my private emails on the sandler.co.il domain.

Google Apps for your domain is quite cool and was very easy to configure. I mainly moved to it due to the unbelievable amounts of SPAM and I didn’t have the power or time to configure SpamAssassin in a reasonable way that would actually work.

When I moved, one of the things I did was to change the “default” URL that my family members and I use to access the domain's web mail. Google Apps for your Domain allows you to do just that by configuring it in its configuration screen and setting a CNAME record that points to ghs.google.com.

After configuring everything I tested it out and noticed something disturbing.

It seems that CNAME (by design/default/whatever) does not support HTTPS, only HTTP. This means that the CNAME alias I configured will be resolved to mail.google.com/a/YourDomain.XXX (replace YourDomain.XXX with your domain ;-) ). If you are not authenticated you’ll be redirected to authenticate on an SSL protected address (https) and upon successful authentication you will be directed to http://mail.google.com/a/YourDomain.XXX (not https – not SSL).

This means that now, when you read or write Emails, they are not protected. If you are sitting on an open Wi-Fi network (a passwordless network), people can easily sniff out your Emails and correspondence (I know that not using WPA will make you prone to man-in-the-middle attacks, but that’s not the issue here). This is just one of the scenarios in which you will be vulnerable (there are a few more).

It’s not that accessing https://mail.google.com/a/YourDOMAIN.XXX will not work. On the contrary, it will work fine and all the communication will be secured using SSL (https).

It seems Google is encouraging recklessness with their current configuration, instead of redirecting authenticated users to the secured version (https/SSL) of their web mail specifically because of the DNS CNAME limitations.

It is a simple fix on Google’s part which will increase the security dramatically.