In addition to my previously published (very early) project to stream NSQ messages directly to BigQuery, I am happy to present a modified version of nsq-to-s3 that supports streaming NSQ messages directly to Google Cloud Storage.
Grab it while it's hot from the nsq-to-gs repo.
I do see a future for a merged version of these two projects that supports both S3 and Google Cloud Storage, but this will have to do for now.
The current version has the same functionality as the latest nsq-to-s3 version and was adapted to support Google Cloud Storage with minor modifications (such as the default path and filename formats).
In the spirit of the nsq-to-XXX tools such as nsq-to-http and nsq-to-file – I bring you the very first version of nsq-to-bigquery.
nsq-to-bigquery, as the name suggests, streams data from an NSQ channel into Google's BigQuery using the Streaming API. It provides an effective way to stream data that can then be further analyzed and aggregated, taking advantage of BigQuery's excellent performance.
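The per-message flow can be sketched roughly as follows. This is a minimal Python illustration of the logic, not the actual implementation, and `insert_row` is a hypothetical stand-in for the BigQuery streaming insert call:

```python
import json

def handle_message(message_body, insert_row):
    """Handle one NSQ message: parse it as JSON and stream it to BigQuery.

    `insert_row` is a placeholder for the BigQuery streaming API call;
    in the current version each message triggers one such call.
    """
    row = json.loads(message_body)  # the message MUST be a valid JSON string
    if not isinstance(row, dict):
        raise ValueError("message must be a flat JSON object")
    insert_row(row)  # one API call per message (no batching yet)
    return row

# Example usage with a stub in place of the real API call:
streamed = []
handle_message('{"user": "alice", "clicks": 3}', streamed.append)
```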
This is a (very) initial version so it has some limitations and assumptions.
Limitations / Assumptions
- The BigQuery table MUST exist prior to streaming the data
- The NSQ message being sent MUST be a valid JSON string
- The JSON format MUST be a simple flat dictionary (keys mapped to simple values; a value can't be another dictionary or a list)
- The JSON format MUST match the format of the BigQuery table
- At the moment there is no support for batching, so each message will issue a separate API call to BigQuery with a single row of data

Planned Features
- Support batching, with flushing based on X number of rows or Y amount of time passed since the last flush
- Flushing will happen in parallel with receiving messages, so there is almost no delay
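The planned batching behavior (flush after N rows or T seconds, whichever comes first) can be sketched like this. This is an illustrative Python sketch under my own assumptions, not code from the repo; `flush_fn` is a hypothetical stand-in for a bulk BigQuery streaming insert:

```python
import time

class Batcher:
    """Buffer rows and flush when max_rows is reached or max_seconds elapsed."""

    def __init__(self, flush_fn, max_rows=500, max_seconds=30.0, clock=time.monotonic):
        self.flush_fn = flush_fn      # called with the buffered rows on flush
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self.clock = clock
        self.rows = []
        self.last_flush = clock()

    def add(self, row):
        self.rows.append(row)
        # Flush on either trigger: row count or time since last flush.
        if (len(self.rows) >= self.max_rows
                or self.clock() - self.last_flush >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.rows:
            # In the planned design this call would run in parallel with
            # receiving new messages, so ingestion is barely delayed.
            self.flush_fn(self.rows)
            self.rows = []
        self.last_flush = self.clock()

# Example usage with a stub in place of the real bulk insert:
batches = []
b = Batcher(batches.append, max_rows=2, max_seconds=60.0)
b.add({"n": 1})
b.add({"n": 2})  # reaches max_rows and triggers a flush
```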
Stay tuned on the github repo for more news.