A while back Twitter announced the Snowflake service. Snowflake is a fast unique ID generator that produces 64-bit integer IDs that are “roughly sortable”. That is, newer IDs are bigger than older ones, but only up to a point: IDs generated at nearly the same time may not be in strict order.
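To make “roughly sortable” concrete, here is a minimal sketch of how such an ID can be packed. It assumes the bit layout Twitter published for Snowflake (41 bits of millisecond timestamp, 10 bits of worker ID, 12 bits of per-millisecond sequence) and collapses the datacenter and worker fields into a single worker ID for brevity. The class name SnowflakeSketch, the clock parameter, and the constants are illustrative only; this is not the code from the Scala service or from my Python port.

```python
import threading
import time

# Assumed layout: 41-bit timestamp | 10-bit worker id | 12-bit sequence.
EPOCH_MS = 1288834974657  # custom epoch (assumption, taken from Twitter's design)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeSketch:
    """Hypothetical ID generator used only to illustrate the idea."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock          # returns the current time in milliseconds
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Refuse to hand out IDs that could collide with earlier ones.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                if self._sequence == MAX_SEQUENCE:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
                    self._sequence = 0
                else:
                    self._sequence += 1
            else:
                self._sequence = 0
            self._last_ms = now
            # The timestamp occupies the high bits, so IDs grow over time.
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the timestamp sits in the high bits, IDs from a single generator always increase. Two generators with different worker IDs can still produce IDs in the same millisecond whose numeric order does not match the order in which they were created, which is where the “roughly” comes from.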

The service was originally written in Scala (which runs on the JVM) and has a Thrift interface, which means you can talk to it from almost any programming language you can think of.

The project was shared on GitHub.

Personally, I don’t really like the JVM. It’s rather bloated in terms of memory and can make quite a mess when you need to fine-tune it for low-memory environments. Also, the Snowflake service code is rather simple and rarely allocates many new objects, which means that, allocation-wise, its memory usage is rather fixed.

I’ve re-implemented the service in Python using the same Thrift interface, both for testing and to be able to run it in low-memory environments without the need to fine-tune the JVM.

This implementation is rather naive and doesn’t do much to work around CPython’s Global Interpreter Lock (GIL), so it yields far fewer IDs per second than the Scala implementation. However, you can compensate for that by running multiple processes.
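As a rough illustration of the multi-process idea, the sketch below gives every process its own worker ID so that IDs generated in parallel cannot collide. The function names make_ids and generate_in_parallel, and the bit layout, are assumptions made for this example; they are not part of PySnowflake, where each process would presumably run its own Thrift server instead of returning a list.

```python
import multiprocessing
import time

# Assumed layout: 41-bit timestamp | 10-bit worker id | 12-bit sequence.
EPOCH_MS = 1288834974657  # custom epoch (assumption)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def make_ids(worker_id, count):
    """Generate `count` IDs in the calling process for a single worker id."""
    ids = []
    last_ms = -1
    sequence = 0
    while len(ids) < count:
        now = int(time.time() * 1000)
        if now < last_ms:
            continue  # clock went backwards: wait for it to catch up
        if now == last_ms:
            if sequence == MAX_SEQUENCE:
                continue  # sequence exhausted: spin until the next millisecond
            sequence += 1
        else:
            sequence = 0
        last_ms = now
        ids.append(
            ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
            | (worker_id << SEQUENCE_BITS)
            | sequence
        )
    return ids


def generate_in_parallel(num_workers, per_worker):
    """Run one process per worker id and merge the results."""
    jobs = [(worker_id, per_worker) for worker_id in range(num_workers)]
    with multiprocessing.Pool(processes=num_workers) as pool:
        batches = pool.starmap(make_ids, jobs)
    return [new_id for batch in batches for new_id in batch]
```

Calling generate_in_parallel(4, 1000) from under an `if __name__ == "__main__":` guard would start four processes, each free of the others’ GIL. The distinct worker ID in the middle bits is what keeps the combined output unique.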

You can grab the service code from here: https://github.com/erans/pysnowflake

I’ve also written a very simple Python client (it should support connecting to multiple Snowflake services, but the current version disregards this), which I have only tested with PySnowflake (the Python server I created). I didn’t test it against the original Scala service.

You can grab the Python client code here: https://github.com/erans/pysnowflakeclient

While I do use some of this code in production, it is far from being fully tested and checked. I would treat it as a reference, or study it well and load test it before deploying it.