Tornado’s secure cookie support in Flask

I’ve recently had the chance to write a new project on AppEngine.

It’s been a long time since I last tried it; I was too lazy (as always) to set up servers just for that.

I’ve decided to use Python, but to make sure I won’t be vendor-locked into various AppEngine services I also decided on:

  • Flask (instead of webapp2)
  • Cloud SQL (instead of DataStore)

This will ensure that I can break out of AppEngine easily with minimal code changes.

This was the first major Flask project I’ve written, and I found its current cookie support a bit lacking compared to Tornado’s secure cookies (I won’t go into the debate of why it should be kept like that, or why I’m not using a session cookie that points to the real session data somewhere else).

I’ve decided to create a small module to add Tornado’s secure cookie support into Flask.

It’s basically a modified version of the current Tornado secure cookie code, and it’s quite easy to use in Flask as well.
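The core idea behind Tornado-style secure cookies is an HMAC-signed value with an embedded timestamp. Here is a minimal sketch of the scheme, not the module’s actual API; the function names and the secret are illustrative:

```python
import base64
import hashlib
import hmac
import time

def _signature(secret, *parts):
    # HMAC-SHA1 over the cookie name, encoded value and timestamp
    h = hmac.new(secret.encode(), digestmod=hashlib.sha1)
    for part in parts:
        h.update(part.encode())
    return h.hexdigest()

def create_signed_value(secret, name, value):
    # Tornado 1.x-style format: base64(value)|timestamp|signature
    timestamp = str(int(time.time()))
    encoded = base64.b64encode(value.encode()).decode()
    return "|".join([encoded, timestamp,
                     _signature(secret, name, encoded, timestamp)])

def decode_signed_value(secret, name, signed, max_age_days=31):
    parts = signed.split("|")
    if len(parts) != 3:
        return None
    encoded, timestamp, signature = parts
    if not hmac.compare_digest(signature,
                               _signature(secret, name, encoded, timestamp)):
        return None  # signature mismatch: the cookie was tampered with
    if int(timestamp) < time.time() - max_age_days * 86400:
        return None  # cookie too old
    return base64.b64decode(encoded).decode()
```

Note the constant-time comparison (`hmac.compare_digest`) when checking the signature; a plain `==` can leak timing information.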

Grab it and share your comments and opinions. It’s also available on PyPI under the name “flask-secure-cookie”.

Python Implementation of Twitter’s Snowflake Service

A while back Twitter announced the Snowflake service. Snowflake is a fast unique ID generator that produces 64-bit integer IDs that are “roughly sortable”. That is, newer IDs are bigger than older ones, up to a certain point.
The service was originally written in Scala (which runs on the JVM) and has a Thrift interface, which means you can talk to it from almost any programming language you can think of.

The project was shared on GitHub.

Personally, I don’t really like the JVM. It’s rather bloated in memory terms and can make quite a mess when you need to fine-tune it for low-memory environments. Also, the Snowflake service code is rather simple and rarely allocates a lot of new objects, which means that, allocation-wise, it’s rather fixed.

I’ve re-implemented the service in Python using the same Thrift interfaces, both for testing and to be able to run it in low-memory environments without the need to fine-tune the JVM.

This implementation is rather naive and doesn’t do much to work around CPython’s Global Interpreter Lock (GIL), so it yields far fewer IDs per second than the Scala implementation; however, you can compensate for that by running multiple processes.
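The generation scheme itself is simple: Snowflake packs a millisecond timestamp, a worker id and a per-millisecond sequence into one 64-bit integer (41, 10 and 12 bits respectively in Twitter’s layout). Here is a rough single-process sketch of the idea, not the actual PySnowflake code, and it deliberately ignores the clock-drift handling a real service needs:

```python
import threading
import time

EPOCH_MS = 1288834974657       # Twitter's custom epoch (ms); any fixed epoch works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

class SnowflakeGenerator:
    def __init__(self, worker_id):
        assert 0 <= worker_id < (1 << WORKER_BITS)
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()  # the GIL alone doesn't make this atomic

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # sequence exhausted for this millisecond: spin to the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            # 41-bit timestamp | 10-bit worker id | 12-bit sequence
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)
```

Because the timestamp occupies the high bits, IDs generated later always compare larger, which is exactly the “roughly sortable” property.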

You can grab the service code from here: https://github.com/erans/pysnowflake

I’ve also written a very simple Python client (it should support connecting to multiple Snowflake services, but the current version disregards this), which I have only tested with PySnowflake (the Python server I created); I didn’t test it against the original Scala service.

You can grab the Python client code here: https://github.com/erans/pysnowflakeclient

While I do use some of this code in production, it is far from fully tested and checked; treat it as a reference, or study it well and load-test it before deploying it.


Determine if an Email address is Gmail or Hosted Gmail (Google Apps for Your Domain)

For my latest venture, MyFamilio, I needed to know if a user’s Email address is a Gmail one so that I could show the user his/her contacts from Gmail.

Figuring out if the user is on Gmail is usually easy – the Email ends with @gmail.com. But what happens with all of those Google Apps for Your Domain addresses (like my own, which uses the @sandler.co.il domain)?

Well, you can easily detect that by running a DNS query on the MX record.

I wrote a small function in Python which uses dnspython to do just that: determine whether an Email address is hosted on Gmail or not.

Check the gist here.
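The idea is simple: short-circuit on @gmail.com, otherwise look up the domain’s MX records and check whether they point at Google’s mail servers. Here is a sketch of it; the Google MX host suffixes and dnspython’s newer `resolve()` API are my assumptions, not necessarily what the gist uses:

```python
try:
    import dns.resolver  # third-party: dnspython
except ImportError:
    dns = None

# MX hosts Google uses for both Gmail and hosted domains (assumption)
GOOGLE_MX_SUFFIXES = ("google.com.", "googlemail.com.")

def is_gmail_hosted(email):
    """Return True if the address is on Gmail or a Google-hosted domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in ("gmail.com", "googlemail.com"):
        return True
    if dns is None:
        raise RuntimeError("dnspython is required for the MX lookup")
    try:
        answers = dns.resolver.resolve(domain, "MX")
    except Exception:  # NXDOMAIN, no MX record, timeout, ...
        return False
    return any(str(record.exchange).lower().endswith(GOOGLE_MX_SUFFIXES)
               for record in answers)
```

In older dnspython versions the call was `dns.resolver.query(domain, "MX")` rather than `resolve()`.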


Extract GPS Latitude and Longitude Data from EXIF using Python Imaging Library (PIL)

I was searching for an example of using the Python Imaging Library (PIL) to extract GPS data from the EXIF data in images.

There were various half-baked examples that didn’t handle things well, so I baked something of my own by combining several of them.

You can get it here: https://gist.github.com/983821


Disco Tip – Crunching web server logs

At my day job we use Disco, a Python + Erlang based Map-Reduce framework, to crunch our web server and application logs to generate useful data.

Each web server log file is a couple of GB of data per day, which amounts to a lot of log data that needs to be processed daily.

Since the files are big, it was easier for us to perform all the necessary filtering to find the rows of interest in the “map” function. The problem is that this requires us to return some generic null value for the rows that don’t interest us, which causes the intermediate files to contain a lot of unnecessary data mapping those uninteresting rows.

To significantly reduce this, we started using the “combiner” function, so that our intermediate results contain an already-summed result for the file the node is currently processing, composed only of the rows found interesting by the filtering in the “map” phase.

For example, if we have 1,000 rows and only 200 match a certain filtering criterion for a particular report, instead of getting 1,000 rows in the intermediate file (800 of which carry the same null value), we now get only 200 rows.

In some cases we saw run time improve by up to 50% (the speedup comes from having fewer rows to reduce from the intermediate files), not to mention the reduction in disk space used during execution thanks to the smaller intermediate files.

That way, we can keep the filtering logic in the “map” function while making sure we don’t end up reducing unnecessary data.
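Using Disco’s combiner signature, combiner(key, value, buffer, done, params), the pattern looks roughly like this; the log format, field positions and the “count 500s per URL” report are made up for illustration:

```python
def map_fun(line, params):
    # filter in the map phase: emit pairs only for the rows we care about,
    # instead of emitting a null marker for every uninteresting row
    fields = line.split(" ")
    if len(fields) > 8 and fields[8] == "500":  # hypothetical: 500 errors per URL
        yield fields[6], 1

def combiner_fun(key, value, buf, done, params):
    # locally sum per-key counts, so each intermediate file ends up with
    # one summed row per key rather than one row per matching log line
    if done:
        return buf.items()
    buf[key] = buf.get(key, 0) + value
```

Disco calls the combiner once per mapped pair with `done=False`, and a final time with `done=True` to flush the accumulated buffer into the intermediate file.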

New programming languages force you to re-think a problem in a fresh way (or: why we always need new programming languages)

Whenever a new programming language appears, some claim it’s the best thing since sliced bread (tm – not mine ;-) ), while others claim it’s the worst thing that could happen, and that you could implement everything the language provides in programming language X (assign X to your favorite low-level programming language and append a suitable library).

After seeing Google’s new Go programming language I must say I’m excited. Not because it’s from Google, or because of the huge buzz around the net, but because people decided to think differently before they went on and created Go.

I’m reading Masterminds of Programming: Conversations with the Creators of Major Programming Languages (a good read for any programming-language fanatic), a set of interviews with various programming language creators, and it’s very interesting to see the thoughts and processes behind some of the most widely used programming languages (and even some not-so-widely-used ones).

In a recent interview Brad Fitzpatrick (of LiveJournal fame and now a Google employee) was asked:

You’ve done a lot of work in Perl, which is a pretty high-level language. How low do you think programmers need to go – do programmers still need to know assembly and how chips work?

To which he replied:

… I see people that are really smart – I would say they’re good programmers – but say they only know Java. The way they think about solving things is always within the space they know. They don’t think ends-to-ends as much. I think it’s really important to know the whole stack even if you don’t operate within the whole stack.

I subscribe to Brad’s point of view because a) you need to know your stack from end to end – from the metal in your servers (i.e. server configuration) and the operating system internals to the data structures used in your code – and b) you need to know more than one programming language, to open up your mind to different ways of implementing a solution to a problem.

Perl has regular expressions baked into the language, making every Perl developer think in pattern matching when performing string operations, instead of writing tedious find-and-replace code. Of course you can always use various find-and-replace methods, but the power of compiled pattern matching, and the way of thinking it encourages, makes string handling much more accessible, powerful and useful.

Python has lists and dictionaries (with a VERY efficient hashtable implementation, at least in CPython) baked into the language, because lists and dictionaries are very powerful data structures that can be used in a lot of solutions to problems.

One of Go’s baked-in features is concurrency support in the form of goroutines. Goroutines make the use of multi-core systems very easy, without the complexities that exist in multi-process or multi-threaded programming, such as synchronization. This feature actually shares some ancestry with Erlang (which itself has a very unique syntax and vocabulary for scalable functional programming).

Every programming language brings something new to the table, a new way of looking at things and solving problems. That’s why each one is so special :-)

Google AppEngine – Python – issubclass() arg 1 must be a class

If you are getting the error “issubclass() arg 1 must be a class” with the Google App Engine SDK for Python on Linux, it’s probably because you are running Python 2.6 (this will likely happen to you on Ubuntu 9.04, where 2.6 is the default).

Just run the dev server under Python 2.5 (i.e. python2.5 dev_appserver.py).

My first post using a Blog Editor

I’ve decided I wanted to find a reasonable blog editor to post from, instead of using Blogger’s web interface (which is nice, but not THAT nice).

After long searches, and after going through a lot of blog editors (some that even cost money), I’ve found this one, called Zoundry, which is even written in Python.

It has some neat features in it like:

  • Tags support – including support for Technorati, Del.icio.us, Flickr, 43 Things and more.
  • Preview with your OWN template. It even downloaded my template and enabled me to view this post as it would appear in the blog.

Anyhow, this is my first post from it, to test it out, see how it is, and decide whether it’s worth using all the time.