EuroPython 2011: Simon Willison on Challenges in developing a large Django site

Links: talk description, video and slides.


Simon Willison is the co-founder of lanyrd.com, a social website for conferences.

Tips and tricks

Signing (from Django 1.4, currently in trunk)

Cryptographic signing ensures that data, for instance a cookie or an unsubscribe link, hasn't been tampered with. If you store your session data in signed cookies, you don't have to hit the database any more: you just need to verify the cookie's signature. Signing isn't encryption, though: the client can still read what's in the cookie.

The speaker showed a couple of short code examples to demonstrate how simple it is to use, and how its dumps()/loads() interface is consistent with other serialisation interfaces such as json and pickle.

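from django.core import signing

value = signing.dumps({"foo": "bar"})  # URL-safe string; signed, not encrypted
data = signing.loads(value)  # raises signing.BadSignature if value was tampered with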
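To show roughly what signing does under the hood, here is a minimal standard-library sketch of a tamper-proof unsubscribe token: an HMAC-SHA256 signature over a URL-safe base64 payload. It illustrates the idea only and is not Django's implementation (django.core.signing has its own format, salting and key handling); SECRET_KEY, make_unsubscribe_token and read_unsubscribe_token are hypothetical names.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret for this sketch; a real one must be long, random and private.
SECRET_KEY = b"change-me"


def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, so the token can sit in a query string
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    # Restore the padding stripped by _b64 before decoding
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(body: str) -> str:
    return _b64(hmac.new(SECRET_KEY, body.encode("ascii"), hashlib.sha256).digest())


def make_unsubscribe_token(payload: dict) -> str:
    body = _b64(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{body}.{_sign(body)}"


def read_unsubscribe_token(token: str) -> dict:
    # base64url never contains ".", so the last "." separates body and signature
    body, _, sig = token.rpartition(".")
    # Constant-time comparison avoids leaking how many characters matched
    if not hmac.compare_digest(_sign(body), sig):
        raise ValueError("bad signature")
    return json.loads(_unb64(body))
```

Changing any character of the token makes read_unsubscribe_token raise ValueError, so the link can be trusted without a database lookup. The payload is only encoded, not hidden; for links that should expire, Django also provides TimestampSigner.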

cache_version

This is another way to do cache invalidation. You add a cache_version field to the model, which is incremented in the save() method or by a touch() method. In template fragment caching, you include both the primary key and the cache_version in the cache key, so incrementing the version invalidates the old fragment.
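A minimal sketch of the versioned-key idea in plain Python, assuming a dict in place of memcached and a hypothetical Conference class in place of a Django model. Bumping cache_version changes the key, so the stale fragment is simply never read again and ages out of the cache on its own.

```python
# A plain dict stands in for memcached in this sketch.
cache = {}


class Conference:
    """Hypothetical stand-in for a Django model with a cache_version field."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name
        self.cache_version = 0

    def save(self):
        # In Django this would be an overridden Model.save(); here we only bump the version
        self.cache_version += 1

    def touch(self):
        # Invalidate cached fragments without changing any other field
        self.save()


def fragment_cache_key(obj):
    # The key changes whenever cache_version changes
    return f"conference-fragment:{obj.pk}:{obj.cache_version}"


def render_fragment(obj):
    key = fragment_cache_key(obj)
    if key not in cache:
        cache[key] = f"<h2>{obj.name}</h2>"  # the expensive rendering would happen here
    return cache[key]
```

In a Django template the same effect comes from passing the version to the cache tag as a vary-on argument, e.g. {% cache 600 conference_detail conference.pk conference.cache_version %} after {% load cache %} (the fragment name is made up).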

You can also invalidate in bulk by incrementing cache_version on a whole queryset with an F() expression. Here's the example from the slides:

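from django.db.models import F

# One UPDATE query bumps the version of every conference in this topic
topic.conferences.all().update(
    cache_version=F('cache_version') + 1
)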

NoSQL for denormalisation

Use a NoSQL store to hold denormalised data, and keep it in sync with the main database. It's more work, but it's worth it.

For instance, they use Redis sets to maintain lists such as username-follows and europython-attendees; a simple set intersection then gives them the information they want (for example, which of the people you follow are attending EuroPython). The sets only contain ids, so they don't take up much space.
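A rough sketch of the set-intersection idea, with plain Python sets standing in for Redis. The key names and ids are made up, and add_to_set and intersect only mirror what Redis's SADD and SINTER commands do on the server.

```python
# Plain Python sets stand in for Redis sets; keys and ids are made up.
store = {
    "simonw-follows": {2, 3, 5, 8},
    "europython-attendees": {3, 8, 13, 21},
}


def add_to_set(key, *ids):
    # Same idea as Redis SADD: create the set if needed, then add members
    store.setdefault(key, set()).update(ids)


def intersect(*keys):
    # Same idea as Redis SINTER: ids present in every named set (missing key = empty set)
    sets = [store.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()
```

With the redis-py client the equivalent calls are r.sadd(key, *ids) and r.sinter(key1, key2); the resulting ids can then be loaded from the main database in a single query.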

Hashed static asset filenames in CloudFront

They wrote a management command to push static assets to CloudFront: it compresses the JavaScript, changes the file names (and therefore the URLs), etc. This way they can publish assets in advance, and old static files stay around in case they need to roll back. The changed names also stop Internet Explorer from serving stale cached copies.
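The command itself wasn't shown, so this is a hypothetical sketch of just the renaming step: derive each file's new name from a hash of its contents, so any change produces a new URL while old files, and old URLs, can stay in place for rollbacks. Compression and the actual upload are left out.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path, content, length=12):
    """Return a name like 'js/app.<hash>.js' that changes whenever the content changes."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)  # POSIX paths, since these end up in URLs
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def build_manifest(files):
    """Map each original path to its hashed path; templates would look URLs up here."""
    return {path: hashed_name(path, content) for path, content in files.items()}
```

Later Django releases ship ManifestStaticFilesStorage, which applies the same content-hash naming as part of collectstatic.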

Challenges

This part of the talk is about things they don't really have answers for.

HTTP Requests

For example, talking to an API: what if it fails, or takes 30 seconds? Do you use urllib? What if people enter private URLs that point inside your intranet? :O

You have to handle connection timeouts, logging and profiling, URL validation, and HTTP caching. These are a common set of problems that should be baked into the framework.
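A minimal sketch of two of those concerns using only the standard library: refusing URLs whose host resolves to a private, loopback or link-local address, and always passing a timeout. The function names and the 5-second default are assumptions for illustration; logging, profiling and HTTP caching are not covered.

```python
import ipaddress
import socket
import urllib.request
from urllib.parse import urlsplit


def is_public_http_url(url):
    """True only for http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except (socket.gaierror, UnicodeError):
        return False
    for *_, sockaddr in infos:
        # Strip any IPv6 zone id (e.g. "fe80::1%eth0") before parsing
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])
        if not ip.is_global:  # loopback, RFC 1918, link-local and other non-global ranges
            return False
    return True


def fetch(url, timeout=5.0):
    """Fetch a user-supplied URL, failing fast instead of tying up a worker."""
    if not is_public_http_url(url):
        raise ValueError(f"refusing to fetch {url!r}")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

This is not a complete defence: urllib follows redirects, so a public URL can redirect to an internal one, and the hostname is resolved again when connecting, so a hostile DNS server could return a different address the second time.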

Profiling and debugging production problems

Debugging in development rocks, thanks to django-debug-toolbar, the way 500 errors are handled, pdb, etc.

Once you set DEBUG to False, you're blind. And after a while, all the bugs, particularly performance bugs, only happen in production.

He showed us a code snippet for a UserBasedExceptionMiddleware: if you visit a page that throws a 500 error while logged in as a user with is_superuser set to True, you see the full traceback instead of the default 500 error page. So if one of your users reports a problem, you can go straight to the page and see the traceback.
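The snippet from the talk isn't reproduced here; this is a sketch of the same idea written for the newer (Django 1.10+) middleware style. It relies on django.views.debug.technical_500_response, the function behind the DEBUG error page, which is an internal helper rather than documented public API, so check its signature against your Django version.

```python
class UserBasedExceptionMiddleware:
    """Show Django's technical 500 page (with traceback) to superusers only."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        return self.get_response(request)

    def process_exception(self, request, exception):
        user = getattr(request, "user", None)
        if user is None or not user.is_superuser:
            return None  # fall back to the normal 500 handling
        # Imported lazily in this sketch, so the class can be defined and
        # exercised without loading Django's debug machinery up front.
        from django.views.debug import technical_500_response

        return technical_500_response(
            request, type(exception), exception, exception.__traceback__
        )
```

It has to be listed in MIDDLEWARE after AuthenticationMiddleware, so that request.user exists when process_exception runs.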

At the database level, there is a handy tool called mysql-proxy that is customisable using Lua. Using a wonderful, horribly documented library called log.lua, you can for instance turn on logging for a couple of minutes when needed.

He created an app called django_instrumented (unreleased until it's cleaned up) that collects statistics and stores them in memcached. He has a special bookmark to view them; they are only kept for 5 minutes, so they waste neither space nor time.
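django_instrumented was unreleased, so this is not its API; it's a hypothetical sketch of the same approach: time a block of code and append the result to a cache entry that expires after 5 minutes. The cache argument only needs get(key, default) and set(key, value, timeout), which matches Django's cache API; DictCache is an in-memory stand-in so the example runs without memcached.

```python
import time
from contextlib import contextmanager

STATS_TIMEOUT = 5 * 60  # seconds: keep numbers for 5 minutes only


class DictCache:
    """In-memory stand-in for django.core.cache.cache (timeouts are ignored)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value


@contextmanager
def instrument(cache, name, max_samples=100):
    """Time the wrapped block and append the duration (ms) to a short-lived cache entry."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        key = f"stats:{name}"
        samples = cache.get(key, [])
        samples.append(round(elapsed_ms, 2))
        # Read-modify-write is not atomic, so concurrent requests may drop
        # samples; that's acceptable for rough, throwaway statistics.
        cache.set(key, samples[-max_samples:], STATS_TIMEOUT)
```

In a real project the cache argument would be Django's own cache object, with the expensive part of a view wrapped in a with instrument(cache, "conference_page"): block.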

This actually helped improve the performance: if you measure something and make it visible, people will improve it over time.

0 downtime deployments

Code-wise it's easy enough to do, but database changes make it tougher. Ideally they make schema changes backwards compatible, then run ./manage.py migrate (using South) on another web server.

Having a read-only mode made a lot of problems easier! It isn't zero downtime, but the content stays readable. The switch can be a setting or a Redis key.
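A sketch of a read-only switch as plain WSGI middleware, which keeps it framework-agnostic so it can wrap a Django application: when the flag is on, safe methods pass through and anything that could write gets a 503. The is_read_only callable is an assumption; it could read a setting or a Redis key, and because it's called on every request it can be flipped without a restart.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class ReadOnlyMiddleware:
    """WSGI middleware: while the read-only flag is on, refuse anything that could write."""

    def __init__(self, app, is_read_only):
        self.app = app
        self.is_read_only = is_read_only  # callable returning True/False, checked per request

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if method not in SAFE_METHODS and self.is_read_only():
            body = b"The site is in read-only mode; please try again shortly.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", "120"),
            ])
            return [body]
        return self.app(environ, start_response)
```

Wrapping a Django project would look like application = ReadOnlyMiddleware(get_wsgi_application(), is_read_only) in its WSGI file.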

Feature flags work the same way but at a finer-grained level, for instance turning off search while you update your Solr cluster. They involve quite a bit more work.
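A minimal feature-flag sketch along the same lines, assuming a dict in place of Redis and a hypothetical "flag:<name>" key scheme. Unlike a settings value, each flag is read at call time, so it can change while the site is running.

```python
class FeatureFlags:
    """Runtime on/off switches; a dict stands in for Redis keys such as 'flag:search'."""

    def __init__(self, store=None, defaults=None):
        self.store = store if store is not None else {}
        self.defaults = defaults or {}

    def is_enabled(self, name):
        value = self.store.get(f"flag:{name}")
        if value is None:
            return self.defaults.get(name, True)  # unknown flags default to on
        return value == "on"

    def set(self, name, enabled):
        self.store[f"flag:{name}"] = "on" if enabled else "off"


flags = FeatureFlags()


def search_view(query):
    # Only search checks this flag, so it can be switched off (e.g. during a
    # Solr update) while the rest of the site keeps working.
    if not flags.is_enabled("search"):
        return "Search is temporarily unavailable."
    return f"results for {query!r}"  # the real search backend call would go here
```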

One lesson we keep on learning in Django

We went from one database to multiple databases, from one cache to multiple caches, and from one Haystack backend to multiple backends.

DEBUG is one single setting that affects a lot of things.

The TIME_ZONE setting also affects Apache's log files.

The middleware concept is very powerful, but middleware is executed on every single request: if it should only apply in some cases, the conditional has to live inside the middleware itself.
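To illustrate the point about conditionals, here is a hypothetical middleware in Django's newer callable style that only does extra work for URLs under given prefixes. Django still calls it for every request, so the cheap prefix check runs first and everything else passes straight through. The class name, prefixes and threshold are made up for the example.

```python
import time


class SlowApiRequestMiddleware:
    """Hypothetical Django-style middleware that only does real work for API URLs.

    Django calls every middleware on every request, so the "does this apply?"
    check has to happen here, and it should be as cheap as possible.
    """

    def __init__(self, get_response, prefixes=("/api/",), threshold=0.5):
        self.get_response = get_response
        self.prefixes = prefixes
        self.threshold = threshold  # seconds
        self.slow = []  # (path, seconds) pairs; a real version would log these

    def __call__(self, request):
        if not request.path.startswith(self.prefixes):
            return self.get_response(request)  # cheap pass-through
        start = time.perf_counter()
        response = self.get_response(request)
        elapsed = time.perf_counter() - start
        if elapsed >= self.threshold:
            self.slow.append((request.path, elapsed))
        return response
```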

Really, global settings should be flushed out of the project! They are evil because they cannot be changed at runtime.
