About this post

ABOUT: This entry was posted April 30, 2007 at 7:33 p.m. It is 995 words long, which, in case you're curious, translates to about 28 inches. There are currently 4 comments on this post.

SUMMARY: In which I plead with you to learn from our deployment mistakes.

TAGS: Django


Django deployment lessons learned

Posted Monday, April 30th, 2007 at 7:33 p.m.

I had some time in the airport today to reflect on a couple of Django projects we launched last week, both of which taught us some tough lessons. I can't speak for Brian, with whom I developed and launched these suckers, but I didn't have much site deployment experience before this past month or so. And boy, did I get a crash course.

First, a little about the sites. Themaneater.com is the first Django site Brian and I developed (on version 0.91, even). We did the whole thing, including the design (which, incidentally, explains why neither of us is a designer). Most of the coding was finished about a year ago, but we kept running into deployment issues, mainly absurdly long front-page load times and the occasional crash, so we kept it under wraps. With help from the talented Mr. Tigas -- the 'eater's current Web editor -- we finally got it figured out.

The Columbia Missourian was another matter, and in a lot of ways a much bigger mess. When we launched Vox back in November, things went off without a hitch -- surprising, because we did almost no performance tuning, largely because we didn't know how. When we tried to launch the Missourian the same way a couple of weeks ago, it crashed like Evel Knievel.

Which returns us to the subject at hand: Django deployment lessons. The short version is that Jacob, Adrian and the Django book are right. The long version is below:

Learn from our mistakes

1. Be mindful of your queries and indexes

Headlining the "no duh" category is the idea that database efficiency is important. Slow queries and poor indexing can blow your site to smithereens. I know because it happened to us.

Fortunately, Django's database API generates efficient SQL. What it doesn't save you from is firing off redundant queries, or evaluating an entire queryset when you only need a small bit of information from it. When you're done writing your code, reread it and cut as many queries as you can. And after that, BE ABSOLUTELY SURE to set up indexes on the fields you hit frequently. In our case, that meant mostly slug and date fields, although there were others. I believe MySQL indexes are stored as B-trees, which in geek terms means a lookup takes O(log n) time vs. the O(n) time of scanning a non-indexed table (someone please correct me if I'm wrong). On a big table, that's orders of magnitude faster and well worth the switch.
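To see what an index changes, you can ask the database how it plans to run a query. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere (MySQL's own tool for this is EXPLAIN); the story table and the story_slug_idx index are hypothetical. It compares SQLite's plan for a slug lookup before and after the index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable description.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical table loosely modeled on a news story.
conn.execute(
    "CREATE TABLE story (id INTEGER PRIMARY KEY, slug TEXT, pub_date TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO story (slug, pub_date, body) VALUES (?, ?, ?)",
    [(f"story-{i}", f"2007-04-{i % 28 + 1:02d}", "...") for i in range(10_000)],
)

lookup = "SELECT * FROM story WHERE slug = ?"
before = query_plan(conn, lookup, ("story-42",))

# Index the field the site filters on most often.
conn.execute("CREATE INDEX story_slug_idx ON story (slug)")
after = query_plan(conn, lookup, ("story-42",))

print("before:", before)  # a full-table SCAN
print("after: ", after)   # a SEARCH using story_slug_idx
```

Before the index, SQLite reports a full SCAN of the table; afterward it reports a SEARCH using story_slug_idx. In a Django model, passing db_index=True to a field asks Django to create this kind of index for you.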

Optimizing your queries and indexes seems pretty obvious, but we simply overlooked it at first. Don't do the same unless you enjoy headaches.
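Redundant queries are easiest to spot by counting them. This sketch again uses sqlite3 instead of MySQL, with hypothetical story and section tables, and records every statement through the connection's trace callback. It builds the same front-page list two ways: one extra query per story to look up its section, and a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE section (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE story (id INTEGER PRIMARY KEY, headline TEXT,
                        section_id INTEGER REFERENCES section (id));
""")
conn.executemany("INSERT INTO section (id, name) VALUES (?, ?)",
                 [(1, "News"), (2, "Sports"), (3, "Opinion")])
conn.executemany("INSERT INTO story (headline, section_id) VALUES (?, ?)",
                 [(f"Headline {i}", i % 3 + 1) for i in range(20)])

executed = []  # every SQL statement the connection runs is appended here
conn.set_trace_callback(executed.append)


def front_page_redundant(conn):
    """One query for the stories, then one more per story for its section."""
    stories = conn.execute("SELECT headline, section_id FROM story").fetchall()
    return [
        (headline,
         conn.execute("SELECT name FROM section WHERE id = ?", (sid,)).fetchone()[0])
        for headline, sid in stories
    ]


def front_page_joined(conn):
    """The same data in a single query."""
    return conn.execute(
        "SELECT story.headline, section.name FROM story "
        "JOIN section ON section.id = story.section_id"
    ).fetchall()


executed.clear()
slow = front_page_redundant(conn)
redundant_count = len(executed)

executed.clear()
fast = front_page_joined(conn)
joined_count = len(executed)

print(redundant_count, joined_count)  # 21 1
```

The per-story version issues 21 queries for 20 stories; the joined version issues one. In Django, the same trap shows up when a template follows a ForeignKey on every object in a list, and calling select_related() on the queryset is the usual way to fold those lookups into a single JOIN.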

2. Separate static from dynamic content

I admit I didn't think this would make a difference, but I was dead wrong. As the Django docs say, don't use the same Apache instance to serve both static and dynamic content. Reserve Apache for Django and use a server like lighttpd to handle your media files. In an ideal world, you'd have a separate machine for that stuff. In the real world, you can run them as separate processes on the same box.
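On the Django side, the separation mostly comes down to pointing your media URLs at the other server. A minimal settings fragment, assuming lighttpd serves the media directory under its own hostname; the path and hostname below are hypothetical:

```python
# settings.py (fragment). Hypothetical values: lighttpd is assumed to serve
# MEDIA_ROOT from its own hostname, so Apache never sees these requests.
MEDIA_ROOT = "/var/www/example-site/media/"  # where the files live on disk
MEDIA_URL = "http://media.example.com/"      # where browsers fetch them from
```

The web-server side (lighttpd's document root, Apache's virtual host for Django) is configured outside Django and isn't shown here.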

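Apache can have a nasty habit of sucking up memory as it serves static files: an Apache process that runs Django is carrying a Python interpreter and your whole project, so tying one up to send out a JPEG wastes a lot of RAM. By comparison, lighttpd has a much smaller footprint, especially if you strip it down to its bare essentials. Moving media serving to a different server also lets you turn off Apache's KeepAlive directive at minimal cost, since Apache is then answering only one request per page view and has little reason to reuse a connection. KeepAlive, which allows multiple requests to be sent over the same TCP connection, can hog RAM like nothing else, because each kept-alive connection ties up an Apache process (or thread) until KeepAliveTimeout expires, even when the browser has nothing more to ask for. We saw huge performance gains by shutting it off.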
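Why KeepAlive is so expensive becomes clearer with a little arithmetic. Little's law says the number of items in a system equals their arrival rate times how long each one stays. The sketch below applies it to Apache children, assuming media is already served elsewhere so each page view is a single Apache request, and assuming the worst case where a kept-alive child waits out the full timeout. The traffic, timing, and memory figures are made up for illustration.

```python
import math


def busy_children(page_views_per_sec, seconds_per_request, keepalive_timeout=0.0):
    """Estimate how many Apache children are occupied at once (Little's law: L = lambda * W).

    Assumes each page view costs Apache exactly one request on one connection.
    With KeepAlive on, the child then idles for the full keepalive_timeout
    waiting for a follow-up request (worst case).
    """
    seconds_held = seconds_per_request + keepalive_timeout
    return math.ceil(page_views_per_sec * seconds_held)


# All of these numbers are hypothetical.
PAGE_VIEWS_PER_SEC = 20
SECONDS_PER_REQUEST = 0.25
KEEPALIVE_TIMEOUT = 15
MB_PER_CHILD = 30

without_keepalive = busy_children(PAGE_VIEWS_PER_SEC, SECONDS_PER_REQUEST)
with_keepalive = busy_children(PAGE_VIEWS_PER_SEC, SECONDS_PER_REQUEST, KEEPALIVE_TIMEOUT)

print(without_keepalive, without_keepalive * MB_PER_CHILD)  # 5 150
print(with_keepalive, with_keepalive * MB_PER_CHILD)        # 305 9150
```

In practice MaxClients caps the number of children, so instead of eating 9 GB of RAM the extra visitors wait in line, which looks a lot like a site that has stopped responding.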

3. Hit the database as little as possible

If database queries create significant overhead, it makes sense to execute them as infrequently as possible. And the best way to ensure that is to implement caching.

Django has an excellent caching framework that truly is, as the docs say, "as granular as you need". For the Missourian, we cached a lot of our big-time front-page queries and cut our average load time dramatically. The docs recommend Memcached, which they describe as the fastest cache backend Django supports. We used file-system caching because it fit our limited hardware setup. Even though it's one of the slower backends available, it still made a world of difference.
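The caching pattern itself is simple: check the cache, and only hit the database on a miss. In Django you'd call cache.get() and cache.set() from django.core.cache with the file-system backend selected in your settings. The toy FileCache class below is not Django's implementation, just a self-contained sketch of the same idea (one pickled file per key, with an expiry time); front_page_stories is a hypothetical stand-in for an expensive query.

```python
import hashlib
import os
import pickle
import tempfile
import time


class FileCache:
    """Toy file-system cache: one pickle file per key holding (expiry, value).

    Illustrative only; this is not Django's file-based backend.
    """

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string becomes a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest)

    def get(self, key, default=None):
        path = self._path(key)
        try:
            with open(path, "rb") as f:
                expires, value = pickle.load(f)
        except FileNotFoundError:
            return default
        if time.time() >= expires:
            os.remove(path)  # stale entry: delete it and report a miss
            return default
        return value

    def set(self, key, value, timeout=300):
        with open(self._path(key), "wb") as f:
            pickle.dump((time.time() + timeout, value), f)


def cached(cache, key, compute, timeout=300):
    """Cache-aside: return the cached value, computing and storing it on a miss.

    None is treated as a miss, so a compute() that returns None is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def front_page_stories():
    """Hypothetical stand-in for the expensive front-page queries."""
    calls.append(1)
    return ["Headline A", "Headline B"]


cache = FileCache(tempfile.mkdtemp())
first = cached(cache, "front_page", front_page_stories, timeout=60)
second = cached(cache, "front_page", front_page_stories, timeout=60)
print(len(calls))  # 1: the second call was answered from disk
```

Because None doubles as the miss signal, a result of None is never cached; Django's cache.get() has the same ambiguity unless you pass it a distinct default.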

4. Know your hardware

Naively, I had never seriously considered hardware constraints in development before these Missourian/Maneater nightmares. Pay very close attention to the hardware you're using and know how best to take advantage of it. Themaneater.com, for example, is hosted on a Virtual Private Server without a lot of RAM, so we had to cut RAM-intensive processes in favor of lighter alternatives. The talented Mr. Tigas, on a lark, decided to switch our Apache MPM from prefork, which runs a separate single-threaded process for every simultaneous request and so uses a ton of RAM, to worker, which handles many requests per process with threads. Worker uses less RAM, but everything running inside Apache, Django and your own code included, has to be thread-safe. Combined with the other changes, the performance gains we saw were enough to green-light the launch.
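A back-of-the-envelope comparison shows why the MPM switch matters on a small VPS. Under prefork, every simultaneous request needs its own child process with its own copy of everything; under worker, the threads in one process share that copy. Every figure below, including the per-thread cost, is a hypothetical assumption, not a measurement from our servers.

```python
def prefork_max_clients(ram_mb, mb_per_child):
    """prefork: one single-threaded child per simultaneous request."""
    return ram_mb // mb_per_child


def worker_max_clients(ram_mb, mb_per_process, mb_per_thread, threads_per_child):
    """worker: each child serves threads_per_child requests; its threads share
    one copy of Python and the application code and add only a small per-thread cost."""
    mb_per_child = mb_per_process + threads_per_child * mb_per_thread
    return (ram_mb // mb_per_child) * threads_per_child


# Hypothetical numbers for a small VPS.
RAM_FOR_APACHE_MB = 256
prefork_clients = prefork_max_clients(RAM_FOR_APACHE_MB, mb_per_child=32)
worker_clients = worker_max_clients(RAM_FOR_APACHE_MB, mb_per_process=40,
                                    mb_per_thread=1, threads_per_child=25)

print(prefork_clients)  # 8
print(worker_clients)   # 75
```

The catch is thread safety: with worker, every library loaded into Apache has to cope with several requests running at once in the same process.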

On the Missourian side, we're running the site on a production machine that already hosts several other sites. RAM use is high, which is one of the reasons we used file-system caching rather than storing the cache in memory. It's also a big reason we opted to serve media with lighttpd and to turn off KeepAlive.

More to come

All in all, I'm pretty happy with how things turned out, especially considering that we don't have a single full-time professional on our dev team. We still have a LOT of cleanup to finish, and we held back on launching some pretty cool features that we'll instead roll out over the summer. But the way we've started structuring data (babies/marriages/obits, crime, etc.) and the beginnings of our community interaction features are, at least to me, the essence of small-town Web-driven journalism.

Just two more sites to go ...

Comments

  1. Michael Trier 4:04 p.m. on June 2

    Awesome info. Thanks.

  2. Tom White 12:48 p.m. on June 4

    Great post.

    Question: When you switched your Apache MPM from “prefork” to “worker”, did you have to change any other part of your Django environment? Did it affect any other virtual domains or apps that you are running? Or was it a straight switch without any other changes?

    Thanks.

  3. Dipankar Sarkar 1:19 p.m. on October 23

    Maybe you should try using nginx instead of apache2… it's easy on the memory…

  4. Jeremy Dunck 8:32 a.m. on February 12

    FWIW, the Django book is out, with a (somewhat) updated chapter on deployment:
    http://www.djangobook.com/en/1.0/chapter20/

    Also, Simon Willison is running a pretty popular site on a Linux VPS (so memory-constrained). He’s using nginx as a reverse proxy to wsgi (CherryPy? I dunno, some WSGI daemon.)
