About this post
ABOUT: This entry was posted April 30, 2007 at 7:33 p.m. It is 995 words long, which, in case you're curious, translates to about 28 inches.
SUMMARY: In which I plead with you to learn from our deployment mistakes.
TAGS: Django
Django deployment lessons learned
I had some time in the airport today to reflect on a couple of Django projects we launched last week -- both of which taught us some tough lessons. I can't speak for Brian, with whom I developed and launched these suckers, but I didn't have a lot of site deployment experience before this past month or so. And boy did I get a crash course.
First a little about the sites. Themaneater.com is the first Django site Brian and I developed (on version 0.91, even). We did the whole thing, including the design (which incidentally explains why neither of us is a designer). Most of the coding was finished about a year ago, but we kept running into deployment issues, mainly absurdly long front-page load times and the occasional crash, so we kept it under wraps. With help from the talented Mr. Tigas -- the 'eater's current Web editor -- we finally got it figured out.
The Columbia Missourian was another matter, and in a lot of ways a much bigger mess. When we launched Vox back in November, things went off without a hitch -- surprising because we did almost no performance tuning, largely because we didn't know how. When we tried to launch the Missourian the same way a couple weeks ago, it crashed like Evel Knievel.
Which returns us to the subject at hand: Django deployment lessons. The short version is that Jacob, Adrian and the Django book are right. The long version is below:
Learn from our mistakes
1. Be mindful of your queries and indexes
Headlining the "no duh" category is the idea that database efficiency is important. Slow queries and poor indexing can blow your site to smithereens. I know because it happened to us.
Fortunately, Django's database API smartly creates efficient SQL. What it doesn't save you from is launching redundant queries, or evaluating entire querysets for small bits of information. When you're done writing your code, reread it and cut as many queries as you can. And after that, BE ABSOLUTELY SURE to set up indexes on the fields you hit frequently. In our case, that meant mostly slug and date fields, although there were others. I believe MySQL indexes are handled via B-trees, which in geek terms can execute searches in O(log(n)) time vs. the O(n) time of a non-indexed search (someone please correct me if I'm wrong). On a table with a million rows, that's roughly the difference between a couple dozen comparisons and a million of them, and the gap only widens as the table grows. It's well worth the effort.
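To see what an index changes, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (we did not run SQLite in production). The `story` table, its columns, and the index name are made up for illustration. It asks the database how it plans to run the same slug lookup before and after the index exists.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE story ("
    "id INTEGER PRIMARY KEY, slug TEXT, pub_date TEXT, headline TEXT)"
)
conn.executemany(
    "INSERT INTO story (slug, pub_date, headline) VALUES (?, ?, ?)",
    [("story-%d" % i, "2007-04-30", "Headline %d" % i) for i in range(10000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT headline FROM story WHERE slug = ?"

# Without an index, the only option is to walk the whole table.
plan_without_index = query_plan(conn, lookup, ("story-9999",))

# Index the column we filter on, then ask for the plan again.
conn.execute("CREATE INDEX idx_story_slug ON story (slug)")
plan_with_index = query_plan(conn, lookup, ("story-9999",))

print(plan_without_index)  # describes a scan of the story table
print(plan_with_index)     # describes a search using idx_story_slug
```

The exact wording of the plan varies between SQLite versions, and MySQL has its own `EXPLAIN`, but the idea carries over: check the plan for your hottest queries. In a Django model, passing `db_index=True` to a field asks Django to create the index when it creates the table.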
Optimizing your queries and indexes seems pretty obvious, but we simply overlooked it at first. Don't do the same unless you enjoy headaches.
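On the query side, the classic waste is pulling back whole rows when all you wanted was a number. The sketch below shows the difference in plain SQL through Python's sqlite3 module. The `comment` table and both function names are hypothetical, and this is not our actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (id INTEGER PRIMARY KEY, story_id INTEGER, body TEXT)"
)
# 5,000 comments spread evenly over 50 stories, each with a 500-byte body.
conn.executemany(
    "INSERT INTO comment (story_id, body) VALUES (?, ?)",
    [(i % 50, "x" * 500) for i in range(5000)],
)


def count_by_fetching(connection, story_id):
    """Wasteful: pulls every matching row into Python just to count them."""
    rows = connection.execute(
        "SELECT id, story_id, body FROM comment WHERE story_id = ?",
        (story_id,),
    ).fetchall()
    return len(rows)


def count_in_database(connection, story_id):
    """Leaner: the database counts and sends back a single number."""
    row = connection.execute(
        "SELECT COUNT(*) FROM comment WHERE story_id = ?", (story_id,)
    ).fetchone()
    return row[0]


# Same answer either way; only the amount of data shipped around differs.
print(count_by_fetching(conn, 7), count_in_database(conn, 7))
```

The Django equivalent is roughly the difference between `len(queryset)`, which evaluates the whole queryset, and `queryset.count()`, which lets the database do the counting.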
2. Separate static from dynamic content
I admit I didn't think this would make a difference, but I was dead wrong. As the Django docs say, don't use the same Apache instance to serve both static and dynamic content. Save Apache for Django and use a server like lighttpd to handle your media files. In an ideal world, you'd have a separate machine for that stuff. In the real world, you can run them as separate processes on the same box.
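On the Django side, the split mostly comes down to where your media URLs point. Here is a sketch of the relevant settings. `MEDIA_ROOT` and `MEDIA_URL` are real Django settings, but the path, hostname, and port are placeholders I made up, not our actual configuration.

```python
# settings.py fragment (sketch). The path, hostname, and port below are
# made-up placeholders, not values from either site.

# Filesystem directory where media files live. The media server's document
# root would point at this same directory.
MEDIA_ROOT = "/var/www/example/media/"

# Public URL prefix for those files. Pointing it at a different host or port
# means browsers fetch media from lighttpd and never touch Apache for it.
MEDIA_URL = "http://media.example.com:8080/"
```

This only helps if your templates and models build media links from `MEDIA_URL` rather than hard-coding paths that still land on Apache.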
Apache can have a nasty habit of sucking up memory as it serves static files. By comparison, lighttpd has a much smaller footprint, especially if you strip it down to its bare essentials. By transferring media serving duties to a different server, you also have the benefit of being able to turn off Apache's KeepAlive directive with minimal cost. KeepAlive, which allows multiple requests to be sent over the same TCP connection, can hog RAM like nothing else, because every idle connection keeps an Apache process parked and waiting for a request that may never come. We saw huge performance gains by shutting it off.
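For a feel of why KeepAlive hurts on a small box, here is a back-of-the-envelope sketch. It assumes the prefork model (one process per connection) and that every client holds its connection open for the full timeout, so treat the result as a rough upper bound. The function name and every input number are hypothetical, not measurements from our servers.

```python
def idle_keepalive_ram_mb(new_connections_per_second, keepalive_timeout_seconds,
                          ram_per_process_mb, max_processes):
    """Rough upper bound on RAM held by processes idling in keep-alive.

    Assumes one process per connection and that every client keeps its
    connection open until the timeout expires.
    """
    # Connections arriving per second, each lingering for the timeout,
    # gives the number of processes sitting idle at any moment.
    idle_processes = new_connections_per_second * keepalive_timeout_seconds
    # Apache never runs more processes than its configured ceiling.
    idle_processes = min(idle_processes, max_processes)
    return idle_processes * ram_per_process_mb


# Hypothetical inputs, not measurements from either site.
estimate = idle_keepalive_ram_mb(
    new_connections_per_second=5,
    keepalive_timeout_seconds=15,
    ram_per_process_mb=20,
    max_processes=150,
)
print(estimate)  # 5 * 15 = 75 idle processes, times 20 MB each = 1500
```

Even with modest traffic, idle processes add up fast. With KeepAlive off, a process is freed as soon as it finishes a response.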
3. Hit the database as little as possible
If database queries create significant overhead, it would make sense to execute them as infrequently as possible. And the best way to ensure that is to implement caching.
Django has an excellent caching framework that truly is, as the docs say, "as granular as you need". For the Missourian, we cached a lot of our big-time front-page queries and cut our average load time dramatically. The docs recommend Memcached, which they describe as the fastest cache backend available to Django. We used file system caching because it worked for our purposes inside our limited hardware setup. Even though it's one of the slower options, since every lookup goes to disk instead of RAM, it still made a world of difference.
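The pattern behind caching a query is simple: look in the cache first, and only hit the database on a miss. The sketch below illustrates it with a tiny file-backed cache written from scratch using the standard library. `FileCache`, `front_page_stories`, and the five-minute timeout are all hypothetical stand-ins. This is not Django's implementation, and in a real project you would call `cache.get()` and `cache.set()` from `django.core.cache` instead.

```python
import hashlib
import os
import pickle
import tempfile
import time


class FileCache:
    """Tiny file-backed cache: one pickle file per key, with an expiry time."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".cache")

    def get(self, key, default=None):
        try:
            with open(self._path(key), "rb") as fh:
                expires_at, value = pickle.load(fh)
        except (OSError, EOFError, pickle.PickleError):
            return default  # missing or unreadable file counts as a miss
        if time.time() >= expires_at:
            return default  # entry is stale
        return value

    def set(self, key, value, timeout):
        with open(self._path(key), "wb") as fh:
            pickle.dump((time.time() + timeout, value), fh)


def front_page_stories(cache, run_query):
    """Return cached stories, running the expensive query only on a miss."""
    stories = cache.get("front_page_stories")
    if stories is None:
        stories = run_query()
        cache.set("front_page_stories", stories, timeout=300)
    return stories


# Usage: a fake query that records how often it actually runs.
query_calls = []


def run_front_page_query():
    query_calls.append(1)
    return ["story-a", "story-b"]


cache = FileCache(tempfile.mkdtemp())
first = front_page_stories(cache, run_front_page_query)
second = front_page_stories(cache, run_front_page_query)
print(len(query_calls))  # 1: the second call was served from disk
```

The trade-off is freshness: with a five-minute timeout, a new story can take up to five minutes to show up on the front page.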
4. Know your hardware
Naively, I had never seriously considered hardware constraints in development before these Missourian/Maneater nightmares. Pay very close attention to the hardware you're using and know how best to take advantage of it. Themaneater.com, for example, is hosted on a Virtual Private Server without a lot of RAM, so we had to chop out RAM-intensive processes in favor of alternatives elsewhere. The talented Mr. Tigas, on a lark, decided to switch our Apache MPM from prefork, which uses a ton of RAM, to worker, which uses less RAM but relies on a thread-safe stack. Combined with the other changes, the performance gains we saw were enough to green-light the launch.
On the Missourian side, we're running the site on a production machine that already hosts several other sites. RAM use is high, which is one of the reasons we used filesystem caching rather than storing the cache in memory. It's also a big reason we opted for lighttpd media serving and the killing of the KeepAlive directive.
More to come
All in all, I'm pretty happy with how things turned out, especially considering that we don't have a single full-time professional on our dev team. We still have a LOT of cleanup to finish, and we held back on launching some pretty cool features that we'll instead roll out over the summer. But the way we've started structuring data (babies/marriages/obits, crime, etc.) and the beginnings of our community interaction features are, at least to me, the essence of small-town Web-driven journalism.
Just two more sites to go ...
