About this post

ABOUT: This entry was posted May 5, 2007 at 1:53 a.m. It is 775 words long, which, in case you're curious, translates to about 22 inches.

SUMMARY: Backing up your databases is easy with S3, boto and mysqldump.

TAGS: Databases


Simple database backups with S3

Posted Saturday, May 5th, 2007 at 1:53 a.m.

Ever since I mistakenly deleted all the comments from my blog last year, I've been looking for a cheap but reliable way to ensure my databases are backed up somewhere safe. Today I finally decided to stop procrastinating and do something about it.

Using Boto, a juiced-up set of Python bindings for Amazon S3, I took an hour and rolled a small backup utility, both for myself and for some data at work. It wasn't rocket science, so I also decided to hook it up to Django. Now I've got an automated process that backs up my blog for an annual cost no higher than a Snickers bar. Not a bad deal for reliable redundancy.

Here's how I did it:

S3, Boto and mysqldump

Unless you've been on the moon this last year or so, you've heard of Amazon's S3: a dirt-cheap, infinitely scalable storage system set up on Amazon's servers. You only pay for what you use, to the tune of about $0.20 per gigabyte of storage and a little less for transfer. The servers are reliable, and your risk of data loss is practically zero.
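To put the pricing in perspective, here is a back-of-the-envelope calculation. It assumes the roughly $0.20 rate quoted above is charged per gigabyte per month, ignores transfer fees, and uses a made-up 100 MB of stored backups; none of those numbers are measurements.

```python
def annual_storage_cost(size_gb, rate_per_gb_month=0.20):
    """Rough yearly storage bill, in dollars.

    Assumes the rate is billed per gigabyte per month and that the
    amount stored stays constant all year. Transfer fees are ignored.
    """
    return size_gb * rate_per_gb_month * 12


# Hypothetical example: 100 MB (0.1 GB) of backups kept for a year.
cost = annual_storage_cost(0.1)
print("$%.2f per year" % cost)  # prints "$0.24 per year"
```

Even at ten times that size, the yearly bill stays in pocket-change territory.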

In S3 terms, your data is stored in buckets, which can hold pretty much anything you want, carved into 5GB-or-less chunks. Each bucket is filled with key-value pairs, the keys being shorthand names for your files (or strings, or whatever else) and the values being the files themselves.

S3 comes with a set of lightweight Python bindings, but because I was strapped for time, I turned to the heavier but more user-friendly Boto bindings instead. Boto is a better-developed Python interface to S3 that allows you to store and retrieve data with just a couple of lines of code. It handles ugly stuff like mimetype guessing for you, so you only have to worry about shipping files to and from the servers.
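The bucket-and-key model described above can be sketched in a few lines. This is a minimal illustration, not code from the original post: the helper names store_string() and fetch_string() are made up, and `conn` is assumed to be a boto S3Connection, so the block only touches S3 when you hand it a real connection.

```python
# Sketch only: `conn` is expected to be a boto S3Connection (or any object
# exposing the same create_bucket / get_bucket methods).
# With boto installed, a connection is created roughly like this:
#   from boto.s3.connection import S3Connection
#   conn = S3Connection("<access key id>", "<secret access key>")


def store_string(conn, bucket_name, key_name, data):
    """Save `data` in the given bucket under `key_name`."""
    # Bucket is assumed to be new, or one you already own.
    bucket = conn.create_bucket(bucket_name)
    key = bucket.new_key(key_name)       # the "key" half of the pair
    key.set_contents_from_string(data)   # the "value" half
    return key


def fetch_string(conn, bucket_name, key_name):
    """Return the stored value, or None if the key does not exist."""
    bucket = conn.get_bucket(bucket_name)
    key = bucket.get_key(key_name)       # None when the key is missing
    if key is None:
        return None
    return key.get_contents_as_string()
```

Bucket names are shared across all S3 users, so a real bucket name would need to be something nobody else has claimed.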

In my case, the file I wanted to send was the output of the mysqldump utility -- a built-in MySQL tool that dumps your data into .sql files for backup purposes. The code I used looks something like this:

I turned it into a class because I intend to expand it, ideally to automate Subversion and other backups. And hell -- it's just easier to use this way. The meat of the code is in the backup_sql() method, which outputs the mysqldump to a string, zips it and sends it to S3. The get_sql() method pulls the dump down into a .zip archive of your choice.
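As a rough sketch of the approach just described (not the original listing), a class along these lines would do the job. The class name S3Backup, the constructor arguments, and the dump_command() and run_dump() helpers are all hypothetical; only backup_sql() and get_sql() are names from the post. The sketch compresses with Python's bz2 module rather than building a .zip archive, and it expects a boto bucket object to be passed in.

```python
import bz2
import subprocess
import time


class S3Backup(object):
    """Dump a MySQL database and park the compressed dump in an S3 bucket."""

    def __init__(self, bucket, db_name, db_user, db_password=""):
        self.bucket = bucket  # a boto bucket, e.g. from conn.create_bucket()
        self.db_name = db_name
        self.db_user = db_user
        self.db_password = db_password

    def dump_command(self):
        """Build the mysqldump command line as a list of arguments."""
        cmd = ["mysqldump", "--user=%s" % self.db_user]
        if self.db_password:
            # Note: a password given this way is visible in the process list.
            cmd.append("--password=%s" % self.db_password)
        cmd.append(self.db_name)
        return cmd

    def run_dump(self):
        """Run mysqldump and return its output as bytes."""
        return subprocess.check_output(self.dump_command())

    def backup_sql(self, key_name=None):
        """Dump the database, compress it and upload it. Returns the key name."""
        if key_name is None:
            # Default: one key per day, e.g. "blog-20070505.sql.bz2".
            key_name = "%s-%s.sql.bz2" % (self.db_name, time.strftime("%Y%m%d"))
        compressed = bz2.compress(self.run_dump())
        key = self.bucket.new_key(key_name)
        key.set_contents_from_string(compressed)
        return key_name

    def get_sql(self, key_name, filename):
        """Download a stored dump into a local file."""
        key = self.bucket.get_key(key_name)
        if key is None:
            raise KeyError("no backup stored under %r" % key_name)
        key.get_contents_to_filename(filename)
```

Keeping the mysqldump call in its own method makes the upload logic easy to exercise without a live database or an S3 account.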

Hooking into Django looks something like this:

As you can see, a simple import of the settings file gives you everything you need to back up your Django DBs. I added two new entries to settings.py as well: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, both of which you're assigned when you sign up for the S3 service. Now the script simply pulls its settings from my blog's settings.py and does its thing once a week, as directed by a cron job. That's it. No real magic involved.
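A minimal sketch of that hookup, under a few assumptions: the helper name backup_settings() is hypothetical, and DATABASE_NAME, DATABASE_USER and DATABASE_PASSWORD are the single-database setting names Django used at the time (newer releases keep them in a DATABASES dictionary instead). The two AWS entries are the ones described above.

```python
def backup_settings(settings):
    """Collect the values a backup script needs from a Django settings object."""
    return {
        "db_name": settings.DATABASE_NAME,
        "db_user": settings.DATABASE_USER,
        # Treat a missing password setting as an empty password.
        "db_password": getattr(settings, "DATABASE_PASSWORD", ""),
        "aws_access_key_id": settings.AWS_ACCESS_KEY_ID,
        "aws_secret_access_key": settings.AWS_SECRET_ACCESS_KEY,
    }


# In a real script the argument would be the project's settings module, e.g.
#   import settings          # hypothetical: run from the project directory
#   config = backup_settings(settings)
```

A weekly cron entry such as `0 3 * * 0 python /path/to/backup.py` (schedule and path made up for illustration) would then run the script every Sunday at 3 a.m.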

The CAR (computer-assisted reporting) implications are small but significant. RAID isn't failure-proof. If you maintain your own servers, IT might not back them up. S3 is a way to cover yourself on the cheap. Just don't upload anything you don't want subpoenaed.

Feel free to download my code and take it for a spin. You'll have to install Boto first.

Update: I changed the code samples above to reflect Adrian's suggestion.

Update2: I changed the code again to pipe the mysqldump output through bzip2, for brevity.
