The Django ORM is really great. It allows you to design and code your web application in a very simple and readable way.
Some projects have special needs, or are very demanding on the DB, and in that case the ORM can become a bottleneck. Fortunately, Django has different ways to avoid getting in your way.
Inserting thousands of entries in the DB
After the Overmind 0.1 release, I implemented caching of provider images so that the node creation form loads instantly. Once that was in place, though, a problem quickly became apparent: Amazon’s EC2 has nearly 3000 AMIs, and saving all of them through the Django ORM is extremely slow. It takes 10 minutes!
Overmind will eventually get a work queue, so the user won’t have to wait after creating a new cloud provider, but 10 minutes is still way beyond reasonable. Then again, I really don’t want to start writing SQL at this early stage of Overmind development, as we all know that “Premature optimization is the root of all evil”.
Fortunately, Django has a nifty and often overlooked feature: transactions.
When you look for a solution to bulk inserts with Django, most answers recommend raw SQL statements. When someone proposes transactions instead, the usual reply is that they are not real bulk inserts and won’t bring a big performance improvement. Let us put that to the test.
The test consists of creating an EC2 provider in Overmind and measuring how long the action takes to complete. In the process, the nearly 3000 EC2 images are imported into the local Overmind SQLite DB.
The simple Image model:
    class Image(models.Model):
        '''OS image model'''
        image_id = models.CharField(max_length=20)
        name = models.CharField(max_length=30)
        provider = models.ForeignKey(Provider)
        favorite = models.BooleanField(default=False)

        def __unicode__(self):
            return self.name

        class Meta:
            unique_together = ('provider', 'image_id')
The test was run 3 times with transactions and 3 times without. The only change between the two setups was adding a decorator to the Provider.import_images() method:
    ...
    @transaction.commit_on_success()
    def import_images(self):
        ...
| Setup | Time to import |
|---|---|
| Without transactions | 600 seconds (10 minutes) |
| With transactions | 16 seconds |
The results are outstanding. This is not a trivial improvement: with transactions, the import takes only 2.7% of the time it needed before (16 seconds instead of 600)!
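To see where the difference comes from, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django. The table layout and the function names (make_db, insert_one_by_one, insert_in_one_transaction, count_images) are made up for illustration and are not Overmind code. Without an explicit transaction every INSERT is committed on its own; with one, all rows share a single commit, which is roughly what the decorator arranges for the whole method.

```python
import sqlite3


def make_db(path=":memory:"):
    """Open a SQLite database and create a small, hypothetical image table."""
    # isolation_level=None disables the module's implicit transaction handling,
    # so every statement commits on its own unless we issue BEGIN ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE image ("
        "id INTEGER PRIMARY KEY, image_id TEXT UNIQUE, name TEXT)"
    )
    return conn


def insert_one_by_one(conn, rows):
    """Each INSERT is committed separately (one transaction per row)."""
    for row in rows:
        conn.execute("INSERT INTO image (image_id, name) VALUES (?, ?)", row)


def insert_in_one_transaction(conn, rows):
    """All INSERTs share one transaction: commit once, or roll everything back."""
    conn.execute("BEGIN")
    try:
        for row in rows:
            conn.execute("INSERT INTO image (image_id, name) VALUES (?, ?)", row)
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def count_images(conn):
    """Return the number of rows currently stored in the image table."""
    return conn.execute("SELECT COUNT(*) FROM image").fetchone()[0]
```

Pass a file path to make_db() instead of the in-memory default if you want to time the two functions yourself, since the per-row commits are most expensive when each one has to reach the disk. In Django 1.6 and later, the transaction.atomic decorator fills the role that commit_on_success plays in this post.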
If you want more performance, there are some custom solutions for real bulk saving, and if all else fails, Django even lets you run custom raw SQL statements. For a lot of projects, though, django.db.transaction should be a beautifully simple and effective solution.
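For comparison, a "real" bulk insert at the SQL level could look roughly like the sketch below. It again uses the standard sqlite3 module rather than Django, and the table and the bulk_insert_images helper are hypothetical names chosen for this example.

```python
import sqlite3


def bulk_insert_images(conn, rows):
    """Insert all rows with a single executemany() call inside one transaction."""
    # Using the connection as a context manager commits on success and
    # rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO image (image_id, name) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image (id INTEGER PRIMARY KEY, image_id TEXT, name TEXT)"
)
# Made-up sample data standing in for the provider's image list.
rows = [("ami-%05d" % i, "image %d" % i) for i in range(3000)]
bulk_insert_images(conn, rows)
total = conn.execute("SELECT COUNT(*) FROM image").fetchone()[0]
```

Inside Django itself, versions 1.4 and later also offer bulk_create() on model managers, which may cover this case without dropping down to raw SQL.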
Note: “Premature optimization is the root of all evil” is a famous quote by Donald Knuth.