Micro-Pattern for abstracting PyMongo

When writing a database-driven application, there is a need to abstract SQL calls away. Why? Isn't SQL syntax simple enough? It is, but when your application is basking in the glory that is Python code ;-), SQL seems like some arcane, clumsy dialect, prone to obvious (SQL injection, anyone?) and not-so-obvious security issues.

With a NoSQL DB like MongoDB, that may not be the case. While it is nice to have Django-like objects for your documents, such as MongoEngine provides, if write/update performance is critical it may not be the best route, as my last experiments showed. But after deciding to use PyMongo directly (a reader pointed out that MongoKit allows the direct use of PyMongo, so check whether that framework suits your needs) and working with it a bit, I felt the need for at least some tiny abstraction layer. Why?
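To give an idea of what such a tiny abstraction layer might look like, here is a minimal sketch: a thin wrapper that keeps the raw MongoDB query dicts in one place instead of scattering them across the codebase. Everything in it (the `ProductStore` name, the EAN key) is hypothetical, and the `collection` argument is simply anything that quacks like a PyMongo `Collection` (`find_one`, `replace_one`), so in production it would be `db.products`.

```python
class ProductStore:
    """Thin wrapper around a MongoDB collection.

    Hypothetical micro-pattern sketch: callers never build query
    dicts themselves; the query vocabulary lives in this one class.
    `collection` is duck-typed, so a stub works for testing.
    """

    def __init__(self, collection):
        self.collection = collection

    def by_ean(self, ean):
        # The query dict is defined here, not at every call site.
        return self.collection.find_one({"ean": ean})

    def upsert(self, product):
        # Insert-or-replace keyed on the EAN, PyMongo-style.
        self.collection.replace_one({"ean": product["ean"]}, product,
                                    upsert=True)


class StubCollection:
    """In-memory stand-in for a PyMongo collection, for testing."""

    def __init__(self):
        self.docs = {}

    def find_one(self, spec):
        return self.docs.get(spec["ean"])

    def replace_one(self, spec, doc, upsert=False):
        self.docs[spec["ean"]] = doc


store = ProductStore(StubCollection())
store.upsert({"ean": "4006381333931", "name": "Widget"})
found = store.by_ean("4006381333931")
```

The point of the pattern is not to hide MongoDB, only to give the raw queries a single home, which is what makes the layer "tiny".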

Continue reading

Benchmarking MongoDB

At work, I have started the implementation of a new data import backend. The current one has worked well, but as the number of products in the tracdelight network grows into the millions, the time needed to update the product list for every advertiser grows with it, and the MySQL DB can no longer keep pace.

The new data import should be able to import or update a million products much faster than the roughly two hours the current backend needs, and it should keep scaling beyond that. MongoDB seems like a good candidate because it should be much faster than SQL. The new import will also be executable as distributed jobs.

Since the new product store backend is being written in Python, what abstraction should we use on top of MongoDB? Working directly with PyMongo is an option, but an Object Document Mapper (the document-database counterpart of SQLAlchemy or the Django ORM) is usually preferable.
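The core service an ODM provides can be sketched in a few lines: mapping between the plain dicts PyMongo stores and proper Python objects. This is only an illustrative toy (the `Document` and `Product` classes are invented here); real ODMs like MongoEngine add validation, references, and a query syntax on top.

```python
class Document:
    """Toy dict<->object mapper, sketching what an ODM does at its core."""

    fields = ()  # subclasses list their field names here

    def __init__(self, **values):
        for name in self.fields:
            setattr(self, name, values.get(name))

    def to_mongo(self):
        # The plain dict that PyMongo would actually store.
        return {name: getattr(self, name) for name in self.fields}

    @classmethod
    def from_mongo(cls, doc):
        # Rebuild an object from a dict fetched via PyMongo.
        return cls(**{name: doc.get(name) for name in cls.fields})


class Product(Document):
    fields = ("ean", "name", "price")


p = Product(ean="4006381333931", name="Widget", price=9.99)
stored = p.to_mongo()
restored = Product.from_mongo(stored)
```

With raw PyMongo you juggle `product["price"]` dicts everywhere; with even this minimal mapping you get `product.price` and one documented place that knows which fields a product has.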

Continue reading