Micro-Pattern for abstracting PyMongo

When writing a database-driven application, there is a need to abstract SQL calls away. Why? Isn’t SQL syntax simple enough? It is, but when your application is basking in the glory that is Python code ;-), SQL seems like some arcane, clumsy dialect prone to obvious (SQL injection, anyone?) and not-so-obvious security issues.

With a NoSQL DB like MongoDB, that may not be the case. While it is nice to have Django-like objects for your documents, such as MongoEngine provides, that may not be the best route if write/update performance is critical, as my last experiments showed. But after deciding to use PyMongo directly (a reader pointed out that MongoKit allows the direct use of PyMongo, so check whether that framework suits your needs), and working with it a bit, I felt the need for at least some tiny abstraction layer. Why?


Interfacing with MongoDB is very easy. Say you want to insert a product into a collection called ‘products’:

from pymongo import Connection

connection = Connection()
db = connection['mydb']

db.products.insert({'article_number': '100', 'name': 'Cool Shoes'})

It is almost too easy. Let’s insert a new product:

db.productts.insert({'article_number': '101', 'name': 'Sunglasses Aviator'})

What’s wrong? A typo causes your new product to be inserted into the wrong collection, named “productts”. No, you have not previously defined any such collection, but it is created on the spot by the first insert, potentially leading to bad, nasty bugs. For this and other reasons, you need to abstract your MongoDB collections, which I do with some light-weight classes:

from pymongo import ASCENDING, Connection

connection = Connection()
db = connection['mydb']


class Product(object):
    # The only place where the collection name is spelled out.
    collection = db.products

    def __init__(self, data):
        self.data = data

    @staticmethod
    def ensure_indexes():
        # A unique index keeps article numbers from being duplicated.
        Product.collection.ensure_index(
            [('article_number', ASCENDING)], unique=True)

    def save(self):
        # Upsert keyed on the article number: replace the matching
        # document, or insert a new one if none exists yet.
        self.collection.update(
            {'article_number': self.data['article_number']},
            self.data, upsert=True, safe=True)


# Create the indexes once, after the class has been defined.
Product.ensure_indexes()

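Now the error described previously can no longer slip by unnoticed, because you always go through your (non-instantiated) class, and a misspelled class or attribute name raises a NameError or AttributeError instead of silently creating a new collection: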

Product.collection.insert(
    {'article_number': '101', 'name': 'Sunglasses Aviator'})

or

Product.collection.find_one({'article_number': '101'})
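The difference between the two kinds of typo can be shown without a running MongoDB server. The following is only a toy sketch: `LazyDatabase` is a hypothetical stand-in that hands out a collection for any attribute name, which is roughly how a database handle behaves from the caller's point of view. It is not PyMongo's implementation, and the plain lists used as collections are an assumption made for illustration.

```python
class LazyDatabase(object):
    """Toy stand-in (hypothetical) for a database handle that hands out
    a collection for whatever attribute name you ask for."""

    def __init__(self):
        self.collections = {}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so every
        # unknown public name, typos included, yields a collection.
        if name.startswith('_'):
            raise AttributeError(name)
        return self.collections.setdefault(name, [])


db = LazyDatabase()
db.products.append({'article_number': '100', 'name': 'Cool Shoes'})
# The typo is accepted silently and a second collection appears.
db.productts.append({'article_number': '101', 'name': 'Sunglasses Aviator'})


class Product(object):
    # The collection name is written exactly once, here.
    collection = db.products


try:
    Product.colection  # misspelled attribute on an ordinary class
except AttributeError as exc:
    typo_error = exc  # the mistake surfaces immediately
```

After running this, `db.collections` holds both `products` and `productts`, while the misspelled `Product.colection` never gets as far as touching the database.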

And you can build on your class as you like to add validation, timestamps, indexes, or any nicety you may need. In my example you can save a new document in this familiar way:

p = Product({'article_number': '101', 'name': 'Sunglasses Aviator'})
p.save()
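As an illustration of how such a class might grow, here is a minimal sketch that adds validation and a timestamp before saving. Everything beyond the original example is an assumption: `ValidationError`, `required_fields`, the `updated_at` field and the `InMemoryCollection` fake are hypothetical names, and the collection is passed in through the constructor (rather than being a class attribute) only so the sketch can run without a server. The fake mimics nothing more than the `update(spec, document, upsert=...)` call used above.

```python
import time


class ValidationError(Exception):
    """Raised when a document lacks a required field (hypothetical)."""


class Product(object):
    required_fields = ('article_number', 'name')

    def __init__(self, collection, data):
        self.collection = collection
        self.data = dict(data)  # copy, so the caller's dict is not mutated

    def validate(self):
        missing = [f for f in self.required_fields if f not in self.data]
        if missing:
            raise ValidationError('missing fields: %s' % ', '.join(missing))

    def save(self):
        self.validate()
        # Unix timestamp of the last write; the field name is an assumption.
        self.data['updated_at'] = time.time()
        self.collection.update(
            {'article_number': self.data['article_number']},
            self.data, upsert=True)


class InMemoryCollection(object):
    """Minimal fake that supports only update() with upsert."""

    def __init__(self):
        self.docs = []

    def update(self, spec, document, upsert=False):
        for i, doc in enumerate(self.docs):
            if all(doc.get(k) == v for k, v in spec.items()):
                self.docs[i] = dict(document)  # replace the first match
                return
        if upsert:
            self.docs.append(dict(document))


products = InMemoryCollection()
Product(products, {'article_number': '101', 'name': 'Sunglasses Aviator'}).save()
# Saving the same article number again replaces the document.
Product(products, {'article_number': '101', 'name': 'Aviator Sunglasses'}).save()
```

With a real collection in place of the fake, the same `save()` call would go through PyMongo unchanged, and a document without an `article_number` is rejected before any write is attempted.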

Of course, if you see that your wrapper classes grow in complexity and/or start to hurt write speed, you are better off using MongoEngine or MongoKit. But I think that this solution strikes a nice middle ground.
