Benchmarking MongoDB

At work, I have started the implementation of a new data import backend. The current one has worked well, but as the number of products in the tracdelight network grows into the millions, the time needed to update the product list for every advertiser grows and the MySQL DB can no longer keep pace with it.

The new data import should be able to import or update a million products much quicker than the current 2 hours needed, and it should be able to keep scaling. MongoDB seems like a good candidate because it should be much faster than SQL. The new import will also be able to be executed as distributed jobs.

The new product store backend being written in Python, what abstraction should we use on top of MongoDB?. Working directly with PyMongo is an option, but usually an Object Document Mapper (like SQLAlchemy or Django ORM) is preferable.

Continue reading