Solving The Software Engineer Shortage

Market research studies continually appear decrying the shortage of engineers that we are suffering, especially in the software development field, even though not everyone agrees. To compound the problem, there also seems to be no way of getting more girls interested in programming.

Even though everybody knows that software is eating the world, programming is still not the number one career kids want to pursue, and who are we to blame them? Compared to a firefighter or an astronaut, how do we explain what we do? Typing and staring at black consoles all day long? Mighty impressive.

Continue reading

Beyond Continuous Integration

This is an excerpt from my original article about the new, improved CI infrastructure at edelight.

Over the past decade, continuous integration (CI) has become a staple of the web software development workflow and release process, so much so that it is nigh on unthinkable for a team to do without it. While the benefits of CI itself are a matter long settled, continuous deployment can be a controversial topic, with some teams reporting great success and others not wanting to touch it with a ten-foot pole.

You can continue reading at


ApacheCon EU Reflections

Two weeks ago I visited ApacheCon Europe together with some of my colleagues from edelight; we were mostly interested in the Search and Big Data tracks.

We had a good time and learnt quite a few things. It was also nice to see some familiar faces from Berlin Buzzwords. Isabel Drost-Fromm was her usual awesome self with her very down-to-earth Apache Mahout talk, and others were there too.

The most relevant talks for us came from companies describing their experiences with hard search-related problems, such as semantic search or building a good color filter, because these are problems we are either trying to solve now or will want to solve in the near future. Not only can you learn good approaches to them; even if you don’t learn anything new, such talks serve as a sanity check that what we are doing makes sense and holds up against the rest of the e-commerce industry. Further, the buzz around Solr 4 and Elasticsearch was bigger than ever; both are pretty cool solutions.

Continue reading

Micro-Pattern for abstracting PyMongo

When writing a database-driven application, there is a need to abstract SQL calls away. Why? Isn’t SQL syntax simple enough? It is, but when your application is basking in the glory that is Python code😉, SQL seems like some arcane, clumsy dialect prone to obvious (SQL injection, anyone?) and not-so-obvious security issues.

With a NoSQL DB like MongoDB, that may not be the case. While it is nice to have Django-like objects for your documents, such as MongoEngine provides, that may not be the best route if write/update performance is critical, as my last experiments showed. But after deciding to use PyMongo directly (a reader pointed out that MongoKit allows the direct use of PyMongo, so check whether that framework suits your needs) and working with it a bit, I felt the need for at least some tiny abstraction layer. Why?

Continue reading

Benchmarking MongoDB

At work, I have started the implementation of a new data import backend. The current one has worked well, but as the number of products in the tracdelight network grows into the millions, the time needed to update the product list for every advertiser grows and the MySQL DB can no longer keep pace with it.

The new data import should be able to import or update a million products much quicker than the 2 hours currently needed, and it should keep scaling beyond that. MongoDB seems like a good candidate because, for this kind of bulk write workload, it should be much faster than our MySQL setup. The new import will also be able to run as distributed jobs.

Since the new product store backend is being written in Python, what abstraction should we use on top of MongoDB? Working directly with PyMongo is an option, but usually an Object Document Mapper (the document-store counterpart of an ORM such as SQLAlchemy or the Django ORM) is preferable.
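
To make such a comparison concrete, here is a minimal sketch of a timing harness, not the benchmark actually used for this post. It assumes each candidate backend can be wrapped in a callable that writes one batch of documents; the names `make_products`, `chunks`, and `benchmark_import`, as well as the product fields, are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, Iterator, List

Product = Dict[str, object]


def make_products(count: int) -> List[Product]:
    """Generate synthetic product documents (the fields are made up)."""
    return [
        {"_id": i, "name": f"product-{i}", "price": i % 100}
        for i in range(count)
    ]


def chunks(items: List, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def benchmark_import(
    write_batch: Callable[[List[Product]], object],
    products: List[Product],
    batch_size: int = 1000,
) -> float:
    """Write all products in batches and return documents per second."""
    started = time.perf_counter()
    for batch in chunks(products, batch_size):
        write_batch(batch)
    elapsed = time.perf_counter() - started
    # Guard against a zero reading on very fast in-memory backends.
    return len(products) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in backend: a plain list, so the sketch runs without a database.
    store: List[Product] = []
    rate = benchmark_import(store.extend, make_products(100_000))
    print(f"stored {len(store)} documents at {rate:,.0f} docs/s")
```

To time a real backend, `write_batch` would be replaced by the write call under test, for example a PyMongo collection's `insert_many` (assuming PyMongo 3 or later) or a small function that saves the same batch through an Object Document Mapper. Running every candidate against the same synthetic data keeps the numbers comparable.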

Continue reading