Solving The Software Engineer Shortage

May 29, 2014 by Miquel

Market research studies continually appear decrying the shortage of engineers that we are suffering, especially in the software development field, though not everyone agrees. To compound the problem, there also seems to be no way of getting more girls interested in programming.

Even though everybody knows that software is eating the world, programming is still not the number one career kids want to pursue, and who are we to blame them? Compared to a firefighter or an astronaut, how do we explain what we do? Typing and staring at black consoles all day long? Mighty impressive.

Continue reading →

Beyond Continuous Integration

June 30, 2013 by Miquel

This is an excerpt from my original article published on labs.edelight-group.com about the new, improved CI infrastructure at edelight.

Over the past decade continuous integration (CI) has become a staple of the web software development workflow and release process, so much so that it is nigh on unthinkable for a team to do without it. While the benefits of CI itself are a matter long settled, continuous deployment can be a controversial topic, with some teams reporting great success and others not wanting to touch it with a ten-foot pole.

You can continue reading at labs.edelight-group.com

Processing Data for Search

January 26, 2013 by Miquel

This is an excerpt from my original article published on labs.edelight-group.com

When preparing external product data for search, you have to decide how much you want to pre-process and how much you want to do at query time on the Lucene level (Solr/Elasticsearch).

Basically, your data import saves data into your storage backend, which then gets synced into a search index.
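The flow described above can be illustrated with a small sketch. It is not the code from the original article: the in-memory `storage` dict and `index` mapping are hypothetical stand-ins for a real storage backend and a Solr/Elasticsearch index, and the function names are made up for illustration. It only shows the trade-off: work done once at import time versus work repeated on every query.

```python
import re
from collections import defaultdict

# Hypothetical in-memory stand-ins for the storage backend and the search index.
storage = {}
index = defaultdict(set)


def preprocess(product):
    """Import-time work: normalise the title once so queries stay cheap."""
    title = product["title"].strip()
    tokens = sorted(set(re.findall(r"[a-z0-9]+", title.lower())))
    return {"id": product["id"], "title": title, "tokens": tokens}


def import_product(product):
    """The data import saves the pre-processed document into storage."""
    doc = preprocess(product)
    storage[doc["id"]] = doc


def sync_index():
    """Rebuild the search index from whatever is currently in storage."""
    index.clear()
    for doc in storage.values():
        for token in doc["tokens"]:
            index[token].add(doc["id"])


def search(query):
    """Query-time work: tokenise the query and intersect the matching id sets."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


import_product({"id": 1, "title": " Red Leather Bag "})
import_product({"id": 2, "title": "Blue leather shoes"})
sync_index()
print(search("red leather"))
```

Anything moved into `preprocess` costs time during the import but nothing per query; anything left to `search` is paid for on every request.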

You can continue reading at labs.edelight-group.com

ApacheCon EU Reflections

November 19, 2012 by Miquel

Two weeks ago I visited ApacheCon Europe together with some of my colleagues from edelight. We were mostly interested in the Search and Big Data tracks.

We had a good time and learnt quite a few things. It was also nice to see some familiar faces from Berlin Buzzwords. Isabel Drost-Fromm was her usual awesome self with her very down-to-earth Apache Mahout talk, and others were there too.

The most relevant talks for us were from companies relating their experiences solving hard search-related problems, like semantic search or building a good color filter, because these are problems we are either trying to solve now or will want to solve in the near future. Not only can we learn good approaches to them, but even when we don’t learn anything new, the talks serve as a sanity check: they reassure us that what we are doing makes sense and is at least good compared to the rest of the e-commerce industry. Further, the buzz around Solr 4 and Elasticsearch was bigger than ever; they are both pretty cool solutions.

Continue reading →

Micro-Pattern for abstracting PyMongo

May 13, 2011 by Miquel

When writing a database-driven application, there is a need to abstract SQL calls away. Why? Isn’t SQL syntax simple enough? It is, but when your application is basking in the glory that is Python code ;-), SQL seems like some arcane, clumsy dialect prone to obvious (SQL injection, anyone?) and not-so-obvious security issues.

With a NoSQL DB like MongoDB, that may not be the case. While it is nice to have Django-like objects for your documents, such as MongoEngine provides, if write/update performance is critical, it may not be the best route, as my last experiments showed. But after deciding to use PyMongo directly (a reader pointed out that MongoKit allows the direct use of PyMongo, so check whether that framework suits your needs), and working with it a bit, I felt the need for at least some tiny abstraction layer. Why?

Continue reading →

Benchmarking MongoDB

February 28, 2011 by Miquel, 8 Comments

At work, I have started the implementation of a new data import backend. The current one has worked well, but as the number of products in the tracdelight network grows into the millions, the time needed to update the product list for every advertiser grows and the MySQL DB can no longer keep pace with it.

The new data import should be able to import or update a million products much faster than the 2 hours currently needed, and it should be able to keep scaling. MongoDB seems like a good candidate because it should be much faster than the current MySQL setup. The new import will also be able to run as distributed jobs.

Since the new product store backend is being written in Python, what abstraction should we use on top of MongoDB? Working directly with PyMongo is an option, but usually an Object Document Mapper (the document-store counterpart of an ORM like SQLAlchemy or the Django ORM) is preferable.
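To compare candidate abstractions, a small timing harness is usually enough to start with. The sketch below is not the benchmark from the full article: `FakeCollection` is a hypothetical stand-in with two made-up insert methods, used only so the example runs without a database. To measure PyMongo or an ODM, you would replace those methods with calls to the real library.

```python
import time


def benchmark(label, insert_fn, docs, repeats=3):
    """Run insert_fn(docs) `repeats` times and return the best wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        insert_fn(docs)
        best = min(best, time.perf_counter() - start)
    return label, best


class FakeCollection:
    """Hypothetical stand-in for a collection; swap in a real driver or ODM."""

    def __init__(self):
        self.rows = []

    def insert_one_by_one(self, docs):
        # One call per document, as a naive import loop would do.
        for doc in docs:
            self.rows.append(dict(doc))

    def insert_batch(self, docs):
        # A single call for the whole batch.
        self.rows.extend(dict(doc) for doc in docs)


docs = [{"sku": i, "price": i * 0.5} for i in range(10_000)]
coll = FakeCollection()
results = [
    benchmark("one by one", coll.insert_one_by_one, docs),
    benchmark("batch", coll.insert_batch, docs),
]
for label, seconds in results:
    print(f"{label}: {seconds:.4f}s")
```

Taking the best of several runs reduces noise from other processes; the absolute numbers from the fake collection mean nothing, only the structure of the comparison carries over.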

Continue reading →

Bulk-like inserts with Django

November 23, 2010 by Miquel, 8 Comments

The Django ORM is really great. It allows you to design and code your web application in a very simple and readable way.

Some projects have special needs, or are very demanding on the DB, and in that case the ORM can become a bottleneck. Fortunately, Django offers several ways to step around the ORM when it gets in your way.

Continue reading →

Simply Code

Solving problems with code, in a simple way
