Processing Data for Search

This is an excerpt from my original article published on

When preparing external product data for search, you have to decide how much processing to do up front at import time and how much to leave to query time at the Lucene level (Solr/Elasticsearch).

Basically, your data import saves data into your storage backend, which then gets synced into a search index.
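
To make that flow concrete, here is a minimal sketch in Python of the import, storage, and index sync steps. Everything in it is an assumption for illustration: the function names, the `COLOR_SYNONYMS` table, and the plain dicts standing in for the storage backend and the search index are all hypothetical, and a real setup would talk to Solr or Elasticsearch instead. It only shows where the pre-processing work ends up and how little is left for query time.

```python
# Hypothetical sketch: none of these names come from the original article.
# Plain dicts stand in for the storage backend and the search index.
COLOR_SYNONYMS = {"crimson": "red", "scarlet": "red", "navy": "blue"}


def preprocess(raw):
    """Import-time work: normalize fields once, before storage."""
    color = raw["color"].strip().lower()
    return {
        "id": raw["id"],
        "title": raw["title"].strip(),
        "color": COLOR_SYNONYMS.get(color, color),
    }


def import_products(raw_products, storage):
    """Save pre-processed products into the storage backend (a dict here)."""
    for raw in raw_products:
        doc = preprocess(raw)
        storage[doc["id"]] = doc


def sync_to_index(storage, index):
    """Rebuild a toy inverted index from storage: color -> set of product ids."""
    index.clear()
    for doc in storage.values():
        index.setdefault(doc["color"], set()).add(doc["id"])


def search_by_color(index, query):
    """Query-time work: only cheap normalization of the user's input."""
    color = query.strip().lower()
    return sorted(index.get(COLOR_SYNONYMS.get(color, color), set()))


storage, index = {}, {}
import_products(
    [
        {"id": 1, "title": " Red dress ", "color": "Crimson"},
        {"id": 2, "title": "Blue jeans", "color": "navy "},
        {"id": 3, "title": "Red scarf", "color": "red"},
    ],
    storage,
)
sync_to_index(storage, index)
print(search_by_color(index, "Scarlet"))  # prints [1, 3]
```

The trade-off is visible even at this size: normalizing colors in `preprocess` keeps the index small and the queries simple, but changing the synonym table means re-importing, whereas doing the same mapping only at query time would avoid the re-import at the cost of more work per search.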


You can continue reading at


ApacheCon EU Reflections

Two weeks ago I attended ApacheCon Europe with some of my colleagues from edelight. We were mostly interested in the Search and Big Data tracks.

We had a good time and learnt quite a few things. It was also nice to see some familiar faces from Berlin Buzzwords. Isabel Drost-Fromm was her usual awesome self with her very down-to-earth Apache Mahout talk, and others were there too.

The most relevant talks for us came from companies sharing their experiences with hard search-related problems, such as semantic search or building a good color filter, because these are problems we are either trying to solve now or will want to solve in the near future. You can pick up good approaches to them, and even when you don't learn anything new, the talks work as a kind of sanity check: they reassure you that what you are doing makes sense and holds up well against the rest of the e-commerce industry. The buzz around Solr 4 and Elasticsearch was also bigger than ever. Both are pretty cool solutions.
