Two weeks ago I visited ApacheCon Europe together with some of my colleagues from edelight, being mostly interested in the Search and Big Data tracks.
We had a good time and learnt quite a few things. It was also nice to see some familiar faces from Berlin Buzzwords. Isabel Drost-Fromm was her usual awesome self with her very down-to-earth Apache Mahout talk, and others were there too.
The most relevant talks for us were from companies relating their experiences solving hard search-related problems, like semantic search or building a good color filter, because these are problems we are either trying to solve now or will want to solve in the near future. Not only can we learn good approaches to them, but even when we don’t learn anything new, such talks serve as a sanity check that reassures us that what we are doing makes sense, or is at least good compared to the rest of the e-commerce industry. Further, the buzz around Solr 4 and Elasticsearch was bigger than ever; they are both pretty cool solutions.
Two worrying talks
There were also two talks that highlighted potential problems in some of the projects under the ASF umbrella.
Apache 2.4
The first one, by Rainer Jung, was about “What’s new in Apache HTTP Server 2.4”. The new features in the Apache Server proper, the very project that gave birth to the Apache Software Foundation, are very welcome. Even better configurability, logging improvements (finally), mod_sed…
There is a long list, but one of the biggest changes, the Event MPM becoming the new default, does not seem to be quite there. The MPM was already in Apache 2.2, marked as experimental, and there has been a lot of work since to make it ready for production use. The Event MPM supposedly negates nginx’s advantage, granted by its event-driven, asynchronous architecture, but benchmarks so far don’t confirm that (though the ones I’ve seen until now are flawed one way or another; benchmarking is hard!). It seems neither to close the performance gap under high concurrency nor, despite the release claims, to consume less memory. In any case it also has a big flaw: it doesn’t support SSL, which makes it irrelevant for an increasingly large number of users.
Rainer had a slide showing the project’s release history, which brings me to the “worrying” part. Apache 2.0 was released in the year 2000, 2.2 in 2005, and 2.4 in 2012, 7 years later. That the project still manages to power nearly two thirds of all sites is very impressive!
However, today’s environment is quite different, with smaller servers like nginx continuously growing market share (12% as of November 2012). It is only a matter of time until others have just enough functionality and battle-tested documentation for users to move to them in much greater numbers.
After the talk, I expressed my concerns to Rainer and asked him whether there is going to be a change in the release policy. His answer was “there is no release policy”. If somebody (or more specifically, some company) works on something big, or makes resources available to work on it, and that something big is finished, there is a release. But he did not currently see a lot of activity that would indicate an earlier release. He could imagine something like newer developments on SPDY or HTTP 2.0 pushing a release into a two-year time frame, but that was it. To be honest, the answer worries me even more, because not having a release policy *is* having a release policy, and one that doesn’t follow the Open Source maxim “release early and release often”. I hope that changes for the better in the future.
Solr4
The Solr4 architecture talk was very interesting. Over the past couple of years Elasticsearch has been getting more and more attention, mainly because, although Solr 3.x works wonderfully, it doesn’t offer a solution to the scaling problem beyond master/slave, while Elasticsearch is built from the ground up with sharding in mind.
Mark Miller concentrated on the “cloud architecture” part, that is, how sharding works. He explained how they evaluated several design possibilities, like having an all-leader architecture for each shard, but in the end went for a design that is, in all honesty, very similar to what Elasticsearch does. And that is great, because it makes sense. The main differences are not that relevant to the end user, for example using Zookeeper for coordination instead of embedding it as Elasticsearch does (which has both pros and cons).
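To make the sharding idea a bit more concrete, here is a minimal Python sketch of hash-based document routing with one leader per shard. It is only an illustration under simplifying assumptions: the names `shard_for` and `Cluster` are hypothetical, the MD5-based hash is chosen purely for convenience, and neither Solr4 nor Elasticsearch is claimed to work exactly this way.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index using a stable hash.

    Assumption: MD5 is used only because it is stable across runs;
    real systems use their own hash functions and routing schemes.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


class Cluster:
    """Toy in-memory cluster: each shard has one leader plus followers."""

    def __init__(self, num_shards: int, replicas_per_shard: int):
        # Each shard is a list of replicas (plain dicts); replica 0 is the leader.
        self.shards = [
            [{} for _ in range(replicas_per_shard)] for _ in range(num_shards)
        ]

    def index(self, doc_id: str, doc: dict) -> None:
        """Route the write to the owning shard's leader, then copy to followers."""
        shard = self.shards[shard_for(doc_id, len(self.shards))]
        leader, followers = shard[0], shard[1:]
        leader[doc_id] = doc
        for follower in followers:
            follower[doc_id] = doc

    def get(self, doc_id: str):
        """Read from the leader of the shard that owns this id."""
        shard = self.shards[shard_for(doc_id, len(self.shards))]
        return shard[0].get(doc_id)


if __name__ == "__main__":
    cluster = Cluster(num_shards=3, replicas_per_shard=2)
    cluster.index("product-42", {"color": "red"})
    print(cluster.get("product-42"))
```

What the sketch leaves out is the hard part: electing leaders, tracking which nodes are alive, and recovering replicas. That is the job of the coordination layer, where the two projects differ.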
Now, you would think that it would be pertinent, and even polite, to say that you are going for the same architecture as your “competitor”, and that Solr4 will be awesome in part because of that. Instead, as far as I can remember, Mark managed, in a 45-minute talk, not to mention Elasticsearch even once.
Despite that and the overly-flashy slides, I still enjoyed some parts, like the Chaos Monkey, which was brilliant.
In Closing
At times, Apache projects, while rich and powerful, seem to live in a micro-world of their own: not connecting enough to the rest of Open Source, or even ignoring existing projects and doing things their own way, and being extremely Java-centric and company-centric. On the other hand, some of the projects solve very hard problems which nobody else solves that well, if at all, and which by definition require cross-company alliances, so you could say that the ASF fulfils a unique role in the Open Source community.
Overall some of the talks made me think, which leaves me happy enough!