Cloudera’s recent Hadoop World 2011 event confirmed that the world of big data is getting even bigger. As I wrote of last year’s event, Hadoop, the open source large-scale data processing technology, has gone mainstream. And while 75% of the audience attended this year for the first time and so may not have realized the breadth of Hadoop’s acceptance, statistics announced in the opening keynote show widespread use of it. Mike Olson, Cloudera CEO, reported that the event was sold out, with 1,400 attendees from 580 organizations and 27 countries. In independent confirmation, our benchmark research shows that 54% of organizations are either using or evaluating Hadoop for their big-data needs.
Before or during Hadoop World, several vendors made announcements that further reinforced the growth of the market. Cloudera announced it has raised an additional $40 million to expand its operations. Oracle, a sponsor of the event, introduced its Big Data Appliance, which includes Hadoop. NetApp announced a partnership with Cloudera to provide a preconfigured, appliance-like solution called NetApp Open Solution for Hadoop. Hortonworks, a Cloudera competitor, unveiled its own distribution of Hadoop called Hortonworks Data Platform.
Despite these announcements, one of the main impressions I took away from the event is that this emperor has no clothes. If you recall the story, the emperor’s lack of clothes was not a metaphor for his authority but a question, literally and figuratively, about whom and what he had surrounded himself with. By many accounts, Hadoop is in a similar situation. It is a powerful but immature technology that has grown popular despite its shortcomings. In his keynote, Olson acknowledged this, saying that it’s not enough to provide a platform for Java developers. Doug Cutting, Cloudera’s architect and creator of Hadoop, described it as the kernel of a distributed operating system for big data. Not many end users are prepared to work directly with the kernel of an operating system.
Those users need the technology to be easier to handle and more broadly accessible. This issue shows up in our research finding that staffing and training are the two biggest obstacles to analyzing large-scale data sets with Hadoop. MapR, another Cloudera competitor with its own Hadoop distribution, is trying to capitalize on the need to overcome this obstacle by offering free training resources. And some vendors recognize the need to surround Hadoop with “better clothes.” Karmasphere and Datameer provide tools that make Hadoop easier to use in the analytics process. Informatica recently announced HParser, which makes it easier to parse the unstructured data often collected and analyzed with Hadoop. I also spoke with other vendors at the event that are still in stealth mode, so we can expect continued development in the Hadoop ecosystem. Giving this trend momentum, venture capital firm Accel Partners announced a $100 million fund to invest in big-data companies.
This market is evolving with many moving parts. Hadoop is not one thing, but a collection of multiple projects. Hadoop’s distributed file system (HDFS) and MapReduce have been the cornerstones of Hadoop adoption; the majority of organizations use those two components, our research confirms. HBase, a columnar database built on HDFS, received a lot of attention at the event and was the subject of several presentations, including one about Facebook using HBase for real-time data access. Attendees were offered a free copy of HBase: The Definitive Guide, and several commented that perhaps the event should have been called HBase World.
So while the emperor has no clothes, there are plenty of tailors making suits. I expect increasing competition among different distributions of Hadoop and among existing and new tool vendors trying to make Hadoop easier to use. The advantage of all this interest in Hadoop is that the open source community is aware of many of the platform’s issues and is working to resolve them. The disadvantage is that, with all the separate components and so many competitors, the landscape will remain confusing until the market matures further.
David Menninger – VP & Research Director