Earlier this week EMC announced it will create its own distribution for Apache Hadoop. Hadoop provides distributed computing capabilities that enable organizations to process very large amounts of data quickly. As I have written previously, the Hadoop market continues to grow and evolve. In fact, the rate of change may be accelerating. Let’s start with what EMC announced and then I’ll address what the announcement means for the market.
EMC announced three new offerings, slated for the third quarter of 2011, that build on its acquisition of Greenplum last year. They range from an open source version to incorporation in its data warehouse appliance.
The EMC Greenplum HD Community Edition is a free, open source version of the Apache Hadoop stack comprising HDFS, MapReduce, ZooKeeper, Hive and HBase. EMC extends Hadoop with fault tolerance for the NameNode and JobTracker, both of which are well-known single points of failure in standard Hadoop implementations.
The EMC Greenplum HD Enterprise Edition, interface-compatible with the Apache Hadoop stack, provides several additional features including snapshots, wide-area replication, a Network File System (NFS) interface and some management tools. EMC also claims performance two to five times that of standard packaged versions of Apache Hadoop.
The EMC Greenplum HD Data Computing Appliance integrates Apache Hadoop with the Greenplum database and computing hardware. The appliance configuration provides SQL access and analytics to Hadoop data residing on the Hadoop Distributed File System (HDFS) as external tables, eliminating the need to materialize the data in the Greenplum database.
Until now Cloudera has dominated the emerging commercial Hadoop market and faced little or no competition since it introduced the Cloudera Distribution for Hadoop (CDH). The EMC announcements are both good and bad news for Cloudera. On the one hand they suggest – you might even say validate – that Cloudera has chosen a valuable market. EMC seems to be willing to invest heavily to try to get a share of it. On the other hand, Cloudera now faces a competitor that has significant resources. For customers competition is generally a good thing, of course, as it pushes vendors to innovate and improve their products to win more business.
EMC’s approach to the market differs dramatically from IBM’s strategy. IBM announced on Twitter at its Big Data Symposium held this week that it is putting all its weight behind Apache Hadoop in the hope of avoiding the fragmentation that plagued the UNIX market for years. EMC’s Enterprise Edition promises to tackle issues well known to the Hadoop market, but EMC faces competition from others who are also tackling these issues. If lower-cost or free competitive offerings adequately address these issues, they could seriously undercut the market for EMC’s Enterprise Edition. While EMC brings more enterprise credentials to the Hadoop market than Cloudera, it has less experience with Hadoop. Multiple vendors are attempting to bring enterprise-class capabilities to Hadoop, and it’s too soon to see who will succeed. However, overall, the Hadoop market will benefit from all the attention and investment.
I find it interesting and a little ironic that prior to its acquisition by EMC, Greenplum (along with Aster Data, now part of Teradata) helped popularize MapReduce, one of Hadoop’s most commonly used components, by embedding MapReduce as part of its databases. These proprietary implementations could be credited with helping to bring Hadoop into the mainstream big-data market because they combined data warehousing with MapReduce. That combination spawned a debate in which database guru Mike Stonebraker at first dismissed MapReduce and then embraced it. The debate attracted attention, a key ingredient in building any new market. Now EMC Greenplum completes the circle by embracing Hadoop.
To its credit, EMC aligned a dozen partners around these announcements, creating an ecosystem of third-party products and services. Concurrent, CSC, Datameer, Informatica, Jaspersoft, Karmasphere, MicroStrategy, Pentaho, SAS, SnapLogic, Talend and VMware all announced their support for the EMC products in one form or another. Most of these companies also partner with Cloudera, so this is a good move but not a coup for EMC.
The Hadoop market continues to evolve. We are now analyzing the data collected in our benchmark research on the state of the large-scale data market, now called the big data market, including Hadoop. Stay tuned for the results. It will be interesting to see where the market ends up. I expect more changes and innovation driven in part by the increased competition.
The Hadoop market is no longer a one-elephant race.
Regards,
David Menninger – VP & Research Director
