Living in the Era of Hadoop and Large-Scale Data

Written by David Menninger | Mar 3, 2011 5:47:05 AM

It’s clear that we are now living in the era of big data. The data stores on which modern businesses rely are already vast and are growing at an unprecedented pace. Organizations are capturing data at deeper levels of detail and keeping more history than ever before. Managing all of this data is therefore emerging as one of the key challenges of the new decade.

The solutions to this challenge vary, but interest in them seems to be universal. The largest database vendors and others that wish to compete with them are developing or acquiring various technologies, among them database appliances, massively parallel databases, and columnar databases.

I have written recently about another contender in this arena: Apache Hadoop, an open source framework for parallel processing that has gained popularity as a way to deal with very large sets of data. The rise of Hadoop has been dramatic. It has been successfully deployed at some of the largest Internet-based organizations in the world, including eBay, Facebook, Google and Yahoo. Seeing this, other organizations whose business depends on managing large amounts of data have begun to explore Hadoop as well.

On October 12 of last year, Hadoop World in New York City attracted approximately 900 attendees. Local Hadoop user groups are popping up around the world and attracting hundreds of users. The Hadoop User Group on LinkedIn has more than 4,000 members. More and more software vendors are adding support for Hadoop. Daniel Abadi, who has done extensive work on a project called HadoopDB, recently received funding for a big-data start-up company.

However, because interest in Hadoop has grown so rapidly, there is little research on the business applications that use Hadoop or on the IT requirements for a successful Hadoop deployment. Many audiences could benefit from independent research, information and analysis in this area. CIOs considering Hadoop need more information about the business case and the characteristics of successful deployments. Organizations developing Hadoop and related software products need a better understanding of the emerging markets they seek to serve. And software companies looking to incorporate Hadoop need guidance on how best to adapt their own product lines.

Another consequence of Hadoop's dramatic rise in popularity is considerable confusion among vendors and buyers alike. The rapidly changing market makes it difficult for vendors to decide how to allocate their Hadoop investments in ways that will maximize revenue. Without reliable information, vendors are left to guess at what potential customers need, and so they risk building a flawed strategy on inaccurate assumptions about the market.

To help address these issues, Ventana Research is conducting benchmark research that will gather data on the enterprise use of Hadoop, its key components and related technologies; we expect it to provide previously unavailable insights. The research will establish how Hadoop is being used and for what business purposes. It also will explore whether Hadoop is replacing existing technologies and, if so, which ones. In addition, this program will identify benefits that organizations are realizing from the use of Hadoop as well as issues they have encountered.

Background on this benchmark research, including how to participate, can be found here, as will the results once they are available. Whether you are working with Hadoop or using some other technology to tackle your big-data challenges, please participate in this research to help advance the state of the industry in managing and analyzing large-scale data.

Regards,

David Menninger – VP & Research Director
