There’s been some speculation in the market that Hadoop may be disappearing. Some of this speculation has been driven by vendors that have recently downplayed Hadoop in their marketing efforts. For example, the Strata+Hadoop World conference is now known as the Strata Data Conference. The Hadoop Summit is now known as the Dataworks Summit. In Cloudera’s S-1 filing with the SEC for its initial public offering, the term “Hadoop” appears only 14 times, while the term “machine learning” appears 83 times. So, if some of the vendors that created the market appear to be pivoting away from Hadoop, does your organization need to do something similar, or is there a role for Hadoop in your IT architecture?
In light of this speculation, I recently compiled findings from the last six years of research into Hadoop adoption rates and the ways in which organizations store big data. In 2011, Ventana Research conducted one of the first research studies that looked at Hadoop adoption rates. Our most recent benchmark research, on data preparation, was just completed and will be published soon. In the six years in between, every data and analytics benchmark research undertaking we performed included questions about which technologies organizations use to store big data. I’ve included here a chart that shows the results over time.
The chart requires some explanation and some caveats. The orange line represents Hadoop deployment rates and the yellow line represents RDBMSs. We didn’t always ask the question in exactly the same way. For instance, in our Internet of Things benchmark research we asked how organizations store IoT event data, a form of big data, rather than big data in general; the dip below 40 percent in RDBMS usage corresponds to that study. Still, the trend over time from nearly 90 percent RDBMS usage in 2011 to about 60 percent usage in 2017 represents a significant decline.
Also remarkable was the overall consistency of organizations’ adoption rates for Hadoop, which have hovered around 20 to 25 percent. In two instances, higher adoption rates were reported, but the benchmark research in both cases was specifically about big data, so the audience may have been biased toward Hadoop.
The chart also depicts a decline in Hadoop adoption rates by a few percentage points over time. Does that mean Hadoop has peaked and is now in decline? Perhaps. But I’d argue that the more significant trend depicted in the chart is the decline of RDBMS usage for big data. In our benchmark research we ask organizations to list all the technologies they use to store big data. It appears to me that Hadoop (and perhaps other technologies) is being used to manage data that used to be stored in RDBMSs.
What does this mean for you? Based on the research data and my conversations in the marketplace, Hadoop is here to stay, and some of the workloads that were previously handled using RDBMSs are shifting to Hadoop and NoSQL technologies. RDBMSs still predominate, but the research suggests that workloads are going to continue to be split among different environments.
You can help your organization by ensuring that end users have access to and can work with a variety of technologies. Making Hadoop data easier to access and use will also have an impact, as our Big Data Analytics research shows that only one in six organizations has the Hadoop skills it needs.
Hadoop can be made more approachable in several ways. The research shows that users prefer to access big data using SQL and there are many SQL-on-Hadoop alternatives. There are also vendors such as AtScale, Kyvos and others that provide a semantic layer on top of Hadoop so your organization can access Hadoop with existing BI tools. Data virtualization vendors such as Denodo provide yet another approach that makes Hadoop data available via SQL and standard BI tools.
We continue to study big data deployments and are currently conducting Dynamic Insights research on the topic of data lakes. Please participate in this research as it will help us create a more accurate picture of the market that we can share with you in our future publications. As I have written previously, data lakes can be a useful way to help your organization get more value from its big data deployments.
The ultimate objective of your big data efforts should be to enable your organization to focus on the A’s of big data – analytics, awareness, anticipation, action. Big data is here to stay. The research suggests it will take many forms, and by recognizing this you can adopt big data strategies that encompass those many forms and yield improved operational results for your organization.
SVP & Research Director