David Menninger's Analyst Perspectives

Cloudera Consolidates Its Data Platform

Posted by David Menninger on Jan 22, 2021 3:00:00 AM

Organizations are dealing with exponentially increasing volumes of data that range from customer-generated information and financial transactions to edge-generated data and even operational IT server logs. A combination of data lake and data warehouse capabilities is required to leverage this data. Our research shows that nearly three-quarters of organizations deploy both data lakes and data warehouses, but they use a variety of approaches to do so, which can be cumbersome. A single platform that provides both capabilities will help address organizations’ requirements.

Cloudera is an enterprise data platform company that provides services to manage and secure all data from ingest to experimentation, from the edge to AI, in any cloud or data center. Cloudera caters to various industries, offering products that range from multi-function data management software to analytics software, including data engineering, data warehousing, streaming data and analytics, operational databases and machine learning. Its Cloudera Data Platform (CDP) is an enterprise data platform built on open-source software that offers a broad set of services with key data analytics and artificial intelligence functionality. CDP can leverage all data types, including structured and unstructured data, relational data and streaming data, from any point in the data lifecycle. To support a wider variety of requirements, Cloudera offers software subscriptions and public cloud services for the CDP solution set, and software subscriptions for traditional on-premises data platforms.

Cloudera recently announced general availability of its Cloudera Data Platform Private Cloud (CDP Private Cloud). It can run either on-premises or in public clouds. CDP Private Cloud provides a disaggregation of compute and storage, and allows independent scaling of compute and storage clusters. Cloudera also announced enterprise data platform services CDP Data Engineering, CDP Operational Database, and CDP Data Visualization. These new services are designed specifically for data specialists to navigate through data silos operating across multiple public and private clouds.

  • CDP Data Engineering is an Apache Spark service on Kubernetes that includes capabilities such as GUI-based monitoring, native Apache Airflow and APIs for scheduling and automating jobs.
  • CDP Operational Database is a NoSQL database service that touts “evolutionary” schema support, auto-scaling based on the workload utilization of the cluster, and multi-model client access with NoSQL key-value using HBase APIs and relational SQL with JDBC.
  • CDP Data Visualization enables users to easily curate visual dashboards, reports and charts. This enables technical teams to share analysis and machine learning models using drag-and-drop custom interactive applications.
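
To make the idea of scheduling and automating jobs more concrete, here is a minimal sketch of dependency-ordered job scheduling, the general technique that workflow tools such as Apache Airflow are built around. It is not Cloudera's or Airflow's API: the job names and the `run_order` helper are hypothetical, and the example uses only Python's standard library (`graphlib`, Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline (illustrative names only): each job maps to the
# set of jobs that must finish before it can start.
pipeline = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "train_model": {"clean"},
    "report": {"aggregate", "train_model"},
}


def run_order(jobs):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(jobs).static_order())


order = run_order(pipeline)
print(order)
```

Any order the sorter returns runs `ingest` first and `report` last; `aggregate` and `train_model` depend only on `clean`, so a real scheduler would be free to run them in parallel.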

With these new releases, Cloudera is improving its data platform, extending its capabilities to work with more complex data processes across multiple clouds.

Data can live nearly anywhere, however, which makes it a challenge to bring everything together and analyze it in a way that drives real value. Fortunately, the company has a long history of working with AI and machine learning to help enterprises manage contained data, and has invested heavily in its platform. Cloudera Data Warehouse is a cloud-native, self-service, elastic data warehouse that can be deployed in private or public clouds. In addition to data warehouse capabilities, the company has added other functionality that puts it in a better position to address a broader range of workloads and deployments.

  • Cloudera Dataflow (CDF) is a scalable, real-time streaming data platform that ingests, curates and analyzes data for key insights and immediate actionable intelligence.
  • CDP Data Engineering can help data scientists to use centralized data from IT departments without transferring or siloing that information first. They can then create individual virtual clusters for each project or team to use.
  • Cloudera Visual Applications for CDP Analytical Experiences gives analysts and data scientists an easy way to share and explain the results of their data analysis by creating rich, visual dashboards.
  • Cloudera Operational Database can help developers to build more data-driven business applications.

The added functionality is relatively new for its potential users, and most of the company's growth depends on customer adoption of CDP, but the offering makes Cloudera more competitive in the already highly competitive cloud data services market. AI- and ML-based analyses deliver significant value to organizations: nearly one-third (31%) of organizations report that the primary benefit of these capabilities was a competitive advantage, and another 31% reported improved customer experience. Cloudera is one of the largest independent vendors providing big data capabilities, so what it does helps determine what happens in the big data market. Relational data warehouse vendors are now adding more unstructured and streaming capabilities, and are investing heavily in bringing their own products to the cloud.

Cloudera has survived a tumultuous and constantly changing big data market, and CDP represents a significant change from its original focus on Hadoop. The product still involves many parts, which may be difficult to learn and understand. Additional integration and simplification will make it easier for organizations to adopt and deploy the platform. In addition, Cloudera should invest in improving automation and adding more modeling tools to make the platform easier to use.

We recommend that organizations that want a fully integrated data platform, access to a large community of developers and the benefits of open-source machine learning that runs on-premises, in the public cloud and/or in the private cloud consider Cloudera Data Platform.


David Menninger

Topics: business intelligence, embedded analytics, Analytics, Collaboration, Data Governance, Data Preparation, Information Management, Data, data lakes, AI and Machine Learning

Written by David Menninger

David is responsible for the overall research direction of data, information and analytics technologies at Ventana Research, covering major areas including Analytics, Big Data, Business Intelligence and Information Management, along with additional specific research categories including Information Applications, IT Performance Management, Location Intelligence, Operational Intelligence and IoT, and Data Science. David is also responsible for examining the role of cloud computing, collaboration and mobile technologies as they affect these areas. David brings to Ventana Research over twenty-five years of experience, through which he has marketed and brought to market some of the leading-edge technologies for helping organizations analyze data to support a range of action-taking and decision-making processes. Prior to joining Ventana Research, David was the Head of Business Development & Strategy at Pivotal, a division of EMC, VP of Marketing and Product Management at Vertica Systems, VP of Marketing and Product Management at Oracle, Applix, InforSense and IRI Software. David earned his MS in Business from Bentley University and a BS in Economics from the University of Pennsylvania.