David Menninger's Analyst Perspectives

Karmasphere Makes Sense of Big Data

Posted by David Menninger on Sep 23, 2011 11:03:05 AM

Recently Karmasphere introduced version 1.5 of its Analyst product, which helps organizations analyze “big data” stored in Hadoop, the open source large-scale data processing technology. An independent software vendor focused exclusively on the Hadoop market, Karmasphere made available a community edition of its developer product in September 2009 and launched the company in March 2010. Since then it has been active and visible at Hadoop-related events, including Hadoop World and the IBM Big Data Symposium.

Fundamentally, Karmasphere focuses on making Hadoop easier to use and more accessible for both developers and analysts, who need help in this area. Our recent benchmark research on Hadoop and Information Management shows a significant shortage of skills: Hadoop users cited staffing and training as the two most significant obstacles to analyzing large-scale data sets, affecting 80% and 74% of organizations, respectively.

Karmasphere Analyst 1.5 provides an interactive, graphical environment for analyzing data in Hadoop. To begin the process, it helps users understand the data structures available in Hadoop by presenting a table-based view of existing data and the ability to create new tables. In addition, Karmasphere Analyst combines information from multiple Hadoop data stores to present a unified view. Users assemble queries with a SQL-based development environment that includes syntax checking and prompts to help in the process. More than 100 user-defined functions (UDFs) are included for many common tasks and analyses. Once assembled, these queries can be stored, reused and combined together into a “query chain” or workflow involving multiple steps that are often necessary in the data preparation and analysis process. Karmasphere Analyst provides visual query plans and explanations that make it easier to understand and modify the queries. Users also can visualize the results of queries in graphical or tabular displays.
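The “query chain” idea can be illustrated with a minimal sketch. This is not Karmasphere's implementation or API: it assumes a hypothetical chain of two SQL steps, uses Python's built-in sqlite3 module as a stand-in for Hive, and invents the table names (raw_visits, clean_visits, visits_per_page) and sample rows purely for illustration. Each step stores its result as a table that the next step can query, which is roughly what chaining stored queries into a multi-step workflow amounts to.

```python
import sqlite3

# Hypothetical query chain: each step is (output_table, SQL).
# Table and column names are invented for this sketch.
QUERY_CHAIN = [
    # Step 1: data preparation, dropping rows with no user id.
    ("clean_visits",
     "SELECT user_id, page FROM raw_visits WHERE user_id IS NOT NULL"),
    # Step 2: analysis, counting visits per page on the cleaned data.
    ("visits_per_page",
     "SELECT page, COUNT(*) AS visits FROM clean_visits GROUP BY page"),
]


def run_chain(conn, chain):
    """Run each step in order, storing its result as a table for later steps."""
    for table, sql in chain:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} AS {sql}")
    last_table = chain[-1][0]
    # Return the final step's rows, ordered by the first column.
    return conn.execute(f"SELECT * FROM {last_table} ORDER BY 1").fetchall()


def demo():
    # In-memory SQLite database standing in for a Hadoop/Hive data store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_visits (user_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO raw_visits VALUES (?, ?)",
        [(1, "home"), (2, "home"), (None, "home"), (3, "pricing")],
    )
    return run_chain(conn, QUERY_CHAIN)


if __name__ == "__main__":
    print(demo())
```

Because every step is just a named query, an individual step can be reused in another chain or swapped out without touching the rest of the workflow.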

Later in the process Karmasphere helps users prepare and move jobs into production. It includes embedded Hive and Hadoop capabilities for desktop prototyping so users can test and debug on their desktops. Then they can package and export the jobs for deployment to a cluster. Karmasphere also provides capabilities for monitoring jobs and optimizing job performance. It works with a variety of Hadoop sources including Amazon Elastic MapReduce, Apache, Cloudera, EMC Greenplum, IBM and MapR <http://www.mapr.com>. Given the proliferation of sources for Hadoop, including the recently formed Hortonworks with its focus on Apache Hadoop, the ability to work with multiple versions could be valuable to organizations in the evaluation process and to those that have chosen to work with multiple versions, which is the case for nearly half the participants in our benchmark research cited above.

Karmasphere has carved out a niche in the big-data market where there are unmet needs. However, it will face competition from bigger vendors as they incorporate features into their business intelligence and information management platforms that make it easier to work with Hadoop. One way Karmasphere could maintain a unique position would be to broaden its capabilities for advanced analytics. Our research shows that 69% of organizations working with Hadoop use it for advanced analytics including data mining and predictive analytics. Another way Karmasphere could improve its position with respect to larger vendors would be to integrate with more tools than Excel and Tableau, the two it supports today.

In the meantime, if you work with Hadoop and are looking for ways to be more productive or empower a broader range of analysts, you can try some of Karmasphere’s features for yourself here.

Regards,

David Menninger – VP & Research Director

Topics: Big Data, Data Warehouse, Predictive Analytics, Sales Performance, Social Media, Supply Chain Performance, Business Analytics, Business Intelligence, Business Performance, Customer & Contact Center, Financial Performance, Hadoop, Karmasphere, Workforce Performance

Written by David Menninger

David is responsible for the overall research direction of data, information and analytics technologies at Ventana Research, covering major areas including Analytics, Big Data, Business Intelligence and Information Management, along with additional specific research categories including Information Applications, IT Performance Management, Location Intelligence, Operational Intelligence and IoT, and Data Science. David is also responsible for examining the role of cloud computing, collaboration and mobile technologies as they affect these areas. David brings to Ventana Research over twenty-five years of experience, during which he has marketed and brought to market some of the leading-edge technologies for helping organizations analyze data to support a range of action-taking and decision-making processes. Prior to joining Ventana Research, David was the Head of Business Development & Strategy at Pivotal, a division of EMC, VP of Marketing and Product Management at Vertica Systems, and VP of Marketing and Product Management at Oracle, Applix, InforSense and IRI Software. David earned his MS in Business from Bentley University and a BS in Economics from the University of Pennsylvania.