
Cloudera’s recent Hadoop World 2011 event confirmed that the world of big data is getting even bigger. As I wrote of last year’s event, Hadoop, the open source large-scale data processing technology, has gone mainstream. And while 75% of the audience attended this year for the first time and so may not have realized the breadth of Hadoop’s acceptance, statistics announced in the opening keynote show widespread use of it. Mike Olson, Cloudera CEO, reported that the event was sold out, with 1,400 attendees from 580 organizations and 27 countries. In independent confirmation, our benchmark research shows that 54% of organizations are either using or evaluating Hadoop for their big-data needs.

Before or during Hadoop World, several vendors made announcements that further reinforced the growth of the market. Cloudera announced it has raised an additional $40 million to expand its operations. Oracle, a sponsor of the event, introduced its Big Data Appliance, which includes Hadoop. NetApp announced a partnership with Cloudera to provide a preconfigured, appliance-like solution called NetApp Open Solution for Hadoop. Hortonworks, a Cloudera competitor, unveiled its own distribution of Hadoop called Hortonworks Data Platform.

Despite these announcements, one of the main impressions I took away from the event is that this emperor has no clothes. If you recall the story, the fact that the emperor had no clothes was not a metaphor for his authority but a question, literally and figuratively, about whom and what he had surrounded himself with. By many accounts, Hadoop is in a similar situation. It is a powerful but immature technology that has grown popular despite its shortcomings. In his keynote, Olson acknowledged this, saying that it’s not enough to provide a platform for Java developers. Doug Cutting, Cloudera’s architect and creator of Hadoop, described it as the kernel of a distributed operating system for big data. Not many end users are prepared to work directly with the kernel of an operating system.

Those users need the technology to be easier to handle and more broadly accessible. This issue shows up in our research finding that staffing and training are the two biggest obstacles to analyzing large-scale data sets with Hadoop. MapR, another Cloudera competitor with its own Hadoop distribution, is trying to capitalize on the need to overcome this obstacle by offering free training resources. And some vendors recognize the need to surround Hadoop with “better clothes.” Karmasphere and Datameer provide tools that make Hadoop easier to use in the analytics process. Informatica recently announced HParser, which makes it easier to parse the unstructured data often collected and analyzed with Hadoop. I spoke with other vendors at the event that are still in stealth mode, but we can expect continued development in the Hadoop ecosystem. Giving this trend momentum, venture capitalist Accel Partners announced a $100 million fund to invest in big-data companies.

This market is evolving with many moving parts. Hadoop is not one thing, but a collection of multiple projects. Hadoop’s distributed file system (HDFS) and MapReduce have been the cornerstones of Hadoop adoption; the majority of organizations use those two components, our research confirms. HBase, a columnar database built on HDFS, received a lot of attention at the event and was the subject of several presentations, including one about Facebook using HBase for real-time data access. Attendees were offered a free copy of HBase: The Definitive Guide, and several commented that perhaps the event should have been called HBase World.
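For readers who have not worked with MapReduce directly, the programming model can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps on a single machine, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would
    do between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Chain the three steps into a single word-count job."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map and reduce steps run in parallel across many machines over data stored in HDFS; the sketch only shows the shape of the computation.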

So while the emperor has no clothes, there are plenty of tailors making suits. I expect increasing competition among different distributions of Hadoop and among existing and new tool vendors trying to make Hadoop easier to use. The advantage of all this interest in Hadoop is that the open source community is aware of many of the platform’s issues and is working to resolve them. The disadvantage is that with all the separate components and so many competitors, it will continue to be a confusing landscape until the market matures further.

Regards,

David Menninger – VP & Research Director

Recently Karmasphere introduced version 1.5 of its Analyst product, which helps organizations analyze “big data” stored in Hadoop, the open source large-scale data processing technology. An independent software vendor focused exclusively on the Hadoop market, Karmasphere made available a community edition of its developer product in September 2009 and launched the company in March 2010. Since then it has been active and visible at Hadoop-related events including Hadoop World, the IBM Big Data Symposium and others.

Fundamentally, Karmasphere focuses on making Hadoop easier to use and more accessible for both developers and analysts, who need help in this area. Our recent benchmark research on Hadoop and Information Management shows a significant shortage of skills: Hadoop users cited staffing and training as the two most significant obstacles in analyzing large-scale data sets, impacting 80% and 74% of organizations, respectively.

Karmasphere Analyst 1.5 provides an interactive, graphical environment for analyzing data in Hadoop. To begin the process, it helps users understand the data structures available in Hadoop by presenting a table-based view of existing data and the ability to create new tables. In addition, Karmasphere Analyst combines information from multiple Hadoop data stores to present a unified view. Users assemble queries with a SQL-based development environment that includes syntax checking and prompts to help in the process. More than 100 user-defined functions (UDFs) are included for many common tasks and analyses. Once assembled, these queries can be stored, reused and combined together into a “query chain” or workflow involving multiple steps that are often necessary in the data preparation and analysis process. Karmasphere Analyst provides visual query plans and explanations that make it easier to understand and modify the queries. Users also can visualize the results of queries in graphical or tabular displays.
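The “query chain” idea is easier to picture with a small example. The following is a minimal sketch, not Karmasphere's implementation: it uses Python's built-in `sqlite3` module as a stand-in for a SQL-on-Hadoop engine, and the table names, column names and the `run_query_chain` helper are all assumptions made for illustration. Each named step is stored as a table so that later steps can build on it.

```python
import sqlite3


def run_query_chain(conn, steps):
    """Run named SQL steps in order. Each step's result is materialized
    as a table with the step's name, so later steps can query it.
    Returns the rows produced by the final step."""
    for name, sql in steps:
        conn.execute(f"CREATE TABLE {name} AS {sql}")
    last_name = steps[-1][0]
    return conn.execute(f"SELECT * FROM {last_name}").fetchall()


def demo():
    # In-memory database standing in for data held in Hadoop (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [("u1", "/home"), ("u1", "/pricing"), ("u2", "/home")],
    )
    steps = [
        # Step 1: data preparation, count views per user.
        ("views_per_user",
         "SELECT user_id, COUNT(*) AS views FROM page_views GROUP BY user_id"),
        # Step 2: analysis on top of step 1, keep users with 2+ views.
        ("heavy_users",
         "SELECT user_id FROM views_per_user WHERE views >= 2 ORDER BY user_id"),
    ]
    return run_query_chain(conn, steps)


if __name__ == "__main__":
    print(demo())
```

Keeping each step as a named, reusable unit is what lets a chain be stored, rerun and recombined, which is the benefit described above.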

Later on in the process Karmasphere helps users prepare and move jobs into production. It includes embedded Hive and Hadoop capabilities for desktop prototyping so users can test and debug on their desktops. Then they can package and export the jobs for deployment to a cluster. Karmasphere also provides capabilities for monitoring jobs and optimizing job performance. It works with a variety of Hadoop sources including Amazon Elastic MapReduce, Apache, Cloudera, EMC Greenplum, IBM and MapR. Given the proliferation of sources for Hadoop, including the recently formed Hortonworks with its focus on Apache Hadoop, the ability to work with multiple versions could be valuable to organizations in the evaluation process and to those who have chosen to work with multiple versions, which is the case with nearly half the participants in our benchmark research cited above.

Karmasphere has carved out a niche in the big-data market where there are unmet needs. However, it will face competition from bigger vendors as they incorporate features into their business intelligence and information management platforms that make it easier to work with Hadoop. One way Karmasphere could maintain a unique position would be to broaden its capabilities for advanced analytics. Our research shows that 69% of organizations working with Hadoop use it for advanced analytics including data mining and predictive analytics. Another way Karmasphere could improve its position with respect to larger vendors would be to provide better integration with tools beyond Excel and Tableau, which it offers today.

In the meantime, if you work with Hadoop and are looking for ways to be more productive or empower a broader range of analysts, you can try some of Karmasphere’s features for yourself here.

Regards,

David Menninger – VP & Research Director
