Karmasphere recently introduced version 1.5 of its Analyst product, which helps organizations analyze “big data” stored in Hadoop, the open source large-scale data processing technology. An independent software vendor focused exclusively on the Hadoop market, Karmasphere made available a community edition of its developer product in September 2009 and launched the company in March 2010. Since then it has been active and visible at Hadoop-related events including Hadoop World, the IBM Big Data Symposium and others.
Topics: Big Data, Data Warehousing, Predictive Analytics, Sales Performance, Social Media, Supply Chain Performance, Business Analytics, Business Intelligence, Business Performance, Customer & Contact Center, Financial Performance, Karmasphere, Workforce Performance, Strata+Hadoop
For months the speculation was rampant, and now the rumors have proven to be true. Yahoo has officially announced that it will become a player in the emerging Hadoop market. Hadoop provides distributed computing capabilities that enable organizations to process very large amounts of data quickly. Backed by Yahoo and Benchmark Capital, a new entity called Hortonworks has formed around a team from Yahoo that consists of more than 20 key architects of and contributors to the Apache Hadoop project. The company will start with some 25 employees and “will be hiring aggressively from our collective networks,” according to Rob Bearden, Hortonworks president and COO.
Topics: Big Data, Sales Performance, Social Media, Supply Chain Performance, Business Analytics, Business Intelligence, Business Performance, Customer & Contact Center, Financial Performance, Operational Intelligence, Workforce Performance, Strata+Hadoop
Cloudera is riding the wave of big data. I first learned about the company while working at Vertica, one of Cloudera’s partners. Customers that managed large amounts of structured relational data also needed to process large amounts of semistructured data such as the type found in web logs and application logs. The emerging channel of social media provided another source of data lacking the structure that would lend itself to analysis in a relational database. Other organizations needed to perform calculations and analyses that were difficult to express in SQL. Seeing this market, Cloudera recognized earlier than others an opportunity to leverage the Apache Hadoop project; it has been offering the Cloudera Distribution for Hadoop (CDH) since early 2009.
Topics: Big Data, Predictive Analytics, Sales Performance, Social Media, Supply Chain Performance, Business Analytics, Business Intelligence, Business Performance, CDH3, Cloudera, Customer & Contact Center, Information Management, Strata+Hadoop
Informatica has announced version 9.1 for Big Data. I wrote previously about Informatica 9.1, the latest iteration of the company’s data integration platform, following its industry analyst summit. At that event in February, company officials alluded to future plans regarding Hadoop and other big-data sources that had yet to be finalized. This announcement reveals those plans. Informatica will support three types of “big data”: big transaction data from relational databases and data warehouse systems; big interaction data from social media, customer interaction systems and other systems; and big data processing, which means Hadoop, the open source software framework. Let’s look at each of these types.
Topics: Big Data, MapReduce, Social Media, Supply Chain Performance, Business Collaboration, Business Mobility, Business Performance, Customer & Contact Center, Data Integration, Informatica, Strata+Hadoop
Last week I attended the IBM Big Data Symposium at the Watson Research Center in Yorktown Heights, N.Y. The event was held in the auditorium where the recent Jeopardy shows featuring the computer called Watson took place and which still features the set used for the show – a fitting environment for IBM to put on another sort of “show” involving fast processing of lots of data. The same technology featured prominently in IBM’s big-data message, and the event was an orchestrated presentation more like a TV show than a news conference. Although it announced very little news at the event, IBM did make one very important statement: The company will not produce its own distribution of Hadoop, the open source distributed computing technology that enables organizations to process very large amounts of data quickly. Instead it will rely on and throw its weight behind the Apache Hadoop project – a stark contrast to EMC’s decision, announced earlier in the week, to create its own Hadoop distribution. As an indication of IBM’s approach, Anant Jhingran, vice president and CTO for information management, commented, “We have got to avoid forking. It’s a death knell for emerging capabilities.”
Topics: Big Data, EMC, Analytics, Business Analytics, Business Intelligence, Cloud Computing, Cloudera, Customer & Contact Center, Greenplum, IBM, Information Applications, Information Management, InfoSphere, Location Intelligence, Operational Intelligence, IT Performance Management (ITPM), Strata+Hadoop
Earlier this week EMC announced it will create its own distribution for Apache Hadoop. Hadoop provides distributed computing capabilities that enable organizations to process very large amounts of data quickly. As I have written previously, the Hadoop market continues to grow and evolve. In fact, the rate of change may be accelerating. Let’s start with what EMC announced and then I’ll address what the announcement means for the market.
Topics: Big Data, EMC, Social Media, Teradata, Business Analytics, Business Collaboration, Business Intelligence, Cloud Computing, Cloudera, Customer & Contact Center, Greenplum, Information Management, Strata+Hadoop
It’s clear that we are now living in the era of big data. The stores of data on which modern businesses rely are already vast and increasing at an unprecedented pace. Organizations are capturing data at deeper levels of detail and keeping more history than ever before. Managing all of this data is thus emerging as one of the key challenges of the new decade.
If you enjoyed my previous blog, “Hadoop Is the Elephant in the Room,” perhaps you’d be interested in what your organization might do with Hadoop. As I mentioned, the Hadoop World event this week showcased some of the biggest and most mature Hadoop implementations, such as those of eBay, Facebook, Twitter and Yahoo. Those of you who need 8,500 processors and 16 petabytes of storage like eBay likely already know about Hadoop. But is Hadoop relevant to organizations whose data volumes are smaller but still substantial?
Earlier this week I attended Hadoop World in New York City. Hosted by Cloudera, the one-day event was by almost all accounts a smashing success. Attendance was approximately double that of last year. There were five tracks filled mostly with user presentations. According to Mike Olson, CEO of Cloudera, the conference’s tweet stream (#hw2010) was one of the top 10 trending topics of that morning. Cloudera did an admirable job of organizing the event for the Hadoop community rather than co-opting it for its own purposes. Certainly, this was not done out of altruism, but it was done well and in a way that respected the time and interests of those attending.