I recently spent time at Strata+Hadoop World 2016 in New York. I have attended this event and its predecessor, Hadoop World, off and on for the past six years. This one in New York had a different feel from previous events, including the most recent one in San Jose at the end of March. Perhaps because of its location in one of the financial and commercial hubs of the world, the event had much more of a business orientation. But it’s not just location. Past events have also been held in New York, and I see the business focus as a sign that the Hadoop market is maturing.
It’s part of my job to cover the ecosystem of Hadoop, the open source big data technology, but sometimes it makes my head spin. If this is not your primary job, how can you possibly keep up? I hope that a discussion of what I’ve found to be most important will help those who don’t have the time and energy to devote to this wide-ranging topic.
It has been more than five years since James Dixon of Pentaho coined the term “data lake.” His original post suggests, “If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state.” The analogy is a simple one, but in my conversations with many end users I have found that the concept is still surrounded by mystery. In this post I’d like to clarify what a data lake is, review the reasons an organization might consider using one and the challenges it presents, and outline some developments in software tools that support data lakes.
On Monday, March 21, Informatica, a vendor of information management software, announced Big Data Management version 10.1. My colleague Mark Smith covered the introduction of v. 10.0 late last year, along with Informatica’s expansion from data integration to broader data management. The 10.1 release offers new capabilities, including support for self-service data preparation for Hadoop, a hot topic, which Informatica calls Intelligent Data Lake. The term “data lake” describes large collections of detailed data from across an organization, often stored in Hadoop. With this release Informatica seeks to add more enterprise capabilities to data lake implementations.
I want to share my observations from the recent annual SAS analyst briefing. SAS is a huge software company with a unique culture and a history of success. Although as a privately held company SAS is not required to make the same financial disclosures as publicly held organizations, it released enough information to suggest another successful year, with more than $2.7 billion in revenue and 10 percent growth in its core analytics and data management businesses. Smaller segments showed even higher growth rates. With only selective information disclosed, it’s hard to dissect the numbers to spot specific areas of weakness, but the top-line figures suggest SAS is in good health.
MicroStrategy, one of the largest independent vendors of business intelligence (BI) software, recently held its annual user conference, which I attended with some of my colleagues and more than 2,000 other attendees. At this year’s event, the company emphasized four key themes: mobility, cloud computing, big data and social media. In this post, I’ll assess what MicroStrategy is doing in each of the first three areas. My colleague, Mark Smith, covered MicroStrategy’s social intelligence efforts in his blog. I’ll also share some opinions on what might be missing from the company’s vision.
Talend recently announced version 5 of its information management platform, which emphasizes unifying its various components. Through a combination of development activities, acquisitions and partnerships, Talend has been steadily building its portfolio of information management capabilities. In addition to its core data integration capabilities, it has added data quality, master data management, application integration and, with this release, business process management (BPM).
Cloudera’s recent Hadoop World 2011 event confirmed that the world of big data is getting even bigger. As I wrote of last year’s event, Hadoop, the open source large-scale data processing technology, has gone mainstream. Although 75% of the audience were first-time attendees and so may not have realized how broadly Hadoop has been accepted, statistics announced in the opening keynote show its widespread use. Mike Olson, Cloudera CEO, reported that the event was sold out, with 1,400 attendees from 580 organizations and 27 countries. In independent confirmation, our benchmark research shows that 54% of organizations are either using or evaluating Hadoop for their big-data needs.
Informatica recently introduced HParser, an expansion of its capabilities for working with Hadoop data sources. Beginning with version 9.1, introduced earlier this year, Informatica’s flagship product has been able to access data stored in HDFS as either a source or a target for information management processes. However, it could not manipulate or transform the data within the Hadoop environment. With this announcement, Informatica starts to bring its data transformation capabilities to Hadoop.
Oracle made several announcements at its recent Open World event that demonstrated its strengths in the business computing market but also showed that it stands on the shoulders of giants. The company has developed the expertise, processes and market share to scale out the ideas and innovations of others. Don’t get me wrong: that statement is not an indictment. Large organizations often have challenges with innovation. They are not as nimble as their smaller competitors. On the other hand, small organizations often have challenges scaling out their successes. In an earlier post I characterized the software market as a sort of ecosystem, and this is how it works. Large organizations often look to imitate or acquire smaller firms for their innovations.