Predictive analytics is a rewarding yet challenging subject. In our benchmark research on next-generation predictive analytics at least half the participants reported that predictive analytics allows them to achieve competitive advantage (57%) and create new revenue opportunities (50%). Yet even more participants said that users of predictive analytics don’t have enough skills training to produce their own analyses (79%) and don’t understand the mathematics involved (66%). (In the term “predictive analytics” I include all types of data science, not just one particular type of analysis.)
It has been more than five years since James Dixon of Pentaho coined the term “data lake.” His original post suggests, “If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state.” The analogy is a simple one, but in my experience talking with many end users there is still mystery surrounding the concept. In this post I’d like to clarify what a data lake is, review the reasons an organization might consider using one and the challenges they present, and outline some developments in software tools that support data lakes.
Topics: Big Data, Business Analytics, Business Intelligence, Data Governance, Data Lake, data science, Governance, Risk & Compliance (GRC), Information Management, Predictive Analytics, Social Media, Uncategorized, Strata+Hadoop