Ventana Research recently published the findings of our benchmark research on Data Preparation, which examines the practices organizations use to accomplish data preparation. We view data preparation as a sequence of steps: identifying, locating and then accessing the data; aggregating data from different sources; and enriching, transforming and cleaning it to create a single uniform data set. Using data to accomplish organizational goals requires that it be prepared for use; to do this job properly, businesses need flexible tools that enable them to enrich the context of data drawn from multiple sources and collaborate on its preparation as well as ensure security and consistency. Users of data preparation tools range from analysts to operations professionals in the lines of business to IT professionals.
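The sequence of steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the research: the two in-memory sources, their field names and the helper functions (`access_sources`, `clean_name`, `to_number`, `prepare`) are all hypothetical, and a real deployment would read from actual systems and apply governance and security controls.

```python
# Hypothetical records from two sources with inconsistent field names and formats.
CRM_ROWS = [
    {"customer": " Acme Corp ", "region": "west", "revenue": "1200"},
    {"customer": "Globex", "region": "EAST", "revenue": "n/a"},
]
BILLING_ROWS = [
    {"cust_name": "acme corp", "amount": 300.0},
    {"cust_name": "Initech", "amount": 450.0},
]


def access_sources():
    """Identify, locate and access the data (here, two in-memory lists)."""
    return CRM_ROWS, BILLING_ROWS


def clean_name(name):
    """Clean: collapse stray whitespace and normalize capitalization."""
    return " ".join(name.split()).title()


def to_number(value):
    """Transform: convert text to a float, or None when it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def prepare(crm, billing):
    """Aggregate both sources into a single uniform data set keyed by customer."""
    unified = {}
    for row in crm:
        name = clean_name(row["customer"])
        unified[name] = {
            "customer": name,
            "region": row["region"].strip().lower(),
            "revenue": to_number(row["revenue"]),
            "billed": 0.0,
        }
    for row in billing:
        name = clean_name(row["cust_name"])
        # Enrich: add billing totals, creating a row if the CRM lacks the customer.
        target = unified.setdefault(
            name,
            {"customer": name, "region": None, "revenue": None, "billed": 0.0},
        )
        target["billed"] += row["amount"]
    return sorted(unified.values(), key=lambda r: r["customer"])


if __name__ == "__main__":
    for record in prepare(*access_sources()):
        print(record)
```

Every output row has the same four fields regardless of which source it came from, which is the "single uniform data set" the steps aim for. Values that cannot be recovered, such as a non-numeric revenue, are kept as explicit gaps rather than guessed.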
A variety of new factors are changing the data preparation process, including the growing importance of streaming data sources flowing into big data repositories and a resulting need to apply data science techniques to derive meaning from it. These technical factors will likely increase the need for IT professionals to be involved in preparing data. Nonetheless, the trend toward deploying tools that support self-service data preparation in the lines of business is growing. Self-service tools enable analysts to perform all or many of the data preparation tasks without the assistance of IT. These two trends can lead to conflict for organizations that want to derive maximum business value from their data as quickly as possible while still maintaining the appropriate data governance, security and consistency.
Advances in data preparation have unquestionably provided an opportunity for organizations to change the way they approach information management, but overall, organizations have not embraced these changes. Four years ago our Information Optimization Performance Index analysis found that more than half of organizations (52%) placed at the top two levels of our performance hierarchy; this most recent research places only 43 percent at those levels. This decline in performance suggests that many organizations need to improve their use of data preparation with a dedicated approach.
Two changes may be driving this decline: the growing complexity of data in both volume and variety, and a greater focus on enabling line-of-business users to work with data independently of their IT organizations. It’s worth noting, though, that lackluster performance does not necessarily indicate a lack of interest in data preparation: in this research, 88 percent of participants said that self-service data preparation is important to their organizations. Despite this high level of interest in providing self-service data preparation, organizations have not succeeded in deploying these capabilities. (Those organizations that didn’t consider it important cite security, governance or risk issues as their main concerns.)
Drilling down into the results, data preparation tools are meeting organizations’ needs in some cases, but the research suggests plenty of room for improvement. Just over half (56%) said they consider their data preparation technologies completely or mostly adequate. A slightly higher percentage (62%) reported confidence in their organization’s ability to prepare data. However, fewer than half (44%) said they are comfortable allowing users to work with data not prepared by IT. Furthermore, many users complain that their data preparation technology is not flexible or adaptable when change is needed, whereas IT’s top complaint is that it requires too many resources. This difference points to a broader disconnect between business units and IT: They do not always see eye to eye on data preparation issues. Nearly half (45%) of participants report that the top issue between the two groups is their differing views on access to data, with business units preferring an expansive approach and IT preferring a controlled approach.
Many organizations (45%) expect to reevaluate the way they assess and select data preparation technology in the next 12 to 18 months. When considering technology options and vendors, organizations rated usability and functionality the most important evaluation criteria. However, cost is a barrier for many organizations: Nearly six in 10 organizations (58%) cite it, making it by far the most frequently reported barrier, followed by inadequate skills (35%), limited awareness (33%) and lack of resources (33%). On the other hand, issues such as latency, big data and scalability are the least likely to be barriers, suggesting the obstacles are organizational rather than technical.
As you evaluate your data preparation processes, consider the findings in this benchmark research. Understand the primary use cases and the specific people and technology requirements of your organization. Create clear goals for your data preparation efforts and encourage your organization to adopt a cross-functional approach for designing and deploying data preparation tasks. These steps can help your organization realize the full potential of its data preparation efforts.
SVP & Research Director