        David Menninger's Analyst Perspectives

        Why Your Data Lake Needs Bad Data

        Everyone talks about data quality, as they should. Our research shows that improving the quality of information is the top benefit of data preparation activities. Data quality efforts are focused on clean data. Yes, clean data is important, but so is bad data. To be more accurate, the original data as recorded by an organization’s various devices and systems is important.

        Data quality routines should not “whitewash” bad data. In other words, don’t simply replace bad data with good data. That would be like putting a fresh coat of paint over rotting wood. Even replacing the rotting wood isn’t necessarily sufficient. You need to find the cause of the problem and address it. The same is true with bad data. There are plenty of techniques to clean bad data. For example, missing numerical data could be replaced with the previous value collected, or the average of values collected, or the minimum or the maximum, or zero, or an interpolation based on the previous value and the next value that occurred. Whatever correction is made, the original data should be retained. That’s part of the value of a data lake: storing and retaining original raw data for various types of analysis. Data warehouses, on the other hand, tend to offer only clean data, often summarized, so it’s impossible to identify anomalies in the source systems and data.
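
        To make the idea concrete, here is a minimal sketch of a few of the fill techniques mentioned above, written so that the original value is never overwritten. The `Reading` record, the function names and the sample values are all hypothetical, chosen only for illustration; a real pipeline would use whatever schema your data lake already has.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Series = List[Optional[float]]

@dataclass
class Reading:
    raw: Optional[float]            # original value exactly as recorded (None = missing)
    clean: Optional[float] = None   # value to use for analysis
    remedy: Optional[str] = None    # name of the correction applied, if any

def fill_previous(values: Series, i: int) -> Optional[float]:
    """Carry forward the most recent recorded value before position i."""
    for j in range(i - 1, -1, -1):
        if values[j] is not None:
            return values[j]
    return None

def fill_mean(values: Series, i: int) -> Optional[float]:
    """Use the average of all recorded values."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known) if known else None

def fill_interpolate(values: Series, i: int) -> Optional[float]:
    """Linear interpolation between the nearest recorded neighbors."""
    prev_idx = next((j for j in range(i - 1, -1, -1) if values[j] is not None), None)
    next_idx = next((j for j in range(i + 1, len(values)) if values[j] is not None), None)
    if prev_idx is None or next_idx is None:
        return None  # cannot interpolate at the edges of the series
    frac = (i - prev_idx) / (next_idx - prev_idx)
    return values[prev_idx] + frac * (values[next_idx] - values[prev_idx])

def clean_series(values: Series,
                 strategy: Callable[[Series, int], Optional[float]],
                 name: str) -> List[Reading]:
    """Fill missing values with the chosen strategy, keeping raw values intact."""
    out = []
    for i, v in enumerate(values):
        if v is None:
            out.append(Reading(raw=None, clean=strategy(values, i), remedy=name))
        else:
            out.append(Reading(raw=v, clean=v))
    return out

# Made-up sample readings, with None marking a missing value.
raw_values = [10.0, None, 14.0, None, None, 20.0]
cleaned = clean_series(raw_values, fill_interpolate, "interpolate")
```

        Each record carries the raw value, the cleaned value and the name of the remedy side by side, so anyone who wants to see what the source system actually produced can still do so.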

        Knowing what data was actually recorded is the first clue in identifying and solving the problem. Let’s look at the types of bad data that might occur.

        • Missing data - Why was the data missing? Was it a system outage? A network outage? A failed data load? Monitoring how often it occurs would be very useful.
        • Invalid categorical data or unexpected values - Have you properly accounted for all the possible categories? Are there new categories in some of the source applications? How often are the categories changing? Is this something that you need to account for in your analytics? Do you need to adjust historical data for proper comparisons?
        • Data out of range, either anomalies or values outside the defined range of possible values - Are the values valid but unexpected, e.g., unusually high temperatures or sea levels due to global warming? High electricity consumption due to the installation of electric vehicle charging stations? Negative energy consumption due to selling back energy to the grid?
        • Variations on a theme - Robert, Rob, Bob, Bobby. Are you capturing the right information, such as the legal name versus a nickname?
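
        The four types above could be detected with a simple tagging routine along these lines. This is only a sketch: the field names, the category list, the expected range and the nickname lookup are all assumptions made up for the example. Note that it flags records rather than changing or discarding them, since an out-of-range value may turn out to be valid.

```python
from typing import Dict, List

# Hypothetical reference data, for illustration only.
VALID_CATEGORIES = {"residential", "commercial", "industrial"}
EXPECTED_RANGE = (0.0, 500.0)  # assumed plausible usage in kWh
NICKNAMES = {"rob": "robert", "bob": "robert", "bobby": "robert"}

def classify(record: Dict) -> List[str]:
    """Return the list of data quality issues found in one record."""
    issues = []

    usage = record.get("usage_kwh")
    if usage is None:
        issues.append("missing")
    elif not (EXPECTED_RANGE[0] <= usage <= EXPECTED_RANGE[1]):
        # Flag only: negative usage may be energy sold back to the grid.
        issues.append("out_of_range")

    if record.get("category") not in VALID_CATEGORIES:
        # May be an error, or a new category added in a source application.
        issues.append("invalid_category")

    name = (record.get("first_name") or "").lower()
    if name in NICKNAMES:
        issues.append("name_variant")

    return issues

sold_back = {"usage_kwh": -12.5, "category": "residential", "first_name": "Bob"}
outage = {"usage_kwh": None, "category": "agricultural", "first_name": "Robert"}
print(classify(sold_back))  # ['out_of_range', 'name_variant']
print(classify(outage))     # ['missing', 'invalid_category']
```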

        Seeing the history of specific data problems is also useful. Data quality monitoring should include tracking the frequency of issues over time. The remedy for data quality issues should also become data that is retained in the data lake. Machine learning algorithms can examine the quality issues and the associated resolutions to create models that help automate correction of errors. As I’ve written previously, you want to automate your data operations as much as possible so your organization can be as agile as possible.
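
        As a rough sketch of what that tracking might look like, the snippet below counts issues per month and looks up the remedy most often applied to a given issue. The log entries are invented sample data and the field names are assumptions. The remedy lookup is a plain frequency count, not a machine learning model; it stands in for the kind of history a model could be trained on.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Optional

# Invented sample log: each entry records an issue and the remedy applied to it.
issue_log = [
    {"day": date(2024, 1, 3), "issue": "missing", "remedy": "previous_value"},
    {"day": date(2024, 1, 17), "issue": "missing", "remedy": "previous_value"},
    {"day": date(2024, 1, 20), "issue": "out_of_range", "remedy": "kept_as_valid"},
    {"day": date(2024, 2, 2), "issue": "missing", "remedy": "interpolate"},
    {"day": date(2024, 2, 9), "issue": "invalid_category", "remedy": "added_new_category"},
]

def monthly_counts(log: List[Dict]) -> Counter:
    """Count issues per (year-month, issue type) to show frequency over time."""
    counts = Counter()
    for entry in log:
        counts[(entry["day"].strftime("%Y-%m"), entry["issue"])] += 1
    return counts

def most_common_remedy(log: List[Dict], issue: str) -> Optional[str]:
    """Return the remedy applied most often to this issue type, if any."""
    remedies = Counter(e["remedy"] for e in log if e["issue"] == issue)
    return remedies.most_common(1)[0][0] if remedies else None

counts = monthly_counts(issue_log)
print(counts[("2024-01", "missing")])            # 2
print(most_common_remedy(issue_log, "missing"))  # previous_value
```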

        The bottom line is to make sure your information architecture includes plans to capture and retain the original data, good or bad. Make sure you clearly label the data as original or corrected so those who want to look at the original data can do so. Use this information to create data quality scorecards and set goals for improving or maintaining data quality. Most importantly, identify and resolve the source of data quality problems so your organization can operate with the most accurate data possible.
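
        A scorecard can be as simple as an issue rate compared against a goal. The function below is one possible shape for it, assuming you already have issue counts like those discussed earlier; the goal values and counts in the usage example are made up.

```python
from typing import Dict

def scorecard(total_records: int,
              issue_counts: Dict[str, int],
              goals: Dict[str, float]) -> Dict[str, Dict]:
    """Compare the observed rate of each issue type with its goal (max allowed rate)."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    rows = {}
    for issue, goal in goals.items():
        rate = issue_counts.get(issue, 0) / total_records
        rows[issue] = {"rate": rate, "goal": goal, "met": rate <= goal}
    return rows

# Hypothetical numbers: 1,000 records, with a goal of at most 1% per issue type.
card = scorecard(
    total_records=1000,
    issue_counts={"missing": 20, "out_of_range": 5},
    goals={"missing": 0.01, "out_of_range": 0.01},
)
for issue, row in card.items():
    status = "met" if row["met"] else "not met"
    print(f"{issue}: {row['rate']:.1%} (goal {row['goal']:.1%}, {status})")
```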

        Regards,

        David Menninger

        Authors:

        David Menninger
        Executive Director, Technology Research

        David Menninger leads technology software research and advisory for Ventana Research, now part of ISG. Building on over three decades of enterprise software leadership experience, he guides the team responsible for a wide range of technology-focused data and analytics topics, including AI for IT and AI-infused software.
