David Menninger's Analyst Perspectives

Dataiku Streamlines AI/ML

Written by David Menninger | Apr 14, 2021 10:00:00 AM

Organizations are becoming increasingly data-driven and are looking for ways to accelerate their use of artificial intelligence and machine learning (AI/ML). Developing and deploying AI/ML models can be complicated, often requiring several different tools and services to manage the work from end to end. Accessing and preparing data is the most common challenge organizations face in this process, and consequently AI/ML vendors typically incorporate tools to address it. But there are many other steps as well, such as coordinating the handoff between data scientists and IT or software engineers for deployment to production. Each of these handoffs can slow down the entire data-to-insights process. End-to-end platforms for AI offer the promise of simplifying these processes, allowing teams that work with data to improve organizational results.

Our market assertion is that through at least 2022, AI/ML software platforms will remain largely independent of business intelligence (BI) platforms and, as a result, will require three-quarters of organizations to maintain multiple, separate skill sets. AI/ML deployment requires specialized skills not only for model development but also for operational processes, including data operations (DataOps) and ML operations (MLOps). Our Machine Learning Dynamic Insights research identifies accessing and preparing data, limited budget and a lack of skilled resources as the top three challenges organizations face in applying ML.

Robust AI platforms should help organizations manage the personnel and operational aspects of AI/ML as well as model development and deployment.

Dataiku is a data science and ML platform offering a suite of data-science tools that lets business personnel, analysts and data scientists collaborate on AI/ML. It is a higher-level tool, with integrations for ML libraries such as TensorFlow and notebook support for model development in Python, R and Scala. It also includes an AutoML interface and a workflow-orchestration tool to manage the data and tasks involved in end-to-end AI/ML processing. Dataiku also supports collaboration among the various participants and governance of the various assets involved. Dataiku’s AutoML is a no-code ML platform that allows data scientists to upload data as spreadsheets, choose a target variable, and have the platform select and automatically optimize an ML model. Dataiku for MLOps allows data scientists to track and visualize drift over time for all models and to implement automatic data validation policies. Dataiku also provides a model deployment API to help automate, operationalize and monitor data pipelines.
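To make the idea of tracking drift more concrete, here is a minimal sketch of one widely used drift measure, the Population Stability Index (PSI), which compares how a feature was distributed at training time with how it looks in recent data. This is a generic illustration in plain Python, not Dataiku's implementation or API; the function name `psi`, the choice of ten equal-width bins and the small floor for empty bins are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature.

    Hypothetical helper for illustration only: bins are equal-width and
    derived from the reference sample.
    """
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread to bin")
    width = (hi - lo) / bins

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # A small floor (an assumption) avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    # Each term is non-negative, so the index is zero only when the shares match.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]   # feature values seen at training time
    recent = [v + 0.3 for v in training]       # the same feature after a shift
    print("no drift:", psi(training, training))
    print("shifted: ", psi(training, recent))
```

A monitoring job could compute an index like this per feature on a schedule and raise an alert when it crosses a threshold the team has agreed on; the right threshold depends on the data and is not something this sketch prescribes.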

Version 8.0, released last year, added new functionality to create Dataiku applications, automated model documentation and a feature called ‘Flow Zones’. Dataiku applications allow designers to make their projects available as an application, enabling business personnel to set parameters, upload data and run the applications. ‘Flow Zones’ help organize larger orchestration workflows into sub-parts, called zones, making them easier to read and understand. Dataiku recently released version 9.0, adding interactive scoring, data-preparation enhancements, model diagnostics and model assertions, among other features.
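The general idea behind a model assertion can be sketched in a few lines: a domain expert states what the model should predict for a particular slice of the data, and the check fails if too few predictions agree. The example below is a generic illustration, not Dataiku's API; the `Assertion` class, the `check` function, the toy `predict` rule and the field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

Row = Mapping[str, object]


@dataclass
class Assertion:
    """Hypothetical container for one expert expectation about a model."""
    name: str
    applies_to: Callable[[Row], bool]  # selects the slice of rows the rule covers
    expected: object                   # prediction expected on that slice
    min_ratio: float                   # share of the slice that must match


def check(assertion: Assertion, rows: Sequence[Row], predict: Callable[[Row], object]) -> bool:
    """Return True when enough predictions on the slice match the expectation."""
    subset = [r for r in rows if assertion.applies_to(r)]
    if not subset:
        return False  # nothing to verify; fail so the gap gets noticed
    hits = sum(1 for r in subset if predict(r) == assertion.expected)
    return hits / len(subset) >= assertion.min_ratio


if __name__ == "__main__":
    # Toy stand-in for a trained model: approve when income covers the loan.
    def predict(row: Row) -> str:
        return "approve" if row["income"] >= row["loan"] else "reject"

    rows = [
        {"income": 90, "loan": 10},
        {"income": 80, "loan": 20},
        {"income": 5, "loan": 50},
    ]
    rule = Assertion(
        name="high earners are approved",
        applies_to=lambda r: r["income"] >= 50,
        expected="approve",
        min_ratio=0.9,
    )
    print(rule.name, "->", check(rule, rows, predict))
```

Checks of this kind are useful because they encode business knowledge that aggregate accuracy metrics do not capture, and they can be rerun each time a model is retrained.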

As noted above, the development and deployment of AI/ML requires specialized tools, specialized skills and multiple steps. Dataiku recognizes and addresses the various steps in the process and, in its latest releases, has added features that make those steps easier to perform, including enhancements to data preparation, model development, deployment and governance. Our research shows that AI/ML is used in many business departments, all of which can benefit from these new features to get more value from their data-science investments.

Organizations need to find ways to continue to expand their use of AI/ML. It’s complicated, but Dataiku has made it easier to build data-science models and package them into self-service applications that accelerate business results in any department. But the work is not done. Dataiku should continue to focus on making it easier to produce and consume models. Further integration with other parts of data and analytics ecosystems, such as BI tools, will make Dataiku even more valuable. The value of AI/ML can be multiplied throughout an organization if the output of the process is easily shared with others. Not everyone is a data scientist, but nearly everyone can benefit from AI/ML. Data science teams should consider Dataiku not only for developing models but also for managing the end-to-end processes required for successful AI/ML deployments.

Regards,

David Menninger