I’ve never been a fan of talking about semantic models, because most of the workforce probably doesn’t understand what they are or doesn’t recognize them by name. But the findings in our recent Analytics and Data Benchmark Research have changed my mind: they show how important a semantic model can be to the success of data and analytics processes. Organizations that have successfully implemented a semantic model are more than twice as likely to be satisfied with analytics (77%) as organizations overall (33%). So I owe it to all of you to write about them.
So, let’s start with a definition. A semantic model is a way to describe the organization of a set of information, including the relationships within that information set. I deliberately chose the term “information” here instead of “data.” I also chose to define “semantic model,” not “semantic data model.” The reason for that distinction is the importance of metrics and other types of information that are derived from the raw data. For a semantic model to be truly useful, it must include the calculated pieces of information that are critical to understanding any organization’s operations.
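As a rough illustration of a semantic model that includes calculated information and not just raw data, here is a minimal Python sketch. It keeps business definitions of raw fields, relationships and derived measures in one place. The class names, field names and sample measures (`revenue`, `gross_margin_pct`) are hypothetical and not taken from any product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical raw record: one order line with price, quantity and unit cost.
Row = Dict[str, float]


@dataclass
class Measure:
    """A calculated piece of information, defined once in the model."""
    name: str
    description: str
    formula: Callable[[List[Row]], float]


@dataclass
class SemanticModel:
    """Toy semantic model: field definitions, relationships and derived measures."""
    fields: Dict[str, str]                       # field name -> business definition
    relationships: List[Tuple[str, str, str]]    # (from entity, to entity, cardinality)
    measures: Dict[str, Measure] = field(default_factory=dict)

    def add_measure(self, measure: Measure) -> None:
        self.measures[measure.name] = measure

    def evaluate(self, name: str, rows: List[Row]) -> float:
        # Every consumer computes the measure from the same shared definition.
        return self.measures[name].formula(rows)


def revenue(rows: List[Row]) -> float:
    return sum(r["price"] * r["quantity"] for r in rows)


def cost(rows: List[Row]) -> float:
    return sum(r["unit_cost"] * r["quantity"] for r in rows)


def gross_margin_pct(rows: List[Row]) -> float:
    rev = revenue(rows)
    return (rev - cost(rows)) / rev if rev else 0.0


model = SemanticModel(
    fields={
        "price": "Unit selling price in the order currency",
        "quantity": "Units sold on the order line",
        "unit_cost": "Standard cost per unit",
    },
    relationships=[("order_line", "order", "many-to-one")],
)
model.add_measure(Measure("revenue", "Sum of price x quantity", revenue))
model.add_measure(Measure("gross_margin_pct", "(Revenue - cost) / revenue", gross_margin_pct))

rows = [
    {"price": 10.0, "quantity": 3, "unit_cost": 6.0},
    {"price": 20.0, "quantity": 1, "unit_cost": 12.0},
]
print(model.evaluate("revenue", rows))           # 50.0
print(model.evaluate("gross_margin_pct", rows))  # 0.4
```

Because the measure definitions live in the model rather than in each individual report, everyone who asks for `gross_margin_pct` gets the same calculation.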
Creating and sharing a semantic model within an organization provides a way to make sure everyone understands and agrees on the definitions included in the model. It’s important to understand that the semantic model is a logical or conceptual model. It can be implemented as a physical model using a variety of technologies. One of the most common ways to implement a semantic model for analytics processes is with a relational database. Nearly three-quarters (72%) of organizations report using relational databases as the data platform for analytics processes, making it the most popular choice.
But it is very difficult to capture and express an organization’s entire semantic model in relational technology. In particular, it is difficult to express inter-row calculations, for example, interest expense as a function of the prior period’s outstanding loan balance or, a much more complicated example, the net present value of a future stream of cash flows. You might be able to express some of these concepts, but it’s not what the relational model was designed for, and it’s not easy.
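To make the inter-row problem concrete, here is a minimal Python sketch of both examples. It assumes a hypothetical loan with a fixed payment per period, where interest for period t is the rate times the closing balance of period t-1, and it assumes the first cash flow in the net present value arrives one period from now, so NPV is the sum of each cash flow divided by (1 + rate)^t. The function names `loan_schedule` and `npv` are illustrative only.

```python
from typing import List, Tuple


def loan_schedule(
    principal: float, rate_per_period: float, payment: float, periods: int
) -> List[Tuple[int, float, float]]:
    """Return (period, interest_expense, closing_balance) for each period.

    Each period's interest is charged on the previous period's closing
    balance, so every row depends on a value computed in the row before it.
    """
    rows = []
    balance = principal
    for period in range(1, periods + 1):
        interest = balance * rate_per_period    # uses the prior-period balance
        balance = balance + interest - payment  # carried into the next period
        rows.append((period, interest, balance))
    return rows


def npv(rate: float, cash_flows: List[float]) -> float:
    """Net present value, assuming the first cash flow arrives one period from now."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


for period, interest, balance in loan_schedule(1000.0, 0.01, 100.0, 3):
    print(period, round(interest, 2), round(balance, 2))
print(round(npv(0.10, [110.0, 121.0]), 2))  # 200.0
```

Each interest figure depends on the closing balance calculated for the row before it, so the rows cannot be computed independently of one another. Set-oriented relational queries handle that kind of running dependency awkwardly, typically requiring recursive queries or procedural code, which is one reason such calculations often end up in an analytics engine.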
More likely, you would implement the calculations of your semantic model in an analytics engine. I’ll use this term to include business intelligence products, online analytical processing engines, planning products and other similar technologies that have ways to express a variety of calculations. These would include artificial intelligence and machine learning technologies as well. Given the proliferation of AI/ML models, it is important for organizations to capture the outputs of these models, too.
Organizations that have successfully implemented a semantic model/layer:
- Are significantly more satisfied with analytics (77% compared with 33% overall)
- Have more of the workforce engaged in analytics (43% have more than one-half of the workforce using analytics, compared with 23% overall)
- Find analytics capabilities completely adequate (62% vs. 33% overall)
- Say data governance capabilities are completely adequate (51% vs. 25% overall)
- Are more comfortable with self-service (54% very comfortable vs. 14% overall)
Those are significant differences. I have to include the standard caution that correlation is not necessarily causation. However, whether or not the relationship is causal, a semantic model appears to be a best practice among organizations that are succeeding in their analytics efforts.
One of the challenges of managing a semantic model is that it is likely to be implemented in a combination of technologies. Most likely, some portion of the model is implemented in a relational database, and then one or more analytics engines are used to implement some of the calculated portions of the model. Semantic model definitions should be captured as part of an organization’s data catalog, which is a natural place to capture and share this information. Unfortunately, most data and analytics technology vendors have implemented their own catalogs, and there is not yet enough integration among them. In fact, we assert that through 2024, more than one-half of all organizations will continue to use multiple data catalog technologies, a choice that leads to silos of knowledge about the organization’s information.
So, let’s start talking about semantic models more often. They appear to have a positive effect on the success of an organization’s data and analytics processes. There is still more work to do on capturing and sharing semantic models across data and analytics technology boundaries, but that shouldn’t stop organizations from embracing semantic models today.