Analytics is a System

Aug 10, 2022

Hey folks, I’ve had a draft of this lying around for a while. I recently came across some quotes from the book discussed below and decided to finish it up. I think these ideas wrap up neatly enough for a first post. Enjoy!

Analytics is a likely candidate for an organization’s first data use case

Someone eventually asks about sales trends for a particular product line, broken down by key regions. Someone else asks what the lift in usage was after that marketing campaign last quarter (that one was pretty expensive, wasn’t it?).

You can get pretty far in answering questions like these with some combination of built-in analytics from SaaS tools and an extensive collection of spreadsheets. When requirements and data sources get more varied though, specialized tooling usually enters the picture. Fast-forward, and you likely now have something that starts to look like the modern data stack. For the most part, this means collecting data, storing and potentially transforming it in a data warehouse, and visualizing it in some reporting or business intelligence tool.

In this article, I’ll share a few thoughts about this kind of analytics process. In particular, I’ll talk about analytics viewed through the lens of systems thinking as Donella H. Meadows describes it in Thinking in Systems: A Primer.

Thinking in Systems

At the risk of delivering a primer to a primer, here are three of the main ideas described early on in the book:

A system is a composition of interconnected elements that share a purpose - think the organs responsible for breathing or a company that sells a product.
A stock is an element of the system that accumulates with activity - for example, the oxygen or carbon dioxide that accumulates in the body during breathing.
A flow is a process that increases or decreases a stock - for example, the sales that decrease inventory and increase revenue.
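
To make the stock and flow definitions concrete, here is a minimal sketch of the sales example in Python. The `Stock` class, the `sell` function, and the numbers are my own illustrative assumptions, not anything taken from the book.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    """Something that accumulates (or drains) as the system runs."""
    name: str
    level: float


def sell(inventory: Stock, revenue: Stock, units: int, unit_price: float) -> None:
    """A flow: one sale decreases inventory and increases revenue."""
    if units > inventory.level:
        raise ValueError("cannot sell more units than are in inventory")
    inventory.level -= units
    revenue.level += units * unit_price


# Hypothetical starting levels and prices, chosen only for illustration.
inventory = Stock("inventory", 100)
revenue = Stock("revenue", 0.0)

sell(inventory, revenue, units=30, unit_price=2.5)
print(inventory, revenue)
```

The point of the sketch is that the stocks only change through the flow: if you want to know why inventory is low, you look at what `sell` has been doing.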

The author goes on to expand these definitions, discussing types of systems, the significance of particular stocks and flows as well as mechanisms for affecting them, and common failure modes within systems. I recommend giving it a read if you’re already intrigued!

Analytics as a System

Returning to our discussion about tooling — there are some clear parallels between these ideas and parts of the data stack. The purpose of the analytics system is to answer questions with data. Data collection is a flow that increases your stock of stored data, transformation is a flow that increases your stock of data that is fit for analytics, and reporting and visualization is a flow that increases your stock of answered questions.
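
Here is a rough sketch of those three flows in Python. Everything in it is a simplifying assumption of mine: the class and method names are hypothetical, data is counted in rows, transformation has a fixed capacity per period, and a question can only be answered once some analytics-ready data exists.

```python
from dataclasses import dataclass


@dataclass
class AnalyticsSystem:
    """Three stocks, each increased by one flow."""
    stored_rows: int = 0            # increased by collection
    analytics_ready_rows: int = 0   # increased by transformation
    answered_questions: int = 0     # increased by reporting

    def collect(self, rows: int) -> None:
        """Collection flow: add newly collected rows to storage."""
        self.stored_rows += rows

    def transform(self, capacity: int) -> int:
        """Transformation flow: make stored rows fit for analytics,
        limited by how much the transformation jobs can handle."""
        backlog = self.stored_rows - self.analytics_ready_rows
        moved = min(backlog, capacity)
        self.analytics_ready_rows += moved
        return moved

    def report(self, questions: int) -> int:
        """Reporting flow: answer questions, but only if there is
        analytics-ready data to answer them with."""
        if self.analytics_ready_rows == 0:
            return 0
        self.answered_questions += questions
        return questions


system = AnalyticsSystem()

# Period 1: lots of data arrives, transformation only keeps up with part of it.
system.collect(1000)
system.transform(capacity=600)
system.report(questions=3)

# Period 2: more data arrives and the untransformed backlog grows.
system.collect(500)
system.transform(capacity=600)
system.report(questions=2)

backlog = system.stored_rows - system.analytics_ready_rows
print(system, "untransformed backlog:", backlog)
```

Even in a toy like this, a gap between stocks (stored data that never becomes analytics-ready) points at the flow that needs attention.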

The technologies that enable these flows make up the mechanics of analytics. If you pick a streaming collection tool, you will have a stock of streaming jobs. Similarly, if you choose a SQL-based reporting tool, you will have a stock of SQL queries. Integrating these technologies is akin to connecting the stocks and flows in the analytics system.

If technology is a critical (sub)system within the analytics system, personnel is another, even more critical, subsystem. Being a system, it has its own stocks, flows, and interactions with other systems. For example, a good data scientist or analyst can act as a feedback loop, balancing the stock of analytics needs before a single query is written. This raises a few questions. What needs to be in place to hire, engage, and retain people for these roles? What responsibilities, technical and otherwise, will they have once hired?

A quick aside: I think this line of thinking has struck a chord in the data industry, and the culmination of that thinking seems to be the role of the analytics engineer. It is now someone’s job to think of analytics as a system. In fact, I am one of those people. More on that in future articles 😉

Realistically, there are often several systems in action at any given time. Each of these has elements that may also be elements (or entire subsystems) of other systems. Take, for example, a company that does machine learning, causal inference, and analytics. A data lake designed to store training data for machine learning will likely not instantly meet the requirements of storing a dimensional model for analytics or statistics for causal inference. In the personnel system, machine learning engineering, statistics, and data analysis roles carry different responsibilities, have distinct hiring processes, and may not even be in the same reporting structure.

When is this useful?

The framework of thinking in systems is a pretty general one. In the data context and more broadly, I think the primary value comes from the fact that it encourages looking at a problem from multiple perspectives and levels of abstraction. You can trace the stocks and flows at multiple levels to uncover where systems and their elements diverge in their intended goals or outputs. Fixing these divergences then "just" means adjusting the elements themselves or their configuration within the system.

Thank you for reading! I hope this article motivates you to consider the analytics system in your organization. I think this kind of idea benefits a lot from context, so if your system looks entirely different from what I’ve described above, drop me a line!

Thanks to J.K. Dru for early feedback and discussion.

’til next time

— Alex

