Where data roles come from
A few weeks of travel have pushed me slightly off schedule… but I’m still here! In this article, I decided to share my perspective on what I think most data work is for. Enjoy!
“Working” with “Data”
I started this newsletter to talk about working with data. For the most part, I meant the bits and bytes, problems like moving data around and handling technical requirements related to volume, latency, modeling, compute, etc.
Another interesting interpretation of working with data is working with people in data roles. While data-the-artifact does many things, data-the-group mostly does only one thing: support decision-making.
At least, decision-making is what I keep coming back to, regardless of my role in data. In some cases, this means supporting people who are making decisions. In others, it means maintaining machines or automated systems that make decisions. Most data work I’ve done fits into one or both of these buckets. The parameters of that decision-making determine the fine points of each role. In other words, data roles can be characterized by how they support decision-making.
When someone needs to decide whether to ship a feature, shift strategic priorities, or ramp up work on a product, they have the option of consulting The Data(TM). This consulting can mean anything from launching a full deep dive to looking at a dashboard for a few seconds. There are different schools of thought regarding how data should factor into decision-making. A few keywords for your web search are “data-informed” and “data-driven”.
The primary concern for most people in data roles is making data available and supporting this decision-making process in whatever form it takes. The constraints of the decision-making process create the need for the breadth of roles in data. Here are a few examples:
High-quality dashboards and other communication are essential if data is shared widely with potential decision-makers. Data Analysts tend to have the skill set to do this communication.
Data must often arrive on a fixed cadence and in a consistent format. This requirement calls for Data Engineers with expertise in building pipelines.
Decisions often have multiple potential outcomes. Data Scientists are in charge of understanding these and quantifying how probable they are, sculpting vague ideas into solid hypotheses and measurements.
While many decision-makers tend to be in leadership roles, most people make decisions that could benefit from data. Consider a software developer working on optimizing serialization in a high-traffic part of a web application.
Let’s say that after two weeks of designing, coding, and testing, she finishes a solid prototype. One option is to ship the prototype and compare serialization times from before and after. Instead, she engages the data team and plans an A/B test. It turns out that the prototype has faster serialization, that part of the web application has faster load times, and there’s a statistically significant increase in revenue.
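Experimentation often gets a negative reputation as a decision gatekeeper. In scenarios like this, it bolsters decision-making: the experiment surfaces more of the decision’s impact (load times and revenue, not just serialization times) than a simple before-and-after comparison would.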
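To make “statistically significant” a little more concrete, here is a minimal sketch of one way such a comparison could be checked: a one-sided permutation test on revenue per user. The function name, the numbers, and the choice of test are all my own assumptions for illustration; a real experiment would have far more users, and the data team might well use a different method.

```python
import random


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """One-sided permutation test: is the treatment mean larger than the control mean?

    Returns the observed difference in means and an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_control = len(control)
    observed = sum(treatment) / len(treatment) - sum(control) / n_control

    pooled = list(control) + list(treatment)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Shuffle the group labels: if the prototype had no effect,
        # any split of the pooled values is as plausible as the real one.
        rng.shuffle(pooled)
        fake_control = pooled[:n_control]
        fake_treatment = pooled[n_control:]
        diff = sum(fake_treatment) / len(fake_treatment) - sum(fake_control) / n_control
        if diff >= observed:
            at_least_as_large += 1

    # Add one to numerator and denominator so the estimate is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical revenue per user (made-up numbers, not from a real experiment).
control_revenue = [10.0, 12.5, 9.0, 11.0, 10.5, 9.5, 12.0, 10.0]
treatment_revenue = [12.0, 13.5, 11.5, 14.0, 12.5, 13.0, 11.0, 13.5]

lift, p_value = permutation_test(control_revenue, treatment_revenue)
print(f"observed lift: {lift:.4f}, p-value: {p_value:.4f}")
```

A small p-value here says that a lift this large would rarely appear if the prototype made no difference. Whether that is enough to ship is still a decision for a person to make.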
I won’t bury the lede here; when I talk about automated systems making decisions using data, I’m talking about machine learning. By “automated system,” I mean one that produces an output with little or no human intervention. By “using data,” I mean training a model that computes that output. For example, some systems decide what someone said, what clothing to recommend, or even how to play a game.
Much like human decision-making, automated decision-making has constraints that create the need for data roles:
In a production ML system, models are trained, evaluated, and updated using data created by user interactions. This process also needs to be continuously monitored for regressions in user experience or platform health. Machine Learning Engineers build pipelines to support this.
Machine learning is quite a dynamic field. Every year researchers develop new methods and beat existing benchmarks. Much of this work is done in academia, but Research Scientists in industry also do a sizeable amount.
There is often a gap between research methods and production implementation. Research Engineers transform ideas from research methods into executable and deployable artifacts.
This decision-making perspective has been helpful for me because it contextualizes a lot of data work. It turns updating yet another pipeline or dashboard into work that supports tangible goals. Most people value feeling like their work is part of something larger.
For human decision-making, it can provide a yardstick to measure the data group’s effectiveness. If decisions are consistently made without data, data people are probably being ignored. If decisions supported by data are made too slowly, there might be a lack of alignment or a gap in skill sets. Either way, this perspective can highlight undesirable situations.
Finally, I’ll note that it’s particularly important that automated systems be seen as decision-makers. Making a decision implies a sense of certitude and often finality. Once a system makes and acts on a decision, it usually can’t take back the decision or any harm it caused. Fortunately, many constraints that follow directly from this significance, such as fairness, interpretability, accountability, transparency, and privacy, are creating new work and responsibilities for folks in data roles.
Thank you for reading!
’Til next time