A DataOps strategy relies heavily on collaboration as data flows between managers and consumers throughout the enterprise. Collaboration is essential to the success of DataOps, so it’s important to start with the right team to drive these initiatives forward.
It’s natural to think of DataOps as just DevOps for data, but that is not quite right. It would be more accurate to say that DataOps is trying to do for data what DevOps does for coding: deliver a dramatic improvement in productivity and quality. However, DataOps has other problems to solve, in particular how to keep a mission-critical system in continuous production.
The distinction matters when building a DataOps team. If DevOps is used as the template, with Product Managers, Scrum Masters, and Developers, the team’s focus will end at delivery. DataOps must also focus on ongoing maintenance, and it requires other frameworks to work with.
A key influence on DataOps has been lean production techniques. Managers often use terms borrowed from the classic Toyota Production System, which has been much studied and imitated. There are also terms like data factory when talking about data pipelines in production.
This approach requires a distinctive team structure. Let’s first look at some roles within a DataOps team.
Key Roles for DataOps
The roles described here are for a DataOps team that implements data science in mission-critical production.
What about teams that are less focused on data science? Do they also need DataOps, for example for a data warehouse? Certainly, some of the techniques may be similar, but a traditional extract, transform, and load (ETL) team of developers and data architects will likely work just fine. A data warehouse, by its nature, is less dynamic and more constant than an Agile pipelined data environment. The following DataOps team roles handle the much more volatile world of self-service pipelines, algorithms, and users.
However, DataOps techniques are becoming more relevant as data warehousing teams strive to become increasingly agile, especially with cloud deployments and data lake architectures.
Let’s start by defining the roles required for these new analytics techniques.
The data scientist
Data scientists investigate. If an organization knows what it wants and just needs someone to implement a predictive process, look for a developer who knows algorithms. The data scientist, on the other hand, explores for a living, discovering what is relevant and meaningful as they do so.
In the course of exploration, a data scientist may test numerous algorithms, often on ensembles of various models. They can even write their own algorithms.
Key attributes for this role are restless curiosity and interest in the domain, as well as technical acumen, especially in statistics, to understand the importance of what they discover and the real-world impact of their work.
This diligence matters. It is not enough to find a good model and stop there because business domains evolve rapidly. Additionally, while not everyone may work in areas with compelling ethical dilemmas, data scientists in all domains sooner or later run into personal or business privacy issues.
This is a technical role, but don’t overlook the human side, especially if the organization is only hiring a data scientist. A good data scientist is a good communicator who can explain findings to a non-technical audience, often executives, while also being forthright about what is and isn’t possible.
Finally, the data scientist, especially one working in a domain that is new to them, is unlikely to know all operational data sources (ERP, CRM, HR systems, etc.), but they certainly need to work with the data. In a well-governed system, they may not have direct access to all of a company’s raw data. They need to work with other roles that have a better understanding of source systems.
The data engineer
In general, it is the data engineer who moves data between the operational systems and the data lake and, from there, between the areas of the lake, such as the raw, cleaning, and production areas.
The data engineer also supports the data warehouse, which can be a demanding task in itself, as it must maintain history for reporting and analysis while providing ongoing development.
At one time, the data engineer may have been called a data warehouse architect or an ETL developer, depending on their experience. But data engineer is the new technical term and better captures the operational focus of the role in DataOps.
The DataOps engineer
Another engineer? Yes, and this one is also focused on operations. But the DataOps engineer has a different area of expertise: supporting the data scientist.
Data scientist skills focus on modeling and gaining insights from data. However, it is common to find that what works well on the workbench can be difficult or expensive to implement in production. Sometimes an algorithm runs too slowly on a production dataset, or it uses too much compute or storage to scale effectively. The DataOps engineer helps here by testing, tuning, and maintaining models for production.
As part of this, the DataOps engineer knows how to keep a model’s scores accurate enough over time as the data drifts. They also know when the model needs to be retrained or reconceptualized, even if that job falls to the data scientist.
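As a rough illustration of the monitoring idea described above, the following Python sketch compares a model's recent accuracy with a baseline and flags it for retraining when the gap grows too large. Everything here is an assumption made for illustration: the function names rolling_accuracy and needs_retraining, the window size, and the 0.05 tolerance are hypothetical, and a real team would choose metrics and thresholds to suit its own models and data.

```python
from statistics import mean


def rolling_accuracy(y_true, y_pred, window):
    """Return accuracy over the most recent `window` predictions."""
    recent = list(zip(y_true, y_pred))[-window:]
    if not recent:
        raise ValueError("no predictions to evaluate")
    # Score 1.0 for each correct prediction and 0.0 for each miss.
    return mean(1.0 if t == p else 0.0 for t, p in recent)


def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag the model when recent accuracy drops more than `tolerance`
    below the baseline measured at deployment (hypothetical threshold)."""
    return (baseline_accuracy - recent_accuracy) > tolerance


# Illustrative labels and predictions, not real data.
actual = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 1, 0, 1, 0]

recent = rolling_accuracy(actual, predicted, window=4)
print(recent, needs_retraining(0.90, recent))
```

A check like this only signals that something has changed. Deciding whether to retrain the model or rethink it remains a judgment call shared with the data scientist.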
The DataOps engineer keeps models running within budget and resource constraints that they probably understand better than anyone else on the team.
The data analyst
In a modern organization, the data analyst can have a wide range of skills, ranging from technical knowledge to an aesthetic understanding of visualization and so-called soft skills, such as collaboration. They are also less likely to have had much technical training compared to, say, a database developer.
Their ownership of data, and their influence, may depend less on their position in the organizational hierarchy and more on their personal commitment and willingness to take ownership of an issue.
These people are in all departments. Look around. Someone is “the data person” who, regardless of position, knows where the data is, how to work with it, and how to present it effectively.
To be fair, this role is becoming more formalized today, but there are still a large number of data analysts who have come to the role from a business background rather than a technical one.
The executive sponsor
Is the executive sponsor a member of the team? Maybe not directly, but the team won’t get far without one. A C-level sponsor can be instrumental in aligning the specific work of a DataOps team with the company’s strategic vision and tactical decisions. They can also ensure that the team has budget and resources with long-term goals in mind.
Adapt the team to the organization
Few organizations can, or will, immediately build a team of four or more just for DataOps. The capabilities and value of the team must grow over time.
How, then, should a team grow? Who should be the first employee? It all depends on where the organization starts from. But there has to be an executive sponsor from day zero.
It is unlikely that the team is starting from scratch. Organizations need DataOps precisely because they already have work in progress that needs to be better operationalized. They may have started looking at DataOps because they have data scientists pushing the boundaries of what they can manage today.
If so, the first hire should be a DataOps engineer because their role is to operationalize data science and make it manageable, scalable, and comprehensive enough to be mission critical.
On the other hand, an organization may have a traditional data warehouse, with data engineers working on it and data analysts downstream of them. In this case, the first hire for the DataOps team would be a data scientist for advanced analytics.
An important question is whether to create a formal organization or a virtual team. This is another important reason for the executive sponsor, who may have a lot to say in the answer. Many DataOps teams start out as virtual groups that work across organizational boundaries to ensure data and data flow are reliable.
Whether loosely or tightly organized, these discrete disciplines grow in strength and impact over time, and their strategic direction and use of resources will coalesce into a cohesive framework for exploration and delivery. As this happens, the organization can add more engineers to scale and govern, and more scientists and analysts to gain insights. At this point, wherever the organization started, the team is likely to be more formally organized and recognized.
It is an exciting process. The DataOps team can make the difference between a company that occasionally does great things with data, and a company that runs efficiently and reliably on data, analytics, and insights.