In 2021, humans created 2.5 quintillion bytes of data every day. And as companies increasingly rely on data to generate business value, properly managing, storing, using, and securing all that information is a must.
That’s why data governance is vital for every organization.
The ability of organizations to capture and use data can be the difference between business success and failure — and yet, for many companies, it’s still unclear what a data governance framework is, why it’s important, and how to use it to gain operational advantage.
Data governance has two main purposes. The first is to ensure your organization’s data is of high quality throughout its lifecycle. The second is to ensure this high-quality data is available, accessible, and usable by those who need it.
To achieve this, enterprises need the trinity of people, processes, and technologies to work together, which requires managers to choose a specific approach to data governance and create relevant policies in turn.
The combination of this approach to data management and associated rules, processes, and policies is referred to as a data governance framework. How that framework is reflected in day-to-day operations is an operational model.
A data governance framework sets out an organization’s attitude to its data and specifies rules for data management and stewardship (i.e., formal accountability for the data). It may also describe tools and technologies to be acquired, KPIs and metrics, definitions of terms used, etc.
Importantly, a data governance framework is not designed to achieve a single objective after which it becomes obsolete. Instead, it is a tool intended to embody data-related processes, standards, and outcomes over the long term.
Developing a data governance framework requires collaboration between managers, data owners, and data users. The resulting framework usually takes the form of a document, repository, and/or set of rules that states:
The organization’s approach to data management and deployment
The roles and responsibilities of data stewards and others involved in managing data
How data is classified (e.g., confidential/non-confidential; commercially sensitive or otherwise), organized, stored, accessed, shared, and used
Standards for data quality and security
The regulations and laws to be complied with and the associated documentation
The desired outcomes and how progress/achievement is measured
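Where the framework is kept as a repository or set of rules, some of these items can be written down in machine-readable form. The following is a minimal sketch in Python, assuming one policy record per dataset and completeness as the only quality measure. All names here (`DatasetPolicy`, `steward`, `min_completeness`, the classification levels) are hypothetical and are not taken from any particular framework.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    """Illustrative classification levels; a real framework defines its own."""
    NON_CONFIDENTIAL = "non-confidential"
    CONFIDENTIAL = "confidential"
    COMMERCIALLY_SENSITIVE = "commercially sensitive"


@dataclass
class DatasetPolicy:
    """One hypothetical policy record per dataset."""
    dataset: str
    steward: str                          # person formally accountable for the data
    classification: Classification
    regulations: list[str] = field(default_factory=list)  # e.g. ["GDPR"]
    min_completeness: float = 0.95        # assumed quality standard


def completeness(rows: list[dict], columns: list[str]) -> float:
    """Share of expected cells that are present and not None."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(1 for row in rows for col in columns if row.get(col) is not None)
    return filled / total


def meets_quality_standard(policy: DatasetPolicy, rows: list[dict], columns: list[str]) -> bool:
    """Compare the measured completeness with the threshold in the policy."""
    return completeness(rows, columns) >= policy.min_completeness
```

For example, a `customers` dataset with a threshold of 0.9 and one missing email among four expected cells has a completeness of 0.75, so it would fail this check. A real framework would track more than one quality measure, and many of its rules (roles, desired outcomes) remain better suited to prose than to code.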
A data governance framework delivers benefits throughout the organization and to its stakeholders. These can be divided into top-level organizational gains and operational improvements. Operational benefits include:
Better processes throughout the enterprise, from IT to customer service
Regulatory and legal compliance and accurate documentation
Elimination of inaccurate, old information from data pipelines
Better communication between data owners and users
Availability of larger volumes of data
Easier management of formal data provision, such as Freedom of Information, subject access requests, and proof of advertising claims
Top-level benefits for the organization include:
Commercial advantage, e.g., market/customer behavior insights arising from the analysis of clean, current, and complete datasets
Better and more accurate, informed, and timely business decisions
Reduction of risk in multiple areas that have the potential to inflict financial or reputational harm, e.g., reduced risk of data breach, regulatory non-compliance, or failure to correctly perceive customer behaviors or enterprise capacity
Access to digital supply chains
As noted above, every organization must choose an approach to data governance. This choice influences not only the content of its data governance framework but also how that framework is deployed and managed. Three main data governance strategies (command and control, traditional, and non-invasive) and their corresponding frameworks reflect best practices, and there are prototypes for each of these approaches that you can use “off the shelf”.
In a command-and-control framework, responsibility for data governance is usually assigned to those who already have executive and/or managerial power. It’s an authoritative, “top-down” approach that distributes power following the hierarchy of the organization. It often reflects a desire to take control of the organization’s data and focuses on measurable outcomes. There is a marked difference between data controllers (e.g., data analysts and IT team members exercising powers under the data governance framework) and data consumers (who may come from any part of the business with a need for data, like marketing, sales, and HR).
The DGI framework has a strong focus on whole-enterprise outcomes and controls, so it may be more easily implemented via a command-and-control approach.
In the traditional approach, staff with current managerial accountability for data are formally allocated responsibility. Although this also follows hierarchies, albeit within a specific part of the organization, it may be considered more technocratic than command and control. A traditional data governance framework will usually prioritize data quality and seek to democratize access to it.
The DAMA DMBOK provides formal standards for data governance and suggests that technical (IT) professionals and analysts should have responsibility for compliance, supported by cross-enterprise collaboration. It could therefore be considered traditional, although much depends on the culture and structure of the organization and the operational model used.
PwC’s enterprise data governance framework balances whole-enterprise objectives with optimal management and deployment of existing data resources, with a view to futureproofing. It is probably best described as traditional, but it has a strong future focus.
The non-invasive approach tends to confirm, or formalize, the roles that staff already have in relation to data. The focus is usually on improving processes and outcomes, including those arising from investment in new technologies.
McKinsey’s data governance framework allows more scope for creativity and organizational change than the previous two. It focuses on domain expertise and collaboration between data and domain experts. It can therefore be described as traditional or, depending on the operational model chosen, non-invasive.
Similarly, the Eckerson Group’s framework prioritizes people and defines processes and roles, so it’s non-invasive.
Operational models describe how a framework is carried out in daily business practice. Just as there are several types of data governance framework, there are several types of operational model, and together the framework and the operational model make up a robust data governance strategy. As noted below, certain frameworks correspond better with specific operational models. The main operational models are centralized, decentralized, and hybrid:
Roughly corresponding with the command and control/traditional approaches, a centralized operational model hands control of data to a small group of experts, usually in IT, and anyone requiring data must ask for it. This ensures control, but not agility, and is almost impossible to scale. In organizations of any size, data controllers quickly become overwhelmed with requests. This can lead to reduced quality and availability.
A decentralized operational model makes data available to many more people within the enterprise, including through self-service models. This minimizes backlogs and can enhance data flows. However, it can also increase risks relating to security and accountability.
A hybrid operational model balances the risks and benefits of the centralized and decentralized models, and it looks different for every organization.
While the hybrid model is perhaps the hardest to achieve (although there are tools to make this easier), it is a good approach, particularly where enterprises must balance the need to be competitive with demands for regulatory and legal compliance.
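The difference between the three operational models can be illustrated by how each would route a request for data. The sketch below is a deliberately simplified assumption: it treats the hybrid model as "self-service unless the data is confidential", which is only one of many possible balances. The names `Model` and `needs_central_approval` are hypothetical.

```python
from enum import Enum


class Model(Enum):
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"
    HYBRID = "hybrid"


def needs_central_approval(model: Model, confidential: bool) -> bool:
    """Return True if a data request must go through the central data team.

    Assumed rules, for illustration only:
    - centralized: every request is handled by the central team
    - decentralized: every request is self-service
    - hybrid: self-service unless the data is classified as confidential
    """
    if model is Model.CENTRALIZED:
        return True
    if model is Model.DECENTRALIZED:
        return False
    return confidential


# Under the assumed hybrid rule, only confidential data creates a request
# for the central team, which keeps the backlog smaller than full centralization.
requests = [("web_traffic", False), ("payroll", True), ("product_catalog", False)]
queue = [name for name, conf in requests if needs_central_approval(Model.HYBRID, conf)]
```

In this example `queue` contains only `payroll`. In practice the hybrid rule would depend on more than a single confidentiality flag, for instance on the requester's role or on the regulations that apply to the dataset.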
How you choose to implement your data governance framework will depend on the culture and nature of your enterprise and the risks that must be managed. Every organization will have its own unique data governance framework according to its business, technology, and the goals it wants to achieve.
However, there are some key questions to consider before setting up your own data governance framework.
What data do you have, where is it, and how is it currently being stored? What condition is it in?
What format is your data in? (Consider all formats at this stage.)
What silos (cloud, hybrid, on-premises) do you need to consider?
How does this data need to be consumed?
Which stakeholders need access to this data? When? Why?
What regulatory considerations do you need to incorporate (GDPR, CCPA, etc.)?
What are you trying to achieve with your data governance framework? (Consider framing these as SMART goals.)
What data will you have/need/process in the future?
These questions will all influence the framework and operational model you choose. For example, if you work in a public sector organization, such as a social services department, your operating culture likely focuses on people. Due to regulation and professional standards, you’ll need to prioritize ethics and data protection.
In this context, a flexible and diffused non-invasive approach (such as Eckerson) allows you to delegate responsibility for data protection to a wide range of staff while retaining strong governance throughout the organization and meeting regulatory standards.
Meanwhile, your staff’s duties will be formally recognized under the data governance framework, which aligns with a people-focused culture. It also helps you maintain data traceability and the ability to audit, while maximizing the security, quality, and availability of data required for decision-making.
However, things would look very different in another use case. In a highly competitive, hierarchical, and regulated sector (such as banking), you might be asked to prioritize risk management and ensure regulatory/legal compliance above all other concerns.
In this context, you might choose to roll out a command-and-control framework. This will reflect the reporting and management lines that already exist within the culture, with very clear lines of authority and duty to manage the massive financial and reputational risks involved.
With so many enterprise data governance strategies to choose from, and the scale of collaboration and documentation involved, you might find building and operating a framework a daunting prospect.
Fortunately, this doesn’t have to be the case if you choose your tools wisely. With Y42, you can easily keep data stewards accountable and reduce the margin of error for working processes, transforming a data team’s working style from reactive to proactive.
This is made possible by Y42’s native integration approach, which spans the whole data stack. Other key features include role-based access control, an asset ownership system, the implementation of data contracts, and data observability features, such as a data catalog and data lineage.
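As a tool-agnostic illustration of one of these ideas, a data contract can be thought of as an agreed schema that incoming data is checked against. The sketch below is plain Python and does not use or represent Y42's API. The contract format (column names mapped to Python types) and the names `ORDERS_CONTRACT` and `contract_violations` are assumptions made for this example.

```python
# Hypothetical contract: expected column names mapped to Python types.
ORDERS_CONTRACT = {"order_id": int, "customer_email": str, "amount": float}


def contract_violations(row: dict, contract: dict) -> list[str]:
    """List the ways a single row breaks the contract (empty list = conforms)."""
    problems = []
    for column, expected_type in contract.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(
                f"wrong type for {column}: expected {expected_type.__name__}"
            )
    return problems
```

A row such as `{"order_id": "1", "amount": 9.5}` would be reported with two violations: a wrong type for `order_id` and a missing `customer_email` column. Real data contracts typically also cover semantics, freshness, and ownership, which this sketch leaves out.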