A hitchhiker’s guide to Data City — 4 steps to build a clean interface between your data warehouse and BI tool

How to use DataOps best practices to get business users and data teams to reach their common goal of self-serve chart and table building with appropriate guardrails.

As a breed, we data engineers and data analysts aren’t typically the best at task management. That distracting 2:30 p.m. Slack message, arriving just as the post-lunch sleepiness kicks in (“Hey Hugo, just a quick question. Do you know if the users who buy pogo sticks also tend to buy a piñata? 🙏”), turns into a 5:30 p.m. “Oh my god, I still haven’t written that other query for Simona from sales, and Simona from sales isn’t happy.”

Being a gatekeeper to your company’s data is just a pain.

Can’t everyone just stop having questions, or ask more useful questions?

No, they can’t. And how can a company become data-driven (which is the only reason we have a job) if every time someone is curious about data, they hit a huge wall of process and higher-priority requests?

The only answer is for us to stop being gatekeepers.

Side note: The process described below can take many months and, in some cases, years. What follows describes the principles to apply, drawn from discussions with leading European data teams at both early and later stages of development.

Step 1: Within-data-team documentation

Imagine the data warehouse as a bustling city with countless buildings — your data tables. The city is constantly growing and changing: new data is pouring in while old data is being updated or sometimes removed. The data is organized in ways that make sense to the architects and city planners, i.e., data engineers and analysts, but can seem impenetrable to others.

But that’s not how cities are supposed to work, right? The beauty of a city is its accessibility — its ability to allow anyone to navigate and find what they’re looking for. Within a data team, this starts with proper documentation. Documentation provides a layout of your Data City — a map with well-marked streets and landmarks. This blueprint should be your holy grail — a document that is constantly updated as the city expands, and one that is always consulted when making changes or adding new structures.

Additionally, every change should be properly logged, and this log should be easily accessible. This, by no means, eliminates the need for face-to-face (or Slack-to-Slack) communication. It simply provides a framework within which that communication can be more effective. You don’t ask the city planner where the nearest pizza place is; you look it up on the map. But if you want to open your own pizza place, you’ll need to have a chat with the planner.

By making this map readily available and up to date, we enable all members of the data team to navigate the city that is your data warehouse with ease. This means less time spent lost in the alleyways and more time spent gleaning insights from the data or building on the existing infrastructure.

This will be the foundation upon which we build the next steps, making Data City not just accessible to the data team but to everyone else who might benefit from its wealth of information.

Step 2: Build a glossary for business users to understand metric definitions

Now that we have a well-documented map of the city, it’s time to work on the language guide. Each city and culture has its own lingo — terms and phrases that are unique to it. These might be clear to the locals yet baffling to visitors. The same is true for your data warehouse.

Technical jargon, database names, column headers, acronyms, and coded values are all part of Data City’s local dialect. Data engineers and analysts speak it fluently, but to a business user, it might as well be an alien language. Think of the glossary as a pocket language guide for tourists visiting a foreign city. It tells you that in Data City, “MAU” means “monthly active users,” not a strange breed of cat. It also gives a short description of how each metric is used.

This glossary needs to be easily accessible, ideally integrated within the BI tool itself.
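As a rough illustration, a glossary can start life as a small structured lookup that a BI tool (or a Slack bot) reads from. The sketch below is only one possible shape: the `GlossaryEntry` class, the `look_up` helper, and the wording of the MAU description are all hypothetical and not tied to any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    term: str         # the abbreviation business users see, e.g. "MAU"
    full_name: str    # the spelled-out name
    description: str  # short, plain-language note on how the metric is used


# Hypothetical entries; the description text is illustrative only.
ENTRIES = [
    GlossaryEntry(
        term="MAU",
        full_name="monthly active users",
        description="Used to track how many distinct users were active in a month.",
    ),
]

# Index by lowercase term so lookups are case-insensitive.
GLOSSARY = {entry.term.lower(): entry for entry in ENTRIES}


def look_up(term: str) -> str:
    """Return a one-line explanation, or a pointer to the data team."""
    entry = GLOSSARY.get(term.strip().lower())
    if entry is None:
        return f"'{term}' is not in the glossary yet; ask the data team."
    return f"{entry.term}: {entry.full_name}. {entry.description}"


print(look_up("mau"))
print(look_up("ARR"))
```

The fallback message matters as much as the entries: a missing term becomes a prompt for a conversation with the data team rather than a dead end.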

The objective is not to eliminate the need for interaction between business users and the data team. The glossary is not meant to replace the local guides but rather to empower the visitors. The more they understand the language, the richer their experience and the better the questions they can ask.

Building this glossary will be a collaborative effort. It will require deep understanding from the data team and plenty of input from the business users. But the result will be a more empowered user base, more meaningful interactions, and, ultimately, a more data-driven organization.

Step 3: Calculate the same things the same way (DRY, macros, etc.)

Imagine that our city guidebook has multiple contributors. While this diversity of voices and expertise enriches the guidebook, it also poses a risk: the danger of inconsistency. If the same metrics in our data warehouse are calculated differently in different places, it leads to confusion and a loss of trust in the data.

This is where the principle of “don’t repeat yourself” (DRY) comes in, advocating for unified logic when calculating shared metrics. Just as a city guidebook would follow a style guide to ensure consistency across its pages, our metrics calculations should adhere to a standard formula. We could think of these standard formulas as macros, similar to a standard description that all guidebook authors would use when referring to the “Revenue Monument” or “Customer Churn Park.”

The guidebook might even keep these standard descriptions in a central repository — think of it as a style guide annex or a writer’s toolkit. Each time an author wants to refer to “monthly active users,” they consult this toolkit and use the standard description provided. In the data context, this toolkit would be a set of shared scripts stored in an accessible location that contain the logic for calculating commonly used metrics.

This also reduces the contributors’ workload, as they needn’t craft a new description each time they reference the monument. Instead, they only have to do it once, and all other mentions will follow suit.
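To make the toolkit idea concrete, here is a minimal sketch in which the logic for “monthly active users” lives in exactly one function and two reports reuse it. The event format and the function names (`monthly_active_users`, `sales_report`, `product_report`) are assumptions made for illustration; in a real warehouse the same idea would more likely be expressed as a shared SQL macro or model.

```python
from datetime import date

# Hypothetical event format: (user_id, day on which the user was active).
Event = tuple[str, date]


def monthly_active_users(events: list[Event], year: int, month: int) -> int:
    """Single source of truth: distinct users with at least one event in the month."""
    return len(
        {user_id for user_id, day in events if day.year == year and day.month == month}
    )


def sales_report(events: list[Event]) -> str:
    # Reuses the shared definition instead of re-deriving it.
    return f"Sales view, Jan 2024 MAU: {monthly_active_users(events, 2024, 1)}"


def product_report(events: list[Event]) -> str:
    # Same function, so both reports are guaranteed to agree.
    return f"Product view, Jan 2024 MAU: {monthly_active_users(events, 2024, 1)}"


sample_events = [
    ("alice", date(2024, 1, 3)),
    ("alice", date(2024, 1, 20)),  # second visit, same user: counted once
    ("bob", date(2024, 1, 5)),
    ("carol", date(2024, 2, 1)),   # outside January
]

print(sales_report(sample_events))
print(product_report(sample_events))
```

If the definition of “active” ever changes, it changes in one place, and every report that mentions the metric follows suit.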

Step 4: Offer business users 10x more flexibility, with just 10% more work (i.e., build a metrics layer)

You might assume that our city guide is thorough and user-friendly by now. It’s well-documented, it has a handy glossary, and its content is consistent. It’s a great tool for any explorer, but can we elevate the experience further? Absolutely.

Let’s consider adding an itinerary planner to our city guide — one that understands the intricate relationships between all the key attractions and gives tourists the freedom to explore the city in their own way. This is the metrics layer.

Setting up a metrics layer requires an initial investment from the data team. For each metric, they need to describe the dimensions it can be analyzed by, so that when a business user reaches that metric, they have a clear understanding of the exploration they can perform from that point. For example, the “MAU” landmark indicates that it can be broken down by region, date, industry, and/or cohort.

To the business user, this intricate network mapping offers not one route but many. They can analyze a metric by any of the dimensions available for it. Want to know the revenue by product category? No problem. Interested in understanding churn rate by customer demographic? The metrics layer has got you covered.
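A toy version of this idea fits in a few lines. The sketch below is not how any specific metrics-layer product works; the `Metric` class, the `build_query` helper, and the table and column names are hypothetical. It only shows the core mechanism: the data team declares a metric and its allowed dimensions once, and queries are generated from that declaration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql_expression: str         # aggregation written once by the data team
    source_table: str
    dimensions: frozenset[str]  # breakdowns that business users may pick


# Hypothetical definition; the table and column names are made up.
MAU = Metric(
    name="mau",
    sql_expression="COUNT(DISTINCT user_id)",
    source_table="analytics.user_activity",
    dimensions=frozenset({"region", "date", "industry", "cohort"}),
)


def build_query(metric: Metric, group_by: list[str]) -> str:
    """Generate SQL for a metric, refusing breakdowns that were never declared."""
    unknown = [dim for dim in group_by if dim not in metric.dimensions]
    if unknown:
        raise ValueError(f"{metric.name} cannot be broken down by: {', '.join(unknown)}")
    columns = ", ".join([*group_by, f"{metric.sql_expression} AS {metric.name}"])
    query = f"SELECT {columns} FROM {metric.source_table}"
    if group_by:
        query += " GROUP BY " + ", ".join(group_by)
    return query


print(build_query(MAU, ["region", "industry"]))
```

The `ValueError` is the guardrail: users get freedom within the declared dimensions, and anything outside them is a request for the data team rather than a silently wrong chart.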

This empowers business users like never before. They are no longer dependent on data engineers or analysts to build new SQL queries for them. They can independently explore data, ask complex questions, and get accurate, consistent answers.

But the benefits aren’t one-sided. The data team, despite the initial work of building the metrics layer, eventually saves a significant amount of time as they no longer need to cater to every individual query. It’s a win-win.

Conclusion

Once upon a Slack message, we data folks were grumpy librarians at the data library, wrestling with SQL queries while the Simonas of the world queued up, each holding a sticky note with “Are our customers really into piñatas?” scribbled on it.

Now look at us. We’ve crafted a comprehensive city guide, built a handy glossary, published the ultimate cookbook of calculations, and unleashed the super-powered trip planner that is the metrics layer. Our Data City has morphed from a labyrinth into a user-friendly theme park, where business users can hop on the “Insight Roller Coaster” without first needing to pass an SQL proficiency test.

The goal is that one day, when Simona sends you a “quick question,” you can confidently say, “Simona, here’s your hitchhiker’s guide to Data City. Enjoy the ride!”