Data Governance | 7 min read

RDM vs MDM: Navigating the Reference Data Management Landscape

Clarifying the terminology of Reference Data Management (RDM) vs Master Data Management (MDM) — why the distinctions matter and how to choose the right tool.

Alphyn.ai Engineering Team


Our team at Alphyn.ai has been working in the field of data management for nearly 20 years. We are currently developing a solution in the area of Reference Data Management (RDM) -- the product Ingresso One, which is part of the Alphyn Governance platform.

This article opens a series of materials about the tasks facing modern reference data management systems (RDM), ways of solving them, typical misconceptions, and implementation strategies. Here we will try to understand the differences in terminology and why this understanding can be important when choosing the right tool.

A note on terminology: In some markets, a legacy umbrella term is used that conflates RDM, MDM, and DQ into a single concept. This article treats them as distinct disciplines, in line with international frameworks such as DAMA DMBOK.

Our Starting Point

The original plan was to start with applied scenarios, but as the work progressed it became obvious that it is difficult to discuss details without a common foundation and clear terminology, especially given how significantly certain enterprise traditions differ from international practice.

Many believe that local experience is entirely sufficient. But in custom and internal development we often rely on global best practices. They are formalized, validated by time and by a diversity of cases, and as a result offer a wider range of options for decision-making.

Important: despite the general character of this article, we will refer to how our team solves these tasks within the product Ingresso One. It reflects our vision of this area, and our approach may differ from the traditional one.

What is RDM: The International View

In the international classification (for example, within the DAMA DMBOK framework) RDM (Reference Data Management) is the management of code lists:

  • classifiers,
  • reference books,
  • hierarchies,
  • cross-reference mappings,
  • lists of values that change rarely and are used across many systems.

Key characteristics:

  • Centralized management of reference books
  • Distribution of versions to different systems and processes
  • A drive toward a single source of truth for each reference book
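
The characteristics above can be illustrated with a minimal sketch. It assumes a code list is kept as an append-only history of immutable versions that consumers either pin or read at the latest version. The names `CodeList`, `CodeListVersion`, `publish`, and `get` are hypothetical and chosen for illustration; they do not describe the API of Ingresso One or any other product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CodeListVersion:
    """One published version of a code list (hypothetical model)."""
    version: int
    entries: dict  # code -> human-readable label


@dataclass
class CodeList:
    """A centrally managed reference book with an append-only version history."""
    name: str
    versions: list = field(default_factory=list)

    def publish(self, entries: dict) -> CodeListVersion:
        # Copy the entries so later edits by the caller cannot alter a published version.
        new_version = CodeListVersion(len(self.versions) + 1, dict(entries))
        self.versions.append(new_version)
        return new_version

    def get(self, version: Optional[int] = None) -> CodeListVersion:
        # Consumers either pin a specific version or take the latest one.
        if not self.versions:
            raise LookupError(f"{self.name} has no published versions")
        if version is None:
            return self.versions[-1]
        if not 1 <= version <= len(self.versions):
            raise LookupError(f"{self.name} has no version {version}")
        return self.versions[version - 1]


currencies = CodeList("currency")
currencies.publish({"USD": "US Dollar", "EUR": "Euro"})
currencies.publish({"USD": "US Dollar", "EUR": "Euro", "JPY": "Yen"})

latest = currencies.get()    # what a newly connected system would read
pinned = currencies.get(1)   # what a system still on the first version reads
```

Keeping every published version lets a downstream system migrate on its own schedule while the central list continues to evolve.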

It is important to distinguish RDM from MDM (Master Data Management).

At first glance, both deal with "reference books" -- for example, clients, counterparties, products. But there is a fundamental difference:

| Parameter | RDM | MDM |
|-----------|-----|-----|
| Data source | Centralized | Multiple transaction systems |
| Problems | Value reconciliation, data versioning, user data quality | Duplicates, discrepancies between sources |
| Tasks | Maintenance, distribution, validation | Matching and merging, harmonization, forming the "golden record" |

MDM requires:

  • Deduplication (duplicates are practically guaranteed, because by definition the data describes the same physical entity in different ways across different transaction systems)
  • Formation of a unified representation of the entity
  • For all of this -- implementation of complex comparison and data merging algorithms
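
To make the matching and merging steps concrete, here is a deliberately simplified sketch. It assumes that records for the same entity share a tax ID once punctuation is stripped, and that the newest non-empty value wins for each attribute. Both rules, the field names, and the function names `match_key` and `build_golden_records` are illustrative assumptions; production MDM matching typically uses much richer comparison algorithms.

```python
from collections import defaultdict


def match_key(record: dict) -> str:
    # Assumed matching rule: same tax ID after removing punctuation and spaces.
    return "".join(ch for ch in record["tax_id"] if ch.isalnum()).upper()


def build_golden_records(records: list) -> dict:
    # Step 1: group records that are believed to describe the same entity.
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    # Step 2: merge each group into a single unified representation.
    golden = {}
    for key, group in groups.items():
        merged = {}
        # Assumed survivorship rule: the newest non-empty value wins per attribute.
        for record in sorted(group, key=lambda r: r["updated"]):
            for attr, value in record.items():
                if attr != "source" and value not in ("", None):
                    merged[attr] = value
        merged["tax_id"] = key
        merged["sources"] = sorted({r["source"] for r in group})
        golden[key] = merged
    return golden


records = [
    {"source": "crm", "tax_id": "12-345 678", "name": "Acme Ltd",
     "phone": "", "updated": "2023-01-10"},
    {"source": "erp", "tax_id": "12345678", "name": "ACME Limited",
     "phone": "+1-555-0100", "updated": "2024-03-02"},
]
golden = build_golden_records(records)
```

Even this toy version needs a matching rule and a survivorship rule, neither of which arises when a reference book is maintained centrally.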

RDM can be viewed as one part of the broader MDM domain, covering a subset of its tasks. It is nonetheless treated as a discipline in its own right, especially when discussing products and the tasks they solve.

Reference data is understood as data that serves to classify and categorize other data: both master data (the domain of MDM responsibility) and transactional data.

Why the Terminology Gets Mixed Up

In some markets, a legacy umbrella term has long been used to describe what international frameworks separate into distinct disciplines. Formally this term corresponds to RDM. But in practice, it is often understood to mean somewhat more:

  • reference books (as in RDM);
  • master data (as in MDM), usually in a format close to analytical MDM -- that is, deduplication tasks and golden record determination, although those exact terms are typically not used;
  • data quality tasks in an extended form (as in DQ) -- that is, not just control of user data entry, but also, for example, automatic conversion of various formats to a unified one.

How did this come about?

  1. Historically: These systems were created as a data exchange hub between ERP, CRM, DWH, and so on. At the same time, by "reference data system" people often mean centralized management of the enterprise's key data (product catalog, counterparties, branches, etc.) -- which is already broader than just code lists.
  2. Business expectations: "Since we maintain a counterparty reference book in this system -- let it also clean duplicates and format tax IDs." But such tasks arise only when the reference book is populated from different sources rather than maintained centrally, and that already belongs to MDM.
  3. Organizationally: Companies often lack a separate MDM system, and everything falls on the reference data system.

As a result:

In many enterprises, the "reference data system" ends up being asked to do RDM + part of MDM + a bit of DQ.

Harmonization -- a Frequently Confused Term

Harmonization is a term from MDM/ETL (Extract Transform Load -- usually referring to both the tools and the practice of implementing data transformation processes), and it appears periodically in the requirements presented to reference data systems. It means bringing values to:

  • a unified format (for example, dates, tax identification numbers);
  • classifications (for example, ISO 3166 for countries);
  • business logic (for example, rules for populating attribute fields).

Examples:

  • "United States," "US," "USA" -> US (ISO 3166)
  • 8 (999) 345-67-89 -> +7-999-3456789
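
The two examples above can be sketched as small harmonization functions. The alias table and the phone rule (an 11-digit number with a leading 8 or 7, rewritten with a +7 prefix) are assumptions taken from the examples themselves and are not a general-purpose normalizer.

```python
import re

# Assumed alias table; in practice it would come from a managed code list.
COUNTRY_ALIASES = {"united states": "US", "us": "US", "usa": "US"}


def harmonize_country(value: str) -> str:
    """Map a free-text country name to its ISO 3166 alpha-2 code."""
    key = value.strip().lower()
    if key not in COUNTRY_ALIASES:
        raise ValueError(f"unknown country value: {value!r}")
    return COUNTRY_ALIASES[key]


def harmonize_phone(value: str) -> str:
    """Bring a phone number to the +7-XXX-XXXXXXX format used in the example."""
    digits = re.sub(r"\D", "", value)
    # Assumption: 11 digits starting with a domestic "8" or an international "7".
    if len(digits) != 11 or digits[0] not in "78":
        raise ValueError(f"unsupported phone format: {value!r}")
    return f"+7-{digits[1:4]}-{digits[4:]}"


country = harmonize_country("USA")
phone = harmonize_phone("8 (999) 345-67-89")
```

Values that fit no known rule raise an error here instead of being passed through, so unexpected formats surface for review.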

In MDM, harmonization is part of DQ (Data Quality). In RDM -- at most simple validation (and not always even that).

But in some markets, clients often expect that a "reference data system" can do everything in the extended sense.

Why This Matters: Different Expectations Lead to Different Approaches

In international practice:

  • RDM and MDM are different classes of systems;
  • RDM often complements MDM by providing user reference data for correctly configuring rules in the MDM system;
  • Some vendors offer them in the same product line (Ataccama, Informatica, and others), but as separate products.

In certain enterprise traditions, clients fairly often want to have a single solution for all circumstances.

The idea is appealing, but:

  • Developing such a system requires thousands of man-days;
  • The more functions -- the higher the risk of getting a product that "does everything, but poorly."

Our view: nothing is impossible in IT, but more time is needed before buyers no longer have to choose between functionality and quality. If demand does not drop significantly from its current level, such universal solutions may become realistic within 3-5 years. For now, we would select a product for each separate task rather than search for a rarely seen unicorn.

Our current approach (as the team working on Ingresso One):

  • RDM should be a specialized solution;
  • MDM -- a different one;
  • Yes, they can be from the same vendor, but the tasks and teams are different.

A Forward-Looking Thought: RDM Boundaries Can Be Extended

Typically, RDM is about reference books. But at Alphyn.ai, the Ingresso One team looks more broadly.

We see value in using the RDM platform also for applied tasks, such as:

  • Managing data corrections in a DWH (data warehouse -- here not referring to a specific architecture but to the concept of data corrections in the analytical perimeter of a company)
  • Loading user datasets and delivering them to recipients
  • Supporting data science experiments
  • Managing the distribution of tabular data according to a role model, and so on

Why this works:

  • Such tasks also require versioning, access control, auditing, approvals, a user interface, and support for full-fledged collaborative work with data;
  • Business needs a solution that in such tasks will make it less dependent on the limited IT resource ("Come back next quarter, the backlog for this quarter is full").
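
As an illustration of why these requirements overlap, here is a minimal sketch of a data correction that is applied only after approval by a second person, with every step written to an audit log. The class and method names are hypothetical, and the rule that an author cannot approve their own change is an assumed policy; none of this describes how Ingresso One is implemented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Correction:
    row_key: str
    column: str
    new_value: object
    author: str
    status: str = "pending"


@dataclass
class CorrectedTable:
    rows: dict  # row_key -> {column: value}
    audit_log: list = field(default_factory=list)

    def _log(self, actor: str, action: str, correction: Correction) -> None:
        # Every action is recorded with a UTC timestamp for later auditing.
        self.audit_log.append(
            (datetime.now(timezone.utc), actor, action,
             correction.row_key, correction.column)
        )

    def propose(self, row_key, column, new_value, author) -> Correction:
        correction = Correction(row_key, column, new_value, author)
        self._log(author, "proposed", correction)
        return correction

    def approve(self, correction: Correction, approver: str) -> None:
        # Assumed four-eyes policy: the author cannot approve their own change.
        if approver == correction.author:
            raise PermissionError("author cannot approve their own correction")
        if correction.status != "pending":
            raise ValueError("correction already processed")
        self.rows[correction.row_key][correction.column] = correction.new_value
        correction.status = "applied"
        self._log(approver, "approved", correction)


table = CorrectedTable(rows={"2024-Q1": {"revenue": 100}})
fix = table.propose("2024-Q1", "revenue", 120, author="analyst")
table.approve(fix, approver="data_owner")
```

A business user can propose and approve such a change without an IT ticket, while the audit log keeps the correction traceable.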

Yes, this is not canonical RDM, but it is also not the "universal" solution discussed above. We try to reason not only in terms of technical tasks but also in terms of applied ones that are easier for the business to understand. In our view, their requirements are not far from what our product, among others, can address. Many of these tasks rely on centrally held data that business users, not just IT, must be able to access, with extended capabilities for controlling data quality and delivering the data onward to its destination. And this, as we examined earlier, overlaps with the expectations placed on RDM systems.

In subsequent articles we will cover more specialized topics and problems solved by RDM-class tools. Stay tuned!

Topics

rdm, mdm, data-governance, reference-data, ingresso-one
