Zeno - software for publishers

Data cleaning and deduplication: more than just a technical exercise

Last week, Wouter and Chantal held an intensive data cleaning and deduplication day for a customer, exactly two years after the conversion. It proved to be anything but a superfluous luxury.

During a conversion, it is better to include too much data than too little. Data from different sources is merged, which quickly produces an abundance of (sometimes redundant) information. Pollution also creeps in after the conversion: through users who are not careful enough when entering data, or through external web shops that create relationships (customer and contact records) via our API without adequately checking for duplicates. When in doubt, a new relationship is created anyway, because you don't want to disrupt the ordering process.
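How Zeno's own API matching works is not described here. As an illustration only, the following is a minimal sketch of how an integration such as a web shop could look for an existing relationship before creating a new one. The `Relation` record, the `find_or_create` helper and the match rules (normalized e-mail address, or postcode plus house number) are hypothetical assumptions. The sketch still falls back to creating a record when no match is found, so the order is never blocked.

```python
import re
from dataclasses import dataclass


@dataclass
class Relation:
    """Hypothetical, simplified relationship record (not Zeno's data model)."""
    id: int
    name: str
    email: str = ""
    postcode: str = ""
    house_number: str = ""


def match_keys(rel: Relation) -> set[tuple[str, ...]]:
    """Normalized keys that identify the same party (assumed matching rules)."""
    keys: set[tuple[str, ...]] = set()
    email = rel.email.strip().lower()
    if email:
        keys.add(("email", email))
    postcode = re.sub(r"\s+", "", rel.postcode).upper()
    number = rel.house_number.strip().lower()
    if postcode and number:  # only match on a complete address
        keys.add(("address", postcode, number))
    return keys


def find_or_create(existing: list[Relation], incoming: Relation) -> tuple[Relation, bool]:
    """Return (relation, created). Reuse a record when any key matches;
    otherwise append the incoming record so the order is never blocked."""
    keys = match_keys(incoming)
    for rel in existing:
        if keys & match_keys(rel):
            return rel, False
    existing.append(incoming)
    return incoming, True


db = [Relation(1, "J. de Vries", "j.devries@example.nl", "1234 AB", "12")]
rel, created = find_or_create(db, Relation(2, "Jan de Vries", " J.deVries@Example.nl "))
print(rel.id, created)  # 1 False
```

The sketch is deliberately conservative: only exact matches on normalized keys are reused, and anything uncertain still becomes a new record. That is precisely how duplicates accumulate over time, and why periodic cleanup remains necessary.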

The result: duplicate and polluted data that makes everyday work harder. Users struggle to find the right relationship, integrations with other systems (such as marketing automation) work less well, and analyses and cross-sell insights lose their value.

Types of pollution

We distinguish three main types:

Quantitative pollution: outdated or irrelevant data, such as relationships from a years-old conversion or subscribers to titles that have long ceased to exist.
Qualitative pollution: incorrect or incomplete fields, often caused by compromises made during conversions or by incorrect use.
Duplicates: relationships that occur more than once, with activity on each copy, often because matching falls short during imports or when orders come in via web shops.

Zeno helps — and so do we

Zeno offers various features to prevent this: fast search and entry options, smart matching via the API, and input validation. Users can also find and merge duplicates themselves.

In practice, however, this is often not addressed structurally: there is no time for it, or it seems too complex. That is why we regularly organize cleaning days. Together with the departments involved (such as marketing, ICT and customer service), we analyze what is essential and what can be removed. This not only results in cleaner data, but also in a better shared understanding of how that data is used.

How we work

We start by analyzing the database and jointly identifying the biggest bottlenecks. First, we clean up irrelevant relationships, taking into account linked data such as notes, attributes and connections to external systems. Then we deduplicate: either by merging records directly or by flagging them for users to check. New agreements are often made along the way to prevent future pollution.
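The two steps above can be sketched as follows. This is a minimal illustration, not Zeno's implementation. The `Relation` fields, the rule that an inactive record is only removed when nothing is linked to it, the similarity measure (Python's `difflib.SequenceMatcher` on names, compared only within the same postcode) and the thresholds 0.95 and 0.80 are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Relation:
    """Hypothetical, simplified relationship record (not Zeno's data model)."""
    id: int
    name: str
    postcode: str
    linked_items: int = 0  # e.g. notes, attributes or external-system links
    active: bool = True    # e.g. still subscribed to a title that exists


def similarity(a: Relation, b: Relation) -> float:
    """Name similarity in [0, 1]; records with different postcodes score 0."""
    if a.postcode.replace(" ", "").upper() != b.postcode.replace(" ", "").upper():
        return 0.0
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def clean_and_deduplicate(relations: list[Relation], merge_at: float = 0.95,
                          review_at: float = 0.80):
    """Step 1: drop inactive records that have nothing linked to them.
    Step 2: classify remaining pairs as direct merges or pairs to review."""
    kept = [r for r in relations if r.active or r.linked_items > 0]
    merges: list[tuple[int, int]] = []
    review: list[tuple[int, int]] = []
    for a, b in combinations(kept, 2):
        score = similarity(a, b)
        if score >= merge_at:
            merges.append((a.id, b.id))
        elif score >= review_at:
            review.append((a.id, b.id))
    return kept, merges, review


records = [
    Relation(1, "Jan de Vries", "1234 AB", linked_items=2),
    Relation(2, "jan de vries", "1234AB"),
    Relation(3, "J. de Vries", "1234 AB"),
    Relation(4, "Old Subscriber", "9999 ZZ", active=False),
    Relation(5, "Old Contact", "5678 CD", linked_items=1, active=False),
]
kept, merges, review = clean_and_deduplicate(records)
print([r.id for r in kept])  # [1, 2, 3, 5]
print(merges)                # [(1, 2)]
print(review)                # [(1, 3), (2, 3)]
```

Pairs above the upper threshold are merged directly; pairs between the two thresholds are left for a user to check. Comparing every pair grows quadratically, so for tens of thousands of records you would normally first group records by a key such as postcode (often called blocking) and compare only within each group. A real merge must also carry over notes, orders and subscriptions from the record that disappears; that part is omitted here.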

The result

In this case, we reduced the number of relationships from more than 22,000 to 12,000. The result is a significantly cleaner database, ready for more efficient use and better customer communication.

Curious whether a summer cleanup could also be valuable for your organization? Feel free to contact us.
