Data Quality Management

Written by Benjamin Simonneau on 6 February 2018

In a previous article, I discussed the human element in data issues. Now I'd like to delve into some competencies specific to Data Quality. Regardless of the industry, companies struggle with the volume of data they have to manage, whether for fiscal or regulatory reasons (banking, insurance, etc.) or for strategic purposes (customer knowledge, R&D, product development, etc.).

Introduction - Current Limitations

The teams involved in these matters are not necessarily aware of all the ins and outs of the quality of the data they handle. That is why governance is often put in place, mainly to deploy the roles, skills, and people needed to support data quality efforts.

The observation is easy to accept: an undersized or under-resourced team will either lack the time or feel genuine frustration at being unable to demonstrate a complete, mastered understanding of its informational assets, simply because of the volume of data and the problems that come with it.

Development of the Idea

Context

Today, the best tools on the market allow you to explore data and build simplified, segmented, attractive views of it based on its dimensions and values. However, building these views and interpreting the results requires establishing a competency within the enterprise: Data Science. Data Science is, to put it simply, the ability to use functional expertise to put a dataset in context and produce a result aligned with a business problem or objective.

Pay attention to CNIL constraints if you are working with personal data; the following examples do not take them into account.

Example 1: A bank wants to establish consumption profiles for its clients. Data Science makes it possible to identify customer purchases from transaction data and, through segmentation rules, to define profiles based on customer habits and personal information.
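To make this concrete, here is a minimal sketch in Python (pandas) of what such a segmentation could look like. The column names and the single rule used ("the dominant spending category defines the profile") are illustrative assumptions, not the bank's actual logic.

```python
# Minimal segmentation sketch: hypothetical column names and a deliberately
# naive rule (the category a client spends the most on defines the profile).
import pandas as pd

transactions = pd.DataFrame({
    "client_id":         [1, 1, 2, 2, 3],
    "merchant_category": ["grocery", "travel", "grocery", "grocery", "electronics"],
    "amount":            [120.0, 850.0, 60.0, 75.0, 400.0],
})

# Total spend per client and category.
spend = (transactions
         .groupby(["client_id", "merchant_category"], as_index=False)["amount"]
         .sum())

# Keep the top category per client as the consumption profile.
profiles = (spend
            .sort_values("amount", ascending=False)
            .drop_duplicates("client_id")
            .rename(columns={"merchant_category": "profile"}))

print(profiles[["client_id", "profile"]].sort_values("client_id"))
```

In a real study, the segmentation rules would of course combine many more dimensions (frequency, channel, socio-demographic data) and be validated with the business teams.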

Example 2: An insurance company is over-provisioning funds on part of its product range, heavily impacting its financial results. Data Science, using contract and client data, can isolate an abnormal volume of exceptional rates spread across a significant number of clients.
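Again purely as an illustration, here is a small sketch of how such an anomaly could be isolated. The table, the guaranteed_rate column and the MAD-based cut-off are assumptions made up for the example; they are not the insurer's data or method.

```python
# Sketch: flag contracts whose rate sits abnormally far from the portfolio median.
# Uses the median absolute deviation (MAD), which the outliers themselves distort
# less than a mean/standard-deviation rule would.
import pandas as pd

contracts = pd.DataFrame({
    "contract_id":     range(1, 8),
    "guaranteed_rate": [0.015, 0.016, 0.014, 0.150, 0.015, 0.017, 0.160],
})

median = contracts["guaranteed_rate"].median()
mad = (contracts["guaranteed_rate"] - median).abs().median()

# Arbitrary cut-off: more than 10 robust deviations away from the median.
outliers = contracts[(contracts["guaranteed_rate"] - median).abs() > 10 * mad]
print(outliers)  # contracts 4 and 7 carry the abnormal rates in this toy data
```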

You can see here that without expertise and tools, it is impossible to deliver such a study quickly and with quality.

"Test and learn," "Quick and dirty,"... of course! Taking such risks with the DGFIP or ACPR? Moving blindly into your next marketing strategy? Designing a new product without assessing its market impacts?

Going Further

I propose an idea here, aimed mainly at software vendors focused on Data Quality, Data Governance, and Big Data.

Starting from the data sources, by analyzing values, working on dimensions and variances, and running contradictory analyses, a tool is able to isolate volumes of data. What if the next step were an exchange, a dialogue between the tool's intelligence and the user?

By analyzing its own results at a very fine-grained level, the tool could return interesting findings to the user, co-creating a space for discussion and proposing a set of rules useful to the company's Data Quality (QDD) approach.
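As a rough sketch of that dialogue, here is what such an exchange could look like in Python. The column name, the pattern and the 75% threshold are arbitrary assumptions; the point is only the mechanism: the tool profiles its own results and proposes a candidate rule back to the user.

```python
# Sketch of a tool proposing a data-quality rule it has inferred from the data.
# The column name, pattern and threshold are illustrative assumptions.
import pandas as pd

values = pd.Series(
    ["FR7612345678901", "FR7698765432109", "N/A", "FR7611122233344"],
    name="account_reference",
)

pattern = r"^FR\d{13}$"          # dominant format observed in the column
match_rate = values.str.match(pattern).mean()

if match_rate >= 0.75:           # a clear majority follows the pattern
    print(f"{match_rate:.0%} of '{values.name}' values match {pattern!r}.")
    answer = input("Promote this pattern to a validation rule? [y/n] ")
    if answer.strip().lower() == "y":
        print("Rule recorded: non-conforming rows will be reported to the data owner.")
```

The value is not in the code itself but in the loop it suggests: the tool surfaces what it has found, and the user decides which findings become rules.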