Overview

This article explains why data health matters, how we detect data quality issues, how we can improve your data health in the peopleIX platform, and what you need to do to get the most out of your data.

Data Health Score

What is the Data Health Score?

The Data Health Score is a metric that indicates how clean your data is. It gives you a quick overview of the reliability and accuracy of your data in a single number. The score ranges from 0 to 100, with a higher score indicating better data quality.

Why is the Data Health Score important?

A higher score means that the conclusions drawn from your data are more reliable and accurate. You should rely on your data only if it is clean and of high quality. When analyzing your data, always take the Data Health Score into account when interpreting the results. Achieving a high Data Health Score is therefore crucial before you analyze the data.

Data Quality Issues

How do we detect data quality issues?

When you import your data into our platform, our algorithm automatically checks it for data quality issues. These include:

Missing values
Duplicate values
Irrelevant values
Likely erroneous values

1. Missing values

The data health algorithm does not simply flag every empty field, because in HR data some fields are intentionally left blank. Instead, it follows logical rules to distinguish intentionally empty fields from genuinely missing values. For example, if a person has been hired but no offer date has been entered, the algorithm recognizes that a value is logically missing from that field.
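
To make the idea of rule-based detection concrete, here is a minimal Python sketch. It is not the peopleIX algorithm itself. The field names (status, offer_date, termination_date) and the rules in REQUIRED_WHEN are assumptions chosen only to illustrate how an empty field can be treated as missing in one context and as intentionally blank in another.

```python
from typing import Callable

Record = dict[str, object]

# Hypothetical rules: a field counts as missing only when its condition holds.
# These names and conditions are illustrative, not the platform's real rules.
REQUIRED_WHEN: dict[str, Callable[[Record], bool]] = {
    "offer_date": lambda r: r.get("status") == "hired",
    "termination_date": lambda r: r.get("status") == "terminated",
}


def is_empty(value: object) -> bool:
    """Treat None and blank strings as empty."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def find_missing_fields(record: Record) -> list[str]:
    """Return the fields that are empty although a rule says they are required."""
    return [
        field
        for field, required in REQUIRED_WHEN.items()
        if required(record) and is_empty(record.get(field))
    ]


hired = {"status": "hired", "offer_date": None, "termination_date": None}
print(find_missing_fields(hired))  # ['offer_date']
```

In this sketch the empty termination date of a hired employee is not reported, because no rule requires it, while the empty offer date is.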

2. Duplicate values

Duplicate values are cases where the same value appears under different or inconsistent labels. For example, if one employee's nationality is listed as "German" and another's as "Deutsch," both labels refer to the same nationality and should be merged and standardized into a single category.
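
The merging step can be pictured with a short Python sketch. It assumes a hand-maintained synonym table (CANONICAL_LABELS, a hypothetical name) that maps each known variant to one standard label. How the platform actually identifies and merges duplicate labels is not described here, so treat this only as an illustration of the principle.

```python
# Hypothetical synonym table: every known variant maps to one canonical label.
CANONICAL_LABELS = {
    "german": "German",
    "deutsch": "German",
}


def standardize_label(value: str) -> str:
    """Map a label to its canonical form, ignoring case and surrounding spaces."""
    cleaned = value.strip()
    # Unknown labels are returned unchanged (apart from trimmed whitespace).
    return CANONICAL_LABELS.get(cleaned.casefold(), cleaned)


nationalities = ["German", "Deutsch", " german ", "French"]
print(sorted({standardize_label(v) for v in nationalities}))  # ['French', 'German']
```

After standardization, the three variants of "German" collapse into a single category, so counts and breakdowns by nationality are no longer split across duplicate labels.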