Data quality dimensions: Completeness

In this latest post, we look at one of the DAMA six dimensions of data quality – completeness.

Definition:

The proportion of stored data against the potential of “100% complete”

Measure:

A measure of the absence of blank (null or empty string) values or the presence of non-blank values.

In the main, when we consider completeness we’re looking to see whether the attribute contains a value. For example, is there a value in the postcode field? If there isn’t, the record’s postcode attribute is not complete. Note that we’re not assessing whether the value entered is a postcode (validity check) or is actually the postcode for this data entity (accuracy check). We’re simply checking whether or not a value has been entered in the field.
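As a rough illustration, the measure can be sketched in a few lines of Python. This is a minimal sketch rather than how any particular tool works: the function names, the list-of-dictionaries layout and the sample records are made up for the example, and treating whitespace-only strings as blank is an assumption that goes slightly beyond the "null or empty string" wording above.

```python
def is_blank(value):
    """Blank means None or an empty string.

    Assumption: whitespace-only strings are also counted as blank.
    """
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(records, field):
    """Return the percentage of records holding a non-blank value in `field`.

    Returns None when there are no records, since a percentage is undefined.
    """
    if not records:
        return None
    filled = sum(1 for record in records if not is_blank(record.get(field)))
    return 100.0 * filled / len(records)


# Hypothetical sample data for illustration only.
customers = [
    {"name": "A", "postcode": "AB1 2CD"},
    {"name": "B", "postcode": ""},
    {"name": "C", "postcode": None},
    {"name": "D", "postcode": "not a postcode"},
]

print(completeness(customers, "postcode"))  # 50.0
```

Note that the fourth record counts as complete even though its value is plainly not a postcode. That is deliberate: spotting it is the job of a validity check, not a completeness check.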

Another aspect of completeness relates to data protection, and in particular data minimisation. For example, it may have been decided that you do not need a person’s email address to conduct business with them. Holding one anyway might then be considered to break GDPR rules on holding only minimal data about a data subject. The completeness check can therefore also be used in reverse, i.e. I should have 100% blanks in the email address field!
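The reverse check can be sketched in the same way. Again this is only an illustration under assumed names: `blank_percentage`, `values_held` and the sample records are hypothetical, and whether a given field should be empty is a decision for your own data protection policy, not something the code can determine.

```python
def is_blank(value):
    """Blank means None, an empty string or (by assumption) only whitespace."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def blank_percentage(records, field):
    """Return the percentage of records where `field` is blank or missing."""
    if not records:
        return None
    blanks = sum(1 for record in records if is_blank(record.get(field)))
    return 100.0 * blanks / len(records)


def values_held(records, field):
    """Return the records holding a value in a field that should be empty."""
    return [record for record in records if not is_blank(record.get(field))]


# Hypothetical sample data for illustration only.
people = [
    {"id": 1, "email": ""},
    {"id": 2, "email": None},
    {"id": 3, "email": "someone@example.com"},
    {"id": 4},
]

print(blank_percentage(people, "email"))                 # 75.0
print([p["id"] for p in values_held(people, "email")])   # [3]
```

Anything below 100% blank here is an exception to investigate, and the second function lists the records that need attention.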

The completeness of data is often brought into focus during business change activity. For example, we’d like to email all of our customers, but only 20% of them have an email address! Regulatory change has the same effect: we need to hold details of the type of cladding used on property blocks, but this wasn’t collected historically and now has to be collected retrospectively.

Data completeness is a further problem when you come to make an informed decision. How can you know the impact of a service on people in the age range 18-30 if you don’t have their date of birth? The percentage completeness of your data can be a significant factor.

You may ask how it comes to pass that we haven’t captured this data. Why not make it a mandatory field in the source application? The answer is that data is not always captured at the same time. You might create a new customer record on the Monday but not get their date of birth, for example, until they send through proof in the form of a copy of a passport or DVLA licence. There is therefore a time delay in the data capture, and the field cannot be made mandatory in the data collection system because that would prevent the other information from being captured. A means of assessing completeness over the course of a process, independently of the source application, becomes necessary.

Finding missing data is easy to do with tools like infoboss, and the best time to fix it with a high degree of accuracy is when the data is first captured. We can apply rules to check for completeness and have them run after a period of time has elapsed since the record was first created or modified. Any exceptions can be automatically escalated to the data owner to fix.
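To make the idea of a time-delayed rule concrete, here is a minimal Python sketch. It is not how infoboss implements its rules: the `created_at` field, the 14-day grace period, the function names and the sample records are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def is_blank(value):
    """Blank means None, an empty string or (by assumption) only whitespace."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def overdue_incomplete(records, field, grace, now):
    """Return records where `field` is still blank after the grace period.

    Assumes each record carries a hypothetical `created_at` datetime.
    """
    return [
        record
        for record in records
        if is_blank(record.get(field)) and now - record["created_at"] > grace
    ]


# Hypothetical sample data for illustration only.
customers = [
    {"id": 1, "created_at": datetime(2024, 1, 1), "date_of_birth": None},
    {"id": 2, "created_at": datetime(2024, 1, 15), "date_of_birth": None},
    {"id": 3, "created_at": datetime(2024, 1, 1), "date_of_birth": "1990-05-17"},
]

exceptions = overdue_incomplete(
    customers,
    field="date_of_birth",
    grace=timedelta(days=14),
    now=datetime(2024, 1, 20),  # fixed date so the example is repeatable
)
print([record["id"] for record in exceptions])  # [1]
```

Customer 2 is also missing a date of birth but is still inside the grace period, so only customer 1 would be escalated to the data owner.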

Look out for our other posts on the six data quality dimensions.

To discover more about how infoboss can help support your data quality and data protection initiatives, please get in touch.