
The Importance of Cleaning Dirty Data for Improved Operations and Customer Success



Imagine trying to cross the ocean with a boat that has holes in it. You’ll get wet. You might even sink. You certainly won’t make it across smoothly.

The chances of this happening are quite small, as any sensible person would thoroughly check their boat before embarking on such an endeavor.

But what about the CRM data your business uses to contact leads, segment customers, and make strategic decisions? Do you ever check if that has holes in it?

You should.

Dirty data negatively affects workflows, marketing efforts, and your customers’ experience. It can even get you into legal trouble.

But what exactly is dirty data?

What is dirty data?

Dirty data, or unclean data, is data that is faulty in some way: it might be duplicated, outdated, insecure, incomplete, inaccurate, incorrect, or inconsistent. Examples of dirty data include misspelled addresses, missing field values, outdated phone numbers, and duplicate customer records.

When ignored, dirty data can cause serious issues for your business. It can jeopardize the customer experience, lead to the misrepresentation of business results, and negatively impact strategic decisions.

To avoid the risks of poor data quality, regular data cleansing is essential. We’ll discuss how to clean data later in this post. But first, let’s look at how data gets dirty.

How data gets dirty

Data can get dirty when it’s entered, stored, or used incorrectly. Oftentimes, this comes down to human error or a lack of standardization rules for data entry, but technical issues can also lead to dirty data.

Examples of dirty data

Duplicate data

Duplicate data refers to records that partially or fully share the same information. Duplicates arise when the same information is entered multiple times, sometimes in different formats. A typical example is a single customer who appears in your CRM multiple times, often because their name is written slightly differently each time.

For example:

  • Patty J. Greenfield
  • Patty Julia Greenfield
  • Patricia J. Greenfield
  • Patricia Julia Greenfield

Because customer information is scattered across different records, duplicate customer data leads to:

  • Poor customer service
  • Incorrect tracking and reporting
  • Double (or triple) marketing targeting
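Catching duplicates like the Greenfield example usually means comparing normalized versions of each name rather than the raw text. Here is a minimal Python sketch of that idea. The nickname table, the `match_key` and `find_duplicates` helpers, and the choice to match on first name, middle initial, and last name are all illustrative assumptions, not a production matching algorithm.

```python
import re
from collections import defaultdict

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"patty": "patricia", "pat": "patricia", "bob": "robert", "bill": "william"}


def match_key(full_name: str) -> tuple[str, str, str]:
    """Reduce a name to (first name, middle initial, last name) for matching."""
    # Lowercase and drop punctuation, so "J." becomes "j".
    tokens = re.sub(r"[^\w\s]", "", full_name.lower()).split()
    if not tokens:
        return ("", "", "")
    first, last = tokens[0], tokens[-1]
    middle = tokens[1][0] if len(tokens) > 2 else ""
    # Map nicknames to a canonical first name.
    return (NICKNAMES.get(first, first), middle, last)


def find_duplicates(names: list[str]) -> list[list[str]]:
    """Group names that share a match key; only groups of two or more are returned."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = [
    "Patty J. Greenfield",
    "Patty Julia Greenfield",
    "Patricia J. Greenfield",
    "Patricia Julia Greenfield",
    "Robert Smith",
]
print(find_duplicates(names))
```

Dedicated deduplication tools typically also compare emails, phone numbers, and addresses, and use fuzzy matching to catch typos. An exact-key approach like this one would miss "Patty Greenfield" without a middle initial, for instance.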

Insecure data

Insecure data is data that is not encrypted or not protected by access controls. It’s accessible to anyone in your company and, in the worst case, even to third parties. Insecure data is not just a privacy risk but also a legal one, as it can leave companies non-compliant with laws such as the GDPR and CCPA.

Incomplete data

Incomplete data is data with missing values. For example, suppose your newsletter sign-up form has a field for the lead’s first name, but the field isn’t required. Leads can then sign up without giving their name, which makes your personalized email campaigns less effective.
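The simplest fix for this kind of incomplete data is to reject a submission when a required field is empty. A minimal sketch, assuming a hypothetical form that submits a dictionary with `email` and `first_name` keys:

```python
# Hypothetical required fields for a newsletter sign-up form.
REQUIRED_FIELDS = ("email", "first_name")


def missing_fields(submission: dict) -> list[str]:
    """Return the required fields that are absent, None, or only whitespace."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(submission.get(field) or "").strip()
    ]


# Example: a lead who typed only spaces into the first-name field.
print(missing_fields({"email": "patty@example.com", "first_name": "   "}))  # ['first_name']
```

Marking the field as required in the form itself catches most cases; a server-side check like this one also covers submissions that bypass the form.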

Inaccurate data

Inaccurate data is data that contains mistakes. For example, a customer might make a typo when entering their last name on one of your forms. You have the customer’s last name, but it’s wrong, so the record is dirty.

Another example would be a sales representative logging an incorrect phone number for a lead in Salesforce. To continue the conversation with this lead, you need to correct that record.

Outdated data

Outdated data is inaccurate not because it was entered incorrectly, but because it used to be accurate and now it isn’t anymore. A typical example of dirty data that’s outdated is if your CRM still lists a customer’s old address after they’ve moved.

Other examples of outdated data are:

  • Email addresses that are no longer in use
  • Titles of people who’ve switched jobs
  • Out-of-date email segments
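One way to surface outdated records is to track when each one was last verified and flag anything older than a cutoff. The sketch below assumes a hypothetical `last_verified` date and `id` on each record, plus an arbitrary one-year threshold; pick whatever interval fits your data.

```python
from datetime import date, timedelta


def stale_record_ids(records: list[dict], today: date, max_age_days: int = 365) -> list[str]:
    """Return IDs of records whose last_verified date is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["id"] for r in records if r["last_verified"] < cutoff]


records = [
    {"id": "c-001", "last_verified": date(2023, 1, 15)},  # more than a year old
    {"id": "c-002", "last_verified": date(2024, 3, 1)},   # recent
]
print(stale_record_ids(records, today=date(2024, 6, 1)))  # ['c-001']
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check easy to test.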

Incorrect data

Incorrect data is data that falls outside of previously specified parameters. Because those parameters are known in advance, incorrect data is easier to prevent. Take a birthdate field: if customers select their birthdate from dropdown menus, your system will likely only let them choose one of 12 months and one of 31 days, and it may also block birth years that would make them older than 130.
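The same limits a dropdown enforces can also be checked in code, which additionally catches impossible dates such as February 30 that separate day and month menus may let through. A minimal sketch, treating the 130-year cap from the example above as an assumed business rule:

```python
from datetime import date

MAX_AGE_YEARS = 130  # assumed business rule, matching the dropdown example


def is_valid_birthdate(year: int, month: int, day: int, today: date) -> bool:
    """Check that a birthdate exists, isn't in the future, and isn't implausibly old."""
    try:
        born = date(year, month, day)  # raises ValueError for month 13, February 30, etc.
    except ValueError:
        return False
    if born > today:
        return False
    # Age in whole years: subtract one if this year's birthday hasn't happened yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return age <= MAX_AGE_YEARS


today = date(2024, 6, 1)
print(is_valid_birthdate(1990, 5, 17, today))  # True
print(is_valid_birthdate(2023, 2, 30, today))  # False: February 30 doesn't exist
print(is_valid_birthdate(1890, 1, 1, today))   # False: older than 130
```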

Inconsistent data

Inconsistent data often stems from data redundancy: it occurs when companies store the same information in different places without keeping those copies in sync. A prime example is a company that stores customer information both in its CRM and in its email marketing tool. When a customer’s details change in one system but not the other, the two records no longer agree.
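Spotting this kind of inconsistency comes down to comparing the copies field by field. Here is a minimal sketch, assuming both systems can export records as dictionaries keyed by email address; the export format and field names are hypothetical.

```python
def find_mismatches(
    crm: dict[str, dict], email_tool: dict[str, dict], fields: tuple[str, ...]
) -> list[tuple[str, str, object, object]]:
    """List (key, field, crm_value, email_tool_value) for every disagreement."""
    mismatches = []
    # Only records present in both systems can be compared field by field.
    for key in sorted(crm.keys() & email_tool.keys()):
        for field in fields:
            crm_value = crm[key].get(field)
            tool_value = email_tool[key].get(field)
            if crm_value != tool_value:
                mismatches.append((key, field, crm_value, tool_value))
    return mismatches


crm = {"patty@example.com": {"first_name": "Patty", "city": "Boston"}}
email_tool = {"patty@example.com": {"first_name": "Patty", "city": "Denver"}}
print(find_mismatches(crm, email_tool, ("first_name", "city")))
# [('patty@example.com', 'city', 'Boston', 'Denver')]
```

Records that exist in only one system (`crm.keys() - email_tool.keys()`) are worth reporting as well. Detecting a mismatch is only half the job: deciding which system holds the correct value is a business decision.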

How to clean data

All of the above types of dirty data create risks for your company, so cleaning data and avoiding these situations is crucial.

Here’s how to get your data from dirty to clean:

Create data quality guidelines

Before you start cleaning your data, define what a clean data set looks like for your company and which best practices everyone should follow to keep your data as clean as possible.

Standardize data

Having a data quality strategy includes defining a way to standardize data as soon as it enters your system. List all the ways you are gathering data right now, what the points of entry are for that data, and how you’ll ensure that all of that data is input in the same way, regardless of the point of origin.
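In practice, standardizing at the point of entry means running every incoming record through the same normalization step, whichever form or integration it came from. A minimal sketch, with hypothetical field names and rules chosen for illustration:

```python
def standardize_name(name: str) -> str:
    """Trim and collapse whitespace; fix casing only for all-lower or all-upper input."""
    name = " ".join(name.split())
    # Leave deliberate mixed case such as "McKenzie" alone.
    if name.islower() or name.isupper():
        name = name.title()
    return name


def standardize_contact(raw: dict) -> dict:
    """Apply the same rules to a contact record regardless of its source."""
    return {
        "first_name": standardize_name(raw.get("first_name") or ""),
        "last_name": standardize_name(raw.get("last_name") or ""),
        # Lowercasing the whole address is a common convention; strictly,
        # only the domain part is guaranteed to be case-insensitive.
        "email": (raw.get("email") or "").strip().lower(),
    }


print(standardize_contact({"first_name": "  patty ", "last_name": "GREENFIELD", "email": " Patty@Example.COM "}))
# {'first_name': 'Patty', 'last_name': 'Greenfield', 'email': 'patty@example.com'}
```

Note that `str.title()` can’t know that "mcdonald" should become "McDonald", so all-lowercase input for names like that still needs a manual fix or a list of exceptions.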

Perform an audit

Once you’ve established your company’s data quality rules and are sure that all new data will be entered in a standardized way, it’s time to perform an audit of your existing data. Unfortunately, finding all dirty data is not easy, and while you should aim for 100 percent detection, know that you’re likely to miss some issues. That’s why it’s important to do an audit not just once, but regularly.

One way to make this process easier is to continuously gather feedback from the various departments within your company that work with data. This type of feedback shows you where dirty data is causing issues in day-to-day activities.

An example: your marketing team notices that first names in personalized emails are sometimes not capitalized. This tells you that first name values aren’t always formatted the same way, probably because some email subscribers don’t bother to capitalize their own names.
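A small audit script can turn feedback like this into numbers you can track from one audit to the next. This sketch assumes records are dictionaries with hypothetical `first_name` and `email` keys; the email check is deliberately crude and only looks for an `@`.

```python
from collections import Counter


def audit(records: list[dict]) -> Counter:
    """Count common data quality issues across a list of contact records."""
    issues = Counter()
    for record in records:
        first_name = (record.get("first_name") or "").strip()
        if not first_name:
            issues["missing first_name"] += 1
        elif not first_name[0].isupper():
            issues["uncapitalized first_name"] += 1
        # Crude check: a real audit would use stricter email validation.
        if "@" not in (record.get("email") or ""):
            issues["invalid email"] += 1
    return issues


records = [
    {"first_name": "patty", "email": "patty@example.com"},
    {"first_name": "Sam", "email": "sam.example.com"},
    {"first_name": "", "email": "lee@example.com"},
]
print(audit(records))
```

Running the same checks at every audit makes it easy to see whether a given issue is shrinking after a fix or creeping back.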

Clean dirty data

Once you have an overview of your dirty data, start the cleaning process. Data cleansing can be a grueling, time-consuming task. There are different ways to go about it, each with its own pros and cons.

1. Manually

Clean data manually only sparingly. It’s fine to fix a record you need to use right now, but manually cleaning all the data your company owns is an impossible task.

Not only would it take forever, but you’re also bound to miss things and make mistakes, causing even more errors.

2. Using Excel

Using Excel formulas can speed up the cleaning process, but it’s still quite manual. You need to build the formulas yourself, and some data issues might be too complicated to solve with an Excel formula.

On top of that, Excel isn’t built for massive data sets (a single worksheet holds at most 1,048,576 rows), so you’d have to work in batches and keep track of which ones you’ve already cleaned.

Lastly, Excel forces you to work with static snapshots of your data. Customer data you import on Monday is likely already outdated by Friday.

3. Relying on a third party

If you don’t want to devote internal time to data cleansing, hiring a data consultant can be a good option. Data consultants are specialists who do more than just clean up your dirty data: they can also run an audit for you and help improve your existing data processes, so dirty data is less likely to be created in the future.

The downsides to hiring consultants include the high costs and the fact that you’ll likely have to give them access to all of your data, which may lead to some privacy concerns.

4. Hiring dedicated developers

Because data management is an ongoing project, you could hire one or more developers who dedicate themselves fully to keeping your data clean. Since they work in-house, they’ll likely be more loyal to your company than an outside consultant would be, and they can become more familiar with your products and services.

Plus, for an ongoing task such as data maintenance, an in-house hire is often cheaper than a consultant in the long run.

5. Using software

There’s a variety of tools out there that help you identify and clean dirty data. These tools are often cheaper than hiring a consultant or a dedicated developer, and they don’t make human errors such as typos.

However, not all of these tools are created equal. Pick one that can spot data mismatches, check formatting (of dates, for example), and recognize which fields to merge.

You’ll also want to run a few tests on small data samples to make sure the tool works the way it’s supposed to. If you don’t do this and let it loose on your entire database, you risk ending up with larger problems than you started with.
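A test like this can be scripted as a dry run: apply the cleaning step to a random sample, compare before and after, and review the differences before touching the full database. In this sketch, `clean` is a hypothetical placeholder for whatever cleaning function or tool you’re evaluating, and the default sample size is arbitrary.

```python
import random
from typing import Callable


def dry_run(
    records: list[dict],
    clean: Callable[[dict], dict],
    sample_size: int = 100,
    seed: int = 0,
) -> list[tuple[dict, dict]]:
    """Apply `clean` to a random sample and return the (before, after) pairs that changed."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    sample = rng.sample(records, min(sample_size, len(records)))
    changes = []
    for record in sample:
        cleaned = clean(dict(record))  # pass a shallow copy so the original stays untouched
        if cleaned != record:
            changes.append((record, cleaned))
    return changes


# Example with a trivial cleaning rule: title-case the name field.
records = [{"name": "patty"}, {"name": "Sam"}]
for before, after in dry_run(records, lambda r: {**r, "name": r["name"].title()}):
    print(before, "->", after)  # {'name': 'patty'} -> {'name': 'Patty'}
```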

Set up ongoing database management

Hopefully, you already have database management in place. If not, it’s high time to set it up. While you’ll likely need to clean your data at regular intervals, it’s bad practice to let issues build up until they undermine the overall quality of your database.

As a company, you are constantly gathering, organizing, storing, and manipulating new data. Ongoing data management includes the processes and practices needed to safeguard the quality of that data and prevent it from getting dirty.

Dirty data requires ongoing management

With the volume of data companies gather and handle nowadays, it’s practically impossible to keep all of it from getting dirty. Different types of dirty data have different consequences for your business, so you’ll want to clean records regularly to keep issues from escalating.

You can clean data manually, use Excel, hire a third party, build an in-house team of data cleaners, and/or rely on specialized software.

Want to learn more?

For a step-by-step guide to cleaning your CRM data, check out our eBook: “The Dirt on Data Quality.”