Data Quality Is a Line Function

Thomas Redman, "The Data Doc," Navesink Consulting Group | 6/21/2011 | 7 comments

In an earlier post, I noted that, from a quality perspective, the two most interesting moments in a datum’s lifetime are the moment of creation and the moment of use. In this post I explore in a bit more detail how organizations with the highest-quality data think about these moments.

It is significantly easier to make sure data is easy to use if people take care at the point where it is created: in marketing, sales, product development, manufacturing, operations, planning, finance, and so forth. In other words, data quality is a line function.

First, good organizations think about quality in customer terms. My favorite definition of data quality is based on “fitness for use” and stems from the work of the great quality guru Dr. Joseph Juran: “Data are of high quality if they are fit for their uses (by customers) in operations, decision-making, and planning. They are fit for use when they are free of defects and possess the needed features to complete the operation, make the decision, or complete the plan.”

It is a demanding definition. It explicitly recognizes customers and the moments when they put data to use. It recognizes that there can be many customers, each with his, her, or its own needs. Thus, a barcode reader requires data in a different format than a human being does. The definition unites two very different aspects of quality. “Freedom from defects” means the data are accurate enough, up-to-date, properly defined, and so forth. “Possess desired features” means that they are the “right data” (exactly as needed), that they are presented in intuitive, easy-to-understand ways, that they link with other data, and so on.

There are two basic approaches to achieving fitness for use. Organizations can find and fix errors, or they can prevent them at their sources. Not surprisingly, organizations with the highest quality data “think prevention.” The rationale is simple: By finding and eliminating the source of even a single category of error, an organization can prevent thousands of errors and, not coincidentally, save itself the trouble of correcting thousands of errors later on. As the rates of new data creation continue to grow, it is the only approach that makes any sense.
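The prevention mindset can be made concrete with a small sketch (the field names and rules below are hypothetical illustrations, not from the post). A single control at the moment of creation stops an entire category of errors from ever entering the system, whereas a cleanup pass must find and correct each occurrence later, one by one.

```python
import re

# Hypothetical creation-time control: reject bad values before they are stored.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def create_record(customer_id, order_date, amount):
    """Validate a record at the moment of creation; raise instead of storing junk."""
    errors = []
    if not customer_id:
        errors.append("customer_id is required")
    if not DATE_PATTERN.match(order_date):
        errors.append(f"order_date {order_date!r} is not YYYY-MM-DD")
    if amount < 0:
        errors.append(f"amount {amount} must be non-negative")
    if errors:
        # Prevention: the defect never reaches downstream customers of the data.
        raise ValueError("; ".join(errors))
    return {"customer_id": customer_id, "order_date": order_date, "amount": amount}
```

One rule here (dates must be YYYY-MM-DD) eliminates every future occurrence of that error category, which is exactly the leverage prevention provides over find-and-fix.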

I hope readers find this thinking intuitive, maybe even obvious in retrospect. But too many organizations don’t think about data quality this way, and it leads to bad practice. They may, for example, task their IT departments with “cleaning up the data.” Doing so only leads to added cost, unhappy customers, and frustration all around. Worse, when organizations make this assignment permanent, they doom themselves to mediocre data that can't be trusted.

So how is this actually accomplished, and at scale? Following are ten habits of data quality:

  • Focus on the most important needs of the most important customers.
  • Apply relentless attention to process, especially processes that create data needed by customers.
  • Manage all critical sources of data, including external suppliers.
  • Measure quality at the source and in business terms.
  • Employ controls at all levels to halt simple errors and keep projects progressing.
  • Develop a knack for continuous improvement. The goal, of course, is to close the gap between current and required quality, as revealed by the measurements.
  • Set and achieve aggressive targets for improvement.
  • Formalize management accountabilities for data.
  • Lead the effort with a broad, senior group. A few architects from IT and data stewards are simply not adequate for the task.
  • Recognize that the hard data quality issues are soft, and actively manage the needed cultural changes.
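The habit of measuring quality at the source, in business terms, can be sketched as follows (the rules and field names are hypothetical illustrations): express quality as the fraction of newly created records that pass every business rule, rather than as a technical database statistic.

```python
def fraction_fit_for_use(records, rules):
    """Share of records passing every business rule: a business-terms quality measure."""
    if not records:
        return 1.0
    passing = sum(1 for r in records if all(rule(r) for rule in rules))
    return passing / len(records)

# Hypothetical business rules for an order record.
rules = [
    lambda r: bool(r.get("customer_id")),  # every order must name a customer
    lambda r: r.get("amount", -1) >= 0,    # no negative amounts
]

records = [
    {"customer_id": "C001", "amount": 100.0},
    {"customer_id": "", "amount": 50.0},     # fails: missing customer
    {"customer_id": "C002", "amount": -5.0}, # fails: negative amount
    {"customer_id": "C003", "amount": 20.0},
]
```

Computed daily over records at their point of creation, this yields a trend a line manager can own and act on.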

In future posts, we will talk about some of those habits in more detail, but the key is to focus on that definition of data quality and put it to work in your business. If you do, you’ll start seeing much more consistent and profitable payoff from your business intelligence and data analytics programs.

Thomas Redman   Data Quality Is a Line Function   6/24/2011 7:47:06 AM
Re: All Replies

Thank you for the thoughtful insights.  Let me try to answer a couple of questions and build on these insights.

Curtis, it is indeed true that the more people that touch the data, the more opportunities there are to foul it up.  I find the health care example you cite particularly egregious.  Every time a patient goes to the next clinic, he or she gets asked the same questions.  It guarantees confusion.  Your example underscores the need to manage business processes end-to-end. 

Ashish, for relatively simple and static data, such as state codes, vendor names, etc., it does indeed seem possible to have a single version of the truth. I don’t know whether it is possible for more complex and faster-paced data. Indeed, it may not even be desirable. People (and departments, companies, etc.) need nuanced data to do their work. And building nuance into data can make them more valuable. That argues against a single version of the truth. This subject is really important. Thanks for bringing it up. Perhaps I’ll do a post on it.

MS Akkineni, your point about data transmission is well taken.  Indeed, proper controls should be built in.  Taking your point one step further, good process design calls for proper controls every step along the way.

David, there are many layers to your points and questions.  On one level, one of the most common data quality problems is employees entering “9999” into a field they don’t understand.  The impact downstream can be devastating.  Training people is the only way to go and it is a big winner.

On another level, of course better forms make data entry easier and poorer ones make it harder.  If a line operation is having trouble using a form, it should demand a new one from IT.

On a deeper level, the relationships between tech, data, and quality are many and subtle. For example, it seems to be the case that “automating a process that produces junk just allows you to produce more junk faster.” Dr. Deming made this observation decades ago about manufactured goods, and it is no less true for data. This subject is also worth a separate post.

eethtworkz   Data Quality Is a Line Function   6/23/2011 2:02:37 PM
Re: Too Many Hands Spoil the Data?

I couldn’t have put it better myself.

Basically what you are saying is "Too many cooks spoil the Broth"...

Regarding your viewpoints here,


Letting lots of people touch lots of repositories of what is essentially the same data seems to me a recipe for disaster. I simply don't understand the rationale behind that sort of design.


This is just what happens in an enterprise with very high employee turnover and with very little concept of Hierarchies or discipline in place.

We have got to figure out a smoother, simpler, and more effective way to manage data rationalisation.

The current modus operandi just doesn’t seem to cut it.


CurtisFranklin   Data Quality Is a Line Function   6/23/2011 12:04:55 PM
Re: Too Many Hands Spoil the Data?
Is it possible to have only one version of Truth out there?

@Ashish, it seems to me that data rationalization was supposed to take care of this very issue: Data is entered once, and kept in a single, consistent format that's useful for everyone in the organization. Some functions may only require that a portion of the data be delivered to a user, but everyone gets their data from (and, if necessary, updates) a single data pool. Letting lots of people touch lots of repositories of what is essentially the same data seems to me a recipe for disaster. I simply don't understand the rationale behind that sort of design.
eethtworkz   Data Quality Is a Line Function   6/22/2011 2:30:47 PM
Re: Too Many Hands Spoil the Data?

I agree entirely.

The more hands data has to pass through, the more iterations it goes through, and consequently the greater the chance of inconsistency.

Is it possible to have only one version of the truth out there (when it comes to the data in place)?


CurtisFranklin   Data Quality Is a Line Function   6/22/2011 10:23:19 AM
Too Many Hands Spoil the Data?
@Tom, this is a great post on a critical topic. I have a question, though: In the research you've seen, are there any correlations between the number of people who have a chance to alter data and its fitness for use? I ask because, in areas like healthcare, the same information is gathered countless times, by innumerable people. It seems to me that the opportunities for data inconsistency would climb with each new individual who has a chance to touch the data, especially if the data forms are themselves inconsistent. Looking forward to your take on this...
MS.Akkineni   Data Quality Is a Line Function   6/22/2011 9:48:53 AM
Critical and Complex....
Manage all critical sources of data, including external suppliers.

Very much a key factor indeed.

No matter how much caution is applied at either end (suppliers and receivers), the file transmission process can sometimes introduce data errors, such as adding special characters or messing up date formats. Data validation plays an important role before a data dump occurs. Proper validation checks would certainly be a great help.
David Wagner   Data Quality Is a Line Function   6/21/2011 5:19:11 PM
Keeping People In Line

Thanks for a very interesting post. Preventing data errors before they happen seems like an obvious idea, but I can't even always wash the dishes every night instead of letting them pile up in the sink. I have to say I'm skeptical that you can train line employees to be better at entering data without IT giving them a better tool to enter the data to begin with. Is that fair to say?

Or do you know a company that has been able to do this mostly without IT intervention?

