In an earlier post, I noted that, from a quality perspective, the two most interesting moments in a datum’s lifetime are the moment of creation and the moment of use. In this post I explore in a bit more detail how organizations with the highest-quality data think about these moments. It is significantly easier to make sure data is easy to use if people take care where it is created: in marketing, sales, product development, manufacturing, planning, operations, finance, financial reporting, and so forth. In other words, data quality is a line function.
First, good organizations think about quality in customer terms. My favorite definition of data quality is based on “fitness for use” and stems from the work of the great quality guru Dr. Joseph Juran: “Data are of high quality if they are fit for their uses (by customers) in operations, decision-making, and planning. They are fit for use when they are free of defects and possess the needed features to complete the operation, make the decision, or complete the plan.”
It is a demanding definition. It explicitly recognizes customers and the moments when they put data to use. It recognizes that there can be many customers, each with his, her, or its own needs. Thus, a barcode reader requires data in a different format than does a human being. The definition unites two very different aspects of quality. “Freedom from defects” means the data are accurate enough, up-to-date, properly defined, and so forth. “Possess the needed features” means that they are the “right data” (exactly as needed), that they are presented in intuitive, easy-to-understand ways, that they link with other data, and so on.
There are two basic approaches to achieving fitness for use. Organizations can find and fix errors, or they can prevent them at their sources. Not surprisingly, organizations with the highest-quality data “think prevention.” The rationale is simple: By finding and eliminating the source of even a single category of error, an organization can prevent thousands of errors and, not coincidentally, save itself the trouble of correcting thousands of errors later on. As the rates of new data creation continue to grow, it is the only approach that makes any sense.
I hope readers find this thinking intuitive, maybe even obvious in retrospect. But too many organizations don’t think about data quality this way, and it leads to bad practice. They may, for example, task their IT departments with “cleaning up the data.” Doing so only leads to added cost, unhappy customers, and frustration all around. Worse, when organizations make this assignment permanent, they doom themselves to mediocre data that can't be trusted.
So how is this actually accomplished, and at scale? Following are ten habits of data quality:
Focus on the most important needs of the most important customers.
Apply relentless attention to process, especially processes that create data needed by customers.
Manage all critical sources of data, including external suppliers.
Measure quality at the source and in business terms.
Employ controls at all levels to halt simple errors and keep them from propagating.
Develop a knack for continuous improvement. The goal, of course, is to close the gap between measured quality and customer needs.
Set and achieve aggressive targets for improvement.
Formalize management accountabilities for data.
Lead the effort with a broad, senior group. A few architects from IT and data stewards are simply not adequate for the task.
Recognize that the hard data quality issues are soft, and actively manage the needed cultural changes.
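The fourth habit, measuring quality at the source and in business terms, can be made concrete with a simple record-level check. The sketch below is illustrative only; the field names and validation rules are hypothetical assumptions, not anything prescribed in the post. It expresses quality the way Juran's definition suggests: as the share of records fit for their downstream use.

```python
# Hypothetical sketch: score a batch of records at the point of creation.
# Field names and rules are made up for illustration.
import re

def record_is_fit_for_use(record):
    """A record is 'fit for use' here if every required field passes its rule."""
    rules = {
        "customer_id": lambda v: bool(re.fullmatch(r"\d{6}", v or "")),
        "email":       lambda v: v is not None and "@" in v,
        "order_date":  lambda v: bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v or "")),
    }
    return all(check(record.get(field)) for field, check in rules.items())

def fraction_fit(records):
    """Quality measured in business terms: share of records usable downstream."""
    return sum(record_is_fit_for_use(r) for r in records) / len(records)

batch = [
    {"customer_id": "123456", "email": "a@b.com", "order_date": "2011-05-01"},
    {"customer_id": "9999",   "email": None,      "order_date": "05/01/2011"},
]
print(fraction_fit(batch))  # 0.5
```

Reporting the metric as “half our new records are unusable downstream” speaks to line managers in a way that a count of database exceptions does not.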
In future posts, we will talk about some of those habits in more detail, but the key is to focus on that definition of data quality and put it to work in your business. If you do, you’ll start seeing much more consistent and profitable payoff from your business intelligence and data analytics programs.
Thank you for the thoughtful comments. Let me try to answer a couple of questions and build on these insights.
Curtis, it is indeed true that the more people who touch the data, the more opportunities there are to foul it up. I find the health care example you cite particularly egregious. Every time a patient goes to the next clinic, he or she gets asked the same questions. It guarantees confusion. Your example underscores the need to manage business processes end-to-end.
Ashish, for relatively simple and static data, such as state codes, vendor names, etc., it does indeed seem possible to have a single version of the truth. I don’t know whether it is possible for more complex and faster-paced data. Indeed, it may not even be desirable. People (and departments, companies, etc.) need nuanced data to do their work. And building nuance into data can make them more valuable. This points away from a single version of the truth. This subject is really important. Thanks for bringing it up. Perhaps I’ll do a post on it.
MS Akkineni, your point about data transmission is well taken. Indeed, proper controls should be built in. Taking your point one step further, good process design calls for proper controls every step along the way.
David, there are many layers to your points and questions. On one level, one of the most common data quality problems is employees entering “9999” into a field they don’t understand. The impact downstream can be devastating. Training people is the only way to go and it is a big winner.
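The “9999” problem is straightforward to screen for at the point of entry. Here is a minimal sketch of the idea; the sentinel list and field names are my own illustrative assumptions, not anything specified above.

```python
# Illustrative sketch: flag common placeholder values ("9999", "N/A", etc.)
# at entry time, before they propagate downstream. Field names are hypothetical.
SENTINELS = {"9999", "99999", "N/A", "NA", "UNKNOWN", ""}

def flag_placeholders(record):
    """Return the fields whose values look like placeholders, not real data."""
    return [field for field, value in record.items()
            if str(value).strip().upper() in SENTINELS]

entry = {"zip_code": "9999", "revenue": "10500", "region": "unknown"}
print(flag_placeholders(entry))  # ['zip_code', 'region']
```

A check like this does not replace training, but it turns a silent downstream defect into an immediate, teachable moment at the source.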
On another level, of course, better forms make data entry easier and poorer ones make it harder. If a line operation is having trouble using a form, it should demand a new one from IT.
On a deeper level, the relationships between tech, data, and quality are many and subtle. For example, it seems to be the case that “automating a process that produces junk just allows you to produce more junk faster.” Dr. Deming made this observation decades ago about manufactured goods and it is no less true for data. This subject is also worth a separate post.
Is it possible to have only one version of Truth out there?
@Ashish, it seems to me that data rationalization was supposed to take care of this very issue: Data is entered once, and kept in a single, consistent format that's useful for everyone in the organization. Some functions may only require that a portion of the data be delivered to a user, but everyone gets their data from (and, if necessary, updates) a single data pool. Letting lots of people touch lots of repositories of what is essentially the same data seems to me a recipe for disaster. I simply don't understand the rationale behind that sort of design.
@Tom, this is a great post on a critical topic. I have a question, though: In the research you've seen, are there any correlations between the number of people who have a chance to alter data and its fitness for use? I ask because, in areas like healthcare, the same information is gathered countless times, by innumerable people. It seems to me that the opportunities for data inconsistency would climb with each new individual who has a chance to touch the data, especially if the data forms are themselves inconsistent. Looking forward to your take on this...
Manage all critical sources of data, including external suppliers.
Very much a key factor indeed.
No matter how much caution is applied at either end (suppliers and receivers), sometimes the file transmission process will introduce data errors, such as adding special characters, messing up date formats, etc. Data validation plays an important role before a data dump occurs. Proper validation checks would certainly be a great help.
Thanks for a very interesting post. Preventing data errors before they happen seems like an obvious idea, but I can't even always wash the dishes every night instead of letting them pile up in the sink. I have to say I'm skeptical that you can train line employees to be better at entering data without IT giving them a better tool to enter the data to begin with. Is that fair to say?
Or do you know a company that has been able to do this mostly without IT intervention?