Data Quality Spell Book: To IT Or Not IT

“The goal is to transform data into information, and information into insight.” – Carly Fiorina

I like to compare data quality to an oil refinery. The goal of a refinery is to transform crude oil into refined, clean, and usable oil; after all, no one would even consider using dirty oil in their car. It’s the same with crude data: our goal is to transform it into cleansed, rationalized, and usable information. Using crude (or dirty) data causes a whole range of short- and long-term problems. Dirty data prevents the organizational engine from reaching its full potential, making it slow and ineffective – which is quite a problem in the economic race against other organizations.

Back to the refinery. A refinery consists of two main entities: infrastructure and oil products. The infrastructure enables production (e.g., reactors), transport (e.g., pipes, valves, and pumps), and storage (e.g., silos and containers). The oil products are the raw crude oil and the wide array of products and byproducts that we all use every day or that are used by the petrochemical industries. In a refinery, life is fairly “simple.” Some people define which products are required, when, and where; some are responsible for process design; some are responsible for producing high-quality products; and others are responsible for the infrastructure that supports all these activities. Clearly, the maintenance or engineering department is not responsible for process design, production, or marketing and distribution. Its job is to provide the required facilities for production, storage, and transportation, and to prevent contamination and loss of product quality caused by poorly maintained infrastructure. It is not responsible for the quality of the content.

So here’s my point: IT in a modern organization is the equivalent of maintenance and engineering in a modern refinery. IT provides us with the facilities to produce data (e.g., forms), to store data (e.g., databases), and to transport and deliver data (e.g., interfaces, reports, queries), but it definitely cannot be responsible for the quality of the data produced – the content.

Suppose a young engineer in a factory needs a particular screw for a production process he is working on. He will dip into the ERP system and scan the descriptions of the large number of products classified as screws. Among them, he finds the following four descriptions:

· DIN 912 10x1x30-2.9 mat304
· ALLEN SCR M10x30 stainless steel
· SOCKET BOLT M10x1 LG30 SS
· M10x1mmx30mm SHCS-SS

Well, none of them seems to be the one he is looking for, according to the data in the technical catalog he is using. The next logical step is to create a new product in the catalog and order the desired screw. A more experienced engineer might be able to see that all of the above screws are actually the same, in spite of the totally different descriptions. The outcome is a new product number, a new order, and more inventory of a product already in stock. The next time an engineer needs this particular screw, he will conduct another search, fail to find exactly what he’s looking for, and probably create yet another new product number with a different description. (By the way, the above product descriptions were taken from a real customer catalog!)
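To make the duplication concrete, here is a minimal Python sketch of how the four descriptions above could be reduced to one comparable key. Everything in it is an assumption made for illustration: the synonym patterns, the key layout, and the `normalize` function are hypothetical and cover only these four samples, not a real catalog. The sketch also ignores thread pitch, because one of the descriptions omits it; a real cleansing process could not safely do that.

```python
import re

# The four descriptions quoted in the post.
DESCRIPTIONS = [
    "DIN 912 10x1x30-2.9 mat304",
    "ALLEN SCR M10x30 stainless steel",
    "SOCKET BOLT M10x1 LG30 SS",
    "M10x1mmx30mm SHCS-SS",
]

# Hypothetical synonym patterns, chosen only to cover the samples above.
HEAD_PATTERNS = [r"DIN\s*912", r"\bALLEN\b", r"\bSOCKET\b", r"\bSHCS\b"]
MATERIAL_PATTERNS = [r"304", r"STAINLESS", r"\bSS\b"]


def matches_any(patterns, text):
    """Return True if any of the regex patterns occurs in the text."""
    return any(re.search(p, text) for p in patterns)


def normalize(description):
    """Reduce a free-text screw description to a comparable key."""
    text = description.upper()
    text = re.sub(r"(?<=\d)MM", "", text)  # drop unit suffixes: 30MM -> 30

    head = "SOCKET_HEAD_CAP" if matches_any(HEAD_PATTERNS, text) else None
    material = "STAINLESS" if matches_any(MATERIAL_PATTERNS, text) else None

    # Dimension token: numbers joined by X, e.g. 10X1X30 or 10X30.
    dims = re.search(r"\d+(?:X\d+)+", text)
    parts = dims.group(0).split("X") if dims else []
    diameter = int(parts[0]) if parts else None

    # Length is either marked explicitly (LG30) or is the last dimension.
    explicit = re.search(r"\bLG\s*(\d+)", text)
    if explicit:
        length = int(explicit.group(1))
    elif len(parts) >= 2:
        length = int(parts[-1])
    else:
        length = None

    # Pitch is deliberately left out of the key (see the note above).
    return (head, diameter, length, material)


if __name__ == "__main__":
    for d in DESCRIPTIONS:
        print(normalize(d), "<-", d)
```

Under these assumptions, all four descriptions collapse to the same key, which is what the expert engineer sees at a glance. Note who has to supply the synonym tables: someone who knows screws, not someone who knows databases.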

Can we expect IT to be responsible for the content of the product data? IT provided us with the entry form and the relevant fields, stored the data in the database, and allowed us to retrieve the product data when asked. But IT cannot be responsible for the content stored in the IT systems. The main data quality problems lie in the quality of that content. That is why we cannot transform data into information and information into insight.


Wednesday, November 14, 2007
