<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-8489393935068233036</id><updated>2012-02-16T17:33:43.354+02:00</updated><category term='Content'/><category term='Taxonomy'/><category term='Data quality'/><category term='No magics'/><category term='Data governance'/><category term='IT'/><title type='text'>Data Quality Spell Book</title><subtitle type='html'>A search for the elusive spell book...</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>9</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-4777671232342714549</id><published>2009-01-29T14:19:00.002+02:00</published><updated>2009-01-29T14:20:00.289+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Taxonomy'/><title 
type='text'>Taxonomy, Divide and Merge (Part 2)</title><content type='html'>&lt;p&gt;Following on from my last post, I’d like to focus on the question: What can be considered a good taxonomy? Given that taxonomy is something between art and science, a consensus will be hard to achieve. Luckily, this discussion is focused on the product data realm and based on practical lessons gained through many years of experience working with product data.&lt;/p&gt;&lt;p&gt;&lt;b&gt;A product taxonomy should be practical&lt;/b&gt;&lt;/p&gt;&lt;p&gt;We need to have a taxonomy that enables us to search, compare, group, or analyze products quickly and easily. The pure 'academic' approach to defining categories is to group products that share exactly the same attributes, so that each group will constitute a category. This bottom-up approach will result in a long, flat list of categories. This kind of list will not serve us efficiently in searching and navigating through products, and many categories will contain only a few products. &lt;/p&gt;&lt;p&gt;The alternative approach, the top-down approach, is based on logically dividing the product world into groups (e.g. hand, power, and machine tools as one group, fasteners as another group) and then continuing to divide those worlds into sub-worlds (e.g. fasteners are sub-divided into screws/bolts, nuts, nails, etc.) and so on. This approach results in a subjective structure and will be prone to errors. The best practical method is a combination of the two approaches, resulting in an optimized taxonomy whose categories share the same technical attributes and include as many products as possible.&lt;/p&gt;&lt;p&gt;But let’s go back to the beginning. The first and most important factor in creating a taxonomy is the definition of the categories. A category should always reflect the essence or the nature of the classified object. 
It sounds trivial, but unfortunately, most of the categories I have come across are based on the usage of the object. For example, we had a customer (an academic research institute) that classified sugar (yes, sugar!) under several categories: &lt;/p&gt;&lt;ul&gt;&lt;li&gt;Animal food&lt;/li&gt;&lt;li&gt;Chemical materials&lt;/li&gt;&lt;li&gt;Refreshments&lt;/li&gt;&lt;li&gt;Office supplies&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;But sugar is sugar, whatever you do with it. Another common example is capacitors. A capacitor is a capacitor, but in many organizations they are divided into Electrical Capacitors and Electronic Capacitors, even though the products in both categories share the same technical attributes. &lt;/p&gt;&lt;p&gt;(By the way, the main reason current taxonomies are usage-oriented is that maintenance departments were among the first to define and build classification systems. For them, the best way to classify was according to the usage/facility/machine. They preferred to say "a ball bearing for X machine" rather than "a ball bearing with diameter D, material M, etc.")&lt;/p&gt;&lt;p&gt;&lt;b&gt;Notes on the relationship between categories and attributes&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Another interesting aspect of categories is the ability to transform attributes into categories and vice versa. This feature enables us to optimize our taxonomy for a given organization or situation. For example, in the public taxonomy UNSPSC, there are several categories of RAM: Random Dynamic RAMs, Random RAMs, and Static Random RAMs. The three categories share most of the same attributes, so if your main business is not RAMs, you may prefer to have a single category of Random Access Memory and define a technical attribute called Type with three values: Random, Dynamic, and Static.&lt;/p&gt;&lt;p&gt;&lt;b&gt;A plate by any other name&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Another factor is the name of the category. 
It’s important to bear in mind that we all have our own perceptions, so if category names are not simple and clear and there are ambiguities, many products will be wrongly classified. Think, for example, of the word “plate.” It may refer to coating, a dish, or board and maybe there are more meanings. It is definitely a bad category name! &lt;/p&gt;&lt;p&gt;In my next post, I’ll take a closer look at the category hierarchy — the second most important factor in creating a good taxonomy.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-4777671232342714549?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/4777671232342714549/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=4777671232342714549' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/4777671232342714549'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/4777671232342714549'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/2009/01/re-taxonomy-divide-and-merge-part-2.html' title='Taxonomy, Divide and Merge (Part 2)'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' 
src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-5776143463635499910</id><published>2008-12-10T12:47:00.003+02:00</published><updated>2008-12-10T13:04:29.613+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Taxonomy'/><title type='text'>Taxonomy, The Constitution Of Product Data Quality (part 1)</title><content type='html'>&lt;p&gt;&lt;span style="font-family:verdana;"&gt;I think the most important, yet undervalued, factor in the product data realm is the taxonomy. The taxonomy is the skeleton and foundation of any reasonable product data quality (PDQ) strategy. A complete, well-built taxonomy (and I will elaborate on this later) serves as the core domain knowledge for PDQ and increases its level of quality. Of course, having a good taxonomy cannot, by itself, guarantee high product data quality, but without it, high quality product data cannot be achieved at all – it’s as simple as that. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-family:verdana;"&gt;A taxonomy can easily be compared with a constitution. A good constitution is comprehensive, clear, consistent, balanced, practical, and up to date, and it sets limits and borders while also allowing for ad hoc judgments and decision making. A good constitution embodies and accumulates values, positions, culture, experience, and common sense, and serves as a guide for the society that creates it. But having a good constitution is not enough – it should be followed, enforced, and continuously maintained. The same is true of a taxonomy. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-family:verdana;"&gt;To give a bit of background, taxonomy, the study of classification, is the basis for all science. We use taxonomy to structurally group similar things into categories, based on a set of common, category-specific characteristics. 
Aristotle made one of the earliest attempts to classify two major groups: plants and animals. Plants were separated according to size (structure)–herbs, shrubs, and trees, and animals were grouped according to where they lived–land, sea, or air. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-family:verdana;"&gt;Carolus Linnaeus (1707-1778) was a Swedish naturalist who is considered the "Father of Taxonomy." He set out to examine, describe, classify and name every living species on earth and developed the system by which we name organisms today, which groups species according to shared physical characteristics. His task was to make sense out of chaos, and to devise an organizational system that would sort out any confusion. He grouped species together into genera based upon physical similarities, and then grouped genera into families based upon broader physical similarities, etc. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-family:verdana;"&gt;In essence, taxonomy consists of four elements: &lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Categories (e.g. Manual Wrenches, Screw Drivers)&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Category hierarchy (e.g., Assembly &amp;amp; Fastening -&gt; Wrenches -&gt; Manual Wrenches) &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Attributes related to each category (e.g., Manual Wrenches: Opening Size, Overall Length, Handle Type, Jaw Material, etc.) &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Values related to each category attribute (e.g., Manual Wrenches: Jaw Material: Alloy steel, Cast bronze, Aluminum-Magnesium, etc.)&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span style="font-family:verdana;"&gt;&lt;/span&gt; &lt;/p&gt;&lt;p&gt;&lt;span style="font-family:verdana;"&gt;In my next post, I will discuss "What constitutes a good taxonomy?" 
&lt;/span&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-5776143463635499910?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/5776143463635499910/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=5776143463635499910' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/5776143463635499910'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/5776143463635499910'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/2008/12/taxonomy-constitution-of-product-data.html' title='Taxonomy, The Constitution Of Product Data Quality (part 1)'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-7148244900847419350</id><published>2008-11-18T16:03:00.001+02:00</published><updated>2008-11-18T16:07:26.526+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data quality'/><title type='text'>Why Data Cleansing is Not Rational Enough</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Many of us use the term “data cleansing.” I have never liked this term because it actually says nothing about the state of the data before cleansing (Was it dirty? A little bit dirty?) 
or about the state of the data after cleansing (Less dirty? Totally clean? Or what?). Any improvement in data quality can be considered cleansing, even if the quality remains low. What term reflects the state of the data before and after, and the value it ultimately brings?&lt;br /&gt;&lt;br /&gt;In the product data realm, some use the term “rationalizing.” This is much better. It means that there was irrational data ("not in accordance with reason; utterly illogical." Dictionary.com) and after processing, it was rationalized ("proceeding or derived from reason or based on reasoning; agreeable to reason; reasonable." Dictionary.com). But rational is very subjective. What is rational for one person may be irrational for others. Furthermore, the term “rationalized” doesn’t even hint at the potential value.&lt;br /&gt;&lt;br /&gt;In generic terms, what we do is take raw, crude data, run it through several processes, and produce high value data that can be considered a "single version of truth" and, as such, can be used across the organization. This process can easily be considered refinement ("to bring to a finer state or form by purifying." Dictionary.com). Bingo! Data Refinement has it all. 
It embodies the initial state of the data – raw; the final state – pure; and the value – refined, pure objects are considered to have higher value.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-7148244900847419350?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/7148244900847419350/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=7148244900847419350' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/7148244900847419350'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/7148244900847419350'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/2008/11/why-data-cleansing-is-not-rational.html' title='Why Data Cleansing is Not Rational Enough'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-3304356921899926085</id><published>2008-11-10T13:54:00.005+02:00</published><updated>2008-11-10T20:48:07.843+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='Data quality'/><title type='text'>The Beginning of Wisdom is To Call Things by Their Right Names (Chinese proverb)</title><content type='html'>&lt;span 
style="font-family:verdana;"&gt;A few days ago, I had an interesting meeting with the CFO of a multinational company. He had recently tried to "optimize" his supply chain, or in other words, to cut costs. He knew that some products were causing him a major headache (and a hole in his pocket), and that he had to do something about them. But which ones? His next step was to locate the products that account for most of the expenses (a kind of Pareto analysis) and then to find out the stock level, stock policy, average consumption, number of suppliers, the annual volume with each supplier, logistics (storage and transport costs) and so on. By doing so, he thought that he would be able to reduce the inventory and number of suppliers, to negotiate and get better purchasing conditions, and to reduce the logistics costs. A good plan, indeed! It is, after all, the core of Spending Data Management (SDM) and other supply chain optimization practices.&lt;br /&gt;&lt;br /&gt;Unfortunately, in spite of the Oracle Application ERP, data warehouses, BI software, and other goodies that the company had invested in during the last few years, he couldn't get a reliable picture of the company’s spend. I asked him to send us his product data (in a text file, Excel, or something similar), so we could analyze and evaluate the data quality.&lt;br /&gt;&lt;br /&gt;I had a pretty good idea of what to expect, since I’ve seen it many times before. But we needed to put the evidence on the table, so to speak.&lt;br /&gt;&lt;br /&gt;Let's take valves as a typical example. We found that they were classified under more than 20 different categories. 
Here are just a few examples: &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Industrial Safety – Breathing Equipment – &lt;strong&gt;Valve/Diaphragm&lt;/strong&gt; &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Control – Control Equipment – &lt;strong&gt;Valve&lt;/strong&gt; &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Lifting – Winch spares - &lt;strong&gt;Engine/Clutch/Relay Spares&lt;/strong&gt; &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Liquid/Gas – Brass/Copper/Bronze Parts – &lt;strong&gt;Safety Valve&lt;/strong&gt; &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Liquid/Gas – Stainless Steel – &lt;strong&gt;Pneumatic Valve&lt;/strong&gt; &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Control – Control/Tubing Equipment – &lt;strong&gt;Electrical Valve&lt;/strong&gt; &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Control – Control/Tubing Equipment – &lt;strong&gt;Pneumatic Valve&lt;/strong&gt; &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:verdana;"&gt;Vacuum – Vacuum Installations – &lt;strong&gt;Right Angle Valve&lt;/strong&gt; &lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:verdana;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;In this scenario, ascertaining annual spend on valves, the valve inventory level, valve inventory turnover, and the number of valve suppliers is almost impossible. But, if all valves (irrespective of their usage) were classified under Valve, getting the required information could take a single click.&lt;br /&gt;&lt;br /&gt;Most companies have no suitable taxonomy and, as a result, all their product data quality efforts are built on shaky foundations. 
If exactly the same product is classified under several different categories, decision making regarding spending and supply chain efficiency becomes guesswork.&lt;br /&gt;&lt;br /&gt;I’ll talk more about taxonomy in future posts.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-3304356921899926085?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/3304356921899926085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=3304356921899926085' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/3304356921899926085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/3304356921899926085'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/2008/11/beginning-of-wisdom-is-to-call-things.html' title='The Beginning of Wisdom is To Call Things by Their Right Names (Chinese proverb)'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-8638117140040255499</id><published>2008-10-13T12:00:00.004+02:00</published><updated>2008-10-13T12:22:51.932+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='No magics'/><category scheme='http://www.blogger.com/atom/ns#' term='Data quality'/><title 
type='text'>The Proof of The Pudding is In The Eating</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Recently, I met with a potential customer (a small-to-medium multinational enterprise) who (like many of our customers or potential customers) claimed there was no way his database contained duplicate products. He’d established a dedicated team that was responsible for the creation of new item records (SKUs). The team aimed to keep product descriptions as consistent as possible, inputting the product features in the same order and manner. Furthermore, the company had kept the same team for many years to ensure consistency.&lt;br /&gt;&lt;br /&gt;Well, it’s a nice approach, but I was skeptical. I don't believe that any human being, however talented, can manually maintain master data at the same quality level that can be achieved by a suitable computerized system. The proof of the pudding is in the eating, so I asked him to send us some of their data and let our domain experts evaluate its quality. They did. At first sight, the data looked really good, relatively speaking – the best I had seen so far. But a deeper analysis by an expert quickly revealed the problems.&lt;br /&gt;&lt;br /&gt;Here’s a typical example. The company standardizes its product descriptions to eliminate duplicates, listing each product’s diameter, steel code, length, and hardness:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt;&lt;strong&gt;Rod Dia 1" SAE4340 Lng 18' 420BH&lt;br /&gt;Bar Round Dia 25.4MM SNCM8 Lng 6M 45Rc&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;There’s just one problem. 
These seemingly different product descriptions are both the same product — but using different measurements and technical standards.&lt;br /&gt;&lt;br /&gt;· Rod is Bar Round&lt;br /&gt;· Diameter of 1" is 25.4mm&lt;br /&gt;· US standard SAE4340 is SNCM8 JIS standard&lt;br /&gt;· Length of 18' is 6m&lt;br /&gt;· Hardness 45Rc is 420BH&lt;br /&gt;&lt;br /&gt;No one can expect that a team responsible for creating new item records can be an expert in all domains, know all the standards and common abbreviations used in each domain, be able to correctly classify and understand the various technical features relevant to each domain, or even distinguish between the varied descriptions of the same product used by different suppliers. Not to mention the huge obstacle of different languages.&lt;br /&gt;&lt;br /&gt;The ultimate advantage of a computerized data quality system is the ability to harness and reuse the domain experts’ knowledge, creating a data quality firewall that prevents the creation of duplicate records.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-8638117140040255499?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/8638117140040255499/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=8638117140040255499' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/8638117140040255499'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/8638117140040255499'/><link rel='alternate' type='text/html' 
href='http://data-quality-spell-book.blogspot.com/2008/10/proof-of-pudding-is-in-eating.html' title='The Proof of The Pudding is In The Eating'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-791336152746157257</id><published>2008-04-15T11:49:00.003+03:00</published><updated>2008-04-15T12:01:32.891+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data governance'/><category scheme='http://www.blogger.com/atom/ns#' term='Data quality'/><category scheme='http://www.blogger.com/atom/ns#' term='Content'/><title type='text'>Humpty Dumpty Words with Tweedledee Logic</title><content type='html'>&lt;em&gt;"When I use a word," Humpty Dumpty said in a rather scornful tone, "it means just what I choose it to mean – neither more nor less."&lt;br /&gt;"The question is," said Alice, "whether you can make words mean so many different things."&lt;br /&gt;"The question is," said Humpty Dumpty, "which is to be master – that's all."&lt;br /&gt;&lt;br /&gt;"Contrariwise," continued Tweedledee, "if it was so, it might be; and if it were so, it would be; but as it isn't, it ain't. That's logic." - &lt;/em&gt;Through the Looking-Glass, Lewis Carroll&lt;br /&gt;&lt;br /&gt;Often, when I review product data, I can't avoid the impression that the product descriptions were written by Humpty Dumpty following Tweedledee logic. 
So, “which is to be master?” That’s the challenge of product data quality.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-791336152746157257?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/791336152746157257/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=791336152746157257' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/791336152746157257'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/791336152746157257'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/2008/04/humpty-dumpty-words-with-tweedledee.html' title='Humpty Dumpty Words with Tweedledee Logic'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-46950547785031633</id><published>2008-03-26T12:09:00.007+02:00</published><updated>2008-04-15T12:01:56.503+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='No magics'/><category scheme='http://www.blogger.com/atom/ns#' term='Data governance'/><category scheme='http://www.blogger.com/atom/ns#' term='Data quality'/><title type='text'>Data Quality, Fairy Tales, Dragons and Knights</title><content type='html'>&lt;span style="font-family:verdana;"&gt;&lt;em&gt;"At this 
moment her thoughts were interrupted by a loud shouting of 'Ahoy! Ahoy! Check!' and a Knight dressed in crimson armour came galloping down upon her, brandishing a great club. Just as he reached her, the horse stopped suddenly: 'You're my prisoner!' the Knight cried, as he tumbled off his horse." - &lt;/em&gt;Through the Looking-Glass, Lewis Carroll&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;A recent post on the &lt;a href="http://datagovernanceblog.com/an-information-management-fairy-tale"&gt;Data Governance blog&lt;/a&gt;&lt;/span&gt;&lt;span style="font-family:verdana;"&gt; mentioned this entertaining and thoughtful &lt;a href="http://youtube.com/watch?v=TbzQvswrOTw"&gt;video&lt;/a&gt; &lt;/span&gt;&lt;span style="font-family:verdana;"&gt;about a dragon named Data Quality, which is indeed an innovative way to spread the data quality message. It’s a nice fairy tale with knights, dragons and all that stuff, and there’s even a moral to the story.&lt;br /&gt;&lt;br /&gt;Even in this fairy tale, it took a costly knight with a bag full of tricks, and a really long time, to bring the dragon under control. And it’s still a fairy tale. The problem is that in real life, there are many such knights, complete with shiny bags and glossy appearance, who all promise to conquer the dragon with almost no effort. Well, real life is no fairy tale. It takes more than a lone knight, no matter how shiny he looks, to conquer the problem. It takes powerful technology and experienced fighters to get control (and keep control) over the Data Quality dragon. No magic, no knights, no shortcuts. 
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-46950547785031633?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/46950547785031633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=46950547785031633' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/46950547785031633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/46950547785031633'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/2008/03/data-quality-fairy-tails-dragons-and.html' title='Data Quality, Fairy Tales, Dragons and Knights'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-8779664293462656589</id><published>2007-11-14T15:18:00.001+02:00</published><updated>2008-03-26T12:21:09.762+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='IT'/><category scheme='http://www.blogger.com/atom/ns#' term='Content'/><title type='text'>To IT Or Not IT</title><content type='html'>&lt;span style="font-family:verdana;"&gt;&lt;em&gt;“The goal is transform data into information, and information into insight.”&lt;/em&gt; – Carly &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;Fiorina&lt;/span&gt;&lt;br 
/&gt;&lt;br /&gt;I like to compare data quality to an oil refinery. The goal of a refinery is to transform crude oil into refined, clean, and usable oil; after all, no one would even consider using dirty oil in their car. It’s the same with crude data: our goal is to transform it into cleansed, rationalized, and usable information. Using crude (or dirty) data will cause a whole range of short- and long-term problems. Dirty data will prevent the organizational engine from fulfilling its energy potential, making it slow and ineffective – which presents quite a problem in the economic race against other organizations.&lt;br /&gt;&lt;br /&gt;Back to the refinery. A refinery consists of two main entities: infrastructure and oil products. The infrastructure enables production (e.g., reactors), transport (e.g., pipes, valves and pumps) and storage (e.g., silos and containers). Oil products are the raw crude oil and the wide array of products and byproducts that we all use every day or that the petrochemical industries use. In a refinery, life is fairly “simple.” Some people define which products are required, when, and where; some are responsible for process design; some are responsible for the production of high-quality products; and others are responsible for the infrastructure that supports all these activities. Clearly, the maintenance or engineering department is not responsible for the process design, production, or marketing and distribution. Its job is to provide the required facilities for production, storage, and transportation, and to prevent contamination and loss of product quality caused by poorly maintained infrastructure. It is not responsible for the quality of the content.&lt;br /&gt;&lt;br /&gt;So here’s my point: IT in a modern organization plays the role of the maintenance and engineering department in a modern refinery. 
IT provides us with facilities to produce data (e.g., forms); they are responsible for data storage (e.g., databases); they are responsible for data transportation and delivery (e.g., interfaces, reports, queries), but they definitely cannot be responsible for the quality of the data produced – the content.&lt;br /&gt;&lt;br /&gt;Suppose a young engineer in a factory needs a particular screw for a production process he is working on. He will dip into the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;ERP&lt;/span&gt; system and scan many product descriptions, including a large number of products classified as screws. Among them, he finds the following four descriptions:&lt;br /&gt;&lt;br /&gt;· DIN 912 10x1x30-2.9 mat304&lt;br /&gt;· ALLEN &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;SCR&lt;/span&gt; M10x30 stainless steel&lt;br /&gt;· SOCKET BOLT M10x1 LG30 SS&lt;br /&gt;· M10x1&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;mmx&lt;/span&gt;30mm &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;SHCS&lt;/span&gt;-SS&lt;br /&gt;&lt;br /&gt;Well, none of them seems to be the one he is looking for, according to the data in the technical catalog he is using. The next logical action will be to generate a new product in the catalog and to order the desired screw. A more experienced engineer might be able to see that all the above screws are actually the same, in spite of the totally different descriptions. The outcome is a new product number, a new order, and more inventory of a product that is already in stock. The next time an engineer needs this particular screw, he will conduct another search, fail to find exactly what he’s looking for, and probably generate yet another new product number with a different description. (By the way, the above product descriptions were taken from a real customer catalog!)&lt;br /&gt;&lt;br /&gt;Can we expect IT to be responsible for the content of the product data? 
They provided us with the entry form and the relevant fields, stored the data in the database, and allowed us to retrieve the product data when asked. But IT cannot be responsible for the content stored in the IT systems. The main data quality problems lie in the quality of the content stored in the IT systems. That is why we cannot transform data into information and information into insight.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-8779664293462656589?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/8779664293462656589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=8779664293462656589' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/8779664293462656589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/8779664293462656589'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/2007/11/to-it-or-not-it.html' title='To IT Or Not IT'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8489393935068233036.post-8720254370378885936</id><published>2007-11-01T10:34:00.001+02:00</published><updated>2008-03-26T12:22:32.815+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='No magics'/><title 
type='text'>The "install, run and ... poof" magic</title><content type='html'>&lt;span style="font-family:verdana;"&gt;I spent many years on the enterprise software side, hardly aware of the existence of something called “data quality.” In fact, like many people in the ERP realm, I probably contributed to the problem, because I was focused on slotting data into the right fields without stopping to consider the actual content. Today, we’re hearing a lot more about data quality, particularly when it comes to customer data, and increasingly when it comes to product data.&lt;br /&gt;&lt;br /&gt;It’s not really surprising that the term means different things to different people and is used for varying purposes. Data quality has become more of a marketing slogan than a well-structured and well-defined concept, with many consultants and software companies jumping onto this amorphous bandwagon.&lt;br /&gt;&lt;br /&gt;I’ve spent the last three years developing computerized systems and best practices to clean up the mess we helped to generate over many years. It is tough and complex work that requires a lot of experience and know-how, as well as a profound understanding of taxonomy and of many technical domains.&lt;br /&gt;&lt;br /&gt;Here’s the bad news: there are no really simple solutions to this very complex problem; furthermore, crappy data created over the years can’t be automatically fixed by a magic tool: “Install, run and… poof!”&lt;br /&gt;&lt;br /&gt;But here’s the good news: experience and know-how, best practices and methods, suitable software tools and hard work can solve the problem and bring the quality of the data to the right level.&lt;br /&gt;&lt;br /&gt;Lately I’ve been seeing more and more promises of magic wands and tools that automatically and painlessly fix all the data quality problems so that everyone lives happily ever after. 
Well, I too am looking for such a magical spell book!&lt;br /&gt;&lt;br /&gt;In the meantime, I thought it would be more practical to share my thoughts with those involved in the data quality realm, bring the complex issue of product data quality (PDQ) down to earth, and maybe save some growing pains.&lt;br /&gt;&lt;br /&gt;I do not pretend to be objective – I am biased. I develop systems and practices, run projects all over the world, and am confronted with new challenges every day. But I am going to write about the real world, without marketing hot air and without delving into the realm of theoretical concepts.&lt;br /&gt;&lt;br /&gt;I will be happy to receive your comments and to publish your thoughts regarding the data quality domain.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8489393935068233036-8720254370378885936?l=data-quality-spell-book.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://data-quality-spell-book.blogspot.com/feeds/8720254370378885936/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8489393935068233036&amp;postID=8720254370378885936' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/8720254370378885936'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8489393935068233036/posts/default/8720254370378885936'/><link rel='alternate' type='text/html' href='http://data-quality-spell-book.blogspot.com/2007/11/install-run-and-poof-magic.html' title='The &quot;install, run and ... 
poof&quot; magic'/><author><name>Yossi Rissin</name><uri>http://www.blogger.com/profile/12411298911818600896</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://2.bp.blogspot.com/_s3yo4yNRnE0/SyU0weAtqjI/AAAAAAAAAEw/Pj0NEuc3hnI/S220/IMG_2996+cropped.JPG'/></author><thr:total>0</thr:total></entry></feed>
