"One man’s “magic” is another man’s engineering. “Supernatural” is a null word." – Robert A Heinlein

"If you torture data long enough, it will tell you anything you want!" – Unknown


Wednesday, December 10, 2008

Taxonomy, The Constitution Of Product Data Quality (part 1)

I think the most important, yet undervalued, factor in the product data realm is the taxonomy. The taxonomy is the skeleton and foundation of any reasonable product data quality (PDQ) strategy. A complete, well-done taxonomy (and I will elaborate on this later) serves as the core domain knowledge for PDQ and raises the level of quality that can be reached. Of course, having a good taxonomy cannot, by itself, guarantee high product data quality, but without it, high-quality product data cannot be achieved at all – it’s as simple as that.

A taxonomy can easily be compared to a constitution. A good constitution is comprehensive, clear, consistent, balanced, practical, and up to date, and it sets limits and borders while still allowing for ad hoc judgments and decision making. A good constitution embodies and accumulates values, positions, culture, experience, and common sense, and serves as a guide for the society that creates it. But having a good constitution is not enough – it must be followed, enforced, and continuously maintained. The same is true of a taxonomy.

To give a bit of background, taxonomy, the study of classification, is the basis for all science. We use taxonomy to structurally group similar things into categories, based on a set of common, category-specific characteristics. Aristotle made one of the earliest attempts at classification, dividing living things into two major groups: plants and animals. Plants were separated according to size and structure (herbs, shrubs, and trees), and animals were grouped according to where they lived (land, sea, or air).

Carolus Linnaeus (1707-1778) was a Swedish naturalist who is considered the "Father of Taxonomy." He set out to examine, describe, classify, and name every living species on Earth, and developed the system by which we name organisms today, which groups species according to shared physical characteristics. His task was to make sense out of chaos and to devise an organizational system that would sort out any confusion. He grouped species together into genera based upon physical similarities, then grouped genera into families based upon broader physical similarities, and so on.

In essence, a taxonomy consists of four elements:

  • Categories (e.g., Manual Wrenches, Screwdrivers)
  • Category hierarchy (e.g., Assembly & Fastening -> Wrenches -> Manual Wrenches)
  • Attributes related to each category (e.g., Manual Wrenches: Opening Size, Overall Length, Handle Type, Jaw Material, etc.)
  • Values related to each category attribute (e.g., Manual Wrenches: Jaw Material: Alloy steel, Cast bronze, Aluminum-Magnesium, etc.)
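
To make the four elements concrete, here is a minimal Python sketch of how they might fit together in code. It is only an illustration under my own assumptions: the names `Category`, `find_path`, and `is_valid` are hypothetical and do not come from any particular PDQ tool, and only the Jaw Material attribute is filled in because it is the only one whose values are listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Category:
    """One category (element 1) in the hierarchy (element 2). Hypothetical name."""
    name: str
    # Attributes (element 3) mapped to their allowed values (element 4).
    attributes: Dict[str, List[str]] = field(default_factory=dict)
    children: List["Category"] = field(default_factory=list)

    def add_child(self, child: "Category") -> "Category":
        """Attach a sub-category and return it so calls can be chained."""
        self.children.append(child)
        return child


def find_path(root: Category, name: str, trail: tuple = ()) -> Optional[List[str]]:
    """Depth-first search for a category; returns the names from root to it."""
    trail = trail + (root.name,)
    if root.name == name:
        return list(trail)
    for child in root.children:
        path = find_path(child, name, trail)
        if path is not None:
            return path
    return None


def is_valid(category: Category, attribute: str, value: str) -> bool:
    """True only if the attribute belongs to the category and allows the value."""
    allowed = category.attributes.get(attribute)
    return allowed is not None and value in allowed


# Build the example hierarchy from the list above.
root = Category("Assembly & Fastening")
wrenches = root.add_child(Category("Wrenches"))
manual = wrenches.add_child(Category(
    "Manual Wrenches",
    # Other attributes (Opening Size, Overall Length, Handle Type)
    # would be added the same way once their values are defined.
    attributes={"Jaw Material": ["Alloy steel", "Cast bronze", "Aluminum-Magnesium"]},
))

print(" -> ".join(find_path(root, "Manual Wrenches")))
print(is_valid(manual, "Jaw Material", "Cast bronze"))
print(is_valid(manual, "Jaw Material", "Plastic"))
```

The point of the sketch is that a product record can be checked against the taxonomy: a value such as "Plastic" for Jaw Material is rejected because the taxonomy does not list it, which is the kind of rule enforcement the constitution analogy describes.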

In my next post, I will discuss "What constitutes a good taxonomy?"