Arbitrary Taxonomies and the Illusion of Precision (Pt 3)

3 05 2010

In Part One and Part Two of this blog post, which has now become an essay, I talked about how the books Getting Things Done and The Thinker’s Toolkit inspired me to do more with less, intellectually.  Although the train of thought meandered a bit through dales and valleys, my main point was that we have to be careful about assuming that we know more than we do.  Our work often requires precision, and sometimes we pursue the illusion of precision when we cannot access the real thing.  Like desperate cops who put naming a suspect above solving the case, we embark on a mission for an answer that is more convincing than correct.  Everyone from investors to software quality assurance analysts is impressed by evaluations that sound quantitative.  But science and statistics cannot unlock all doors.  In such cases, the best approach is not to make stuff up, but to fess up to our limitations.  Be skeptical of decimal points when you know damn well that they cannot be obtained.

Everything that I’ve discussed so far has been relevant to the present.  However, these principles are even more relevant to our assessments of the future.  An example from software engineering is the tendency to over-design.  This happens in other fields of engineering too, but it is way too easy to over-design software.  The costs and consequences are often hidden from view until it’s too late to correct our errors.  It also doesn’t help that over-design is encouraged by a naive interpretation of common software quality attributes (i.e. the “-ilities”).  We hold up squishy words like “scalability” and “maintainability” as virtues, leading to design reviews where someone defends some completely superfluous and non-mandated feature as visionary.

The Agile programming crowd has a rule against over-design.  For the project types they are engaged in, experience shows that refactoring is usually less work than building in hooks for feature expansion.  The reason is simple: you won’t know what you need until you need it.  You’ll probably waste time designing for something that never comes to pass while still spending what you would have anyway to implement the new functionality once it becomes necessary.  Not every program is amenable to this philosophy, because rework costs do tend to grow exponentially with program size.  But even for larger projects, you have to have some very concrete ideas about the dimensions of future growth before even attempting to design for them.  By concrete ideas, I mean ideas that are well formed enough to construct requirements that are no less rigorous than your other requirements.

This is the topic that brings me back to Getting Things Done. The book recommends that everyone use a flat alphabetic filing system.  If you’re using paper filing, that means you’re not creating special cabinets or drawers to organize your reference material by high level topics.  Instead, try to group papers by as low a level as possible and then file the folders alphabetically.  That’s it.  Don’t get fancy.  No matter how many drawers you have, you should only have one system distributed across all of them.

You might not think this principle applies to electronic filing, but it does.  If you know you have structured data, then of course you need your system to represent that structure.  But when I say structured data, I mean something you could put into a relational database.  For everything else, filing is never as easy as just hitting the search button.  Most of our data is not organized in a relational database.  It’s composed of PDFs, Excel spreadsheets and all of their companions.  We store them in folders and it is those folders that will lead to the death of our civilization, as we spend endless hours waiting for that Microsoft cartoon dog to find our stuff.  I’m no more hopeful (or trusting) of Google Desktop.

Why do we do this to ourselves?  Computers make it too easy to make new folders.  They should use that dialogue box that asks you if you’re sure you want to delete something to ask you if you seriously think creating a new folder will clear up your electronic clutter or make it easier to find your stuff later.  How many times has someone asked you to send them that budget report again because they know they have it, but they just can’t find it right now?  If that same person has a folder tree on his or her computer that’s more than ten levels deep, then it’s time to schedule an intervention.

Our electronic folders are the arbitrary taxonomies of the title.  We think we understand the relationships among all that data that’s swimming on our hard drives – but we’re wrong.  We’re just as wrong as when we think we can score a movie using fractional stars.  We’re just as wrong as when we think we can rank ten objects front to back.  We’re just as wrong as those guys who built tethers for zeppelins on the Empire State Building because they were designing for the future.  Companies like Google spend millions of dollars developing algorithms via which vast server farms can discover the taxonomies linking data.  You don’t stand a chance of doing that using your own brain.  It’s laughable that I even think I know how my data is related right now, let alone how I will use it in the future.  Save yourself a headache.  Use just a few folders.  Group things together only if you know for sure they are strongly related.  Be careful how you name your files, so that it’s easy to find what you’re looking for.

In other words, KISS.
