Data is artificially siloed because most data systems can't deal with heterogeneous data
- Too strict, too inflexible
- No room for incomplete data (e.g., fuzzy dates on the calendar)
- Only one kind of data at a time
- Data has fixed schemas
- Can't see more than one kind of data together at a time
- If you can't aggregate different kinds of data into a single view, how do you detect patterns across your data set? How do you chunk the data into groupings other than the ones the system presents to you? How do you gain a sense of how the different groupings relate to each other? (e.g., Project A has more dependencies on Project B than vice versa AND Project B tasks are mostly due after Project A. Uh-oh.) [INSERT 2 GRAPHICS ILLUSTRATING SILOED VIEWS VERSUS OVERLAID VIEWS] In other words, how do you ever wrap your head around the data?
- You have to do all of the aggregation and cross-comparison in your head and it's just too hard.
- If you have tasks stored in eight different places (Inbox, Drafts folder, digital calendar, paper calendar, Excel spreadsheet, Stickies, notepads, your head), how do you keep track of things that are common to all of them (e.g., all tasks I need to discuss with Gina)?
- This is why writing was invented in the first place: so we could pin things down and reflect upon them, allowing us to overlay more and more layers of complexity in our ideas. Data management systems in software need to be able to do the same thing.
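To make the "tasks I need to discuss with Gina" question concrete, here is a minimal sketch in Python of what a schema-free overlay might look like. Everything in it is hypothetical: the three sources, their field names, and the helper functions `normalize` and `mentions` are invented for illustration, not taken from any real system.

```python
# Hypothetical task records from three sources, each with its own shape.
inbox = [{"subject": "Budget review with Gina", "received": "2024-03-01"}]
calendar = [
    {"title": "Dentist", "date": "2024-03-04"},
    {"title": "1:1", "attendees": ["Gina"], "date": None},  # fuzzy date: not known yet
]
stickies = ["Ask Gina about the Project B deadline", "Buy milk"]


def normalize(source, record):
    """Wrap any record in a common envelope without forcing a schema on it."""
    fields = record if isinstance(record, dict) else {"text": record}
    return {"source": source, "fields": fields}


def mentions(item, term):
    """True if any field value, whatever its name or type, mentions the term."""
    term = term.lower()
    for value in item["fields"].values():
        values = value if isinstance(value, list) else [value]
        if any(term in str(v).lower() for v in values):
            return True
    return False


# One combined view over all sources.
all_tasks = (
    [normalize("inbox", r) for r in inbox]
    + [normalize("calendar", r) for r in calendar]
    + [normalize("stickies", r) for r in stickies]
)

gina_tasks = [t for t in all_tasks if mentions(t, "Gina")]
for task in gina_tasks:
    print(task["source"], task["fields"])
```

The design choice worth noticing is that nothing is discarded to fit a fixed schema: the calendar entry with no date and the free-text sticky note sit in the same view as the structured email record.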
But even if you were able to pull all of this heterogeneous data together, what would you do with it? It would be a big, incomprehensible mess.
All [are] ghosts rising in a milk-white fog
Grokking, or the extraction of knowledge from raw data, is essentially a matter of pattern recognition. The more obvious the pattern (e.g., Arabic numerals versus the Roman alphabet), the easier it is for a person to construct a coherent picture out of what would otherwise be meaningless babble.
abcdefghijklmnopqrstuvwxyz
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33...
We're talking about the Gestalt effect, that which allows humans to recognize faces, shapes and colors in all of their variegated and ambiguous real-world manifestations. That thing that computers are still pretty bad at doing.
The transubstantiation of information from physical reality and conceptual incubation in the brain to virtual representation as alphanumeric strings on a computer screen is a process of rendering the tangible intangible, one that strips things of their multidimensionality. Data entry is a process of generalization, in which things as disparate as zebras, sabre, bras, razes, and rabes conform to such a high degree of regularity that they are distinguished by a mere reordering of a uniform set of characters.
Gone is the distinction between things conceptual and things physical
Gone are the distinctions between things past, present, future, eternal and atemporal
Gone are distinctions in hue, saturation, brightness, scale, texture, number, pitch, harmony, sonority, timbre...
Unfortunately, it is the ability to differentiate between the various aspects of things as they exist in nature (e.g., color, texture, sound) that in turn allows us to see the similarities between the same aspects of different things (e.g., all things that are blue, all things that are loud). In other words, without distinctions there is no likeness. And without likeness, there are no patterns.
Enter cliche: A picture is worth a thousand words. Through the fog of encoding all of human knowledge into a generic character set, all things start to look alike, rendering pattern recognition in the information age a challenging if not impossible task.
The generalization of the data makes it very hard for people to see natural groupings and patterns, because that is what generalization does: it makes everything look the same. Instead, you start making groupings based on how the text is arranged. Long words, short words, words that start with S, words with capital letters. Attributes of the word become the easiest way to group and detect patterns, not the semantic substance of the data itself.
Quick, what are the most obvious ways to group the list below?
dog
god
canine
Canaan
good
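The surface groupings are also the ones a program finds most easily. As a rough sketch in Python (the helper `group_by` and the three grouping keys are my own choices for illustration), here is the list above grouped by length, by first letter, and by the set of letters used:

```python
from collections import defaultdict

words = ["dog", "god", "canine", "Canaan", "good"]


def group_by(items, key):
    """Bucket items by whatever the key function returns for each one."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


# Three groupings based purely on how the text is arranged.
by_length = group_by(words, len)
by_first_letter = group_by(words, lambda w: w[0].lower())
by_letters = group_by(words, lambda w: "".join(sorted(w.lower())))

print(by_length)        # dog/god, canine/Canaan, good
print(by_first_letter)  # dog; god/good; canine/Canaan
print(by_letters)       # dog/god share letters; the rest stand alone
```

None of these keys puts "dog" next to "canine" or "god" next to "Canaan". The groupings that matter semantically are exactly the ones the character strings do not expose.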
Case Study: DDC
Looking back to the Dewey Decimal example, here is a case where the stars align and the encoding of data into alphanumeric strings helps to sort a list of items into groups. However, it only works if the items have similar names:
- General collections in Spanish and Portuguese
- General collections in Slavic languages
It breaks down when the items don't have similar names.
- Data processing Computer science
- Computer programming, programs, data
- Special computer methods
[INSERT GRAPHIC: generalities.png]
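The contrast between the two lists can be sketched in a few lines of Python. The captions are the ones quoted above; grouping by the first two words of each caption is my own stand-in for "items with similar names", not a rule taken from the Dewey Decimal system.

```python
from itertools import groupby

# Captions quoted in the text; the grouping key below is an assumption.
captions = [
    "General collections in Spanish and Portuguese",
    "General collections in Slavic languages",
    "Data processing Computer science",
    "Computer programming, programs, data",
    "Special computer methods",
]


def first_words(caption, n=2):
    """Use the first n words, lowercased, as a crude 'similar name' key."""
    return " ".join(caption.lower().split()[:n])


# groupby only merges adjacent items, so sort by the same key first.
ordered = sorted(captions, key=first_words)
groups = {key: list(members) for key, members in groupby(ordered, key=first_words)}

for key, members in groups.items():
    print(f"{key!r}: {members}")
```

The two "General collections" captions fall into one group, while the three computing captions, which belong together just as clearly to a human reader, end up in three separate groups of one.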