WhyDoPeopleOrganize (r9 - 23 Sep 2005 - 13:09:13 - MimiYin)

People put information into data systems so that they can get information out of their data systems.

Scenario 1 In some sense, data systems are elaborate tracking devices, providing ways for you to describe your data when you put it in so that you can find it again when you need to take it out.

Scenario 2 In a less common, but equally important case, data organization is about pulling pieces of information together so you can see them in a single view.

The former amounts to large sets of data that share some common characteristic (e.g. a folder of receipts for all your Amazon.com purchases) that you think you'll remember about them when it comes time to look for them. You don't necessarily ever look at the data set for the sake of looking at it as a group (unless you're tracking your purchase patterns over time to identify shopping trends: cookbooks in November, diet books in January). The items themselves don't really cohere to form some kind of bigger picture in your mind. The grouping is simply a means for targeted search and retrieval at some later date.

The latter usually results in small to medium groupings of data that oftentimes don't share a readily available common characteristic (e.g. a folder of emails you need to review before your one-on-one with your manager).

As search improves and tag- and label-based systems like Gmail, del.icio.us and Flickr become more mainstream, fewer and fewer people will feel the need to actively file items into folders in order to accomplish Scenario 1. Scenario 2 will always be around, but its groupings may become ephemeral, going away once the task at hand is done.

  • Does this mean that we will eventually live in a world where capital-O Organization is irrelevant?
  • Where the gathering of data into compartments will become unnecessary?
  • Where the construction of those compartments into some kind of structure of groupings will become simply a relic of a more primitive era?
  • Is Google really the end of the road?

Scenario 3 What's missing from this picture are the guts of why people bother with capital-O Organization: To wrap their heads around their data. To take their data, synthesize it and turn it into something that is greater than the sum of its parts. To extract knowledge from disparate pieces of information. To make sense of it all. To impose coherence, shape and scope where there was only a blob of stuff.

ChaoLam I think this is an important point here. When you listed scenarios 1 & 2 above, I felt there was something subtle missing here: When people use a data system, their mind adapts to the system. So, not only do they put information into data systems, they put new information (meta-information) into their minds.

The act of creating a nested folder and placing an item into it moulds the user's mind. Tagging, because it is intrinsically easier to do, moulds the mind much less. That's why people feel more "grounded" with folders, but quickly forget how they tagged something.

Every piece of data that you put into a system carries within it the hope that you will get more out of the system as a whole than the individual pieces you put in.


Case study Dewey Decimal Classification System. A lesson in data mining aka Extracting knowledge from your data.

In the graph below, I've laid out the DDC along the "Foof factor" dimension where Foof is a cross between Froofy and Poofy. The distribution of topic areas tells us the following story about the contents of Libraries:

  • The bulk of writing lies in the middle of the curve, in the soft sciences and humanities
  • There is considerably less writing at the ends of the spectrum: hard sciences and the arts
  • This makes sense, since you could say that the primary by-product of the soft sciences and the humanities is expository (explanatory) writing, whereas the hard sciences and the arts are more concerned with creating "things" as opposed to writing about things: e.g. theorems, technologies or works of art.

[Figure: DDC_Bell_Curve_small.png, the DDC laid out along the "Foof factor" dimension]


The power of structure to express coherent narrative is at the root of why some people will persist in structuring their data by hand no matter how intelligent computers become.

Some people simply can't deal with someone else doing their homework for them. For such people, capital-O Organization represents their best way to synthesize their data, their best bet when it comes to looking for, identifying and resolving patterns in their data into some kind of coherent structure.

Another way of saying it is that we don't like to be overwhelmed, confused, or disoriented.


The need for separation of Church and State

It's a free country: if people need to Organize their data, they should be allowed to, and software should provide the tools for it. (iTunes recently caved in by adding hierarchical folders in version 5.) Problems arise, however, when people are forced to make do with one-size-fits-all organizational paradigms that are supposed to be generic cure-alls for every information management and organizational need.

But as you'll see below, the three scenarios we've identified thus far are both varied in nature and diametrically opposed in their workflow needs.

In Scenario 1: Describing items for targeted search and retrieval, individual items, not groupings of items, are of primary importance. The groupings are just a means to an end, a way to find the individual item. This has important consequences for what users want out of data systems and how they manipulate the system when it doesn't naturally provide them with the right affordances. You're looking to group things by shared characteristics or shared sets of characteristics so you can narrow your search based on what you remember about the particular item you are looking for. You then want to structure the groupings so that the kinds of items you look for the most (e.g. emails from your boss) are also the most easily accessible groupings in the structure.
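The Scenario 1 workflow described above can be sketched roughly in code. This is only an illustration under assumptions of my own: the `Item` class, the trait keys (`sender`, `kind`) and the helpers `narrow` and `rank_groupings` are hypothetical names, not part of any existing system. The sketch shows narrowing a search by the characteristics you remember, then ordering groupings by how often you go looking in them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    # Characteristics the user expects to remember later (hypothetical keys).
    traits: dict = field(default_factory=dict)


def narrow(items, **remembered):
    """Return the items that match every remembered characteristic."""
    return [
        item for item in items
        if all(item.traits.get(key) == value for key, value in remembered.items())
    ]


def rank_groupings(lookups):
    """Order groupings so the most frequently searched ones come first."""
    return [group for group, _ in Counter(lookups).most_common()]


items = [
    Item("Order confirmation", {"sender": "Amazon.com", "kind": "receipt"}),
    Item("Status report request", {"sender": "boss", "kind": "email"}),
    Item("Shipping notice", {"sender": "Amazon.com", "kind": "notice"}),
]

# Narrow by what you remember about the item you are after.
receipts = narrow(items, sender="Amazon.com", kind="receipt")

# A made-up log of which grouping was searched, used to decide prominence.
order = rank_groupings(["boss", "Amazon.com", "boss", "boss", "Amazon.com"])
```

Note that the groupings here are never looked at as wholes; they exist only to shorten the path to a single item, which is the defining trait of Scenario 1.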

In Scenario 2: Pulling items together into an explicit grouping, the grouping is of primary importance and the constituent items are important only because they have been pulled together. In other words, in explicit groupings, the whole is greater than the sum of the individual parts. Explicit groupings are often time-sensitive, made in preparation for some event: a meeting, an email to send. More often than not, explicit ordering really matters: there is a chain of relationships with very specific dependencies between the items, e.g. a conversation thread, a thread of task dependencies, etc. Explicit groupings are usually structured so that the groupings that you need to look at the most (e.g. Prep for next staff meeting) are the most easily accessible. Since explicit groupings are often temporal, the set of groupings that "you need to look at most" is likely to change frequently over time.

In Scenario 3: Surveying the entire data system so you can see the forest for the trees, you need a structure that is clear, conceptually consistent throughout and easy to understand. ThePrinciplesOfGrok essay deals with the intricacies of how to accomplish this. The important thing to note here is that the interests of the long view are, more often than not, at loggerheads with the interests of scenarios 1 and 2. Scenario 3 requires a level of discipline, foresight and downright elbow grease that is simply unnecessary, and in some ways actively harmful, for targeted search and retrieval and explicit groupings.

We'll be exploring the impact of scenario 3 on information management workflows later in this paper. For now, let's just say that as much as possible, workflows and affordances for accomplishing these three goals should be different, with the caveat that users should never be penalized for choosing the "wrong" scenario.

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.