HierarchyVersusFacetsVersusTags (r29, 02 May 2006, MimiYin)

See ClassificationPaperOutline2 for a more up-to-date version of this paper.


The problem of where to file: Is it possible to construct the perfect classification system?

A truly first-rate hierarchy would not only have all of the characteristics of FN's hierarchy [_hey - what's an 'FN'?_], but it would also manage to encode the hierarchy in such a way as to eliminate all ambiguity as to where an item might be found. FN comes pretty close. But you can always imagine an item that is hard to place. Where does that sock garter really go: Bottoms? Legs? Ankles? Feet? It's also easy to imagine how that favorite pair of stretchy pants might do equally well @Home or @Gym.

[As a result, Hierarchies are horrible at #3: Targeted search and retrieval of individual items. In a hierarchy where items can only live in 1 place, the messier the hierarchy is, the harder it is to figure out where to put an item and the harder it is to figure out where you put it, when it's time to find it.]

But as you'll see, this is a problem even in faceted classification systems.


How the cookie crumbles: The ways in which hierarchies fail:


Nobody builds semantically pure hierarchies, it's just too much work.

Look at the Finder screenshot on the HierarchyPapers. Who the hell would do that? Hierarchies are too hard to set up and even harder to maintain. Not even the DDC is this cleanly done. But why? Because real life data needs are never semantically pure.

[More reasons why hierarchies are bad at #3: Targeted search and retrieval of individual items]

  • 1. Users need to browse their data in different ways
  • The Closet hierarchy, with its emphasis on pulling together a particular ensemble for a particular event, is not optimized for targeted retrieval of a single item of clothing: ie. that favorite pair of shorts.
  • Instead, the Inverted tree, where Anatomy and Layer are at the top of the tree, is the optimal organizational structure to guide you to that favorite pair of shorts.
  • Finding_tree.png:
    Finding_tree.png

  • 2. Hierarchies are too hard to reorganize on a whim
  • You could always reorganize your hierarchy on an as-needed basis, but what a big job that would be...
  • Here's a tree of 32 containers that has been reorganized such that the bottom level of the hierarchy is now at the top.
  • moving.png:
    moving.png

[Hierarchies are bad at #7: Attaching semantic information to data.]

  • 3. Real people have different semantic encoding requirements for different branches of the tree...therefore, items lose semantic data if they are moved between branches with incompatible semantic structures
  • Case study: FN's Laundry pile needs to be optimized for doing Laundry, not getting dressed
    • Laundry>>Material>>Color family
    • However, whenever FN moves an item of clothing from his Closet to the Laundry, he loses all of the semantic data encoded in the Closet hierarchy: Occasion>>Mood>>Anatomy>>Layer. Actually, he loses all those semantics as soon as he takes the clothing out of the closet and puts it on.
  • Case study: Katie doesn't want to replicate her entire Project hierarchy under the Done branch of her OmniOutliner tree for several reasons:
    • She doesn't have that many Done items to warrant such a huge tree
    • She really wants to have easier access to her Done items, without having to dig 3 levels deep into a Project hierarchy
    • It's a pain in the butt
    • BUT, if she doesn't replicate the Project hierarchy within the Done container, she loses semantic data encoded in the Project hierarchy, every time she moves an item from the Project branches to the Done branch
    • 05a_Katie_omnioutliner.png:
      05a_Katie_omnioutliner.png
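The loss described in both case studies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the folder values ("Gym", "Sporty", "Cotton", "Darks") and the `semantics` and `move` helpers are hypothetical, and an item's semantic data is assumed to live nowhere except in its folder path.

```python
# Hypothetical sketch: in a strict hierarchy, an item's only metadata is its path.
closet_levels = ("Occasion", "Mood", "Anatomy", "Layer")
laundry_levels = ("Material", "Color family")


def semantics(path, levels):
    """Read an item's semantic data back out of its folder path."""
    return dict(zip(levels, path))


def move(new_path):
    """Moving an item replaces its path; nothing of the old path survives."""
    return tuple(new_path)


# The shorts as filed in the Closet: Occasion>>Mood>>Anatomy>>Layer
shorts_in_closet = ("Gym", "Sporty", "Legs", "Outer")
before = semantics(shorts_in_closet, closet_levels)

# The same shorts after being filed in the Laundry: Material>>Color family
shorts_in_laundry = move(("Cotton", "Darks"))
after = semantics(shorts_in_laundry, laundry_levels)

# Every Closet attribute is gone once the item sits in the Laundry branch.
lost = sorted(set(before) - set(after))
print(lost)
```

The same thing happens to Katie's items when they move from a Project branch to a flat Done branch: whatever the old path encoded is simply dropped.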

[Hierarchies are horrible at #6: Easy access to favorites.]

  • 4. Stuff I need access to DOES NOT HAPPEN TO EQUAL the stuff at the top of the tree: Hierarchies are bad at Favorites
  • Case study: Katie's OmniOutliner
  • To achieve a semantically pure hierarchy, Katie would not only need to replicate the Project hierarchy under the Done branch of her outline, she would also need to nest the Project hierarchy within a Not Done container, such that the top level of the hierarchy can cover the full spectrum of the Status dimension
  • To Katie, this would bury her Project in yet another layer of unnecessary hierarchy
  • Similarly, Katie doesn't want to bury her Done items in 2 layers of unnecessary Project hierarchy. But if she doesn't, she loses the semantics of the Project hierarchy.

  • Side note: Attempts to shoehorn alternate organizations of data into a single hierarchy muck up the guided navigation experience
  • Case study: Yahoo! related links also break the story of the hierarchy...back to being a mole rat
  • Want to see things by Anatomy? Drill down to the Anatomy level and travel horizontally across the hierarchy to visit all Anatomy containers
  • This feels manageable with something simple and orderly like the Closet hierarchy
  • But mole-rat syndrome settles in pretty quickly with something as huge and unwieldy as the Yahoo! directory, Amazon, or our very own wiki...Basically any hierarchy that is semantically impure and therefore hard to grok completely will get even messier with links.
    • When you walk the tree in an orderly fashion, you know exactly what parts of the spectrum you've looked at, in each level of the hierarchy
    • You also know what parts of the spectrum you've decided not to look at, in each level of the hierarchy
    • When you jump around the tree via links (or wormholes), you lose the context of an orderly perusal of the tree
    • This is the difference between looking for a missing document in a file cabinet in an orderly manner, one folder at a time from A-Z...as opposed to jumping around. It's hard to know what you've looked at and what you haven't looked at.
  • As we'll see shortly, this foreshadows a similar problem with navigating tagsonomies

Take home: Because hierarchies have been the designated one-size-fits-all solution to all our organizational needs, we break our semantically pure hierarchies by overstretching their bounds. As a result, we end up with messy hierarchies that are unusable and unmaintainable.

  • We encode different parts of the hierarchy in different ways (ie. Closet v. Laundry, Katie's Projects v. Done)
  • We bubble up favorites to the top of the hierarchy
  • We use things like Desktop aliases or Yahoo! or Amazon-style "See related links" to travel the hierarchy in alternate ways

[Messy hierarchies are bad at everything.]

  • The more varied the data the messier hierarchies become
  • The faster the data changes the messier hierarchies become
  • Unfortunately for most PIM client users, these are the 2 main characteristics of PIM data

  • Case study: Outlook.png
    • Notice the mixture of different semantic encodings at each level
    • Notice the incomplete spectrums:
      • There's a folder for "For Followup", but where are Done and No Followup Needed?
      • Rejected companies? Where's Accepted and Pending?

  • The screenshot below is a real-life Outlook sidebar of someone who works in the tech industry but isn't a programmer or someone who feels particularly tech savvy.
  • The sidebar consists of 5 levels of hierarchy that span 5-6 different attributes: Kind, Sphere, Status, Who, Project Area, etc...
  • The levels of the hierarchy are semantically impure:
    • I've color coded the sidebar such that each level of hierarchy is a different hue family: Blue, Violet, Orange, and Pea Green
    • Within each hue family, different attributes are assigned different saturation levels. From the variegated distribution of saturation levels within each level of the hierarchy, you can see that it's very inconsistent.
  • 05_Outlook.png:
    05_Outlook.png

  • Case study: DDC
  • Notice how the most easily comprehensible parts of the Generalities branch are the ones with "clear semantic encoding".
  • This is because your brain was able to chunk down a big huge list of topics into a hierarchy of 2 or 3 attributes or facets.
  • The other sub-areas just smear into an unmemorable list of General sounding things.
  • 04_Generalities.png:
    04_Generalities.png

  • Messy hierarchies inaccurately represent data
  • When the levels of the hierarchy are consistently encoded with the same semantics across all of the branches of the tree, it is easy to see the facets in the hierarchy
  • hierarchy_facet_verticals.png:
    hierarchy_facet_verticals.png

  • Ideally, you want to visually represent facets and true parent-child relationships differently.
  • parent_child_versus_facets.png:
    parent_child_versus_facets.png

  • However, when the levels aren't consistent, you can't tell the difference between true parent-child relationships (ie sub-folders) and semantically-encoded hierarchical levels (ie. consistently semantically encoded child folders that cut across the entire tree). Because actually, very few things are truly sub-things. Most things are facets.


What about faceted classification systems?


What is a facet and What is it good for?

  • Facets in Chandler-speak are the Attributes of an item
  • The semantically meaningful levels in the hierarchy are essentially an attempt to encode facets into a hierarchical organization of data
  • However, unlike the levels of the hierarchy, Facets in a Faceted Classification System are independent of each other.
  • Therefore, unlike hierarchies, Facet values or Attribute values do not have fixed parent-child relationships. They have no relationship to each other at all.
  • Examples of faceted systems include: Chandler data model, Spreadsheets and tables

  • The lack of structure means that Faceted systems ARE optimized for switching between different organizations of the same data
  • Imagine the iTunes browser on crack...where you can actually reposition the columns, effectively re-organizing the hierarchy on the fly.
  • Fire up the iTunes Browser and step through how it works
  • iTunes.png:
    iTunes.png

[Why facets are good at #3: Provide means for targeted search and retrieval of individual items.]

  • 1. Facets provide multiple ways to label an item
  • The ability to label an item in multiple Facets means that you never have to be paralyzed by an inability to choose between two equally valid folder locations in a hierarchy. In a hierarchy, an item can only belong in either the Project Foo folder or the Status Done folder. It can't live in both. In a faceted system, an item can be labeled as both Project: Foo and Status: Done.
  • This in turn means that while it is often unlikely that a single characteristic of an item is enough to differentiate it from all other items (ie. Email from: Jane, a person might have thousands of them), the ability to attach multiple dimensions or facets of metadata to a single item gives you the capacity to construct a more unique metadata thumbprint for each item. (ie. Email from: Jane, Date received: Today, Subject includes: Junior league)

  • 2. All of this translates into more flexibility when it comes time to retrieve items
  • Especially in large, messy hierarchies, it's often hard to remember exactly where you put an item. As a result, people often find themselves hunting through their hierarchies looking for that elusive item.
  • In a faceted system, you can pluck items out of the data soup by calling up a few of its Facet or attribute values. This is what we do when we build search queries. A well-constructed search is one that puts together a unique enough thumbprint of the item's attributes.
  • It's the difference between looking for Roberto's Taco Stand by walking the streets of San Diego, looking for where Marina Blvd. intersects the rollercoaster, versus having Roberto's Taco Stand (the one at the intersection of Marina Blvd. and the rollercoaster in San Diego) brought to you.
    • Another way to put it is: You don't have to go to the information, the information comes to you
    • Another variation on this theme: You don't need to know where the information lives, the system knows where it is. You just need to remember a few of its characteristics and the system brings it to you.
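The "metadata thumbprint" idea from points 1 and 2 can be sketched as a query over items that carry their facets with them. This is a minimal illustration, not Chandler's data model: the sample emails and the `find` helper are hypothetical, and each facet test is assumed to be a simple predicate on the facet's value.

```python
# Hypothetical items: each one carries its facets as attribute/value pairs.
emails = [
    {"From": "Jane", "Date received": "Today", "Subject": "Junior league bake sale"},
    {"From": "Jane", "Date received": "Last week", "Subject": "Lunch?"},
    {"From": "Bob", "Date received": "Today", "Subject": "Junior league dues"},
]


def find(items, thumbprint):
    """Return the items whose facet values pass every test in the thumbprint."""
    return [
        item
        for item in items
        if all(test(item.get(facet, "")) for facet, test in thumbprint.items())
    ]


# A single facet is rarely enough to single out one item...
from_jane = find(emails, {"From": lambda value: value == "Jane"})

# ...but a few facets together form a thumbprint that is unique enough.
pinpointed = find(
    emails,
    {
        "From": lambda value: value == "Jane",
        "Date received": lambda value: value == "Today",
        "Subject": lambda value: "Junior league" in value,
    },
)

print(len(from_jane), len(pinpointed))
```

Note that the caller never says where the email lives. It only describes a few of its characteristics and lets the system bring the match back.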

  • 3. Faceted browsers provide a guided navigation experience optimized for pinpointing data
  • The ability to slice and dice data in multiple ways amounts to the ability to construct completely different semantically pure hierarchies on the fly. What would take a lot of work in a fixed hierarchy is no effort in a Faceted browser. Faceted browsers are great at flexibly optimizing and re-optimizing themselves for the task at hand, whether it's getting dressed for work, doing your laundry or finding a pair of shorts.
  • Footnote: You might think that with search, you wouldn't need to have hierarchy-style guided navigation, even if it is infinitely flexible. However, sometimes you need a little more hand-holding than that. When constructing a search, you're asked to set all of the parameters at once. This works when you know exactly what you're looking for. You can get to your information by directly wormholing to it at warp speed. When you're less clear about what you're looking for (ie. this is often a problem in Google when you know what something looks like but you don't know what it's called), you will often conduct a series of searches that iteratively narrow the search results with additional parameters, where the results of each iterative search help provide clues about how to proceed.
  • A Faceted browser like the one in iTunes provides essentially the same experience with more hand-holding. Instead of having to sift through pages and pages of search results to figure out how to narrow your search, the Faceted browser is simply a clustering mechanism that explicitly clumps the search results into semantically meaningful groupings (ie. For the Genre: R&B, the songs you have are by these Artists and in these Albums). In other words, instead of using your brain to cluster search results, the system clusters the search results for you.
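Here is a rough sketch of how a faceted browser can build a different tree on the fly from the same items. It is not how iTunes is implemented: the `browse` function, the song data and the facet names are assumptions made for illustration. The point is that the facet order is just a parameter, and the items themselves are never modified.

```python
from collections import defaultdict

# Hypothetical library: the facets are stored on each item, not in a folder path.
songs = [
    {"Title": "Song A", "Genre": "R&B", "Artist": "Artist 1", "Album": "Album X"},
    {"Title": "Song B", "Genre": "R&B", "Artist": "Artist 2", "Album": "Album Y"},
    {"Title": "Song C", "Genre": "Jazz", "Artist": "Artist 1", "Album": "Album Z"},
]


def browse(items, facet_order):
    """Cluster items into a nested tree whose levels follow facet_order."""
    if not facet_order:
        return [item["Title"] for item in items]
    first, rest = facet_order[0], facet_order[1:]
    groups = defaultdict(list)
    for item in items:
        groups[item[first]].append(item)
    return {value: browse(members, rest) for value, members in groups.items()}


# Two different hierarchies from the same data, with no refiling required.
by_genre = browse(songs, ["Genre", "Artist"])
by_artist = browse(songs, ["Artist", "Genre"])

print(by_genre)
print(by_artist)
```

Swapping the order of the list passed to `browse` is the equivalent of repositioning the browser columns. What would be a large reorganization in a fixed hierarchy becomes a one-line change here.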

[Why facets are great at #7: Attach semantics to data]

  • (Cont'd from #3 above...) And the great thing is, because in Faceted systems, semantic information is stored on the item, no matter how you reorganize the facets, no matter how often you turn the tree on its head like a Rubik's cube, metadata is never lost.

Where facets fail

[Why facets suck at #1: Providing narrative.]

  • 1. Slicing and dicing is great, but... the constant reorientation can be confusing in and of itself: Most people are bad at spatial reorientation and if you think of each facet as a dimension of the data, ever-changing rearrangements of facets can be similarly jarring.
  • The following is the kind of drawing you might do in a high school drafting class. It is a 3-dimensional cube represented as a series of 2-dimensional drawings, showing all faces of the cube. Takes a second to put it all together into a cube, doesn't it?
  • 3D.png:
    3D.png
  • Can you tell when the image has made 1 revolution?
  • The following image has been taken from: http://synapses.mcg.edu/anatomy/astro3d/astro/A_3D.stm
  • 3-D_rotating.gif:
    3-D_rotating.gif

  • 2. Faceted systems don't quite tell a story
  • To be fair you have the same multi-level chunking that you get in well-designed hierarchies:
    • Chunking of items into containers and then
    • Further chunking of containers into container types (facets)
  • And the chunking is far easier to maintain because the containers and container types are NOT locked into a fixed parent-child structure, but instead co-exist independently
  • Similar to well-designed hierarchy levels, a well-designed facet that is filled out end-to-end with minimal overlap can greatly enhance the browsing experience.

However, it is precisely this lack of structure that also means that Faceted systems don't prioritize the container types for you the way Hierarchies do and, as a result, they fail to go that final mile so crucial to storytelling: a linear dictation of what order to experience the facets in. Instead, Faceted systems are designed to allow the user to construct their own storyline.

  • Case study: iTunes
    • First, you might figure out that the kind of data you are dealing with is songs, because the facets are: Artists, Albums, Genre, Composer
    • Second, by looking at the range of Containers or Attribute values in each Facet or Attribute, you get a sense of what kinds of songs you have: Bob Dylan v. Berlin Philharmonic v. Kenny G
  • But ultimately, facets fail to give you a sense of the facet foodchain, which a hierarchy of Genre>>Composer>>Artist>>Album does:
    • That Genre is a bigger, more everlasting concept than a Composer, which is inherently an individual person, living in a particular time.
    • And Composer, especially in Classical music, is a bigger, more everlasting figure than any individual artist that performs and interprets the work of that Composer. (ie. An analogy might be how certain bureaucrats outlive the elected presidential administrations they serve: Henry Kissinger)
    • And finally Artist, assuming they have more than 1 hit song, is a more everlasting thing than the Albums they release.

  • Case study: DDC
    • Compare Area of study>>Region>>Time period (where Area of study, ie. Philosophy, transcends all physical boundaries and exists throughout time, and Region transcends Time) to:
    • Area of study
    • Region
    • Time period

[Facets don't always work for #2: Guided navigation to explore a particular topic.]

  • 3. It's not always clear how best to use Facets
  • The lack of a fixed structure also means that users are left to construct the right structure for the right task, which is maybe not what some people are interested in doing. A well designed faceted system should perhaps provide users with options based on what they want to accomplish, rather than asking them to construct a Faceted browser one facet at a time. (ie. How many times have you watched an inexperienced computer user search on Google using some incredibly generic term?)
    • If you want to get dressed, the system presents you with the Closet hierarchy
    • If you want to do your laundry the system reorganizes into the Laundry hierarchy

  • 4. Facets have to be added one at a time
  • Hierarchies provide easy affordances for attaching semantics to data because the entire organizational structure is visualized
  • Facets, because they're independent of each other, must be added one at a time (ie. filling out the fields of a form)
  • Caveat 1: We believe this is primarily a workflow structure and interaction design challenge that can be overcome
  • Caveat 2: This obviously doesn't apply to automatically derived metadata (ie. CDDB)
  • 02_Hierarchy_Encoded_Space.png:
    02_Hierarchy_Encoded_Space.png

  • 5. Hard to design facets well
  • A well designed facet, just like a well-designed semantically encoded level of hierarchy:
    1. Fills out the spectrum of the facet from end-to-end (ie. Beginning to End: Planning, Execution, Documentation, Post-mortem) so you're never afraid you're missing something.
    2. Has no overlaps, so you're never ambiguous about where something belongs.
  • Katie has a project called 0.6 planning, but does it really belong in the Project facet? Or is there more to this story?
    • Release: 0.6
    • Phase: Planning
    • Product layer: i18n
  • It's hard and tedious to design facets with no overlaps

As a result, just as Hierarchies turn into chaos, Faceted systems often disintegrate into Tagsonomies

[Tags suck even worse at #s 1 and 2: Narrative and Guided navigation to explore a particular topic.]

  • Loss of depth
  • Because Facets are onerous to add and hard to design well, what most people end up with is more of a Tagsonomy: attribute values without attributes or a-semantic labels, which ultimately means, even less chunking and less storytelling
  • In the end, this feels just as overwhelming as the free-for-all hierarchy of semantically inconsistent or worse, a-semantic, generic "Categories"

  • They multiply like rabbits!
  • The Tags multiply like rabbits: The loss of dimensions in Tagsonomies can be thought of conversely as an unchecked multiplication of dimensions. Rather than thinking of Tags as all belonging to the same facet, you can think of each individual Tag as its own dimension. See facetious. But who can wrap their head around n dimensions? n dimensions eventually meld together, back into 1 dimension, just as the 100-sided volume (zocchihedron) melds into a continuous-surface sphere.
  • zocchihedron.jpg:
    zocchihedron.jpg
  • The items look like they're multiplying like rabbits: Furthermore, the ability to assign more than 1 tag to an item is perhaps more convenient when you're tagging, but when it comes time to understand the landscape of your data in terms of all your tags, the rampant multiplication of items showing up in multiple tags can make a mountain out of a molehill of data.
  • As a result, as you browse around, jumping from tag to overlapping tag:
    • Sometimes you see the same items reappear for the nth time,
    • Occasionally you see new items appear for the first time, but you never get the satisfying feeling of knowing:
      • Where you are
      • What you've seen, what you haven't seen
      • How much of the data you've looked at and how much there is to go...this "sense of place" can only come from an orderly walk through an orderly tree (whether it's a fixed Hierarchy or a flexible tree generated by a Faceted browser).
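The "mountain out of a molehill" effect can be put into numbers with a small sketch. The file names, the tags and the `items_by_tag` helper are all hypothetical. The browsing order is assumed to be alphabetical by tag, simply to have some fixed order to walk.

```python
# Hypothetical tagged items: item name -> set of tags applied to it.
tagged = {
    "bonobo_paper.pdf": {"apes", "mating", "research"},
    "field_notes.txt": {"apes", "research"},
    "zoo_photo.jpg": {"apes"},
}


def items_by_tag(tagged_items):
    """Invert the item -> tags mapping into a tag -> items index."""
    index = {}
    for item, tags in tagged_items.items():
        for tag in tags:
            index.setdefault(tag, set()).add(item)
    return index


index = items_by_tag(tagged)

unique_items = len(tagged)
# How many item listings you wade through if you open every tag once.
appearances = sum(len(members) for members in index.values())

# Walk the tags one by one and count how often an already-seen item reappears.
seen = set()
repeat_sightings = 0
for tag in sorted(index):
    for item in index[tag]:
        if item in seen:
            repeat_sightings += 1
        seen.add(item)

print(unique_items, appearances, repeat_sightings)
```

With only 3 items and 3 tags, browsing every tag already shows 6 listings, half of them repeats. Nothing in the tag list itself tells you that you have in fact seen everything.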

  • Tags don't actually help you understand your data better, they're just a more usable way of labeling than the alternative of dragging and dropping items into folders

  • Tagsonomies lack a visualization UI
  • This sense of disorientation is partially because Hierarchies visualize the relationship between Containers. So if you're looking in the wrong Container, you can at least look at a neighboring Container to see if you've missed something. In other words, hierarchies visualize degrees of separation. (ie. Folder A is not inside Folder B, but it's next to Folder B, or it's 1 branch over at the same level as Folder B).
  • There is no comparable visualization UI for tags. As a result, in tagsonomies, all neighbors are created equal. If what you're looking for doesn't exist in the tag or the intersection of tags you're currently looking at, you're out of luck. For any tag A, you can only guess which tags B, C, ... or Y might be a bridge to another not directly related, but still highly relevant tag Z.
  • Without a visualization tool, tags are just as dumb if not dumber than hierarchies. They also only have 2 kinds of relationships: instead of Parent-Child and Sibling, tags are either Related or Not related.

  • chunking.png:
    chunking.png

  • The drawing above is one possible way to visualize faceted classification systems and/or tagsonomies
  • Items are chunked into tags, categories or attribute value groupings: circles
  • Attribute values are chunked into attributes or facets: same hue
  • Attributes or facets are then chunked into hue families
  • Visualizing the overlap also gives you a more textured narrative of the different kinds of relationships. All of a sudden, you go from a binary world view: related, not related to a much more nuanced universe of relationships between tags:
    • Degree of overlap with other tags
    • # of other tags
    • Relative size of this tag to other tags
    • Relative size of this tag to overlapping tags
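The relationships listed above could be computed from a tag index before any drawing happens. The following is a sketch under assumptions, not a description of an existing tool: the `overlap_report` function and the sample index are hypothetical, and "degree of overlap" is taken here to mean shared items divided by the combined items of the two tags (the Jaccard index), which is only one possible reading.

```python
# Hypothetical tag index: tag -> set of items carrying that tag.
index = {
    "apes": {"a", "b", "c"},
    "mating": {"a"},
    "research": {"a", "b"},
    "recipes": {"d"},
}


def overlap_report(tag_index, tag):
    """Describe how one tag relates to every other tag it shares items with."""
    members = tag_index[tag]
    report = {}
    for other, other_members in tag_index.items():
        if other == tag:
            continue
        shared = members & other_members
        if not shared:
            continue  # no overlap: the binary "not related" case
        report[other] = {
            "shared_items": len(shared),
            # Assumed definition: Jaccard index of the two item sets.
            "degree_of_overlap": len(shared) / len(members | other_members),
            # Size of this tag relative to the overlapping tag.
            "relative_size": len(members) / len(other_members),
        }
    return report


report = overlap_report(index, "research")
number_of_overlapping_tags = len(report)
print(number_of_overlapping_tags, report)
```

Instead of a flat "related / not related" answer, each neighboring tag now comes with a strength and a relative size, which is the raw material a visualization like the one above would need.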

Easy come, Easy go: Tagsonomies are too flexible for their own good

[However...Tags are great at #3: Targeted search and retrieval...Sort of]

  • Tagsonomies make it even easier than Faceted systems to label items, primarily because you don't have to worry about assigning the right Tag to the right Facet. Instead, you just blurt out free-form, stream-of-consciousness
  • However, its greatest strength is also its greatest weakness. Its very flexibility can paralyze people as well.
  • Some of the MIT Haystack studies asked users to "tag" URLs they found on the web with keywords as an alternative to filing bookmarks in folders. In the beginning, users felt great about the new paradigm. However, pretty quickly, many of the users began to feel like the whole process was pointless. What often happens is that someone who is researching a particular topic (ie. Mating behavior of Bonobo apes) will find that all of the keywords they come up with apply to all of the material they find. As a result, they find it pointless to apply the keywords after a while.
  • You could say that the users were just bad at coming up with good keywords, keywords that could actually help them differentiate between data and generate a unique enough metadata thumbprint of the item, rather than glom all of their data into one homogeneous mass. Another way of looking at it is that if the users had worked with a Faceted system instead, the structure inherent in the Faceted system might have guided them to attach the right kind of metadata to their content. Rather than simply coming up with subject matter or topical keywords, a Faceted system might suggest a richer variety of orthogonal or independent attributes to label items with, such as: Content type, Pro v. Con, Status, Author background, Region, Time period, etc.
  • Footnote: Another way to alleviate the seeming random pointlessness of applying keywords to items would be the ability to rank order the keywords. While it is certainly easy to imagine that a lot of content is about very similar if not the same set of topics, different content will often emphasize different aspects of the same subject matter. So out of 10 articles about Bonobo mating behavior, some might focus more on Courting in Mature apes and less on Sexual play in Juveniles. This would be yet another way to differentiate seemingly homogeneous sets of data.

Segue to the Presentation

So what does all of this amount to? What does this mean for Chandler? How are all of these ideas and findings applied to the UI? That is the central question we'll try to answer in Tuesday's Virtuality Presentation.


Comments

(1) you say: Hierarchies turn into chaos

I thought the problem was, hierarchies turn into overly-ordered and therefore inaccurate representations of your information. That's shoddy and undesirable, but not "chaotic" at least in my understanding.

(2) I like the lack-of-visualization point a lot. I've seen a few dynamic network-y visualizations of non-hierarchical data (e.g. Grokker, various WordNet visualizations) and I've always found them somewhat confusing.

-- BrendanOConnor - 17 Jul 2005

Thanks for your comments Brendan. With regard to #1, I've added a few examples under the Take home:... heading. The idea is that because hierarchies are overly-ordered, but our data isn't overly ordered, we have to shoe-horn some of the use cases that hierarchies are very good at (ie. Targeted search and retrieval, Favorites and Attaching semantics). The result is the mess...Facets have the same problem. Their very orderliness is what makes it difficult for them to be used by laypeople with unpredictable data sets that grow at an astounding rate and change all the time. Both "classification" systems disintegrate under the stress. So whatever design we have for Chandler needs to somehow address that problem.

-- MimiYin - 18 Jul 2005

Mimi and Brendan, regarding Hierarchies turning into chaos, isn't there also the time dimension to the issue? Or as in Mimi's analogy, hierarchies turning into noise over time (entropy)? I start with a perfectly fine hierarchy, but over time the world has changed and my needs change and I have to reorganize the hierarchy, but that's hard to do, so some old items stay in the old hierarchy and new items live in a new hierarchy. Over time, the organization system represents a hierarchy that spans the lifetime of the system, with all its inconsistencies, resulting in chaos.

-- ChaoLam - 18 Jul 2005

I'm adding this comment from Chao because I think it brings out an important "alternate" way to tell this story. Unfortunately, writing itself is a fixed, linear medium of communication and hard to reorganize into different structures ;o)

I feel you've de-emphasized an important "interaction" aspect to the discussions of hierarchy vs. facets vs. tags.

From the interaction techniques we know of (and maybe more fundamentally how the brain works): 1) It's harder to file an item under a hierarchy, especially a deeply nested hierarchy (like DDC?) 2) It's extremely easy to tag something 3) Faceted systems are kind of in between, metadata is often cumbersome to input.

-- MimiYin - 18 Jul 2005

This page is all I've read in this journal. Based on this page, the very important option of selective hierarchy is left out. Unidimensional hierarchies are noxious to me because they are so constraining and make information management very difficult. I've felt this way since at least 1993 when I worked in a law office doing data organization work. I have found information management to have progressed quite slowly, perhaps because of the stranglehold that Microsoft has on the software market. On the other hand, tags and facets are helpful but are not hierarchical enough. Hierarchy in itself is not a problem. It's unidimensional hierarchy that is the problem. See MDE Infohandler's master/slave structure for a good example of how selective hierarchy can work.

-- InfoMan - 04 Mar 2006

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under an Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.