In my everyday work with the computer, there are always three questions I need to answer before I can start or continue working:
- Where is my information?
- What is the most current document?
- What else belongs to this context of information?
Let’s look at these requirements in a little more detail…
Where is my information?
Please Note:
I kept this section very explanatory and basic, to invite people who are unfamiliar with the concept of a taxonomy or ontology into the subject. Those of you who have already done projects with RDF or are familiar with the KM (knowledge management) paradigms can skip to “What is the most current document?”. You might, however, also read through it just for fun.
Usually I create what is known as a “directory” in my filesystem, following a more or less consistent naming convention. I consider this possibly the worst approach imaginable, but it somehow works for me.
Here lies a fundamental problem in workgroup environments. I might synchronize this information with a server or a peer computer for collaboration. My fellow co-workers then have to figure out for themselves how I organized it all. Unfortunately, they might change something and send it back to me, perhaps by email, leaving me to figure out again what has happened. Even worse, I might get a similar, unsorted “bucket” of electronic information from somebody and have to work out myself how things relate to each other. I am being ironic here: in real life you want to avoid this under all circumstances, and therefore we define conventions for collaboration, which are largely dictated by the limitations of our current working environments.
This situation is the anchor point where the category idea kicks in. I think of a category as something with a user-defined name, like the name of a directory, but with more dimensions.
For example, the category “Project Chandler” might refer to all the information I produced for that project. It might also refer to the categories “OSAF”, “Categorizing” and “Customer XY”. Categories in Lotus Notes documents do exactly what I am talking about here.
Unlike in a Notes database, where single categories produce a “flat” view, categories should be multi-dimensional and hierarchical, depending on the view. Such hierarchies of categories are known as a taxonomy or ontology. So if the view of my taxonomy looks like this:
- Projects
  - Chandler
    - Categorizing
      - (my document’s reference is here)
And I put my document at the “Categorizing” level, then the document would also relate to the other categories (maybe not directly, but possibly as a second- or third-level relative). In an alternative view of the same taxonomy, the tree might look like this:
- Customers
  - OSAF
    - Chandler
      - Categorizing
        - (my document’s reference is here)
And my document would show up here too. Placing something in the taxonomy (establishing a “relation” or “reference”) is in most cases a manual action by the user. Some things can be automated, but a certain level of discipline and communication within the workgroup is still necessary. I don’t expect any application to manage total chaos intelligently and automatically, but it should make chaos easier to avoid in general. We will look at the power of taxonomies in more detail later. Let’s continue with the next question:
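Before moving on to that question, here is a minimal sketch of the structure behind the two views above. It assumes that a category may have several parents (here “Chandler” sits under both “Projects” and “OSAF”), so the category, and every document filed in it, appears in every view that leads to it. The class `Taxonomy`, its methods, and the file name `categorizing-notes.txt` are hypothetical illustrations, not part of any existing application. The sketch also assumes the category graph has no cycles.

```python
class Taxonomy:
    """Hypothetical category graph: a category may have several parents,
    so it (and every document filed in it) shows up in several views.
    Assumes the graph is acyclic."""

    def __init__(self):
        self.parents = {}    # category -> set of parent categories
        self.documents = {}  # category -> set of document references

    def add_category(self, name, *parents):
        self.parents.setdefault(name, set()).update(parents)
        for parent in parents:
            self.parents.setdefault(parent, set())  # a category without parents is a root

    def file(self, document, category):
        """Place a document reference in a category (usually a manual user action)."""
        self.parents.setdefault(category, set())
        self.documents.setdefault(category, set()).add(document)

    def paths(self, category):
        """Every root-to-category path, i.e. every view in which the category appears."""
        parents = self.parents.get(category, set())
        if not parents:
            return [[category]]
        return sorted(path + [category] for parent in parents for path in self.paths(parent))

    def related_categories(self, document):
        """The categories a document was filed in, plus all of their ancestors."""
        related = set()
        stack = [cat for cat, docs in self.documents.items() if document in docs]
        while stack:
            cat = stack.pop()
            if cat not in related:
                related.add(cat)
                stack.extend(self.parents.get(cat, ()))
        return related


tax = Taxonomy()
tax.add_category("Chandler", "Projects", "OSAF")
tax.add_category("OSAF", "Customers")
tax.add_category("Categorizing", "Chandler")
tax.file("categorizing-notes.txt", "Categorizing")  # hypothetical document name

for view in tax.paths("Categorizing"):
    print(" > ".join(view))
# Customers > OSAF > Chandler > Categorizing
# Projects > Chandler > Categorizing
```

The document is filed only once. Both views, and the “indirect” relations to “Projects”, “OSAF” and “Customers”, are derived by walking up the parent links, so nothing has to be copied between directories.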
What is the most current document?
I currently use file naming conventions that indicate project, customer site, content, date, version and author, for different reasons. This is a lot of meta-information, yet only the date and version help me locate the most current document. Unfortunately, the filesystem’s timestamp cannot be used reliably in workgroup environments, since copying, synchronizing or mailing a file can change it. The file size shown by the OS gives some indication too, because my most up-to-date document is usually the “biggest”, but this is a weak signal.
The next level of categorization would simply flag the most current document with “You stopped working here”. That sounds easy; how it can be accomplished will also be explained later. Now for the third and most important question:
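Before turning to that question, here is one minimal way to picture such a flag. This is a sketch under stated assumptions, not the approach this page explains later. It assumes every version of a logical document is checked into a “stack” that assigns the version number itself, so neither filesystem timestamps nor file sizes are needed. It also assumes each user has their own “You stopped working here” flag. `Version`, `VersionStack` and the file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    number: int        # assigned by the stack, not derived from file timestamps or size
    author: str
    content_ref: str   # hypothetical: wherever the stored bytes of this version live


@dataclass
class VersionStack:
    """All versions of one logical document plus a per-user 'You stopped working here' flag."""
    name: str
    versions: List[Version] = field(default_factory=list)
    stopped_here: Dict[str, int] = field(default_factory=dict)  # user -> version number

    def check_in(self, author: str, content_ref: str) -> int:
        number = len(self.versions) + 1
        self.versions.append(Version(number, author, content_ref))
        self.stopped_here[author] = number  # the author's flag moves to the version just saved
        return number

    def latest(self) -> Optional[Version]:
        return self.versions[-1] if self.versions else None

    def where_i_stopped(self, user: str) -> Optional[Version]:
        number = self.stopped_here.get(user)
        return None if number is None else self.versions[number - 1]


stack = VersionStack("categorizing concept")
stack.check_in("me", "concept-1.doc")
stack.check_in("me", "concept-2.doc")
stack.check_in("colleague", "concept-3.doc")
print(stack.latest().number)               # 3
print(stack.where_i_stopped("me").number)  # 2
```

Keeping “latest” and “where I stopped” apart reflects the workgroup case. A co-worker may have saved a newer version, but the flag still shows each user where their own work ended.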
What else belongs to this context of information?
An awful lot of other information relates to what I am currently doing. It comes from all kinds of sources, is represented in a variety of applications and, to make it even worse, is delivered through different mechanisms (filesystem, network, removable media, email, etc.). To make it even more complex, this information comes in different versions over time. I believe I spend more than 80% of my time managing and tracking this information: opening, viewing and closing documents in applications. I don’t think this is specific to my work, and I wish I could dedicate more time to what I actually want to accomplish. This is where automatic categorization could be the “killer app”.
To define a “context of information” (which is what a “category” actually means), I found two basic possibilities:
One is to flag everything that belongs to the context. This always involves a manual process: if not the flagging itself, then checking the result of some automation to see whether the flags were set correctly.
The other is to flag only my working document(s) and “relate” the other information to this document.
I will take the latter approach, because I found the first one somewhat impractical, and because the second lets us reuse our existing taxonomy paradigm very elegantly. In fact, the taxonomy idea can be extended seamlessly down to the atomic level of our content-based world. I even consider the graphical representation of taxonomies (the view) to be the most critical issue for effective work. I am not a graphical visionary, so I will outline as many of my ideas as I can for the real GUI wizards to work with.
In my imagination, “my” document forms the center of the informational context. All other documents I used along the way relate to this document. This is primarily a meta-relationship; I am not talking about “embedding” and object references, such as Microsoft OLE.
It is essential for me to see when I used which other reference, and in which version. “Stacks” of document versions carrying the “You stopped working here” flag would do the job fine, but there is more to come.
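A context built around one working document can be sketched as a plain list of meta-relationships. Each relationship records which reference was used, in which version, and when, while the content stays where it is (unlike OLE-style embedding). This is a minimal sketch assuming version numbers like those of a version stack. `Relation`, `Context`, and all file names and dates below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Relation:
    """Meta-relationship: while working on `center`, version `version` of `reference`
    was used at `used_at`. Only references are stored; nothing is embedded."""
    center: str
    reference: str
    version: int
    used_at: datetime


class Context:
    """Hypothetical context of information centred on one working document."""

    def __init__(self, center: str):
        self.center = center
        self.relations: List[Relation] = []

    def relate(self, reference: str, version: int, used_at: datetime) -> None:
        self.relations.append(Relation(self.center, reference, version, used_at))

    def timeline(self) -> List[Relation]:
        """When did I use which reference, in which version?"""
        return sorted(self.relations, key=lambda r: r.used_at)

    def versions_used(self, reference: str) -> List[int]:
        return sorted({r.version for r in self.relations if r.reference == reference})


ctx = Context("chandler-categorizing.doc")
ctx.relate("customer-requirements.pdf", 2, datetime(2003, 12, 20, 9, 0))
ctx.relate("meeting-notes.txt", 1, datetime(2003, 12, 18, 14, 30))
ctx.relate("customer-requirements.pdf", 3, datetime(2003, 12, 22, 11, 15))

for r in ctx.timeline():
    print(r.used_at.date(), r.reference, f"v{r.version}")
# 2003-12-18 meeting-notes.txt v1
# 2003-12-20 customer-requirements.pdf v2
# 2003-12-22 customer-requirements.pdf v3
```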
Up to here, things are nice, but somewhat basic and essential. Nevertheless, it would blow my mind if I could already work like this.
Now let’s look at things that would make our life even better. All the functionality described below builds on the concepts discussed above plus some additional tools.
- When I have stopped or finished working with a document, I want to be able to see a “summary” of its content, that is, a configurable number of its strongest semantic concepts.
- When I am working with a document, I want to see which other documents (maybe not even categorized, or just abstracts of content from the Web or emails) relate to the semantic concept I am typing right now. If you have ever worked with Autonomy’s “Active Knowledge”, you’ll know what I mean.
- I want active recommendations for extending my own taxonomies, that is, hints about where a given reference should be placed relative to other information.
- I want an “Auto-Categorizer” function: if I receive a bucket of information, I want to simply drop it onto the “Auto-Categorizer” and have it sort through all of it. It should propose a small taxonomic structure showing how things relate to each other.
- I want taxonomic assistance while browsing the Web. In my imagination, the content of any website would be analyzed and related to the categories that contain similar content.
- I want to automatically build indexes, by category, of what I looked at on the Web. Imagine a Web browser that shows a history of what you read not by time and URL, but by categorized content and topic of interest.
- I want to exchange this information with others and submit parts of the taxonomy to search engines to receive similar categories of information.
- I want to search existing (already submitted) taxonomies that contain references to documents in the desired categories.
- I want to find people on the network who have similar categories of interest in their taxonomies.
And I am sure I will want to do a lot of other things based on categorized content that I can’t even imagine yet…
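To make the “summary” and “where should this go?” wishes a little more tangible, here is a deliberately naive sketch. It swaps in plain word frequency for real semantic analysis, and it does not describe how Autonomy or any other semantic engine works. The “strongest concepts” are simply the most frequent non-stop-words. A category is proposed by the Jaccard overlap between the document’s concepts and a concept set assumed to be attached to each category. The function names, the stop-word list and the example category concept sets are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "be", "for", "has", "have", "in", "is",
              "it", "my", "of", "on", "the", "this", "that", "to", "with"}


def strongest_concepts(text: str, k: int = 5) -> List[str]:
    """Naive 'summary': the k most frequent words that are not stop words.
    Ties are broken alphabetically so the result is deterministic."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def suggest_category(text: str, categories: Dict[str, Set[str]], k: int = 10) -> Optional[str]:
    """Propose the category whose concept set overlaps most (Jaccard index) with the document's."""
    concepts = set(strongest_concepts(text, k))

    def jaccard(cat_concepts: Set[str]) -> float:
        union = concepts | cat_concepts
        return len(concepts & cat_concepts) / len(union) if union else 0.0

    scored = {name: jaccard(c) for name, c in categories.items()}
    if not scored or max(scored.values()) == 0.0:
        return None  # nothing overlaps: leave the decision to the user
    return max(scored, key=scored.get)


text = "Taxonomy views relate categories. A taxonomy has categories; categories hold document references."
print(strongest_concepts(text, 3))  # ['categories', 'taxonomy', 'document']
print(suggest_category(text, {
    "Categorizing": {"taxonomy", "categories", "ontology"},
    "Versioning": {"version", "document", "flag"},
}))  # Categorizing
```

Run over every item in a “bucket”, the same suggestion function would yield a first, rough proposal of the kind the “Auto-Categorizer” should produce. A real system would need far better concept extraction than word counts.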
Let’s sum it all up. We need the following high-level functionality to get the job done:
- Taxonomies, along with ways to create, view and edit references
- Versioning
- The Toolbox
In the next section, we will see how this high-level functionality can be built from simple tools.
--
BernhardGroehl - 27 Dec 2003