r1 - 22 Feb 2005 - 15:55:11 - KatieCappsParlante
From an email sent to me (Katie Parlante) by Brian, who was in early conversations about the data model and the repository. (I added minor wiki formatting for readability.)

link to spike overview


Hi Katie,

Thanks for the pointer. Good document. I think it raises a bunch of interesting questions.

Here are some of my reactions, off the top of my head, to the things that caught my eye. Feel free to share any of this with other people if that would be useful to the discussion.

- brian

Independently testable layers

This all sounds great to me: "commands down, events up", "events only travel between adjacent layers". This part is probably uncontroversial. Basic undergraduate doctrine. But it's worth having someone remind you about it from time to time. It might be worth the effort to set up the unit tests in a layer-by-layer way, if that isn't the case already.
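A minimal sketch of "commands down, events up" between two adjacent layers, and of testing the lower layer on its own. All class and method names here (ModelLayer, InteractionLayer, add_title) are hypothetical and are not taken from Spike or Chandler.

```python
class ModelLayer:
    """Lower layer: accepts commands and emits events to whoever subscribed."""

    def __init__(self):
        self._listeners = []
        self.titles = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def add_title(self, title):
        # A command arriving from the layer above.
        self.titles.append(title)
        # An event travelling up to the adjacent layer only.
        for callback in self._listeners:
            callback("title_added", title)


class InteractionLayer:
    """Upper layer: sends commands down and reacts to events coming up."""

    def __init__(self, model):
        self.model = model
        self.seen = []
        model.subscribe(self.on_event)

    def on_event(self, name, payload):
        self.seen.append((name, payload))

    def user_typed_title(self, title):
        self.model.add_title(title)


def exercise_model_alone():
    """Layer-by-layer test: the model is driven with no upper layer present."""
    events = []
    model = ModelLayer()
    model.subscribe(lambda name, payload: events.append((name, payload)))
    model.add_title("Lunch")
    return events
```

Because the lower layer never imports or names the upper one, a test can stand in for the upper layer with a plain callback.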

Presentation Layer separate from Interaction Model Layer

Seems like a noble goal, but in practice it might not be worth the effort. I'm not a good person to offer an opinion on this: I don't know enough about the current CPIA architecture, and I don't have enough experience writing presentation and interaction code.

Storage Layer above Model Layers

For me, this was the single most interesting idea in the paper. This week I've been writing code for my "blue-sky" project, and one of the issues I've been wrestling with is what relationship I want between the "Modelling Layer" and the persistence code. Right now I have them munged together because I couldn't figure out a good separation.

I've never seen a code base where the Storage code was layered above the Model code, but if that's a workable solution it would certainly be more elegant. And it would have the big pragmatic payoff: it would be easier to swap in and out different Storage modules, to test different performance numbers, or to add different features. It would also make it easier to define an ideal Model first, without being influenced by pragmatic storage layer constraints. I think the Chandler project may have suffered some from doing too much storage work too early on, before experimenting enough with the model layer.
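One way to picture storage layered above the model is sketched below, under the assumption that the model is a plain object with no persistence hooks and that each storage module depends on the model rather than the other way round. Item, MemoryStore, and JsonStore are hypothetical names used only for illustration.

```python
import json


class Item:
    """Model layer: a plain object that knows nothing about persistence."""

    def __init__(self, **values):
        self.values = dict(values)


class MemoryStore:
    """One storage module: keeps copies of item values in a dict."""

    def __init__(self):
        self._data = {}

    def save(self, key, item):
        self._data[key] = dict(item.values)

    def load(self, key):
        return Item(**self._data[key])


class JsonStore:
    """Another storage module: serializes item values to JSON text."""

    def __init__(self):
        self._blobs = {}

    def save(self, key, item):
        self._blobs[key] = json.dumps(item.values)

    def load(self, key):
        return Item(**json.loads(self._blobs[key]))


def round_trip(store):
    """The same model code runs unchanged against either store."""
    store.save("a", Item(title="Lunch", minutes=30))
    return store.load("a").values
```

Swapping stores to compare performance or features then means passing a different object to round_trip, with no change to Item.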

Platform extensibility, plugins, and start-up time

This all sounds good to me. I don't know much about this kind of stuff, but it seems like the Eclipse model would work for Chandler.

Two Worlds: static access API and dynamic access API

This is a big deal. All along, Chandler has been struggling with the tension between static and dynamic APIs. RAP and ZODB and all that old stuff. I think it's a hard problem. I don't have any experience with any system that has tried to solve the problem and somehow combine static and dynamic APIs.

If I were starting the Chandler project from scratch today, I would argue for not trying to have both static and dynamic APIs, and instead having only dynamic access APIs. That would make all the e-mail and calendar code way uglier, but I think in the end it would lead to a much more wonderful product. I worry about losing the soul of Agenda with all the compromises required to support even a limited static API. So, on this "two worlds" question, my position is at one extreme, and I'm opposed to the solution the paper is suggesting.
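A rough sketch of what "only dynamic access" could look like, assuming attributes are themselves objects and every read or write goes through get and set. The Attribute and Item classes are hypothetical, not Chandler's repository API.

```python
class Attribute:
    """An attribute is an object in its own right, identified by identity."""

    def __init__(self, display_name):
        self.display_name = display_name


class Item:
    """Every value is reached through get/set; no Python attributes per field."""

    def __init__(self):
        self._values = {}

    def set(self, attribute, value):
        self._values[attribute] = value

    def get(self, attribute, default=None):
        return self._values.get(attribute, default)

    def attributes(self):
        return list(self._values)


start_date = Attribute("Start Date")  # might ship with the product
mood = Attribute("Mood")              # might be added later by anyone

event = Item()
event.set(start_date, "2005-02-18")
event.set(mood, "cheerful")
```

The calendar code gets uglier, as the text says (event.get(start_date) rather than event.startDate), but shipped and added-later attributes are handled identically.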

Chandler-defined attributes and user-defined attributes

The paper talks about two types of attributes: "Chandler-defined" and "user-defined". I don't think those are good terms. Presumably "Chandler-defined" means all the attributes that exist in the Chandler that OSAF ships, when I first install it. And "user-defined" means the attributes that get added later. But those "added-later" attributes won't just be "user-defined", they'll also be added when the user installs third-party parcels. Or they'll be added when user Foo looks at a shared item that was created by user Bar using a third-party parcel, even though user Foo never installed the third-party parcel.

In my mental model, there isn't a single distinction between "Chandler-defined" and "user-defined" attributes. Instead, there are two distinctions. One distinction is between "attributes which have code written against them" and "attributes which don't", without regard to who wrote the code, OSAF or a third-party. And the other distinction is between "attributes shipped by OSAF" and "attributes added later".
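The two distinctions can be treated as independent flags on each attribute, which gives four combinations rather than two. The sketch below assumes that reading; the example attributes and the AttributeInfo class are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeInfo:
    display_name: str
    has_code: bool         # is any code written against it, by anyone?
    shipped_by_osaf: bool  # was it present at first install?


# Hypothetical examples covering all four combinations.
ATTRIBUTES = [
    AttributeInfo("Start Date", has_code=True, shipped_by_osaf=True),
    AttributeInfo("Flight Number", has_code=True, shipped_by_osaf=False),
    AttributeInfo("Mood", has_code=False, shipped_by_osaf=False),
    AttributeInfo("Notes Color", has_code=False, shipped_by_osaf=True),
]


def names(attributes, **criteria):
    """Return display names of attributes matching every given flag."""
    return [
        a.display_name
        for a in attributes
        if all(getattr(a, key) == value for key, value in criteria.items())
    ]
```

Here "Flight Number" stands for an attribute from a third-party parcel: added later, yet with code written against it, a case that a single "Chandler-defined" versus "user-defined" split cannot express.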

If you assume my mental model, then some of what the paper says is problematic. For example, this paragraph concerns me: "Such code cannot reference user-defined attributes in such a way, but not only because that code doesn't know what user-defined attributes will exist ahead of time. Spike's architecture must actually make a stronger guarantee: it must never be possible to access a user-defined attribute (UDA) via normal Python attribute access directly from a content item, because it would otherwise be more difficult to evolve Chandler's schema safely."

Attribute names, naming conflicts, and schema evolution

The paper assumes that kinds and attributes have both display names and "names", where the name is some "unique" symbol that can be used in python code or xml. For example, an attribute might have a display name of "Start Date" and a name "startDate". The startDate name is meant to be unique within some context, where that context might be an xml namespace, or might be a python class.

If I were starting the Chandler project from scratch today, I would argue for having only display names. I don't think unique symbol names should be stored in the repository alongside display names. Python code could still be written using unique symbol names ("sd = item.startDate"), it's just that the mapping between the symbols and their corresponding items should be stored with the python code base, not in the schema. The symbol "startDate" would uniquely specify a certain attribute because the python code that it's used in would explicitly associate it with that one attribute. That association could live in the python code itself, or in a mapping file along with the python code, or in a parcel of items that describes the python code, but not in a parcel that describes the kinds and attributes themselves.
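A small sketch of the idea that the symbol-to-attribute mapping lives with the code rather than in the schema. It assumes attributes are identified by object identity and carry only a display name; SymbolView and CALENDAR_SYMBOLS are hypothetical names, not part of any Chandler API.

```python
class Attribute:
    """A schema item: it carries a display name and no unique symbol."""

    def __init__(self, display_name):
        self.display_name = display_name


class Item:
    def __init__(self):
        self.values = {}  # maps Attribute objects to values


class SymbolView:
    """Wraps an item and resolves code-side symbols through a mapping."""

    def __init__(self, item, symbols):
        self._item = item
        self._symbols = symbols

    def __getattr__(self, symbol):
        # Called only when normal lookup fails; private names never resolve.
        if symbol.startswith("_"):
            raise AttributeError(symbol)
        try:
            return self._item.values[self._symbols[symbol]]
        except KeyError:
            raise AttributeError(symbol) from None


shipped_start = Attribute("Start Date")  # say, shipped with the product
user_start = Attribute("Start Date")     # same display name, added later

# The mapping is owned by the calendar code, not stored in the schema.
CALENDAR_SYMBOLS = {"startDate": shipped_start}

event = Item()
event.values[shipped_start] = "2005-02-18"
event.values[user_start] = "whenever"

sd = SymbolView(event, CALENDAR_SYMBOLS).startDate
```

Two attributes can share the display name "Start Date" without colliding, because the code's startDate symbol is bound to one specific attribute object rather than to a name stored in the repository.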

I believe that if you set things up without the repository storing unique names, then a lot of other headaches just go away, including a lot of the questions raised in the section of the paper on schema evolution. Another headache that would go away is the problem described in this paragraph in the paper -- and this problem would go away not only for "user defined attributes", but also for attributes defined by third-party parcels, where the parcel has code written against the attribute: "Right now, if a user were to add an attribute to a Chandler-defined 'kind', and a future version of Chandler added an attribute to that kind with the same name as the one the user added, Chandler would be forced to rename the user's existing attribute in order to avoid conflict. However, if UDA's always exist only in a dynamic namespace with no direct mapping to Python object attribute names, then there is never any possibility of conflict."

Information modelling

I totally agree with everything the paper says about relationship cardinality.

I also like the material the paper presents about bi-directional relationships. Good material about weak-typing of relationships, and defining a relationship as its own thing, apart from the kinds that it relates. And the stuff about iterating over relationships. Also good stuff about problems with having to modify existing kinds in order to create new kinds where items of the new kind can have references to items of the old kind. (Although some of the problems go away if you don't have to worry about unique names.) And good stuff about having an API that allows bi-directional traversal, regardless of how the actual storage is done.
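As an illustration of a relationship defined as its own thing, with traversal in both directions regardless of how the links are stored, here is a minimal sketch. The Relationship class and its method names are assumptions made for this example, not the paper's API.

```python
class Relationship:
    """A relationship defined apart from the kinds it relates."""

    def __init__(self, name):
        self.name = name
        self._pairs = []  # stored one way only, as (source, target)

    def link(self, source, target):
        self._pairs.append((source, target))

    def targets_of(self, source):
        """Forward traversal."""
        return [t for s, t in self._pairs if s == source]

    def sources_of(self, target):
        """Reverse traversal over the same one-way storage."""
        return [s for s, t in self._pairs if t == target]

    def __iter__(self):
        """Iterate over the relationship itself, pair by pair."""
        return iter(self._pairs)


authored = Relationship("authored")
authored.link("brian", "email-1")
authored.link("brian", "email-2")
authored.link("katie", "wiki-page")
```

Neither end of a link needs to be modified to take part, so a new kind can refer to items of an existing kind without touching the existing kind's definition.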

Terminology

The paper suggests using the term "Entity" instead of "Kind". I think that would be a mistake.

Ideally, the software designer and the programmer should have the same mental model of the app as the end-user will have. It doesn't work well for the developers to build one set of abstractions but then present the user with a different set of abstractions. You get the problem with leaky abstractions. Better to just have one mental model to start with, and that mental model should be the end-user model, not the programmer model. So, when it comes to naming things, I would argue that you should pick the names that are going to show up in the UI and the help text. I can explain to my grandfather that "there's a kind of item called book, and these items here are book items", but that's harder if I have to use the word "Entity".

Also, words like "Entity" and "Class" are problematic precisely because they already have established meanings in programming languages and database theory and metamodelling standards. Better to start with a clean slate and introduce terms that are free of preconceptions. That way there's less confusion when you start explaining that an item can be assigned to more than one kind, or whatever else might be innovative.

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.