DataModelMeeting20040311 (r2 - 11 Mar 2004 - 18:00:54 - TedLeung)

Meeting Goals:

  • Consolidate list of all open data model issues
  • Prioritize issues according to apps group needs
  • For each issue, determine how to resolve it (in-person meeting vs e-mail)
  • Start addressing the in-person issues.

I've updated the issues from ApplicationDataModelOpenIssues to include the stuff from dev@ where it wasn't redundant. I also re-ordered the issues into groupings by category. I've placed my commentary as to the desired outcome next to the title of each issue.

Another way of thinking about the issues list is to divide the issues (and their pieces) into two categories:

  1. Issues with the Abstract/Conceptual data model. An example would be "are we actually using Alias types?"
  2. Issues with the implementation of the data model. An example would be "the API for Lob derived types is unwieldy."


Issues List

Items

Item Addressability/Locating an item [ proposal for item addressability, accounting for requirements below ]

There's a cluster of issues around uris, item paths, namespaces, parcels, the describes tag, containment paths for items, and the parcel directory structure.

A couple of requirements/constraints:

  • In parcel xml, and in python code, developers need some way to uniquely refer to items with a human readable name (vs a uuid). If we had better memories for hexadecimal, we might be able to get away with just using uuids; this is really a readability issue.
  • Items need to live somewhere in the repository, one and only one place to "put" the item. End users don't need to know this place.

Currently, the itemPath covers both of these needs. As a place where items live, it is a hierarchical structure based on the itemPath of the item's parent and the item's name. The repository itself does not decide where to put items; the client code needs to decide where to put each item as it is created.
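As a rough illustration of the paragraph above, the sketch below derives an item's path from its parent's path and its own name. The `Item` class, the `//` root prefix, and the `/` separator are assumptions made for this example; they are not taken from the repository code.

```python
class Item:
    """Hypothetical stand-in for a repository item (not the real Item class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # the client code chooses the parent at creation

    @property
    def path(self):
        # Root items get an assumed '//' prefix; children append their name.
        if self.parent is None:
            return "//" + self.name
        return self.parent.path + "/" + self.name


core = Item("core")
schema = Item("schema", parent=core)
kind = Item("Kind", parent=schema)
print(kind.path)
```

Because the path is computed from the parent chain, an item has exactly one location, and the human-readable name falls out of where the client code chose to put it.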

More requirements/constraints:

  • We needed some static way to define items used by chandler, especially Kinds, Attributes, and various CPIA items that describe the layout of the application. Although we experimented with doing this in python, we ended up wanting these item definitions in xml.
  • We wanted some way to extend Chandler, and we wanted most of the application to be written using this extensibility mechanism.
  • Someone should be able to extend Chandler by dropping a folder (or zip file) into one location in the filesystem.

Currently, parcels (and parcel xml) cover these needs. A parcel is the unit of extensibility. A parcel contains items to add to the repository. Items might refer to python code that implements them. Parcels in particular and items in general might also refer to other resources (images, etc.) used by the application.

More requirements/constraints:

  • Given that we have parcel xml to statically define items to load into the repository at startup, we'd love for that parcel xml to be relatively clean/easy/nice for developers to read it and write it. We'd like to be able to define items using tags based on their Kinds and Attributes, not in some general way. It would be cool if this were true for any Kind added by any parcel, not just Kinds that are core to Chandler.
  • Parcels ought to be able to contain other parcels. Parcels need to live in the filesystem where they can be discovered by the application.
  • Parcels should be items, with attributes for parcel data just like any other item.
  • Developers shouldn't have to manage two separate "locations" for parcels and the items they contain.

These requirements led us to the proposal that:

  • We could unify the notion of xml "namespace" and an itemPath, and use this as the "location" of an item in the xml.
  • We could have one parcel per folder, which could then contain other parcels/folders in the filesystem
  • The parcel's location in the filesystem would map to the parcel item's location in the repository (as a well known location). A parcel would then be an item's parent, solving the question of "where do I put this item in the repository?"
  • Using xml namespaces, we could simplify the tags used to describe items, items could be described in parcel xml by their Kinds and Attributes.

So, where does this go wrong, cause problems?

  • We're now in a position where we're using repository itemPaths as uris, but they are not really uris. Perhaps this could be resolved by making the repository's itemPath a real uri (http://core/schema/Kind) or (chandler://core/schema/Kind).
  • We have a lot of code that looks up an item by its itemPath (or uri), and the repository was not really designed to make this a fast operation, afaik.
  • This idea of mapping the xml namespace to the directory structure feels sort of odd/problematic to xml folks. We needed to go add a "describes" tag for the xslt to work, for example. The "describes" tag feels redundant/not right to folks writing the xml (why am I specifying the parcel path 3 times in each file?)
  • Sometimes one parcel defines a lot of items, it feels like a lot of overhead to create a subdirectory/subparcel to be able to modularize sets of items. As we make parcels more heavyweight, subparcels in general might feel like a lot of overhead, even if parcel containership doesn't map to the directory structure.
  • Because we have generalized the parcel xml to use any Kind/Attribute as a tag, it's not obvious how to write a validator for the xml. One option is to go back and use more general tags, as the pack xml does. We fear that solution will make xml an unworkable solution to our problem, though, too unwieldy for developers to use. We could throw out the approach of using xml to define items, defining some shorthand format (like RDF's N3), or using python. We could perhaps be clever and generate validators based on Kinds and Attributes, which might require some adjustment to the parcel format.
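To make the first two bullets in this list more concrete, here is a minimal sketch that treats an itemPath as a real URI and resolves it through a lookup table. Only the `chandler://core/schema/Kind` form comes from the text above; the function name, the `path_index` dictionary, and the placeholder UUID string are hypothetical.

```python
from urllib.parse import urlsplit


def uri_to_segments(uri):
    """Split a chandler:// style URI into path segments (illustrative only)."""
    parts = urlsplit(uri)
    if parts.scheme != "chandler":
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    # In this URI form the first segment lands in the 'authority' slot.
    return [parts.netloc] + [s for s in parts.path.split("/") if s]


# Hypothetical index from path segments to item UUID strings, so a lookup
# by path is a single dict access instead of a walk through the hierarchy.
path_index = {("core", "schema", "Kind"): "uuid-of-Kind-item"}

segments = uri_to_segments("chandler://core/schema/Kind")
print(segments)
print(path_index[tuple(segments)])
```

An index like this would have to be kept in sync whenever items are moved or renamed, which is part of what a real proposal would need to address.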

Now that we're at the beginning of a release cycle and not the end, we can question some of our requirements, make adjustments that remove or alter constraints, and come up with new proposals that work better to solve these problems.

AutoKind, AutoItem and ad-hoc attributes [ Why are AutoKind, AutoItem inadequate? Do we need to revisit ad-hoc attributes? ]

At some point, we made a simplifying decision to not support true ad-hoc attributes. We agreed we might need to revisit the decision if it didn't work out. Specifically, we mean that an Item can't have an attribute that its Kind does not know about. This brings up two problems, which we had solutions for:

  • In the UI, the user might want to add an attribute to an item on the fly. We decided that the apps code would add the Attribute definition to both the Item and the Kind behind the scenes. We've done no work on this feature yet, for other reasons (other things are more pressing, hasn't come up as a design team priority yet, etc.)
  • A programmer might want to create an item "informally" in Python, without having to go add an attribute to the Kind every time the programmer thought of a new instance variable. We agreed that one could have a class that handled that detail programmatically, adding the appropriate attribute to the Kind, reducing the work for the programmer. We had two attempts at this idea: AutoKind and AutoItem, neither worked out perfectly. Neither is actively used in the current code base.

Where do we go from here? We could take another pass at AutoKind/AutoItem. We could revisit ad-hoc attributes; Andi has a new idea about how we might implement them.
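The general idea of handling that detail programmatically can be sketched as follows. This is not the AutoKind or AutoItem code; the `Kind` and `AutoItem` classes here are hypothetical and only show an item registering a new attribute on its Kind the first time the attribute is assigned.

```python
class Kind:
    """Hypothetical Kind that records the attribute names it knows about."""

    def __init__(self, name):
        self.name = name
        self.attributes = set()


class AutoItem:
    """Sketch of an item that registers unknown attributes on its Kind."""

    def __init__(self, kind):
        # Bypass __setattr__ so 'kind' itself is not registered as an attribute.
        object.__setattr__(self, "kind", kind)

    def __setattr__(self, name, value):
        # Register the attribute on the Kind before storing the value, so the
        # item never holds an attribute that its Kind does not know about.
        self.kind.attributes.add(name)
        object.__setattr__(self, name, value)


note_kind = Kind("Note")
note = AutoItem(note_kind)
note.title = "Data model meeting"
print(sorted(note_kind.attributes))
```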

inverseAttribute and otherName [ unify terminology and semantics and document ]

Six months ago, we had a lot of terminology issues to work through as we worked out the initial data model and repository; Brian had one set for talking about the data model and Andi had another for the implementation in the repository. We made good progress and worked almost all of them out, but we never resolved this one. One reason for the difference is that they have ever so slightly different semantics. inverseAttribute refers to an actual Attribute item, otherName refers to the symbolic name of the attribute on an item. Right now, the parcel loader uses the inverseAttribute xml tag to look up the Attribute's name, and then uses that name to set the otherName. We should probably go ahead and resolve this one.

  • Brian adds: I think the ideas behind inverseAttribute and otherName may be tied up with the idea of namespaces and name resolution, as described in the section, Locating an item. At some point we should think about what sort of behavior we want the data model to have with respect to namespaces and name resolution, and we should document it. In particular, if we end up having a data model notion of namespaces that differs slightly from the XML notion, and we have data model items that start their lives in parcel.xml files, then we'll probably want a document that explains the subtle differences.
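The difference between the two terms can be shown with a small sketch: inverseAttribute is a reference to an Attribute item, while otherName is just that item's symbolic name. The `Attribute` class, the `resolve_other_name` helper, and the attribute names below are hypothetical; the helper mimics the loader step described above under that assumption.

```python
class Attribute:
    """Hypothetical Attribute item with a symbolic name."""

    def __init__(self, name):
        self.name = name
        self.inverse_attribute = None  # reference to another Attribute item
        self.other_name = None         # symbolic name of the inverse attribute


def resolve_other_name(attribute):
    # Follow the reference to the inverse Attribute item and copy its
    # name into other_name, as the loader is described as doing.
    if attribute.inverse_attribute is not None:
        attribute.other_name = attribute.inverse_attribute.name


attendees = Attribute("attendees")
meetings = Attribute("meetings")
attendees.inverse_attribute = meetings
meetings.inverse_attribute = attendees

for attr in (attendees, meetings):
    resolve_other_name(attr)

print(attendees.other_name, meetings.other_name)
```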

Default Value, Initial Value [ decide which to keep, find solution for parcel.xml]

taken from Morgen's Extending Chandler document:

  • defaultValue: The value to return when there is no value set for this attribute. This default value is owned by the attribute item and is read-only when it is a collection; default = an attribute has no default value
  • initialValue: Similar to defaultValue except the initial value is set as the value of the attribute the first time it is returned. A copy of the initial value is set when it is a collection; default = none

Open Issues:

  • Do we want both defaultValue and initialValue, or does this make the data model more complicated for not much benefit?
  • If the initialValue is set only the first time it is returned, then it won't show up in queries (I'm assuming), or when iterating through a list of attributes. Often, the developer just wants an attribute to be initialized when the item is created.
  • The parcel xml does not currently support None or the empty list as values for an item, which makes defining an initialValue or defaultValue much less useful. How should the parcel xml represent None and the empty list?
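The two definitions quoted above can be compared side by side in a short sketch. The `AttributeDef`, `Item`, and `Task` classes are hypothetical and only model the described semantics: a defaultValue is returned without being stored, while a copy of the initialValue is stored on first read.

```python
import copy

_MISSING = object()


class AttributeDef:
    """Hypothetical attribute definition carrying both kinds of value."""

    def __init__(self, default_value=_MISSING, initial_value=_MISSING):
        self.default_value = default_value
        self.initial_value = initial_value


class Item:
    definitions = {}  # attribute name -> AttributeDef, filled in by subclasses

    def __init__(self):
        self.values = {}  # only values that are actually set on the item

    def get(self, name):
        if name in self.values:
            return self.values[name]
        definition = self.definitions[name]
        if definition.initial_value is not _MISSING:
            # initialValue: a copy becomes the item's own value on first read.
            self.values[name] = copy.copy(definition.initial_value)
            return self.values[name]
        if definition.default_value is not _MISSING:
            # defaultValue: returned but never stored on the item.
            return definition.default_value
        raise AttributeError(name)


class Task(Item):
    definitions = {
        "status": AttributeDef(default_value="open"),
        "tags": AttributeDef(initial_value=[]),
    }


task = Task()
print(task.values)  # nothing is stored before the first read
print(task.get("status"), task.values)
print(task.get("tags"), task.values)
```

Under these assumptions, a query that inspects only stored values would not see `tags` until some code had read it once, which is the concern raised in the second open issue.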

Global vs local attributes [ decide whether to abandon usage, document correct usage ]

Our notion of an attribute is a little different from instance variables in the OOP world, and this has caused confusion, especially when we make analogies between items/kinds and instances/classes. Attributes have definitions which exist as data independently from the Kind. Multiple kinds can use the same Attribute definition, which we think will ultimately be interesting/powerful, but we have yet to prove that. In particular, this comes up in the decision whether to make an attribute "local" or "global", and whether or not the parcel syntax is appropriate. Currently, a lot of people are placing their Attribute definitions as children of a Kind, to indicate that they are not meant to be re-used. (They could be reused, nothing separates them from "global" attributes other than their more obscure location.) When used this way, it feels redundant to have to wire up the Attribute to the Kind.

A couple of possible things to do:

  • Abandon our notion of global Attributes entirely, go back to the drawing board with Attributes
  • Embrace global Attributes, abandon our notion of "local" attributes, work on some use cases that show their benefit
  • Go a little further with our notion of "local" Attributes, assume that if an Attribute is the child of a Kind, the Kind does not need to declare the Attribute as an "attribute" -- it will be assumed. This would simplify the parcel xml, for example.

Attribute inheritance and bidirectional references [evaluate value of attribute inheritance]

Attribute inheritance was an idea we took from RDF. We haven't really made much use of it yet. When combined with bidirectional references, it starts to get confusing. Jeffrey Harris writes about a particular example (and a related known bug). Andi has a proposal for how we might deal with this issue. Brian has mentioned that we could consider not supporting attribute inheritance at all; perhaps it causes more complexity for not so much value.

Deletion semantics [ examine tradeoffs, decide]

We need to implement some form of garbage collection; otherwise it will be difficult or impossible to know when the last reference to an item is gone. Explicit deletion leads to dangling references, which cause code to fail or require awkward error checking.

  • Bidirectional reference deletion semantics don't work because of cyclic references.
  • Container deletion is incompatible with garbage collection since it deletes all contained items irrespective of whether there are references to any of the items.
  • These problems could be solved by a garbage collection phase that happens during compaction.
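The notes do not say which collection algorithm is intended. As one possibility, a reachability-based pass tolerates the cyclic references mentioned above, because it asks whether an item can be reached from a set of roots rather than whether its reference count has dropped to zero. The data layout and the `collect_garbage` function below are hypothetical.

```python
def collect_garbage(items, roots):
    """Return the names of items unreachable from the roots.

    'items' maps an item name to the names it references; 'roots' lists the
    names that are always kept. Both structures are hypothetical.
    """
    reachable = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(items.get(name, ()))
    return set(items) - reachable


# 'a' and 'b' reference each other, so neither ever loses its last
# reference, yet nothing reachable from the root refers to them.
items = {
    "root": ["inbox"],
    "inbox": ["mail1"],
    "mail1": [],
    "a": ["b"],
    "b": ["a"],
}
print(sorted(collect_garbage(items, ["root"])))
```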

"Policy Primitives" [ new functionality request]

  • reference type "owns" == delete policy + copy policy
  • references should have a "sharing policy", similar to "delete policy" and "copy policy" -- see ItemLinksAndSharing
  • {mimi 12 Jan 2004} maybe change the attribute "hidden" (boolean) to be "visibility" (enum: visible, hidden, invisible)

Types

Reference Collections [ clarify list, dict ref collections, and status. Document ]

  • We've run into cases where we wanted dictionaries that can contain items. In particular, this came up in the notification manager.

Repository data types and Python data types [ Document supported types and relationship to Python/parcel.xml]

As we are using the repository for all of our persistent data, and as we are writing most of our code in Python, it will probably be necessary to support the basic data structures, like dicts and lists, consistently with the way most dicts and lists work in Python. Today, the data types supported by the repository aren't clearly defined anywhere, and there are often subtle differences between the repository's idea of a data structure and Python's idea of the data structure. Although dictionaries are deprecated, the apps team is currently using them. In the past, and maybe even still today, dictionaries have restrictions on what you can put inside of them. Repository lists, unlike Python lists, are implemented as doubly linked lists, which affects the efficiency of indexing.

  • It might make sense to begin by making a list of the data types supported by the repository, noting how they differ from Python data types, and their representation in parcel XML.
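The indexing cost mentioned above can be seen in a minimal doubly linked list. This is a generic sketch, not the repository's implementation; the `Node` and `LinkedList` classes are hypothetical.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class LinkedList:
    """Minimal doubly linked list; not the repository's implementation."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.length += 1

    def __getitem__(self, index):
        # Indexing has to walk the chain node by node: O(n) per access,
        # whereas a Python list reaches any index in O(1).
        if not 0 <= index < self.length:
            raise IndexError(index)
        node = self.head
        for _ in range(index):
            node = node.next
        return node.value


items = LinkedList()
for name in ("first", "second", "third"):
    items.append(name)
print(items[2])
```

Code that loops with `for i in range(n): items[i]` is therefore quadratic on such a list, even though the same loop is linear on a Python list.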

Structs [resolve field values, document usage vs items, settle dirty semantics, clean up API ]

Andi added structs to the data model/repository, and cpia is making use of them.

Rect, Size, and Color are simple examples, defined by cpia:

A few code snippets pulled from these files to give an illustration of how structs are working right now:

struct item definition for the size struct

  <Struct itemName="SizeType" itemClass="OSAF.framework.blocks.DocumentTypes.SizeStruct">
    <fields key="width"></fields>
    <fields key="height"></fields>
    <implementationTypes key="python">OSAF.framework.blocks.DocumentTypes.SizeType</implementationTypes>
  </Struct>

a few attributes that use structs as types

    <Attribute itemName="size">
      <type itemref="docSchema:SizeType"/>
    </Attribute>

    <Attribute itemName="minimumSize">
      <type itemref="docSchema:SizeType"/>
    </Attribute>

    <Attribute itemName="border">
      <type itemref="docSchema:RectType"/>
    </Attribute>

the python code needed for the structs

class SizeType(object):
    __slots__ = 'width', 'height'
    
class SizeStruct(CoreTypes.Struct):

    def makeValue(Struct, data):
        (width, height) = data.split(",")
        size = SizeType()
        setattr (size, 'width', float(width))
        setattr (size, 'height', float(height))
        return size

an example of an item that uses the attributes defined above

  <BookmarksBar itemName="BookmarksBar">
    <!-- Attributes -->
    <bookmarksPath itemref="doc:BookmarksRoot"/>

    <contentSpec itemref="doc:myQuery"/>
    <open>True</open>
    <size>200,20</size>
    <minimumSize>200,20</minimumSize>
    <border>2.0, 2.0, 2.0, 2.0</border>
    <alignmentEnum>grow</alignmentEnum>
    <stretchFactor>0.0</stretchFactor>
  </BookmarksBar>

Brian might be able to give a more complicated example (not yet implemented) from the content model: Conversations.

  • Brian adds: A Conversation Item is currently modeled as a list of Conversation Line Items. Each Line Item represents something like a line of text in an IRC chat, and each Line Item has an author, a timestamp, and a simple one line string. Right now the Line Items are modeled as first-class items, but that seems like overkill, because you'll never operate on them as independent items. For example, you'll never delete one line without deleting the whole conversation, and you'll never share one line without sharing the whole conversation. So maybe we should really model the line items as being structs, but in order to do that we might want to have the struct feature support the ability to have the 'author' attribute be a reference to a Contact Item, rather than just a string literal. -- BrianDouglasSkinner - 11 Mar 2004

Open issues:

  • What kind of value can a struct field have? In particular, can it have a reference to an item? We haven't run into this need in CPIA, afaik, but Brian has mentioned it as an interesting feature for the content model.
  • Unlike attributes for items, modifying a value doesn't cause it to be saved. It's only saved if you explicitly dirty it, which is confusing and error prone. A related issue, but not part of the struct question, is the fact that changing an attribute to its current value causes the item to be dirtied. This leads to situations where we dirty many more items than is necessary. It would be interesting to see if there is a benefit of checking to see if an attribute is changed before marking it dirty.
  • handling of structs in parcel xml, setting the fields
  • It might be nice to streamline the code that someone has to write to define a struct, reducing the amount of boilerplate python. For simple cases, perhaps we could define structs entirely as items in parcel xml, including each field's type.
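The change check suggested in the second open issue could look roughly like the sketch below. The `Item` class, its `dirty` flag, and the `set_value` method are hypothetical names, not the repository API.

```python
class Item:
    """Hypothetical item that tracks whether it needs to be saved."""

    def __init__(self):
        self._values = {}
        self.dirty = False

    def set_value(self, name, value):
        # Only mark the item dirty when the stored value actually changes.
        if name in self._values and self._values[name] == value:
            return
        self._values[name] = value
        self.dirty = True


item = Item()
item.set_value("open", True)
item.dirty = False            # pretend the item was just committed
item.set_value("open", True)  # same value, so the item stays clean
print(item.dirty)
item.set_value("open", False)
print(item.dirty)
```

An equality check like this does not help when a struct value is mutated in place, since no setter runs at all; that case still needs explicit dirtying or some other mechanism.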

Aliases [Interchangeability of types and kinds, mixed ref collections]

We're pretty sure we want Aliases, and we have an implementation of sorts in the repository, but we haven't really put them to use yet.

AnyContact example
We'd like to create an Alias to be used in places that someone might want to refer to a person. The person might be in a list of people at a meeting, for example. Sometimes we have a full Contact item for that person. Sometimes we have an email address. Sometimes we might only have a string for the person's name.

  • Brian adds:
    • see also Jungle.DataModelSept2003FeatureList#attributes
    • The Alias example above ends up touching on the question of whether or not we deal with types and kinds as being homogeneous and interchangeable. Other related questions?
      • single valued attributes that can contain either items or literals (either "string" or "Task")?
      • lists that contain both items and literals (e.g. both "strings" and "Tasks")?
      • "Anything" as an alias?
      • aliases to Types vs. aliases to Taxons?
      • in the schema XML format, do we separate "attributes" and "references"?
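The AnyContact example can be approximated in Python with a union of types. This only illustrates the idea of an alias that accepts either an item or a literal; the `Contact` and `EmailAddress` classes and the `display_name` helper are hypothetical, and the repository's Alias implementation may work quite differently.

```python
from typing import Union


class Contact:
    """Hypothetical stand-in for a full Contact item."""

    def __init__(self, full_name):
        self.full_name = full_name


class EmailAddress:
    """Hypothetical stand-in for an email address value."""

    def __init__(self, address):
        self.address = address


# The alias: any one of these is acceptable where a person is expected.
AnyContact = Union[Contact, EmailAddress, str]


def display_name(person: AnyContact) -> str:
    if isinstance(person, Contact):
        return person.full_name
    if isinstance(person, EmailAddress):
        return person.address
    if isinstance(person, str):
        return person
    raise TypeError("not an AnyContact: %r" % (person,))


attendees = [Contact("Pat Example"), EmailAddress("someone@example.com"), "Sam"]
print([display_name(p) for p in attendees])
```

Note that the `attendees` list mixes items and literals, which is exactly the kind of mixed collection the questions above ask about.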

Internationalization/Localization [What needs to happen here]

Strings [resolve]

  • String vs. Single-line-string
  • plain-text-string vs. rich-text-string
  • strings and text fields that include links to Chandler items
  • How are line endings stored in strings used in the repository? Is this an issue for portable (cross platform) repository files? (Is this a goal?)

Dates [resolve/design]

  • flexible dates: Jan 14, Feb 2005, "all day on 13 May 2004"
  • timezones
  • duration -- what data type for the email attribute "Polling Frequency" -- duration?

LOB/Text/Binary [settle the API]

  • usability of the API

Implementation/API issues

Make Repository look like an Item [assess impact to client code, make changes]

  • removing Item.getItem<Name|Path|Parent>() methods and replace them with read-only python properties called name|path|parent
  • add same properties onto Repository|RepositoryView so that it can be viewed as an Item in that context
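A minimal sketch of the two bullets above, assuming read-only properties named `name`, `path`, and `parent` on both classes. The class bodies, the empty repository name, and the `//` root path are assumptions for illustration, not the actual Item or Repository code.

```python
class Repository:
    """Hypothetical repository that exposes the same properties as an item."""

    @property
    def name(self):
        return ""  # assumption: the repository itself has an empty name

    @property
    def parent(self):
        return None  # the repository is the top of the hierarchy

    @property
    def path(self):
        return "//"  # assumption: root path syntax


class Item:
    def __init__(self, name, parent):
        self._name = name
        self._parent = parent  # either another Item or the Repository

    @property
    def name(self):
        return self._name

    @property
    def parent(self):
        return self._parent

    @property
    def path(self):
        base = self._parent.path
        return base + self._name if base.endswith("/") else base + "/" + self._name


def depth(node):
    # Walks up through items and the repository alike, with no special case.
    count = 0
    while node.parent is not None:
        node = node.parent
        count += 1
    return count


repository = Repository()
core = Item("core", repository)
kind = Item("Kind", core)
print(kind.path, depth(kind))
```

Because the properties define no setter, an assignment such as `kind.name = "Other"` raises `AttributeError`, which gives the read-only behaviour the first bullet asks for.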

Python code corresponding to Kinds [need a proposal for what would be ideal]

We haven't given this problem much attention recently, but it's probably worth really working out the idiom for writing Python code associated with a Kind.

Open issues:

  • The python code associated with a Kind still involves more boilerplate than is ideal
  • Folks have mentioned confusion about where to comment the Kind: in the python code or in the xml?

We have several earlier experiments where we defined the Kind in python (no xml) -- a couple of folks wanted to bring up this option again. A few lessons:

  • When we tried generating python classes on the fly from data, folks found this very confusing: no python code to go look at to see what was what
  • When we tried inferring the item from a Python class, folks found that confusing as well. Perhaps that's partially a result of not having a clear idea of what we wanted out of the repository at the time.
  • Figuring out a way to define Attributes as we have imagined them is a bit tricky. Attributes have an identity globally.

Parcel issues [not sure what needs to happen here]

There are more parcel issues. We've tried to capture the ones that imply questions about the data model, but may have missed something. In particular, these issues may impact the data model:

  • parcel loading/unloading
  • adding/removing/changing values from parcel xml
  • better debugging support

Out of Scope

  • Schema evolution

Threading model

The current threading model, which gives each thread a different view of items, makes it difficult for us to run tasks in the background, e.g. queries, receiving e-mail, or agents. If we need to commit from the U/I thread to view changes from other threads, then it is impossible to use commit for undo.

Summary/Next Actions

We made it about half way through our list, so we'll probably need to have another meeting. Here are the issues we did discuss, any decisions that were reached, and Next Actions.

Items

Item Addressability/Locating an item [ proposal for item addressability, accounting for requirements below ]

We had a long (40 min) discussion about addressability. There was a detour into cross repository addressing of items. Andi was unconvinced that addressability was a feature of the data model proper, while others felt that it was important that addressing mechanisms be as consistent as possible between the persistence and parcel layers of the system.

There are probably some issues here that will get worked out as we flesh out our understanding of sharing.

Also there was some discussion about the tension between trying to get a crisper understanding of the data model so it can be explained and documented vs the fact that we are still evolving the model, so it's hard to get a good understanding. If we cannot explain the model to others we will be unsuccessful.

Next Action : Morgen and Andi to create a proposal for how to use URI syntax to define addressing within a single repository.

AutoKind, AutoItem and ad-hoc attributes [ Why are AutoKind, AutoItem inadequate? Do we need to revisit ad-hoc attributes? ]

Ted asked for an explanation of the inadequacies of AutoKind and AutoItem, which are basically that a Python developer still has to write too much code in order to add an attribute.

Brian reported that the Design group is moving away from ad-hoc attributes.

Decision : defer action on ad-hoc attribute support until the Design group clarifies their thinking.

Next Action : There was general agreement to remove AutoKind and AutoItem. Andi will remove these two files from CVS after he e-mails a copy of them to John for insurance against the future.

inverseAttribute and otherName [ unify terminology and semantics and document ]

This issue is also related to addressability. Andi noted that the problem becomes worse when you add sub attributes.

The feeling was that there was insufficient experience with these features to make an informed decision.

Next Action : The apps group will try to write some real application scenario code that exercises these features.

Default Value, Initial Value [ decide which to keep, find solution for parcel.xml]

There was some discussion about whether Default Value was really needed or whether it was mostly in use inside the repository code base. Andi said that Default Value is used in the schema pack. The apps folks think that what they really want is something like initialValue, but with the values initialized immediately, not lazily -- this is also needed for efficient query processing.

Next Action : Andi will implement the agreed upon (described above) semantics for initialValue

Global vs local attributes [ decide whether to abandon usage, document correct usage ]

The feeling was that there was insufficient experience with these features to make an informed decision.

Next Action : The apps group will try to write some real application scenario code that exercises these features.

Attribute inheritance and bidirectional references [evaluate value of attribute inheritance]

Jeffrey Harris is blocked by the interaction of these two features. And there was some confusion as to what the correct expected semantics should be.

Next Action : Andi to do a whiteboard session with Katie, Brian, Jeffrey (by phone) and anybody else to describe the precise semantics.

Deletion semantics [ examine tradeoffs, decide]

This could be a whole meeting in itself, and there are differences in perspective between and among the Design, Apps, and Repository groups.

Andi asked that we keep in mind the difference between deleting a reference to an item and deleting the item itself.

Next Action : Katie to schedule a meeting for the Design and Apps groups to come to a common understanding of what the behavior should be.

"Policy Primitives" [ new functionality request]

This item appeared as a feature request from Brian. In addition to the policies listed in the issue list, he also mentioned an "owned" policy.

Next Action : Brian to write a proposal detailing how policy primitives should work.

Types

Reference Collections [ clarify list, dict ref collections, and status. Document ]

Andi clarified that reference collections of cardinality list are the only ones supported right now. Dict collections are deprecated -- the only reason they work is that Andi is faking them to lists.

John and Stuart raised the issue of wanting a reference collection that was indexable via a key (dict). John also raised the need for a reference collection that could be indexed like an array, since lists are O(n) for "random" access.

Next Actions : John and Andi to collaborate to remove all reference collections with cardinality dict. John (and Stuart?) to write a proposal for the types that they would like to have.

Implementation/API issues

Make Repository look like an Item [assess impact to client code, make changes]

We agreed that Andi's proposal to use Python properties to solve this problem was good. There was some discussion over the property names. The primary tradeoff is between using up a common identifier (like 'name') and using a semantically incorrect property name (like 'getItemName' applied to the repository).

Decision : we agreed to use a prefix to disambiguate names. So the name property will be .in_name.

Next Action : Andi to implement properties using this naming convention.

-- TedLeung - 11 Mar 2004

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.