r11 - 02 Dec 2004 - 22:46:22 - KatieCappsParlante

0.4 Data Model Open Issues

Note, these issues were input to the repository team at the beginning of the 0.4 release (March 2004)

More detail on open issues in the data model from the application perspective: cpia, content model, agents, etc. This page will be used as input to the repository team as they look at the data model at the beginning of the 0.4 release. This page will not be maintained indefinitely. Discussion of the issues brought up here might be best on the dev list.

Structs

Andi added structs to the data model/repository, and cpia is making use of them.

Rect, Size, and Color are simple examples defined by cpia.

A few code snippets pulled from the files that define them illustrate how structs are working right now:

struct item definition for the size struct

  <Struct itemName="SizeType" itemClass="OSAF.framework.blocks.DocumentTypes.SizeStruct">
    <fields key="width"></fields>
    <fields key="height"></fields>
    <implementationTypes key="python">OSAF.framework.blocks.DocumentTypes.SizeType</implementationTypes>
  </Struct>

a few attributes that use structs as types

    <Attribute itemName="size">
      <type itemref="docSchema:SizeType"/>
    </Attribute>

    <Attribute itemName="minimumSize">
      <type itemref="docSchema:SizeType"/>
    </Attribute>

    <Attribute itemName="border">
      <type itemref="docSchema:RectType"/>
    </Attribute>

the python code needed for the structs

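class SizeType(object):
    # Plain value object; __slots__ restricts instances to these two fields.
    __slots__ = 'width', 'height'

# CoreTypes is the repository's core types module (its import is not shown in this snippet).
class SizeStruct(CoreTypes.Struct):

    def makeValue(self, data):
        # Parse a parcel xml literal such as "200,20" into a SizeType value.
        width, height = data.split(",")
        size = SizeType()
        size.width = float(width)
        size.height = float(height)
        return size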

an example of an item that uses the attributes defined above

  <BookmarksBar itemName="BookmarksBar">
    <!-- Attributes -->
    <bookmarksPath itemref="doc:BookmarksRoot"/>

    <contentSpec itemref="doc:myQuery"/>
    <open>True</open>
    <size>200,20</size>
    <minimumSize>200,20</minimumSize>
    <border>2.0, 2.0, 2.0, 2.0</border>
    <alignmentEnum>grow</alignmentEnum>
    <stretchFactor>0.0</stretchFactor>
  </BookmarksBar>
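The struct code above depends on the repository's CoreTypes.Struct base class, and the Rect definition isn't shown. As a standalone sketch of the same parsing pattern, the following uses a hypothetical stand-in base class and assumes a RectType with left/top/right/bottom fields (those names are an assumption), to show how a literal like the `border` value in the BookmarksBar example might become a struct value:

```python
class StructStandIn(object):
    """Hypothetical stand-in for CoreTypes.Struct; only makeValue is modeled."""

    def makeValue(self, data):
        raise NotImplementedError

class RectType(object):
    # Field names are an assumption; the original Rect definition is not shown.
    __slots__ = 'left', 'top', 'right', 'bottom'

class RectStruct(StructStandIn):

    def makeValue(self, data):
        # float() ignores surrounding whitespace, so "2.0, 2.0, 2.0, 2.0" parses cleanly.
        values = [float(part) for part in data.split(",")]
        if len(values) != 4:
            raise ValueError("a rect needs four comma-separated numbers, got %r" % data)
        rect = RectType()
        rect.left, rect.top, rect.right, rect.bottom = values
        return rect

border = RectStruct().makeValue("2.0, 2.0, 2.0, 2.0")
print(border.left, border.top, border.right, border.bottom)  # 2.0 2.0 2.0 2.0
```

Checking the field count gives a clearer error than the bare unpacking error the SizeStruct snippet would raise for a malformed literal.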

Brian might be able to give a more complicated example (not yet implemented) from the content model: Conversations.

  • Brian adds: A Conversation Item is currently modeled as a list of Conversation Line Items. Each Line Item represents something like a line of text in an IRC chat, and each Line Item has an author, a timestamp, and a simple one line string. Right now the Line Items are modeled as first-class items, but that seems like overkill, because you'll never operate on them as independent items. For example, you'll never delete one line without deleting the whole conversation, and you'll never share one line without sharing the whole conversation. So maybe we should really model the line items as being structs, but in order to do that we might want to have the struct feature support the ability to have the 'author' attribute be a reference to a Contact Item, rather than just a string literal. -- BrianDouglasSkinner - 11 Mar 2004
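Brian's suggestion might look roughly like the following: a lightweight value type whose author field holds either a reference to a Contact item or a plain string. Everything here (Contact, ConversationLine, Conversation, and the field names) is hypothetical, and ordinary Python objects stand in for repository items; whether the repository's struct feature could actually persist such a reference is exactly the open question.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Union

@dataclass
class Contact:
    # Stand-in for a first-class Contact item (hypothetical).
    name: str

@dataclass
class ConversationLine:
    # Struct-like value: no identity of its own, it only lives inside a Conversation.
    # The author may be a reference to a Contact or just a literal string.
    author: Union[Contact, str]
    timestamp: datetime
    text: str

    def authorName(self) -> str:
        return self.author.name if isinstance(self.author, Contact) else self.author

@dataclass
class Conversation:
    # The only first-class item: deleting or sharing it takes all of its lines along.
    lines: List[ConversationLine] = field(default_factory=list)

    def say(self, author, text, when=None):
        self.lines.append(ConversationLine(author, when or datetime.now(), text))

brian = Contact("Brian")
chat = Conversation()
chat.say(brian, "structs for line items?")
chat.say("anonymous-guest", "sounds good")
print([line.authorName() for line in chat.lines])  # ['Brian', 'anonymous-guest']
```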

Open issues:

  • handling of structs in parcel xml, setting the fields
  • What kind of value can a struct field have? In particular, can it have a reference to an item? We haven't run into this need in CPIA, afaik, but Brian has mentioned it as an interesting feature for the content model.
  • It might be nice to streamline the code that someone has to write to define a struct, reducing the amount of boilerplate python. For simple cases, perhaps we could define structs entirely as items in parcel xml, including each field's type.
  • Unlike an item's attributes, a struct value isn't saved when you modify it; it's only saved if you explicitly dirty it, which is confusing and error prone. A related issue, though not part of the struct question, is that setting an attribute to its current value still dirties the item. As a result, we dirty many more items than necessary. It would be worth finding out whether there is a benefit to checking that an attribute's value has actually changed before marking the item dirty.
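The last point amounts to comparing the new value with the current one before dirtying the item. A minimal sketch of that check, using a hypothetical TrackedItem class rather than the repository's actual Item API:

```python
class TrackedItem(object):
    """Hypothetical item that records whether it needs to be saved."""

    _MISSING = object()  # sentinel: attribute has never been set

    def __init__(self):
        self._values = {}
        self.dirty = False

    def setAttributeValue(self, name, value):
        # Only dirty the item when the stored value actually changes.
        if self._values.get(name, self._MISSING) != value:
            self._values[name] = value
            self.dirty = True

    def getAttributeValue(self, name):
        return self._values[name]

item = TrackedItem()
item.setAttributeValue("open", True)
item.dirty = False                     # pretend the item was just committed
item.setAttributeValue("open", True)   # same value: item stays clean
print(item.dirty)                      # False
item.setAttributeValue("open", False)
print(item.dirty)                      # True
```

Two caveats of this approach: the check relies on Python equality (so, for example, setting 1 over True would not dirty the item), and it does nothing for a struct modified in place, since that never goes through the setter.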

Repository data types and Python data types

As we are using the repository for all of our persistent data, and as we are writing most of our code in Python, the repository will probably need to support basic data structures such as dicts and lists in a way that is consistent with how dicts and lists work in Python. Today, the data types supported by the repository aren't clearly defined anywhere, and there are often subtle differences between the repository's idea of a data structure and Python's. Although dictionaries are deprecated, the apps team is currently using them. In the past, and maybe still today, dictionaries have had restrictions on what you can put inside them. Repository lists, unlike Python lists, are implemented as doubly linked lists, so indexing by position takes time proportional to the index rather than constant time.

  • It might make sense to begin by making a list of the data types supported by the repository, noting how they differ from Python data types, and their representation in parcel XML.
  • We've run into cases where we wanted dictionaries that can contain items. In particular, this came up in the notification manager.
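To illustrate why the linked-list representation matters, here is a minimal doubly linked list (a sketch, not the repository's implementation) in which positional indexing has to walk the links, so `lst[i]` costs O(n) steps in the worst case instead of O(1) as with a Python list:

```python
class _Node(object):
    __slots__ = 'value', 'prev', 'next'

    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None

class LinkedList(object):
    """Minimal doubly linked list; illustrative only."""

    def __init__(self, values=()):
        self.head = self.tail = None
        self.size = 0
        for value in values:
            self.append(value)

    def append(self, value):
        node = _Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev, self.tail.next = self.tail, node
            self.tail = node
        self.size += 1

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if index < 0:
            index += self.size
        if not 0 <= index < self.size:
            raise IndexError(index)
        # Walk from whichever end is closer; still linear in the worst case.
        if index < self.size // 2:
            node = self.head
            for _ in range(index):
                node = node.next
        else:
            node = self.tail
            for _ in range(self.size - 1 - index):
                node = node.prev
        return node.value

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next

lst = LinkedList("abcde")
print(lst[1], lst[-1], list(lst))  # b e ['a', 'b', 'c', 'd', 'e']
```

Iterating is still cheap, but Python-style code such as `for i in range(len(lst)): lst[i]` becomes quadratic on a structure like this.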

Default Value, Initial Value

taken from Morgen's Extending Chandler document:

  • defaultValue: The value to return when no value is set for this attribute. This default value is owned by the attribute item and is read-only when it is a collection. If unspecified, the attribute has no default value.
  • initialValue: Similar to defaultValue, except that the initial value is set as the value of the attribute the first time it is returned. When the initial value is a collection, a copy of it is set. If unspecified, there is no initial value.
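A rough sketch of the two behaviors as described above, using a hypothetical Attribute/Item pair rather than the repository's classes. The initialValue is copied into the item on first read (so later changes don't leak back into the attribute), while the defaultValue is returned without ever being stored. The read-only treatment of collection defaults is not modeled here.

```python
import copy

_NONE = object()  # sentinel: no default/initial value declared

class Attribute(object):
    def __init__(self, name, defaultValue=_NONE, initialValue=_NONE):
        self.name = name
        self.defaultValue = defaultValue
        self.initialValue = initialValue

class Item(object):
    def __init__(self, attributes):
        self.attributes = {a.name: a for a in attributes}
        self.values = {}

    def getAttributeValue(self, name):
        if name in self.values:
            return self.values[name]
        attr = self.attributes[name]
        if attr.initialValue is not _NONE:
            # Set a copy on first read, so the item owns its own collection.
            value = copy.copy(attr.initialValue)
            self.values[name] = value
            return value
        if attr.defaultValue is not _NONE:
            # Returned but never stored; the attribute owns this value.
            return attr.defaultValue
        raise AttributeError(name)

tags = Attribute("tags", initialValue=[])
title = Attribute("title", defaultValue="Untitled")
item = Item([tags, title])
item.getAttributeValue("tags").append("work")
print(item.values)          # {'tags': ['work']}
print(tags.initialValue)    # []
print(item.getAttributeValue("title"), "title" in item.values)  # Untitled False
```

Note that until "tags" is read, nothing is stored on the item, so anything that iterates over stored values would not see it.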

Open Issues:

  • Do we want both defaultValue and initialValue, or does this make the data model more complicated for not much benefit?
  • If the initialValue is set only the first time it is returned, then it won't show up in queries (I'm assuming), or when iterating through a list of attributes. Often, the developer just wants an attribute to be initialized when the item is created.
  • The parcel xml does not currently support None or the empty list as values for an item, which makes defining an initialValue or defaultValue much less useful. How should the parcel xml represent None and the empty list?

Aliases

We're pretty sure we want Aliases, and we have an implementation of sorts in the repository, but we haven't really put them to use yet.

AnyContact example
We'd like to create an Alias to be used in places where someone might want to refer to a person. The person might be in a list of people at a meeting, for example. Sometimes we have a full Contact item for that person, sometimes only an email address, and sometimes only a string with the person's name.

  • Brian adds:
    • see also Jungle.DataModelSept2003FeatureList#attributes
    • The Alias example above ends up touching on the question of whether or not we treat types and kinds as homogeneous and interchangeable. Other related questions:
      • single valued attributes that can contain either items or literals (either "string" or "Task")?
      • lists that contain both items and literals (e.g. both "strings" and "Tasks")?
      • "Anything" as an alias?
      • aliases to Types vs. aliases to Taxons?
      • in the schema XML format, do we separate "attributes" and "references"?
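One way to picture the AnyContact alias is as a type check that accepts any of several kinds or literal types. The sketch below is hypothetical (Alias, Contact, and EmailAddress are stand-ins, not repository classes), and it treats Python classes as kinds and types interchangeably, which is precisely one of the questions raised above:

```python
class Contact(object):
    def __init__(self, fullName):
        self.fullName = fullName

class EmailAddress(object):
    def __init__(self, address):
        self.address = address

class Alias(object):
    """Hypothetical alias: a named union of the kinds/types it stands for."""

    def __init__(self, name, *types):
        self.name = name
        self.types = types

    def accepts(self, value):
        # isinstance() accepts a tuple of classes and matches any of them.
        return isinstance(value, self.types)

AnyContact = Alias("AnyContact", Contact, EmailAddress, str)

attendees = [Contact("Katie"), EmailAddress("someone@example.com"), "John"]
print(all(AnyContact.accepts(a) for a in attendees))  # True
print(AnyContact.accepts(42))                         # False
```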

inverseAttribute and otherName

Six months ago, we had a lot of terminology issues to work through as we worked out the initial data model and repository; Brian had one set for talking about the data model and Andi had another for the implementation in the repository. We made good progress and worked almost all of them out, but we never resolved this one. One reason for the difference is that they have ever so slightly different semantics. inverseAttribute refers to an actual Attribute item, otherName refers to the symbolic name of the attribute on an item. Right now, the parcel loader uses the inverseAttribute xml tag to look up the Attribute's name, and then uses that name to set the otherName. We should probably go ahead and resolve this one.

  • Brian adds: I think the ideas behind inverseAttribute and otherName may be tied up with the idea of namespaces and name resolution, as described in the next section, Locating an item. At some point we should think about what sort of behavior we want the data model to have with respect to namespaces and name resolution, and we should document it. In particular, if we end up having a data model notion of namespaces that differs slightly from the XML notion, and we have data model items that start their lives in parcel.xml files, then we'll probably want a document that explains the subtle differences.
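To make the otherName semantics concrete: when one end of a bidirectional reference is set, the other item gets the attribute whose symbolic name is the otherName. The sketch below is hypothetical (the `group`/`members` attribute names and the SCHEMA table are invented for illustration). Here otherName is just a string, whereas an inverseAttribute would point at the Attribute item itself.

```python
# attribute name -> (otherName, isList); a stand-in for schema information
SCHEMA = {
    'group': ('members', False),
    'members': ('group', True),
}

class Item(object):
    def __init__(self, name):
        self.name = name
        self.values = {}

    def _attach(self, attr, other):
        isList = SCHEMA[attr][1]
        if isList:
            self.values.setdefault(attr, []).append(other)
        else:
            self.values[attr] = other

    def setRef(self, attr, other):
        # Set this end, then set the inverse end found via its symbolic otherName.
        otherName = SCHEMA[attr][0]
        self._attach(attr, other)
        other._attach(otherName, self)

apps = Item("apps")
katie = Item("katie")
katie.setRef('group', apps)
print(apps.values['members'][0].name, katie.values['group'].name)  # katie apps
```

This sketch doesn't detach the old inverse when a single-valued end is reassigned; a real implementation has to.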

Locating an item

There's a cluster of issues around uris, item paths, namespaces, parcels, the describes tag, containment paths for items, and the parcel directory structure.

A couple of requirements/constraints:

  • In parcel xml and in python code, developers need some way to uniquely refer to items with a human-readable name (vs. a uuid). If we had better memories for hexadecimal, we might be able to get away with just using uuids; this is really a readability issue.
  • Items need to live somewhere in the repository, one and only one place to "put" the item. End users don't need to know this place.

Currently, the itemPath covers both of these needs. As a place where items live, it is a hierarchical structure based on the itemPath of the item's parent and the item's name. The repository itself does not decide where to put items, the client code needs to decide where to put each item as it is created.
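Since an item's location is its parent's path plus its own name, resolving an itemPath amounts to walking down from the root one name at a time. A minimal sketch with hypothetical Node objects; the `//Schema/Core/Kind` path syntax is assumed here for illustration, and the repository's real lookup code may differ:

```python
class Node(object):
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def itemPath(self):
        # The path is derived from the parent's path and this item's name.
        if self.parent is None:
            return "//" + self.name
        return self.parent.itemPath() + "/" + self.name

def findPath(root, path):
    """Resolve a path like '//Schema/Core/Kind'; returns None if any step is missing."""
    names = path.lstrip("/").split("/")
    if names[0] != root.name:
        return None
    node = root
    for name in names[1:]:
        node = node.children.get(name)
        if node is None:
            return None
    return node

root = Node("Schema")
core = Node("Core", root)
kind = Node("Kind", core)
print(kind.itemPath())                               # //Schema/Core/Kind
print(findPath(root, "//Schema/Core/Kind") is kind)  # True
```

Each lookup costs one child lookup per path segment, which is why how the repository stores children matters for code that finds items by path.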

More requirements/constraints:

  • We needed some static way to define items used by Chandler, especially Kinds, Attributes, and various CPIA items that describe the layout of the application. Although we experimented with doing this in python, we ended up wanting these item definitions in xml.
  • We wanted some way to extend Chandler, and we wanted most of the application to be written using this extensibility mechanism.
  • Someone should be able to extend Chandler by dropping a folder (or zip file) into one location in the filesystem.

Currently, parcels (and parcel xml) cover both of these needs. A parcel is the unit of extensibility. A parcel contains items to add to the repository. Items might refer to python code that implements them. Parcels in particular and items in general might also refer to other resources (images, etc.) used by the application.

More requirements/constraints:

  • Given that we have parcel xml to statically define items to load into the repository at startup, we'd love for that parcel xml to be relatively clean and easy for developers to read and write. We'd like to be able to define items using tags based on their Kinds and Attributes rather than generic tags. It would be cool if this were true for any Kind added by any parcel, not just Kinds that are core to Chandler.
  • Parcels ought to be able to contain other parcels. Parcels need to live in the filesystem where they can be discovered by the application.
  • Parcels should be items, with attributes for parcel data just like any other item.
  • Developers shouldn't have to manage two separate "locations" for parcels and the items they contain.

These requirements led us to the proposal that:

  • We could unify the notion of xml "namespace" and an itemPath, and use this as the "location" of an item in the xml.
  • We could have one parcel per folder, which could then contain other parcels/folders in the filesystem
  • The parcel's location in the filesystem would map to the parcel item's location in the repository (as a well known location). A parcel would then be an item's parent, solving the question of "where do I put this item in the repository?"
  • Using xml namespaces, we could simplify the tags used to describe items, items could be described in parcel xml by their Kinds and Attributes.
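The proposed mapping from a parcel's folder to its repository location could work roughly like this. The `//parcels` root and the helper names are assumptions for illustration, not the parcel loader's actual conventions:

```python
import posixpath

def parcelItemPath(parcelsDir, parcelDir, root="//parcels"):
    """Map a parcel folder under parcelsDir to the item path of its parcel item."""
    relative = posixpath.relpath(parcelDir, parcelsDir)
    if relative == ".":
        return root
    if relative == ".." or relative.startswith("../"):
        raise ValueError("%s is not inside %s" % (parcelDir, parcelsDir))
    return root + "/" + relative

def itemPathInParcel(parcelPath, itemName):
    # The parcel item is the parent of every item its parcel xml defines.
    return parcelPath + "/" + itemName

base = "/opt/chandler/parcels"
path = parcelItemPath(base, "/opt/chandler/parcels/osaf/framework/blocks")
print(path)                                    # //parcels/osaf/framework/blocks
print(itemPathInParcel(path, "BookmarksBar"))  # //parcels/osaf/framework/blocks/BookmarksBar
```

With this scheme the same string can serve as the xml namespace, the parcel item's path, and the parent path of the items it defines, which is the unification the proposal aims for.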

So, where does this go wrong, cause problems?

  • We're now in a position where we're using repository itemPaths as uris, but they are not really uris. Perhaps this could be resolved by making the repository's itemPath a real uri (http://core/schema/Kind) or (chandler://core/schema/Kind).
  • We have a lot of code that looks up an item by its itemPath (or uri), and the repository was not really designed to make this a fast operation, afaik.
  • This idea of mapping the xml namespace to the directory structure feels sort of odd/problematic to xml folks. We needed to go add a "describes" tag for the xslt to work, for example. The "describes" tag feels redundant/not right to folks writing the xml (why am I specifying the parcel path 3 times in each file?)
  • Sometimes one parcel defines a lot of items, and it feels like a lot of overhead to create a subdirectory/subparcel just to modularize sets of items. As we make parcels more heavyweight, subparcels in general might feel like a lot of overhead, even if parcel containership doesn't map to the directory structure.
  • Because we have generalized the parcel xml to use any Kind/Attribute as a tag, it's not obvious how to write a validator for the xml. One option is to go back and use more general tags, as the pack xml does. We fear that solution would make xml unworkable for our problem, though, too unwieldy for developers to use. We could throw out the approach of using xml to define items, and instead define some shorthand format (like RDF's N3) or use python. We could perhaps be clever and generate validators based on Kinds and Attributes, which might require some adjustment to the parcel format.
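The last idea, generating validation from Kinds and Attributes, might look like the following sketch. It assumes a simplified, hypothetical schema in which each Kind is just a tag name mapped to its set of attribute names, and it ignores xml namespaces; it only checks that each child element of an item is an attribute its Kind declares. A real validator would also check types, cardinality, and namespaced tags.

```python
import xml.etree.ElementTree as ET

def makeValidator(kinds):
    """kinds maps a Kind's tag name to the set of attribute names it declares."""

    def validate(xmlText):
        errors = []
        for element in ET.fromstring(xmlText):
            kind = kinds.get(element.tag)
            if kind is None:
                errors.append("unknown Kind <%s>" % element.tag)
                continue
            for child in element:
                if child.tag not in kind:
                    errors.append("<%s> has no attribute '%s'" % (element.tag, child.tag))
        return errors

    return validate

validate = makeValidator({"BookmarksBar": {"size", "minimumSize", "border", "open"}})
parcel = """<Parcel>
  <BookmarksBar itemName="BookmarksBar">
    <size>200,20</size>
    <colour>red</colour>
  </BookmarksBar>
</Parcel>"""
print(validate(parcel))  # ["<BookmarksBar> has no attribute 'colour'"]
```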

Now that we're at the beginning of a release cycle and not the end, we can question some of our requirements, make adjustments that remove or alter constraints, and come up with new proposals that work better to solve these problems.

AutoKind, AutoItem and ad-hoc attributes

At some point, we made a simplifying decision to not support true ad-hoc attributes. We agreed we might need to revisit the decision if it didn't work out. Specifically, we mean that an Item can't have an attribute that its Kind does not know about. This brings up two problems, which we had solutions for:

  • In the UI, the user might want to add an attribute to an item on the fly. We decided that the apps code would add the Attribute definition to both the Item and the Kind behind the scenes. We've done no work on this feature yet, for other reasons (other things are more pressing, hasn't come up as a design team priority yet, etc.)
  • A programmer might want to create an item "informally" in Python, without having to go add an attribute to the Kind every time the programmer thought of a new instance variable. We agreed that one could have a class that handled that detail programmatically, adding the appropriate attribute to the Kind, reducing the work for the programmer. We had two attempts at this idea: AutoKind and AutoItem, neither worked out perfectly. Neither is actively used in the current code base.
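For reference, the AutoItem idea can be sketched as follows: assigning an unknown instance variable quietly adds a matching attribute to the item's Kind. Kind and AutoItem here are hypothetical stand-ins, not the earlier implementations mentioned above.

```python
class Kind(object):
    def __init__(self, name, attributes=()):
        self.name = name
        self.attributes = set(attributes)

class AutoItem(object):
    """Hypothetical item that extends its Kind instead of rejecting ad-hoc attributes."""

    def __init__(self, kind):
        # Bypass __setattr__ so 'kind' itself is not added to the Kind.
        object.__setattr__(self, 'kind', kind)

    def __setattr__(self, name, value):
        if name not in self.kind.attributes:
            # Define the attribute on the Kind behind the programmer's back.
            self.kind.attributes.add(name)
        object.__setattr__(self, name, value)

note = Kind("Note", ["title"])
item = AutoItem(note)
item.title = "Groceries"
item.priority = 2               # not declared on the Kind yet
print(sorted(note.attributes))  # ['priority', 'title']
```

The side effect is global: every other item of the Kind now has a "priority" attribute too, which is one way this style can surprise people.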

Where do we go from here? We could take another pass at AutoKind/AutoItem. We could revisit ad-hoc attributes, Andi had a new idea about how we might implement them.

Python code corresponding to Kinds

We haven't given this problem much attention recently, but it's probably worth really working out the idiom for writing Python code associated with a Kind.

Open issues:

  • The python code associated with a Kind still involves more boilerplate than is ideal
  • Folks have mentioned confusion about where to comment the Kind: in the python code or in the xml?

We have several earlier experiments where we defined the Kind in python (no xml) -- a couple of folks wanted to bring up this option again. A few lessons:

  • When we tried generating python classes on the fly from data, folks found this very confusing: no python code to go look at to see what was what
  • When we tried inferring the item from a Python class, folks found that confusing as well. Perhaps that's partially a result of not having a clear idea of what we wanted out of the repository at the time.
  • Figuring out a way to define Attributes as we have imagined them is a bit tricky, because Attributes have a global identity.

Global vs local attributes

Our notion of an attribute is a little different from instance variables in the OOP world, and this has caused confusion, especially when we make analogies between items/kinds and instances/classes. Attributes have definitions which exist as data independently from the Kind. Multiple kinds can use the same Attribute definition, which we think will ultimately be interesting/powerful, but we have yet to prove that. In particular, this comes up in the decision whether to make an attribute "local" or "global", and whether or not the parcel syntax is appropriate. Currently, a lot of people are placing their Attribute definitions as children of a Kind, to indicate that they are not meant to be re-used. (They could be reused, nothing separates them from "global" attributes other than their more obscure location.) When used this way, it feels redundant to have to wire up the Attribute to the Kind.

A couple of possible things to do:

  • Abandon our notion of global Attributes entirely, go back to the drawing board with Attributes
  • Embrace global Attributes, abandon our notion of "local" attributes, work on some use cases that show their benefit
  • Go a little further with our notion of "local" Attributes: assume that if an Attribute is the child of a Kind, the Kind does not need to declare the Attribute as an "attribute"; it will be assumed. This would simplify the parcel xml, for example.
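The third option amounts to a simple rule at load time: any Attribute whose parent is a Kind is added to that Kind's attributes automatically, while attributes living elsewhere still have to be wired up explicitly. A sketch of that rule over hypothetical loaded items (SchemaItem and its fields are illustrative, not the parcel loader's actual structures):

```python
class SchemaItem(object):
    def __init__(self, name, kindOfItem, parent=None):
        self.name = name
        self.kindOfItem = kindOfItem  # "Kind" or "Attribute"
        self.parent = parent
        self.attributes = []          # only meaningful for Kinds

def wireLocalAttributes(items):
    """Declare every Attribute that is a child of a Kind on that Kind, if not already declared."""
    for item in items:
        parent = item.parent
        if (item.kindOfItem == "Attribute" and parent is not None
                and parent.kindOfItem == "Kind" and item not in parent.attributes):
            parent.attributes.append(item)

contactKind = SchemaItem("Contact", "Kind")
local = SchemaItem("nickname", "Attribute", parent=contactKind)
globalAttr = SchemaItem("displayName", "Attribute")  # lives elsewhere; wired explicitly
contactKind.attributes.append(globalAttr)
wireLocalAttributes([contactKind, local, globalAttr])
print([a.name for a in contactKind.attributes])  # ['displayName', 'nickname']
```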

Attribute inheritance and bidirectional references

Attribute inheritance was an idea we took from RDF. We haven't really made much use of it yet. When combined with bidirectional references, it starts to get confusing. Jeffrey Harris writes about a particular example (and a related known bug). Andi has a proposal for how we might deal with this issue. Brian has mentioned that we could consider not supporting attribute inheritance at all, since it perhaps causes more complexity than it is worth.

Deletion semantics

We need to implement some form of garbage collection; otherwise it will be difficult or impossible to know when the last reference to an item is gone. Explicit deletion leads to dangling references, which cause code to fail or require awkward error checking.

  • Bidirectional reference deletion semantics don't work because of cyclic references.
  • Container deletion is incompatible with garbage collection, since it deletes all contained items irrespective of whether there are references to any of them.
  • These problems could be solved by a garbage collection phase that happens during compaction.
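A garbage collection phase of this kind is commonly done as mark and sweep: starting from a set of roots, mark everything reachable through references, then delete whatever is unmarked. Unlike reference counting, this also reclaims cycles. A sketch over a hypothetical in-memory reference graph (what the repository would use as roots, and how it stores references, are assumptions here):

```python
def collectGarbage(references, roots):
    """
    references: dict mapping each item id to the ids it refers to.
    roots: ids that must survive (e.g. parcels or other well-known items).
    Returns the set of ids that are unreachable and could be deleted.
    """
    marked = set()
    stack = list(roots)
    while stack:                     # mark phase: iterative depth-first search
        item = stack.pop()
        if item in marked:
            continue
        marked.add(item)
        stack.extend(references.get(item, ()))
    return set(references) - marked  # sweep phase: everything not marked

refs = {
    "root": ["inbox"],
    "inbox": ["mail1"],
    "mail1": ["inbox"],      # bidirectional reference back to its container
    "orphanA": ["orphanB"],  # a cycle that nothing else points to
    "orphanB": ["orphanA"],
}
print(sorted(collectGarbage(refs, ["root"])))  # ['orphanA', 'orphanB']
```

If containment is treated as just another reference from parent to child, a contained item referenced from elsewhere survives its container being removed from the roots, which avoids the conflict with container deletion noted above.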

Threading model

The current threading model, which gives each thread a different view of items, makes it difficult for us to run tasks in the background, e.g. queries, receiving e-mail, or agents. If we need to commit from the U/I thread to see changes made by other threads, then it is impossible to use commit for undo.

Parcel issues

There are more parcel issues. We've tried to capture the ones that imply questions about the data model, but may have missed something. In particular, these issues may impact the data model:

  • parcel loading/unloading
  • adding/removing/changing values from parcel xml
  • better debugging support

Other

A few big issues that might be out of scope:

  • schema evolution
  • internationalization/localization

-- KatieCappsParlante - 10 Mar 2004
-- JohnAnderson - 10 Mar 2004


Discussion of the issues raised here might be best on the dev list, but you can add new issues or other comments below...

Discussion


 
Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.