pick which features we want to incorporate, and which to leave out
Highlights of what we talked about
parcel programmers vs. "baseball fans"
One of the big themes of today's meeting was the idea that there are two distinct types of people who use schema information:
parcel programmers == These are people like the OSAF engineers, who are writing parcels like the OSAF calendar parcel and the OSAF e-mail parcel. Also third-party programmers who are writing their own parcels.
"baseball fans" == These are end-users. But not the users of task-specific parcels like the calendar parcel. Rather, just users who are using the general purpose info management tools, like the SuperWidget.
Similar needs:
parcel programmers create schemas to represent domain-specific info, like calendar appointments, tasks, etc.
"baseball fans" create schemas to represent domain-specific info, like baseball teams, baseball players, baseball games, etc.
both types of people are happy to think about their domains in terms of items, and attributes, and kinds of item
Markedly different needs:
parcel programmers need fixed schemas, strong typing, and guaranteed enforcement of schema restrictions
"baseball fans" need flexible schemas, weak typing, and the ability to easily make ad-hoc changes
Different backgrounds:
parcel programmers are used to thinking about data modeling features from object-oriented programming languages -- inheritance, pointers, etc.
"baseball fans" are used to keeping their records in tools like Excel, or Filemaker Pro, or Access -- which offer slightly different data modeling features
schema changes
parcel programmers may make changes to the schema, but the changes will be carefully planned, and the new parcel code will be written so that it (a) can do data conversion on existing items, and (b) can deal with running into old-format data
"baseball fans" will also change the schema information, by doing things like adding attributes to a kind, or deleting attributes, or changing their names. Those changes will impact items that already exist. But baseball fans won't write data conversion routines, so Chandler will have to deal gracefully with that.
in both types of use case, there will be some thorny issues. once we have a data model design that describes what schemas look like, then we need to cycle back and think about what types of changes we will allow people to make to their schemas, and what we need to do to support those schema changes.
attribute inheritance via value inheritance
In yesterday's meeting we talked about "attribute inheritance" vs. "value inheritance". Today Andi pointed out that value inheritance could actually be used to implement attribute inheritance, by having Kind instances value-inherit their "Attribute Definition" values from across the super-kind Item-Ref.
A further optimization would be to cache, or "copy down" the Attribute Definitions in the Kind instance that inherits them.
Brian suggested that if you view value inheritance as a type of derivation rule, then the "copy down" optimization is an example of caching derived values, which may be generally important for indexing.
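Here's a rough Python sketch of the idea, just to make it concrete. It is not Chandler code: the Kind class, its method names, and the use of plain strings as Attribute Definitions are all made up for illustration. A Kind value-inherits its Attribute Definitions across the super-kind reference, caches ("copies down") the result, and throws the cache away when a super-kind changes.

```python
class Kind:
    """Hypothetical Kind; Attribute Definitions are plain strings here."""

    def __init__(self, name, super_kind=None, local_definitions=()):
        self.name = name
        self.super_kind = super_kind
        self.sub_kinds = []
        self.local_definitions = list(local_definitions)
        self._copied_down = None  # cached derived value, None means "not computed"
        if super_kind is not None:
            super_kind.sub_kinds.append(self)

    def attribute_definitions(self):
        # Value inheritance: look across the super-kind reference,
        # then add whatever this Kind defines itself.
        if self._copied_down is None:
            inherited = (self.super_kind.attribute_definitions()
                         if self.super_kind is not None else [])
            self._copied_down = inherited + self.local_definitions
        return list(self._copied_down)

    def add_local_definition(self, definition):
        self.local_definitions.append(definition)
        self._invalidate()

    def _invalidate(self):
        # A change here makes every copied-down cache below it stale.
        self._copied_down = None
        for sub in self.sub_kinds:
            sub._invalidate()


event = Kind("Event", local_definitions=["startTime"])
meeting = Kind("Meeting", super_kind=event, local_definitions=["attendees"])
before = meeting.attribute_definitions()
event.add_local_definition("endTime")
after = meeting.attribute_definitions()
```

The invalidation step is the cost of the optimization: any cache of derived values needs some way to notice that the values it was derived from have changed.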
lifecycle events
item instances go through lifecycle events, like instance creation, instance deletion, and instance cloning
Item-Refs can have definitions that dictate how to handle these lifecycle events -- when to do deep copies and when to do shallow copies -- whether to include sub-items in a delete operation
parcel programmers will think through these issues carefully, and may make schemas with carefully chosen Item-Ref settings
"baseball fans" will want to easily create new schemas, using simple default Item-Ref definitions that behave in simple, predictable ways
"domain attributes" vs. "house-keeping attributes"
any given item will have both domain attributes and house-keeping attributes
domain attributes are things that the end-user cares about, like a baseball player's "name", or "age", or "batting average"
house-keeping attributes are things that the Chandler infrastructure code cares about, like "last-modified" time, or "version" number, or a "logically deleted" flag
domain attributes should always be visible to the user
house-keeping attributes may frequently be invisible to the user, although in some cases the user might want to be able to look at them (e.g. "creation date")
probably users should never be able to edit house-keeping attributes directly
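One way to picture these rules is as two flags on each attribute. This is only a sketch: the AttributeInfo class, its field names, and the example attributes are invented here. Domain attributes are always visible, house-keeping attributes are visible only if flagged, and only domain attributes are user-editable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeInfo:
    name: str
    house_keeping: bool = False
    user_visible: bool = True  # only consulted for house-keeping attributes


ATTRIBUTES = [
    AttributeInfo("name"),
    AttributeInfo("battingAverage"),
    AttributeInfo("creationDate", house_keeping=True, user_visible=True),
    AttributeInfo("logicallyDeleted", house_keeping=True, user_visible=False),
]


def visible_attributes(attributes):
    # Domain attributes are always shown; house-keeping ones only if flagged.
    return [a.name for a in attributes if not a.house_keeping or a.user_visible]


def user_editable_attributes(attributes):
    # Users never edit house-keeping attributes directly.
    return [a.name for a in attributes if not a.house_keeping]
```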
"display names" vs. "identifier names"
an attribute definition can have a display name, which is what the end-user sees -- e.g. "Start Time"
an attribute definition can have an identifier name, which is something that might appear in Python code -- e.g. "startTime"
terminology
we settled on the following terminology:
reserved words:
"Item" -- a bunch of attribute values -- pretty much everything is an item -- e.g. "Lunch with Pat"
"Attribute Definition" -- e.g. "Start Time"
"Kind" -- a category of items -- a Kind has a set of Attribute Definitions -- e.g. "Calendar Appointment"
"Item-Ref" -- a reference from one Item to another -- e.g. "Employees<-->Department"
"Domain Schema" -- a set of Kinds and global Attribute Definitions -- e.g. the "Baseball Schema" or the "Chandler PIM Schema"
non-reserved words:
"thing" -- no special meaning in Chandler -- just another fuzzy English language word
"schema" -- no special meaning in Chandler -- just another fuzzy English language word
Andi raised the point that we're using the term "Item" both down in the "Building Blocks" layer and up in the higher level layers (actually all the way up to end-user terminology). Are the Building Block "Items" the same thing as higher level "Items"? If not, then we should probably have different terms to distinguish them.
"global attributes" vs. "local attributes"
we resolved to support "global attributes", shared between Kinds
we resolved to also support "local attributes", specific to a single Kind
"sub-attributes"
we talked about RDF's idea of sub-attributes
we resolved not to think too hard about this right now, on the theory that it should be something that could be added after 1.0 without breaking anything (although we did note that it might be difficult to write database code that would be efficient when processing queries on a super-attribute)
diagram
We settled on a diagram showing how the schema info is organized. I don't have a good way to reproduce the diagram here, but here's what it shows:
A Domain Schema item has a collection of Kind items
A Domain Schema item has a collection of Attribute Definition items, representing global attributes
A particular data item (e.g. "Lunch with Pat") has a defining Kind item
A Kind item has a collection of Attribute Definition items
some of those Attribute Definition items may be local to this Kind item
some of those Attribute Definition items may be global attributes that are in the collection of Attribute Definition items pointed to by the Domain Schema item that is pointed to by the Kind item
some of those Attribute Definition items may be "imported" global attributes that are in the collection of Attribute Definition items pointed to by some unrelated Domain Schema item
An Attribute Definition item may be pointed to by more than one Kind item
An Attribute Definition item may be pointed to by at most one Domain Schema item
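In lieu of the diagram, here is a minimal Python sketch of the same relationships. The class and method names are hypothetical, and treating "owned by no Domain Schema" as "local" is an assumption of this sketch. It shows a Kind using a local attribute, a global attribute from its own Domain Schema, and an imported global attribute from an unrelated one, plus the "at most one Domain Schema" rule.

```python
class AttributeDefinition:
    def __init__(self, identifier_name):
        self.identifier_name = identifier_name
        self.domain_schema = None  # at most one owning Domain Schema


class DomainSchema:
    def __init__(self, name):
        self.name = name
        self.kinds = []
        self.global_attributes = []

    def add_global_attribute(self, definition):
        if definition.domain_schema is not None:
            raise ValueError(
                "an Attribute Definition may belong to at most one Domain Schema")
        definition.domain_schema = self
        self.global_attributes.append(definition)


class Kind:
    def __init__(self, name, domain_schema):
        self.name = name
        self.domain_schema = domain_schema
        self.attribute_definitions = []
        domain_schema.kinds.append(self)

    def classify(self, definition):
        if definition.domain_schema is None:
            return "local"
        if definition.domain_schema is self.domain_schema:
            return "global"
        return "imported"


pim = DomainSchema("Chandler PIM Schema")
baseball = DomainSchema("Baseball Schema")

start_time = AttributeDefinition("startTime")
pim.add_global_attribute(start_time)
name = AttributeDefinition("name")
baseball.add_global_attribute(name)
inning = AttributeDefinition("inning")  # never added to a schema, so local

game = Kind("Baseball Game", baseball)
game.attribute_definitions += [inning, name, start_time]
labels = [game.classify(d) for d in game.attribute_definitions]
```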
Attribute Definitions vs. Attribute Bindings
When a Kind item includes an Attribute Definition item, the Kind item uses all the general information defined in the Attribute Definition item, which is shared by all the Kind items that use the Attribute Definition item
In addition, there may be some specific information particular to the use of the Attribute Definition item in this specific Kind item -- information that's not associated with the Attribute Definition item itself, but with the binding of the Attribute Definition item to the Kind item
Here's a breakdown of what we decided about what information should be associated with the Attribute Definition item and what should be associated with the binding
Attribute Definition info
"type"
could be something like int, float, string, date...
could be a specific sort of Item-Ref
could be "Any", meaning any of the above
"one vs. many"
this is really "cardinality" info -- but we want to be clear that we're only offering the two choices, "one" and "many", rather than more complicated things like "4" or "6 to 8"
defaults to "many" when a "baseball fan" creates a new Attribute Definition
"identifier name"
used as a Python token -- e.g. "startTime"
"display name"
appears in the UI -- e.g. "Start Time"
can be a simple ASCII string, or a Unicode string, or a "Polyglot string" (meaning a dictionary of localized string translations, keyed by language)
Attribute Binding info
"required"
a boolean value -- means the same as "not null" -- the attribute must be included in every instance, and must always have a value
There are a few different options for storing Attribute Binding info:
We could have separate Attribute Binding items between a Kind item and an Attribute Definition item
We could somehow associate it with the Item-Ref that relates a Kind item to an Attribute Definition item
We could somehow associate it with the (attribute of the Kind item) that points to the Item-Ref that points to the Attribute Definition item
We might be able to do this just by using our "Compound Attribute" idea
We didn't pick which option we want -- we'll cross that bridge when we come to it
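To make the definition/binding split concrete, here is a Python sketch of just the first option (separate Attribute Binding items). Since no option was picked, this is purely illustrative, and the class names, field names, and the missing_required helper are all invented. The same Attribute Definition is shared by two Kinds, but "required" differs per binding.

```python
from dataclasses import dataclass


@dataclass
class AttributeDefinition:
    identifier_name: str         # e.g. "startTime"
    display_name: str            # e.g. "Start Time"
    type: str = "Any"
    cardinality: str = "many"    # "one" or "many"; "many" is the default


@dataclass
class AttributeBinding:
    definition: AttributeDefinition
    required: bool = False       # lives on the binding, not the definition


def missing_required(item_values, bindings):
    """Return identifier names of required attributes with no value."""
    return [b.definition.identifier_name
            for b in bindings
            if b.required and item_values.get(b.definition.identifier_name) is None]


start_time = AttributeDefinition("startTime", "Start Time",
                                 type="date", cardinality="one")

# One shared definition, two Kinds, two different bindings.
appointment_bindings = [AttributeBinding(start_time, required=True)]
task_bindings = [AttributeBinding(start_time, required=False)]

lunch = {"title": "Lunch with Pat"}
```

With these bindings, the "Lunch with Pat" values would fail the check as a calendar appointment (no start time) but pass as a task.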
"Emergent" typing for kinds vs. Declarative typing for kinds
resolved to pick declarative typing for kinds
we won't provide any direct support for "emergent" typing, although a third-party parcel developer could write a parcel that had this feature
Game plan coming out
John, Andi, Katie, Brian
follow-up meetings next week to keep deciding what features should be provided by the Data Model