pick which features we want to incorporate, and which we want to leave out
Highlights of what we talked about
Item-Refs
Why have an ItemRef, a separate object? Why not just capture the two endpoints on an Item?
you can index ItemRefs
you can add metadata to ItemRefs (weight is an example)
you could add a links-to-links feature, where one endpoint of an ItemRef is another ItemRef
for our "policy" bits (e.g. delete policy & copy policy), those bits might be stored on the ItemRef itself, rather than on the endpoints or on the corresponding Attribute Definitions
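A minimal sketch of the points above, assuming Python. The class names, field names, and policy values (`weight`, `delete_policy`, `copy_policy`, `RefIndex`) are hypothetical and only illustrate why a separate ItemRef object is convenient; nothing here was decided in the meeting.

```python
from dataclasses import dataclass, field
from typing import Union
from uuid import UUID, uuid4


@dataclass
class Item:
    name: str
    uuid: UUID = field(default_factory=uuid4)


@dataclass
class ItemRef:
    # Hypothetical shape: the notes do not fix any of these field names.
    source: Item
    target: Union[Item, "ItemRef"]  # links-to-links: an endpoint may be an ItemRef
    weight: float = 1.0             # example of metadata carried by the link
    delete_policy: str = "none"     # policy bits stored on the link itself
    copy_policy: str = "none"
    uuid: UUID = field(default_factory=uuid4)


class RefIndex:
    """Index of ItemRefs keyed by the uuid of their source item."""

    def __init__(self):
        self._by_source = {}

    def add(self, ref):
        self._by_source.setdefault(ref.source.uuid, []).append(ref)

    def refs_from(self, item):
        return list(self._by_source.get(item.uuid, []))
```

Because the link is an object in its own right, the index, the metadata, and the policy bits all have an obvious place to live.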
cycles
We agreed that it's OK to have cycles in the schema; the repository will be able to handle them.
We agreed that the repository will have a compaction phase
If we use Berkeley DB, the phase will be needed for deletion
It could compress a copy for backup
Cycles could be detected at this phase
Andi expressed reservations about the wisdom of having cycles in a schema design. We agreed to consider this once we have a proposed design, and to examine that design on its merits at that time.
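One way cycles could be detected during a compaction pass is a depth-first search over the reference graph. The sketch below assumes the graph is available as a plain dict mapping an item id to the ids it references; that representation and the function name `find_cycle` are assumptions for illustration, not part of any agreed design.

```python
def find_cycle(graph):
    """Return one cycle as a list of node ids, or None if there is none.

    `graph` maps a node id to an iterable of the node ids it references.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

The recursion keeps the sketch short; a real pass over a large repository would more likely use an explicit stack.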
containers
We asked the question again: "What are containers for? Why not use an ItemRef?"
containers are a way to make all items reachable (via some "root" item)
every item needs to be in a container (unless it is the root)
we attach no semantic meaning to "container"; it's for housekeeping
containers may need to be hardwired for bootstrapping
containers might be a primitive for indexing
if every item is in a container, every item is reachable for debugging
We noted again that containers are a facility for the developer, not a facility that the end user need be aware of.
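To illustrate the housekeeping role of containers, here is a small sketch, assuming each item records its container and each container records its children. The `Item` class, the `walk` helper, and the example names are hypothetical.

```python
class Item:
    def __init__(self, name, container=None):
        self.name = name
        self.container = container   # None only for the root item
        self.children = []
        if container is not None:
            container.children.append(self)

    def path(self):
        """Containment path from the root down to this item."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.container
        return "/".join(reversed(parts))


def walk(root):
    """Yield every item reachable from the root via containment."""
    yield root
    for child in root.children:
        yield from walk(child)
```

Since every non-root item has a container, `walk(root)` visits every item, which is the property that makes containers useful for debugging.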
one-to-many and many-to-many relationships
we will have support for both one-to-many and many-to-many relationships
these relationships can be represented by attributes that point to lists of ItemRefs
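A sketch of how an attribute holding a list of ItemRefs could express both cardinalities. The `link` and `related` helpers and the attribute names are assumptions made for this example.

```python
class Item:
    def __init__(self, name):
        self.name = name
        self.attributes = {}   # attribute name -> list of ItemRef


class ItemRef:
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def other(self, item):
        """Return the endpoint that is not `item`."""
        return self.second if item is self.first else self.first


def link(a, a_attr, b, b_attr):
    """Create one ItemRef and register it under an attribute on each endpoint."""
    ref = ItemRef(a, b)
    a.attributes.setdefault(a_attr, []).append(ref)
    b.attributes.setdefault(b_attr, []).append(ref)
    return ref


def related(item, attr):
    """Follow every ItemRef stored under `attr` to the item at the far end."""
    return [ref.other(item) for ref in item.attributes.get(attr, [])]
```

With this shape, a one-to-many relationship is simply the case where the list on one side never holds more than one ItemRef; many-to-many needs no extra machinery.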
unilateral relationships
use case where you might want this: an item needs to know its kind, but a kind doesn't want to know about every item.
instead of a full ItemRef, the item could reference the kind via the containment path or uuid
the containment path has the problem that the referenced item could move, which would invalidate the path, so the uuid is preferable
the kind/item is a special case (fundamental because it is so common), but similar desired functionality could be required in other cases
should a "unilateral relationship" be a first class concept in our data model?
pro: would be useful to specify unidirectional relationship and say something about the domain/range of the relationship. Useful for introspection. Useful for debugging. Possibly useful for constraint enforcement.
con: proliferation of first class concepts complicates the repository. Also, more difficult to enforce range or other restrictions on a unilateral relationship.
pro: without it, you can't have a self describing schema of schemas
resolved to leave this as an open issue for now.
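For reference while this issue stays open, here is a sketch of the kind/item use case: the item stores the kind's uuid, and the kind stores nothing about its items. The `Repository` class, its methods, and the example paths are hypothetical.

```python
import uuid


class Kind:
    def __init__(self, name):
        self.name = name
        self.uuid = uuid.uuid4()
        # Deliberately no list of items: the relationship is one-way.


class Item:
    def __init__(self, name, kind, kind_path):
        self.name = name
        self.uuid = uuid.uuid4()
        self.kind_uuid = kind.uuid   # stable one-way reference
        self.kind_path = kind_path   # fragile: breaks if the kind moves


class Repository:
    def __init__(self):
        self._by_uuid = {}
        self._by_path = {}

    def add(self, path, item):
        self._by_uuid[item.uuid] = item
        self._by_path[path] = item

    def move(self, old_path, new_path):
        self._by_path[new_path] = self._by_path.pop(old_path)

    def find_uuid(self, item_uuid):
        return self._by_uuid.get(item_uuid)

    def find_path(self, path):
        return self._by_path.get(path)
```

After a kind is moved, a lookup through the stored path fails while a lookup through the uuid still succeeds, which is the reason the uuid was preferred above.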
remote relationships
questions:
do item-refs work across repositories?
does garbage collection work across repositories?
how can items in different repositories be related?
answers:
trying to get item-refs and garbage collection to work across repositories constitutes a research problem, which we want to avoid.
we should only have simple, low-tech mechanisms for cross-repository relationships
how do we refer to a repository?
we might have local repository items that represent remote repositories
items might refer to a repository via one of these repository items, or by having a URL that points to the remote repository
resolved:
we will have a separate data type to represent an "external reference" -- an external reference is a weak reference to an item in a remote repository
when you subscribe to an item instance in a remote repository, you get a local copy of the remote item
you can view items in other people's repositories, but if you're just viewing (not subscribing) then those items never get stored in your repository
it would be useful to walk through a real use case for this (e.g. what does the data structure look like when two people have scheduled an appointment together across repositories?). We'll plan to get together again later to do that.
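Until that walkthrough happens, the resolved points can be sketched roughly as follows. Everything here is an assumption for illustration: the `ExternalReference` fields, the `view` and `subscribe` method names, the `remotes` lookup table standing in for network access, and the example URL.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalReference:
    """Weak reference: it names a remote item but does not keep it alive."""
    repository_url: str
    item_uuid: str


class Repository:
    def __init__(self, url):
        self.url = url
        self.items = {}   # item uuid -> item data

    def view(self, ref, remotes):
        """Look at a remote item without storing it; None if it is gone."""
        remote = remotes.get(ref.repository_url)
        if remote is None:
            return None
        return remote.items.get(ref.item_uuid)

    def subscribe(self, ref, remotes):
        """Store a local copy of the remote item and return that copy."""
        item = self.view(ref, remotes)
        if item is None:
            return None
        self.items[ref.item_uuid] = copy.deepcopy(item)
        return self.items[ref.item_uuid]
```

Because the reference is weak, no cross-repository garbage collection is needed: a dangling reference just resolves to nothing.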
authorization and permissions
once we hire a security person, we'll take direction from them about how our overall authorization system should be designed
in any case, authorization needs to be enforced on the server side -- code on the client never even gets to see any items or attributes that the user doesn't have permission to see
therefore, the database will need to know about the permissions schema in order to enforce the permissions
authorization might be handled using "capabilities" or "ACLs"
resolved: come back to this once we have a security person
indexes for queries
how do you decide what gets indexed? Sometimes a specific context in the application will require an "index". An alternative is to let the user specify that something should be indexed for faster search. Perhaps in a domain schema (like the Chandler PIM schema), attribute definitions might be marked with a special flag to indicate that they should be indexed.
we could imagine having the repository try to observe queries and heuristically index, but that would be a research problem, so we're not going to do that
full text indexes are a special case. Full text indexes will live with the data. We hope to use an existing third-party one, but may run into problems, especially getting it to work well with transactions. John's confident it wouldn't be too hard to roll our own if we needed to.
when a transaction fails (or on rollback), the index needs to be rolled back
when an item is deleted, the entries for it need to be removed from the index
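The last two requirements can be sketched with a simple in-memory index that keeps an undo log while a transaction is open. This is only an illustration under that assumption; the class and method names are hypothetical, and a real index would rely on the underlying database's transaction support.

```python
class AttributeIndex:
    """Maps an attribute value to the set of item ids that have that value."""

    def __init__(self):
        self._entries = {}
        self._undo = None   # list of inverse operations while a transaction is open

    def begin(self):
        self._undo = []

    def commit(self):
        self._undo = None

    def _log(self, op):
        if self._undo is not None:
            self._undo.append(op)

    def add(self, value, item_id):
        bucket = self._entries.setdefault(value, set())
        if item_id not in bucket:
            bucket.add(item_id)
            self._log(("remove", value, item_id))

    def remove_item(self, item_id):
        """Drop every entry for a deleted item."""
        for value, bucket in list(self._entries.items()):
            if item_id in bucket:
                bucket.discard(item_id)
                self._log(("add", value, item_id))
                if not bucket:
                    del self._entries[value]

    def rollback(self):
        """Undo everything since begin(), most recent change first."""
        for op, value, item_id in reversed(self._undo or []):
            if op == "remove":
                bucket = self._entries.get(value, set())
                bucket.discard(item_id)
                if not bucket:
                    self._entries.pop(value, None)
            else:
                self._entries.setdefault(value, set()).add(item_id)
        self._undo = None

    def lookup(self, value):
        return set(self._entries.get(value, ()))
```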
queries
For a query, we often want to specify the set of things to look through, and a filter on that set. The key is to narrow the set to be smaller than the whole repository.
Kind-specific queries
Most queries/searches are done in some context, and you can usually limit the set based on kinds. For example, a query might look for all the e-mail from a given person, so then we only have to look at items of kind e-mail, not all items. And calendar views generally only look at calendar events.
Multi-kind queries
In some cases, queries span different kinds. A calendar view might be used to view e-mail messages. And a generic table view might be used to look at lots of different kinds of items side by side.
Owners
The notion of an "owner" seems important for limiting queries. Many items will be owned by one user -- e.g. I own my e-mail. For lots of common queries a user will just want to search through stuff they own -- e.g. I mostly just want to see my calendar, my e-mail, my tasks.
Most queries will only need to search over the set of InformationItems. InformationItems are things like Email, contacts, appointments, tasks, notes. The set of Items is more general, and might include "system info" items, like items representing parcels, agents, repositories, queries, etc.
A query may also need to specify some sort of grouping/sorting behavior. Queries don't just return result sets, they return the result set pre-sorted, so that client code can get just a partial result set and then start iterating over the entire result set as needed by the UI.
resolutions
Unlike SQL, we're not doing "joins"
we need to support some kind of programmatic API for submitting a query -- e.g. using some data structure to represent a query
we don't want to have a text-based query language that requires parsing
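A sketch of what "a data structure to represent a query" might look like, combining the ideas above: limit the set by kind and owner, apply a filter, and hand back a pre-sorted iterator. The `Query` fields, the `run` function, and the dict-based items are all assumptions for illustration; the linear scan stands in for whatever indexes the repository ends up providing.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Query:
    kinds: Tuple[str, ...]                  # narrow the set by kind (one or several)
    owner: Optional[str] = None             # narrow further to one owner's items
    predicate: Optional[Callable[[dict], bool]] = None   # filter on the narrowed set
    sort_key: Optional[str] = None          # attribute the results are sorted by


def run(query, items):
    """Return an iterator over matching items, pre-sorted if requested."""
    def matches(item):
        if item["kind"] not in query.kinds:
            return False
        if query.owner is not None and item.get("owner") != query.owner:
            return False
        return query.predicate is None or query.predicate(item)

    results = (item for item in items if matches(item))
    if query.sort_key is not None:
        results = sorted(results, key=lambda item: item[query.sort_key])
    return iter(results)
```

Client code builds a `Query` object directly, so nothing needs to be parsed, and it can pull only as many results from the iterator as the UI currently needs.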