r21 - 14 Aug 2006 - 16:08:00 - MorgenSagen

Sharing Format

One of the 0.7 goals is to identify and implement an 'external serialized representation' (sharing format) for use by Chandler, Cosmo, Scooby (and, hopefully, any other clients). This wiki page is meant to be the official record of requirements, proposals, and decisions. I'm in the middle of pulling info from various discussions and organizing it here:

Requirements

(This list contains everything anyone has mentioned, and needs to be triaged)

  1. We must be able to convert the sharing format to and from the external information model form
  2. Conversion between sharing format and EIM form follows an unchanging algorithm -- i.e., if the sharing schema changes, that will not require a programmer to modify the sharing format conversion code.
  3. The desktop client communicates with the sharing server using a protocol/format that supports a "rich sharing format"
    • reasonable performance characteristics
    • extensible format to support new data types
    • supports data model features: stamping, items in multiple collections, bidirectional references
  4. The desktop client and sharing server use standard protocols and data formats where possible/sensible (avoid reinventing the wheel). Options include:
    • tweak CalDAV
    • build on DAV (RDF format as morgen proposed, or another format)
    • HTTP, Atom, some other solution?
  5. The sharing server implements a variety of protocols to facilitate inter-operation with other clients
    • CalDAV
    • iCal style WebDAV + ics files
    • WebDAV + other standard formats (vcard, etc.)
    • Atom
    • RSS
  6. The web client supports the same "rich data model"
    • stamping
    • items in multiple collections
    • bidirectional refs
    • extensible to support new types
  7. The desktop client implements other protocols to allow it to inter-operate with other servers
    • CalDAV, iCal style WebDAV + ics
    • functionality may be less rich
    • the desktop client can speak to other servers using these protocols, but doesn't necessarily speak to Cosmo this way, and Cosmo handles interop with other clients
  8. The web client communicates with the sharing service using the same "rich data format" protocol as the desktop client
  9. The web client implements other protocols to allow it to inter-operate with other servers
    • CalDAV
    • iCal style WebDAV + ics
  10. Decouple sharing format from Chandler domain model
  11. Avoid multiple resources to represent an item (avoid consistency problems)
  12. Minimize storage for a given shared item
  13. Performance
    • be able to tell what needs to be fetched
    • just send diffs
    • only send what's new (not duplicating info)
  14. Don't require that what is sent over the wire is what gets stored.
  15. Allow conflict resolution to happen where it needs to, possibly at a higher level
  16. Server side change notifications (rather than client polling)
  17. End user change notifications from server
  18. Transactions (ACID)


Questions

  1. How do we deal with 'secondary' items? Are they separate 'resources' on the server, or do they get embedded within the 'primary' items' resource? If the latter, do we live with the duplication of 'secondary' items that are associated with multiple 'primary' items? For example, the organizer and invitees of an Event item (primary) are Contact items (secondary). It's likely that a given Contact item will be associated with many Events, so do we devote separate server-side resources for each Contact? Answer: People agree that duplication is bad and we will have separate resources per item eventually. For phase one, we'll keep secondary items in the same resource as the primary item.
  2. How do we support sending only diffs, including attribute deletions? Asked another way, do the bodies that we send to and from the server contain "verbs" ("add" this new item, "delete" this item, "set" this attribute value, "remove" this attribute value), or does the mere presence (and disappearance) of a resource on the server indicate creation (and deletion) of items, and does the lack of an attribute value in the resource body indicate removal of the attribute?
  3. Do we want to include info about who made a change (and when)?
  4. How do we represent item deletions (tombstones)?
  5. Do we want to use XML?
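
To make question 2 concrete, here is a minimal sketch of the "verbs" alternative. It assumes an item can be treated as a flat dict of attribute names to values; the function names and the tuple layout are illustrative only and are not part of any proposal on this page.

```python
def diff_attributes(old, new):
    """Return the operations that turn attribute dict `old` into `new`.

    Each operation is ("set", name, value) or ("remove", name).
    """
    ops = []
    for name in sorted(new):
        # A "set" covers both newly added and changed attribute values.
        if name not in old or old[name] != new[name]:
            ops.append(("set", name, new[name]))
    for name in sorted(old):
        # An attribute that disappeared becomes an explicit "remove".
        if name not in new:
            ops.append(("remove", name))
    return ops


def apply_ops(attrs, ops):
    """Apply operations from diff_attributes to a copy of `attrs`."""
    result = dict(attrs)
    for op in ops:
        if op[0] == "set":
            result[op[1]] = op[2]
        elif op[0] == "remove":
            result.pop(op[1], None)
    return result
```

With explicit verbs, a body that omits an attribute says nothing about it; without verbs, omission would have to mean removal, which forces every body to carry the full item.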


Schema evolution and its relationship with sharing format

Regardless of what actual format we choose, the data it expresses will be of a given schema. That schema will have some unique identifier and version, and will define the namespaces and attribute names being used to serialize the data. Once data has been shared using a certain schema, any client sharing that data will continue to use that schema even though a newer schema may have been released and installed in the client. That way clients participating in this share who have not upgraded can still participate. At some point, when "enough" participating clients have upgraded to the latest schema, the shared data can be replaced using the new schema. Part of the publish/subscribe initialization will involve choosing which schema version to use.
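
One way the version choice could work is sketched below. It assumes each participant advertises the schema versions it understands as a set of integers, and it reads "enough" as "all participants"; neither assumption has been decided anywhere in this discussion, and the function name is made up for illustration.

```python
def newest_common_version(supported_by_client):
    """Return the newest schema version every client supports, or None.

    supported_by_client: a list with one set of version numbers per client.
    """
    if not supported_by_client:
        return None
    # Only versions understood by every participant are candidates.
    common = set.intersection(*[set(v) for v in supported_by_client])
    if not common:
        return None
    return max(common)
```

A share published at version 1 would move to version 2 only once this function returns 2 or higher for the current set of subscribers.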

A more advanced approach would be for the sharing server to serve up data in whatever schema the client asks for.


Format Choices

Proposal 1A: RDF "Triples"

Morgen Sagen: "The format could be as simple as a series of Subject-Predicate-Object statements (aka 'triples'), where the subjects are UUIDs, predicates are namespace-qualified attribute names, and objects are either literal values or UUIDs. See RDF Primer, RDF Concepts, RDF Triples and N-Triples"
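
As a rough illustration of what such statements might look like, the sketch below writes one item as N-Triples-style lines. Several details are assumptions made for the example: UUIDs are written as urn:uuid: URIs, predicates are formed by appending "#name" to a namespace URI, and the `Ref` wrapper used to mark references is hypothetical.

```python
from collections import namedtuple

# Hypothetical marker for a value that refers to another item by UUID.
Ref = namedtuple("Ref", "uuid")


def _escape(text):
    """Escape backslash, double quote, and line breaks for a quoted literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def item_to_triples(uuid, attributes):
    """Return one 'subject predicate object .' line per attribute.

    attributes maps a predicate URI to a literal value or a Ref.
    """
    subject = "<urn:uuid:%s>" % uuid
    lines = []
    for predicate, value in sorted(attributes.items()):
        if isinstance(value, Ref):
            obj = "<urn:uuid:%s>" % value.uuid          # reference to an item
        else:
            obj = '"%s"' % _escape(str(value))          # plain literal
        lines.append("%s <%s> %s ." % (subject, predicate, obj))
    return lines
```

Because every statement names its subject, items never nest: a Contact referenced by many Events is written once and pointed to by UUID.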

Proposal 1B: RDF-XML

Morgen Sagen: "Same idea as 1A except expressed in XML. See RDF/XML Spec"

Proposal 2: GData

Morgen Sagen: "Use Google's format. See GData"

pje: "It's interesting, but it doesn't have a uniform or elementary information model. Notice, for example, the embedded iCalendar data in gd:recurrence. I agree that being able to share in this format seems useful for interoperability purposes, but it doesn't appear to solve our other issues. (Note also the idiosyncratic overlap in semantics between gd:recurrence and gd:recurrenceException.)"

Proposal 3: ICS++

Lisa Dusseault: "I think it's feasible for Chandler to achieve its sharing of rich data even with CalDAV servers that aren't Cosmo. (Oracle's CalDAV server is shipping this year, and RPI's CalDAV server is already in use in a couple places, so this isn't only a theoretical benefit). There are a number of ways of approaching this:
  • Raw XML data could be put in an inline attachment in the iCalendar resource -- simple, but with the slight drawback of behavior in other clients likely to display the XML object as an attachment
  • Translate OSAF attributes into iCalendar extension attributes e.g. "X-OSAF-TRIAGE-STATUS=Now". This is not much more complicated than storing the same data in XML anyway, and has the advantage of ideal behavior with non-Chandler clients -- if they don't understand the iCalendar extension elements they ignore them, but if they do decide to adopt triage status or some other property we define, it's possible. Even stamping attributes can be stored this way.

iCalendar and CalDAV are more extensible than they're given credit for, and we haven't really had a problem solving session to overcome any potential barriers."

Morgen: "This works for calendar events, but that's not always going to be what we share. Too calendar-centric."
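
For reference, the second option in Lisa's list could look roughly like the sketch below, which writes extension attributes as X- property lines inside a VEVENT. It writes the attribute in iCalendar's NAME:value content-line form rather than the "=" shorthand used in the quote. The PRODID value and the function names are placeholders, and the sketch does not fold long lines, which a real serializer would need to do.

```python
def escape_text(value):
    """Escape an iCalendar TEXT value (backslash, ';', ',' and newline)."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def event_with_extensions(uid, dtstamp, dtstart, summary, extensions):
    """Return a VCALENDAR string with one VEVENT plus X- extension properties.

    extensions maps property names (which must start with "X-") to values.
    """
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//sharing format sketch//EN",  # placeholder
        "BEGIN:VEVENT",
        "UID:" + uid,
        "DTSTAMP:" + dtstamp,
        "DTSTART:" + dtstart,
        "SUMMARY:" + escape_text(summary),
    ]
    for name, value in sorted(extensions.items()):
        if not name.startswith("X-"):
            raise ValueError("extension property must start with X-: " + name)
        lines.append("%s:%s" % (name, escape_text(value)))
    lines += ["END:VEVENT", "END:VCALENDAR"]
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Clients that do not know X-OSAF-TRIAGE-STATUS skip the line, which is the interoperability benefit Lisa describes; Morgen's objection still applies to items that have no event to hang the properties on.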

Proposal 4: SSE

Interesting, but it's not practical for the server to subscribe to each client's feed(s). We'd likely have to have each syncing client 'push' their feed to the server occasionally.

Proposal 5A: "Nested" XML representation of External Information Model records

The External Information Model maps items into records with each record type uniquely identified by a namespace. We could XMLify those records using element names based on the record field names.

This form duplicates 'secondary' items, and it is unclear how to express an item that has a reference back to itself:

<item xmlns="http://schemas.osafoundation.org/sharingformat/1"
      xmlns:ci="http://schemas.osafoundation.org/pim/contentitem"
      xmlns:ta="http://schemas.osafoundation.org/pim/contentitem/tags"
      xmlns:ct="http://schemas.osafoundation.org/pim/contact" >

    <kind>Note</kind> <!-- ?? Suggestions on how to indicate kind(s) (plural because of stamping) -->

    <ci:uuid>1</ci:uuid>
    <ci:title>Example note</ci:title>
    <ci:body>Example body</ci:body>
    <ci:createdOn>2006-07-13 12:26:00-07:00</ci:createdOn>
    <ci:lastModifiedBy>
        <item>
            <kind>Contact</kind>
            <ci:uuid>4</ci:uuid>
            <ct:emailAddress>morgen@example.com</ct:emailAddress>
            <ct:firstName>Morgen</ct:firstName>
            <ct:lastName>Sagen</ct:lastName>
        </item>
    </ci:lastModifiedBy>
    <ta:tags>
        <item>
            <kind>Tag</kind>
            <ci:uuid>2</ci:uuid>
            <ci:title>Work</ci:title>
            <ci:createdOn>2006-07-13 12:28:00-07:00</ci:createdOn>
            <ci:lastModifiedBy>
                <item>
                    <kind>Contact</kind>
                    <ci:uuid>4</ci:uuid>
                    <ct:emailAddress>morgen@example.com</ct:emailAddress>
                    <ct:firstName>Morgen</ct:firstName>
                    <ct:lastName>Sagen</ct:lastName>
                </item>
            </ci:lastModifiedBy>
        </item>
        <item>
            <kind>Tag</kind>
            <ci:uuid>3</ci:uuid>
            <ci:title>Sharing</ci:title>
            <ci:createdOn>2006-07-13 12:29:00-07:00</ci:createdOn>
            <ci:lastModifiedBy>
                <item>
                    <kind>Contact</kind>
                    <ci:uuid>4</ci:uuid>
                    <ct:emailAddress>morgen@example.com</ct:emailAddress>
                    <ct:firstName>Morgen</ct:firstName>
                    <ct:lastName>Sagen</ct:lastName>
                </item>
            </ci:lastModifiedBy>
        </item>
    </ta:tags>
</item>
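
The duplication in the nested form can be measured with a short script. The sketch below uses Python's standard xml.etree.ElementTree and the namespace URIs from the example; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SF = "http://schemas.osafoundation.org/sharingformat/1"
CI = "http://schemas.osafoundation.org/pim/contentitem"


def count_item_copies(xml_text):
    """Count how many times each uuid is serialized in a nested document."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter() visits the root <item> as well as every nested <item>.
    for item in root.iter("{%s}item" % SF):
        uuid = item.find("{%s}uuid" % CI)   # direct child only
        if uuid is not None:
            counts[uuid.text] += 1
    return counts
```

In the example above the Contact with uuid 4 is written out three times: once for the note and once for each of its two tags.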

Proposal 5B: "Flattened" XML representation of External Information Model records

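As in Proposal 5A, the External Information Model maps items into records, with each record type uniquely identified by a namespace, and we XMLify those records using element names based on the record field names. In this variant every item appears exactly once at the top level, and other items point to it with <item ref="..."/>, so 'secondary' items are not duplicated and an item can refer to itself:
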
<?xml version="1.0" encoding="UTF-8"?>

<items xmlns="http://schemas.osafoundation.org/sharingformat/1"
    xmlns:con="http://schemas.osafoundation.org/pim/contentitem"
    xmlns:cal="http://schemas.osafoundation.org/pim/contentitem/calendar"
    xmlns:tag="http://schemas.osafoundation.org/pim/contentitem/tags"
    xmlns:pho="http://schemas.osafoundation.org/photo"
    xmlns:cta="http://schemas.osafoundation.org/pim/contact" >

    <item primary="True" uuid="1"> <!-- In this example I used simple uuid values for readability, but real uuids will be in RFC 4122 form, e.g. 611fcf54-296e-11db-b36c-bc8a258a92d5 -->

        <kinds>
            <con:Note/>
            <cal:Event/>
        </kinds>

        <con:title>Example note</con:title> <!-- string -->

        <con:body mimetype="text/plain" encoding="utf-8">VGhpbmdzIGdldCBkYW1hZ2VkLCB0aGluZ3MgZ2V0IGJyb2tlbg==</con:body> <!-- Lob -->

        <pho:photoBody mimetype="image/jpeg">/9j/4AAQSkZJRgABAgEASABIAAD/4QndRXhpZgAATU0AKgAAAAgABwESAAMAAA==</pho:photoBody> <!-- Lob -->

        <con:createdOn>2006-08-08 9:50:58.432510 US/Pacific</con:createdOn>

        <con:lastModifiedBy><item ref="4"/></con:lastModifiedBy>

        <tag:tags> <!-- ref collection -->
            <list>
                <value><item ref="2"/></value>
                <value><item ref="3"/></value>
            </list>
        </tag:tags>

        <con:foo>
            <list>
                <value>abc</value>
                <value>def</value>
            </list>
        </con:foo>

        <con:bar>
            <dict>
                <value key="red">abc</value>
                <value key="white">def</value>
                <value key="blue">ghi</value>
            </dict>
        </con:bar>

    </item>

    <item uuid="2">
        <kinds>
            <tag:Tag/>
        </kinds>
        <con:title>Work</con:title>
        <con:createdOn>2006-08-08 9:50:58.432510 US/Pacific</con:createdOn>
        <con:lastModifiedBy><item ref="4"/></con:lastModifiedBy>
    </item>

    <item uuid="3">
        <kinds>
            <tag:Tag/>
        </kinds>
        <con:title>Sharing</con:title>
        <con:createdOn>2006-08-08 9:50:58.432510 US/Pacific</con:createdOn>
        <con:lastModifiedBy><item ref="4"/></con:lastModifiedBy>
    </item>

    <item uuid="4">
        <kinds>
            <cta:Contact/>
        </kinds>
        <con:createdOn>2006-08-08 9:50:58.432510 US/Pacific</con:createdOn>
        <cta:emailAddress>morgen@example.com</cta:emailAddress>
        <cta:firstName>Morgen</cta:firstName>
        <cta:lastName>Sagen</cta:lastName>
    </item>

</items>
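
A reader of the flattened form has to index the top-level items by uuid and then follow the ref attributes. The sketch below shows one way to do that with the standard xml.etree.ElementTree module; the function names are illustrative, and the check for dangling references is an addition that the proposal itself does not call for.

```python
import xml.etree.ElementTree as ET

SF = "http://schemas.osafoundation.org/sharingformat/1"
ITEM = "{%s}item" % SF


def index_items(xml_text):
    """Map uuid -> top-level <item> element of an <items> document."""
    root = ET.fromstring(xml_text)
    return {item.get("uuid"): item for item in root.findall(ITEM)}


def referenced_uuids(item):
    """Return the uuids named by <item ref="..."/> inside `item`, in order."""
    return [ref.get("ref") for ref in item.iter(ITEM)
            if ref.get("ref") is not None]


def unresolved_refs(xml_text):
    """Return referenced uuids that have no top-level <item> in the document."""
    items = index_items(xml_text)
    missing = set()
    for item in items.values():
        for uuid in referenced_uuids(item):
            if uuid not in items:
                missing.add(uuid)
    return missing
```

Because a reference is just a uuid, an item that points back to itself needs no special handling, which answers the self-reference question raised under Proposal 5A.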


Protocol Choices

WebDAV

What we have implemented today: a Chandler item collection maps to a DAV collection on the server; items contained in the item collection are each represented on the server by a resource. To perform a sync, the client PROPFINDs the collection to get the list of ETags for comparison, downloading the changed resources; next, any locally changed items are published to the server. Merging and conflict resolution are done completely client-side. Currently there is no support for "diffs" -- entire resources are always transferred. 'Secondary' items -- those items being published that aren't members of the collection itself -- are included in the 'primary' items' resource bodies, which means there can be much duplication.
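
The ETag comparison step can be sketched as follows. This is not the Chandler implementation; it assumes the client keeps a dict of the ETags it saw at the last sync and a set of locally modified resources, and the function and key names are invented for the example.

```python
def plan_sync(cached_etags, server_etags, locally_changed):
    """Decide which resources to transfer in one sync pass.

    cached_etags:    {href: etag} remembered from the previous sync
    server_etags:    {href: etag} as returned by the PROPFIND
    locally_changed: hrefs of items modified locally since the previous sync
    """
    # New or changed on the server: the ETag differs from what we remember.
    download = sorted(href for href, etag in server_etags.items()
                      if cached_etags.get(href) != etag)
    # Present at the last sync but no longer listed by the server.
    removed_on_server = sorted(href for href in cached_etags
                               if href not in server_etags)
    # Changed on both sides: needs a client-side merge before publishing.
    merge = sorted(set(download) & set(locally_changed))
    # Everything changed locally is published after downloads and merges.
    upload = sorted(locally_changed)
    return {"download": download, "removed_on_server": removed_on_server,
            "merge": merge, "upload": upload}
```

Every entry in "download" and "upload" is a whole resource, which is the cost the "just send diffs" requirement is meant to remove.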

CalDAV

Basically the same as WebDAV, above. However, in order to support data types beyond calendar events, we create a 'subcollection' to contain XML resources, while the main calendar collection contains ICS resources. This is problematic since we represent a single item as two DAV resources yet we don't have atomic access to them.

Atom

I'm just starting to look into this -- hopefully others with more Atom experience can provide more details. I think you still have to transmit entire items, not just diffs (although bcm mentioned that we could extend it to allow sending of only the changed attributes). Cosmo is getting Atom support, see CosmoAtomProposal, CosmoFeedService, CosmoFeedRequirements, CosmoFeedDesign.

From bcm: "atom is simpler to implement. a client needs only to speak basic http to the server. atom publishing requires an xml parser as well, so that the client can inspect a collection's "introspection document" (much like a directory index). there is no notion of locking or resource properties with atom.

caldav requires locking and has explicit provisions for sophisticated reporting. with atom, one can formulate query strings for GET requests that express the same parameterization as caldav reports, but the names of query parameters and so forth are not standardized. gdata (a specialization of atom) standardizes a small subset of the caldav report options.

atom and webdav are not aware/don't care about the specific types of content they are pushing around (atom goes a step further than webdav though, in building in provisions for servers to advertise alternate representations for a resource). caldav and gdata can be viewed as content-type-aware specializations of the base protocols.

the strategy that i favor is to reuse the atom support we're building into cosmo for sharing within the ecosystem. cosmo would provide "sharing format" (whatever that turns out to be - i don't have a strong opinion) representations of collections and resources. we can define query parameters that, when sent to collections, return etags or diffs for resources changed since a certain timestamp or revision number. similarly, we can define parameters or media types that, when included with a POST to a collection or resource, specify that the request content contains diffs rather than full resources. some of the other sharing requirements - notifications, conflict resolution - we'd have to think about in more detail."
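
A client-side helper for the query parameters bcm describes might look like the sketch below. The parameter names "since" and "return" are placeholders, since no names have been defined, and the function name is likewise made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def changes_url(collection_url, since_revision, want="diffs"):
    """Build a GET URL asking a collection for changes since a revision.

    want is "etags" (only list what changed) or "diffs" (send the changes).
    """
    if want not in ("etags", "diffs"):
        raise ValueError("want must be 'etags' or 'diffs'")
    parts = urlsplit(collection_url)
    # Keep any query parameters already present on the collection URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("since", str(since_revision)), ("return", want)]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Asking for "etags" first lets a client decide what to fetch before requesting any bodies, which matches the "be able to tell what needs to be fetched" requirement.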

RSS


Links

-- MorgenSagen - 05 Jul 2006

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.