r15 - 07 Feb 2006 - 17:52:49 - LisaDusseault

Proposal for Content Model Interoperation Experiments with non-native formats

Getting Started

This is meant to be a launching point for discussion about import/export from non-Chandler applications. Please comment on the dev list or edit this page directly if you see something you'd like to change.

Note that this page isn't meant to cover Chandler to Chandler import/export.

Import and export

Chandler should be able to import and export data to a variety of standard formats. Because the RepositoryFramework makes possible complicated relationships, often when users export to a standard format they'll lose some Chandler-specific data. At least until the world adopts Chandler's feature set! Any data that won't be accurately exported should be reported to the user.

Chandler users should be able to import most of their data from email and calendaring applications without any loss of data. If there is any data lost in translation, Chandler should give detailed user feedback about what data can't be successfully imported.
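One way the "report what was lost" requirement could look in code is sketched below. It is only an illustration: ConversionReport, export_item, and the SUPPORTED_BY_TARGET set are hypothetical names, not part of Chandler or the RepositoryFramework, and the items are plain dicts rather than repository items.

```python
from dataclasses import dataclass, field


@dataclass
class ConversionReport:
    """Collects every attribute that could not be carried across."""
    dropped: list = field(default_factory=list)

    def note_loss(self, item_id, attribute):
        self.dropped.append((item_id, attribute))


# Hypothetical: the attributes some target format is able to represent.
SUPPORTED_BY_TARGET = {"name", "email", "phone"}


def export_item(item_id, attributes, report):
    """Return only the attributes the target format supports,
    recording each dropped attribute in the report."""
    exported = {}
    for key, value in attributes.items():
        if key in SUPPORTED_BY_TARGET:
            exported[key] = value
        else:
            report.note_loss(item_id, key)
    return exported


report = ConversionReport()
contact = {"name": "Pat Example", "email": "pat@example.com",
           "related_items": ["item-17"]}
exported = export_item("contact-1", contact, report)

# The report is what would be shown to the user after the export.
for item_id, attribute in report.dropped:
    print(f"{item_id}: '{attribute}' could not be exported")
```

The same report object could be passed to an import routine, so both directions give the user one list of what did not survive the translation.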

Usage patterns

Import and export will be used to migrate data into and out of Chandler. It may also be used to share data with non-Chandler users.

Generally, import will mean reading in standard data files and creating appropriate entries in the repository. Export will mean taking a collection of items in the repository and converting them to a standard data format. Are there usage patterns that might require significantly different functionality?

Data Types

Data that should be a priority to import/export:

  • Stored email messages
  • Contact lists and address books
  • Calendar information
  • Task Lists

Data that probably isn't a priority to import/export:

  • Account settings

Are Notes a priority?

Major data formats

Mailbox formats

  • dbx (also called .mbx) - Outlook Express' mailbox format. dbxconv is a GPLed utility to convert Outlook Express mailboxes to standard mbox
  • mbox - The UNIX standard format, widely used on most platforms
  • Maildir - A slightly less common UNIX standard mailbox format which stores each message as a separate file. Easy to convert mbox <-> Maildir
  • PST - Outlook's storage format. ol2mbox is a GPLed project to convert PST and dbx files to mbox
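As a rough illustration of how easy the mbox <-> Maildir conversion is, here is a sketch using the mailbox module from the current Python standard library. The function name mbox_to_maildir is made up for this example, and it covers only the mbox to Maildir direction.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir directory.

    Returns the number of messages copied.
    """
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in source:
            # MaildirMessage converts mbox status flags where it can.
            target.add(mailbox.MaildirMessage(message))
            count += 1
        target.flush()
    finally:
        source.close()
        target.close()
    return count
```

The reverse direction would follow the same pattern with mailbox.mboxMessage. dbx and PST have no standard-library support, which is where converters such as the ones named above would come in.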

Contact formats

  • vCard - industry standard format for contact data
  • CSV - While many applications will import/export address books to CSV (comma separated values) files, there doesn't seem to be total standardization on how this should be done
  • LDIF - LDAP Data Interchange Format. Ducky suggests this may have similar problems to CSV in terms of standardization
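Because column names in CSV address books differ from one application to the next, an importer would probably need some kind of header mapping. The sketch below shows one possible shape. The alias table is invented for illustration and has not been checked against real exports; canonical_field and read_contacts are hypothetical names.

```python
import csv
import io

# Hypothetical aliases; real exports would need to be surveyed first.
HEADER_ALIASES = {
    "name": {"name", "full name", "display name"},
    "email": {"email", "e-mail address", "primary email"},
    "phone": {"phone", "business phone", "work phone"},
}


def canonical_field(header):
    """Map one CSV column header to a canonical field name, or None."""
    cleaned = header.strip().lower()
    for field_name, aliases in HEADER_ALIASES.items():
        if cleaned in aliases:
            return field_name
    return None


def read_contacts(csv_text):
    """Parse CSV text into contact dicts plus the list of unmapped columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    unmapped = [h for h in headers if canonical_field(h) is None]
    contacts = []
    for row in reader:
        contact = {}
        for header, value in row.items():
            if header is None:
                continue  # extra cells beyond the header row
            field_name = canonical_field(header)
            if field_name and value:
                contact[field_name] = value
        contacts.append(contact)
    return contacts, unmapped


sample = ("Display Name,E-mail Address,Nickname\n"
          "Pat Example,pat@example.com,Patty\n")
contacts, unmapped = read_contacts(sample)
```

Returning the unmapped columns alongside the contacts gives the caller what it needs to tell the user which columns were ignored.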

Calendar/Task list formats

  • iCalendar - industry standard for calendar data
  • vCalendar - old standard for calendar data, still used by some apps
  • CSV - Outlook can import and export its calendar information fairly cleanly to CSV files

Compatibility charts

These charts are very preliminary, based on experience and a bit of web research.

Different applications often have subtle (or not so subtle) implementation differences, so a lot of testing will be needed. With that said, here are various popular applications and the formats they can import and export.

| Can Import                 | vCard         | LDIF | CSV Address Book | mbox    | iCalendar     | CSV Calendar |
| Eudora                     | one at a time | yes  |                  | sort of |               |              |
| Evolution                  | yes           | yes  |                  | yes     | yes           |              |
| Mozilla                    | one at a time | yes  | yes              | yes     |               |              |
| Mail.App/OS-X Address Book | yes           | yes  |                  | yes     |               |              |
| Outlook                    | one at a time |      | yes              | sort of | one at a time | yes          |
| Outlook Express            | one at a time | yes  | yes              | sort of |               |              |

"One at a time" means that the application can import the file if it's an attachment to an email, which gets old fast with a big list of contacts.

"Sort of" for Outlook, Outlook Express, and Eudora mbox import means that the application wants to import from a specific program, and there are reported difficulties getting import to actually work. For Eudora, you can rename mbox folders to have a .mbx extension and Eudora will read them, but attachments will be left inline.

| Can Export                 | vCard | LDIF | CSV Address Book | mbox              | iCalendar                      | CSV Calendar |
| Eudora                     |       |      | yes              | almost, not quite |                                |              |
| Evolution                  | yes   |      |                  | yes               | yes                            |              |
| Mozilla                    |       | yes  | yes              | yes               |                                |              |
| Mail.App/OS-X Address Book | yes   | yes  |                  | yes               |                                |              |
| Outlook                    | ?     |      | yes              |                   | a few at a time, only Calendar | yes          |
| Outlook Express            | no?   |      | yes              |                   |                                |              |

"Almost, not quite" for Eudora mbox export means that Eudora's native format is almost mbox, but attachments are removed and line feeds might not be right. Eudora2UNIX is a GPLed utility to take Eudora folders the last step.

"A few at a time" for Outlook (Outlook 2000, at least) means that, while it exports to iCalendar by selecting calendar information and then selecting Actions > Forward as iCalendar, the only way to select multiple events is to control-click; control-A doesn't select all. So it seems you have to manually click each event.

It seems likely that import/export filters will need to be created for all of the above formats. We may also want to directly import Eudora address book files, since the syntax is very simple.

Are there other formats we should target?

What formats do the major PDAs speak?

License issues

GPLed utilities for format conversion are out there. We might prefer not to reinvent the wheel and, where we run into problems, contribute our fixes back to the existing data conversion projects. But there might be licensing issues.

Getting mail import right

Doing import/export well in general is important, but getting mail import right is essential. If a user imports their mail and it doesn't work quite right, there's a good chance they'll give up on Chandler.

Potential problems with mail import:

  • Some folks store messages they sent in the same folder as the messages they received on a particular topic. Getting Chandler to know about this might be tricky

What else?

Incompatible strings

Some Chandler data may contain strings that aren't allowed in a particular format. Export routines should have a well-defined way of converting incompatible strings to compatible strings. This may be challenging for formats that aren't Unicode-compliant.
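To make "well-defined" concrete, here is one possible rule for an ASCII-only target, sketched with Python's built-in unicode_escape codec. The function names are hypothetical, and a real format may dictate its own escaping rules instead. The point of this rule is that it is deterministic and reversible.

```python
def to_ascii_safe(text):
    """Convert any string to pure ASCII using backslash escapes.

    Backslashes, control characters, and non-ASCII characters are all
    escaped, so the result can be turned back into the original text.
    """
    return text.encode("unicode_escape").decode("ascii")


def from_ascii_safe(escaped):
    """Reverse to_ascii_safe."""
    return escaped.encode("ascii").decode("unicode_escape")


original = "caf\u00e9 \u2603 C:\\temp"
safe = to_ascii_safe(original)
restored = from_ascii_safe(safe)
```

A lossy rule (for example, replacing every unsupported character with "?") would also be well-defined, but then the export routine should report the substitution to the user.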

Testing

For any validly formatted data file foo, export(import(foo)) should equal foo. Unit tests should be created to make sure this is true for sample data. Creating data with this in mind should be part of the SampleDataProject2004.
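The round-trip property could be checked with a unit test along these lines. The toy "KEY:value" format and the import_records/export_records functions are stand-ins invented for this sketch; a real test would call the actual filters on the sample data files.

```python
import unittest


def import_records(text):
    """Parse 'KEY:value' lines into a list of (key, value) pairs."""
    records = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        records.append((key, value))
    return records


def export_records(records):
    """Serialize (key, value) pairs back to 'KEY:value' lines."""
    return "".join(f"{key}:{value}\n" for key, value in records)


class RoundTripTest(unittest.TestCase):
    # Stand-in sample data; real tests would load files from disk.
    SAMPLES = ["FN:Pat Example\nEMAIL:pat@example.com\n"]

    def test_export_of_import_is_identity(self):
        for sample in self.SAMPLES:
            self.assertEqual(export_records(import_records(sample)), sample)
```

This can be run with python -m unittest. Note that an exact match is only realistic for files that are already in the form the exporter writes; details such as line folding or property order may differ between equally valid files, so some tests may need to compare parsed data rather than raw text.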

Interoperation

Chandler should strive to follow published formats closely. Other applications, however, may output non-standard files or behave in unexpected ways when importing standard files. Is interoperation with buggy applications a priority?

Proposed Priorities

  1. Implement a CSV address book format because it's relatively easy
  2. Get mbox import/export working reasonably well
  3. Implement vCard
  4. Implement LDIF
  5. Implement iCalendar

-- JeffreyHarris - 11 Feb to 13 Feb 2004



Comments Welcome


Goals

  • Eventually it will be important for Chandler to have full-featured import/export facilities available to the end-user. But that's not something we need right away. For 1.0 it's probably okay to only have modest import/export features, and we can gradually add more as we go.
  • What's important right now, in the 0.4/0.5 timeframe, is to put some work into getting a reality check about our content model. We need to validate our work on the content model, and make sure that we're not making any content model decisions that would "paint us into a corner". The content model doesn't have to be perfect to start, but we want to make sure that we're on a path towards a design that will enable Chandler to interoperate smoothly with other apps. For the 0.4/0.5 timeframe, the main goals for the import/export project should be to identify problem areas in the content model, and come up with questions and suggestions and open issues about changes or additions we might want to make to the content model in order to interoperate well with other apps.
  • In the long run, there are several types of "interoperation" we will want the content model to support. Import and export are the most basic types of interoperation. We will also want to support one or more types of live sync feature, probably using SyncML, or something like it. Import and export will usually be infrequent operations -- for example, I might only import data once, when I first become a Chandler user. If the import process munges a few little things (like the last-modified date of a contact), those are things I can live with, or fix by hand. But in contrast, if I use the sync feature twice a day to keep Chandler in sync with my PDA, then I need that sync process to work quite smoothly, so it's important that we design the Chandler content model to be compatible with other apps.

File formats

  • I like the easy-to-read tables about file formats. I assume CSV means "comma separated values", but I'm not sure what LDIF is. It might be good to have links to glossary entries for LDIF and CSV, and maybe vCard and iCalendar as well.

  • In the tables, right now there are an initial six apps (interoperation platforms):
    • Eudora
    • Evolution
    • Mozilla
    • OS-X Address Book
    • Outlook
    • Outlook Express
  • Eventually, we should also consider a number of other interoperation platforms:
    • Palm & Palm desktop
    • other PDAs and cell phones (PocketPC, RIM, etc.)
    • Windows Longhorn
  • We should prioritize those interoperation platforms based on some criteria, like market share or number of users. Our effort on the import/export work should be very much focused on the specific platforms that we've identified as being most important. Here's a page with pointers to previous OSAF work on this topic: ChandlerEcosystem20030808 -- also, Chao may have some sense of the market share of different platforms, and he may have some good intuition about which platforms to put the most effort into.

Interoperation

  • Q: In the Interoperation section, it says "Chandler should strive to follow published formats closely. Other applications, however, may output non-standard files or behave in unexpected ways when importing standard files. Is interoperation with buggy applications a priority?"
  • A: My two cents: I think that de facto interoperation is more important than following the published standards. There may be some vCard features that nobody uses, and that we can safely ignore. And there may be some "non-standard" Outlook fields that a million people store data in, and that we need to be able to interoperate with even though they don't adhere to a published standard.
  • So, practically speaking, I think the best place to start is to collect a motley assortment of real-world data files exported from various apps -- and only look at the published standards later, if needed, to help understand the exported data files.
  • At some point it might be useful to post something on the dev list, asking people what platforms they use now, and asking for small sample export files in various formats. Might be a good reality check to see all the different versions of stuff that people are actually using. And it might be good to get a real-world sampling of what sort of fields/features/formats people actually use.


A couple of things:

  • Before you go crazy on supporting data formats, make sure you expose a good API for interested parties. :-) Check with third parties to see who's interested in writing specific import/export filters.
  • Be sure to support Notes/textual items that didn't happen to arrive on the email transport. I.e. Notes should be a priority. At a technical level, it shouldn't be much different than supporting email bodies.

-- MicahDubinko - 16 Feb 2004


This is a top-of-the-head reaction that I'm throwing in for consideration; I'm not sure that I believe it completely myself:

There should be a text-based format (probably XML-based) that is capable of capturing all the data types and relationships in the repository (i.e., it should be possible to "dump" and "restore" a repository to/from this format without losing anything). Some arguments for this:

  • True persistence: there's a good chance that information in a text-based format will be readable in places and times where Chandler doesn't (any longer) exist.
  • Interlingua: it could provide a common source/target for translation from/to other formats, independent of Chandler code or versions; standard tools of the XSLT variety could be used to do the transformations.
  • Versioning: it would provide a crude facility for keeping versions of a repository. It might also be possible to cobble up a "diff" tool that would make the differences between versions humanly readable.
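For the sake of discussion, a dump/restore pair for such a format might look roughly like the sketch below, using xml.etree.ElementTree from the Python standard library. The element names (repository, item, attr, ref) and the dict layout are invented for this example and say nothing about how the real repository stores its data.

```python
import xml.etree.ElementTree as ET


def dump_items(items):
    """Serialize {item_id: {"attrs": {...}, "refs": [...]}} to XML text."""
    root = ET.Element("repository")
    for item_id, item in items.items():
        node = ET.SubElement(root, "item", id=item_id)
        for name, value in item["attrs"].items():
            ET.SubElement(node, "attr", name=name).text = value
        for target in item["refs"]:
            # Relationships are stored as references to other item ids.
            ET.SubElement(node, "ref", target=target)
    return ET.tostring(root, encoding="unicode")


def restore_items(xml_text):
    """Rebuild the dict structure produced for dump_items from XML text."""
    items = {}
    for node in ET.fromstring(xml_text).findall("item"):
        items[node.get("id")] = {
            "attrs": {a.get("name"): a.text or ""
                      for a in node.findall("attr")},
            "refs": [r.get("target") for r in node.findall("ref")],
        }
    return items


items = {
    "contact-1": {"attrs": {"name": "Pat Example"}, "refs": ["event-7"]},
    "event-7": {"attrs": {"title": "Lunch"}, "refs": []},
}
xml_text = dump_items(items)
restored = restore_items(xml_text)
```

Because the output is plain text, two dumps of the same repository could be compared with an ordinary diff tool, which is the crude versioning described above.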

-- DonDwiggins - 17 Feb 2004


  • Looks like a great start
  • I'd emphasize Brian's point that the first priority is to get data as a reality check for the content model
  • We probably want to deprioritize mail right now, for this project. We're going to hire a full time engineer to own mail in general, including the content model and what libraries to use. We still have enough open issues that it's not yet a good use of time to get started on the import/export feature for mail.

-- KatieCappsParlante - 17 Feb 2004


  • This is a good list of issues.
  • I agree with use of XML and/or XSLT. XML export provides the opportunity for many to use their own transforms if they want to.
  • While I really see a need for Palm support, I do not think it is worthwhile to try to export to vCalendar format directly, for two reasons: first, you can go to an XML representation that can later be transformed to vCalendar OR iCalendar if necessary, and secondly it's time that Palm Desktop imported iCalendar anyway (I'm only stating here what I've already communicated to PalmSource, so this is not a "troll").
  • Sort of import-related: it's critical (to me anyway) to have support for opening iCalendar objects found on web pages via href which means there's a handler somewhere for opening such an item. This I feel will be an important use case for people in the future - they find an event online, say perhaps a concert by a favorite performer, or a festival happening that they want to attend, and they want to add it to their own scheduler.

-- TimHare - 04 Dec 2004

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.