Proposal for Content Model Interoperation Experiments with non-native formats
Getting Started
This is meant to be a launching point for discussion about import/export from non-Chandler applications. Please comment on the
dev list or edit this page directly if you see something you'd like to change.
Note that this page isn't meant to cover Chandler to Chandler import/export.
Import and export
Chandler should be able to import and export data to a variety of standard formats. Because the
RepositoryFramework makes possible complicated relationships, often when users export to a standard format they'll lose some Chandler-specific data. At least until the world adopts Chandler's feature set! Any data that won't be accurately exported should be reported to the user.
Chandler users should be able to import most of their data from email and calendaring applications without any
loss of data. If there is any data lost in translation, Chandler should give detailed user feedback about what
data can't be successfully imported.
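One way to make the "report what was lost" requirement concrete is to have every export return a record of the attributes it could not represent. The sketch below is only illustrative: `ExportResult`, `export_item`, and the attribute names are hypothetical, and a plain dict stands in for a repository item.

```python
from dataclasses import dataclass, field


@dataclass
class ExportResult:
    """Exported data plus a record of what could not be represented."""
    data: dict
    dropped: list = field(default_factory=list)


def export_item(item, supported_fields):
    """Copy supported attributes; remember the rest for user feedback.

    `item` is a plain dict standing in for a repository item (an assumption).
    """
    result = ExportResult(data={})
    for name, value in item.items():
        if name in supported_fields:
            result.data[name] = value
        else:
            result.dropped.append(name)
    return result


contact = {"name": "Ada", "email": "ada@example.com", "relatedProjects": ["X"]}
result = export_item(contact, supported_fields={"name", "email"})
for name in result.dropped:
    print(f"Not exported: {name}")
```

The same shape would work for import, with `dropped` listing the source fields that had no place in the repository.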
Usage patterns
Import and export will be used to migrate data into and out of Chandler. It may also be used to share data with
non-Chandler users.
Generally, import will mean reading in standard data files and creating appropriate entries in the repository. Export will mean taking a collection of items in the repository and converting them to a standard data format. Are there usage patterns that might require significantly different functionality?
Data Types
Data that should be a priority to import/export:
- Stored email messages
- Contact lists and address books
- Calendar information
- Task Lists
Data that probably isn't a priority to import/export:
Are Notes a priority?
Major data formats
Mailbox formats
- dbx (also called .mbx) - Outlook Express' mailbox format. dbxconv is a GPLed utility to convert Outlook Express mailboxes to standard mbox
- mbox - The UNIX standard format, widely used on most platforms
- Maildir - A slightly less common UNIX standard mailbox format which stores each message as a separate file. Easy to convert mbox <-> Maildir
- PST - Outlook's storage format. ol2mbox is a GPLed project to convert PST and dbx files to mbox
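To illustrate how small the mbox to Maildir step can be, here is a sketch using the `mailbox` module from the Python standard library. The function name `mbox_to_maildir` is made up for this example, and nothing here handles the dbx or PST formats.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir directory.

    Returns the number of messages copied.
    """
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in source:
            target.add(message)
            count += 1
        target.flush()
    finally:
        source.close()
        target.close()
    return count
```

Real mailboxes exported by the applications listed above may not be clean mbox, so a conversion like this would still need testing against sample data.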
Contact formats
- vCard - industry standard format for contact data
- CSV - While many applications will import/export address books to CSV (comma separated values) files, there doesn't seem to be total standardization on how this should be done
- LDIF - LDAP Data Interchange Format. Ducky suggests this may have similar problems to CSV in terms of standardization
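Because CSV address books differ mainly in their column headers, one possible approach is a table of header aliases that maps each application's headers onto canonical field names. The aliases and the `read_contacts` function below are hypothetical; a real table would have to be built from actual exported files.

```python
import csv
import io

# Hypothetical header aliases; real exports would need a longer, tested table.
HEADER_ALIASES = {
    "name": "name", "display name": "name", "full name": "name",
    "e-mail address": "email", "email": "email", "primary email": "email",
}


def read_contacts(csv_text):
    """Parse a CSV address book, mapping known headers to canonical names.

    Returns (contacts, unknown_headers) so unmapped columns can be reported.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    unknown = [h for h in (reader.fieldnames or [])
               if h.strip().lower() not in HEADER_ALIASES]
    contacts = []
    for row in reader:
        contact = {}
        for header, value in row.items():
            if header is None:
                continue  # extra cells beyond the header row
            key = HEADER_ALIASES.get(header.strip().lower())
            if key and value:
                contact[key] = value
        contacts.append(contact)
    return contacts, unknown
```

Returning the unknown headers alongside the contacts gives the user interface something specific to report when a column is skipped.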
Calendar/Task list formats
- iCalendar - industry standard for calendar data
- vCalendar - old standard for calendar data, still used by some apps
- CSV - Outlook can import and export its calendar information fairly cleanly to CSV files
Compatibility charts
These charts are very preliminary, based on experience and a bit of web research.
Different applications often have subtle (or not so subtle) implementation differences, so a lot of testing will be needed. With that said, here are various popular applications and the formats they can import and export.
| Can Import | vCard | LDIF | CSV Address Book | | mbox | | iCalendar | CSV Calendar |
| Eudora | one at a time | yes | | | sort of | | | |
| Evolution | yes | yes | | | yes | | yes | |
| Mozilla | one at a time | yes | yes | | yes | | | |
| Mail.App/OS-X Address Book | yes | yes | | | yes | | | |
| Outlook | one at a time | | yes | | sort of | | one at a time | yes |
| Outlook Express | one at a time | yes | yes | | sort of | | | |
"one at a time" means that the application can import the file if it's an attachment to an email, which gets old fast with a big list of contacts.
"sort of" for Outlook, Outlook Express, and Eudora mbox import means that the application wants to import from a specific program, and there are reported difficulties getting import to actually work. For Eudora you can rename mbox folders to have a .mbx extension and Eudora will read them, but attachments will be left inline.
| Can Export | vCard | LDIF | CSV Address Book | | mbox | | iCalendar | CSV Calendar |
| Eudora | | | yes | | almost, not quite | | | |
| Evolution | yes | | | | yes | | yes | |
| Mozilla | | yes | yes | | yes | | | |
| Mail.App/OS-X Address Book | yes | yes | | | yes | | | |
| Outlook | ? | | yes | | | | a few at a time, only Calendar | yes |
| Outlook Express | no? | | yes | | | | | |
Eudora exports mbox "almost, not quite" because Eudora's native format is almost mbox, but not quite: attachments are removed and line feeds might not be right.
Eudora2UNIX is a GPLed utility to take Eudora folders the last step.
Outlook (Outlook 2000, at least) is "a few at a time" because, while it exports to iCalendar by selecting calendar information then selecting Actions > Forward as iCalendar, the only way to select multiple events is to control-click; control-A doesn't select all. So it seems you have to manually click each event.
It seems likely that import/export filters will need to be created for all of the above formats. We may also want
to directly import Eudora address book files, since the syntax is very simple.
Are there other formats we should target?
What formats do the major PDAs speak?
License issues
GPLed utilities for format conversion are out there. We might want to avoid reinventing the wheel and, if we run into problems with them, contribute our fixes back to the existing data conversion projects. But there might be licensing issues.
Getting mail import right
Doing import/export well in general is important, but getting mail import right is essential. If a user imports their mail and it doesn't work quite right, there's a good chance they'll give up on Chandler.
Potential problems with mail import:
- Some folks store messages they sent in the same folder as the messages they received on a particular topic. Getting Chandler to know about this might be tricky
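For the mixed-folder problem above, one heuristic (an assumption, not a settled design) is to compare each message's From header with the addresses the user is known to send from. `is_sent_by_user` is a hypothetical helper built on the standard `email` package.

```python
from email import message_from_string
from email.utils import parseaddr


def is_sent_by_user(raw_message, user_addresses):
    """Return True if the From header matches one of the user's addresses."""
    message = message_from_string(raw_message)
    _, address = parseaddr(message.get("From", ""))
    return address.lower() in {a.lower() for a in user_addresses}
```

This would misclassify mail sent from an address the user no longer remembers, so it could only be a first pass.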
What else?
Incompatible strings
Some Chandler data may contain strings that aren't allowed in a particular format. Export routines should have a well-defined way of converting incompatible strings to compatible strings. This may be challenging for formats that aren't Unicode-compliant.
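As one example of a "well-defined" conversion, a format limited to ASCII could replace every other character with a backslash escape, which Python's built-in `backslashreplace` error handler already does. The function name `to_ascii_field` is hypothetical, and whether escapes are the right policy for any given format is an open question.

```python
def to_ascii_field(text):
    """Convert text to pure ASCII; non-ASCII characters become backslash escapes.

    Returns (converted, changed) so callers can report which fields were altered.
    """
    converted = text.encode("ascii", errors="backslashreplace").decode("ascii")
    return converted, converted != text


print(to_ascii_field("Zo\u00eb"))
```

Note that text which already contains a literal backslash sequence becomes ambiguous after this conversion, so a reversible scheme would need to escape backslashes too.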
Testing
For any validly formatted data file foo, export(import(foo)) should equal foo. Unit tests should be created to make sure this is true for sample data. Creating data with this in mind should be part of the
SampleDataProject2004.
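The round-trip property can be written directly as a unit test. In this sketch `import_rows` and `export_rows` are stand-ins built on the standard `csv` module; real Chandler import/export routines would take their place.

```python
import csv
import io
import unittest


def import_rows(text):
    """Stand-in importer: parse CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


def export_rows(rows):
    """Stand-in exporter: write rows back out as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


class RoundTripTest(unittest.TestCase):
    SAMPLES = [
        "name,email\nAda,ada@example.com\n",
        'name,note\n"Lovelace, Ada","said ""hi"""\n',
    ]

    def test_export_of_import_is_identity(self):
        for sample in self.SAMPLES:
            with self.subTest(sample=sample):
                self.assertEqual(export_rows(import_rows(sample)), sample)
```

Running `python -m unittest` on a file containing this class executes the check. Exact equality is a strict standard: a valid file that quotes fields unnecessarily would fail it, so the sample data may need to be in a canonical form.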
Interoperation
Chandler should strive to follow published formats closely. Other applications, however, may output non-standard
files or behave in unexpected ways when importing standard files. Is interoperation with buggy applications a priority?
Proposed Priorities
- Implement a CSV address book format because it's relatively easy
- Get mbox import/export working reasonably well
- Implement vCard
- Implement LDIF
- Implement iCalendar
--
JeffreyHarris - 11 Feb to 13 Feb 2004
Comments Welcome
Goals
- Eventually it will be important for Chandler to have full-featured import/export facilities available to the end-user. But that's not something we need right away. For 1.0 it's probably okay to only have modest import/export features, and we can gradually add more as we go.
- What's important right now, in the 0.4/0.5 timeframe, is to put some work into getting a reality check about our content model. We need to validate our work on the content model, and make sure that we're not making any content model decisions that would "paint us into a corner". The content model doesn't have to be perfect to start, but we want to make sure that we're on a path towards a design that will enable Chandler to interoperate smoothly with other apps. For the 0.4/0.5 timeframe, the main goals for the import/export project should be to identify problem areas in the content model, and come up with questions and suggestions and open issues about changes or additions we might want to make to the content model in order to interoperate well with other apps.
- In the long run, there are several types of "interoperation" we will want the content model to support. Import and export are the most basic types of interoperation. We will also want to support one or more types of live sync feature, probably using SyncML, or something like it. Import and export will usually be infrequent operations -- for example, I might only import data once, when I first become a Chandler user. If the import process munges a few little things (like the last-modified date of a contact), those are things I can live with, or fix by hand. But in contrast, if I use the sync feature twice a day to keep Chandler in sync with my PDA, then I need that sync process to work quite smoothly, so it's important that we design the Chandler content model to be compatible with other apps.
File formats
- I like the easy-to-read tables about file formats. I assume CSV means "comma separated values", but I'm not sure what LDIF is. It might be good to have links to glossary entries for LDIF and CSV, and maybe vCard and iCalendar as well.
- In the tables, right now there are an initial six apps (interoperation platforms):
- Eudora
- Evolution
- Mozilla
- OS-X Address Book
- Outlook
- Outlook Express
- Eventually, we should also consider a number of other interoperation platforms:
- Palm & Palm desktop
- other PDAs and cell phones (PocketPC, RIM, etc.)
- Windows Longhorn
- We should prioritize those interoperation platforms based on some criteria, like market share or number of users. Our effort on the import/export work should be very much focused on the specific platforms that we've identified as being most important. Here's a page with pointers to previous OSAF work on this topic: ChandlerEcosystem20030808 -- also, Chao may have some sense of the market share of different platforms, and he may have some good intuition about which platforms to put the most effort into.
Interoperation
- Q: In the Interoperation section, it says "Chandler should strive to follow published formats closely. Other applications, however, may output non-standard files or behave in unexpected ways when importing standard files. Is interoperation with buggy applications a priority?"
- A: My two cents: I think that de facto interoperation is more important than following the published standards. There may be some vCard features that nobody uses, and that we can safely ignore. And there may be some "non-standard" Outlook fields that a million people store data in, and that we need to be able to interoperate with even though they don't adhere to a published standard.
- So, practically speaking, I think the best place to start is to collect a motley assortment of real-world data files exported from various apps -- and only look at the published standards later, if needed, to help understand the exported data files.
- At some point it might be useful to post something on the dev list, asking people what platforms they use now, and asking for small sample export files in various formats. Might be a good reality check to see all the different versions of stuff that people are actually using. And it might be good to get a real-world sampling of what sort of fields/features/formats people actually use.
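Once sample files have been collected, a rough tally of which properties actually appear could guide where to spend effort. The sketch below counts vCard property names; `count_vcard_properties` is a hypothetical helper and the parsing is deliberately simplified.

```python
from collections import Counter


def count_vcard_properties(vcard_texts):
    """Count how often each property name appears across sample vCards.

    Simplified: skips folded continuation lines and treats everything before
    the first ':' (minus parameters after ';') as the property name.
    """
    counts = Counter()
    for text in vcard_texts:
        for line in text.splitlines():
            if ":" not in line or line[:1] in (" ", "\t"):
                continue  # blank, malformed, or continuation line
            name = line.split(":", 1)[0].split(";", 1)[0].upper()
            if name not in ("BEGIN", "END"):
                counts[name] += 1
    return counts
```

Calling `count_vcard_properties(samples).most_common()` on a list of file contents would list the most-used properties first, including any non-standard `X-` properties.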
A couple of things:
- Before you go crazy on supporting data formats, make sure you expose a good API for interested parties.
Check with third-parties to see who's interested in writing specific import/export filters.
- Be sure to support Notes/textual items that didn't happen to arrive on the email transport. I.e., Notes should be a priority. At a technical level, it shouldn't be much different than supporting email bodies.
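As a rough idea of what a small filter API for third parties might look like, here is a sketch of a registry keyed by format name, with a plain-text Note importer as the example filter. All names (`register_importer`, `import_data`, `import_note`) and the item shape are hypothetical.

```python
_IMPORTERS = {}


def register_importer(format_name):
    """Decorator that registers a function as the importer for a format."""
    def decorator(func):
        _IMPORTERS[format_name.lower()] = func
        return func
    return decorator


def import_data(format_name, text):
    """Dispatch to the registered importer, or fail with a clear message."""
    try:
        importer = _IMPORTERS[format_name.lower()]
    except KeyError:
        raise ValueError(f"No importer registered for {format_name!r}") from None
    return importer(text)


@register_importer("note")
def import_note(text):
    """Example third-party filter: a plain-text note becomes one item."""
    return [{"kind": "Note", "body": text}]
```

A matching exporter registry would follow the same pattern.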
--
MicahDubinko - 16 Feb 2004
This is a top-of-the-head reaction that I'm throwing in for consideration; I'm not sure that I believe it completely myself:
There should be a text-based format (probably XML-based) that is capable of capturing all the data types and relationships in the repository (i.e., it should be possible to "dump" and "restore" a repository to/from this format without losing anything). Some arguments for this:
- True persistence: there's a good chance that information in a text-based format will be readable in places and times where Chandler doesn't (any longer) exist.
- Interlingua: it could provide a common source/target for translation from/to other formats, independent of Chandler code or versions; standard tools of the XSLT variety could be used to do the transformations.
- Versioning: it would provide a crude facility for keeping versions of a repository. It might also be possible to cobble up a "diff" tool that would make the differences between versions humanly readable.
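A toy version of such a dump/restore format, using `xml.etree.ElementTree` from the Python standard library, might look like the following. The element names and the dict-based item shape are invented for illustration and say nothing about the real repository schema.

```python
import xml.etree.ElementTree as ET


def dump_items(items):
    """Serialize items (dicts with 'id', 'attrs', 'refs') to an XML string."""
    root = ET.Element("repository")
    for item in items:
        node = ET.SubElement(root, "item", id=item["id"])
        # Sorted attribute order keeps the output stable for diffing.
        for name, value in sorted(item["attrs"].items()):
            ET.SubElement(node, "attr", name=name).text = value
        for target in item["refs"]:
            ET.SubElement(node, "ref", target=target)
    return ET.tostring(root, encoding="unicode")


def restore_items(xml_text):
    """Rebuild the item list from the XML produced by dump_items."""
    items = []
    for node in ET.fromstring(xml_text).findall("item"):
        items.append({
            "id": node.get("id"),
            "attrs": {a.get("name"): a.text or "" for a in node.findall("attr")},
            "refs": [r.get("target") for r in node.findall("ref")],
        })
    return items
```

Relationships are stored as references to item ids rather than nested elements, which is one way to keep arbitrary item graphs representable in a tree-shaped format.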
--
DonDwiggins - 17 Feb 2004
- Looks like a great start
- I'd emphasize Brian's point that the first priority is to get data as a reality check for the content model
- We probably want to deprioritize mail right now, for this project. We're going to hire a full-time engineer to own mail in general, including the content model and what libraries to use. We still have enough open issues that it's not yet a good use of time to get started on the import/export feature for mail.
--
KatieCappsParlante - 17 Feb 2004
- This is a good list of issues.
- I agree with use of XML and/or XSLT. XML export provides the opportunity for many to use their own transforms if they want to.
- While I really see a need for Palm support, I do not think it is worthwhile to try to export to vCalendar format directly, for two reasons: first, you can go to an XML representation that can later be transformed to vCalendar OR iCalendar if necessary, and second, it's time that Palm Desktop imported iCalendar anyway (I'm only stating here what I've already communicated to PalmSource, so this is not a "troll").
- Sort of import-related: it's critical (to me anyway) to have support for opening iCalendar objects found on web pages via href which means there's a handler somewhere for opening such an item. This I feel will be an important use case for people in the future - they find an event online, say perhaps a concert by a favorite performer, or a festival happening that they want to attend, and they want to add it to their own scheduler.
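Whatever hands the downloaded file to Chandler, the first step would be pulling the events out of the iCalendar text. The following is a minimal sketch of that step only; `parse_events` is a hypothetical name, and a real handler would need a full iCalendar parser.

```python
def parse_events(ics_text):
    """Return a list of dicts, one per VEVENT, keyed by property name.

    Simplified: unfolds continuation lines, drops property parameters,
    keeps values as raw strings, and does not handle nested components
    such as VALARM.
    """
    # Unfold: a line starting with space or tab continues the previous line.
    lines = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    events, current = [], None
    for line in lines:
        name, _, value = line.partition(":")
        name = name.split(";", 1)[0].upper()
        if name == "BEGIN" and value.upper() == "VEVENT":
            current = {}
        elif name == "END" and value.upper() == "VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None and name:
            current[name] = value
    return events
```

Each returned dict could then be mapped onto a calendar item, with any properties that have no home reported back to the user.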
--
TimHare - 04 Dec 2004