Notes from the meeting
The purpose of the meeting was to see whether we had reached agreement on the approach outlined in the query proposal <http://wiki.osafoundation.org/twiki/bin/view/Chandler/ChandlerQuerySystem> and, if so, to establish next steps for the query project.
John had some concerns about the performance of queries. He gave the example of a query over 10 years' worth of e-mail.
At this point Mitch stopped in and clarified some expectations. A key observation was the notion of a person's working set. Queries over items in the working set need to be efficient. Queries over items outside the working set (archival items) could take longer. Mitch said that it was acceptable for the user to do some work to tell Chandler which items were working vs. archival. He wanted to be able to query efficiently (response in 45 seconds to 1 minute) in working sets containing low millions of items, where each item also had text attached.
John raised a question about the efficiency of using reference collections. Ted pointed out that reference collections are implemented using Berkeley DB B-trees, which provide logarithmic, O(log n), access to the data that they index. For the case of e-mail, the contacts in the From:, To:, Cc:, etc. fields will be members of reference collections, so searches should be efficient. Searching the text of the messages would be handled via Lucene. Andi described a bottom-up query evaluation plan for the query "all messages from Freda with the phrase 'property manager' in them". In this plan, Lucene would be used to determine the UUIDs of items whose message text contained a particular text string; these UUIDs can then be tested for membership in Freda's From: reference collection. There was also a concern about the amount of space that would be consumed by using reference collections to accomplish indexing tasks. John was interested in having some way to graphically display the breakdown of space usage in a repository. The big takeaway was that we need to measure and keep an eye on both time and space performance for queries.
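The bottom-up plan Andi described can be sketched roughly as follows. This is only an illustration under stated assumptions, not Chandler repository code: `TextIndex` is a hypothetical toy inverted index standing in for Lucene, a plain Python set stands in for Freda's From: reference collection (the real collection would be B-tree backed), and `messages_from_with_text` is a made-up helper name.

```python
from uuid import uuid4


class TextIndex:
    """Toy inverted index; a hypothetical stand-in for a full-text engine such as Lucene."""

    def __init__(self):
        self._postings = {}  # term -> set of item UUIDs

    def add(self, item_id, text):
        for term in text.lower().split():
            self._postings.setdefault(term, set()).add(item_id)

    def search(self, phrase):
        """Return UUIDs of items containing every word of the phrase.

        Simplification: word adjacency is not checked.
        """
        terms = phrase.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self._postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result


def messages_from_with_text(text_index, from_collection, phrase):
    """Bottom-up evaluation: text search first, then a membership test per candidate."""
    candidates = text_index.search(phrase)
    return {item_id for item_id in candidates if item_id in from_collection}


# Three hypothetical messages: two from Freda, one from someone else.
m1, m2, m3 = uuid4(), uuid4(), uuid4()
index = TextIndex()
index.add(m1, "the property manager called today")
index.add(m2, "lunch tomorrow")
index.add(m3, "property manager meeting moved")
freda_from = {m1, m2}  # stand-in for Freda's From: reference collection

result = messages_from_with_text(index, freda_from, "property manager")
print(result == {m1})
```

The cost of the second step is one membership test per candidate returned by the text search, so the plan works best when the text search is selective. Measuring both steps separately would fit the agreed need to track time and space performance.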
In general people agreed that the proposal captured the direction that we want to move in.
Tasks noted at the end of the meeting
- John will write his ideal code for using a query in a List view, and he and Ted will work to implement as much of that as possible in the repository design.
- There is a need to go through the content model and address some of the issues raised by the query examples. Ted will work with Katie to make sure that this happens.
- 25 Feb 2004