r1 - 17 Mar 2004 - 15:37:03 - ChaoLam

Notes from the Design Group Meeting on 17 Feb 2004

  • MitchKapor says:
    • points out that even a tiny amount of the right sample data would be extremely useful -- important that the sample data capture examples of rich interconnections (e.g. stamped item)

  • MimiYin says:
    • stresses that it's important that the data be realistic -- it should be based on realistic user profiles and realistic use cases
    • it would be good to have a couple of user profiles -- maybe one set of data for a home user and one set of data for a business user
    • would find the data most useful if it were made available in a spreadsheet format
    • needs to have enough data so that different views can be fully populated, including views that show filtered or collapsed subsets of the sample data -- maybe 50 to 100 mail messages, and comparable amounts of other kinds of items?

  • Next Actions:
    1. Mitch to harvest real data from Mitch's PIM and create a small starter set of quality sample data. Needs to be scrubbed of any personal info so that it can be used publicly in examples.
    2. Mimi may add additional realistic sample data so that we have a good amount of sample data.
    3. Brian to take combined data sets and hand them off to Jeffrey.
    4. Jeffrey to map sample data to 0.4 content model, and convert data into parcel.xml format. Jeffrey to write transform to make parcel.xml data also available in some simple spreadsheet format (e.g. CSV format).
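
The transform in action 4 might look roughly like the sketch below. It is only an illustration: the XML layout (a `sampleData` root, `item` elements with a `kind` attribute, and child elements as fields) is a hypothetical stand-in, not the actual parcel.xml schema, which these notes do not describe. The function name `items_to_csv` is likewise made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for sample data.
# This is NOT the real parcel.xml schema.
SAMPLE_XML = """\
<sampleData>
  <item kind="MailMessage">
    <subject>Lunch on Friday?</subject>
    <from>Pat Example</from>
  </item>
  <item kind="CalendarEvent">
    <subject>Design group meeting</subject>
    <location>Conference room</location>
  </item>
</sampleData>
"""


def items_to_csv(xml_text):
    """Flatten each <item> into one CSV row.

    Columns are "kind" plus the union of all field names, in the
    order they are first seen. Missing fields become empty cells.
    """
    root = ET.fromstring(xml_text)
    rows = []
    columns = ["kind"]
    for item in root.iter("item"):
        row = {"kind": item.get("kind", "")}
        for field in item:
            row[field.tag] = (field.text or "").strip()
            if field.tag not in columns:
                columns.append(field.tag)
        rows.append(row)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(items_to_csv(SAMPLE_XML), end="")
```

One thing a flat format like this does not handle naturally is links between items (the "rich interconnections" Mitch mentions), so a real transform would probably need some convention for expressing references between rows.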

Open Questions (and answers from Ted via email)

  • Can we have a single, unified set of sample data, which meets our needs for design work, content model validation, and repository testing (repository unit tests, not performance tests)? Or would it be better to have different sample data for different purposes?
    • "I think that having a reference set of sample data that works for design, content model, and repository purposes is a good idea. I also think that we will have additional data sets for other purposes, (e.g. my 17,000 RSS feeds of data for stress testing the repository)." -- TedLeung - 9 Feb 2004

  • How much sample data do we want? A few dozen items? A few hundred? A few thousand?
    • "I think that we probably want a few dozen items of each kind for this purpose. If you want a standard set of test data, we probably need more than this, a few hundred items each." -- TedLeung - 9 Feb 2004

  • Can this be made-up data, or would it be better to collect real data from a PIM program that one of us uses? Should we put some effort into making sure that the sample data reflects the use cases we care most about?
    • "I think that things like people's names and addresses can be made up, but the relationships between the items should be as close to realistic as possible." -- TedLeung - 9 Feb 2004

  • What format do we want the sample data in? Parcel.xml files? Python unit test code? Excel spreadsheets? Interconnected wiki pages?
    • "If you want live data that can be shown in Chandler/prototypes, then we need at least parcel.xml files. Unit test code should access the data via the repository API anyway. Encoding lots of data in test files leads to problems. It would be great to generate the various formats (xml, XLS, wiki pages, etc) from a single source, in order to reduce the number of typing bugs." -- TedLeung - 9 Feb 2004

  • If we use parcel.xml files as the standard source format for sample data, does that format meet everybody's needs, or should we have transforms or export tools that can convert it into another format?

  • Should we just dive in, and learn as we go, or should we do a little planning first to identify "requirements" and figure out who wants to be involved?
    • "I think that generating a small subset of the total data set accompanied by feedback about what's easy/hard/whatever would be a good start. I've already noted some issues with modeling in the query proposal" ChandlerQuerySystem -- TedLeung - 9 Feb 2004

-- ChaoLam - 17 Mar 2004

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.