SelfSchematizingData20030130 (r3 - 07 Jul 2005 - 14:24:50 - LisaDusseault)
I was quite concerned to see "Schema Evolution" on Mitch Kapor's slides, presented at Stanford EE380 on Wednesday, January 29, 2003.

Schemas are a pain.

Self Schematizing

Self-schematizing data structures are pretty damned useful. I believe I showed this pretty well with PerlSQL. There's a Polish guy (TBD: get ref) who has been publishing a sequence of papers on formal models of self-schematization.

XML makes self-schematization much more doable. You can write generic queries over almost free-form fields, such as these variants of a name field:

   <name>Andy Glew</name>
   <name>
        <first-name>Andy</first-name>
        <last-name>Glew</last-name>
   </name>
   <personal-name>Andy Glew</personal-name>
   <nom>Andre Colle</nom>
  

Sure, you may not know all of the possible field names that might have been used. Allowing patterns (such as wildcards) in the field names of a query gets you much of the way there:

    SELECT * FROM Addresses AS a
      WHERE a.*name* = 'Glew'
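
The wildcard syntax above is hypothetical, but the idea can be sketched in ordinary code. The following is a minimal Python sketch, not a definitive implementation: the `<addresses>`/`<card>` wrapper elements and the `select` helper are assumptions made for illustration, and the match is a substring test rather than strict equality so that "Glew" is found inside "Andy Glew".

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical address cards, each using a different convention for the name.
RECORDS = """
<addresses>
  <card><name>Andy Glew</name></card>
  <card><name><first-name>Andy</first-name><last-name>Glew</last-name></name></card>
  <card><personal-name>Andy Glew</personal-name></card>
  <card><nom>Andre Colle</nom></card>
</addresses>
"""

def select(root, field_pattern, value):
    """Return the cards in which some element whose tag matches
    field_pattern contains value in its text (nested text included)."""
    hits = []
    for card in root:
        for elem in card.iter():
            if elem is card:
                continue  # skip the wrapper element itself
            if fnmatch.fnmatchcase(elem.tag, field_pattern):
                # Join the element's own text with that of its children.
                text = " ".join(t.strip() for t in elem.itertext() if t.strip())
                if value in text:
                    hits.append(card)
                    break  # one matching field per card is enough
    return hits

root = ET.fromstring(RECORDS)
matches = select(root, "*name*", "Glew")
print(len(matches))
```

The pattern `*name*` catches `name`, `first-name`, `last-name` and `personal-name`, but not `nom`; that last case needs some form of field-name translation on top of the wildcard.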

Further functions of this kind could support translation between field names, e.g. treating <nom> as <name>.

Nulls

Appropriate treatment of NULLs, a la SQL, allows nonexistent fields to be handled in a pretty generic manner for most queries.
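
As a rough illustration of the point about NULLs, here is a small sketch using Python's built-in `sqlite3` module. The table layout and the `city` column are assumptions made up for the example; fields a record never had are simply stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical wide table: one column per field name seen so far.
conn.execute("CREATE TABLE Addresses (name TEXT, nom TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO Addresses (name, nom, city) VALUES (?, ?, ?)",
    [
        ("Andy Glew", None, None),       # this card has no nom and no city
        (None, "Andre Colle", "Paris"),  # this card has no name
    ],
)

# A comparison against NULL evaluates to NULL, which WHERE treats as
# "not true", so rows lacking the field drop out without special-casing.
by_name = conn.execute(
    "SELECT name FROM Addresses WHERE name LIKE '%Glew%'"
).fetchall()

# COALESCE returns the first non-NULL argument: a generic way to say
# "whichever of these fields the record happens to have".
display = conn.execute(
    "SELECT COALESCE(name, nom) FROM Addresses ORDER BY rowid"
).fetchall()

# Missing fields can still be asked about explicitly.
missing_city = conn.execute(
    "SELECT COUNT(*) FROM Addresses WHERE city IS NULL"
).fetchone()[0]

print(by_name, display, missing_city)
```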

Schema Evolution

Grudgingly, you'll probably want schemas. But schemas will evolve. If User1's and User2's AddressBook schemas evolve divergently, there will be loss of information whenever an address card is moved between them.

Sure, you can fall back to the most recent common ancestor of the two schemas, but loss of info is bad.

I hope that migrating such data across schemas will not lose data - that the foreign fields will still be preserved, even though they are not consistent with other records in the database. And that queries on them will be allowed.
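
One way to picture this kind of lossless migration is sketched below. It is only an assumption about how it might be done: the `migrate`, `lookup` and `export` helpers, the `USER2_SCHEMA` field set and the sample card are all hypothetical.

```python
def migrate(card, target_fields):
    """Split an incoming card into fields the local schema knows and
    'foreign' fields, which are kept verbatim instead of being dropped."""
    known = {k: v for k, v in card.items() if k in target_fields}
    foreign = {k: v for k, v in card.items() if k not in target_fields}
    return {"fields": known, "foreign": foreign}

def lookup(stored, field):
    """Query a field whether or not the local schema defines it.
    Returns None (a NULL, in effect) when the card never had the field."""
    if field in stored["fields"]:
        return stored["fields"][field]
    return stored["foreign"].get(field)

def export(stored):
    """Reassemble the full card, e.g. to send it on to another user."""
    return {**stored["fields"], **stored["foreign"]}

# Hypothetical: User2's schema has no notion of a pager number.
USER2_SCHEMA = {"name", "email"}
card_from_user1 = {
    "name": "Andy Glew",
    "email": "andy@example.org",
    "pager": "555-0100",
}

stored = migrate(card_from_user1, USER2_SCHEMA)
print(lookup(stored, "pager"))            # foreign field is still queryable
print(export(stored) == card_from_user1)  # round trip loses nothing
```

The foreign fields are inconsistent with the other records in User2's database, but they survive the move and remain visible to queries.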

Invariants and Behaviour

One objection to self-schematization has been that it makes it harder to maintain database invariants. E.g.
  • each person may have no more than one mother or father
  • each person may have no more than one address
I trust it is obvious that these invariants are not useful in general.

Freeform text can preserve information that doesn't fit into such a fixed schema, and I personally find myself, more and more, abandoning the forms of PIMs such as the primitive ones on my PalmPilot and/or Outlook, and just using text. (InfoCentral did pretty well.)

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.