I was quite concerned to see "Schema Evolution"
on Mitch Kapor's slides, presented at Stanford EE380
on Wednesday, January 29, 2003.
Schemas are a pain.
Self Schematizing
Self-schematizing data structures are pretty damned useful.
I believe I showed this pretty well with
PerlSQL.
There's a Polish guy (TBD: get ref) who has been publishing
a sequence of papers on formal models of self-schematization.
XML makes self-schematization much more doable.
You can write generic queries for almost free-form fields:
<name>Andy Glew</name>
<name>
<first-name>Andy</first-name>
<last-name>Glew</last-name>
</name>
<personal-name>Andy Glew</personal-name>
<nom>Andre Colle</nom>
Sure, you may not know all of the possible field names
that might have been used.
Allowing queries (such as wildcards) to be used on fieldnames
gets you much of the way there:
SELECT * FROM Addresses AS a
WHERE a.*name* = 'Glew'
Further functions of this kind could support translation
between field names, e.g. treating nom as a synonym for name.
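A minimal sketch of the wildcard-on-fieldname idea, in Python rather than SQL. The wrapper elements `addresses` and `card` and the helper `select_cards` are assumptions made up for illustration. Substring matching stands in for the `=` in the query above, since the name fields here hold full names.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical address cards; each one names its fields differently.
CARDS = """
<addresses>
  <card><name>Andy Glew</name></card>
  <card><name><first-name>Andy</first-name><last-name>Glew</last-name></name></card>
  <card><personal-name>Andy Glew</personal-name></card>
  <card><nom>Andre Colle</nom></card>
</addresses>
"""

def select_cards(root, field_pattern, value):
    """Return cards having any field whose tag matches field_pattern
    (shell-style wildcard) and whose text contains value."""
    hits = []
    for card in root.findall("card"):
        for elem in card.iter():
            if elem is card:
                continue  # skip the card wrapper itself
            text = elem.text or ""
            if fnmatch.fnmatchcase(elem.tag, field_pattern) and value in text:
                hits.append(card)
                break  # one matching field is enough for this card
    return hits

root = ET.fromstring(CARDS)
matches = select_cards(root, "*name*", "Glew")
print(len(matches))
```

The `nom` card is not found by `*name*`, which is exactly the gap a translation function would have to fill.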
Nulls
Appropriate treatment of NULLs,
a la SQL, allows nonexistent fields
to be handled in a pretty generic manner
for most queries.
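A small sketch of that behaviour, using Python's built-in sqlite3 module. The table layout (one column per field name seen so far) is an assumption for illustration only. A record that lacks a field simply holds NULL there, and a comparison against NULL is never true, so the row drops out of the result instead of causing an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one column per field name, NULL where a record lacks it.
conn.execute("CREATE TABLE Addresses (name TEXT, last_name TEXT, nom TEXT)")
conn.executemany(
    "INSERT INTO Addresses VALUES (?, ?, ?)",
    [
        ("Andy Glew", None, None),    # only <name>
        (None, "Glew", None),         # only <last-name>
        (None, None, "Andre Colle"),  # only <nom>
    ],
)

# Rows whose last_name is NULL are skipped: NULL = 'Glew' is not true.
with_last_name = conn.execute(
    "SELECT rowid FROM Addresses WHERE last_name = 'Glew'"
).fetchall()

# COALESCE returns the first non-NULL field, whichever one a record used.
any_name = conn.execute(
    "SELECT COALESCE(name, last_name, nom) FROM Addresses ORDER BY rowid"
).fetchall()

print(with_last_name)
print(any_name)
```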
Schema Evolution
Grudgingly, you'll probably want schemas.
But schemas
will evolve.
If User1's and User2's
AddressBook schemas evolve
divergently, there will be loss of information
whenever an address card is moved between them.
Sure, you can fall back to the most recent common ancestor schema,
but loss of info is bad.
I hope that migrating such data across schemas
will not lose data - that the foreign fields
will still be preserved,
even though they are not consistent with other
records in the database.
And that queries on them will be allowed.
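One way this could look, as a sketch only: the functions `migrate` and `lookup`, the field names, and the sample values are all hypothetical. Fields the target schema does not know are carried along verbatim in a side dictionary rather than dropped, and lookups consult both places.

```python
def migrate(card, target_fields):
    """Split a card into fields the target schema knows and
    foreign fields that are preserved untouched."""
    known = {k: v for k, v in card.items() if k in target_fields}
    foreign = {k: v for k, v in card.items() if k not in target_fields}
    return {"fields": known, "foreign": foreign}

def lookup(record, field):
    """Query a migrated record; foreign fields remain queryable.
    Returns None (the NULL case) when the field is absent."""
    if field in record["fields"]:
        return record["fields"][field]
    return record["foreign"].get(field)

# Hypothetical: User2's schema never grew a "pager" field.
USER2_SCHEMA = {"name", "email"}
card_from_user1 = {
    "name": "Andy Glew",
    "email": "andy@example.com",
    "pager": "555-0100",
}

migrated = migrate(card_from_user1, USER2_SCHEMA)
print(lookup(migrated, "pager"))  # still there after the move
print(lookup(migrated, "fax"))    # absent field behaves like NULL
```

Moving the card back to User1 is then just a matter of merging the two dictionaries again, so the round trip loses nothing.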
Invariants and Behaviour
One objection to self-schematization has been that
it makes it harder to maintain database invariants.
E.g.
- each Person may have no more than one mother or father
- each Person may have no more than one Address
I trust it is obvious that these invariants are
not useful in general.
Freeform text can preserve information that doesn't fit
into such a fixed schema, and I personally find myself,
more and more, abandoning the forms of PIMs
such as the primitive ones on my
PalmPilot
and/or Outlook, and just using text.
(InfoCentral did pretty well.)