Data Modeling and Schema Design -- Introduction
This page gives a quick overview of what data modeling is, for people who are new to data modeling and schema design.
You may also want to look at the main
Data Model page, which has links to all the wiki pages related to the data model.
Contents
(all of these links stay inside this page)
What is a schema?
A schema is a description of a database. A schema tells what kinds of information a database is supposed to keep track of, and the schema serves as a "blueprint" for the layout of the information in the database.
Sometimes schemas are drawn as diagrams; sometimes schemas are described in plain-English documents; and sometimes schemas are represented in special modeling languages.
What is a data model?
Some people use the term
data model as a synonym for
schema. When we talk about our data model at OSAF, we usually mean something slightly different. We're not always consistent, but we generally use the term data model to describe the set of building blocks that schema authors can use to describe schemas. More on that in minute, after some examples...
The bookstore example
Most databases are set up to store information as records in tables.
For example, if we were starting a company like amazon.com, then we would need a database with a table for keeping track of books. It might look like this:
| Books |
| title | author | publication date | price | quantity in stock |
| The Cat in the Hat | Dr. Suess | 1957 | $8.99 | 34 |
| The Transparent Society | David Brin | 1999 | $12.60 | 29 |
| The Future of Ideas | Lawrence Lessig | 2002 | $10.50 | 62 |
The different columns in the table will hold different types of data. For example, the
title column will always have a text string, the
publication date will always have a date, and the
price will always have a dollar amount.
Along with the
Books table, we might also want tables to keep track of things like
Customers,
Orders, and
Shipments. Entries in one table may refer to entries in other tables. For example, the records in the
Orders table will refer to records in the
Books table and the
Customers table.
The bookstore schema
Our bookstore database would have a schema that described all the tables. The schema would describe:
- The names of all the tables
- The names of all the columns in each table
- The type of information expected in each column
- The relationships between the tables
Why is the schema important?
A schema is the "blueprint" for a database design. Before building the database, you need to know what you're going to put in it.
But the schema isn't just the design for the database. Schemas often end up describing the data structures used throughout a system -- the same data structures end up being used in the database, and in application code, and protocols and APIs between sub-systems. The schema becomes an important tool for making sure that all the different parts of the system are all in sync; that everyone is on the same page in terms of the syntax and the semantics of data that's getting passed around from one component to another.
Different types of databases
There are lot of different types of database software, and different types of database management systems (DBMSs). Here are a few of the different kinds of DBMSs:
| era | type of database | examples | concepts |
| 1960s | network databases | | |
| 1960s | hierarchical databases | | |
| 1970s onward | relational databases | Oracle, Sybase, MySQL | tables, records, columns |
| 1990s onward | object databases | | classes, instances, attributes, object persistence |
| 1990s onward | object/relational hybrids | Informix Universal Server | |
| 1990s onward | associative databases | | tuples, sentences, associations |
Each of the different kinds of DBMS offers its own concepts about how data should be represented. For example, object databases usually have the idea that classes can be sub-classes of other classes, whereas relational databases don't have any analogous concept. So these different types of system each have different
data models -- different conceptual models for how data is represented, and different building blocks that schema authors use to build schemas.
Different types of schemas
In the bookstore example above, we described the schema in terms of tables and columns, which are concepts from the relational database paradigm. If you were planning to use a relational database for the bookstore, then it would make sense to describe the schema in terms of relational database concepts. But if you were planning to use an object database for the bookstore, then we would probably want to describe the schema in terms of classes and sub-classes and attributes.
There are a lot of different ways to describe schemas, and it's important to pick a representation that fits the tools you're working with. Different representations embrace different concepts, and you want to pick a representation that seems to match the actual data-representation features offered by the DBMS that you're using. Here's are a few of the different kinds of schema representations that people use:
- Entity-Relationship diagrams (ER diagrams)
- UML Class diagrams
- Data dictionaries
- AI frame-based representations
- XML DTDs
- RDF schemas
Data in Chandler
In the bookstore example, we talked about
Books and
Customers and
Orders. The Chandler schema focuses on things like
Email Messages,
Address Book Contacts,
Calendar Appointments, and
To-do List Tasks.
In Chandler, we'll need a PIM schema to describe all the different kinds of information that are managed by the standard PIM parcels, like the Email parcel and the Calendar parcel. The PIM schema will need to:
- describe what kinds of things are used by the standard parcels
- define what kinds of things will be stored in the repository
- define what kinds of things will be shown in views
But in addition to the Chandler PIM schema, Chandler also needs to have an underlying data model. Chandler has an open architecture that's designed to let third parties create new parcels. Those new parcels will also need to store their data in Chandler repositories. And users will need to be able to make queries about the new data, and share the new data, and assign access rights, and so forth. The Chandler data model needs to provide a means for new parcels to describe to Chandler what their data looks like. And end users will also be able to create new data structures. Users will be able to define their own classes of data, for keeping track of anything they want, from favorite movies to class assignments to bug lists.
Chandler needs to support some level of data expressiveness, and for each aspect of that expressiveness, have it be handled consistently across all the parts of Chandler, from a generic view, to a Python object mapping, to network transport, to a repository data store. The Chandler data model describes Chandler's basic notions about how data is represented, in terms of
items and
attributes, and
kinds and
types, and small set of other building blocks.
Chandler items and attributes
In designing Chandler, we need to decide what sorts of data and relationships can be represented in the repository (and expressed in Python objects, and manipulated in views, and exchanged between networked Chandler apps).
In the bookstore example above, the data model has tables and columns, like a relational database data model. The Chandler data model will have some similarities to that, but Chandler will also have some important differences. Relational databases are great for storing structured data, but they aren't designed for handling unstructured data. We want Chandler to be designed for storing both structured data and unstructured data. A Chandler user should be able to enter info in an unstructured way and gradually add structure later, if and when they feel like it.
Instead of using a relational database data model, Chandler has a data model that looks more like what you would see in an associative database, or in an RDF schema. All the data in Chandler is represented in terms of items and attributes. Every email message is an item, and so is every calendar appointment, and every address book contact. New parcels will create new kinds of items. For example, an MP3 player parcel might create MP3 items. And end users will also create new kinds of items. Someone running a small bookstore might create items to represent Books, and Orders, and Customers.
Each item contains a set of attributes. For example, for an item that represents a dentist's appointment, the attributes would be things like the start-time that the appointment is scheduled for, the duration of the appointment, and the dentist the appointment is a scheduled with. Items are related to each other through attributes. For example, in the dentist's appointment item, the "dentist" attribute could have a reference to a contact info item that represents the contact info for the dentist.
Each individual item can be an item of some kind, and different kinds of items will typically have different types of attributes. Calendar appointment items will have attributes for things like "start-time" and "duration", and e-mail message items will have "sender" and "subject line" attributes, and book items might have "title" and "author" attributes.
In a relational database, each record
must be in a table, and the record may only have fields that correspond to the columns of the table. Chandler will be more flexible. In Chandler, an individual item can be of some kind, but it doesn't have to be. Items don't have to be of any fixed kind at all. And even if an item is an item of some kind, that doesn't mean that the item is limited to only having the attributes typically associated with items of that kind. A user can always add more attributes to an item. If you wanted to, you could take a calendar appointment item and add a "directions" attribute to it, to keep track of the directions to get to your dentist's appointment. The dentist appointment can have a "directions" attribute, even though other calendar appointments don't.
That's a quick intro to items and attributes. If you're looking for more background about items and attributes, have a look at:
Data model status
Pointers to books about data models
Contributors
Discussion
(none yet)