Thoughts on server/client (dual-use) repositories
Fundamentally, you can do one of three things:
- Specialize in doing a client repository well
- Specialize in doing a server repository well
- Spend a lot more trying to do both, and make compromises where you can't do both equally well.
Chandler's repository should do the first, which is already difficult. The choices we're making for storage, frequent XML parsing and inter-linkedness, as well as transactions and conflict resolution, are all geared towards supporting client applications. We already have enough work ahead of us to make that work reliably and fast, within a small downloadable application footprint. Here's a bit of explanation of the different requirements, as I've learned them from implementing Web and WebDAV servers as well as email servers -- really this applies to any situation where a server repository must serve up data to a large number of clients.
- Performance: a server repository must be MUCH faster. It must be architected from the start to be able to quickly serve data to many clients. It must be able to quickly access data by reference, rather than follow links from item to item laboriously.
- Random access to data: a server accesses data in different patterns than clients. Servers may access data in bigger chunks more often, particularly if the server supports client synchronization.
- Different kinds of data: a server typically needs quota information and may need different information about data changes and data history, more than is needed on the client, to support synchronization.
- Data parsing: A client needs to open up data formats and parse the information much more often, in order to be able to display pieces of it here and there on the screen. Servers don't need to do that as often.
- Inter-linkedness: A server has less need to support bidirectional refs -- these are reconstructed on the client anyway, so why bother constructing them on the server?
- Transactions: A server can enforce locking because transactions happen in the span of one request/response. This is most true for HTTP but it's also as true as it needs to be in IMAP, SMTP, FTP etc. When the server receives the client request, it can tell what data to lock, lock it, perform the request, unlock it and commit, and send the client response. Transactions aren't long-lived and obviously can't be cancelled by the user. There's very little need to view the data in the middle of the transaction.
- Conflict resolution: There's no user around to do human-based conflict resolution, thus it's very handy that the server is typically able to do transaction locking instead.
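The request-scoped transaction described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed design: `ServerRepository`, `transaction`, and `handle_put` are hypothetical names, the store is a plain in-memory dictionary, and the sketch commits just before releasing the locks so that no other request can see a half-applied change.

```python
import threading
from contextlib import contextmanager


class ServerRepository:
    """Toy in-memory store; every name here is hypothetical."""

    def __init__(self):
        self._items = {}                      # item id -> committed value
        self._locks = {}                      # item id -> threading.Lock
        self._locks_guard = threading.Lock()  # protects the _locks table

    def _lock_for(self, item_id):
        # Create per-item locks lazily; the guard ensures two threads
        # never end up with different locks for the same item.
        with self._locks_guard:
            return self._locks.setdefault(item_id, threading.Lock())

    @contextmanager
    def transaction(self, item_ids):
        # The server knows up front which items the request touches.
        # Acquiring in sorted order keeps concurrent requests from deadlocking.
        locks = [self._lock_for(i) for i in sorted(set(item_ids))]
        for lock in locks:
            lock.acquire()
        staged = {}
        try:
            yield staged
            # Reached only if the request handler raised no exception.
            self._items.update(staged)
        finally:
            for lock in reversed(locks):
                lock.release()

    def get(self, item_id):
        return self._items.get(item_id)


def handle_put(repo, item_id, value):
    """Handle one request: lock, perform, commit, unlock, then respond."""
    with repo.transaction([item_id]) as staged:
        staged[item_id] = value
    return "204 No Content"
```

Because the whole transaction lives inside one request handler, there is no window in which a user could cancel it or need to inspect intermediate state, which is the property that lets a server substitute locking for human conflict resolution.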
We know of some repositories that serve as both client and server repositories: MySQL, ZODB. However, these repositories are usually used for "toy" server implementations, and we rejected both of them as not having the features we wanted in a client repository. So both are examples of a repository that does a middling job in both client and server applications.
Consider Outlook and Exchange, which have completely separate repository implementations. Consider Web clients and Web servers, which always have independent repository implementations. Consider every email client and email server -- there are a few IMAP clients and servers that store mail in the same file formats, but these formats aren't considered to be optimized for either an excellent client experience or for excellent server performance.
Thoughts on the server "plan of record"
What is the Westwood Server going to be? Must it support WebDAV? RAP? LDAP? POP3? IMAP? SMTP? XMPP? CalDAV? What functions are Chandler clients expected to perform against Westwood servers? Are other clients expected to be able to work against the Westwood server as well (Web, email, calendaring, IM clients)? What are the most common actions -- e.g. synchronize, browse...? How is the data organized -- by user directory? Are there quotas? What is the administrator going to need to be able to do?
We would need answers to these questions to have a real and believable server architecture. Right now I would say that our plan of record has a fair number of requirements and a few vague ideas about how to achieve them -- our plan of record should not yet include a server architecture. Some things to keep in mind for now:
- There are so many ways to approach architecting a Westwood server that we really need to compare several of those approaches, not only one or even two. Is it time to solicit some of those and start comparing and breaking them down into their fundamental characteristics?
- After narrowing down some of the ideas, we should try to keep the most reasonable options open for as long as doing so is low cost.
- We shouldn't let server planning influence us to do more implementation today than is otherwise necessary. Remember, a cost today is higher than a cost tomorrow, so it's usually unwise to do programming now that you don't need now and only think you'll need later (http://c2.com/cgi/wiki?YouArentGonnaNeedIt).
- 15 Oct 2004