Project Name Change
Most people won't have heard this, but the "Chandler Server" project has gotten a super duper code name: "Cosmo". I'll be updating the wiki pages to use the new name.
Cosmo Architecture
As the web UI component of Cosmo comes more into focus, it is increasingly clear that the server's architecture will need to change to support both HTML and CalDAV interfaces. Additionally, we need to ensure that the architecture can handle the scalability requirements of Westwood and beyond (including possibly hosting Cosmo at OSAF for large-scale public use).
(Note: what I have been calling the "shared file store", I would like to rename the "content store" [or interchangeably, "content repository"] to align more closely with industry standard terminology.)
Breaking It Down
We can think of the CalDAV component of the server as nothing more than a network protocol adapter to an API that accesses a content store. It's easy then to imagine the Web UI using the same API to access the content store. There are obvious advantages to this design.
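To make the layering concrete, here is a minimal Java sketch. Every name in it (`ContentStore`, `InMemoryContentStore`, `CalDavAdapter`, `WebUi`) is hypothetical and invented for illustration; none of them is an existing Cosmo API, and the DAV handling is drastically simplified.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical content store API; the real API is still to be chosen. */
interface ContentStore {
    void put(String path, String content);
    Optional<String> get(String path);
}

/** Simplest possible backing store, standing in for a filesystem or RDBMS. */
class InMemoryContentStore implements ContentStore {
    private final Map<String, String> items = new TreeMap<>();
    public void put(String path, String content) { items.put(path, content); }
    public Optional<String> get(String path) { return Optional.ofNullable(items.get(path)); }
}

/** Protocol adapter: maps (greatly simplified) DAV verbs onto API calls. */
class CalDavAdapter {
    private final ContentStore store;
    CalDavAdapter(ContentStore store) { this.store = store; }

    /** Returns an HTTP-style status code for the request. */
    int handle(String method, String path, String body) {
        switch (method) {
            case "PUT":
                store.put(path, body);
                return 201;
            case "GET":
                return store.get(path).isPresent() ? 200 : 404;
            default:
                return 405;
        }
    }
}

/** Web UI: renders the same content as HTML through the same API (no escaping here). */
class WebUi {
    private final ContentStore store;
    WebUi(ContentStore store) { this.store = store; }

    String render(String path) {
        return store.get(path)
                .map(c -> "<pre>" + c + "</pre>")
                .orElse("<p>Not found</p>");
    }
}

class SharedApiSketch {
    public static void main(String[] args) {
        ContentStore store = new InMemoryContentStore();
        CalDavAdapter dav = new CalDavAdapter(store);
        WebUi ui = new WebUi(store);
        // Content written through the CalDAV adapter is visible to the web UI.
        dav.handle("PUT", "/home/alice/lunch.ics", "BEGIN:VCALENDAR");
        System.out.println(ui.render("/home/alice/lunch.ics"));
    }
}
```

Neither front end knows how the store is implemented, which is the property that lets us swap the store's implementation or location later.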
Now let's detour to think about scalability. The typical route to scaling a J2EE web application is adding instances of your servlet container. You can cluster them to share web session state and/or use load balancing components to ensure host affinity for each web session and/or use remote servers to manage or cache your sessions (see memcached, which is essentially a distributed hashtable). If you're using EJB, you can run separate instances of the EJB container and have your webapps access their EJBs remotely. And you can cluster or distribute your RDBMS. All pretty well known stuff.
In our case, we're essentially replacing the RDBMS component with a higher level abstraction (the content store) which can be implemented using a filesystem, an RDBMS, whatever. And since we're not using EJB, we don't have to consider an appserver tier. So what we're left with is the servlet container and the content store (assuming that the servlet container presents both the HTML and CalDAV interfaces).
Deployment
The simplest and most popular deployment will be the "embedded" deployment, in which the servlet container and content store are colocated in a single JVM. We can assume that we'll use one webapp for the web UI and a second one for the CalDAV adapter, both of which access a global (server-wide) content store via our content repository API. This packaging allows a user to deploy 1) a bundle with only the servlet container and the Cosmo components or 2) a kitchen-sink appserver in which Cosmo is but one of many apps.
For hosting Cosmo, we'll want a "split" deployment in which the servlet container and content store are separately scalable as described above. In this configuration, we have two questions: 1) how does the web UI talk to the content store across the network and 2) where does the CalDAV adapter go?
We can solve both problems by using a content repository API that can talk CalDAV across the network to the content store (let's call this "in-store CalDAV"). In this case, the CalDAV adapter lives in the same process as the content store, and CalDAV clients can talk directly to the content store process, avoiding the overhead of going through the servlet container process. The catch is that we can't assume that the servlet container has provided any security; we have to ensure that the content store itself can enforce all security constraints.
Alternatively, we can use RMI or IIOP or some other network protocol behind our content repository API and locate the CalDAV adapter in the process with the web UI (we can call this "out-of-store CalDAV"). If we configure things such that only trusted clients (the web UI and the CalDAV adapter) can access the content store, then we can potentially delegate the rest of our security to the servlet container layer, simplifying the content store.
More About Security
Both of these models assume that we persist and manage security entities (users, roles, permissions) partially or wholly outside of the content store, which is a conceptually cleaner model. Let's assume that security entities are persisted in an external database (or directory, or central authority/single signon server, or whatever). Now we need an interface for managing those entities, which we can provide with a web application deployed into the same place as the web UI, and we need an API for managing those entities. Each of our components (web applications, CalDAV adapter, content store) will use our security API to get the information it needs out of the security database. This is the position we want to be in - our security requirements do not force us to choose one deployment model over another.
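A rough Java sketch of what such a security API might look like follows. The names `SecurityService`, `InMemorySecurityService`, and `AccessGuard`, and the prefix-based permission model, are assumptions made purely for illustration; the in-memory map merely stands in for the external security database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical security API shared by every component. */
interface SecurityService {
    boolean isPermitted(String user, String permission, String path);
}

/** Stand-in for the external security database (or directory, or SSO server). */
class InMemorySecurityService implements SecurityService {
    // user -> set of "permission:pathPrefix" grants
    private final Map<String, Set<String>> grants = new HashMap<>();

    void grant(String user, String permission, String pathPrefix) {
        grants.computeIfAbsent(user, u -> new HashSet<>()).add(permission + ":" + pathPrefix);
    }

    public boolean isPermitted(String user, String permission, String path) {
        for (String g : grants.getOrDefault(user, Collections.<String>emptySet())) {
            int sep = g.indexOf(':');
            if (g.substring(0, sep).equals(permission) && path.startsWith(g.substring(sep + 1))) {
                return true;
            }
        }
        return false;
    }
}

/** Enforcement point that any component can place in front of its operations. */
class AccessGuard {
    private final SecurityService security;
    AccessGuard(SecurityService security) { this.security = security; }

    void require(String user, String permission, String path) {
        if (!security.isPermitted(user, permission, path)) {
            throw new SecurityException(user + " lacks '" + permission + "' on " + path);
        }
    }
}

class SecurityApiSketch {
    public static void main(String[] args) {
        InMemorySecurityService security = new InMemorySecurityService();
        security.grant("alice", "write", "/home/alice/");
        AccessGuard guard = new AccessGuard(security);

        guard.require("alice", "write", "/home/alice/lunch.ics"); // passes silently
        try {
            guard.require("bob", "write", "/home/alice/lunch.ics");
        } catch (SecurityException e) {
            System.out.println("Denied: " + e.getMessage());
        }
    }
}
```

With in-store CalDAV the guard would sit inside the content store; with out-of-store CalDAV it could sit in the web applications and the CalDAV adapter instead. Either way the guard and the security database stay the same.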
Configuration and Logging
An implied requirement of our architecture is that it should be relatively simple to switch between the deployment models. This mandates that all of our components use the same configuration and logging mechanisms as much as possible.
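As a sketch of what "simple to switch" could mean in practice, the Java below picks a content repository implementation from a single configuration property. The property names (`cosmo.repository.mode`, `cosmo.repository.host`) and all class names are hypothetical, and the two remote classes are stubs that only describe themselves rather than open a connection.

```java
import java.util.Properties;

/** Hypothetical content repository API; only what the sketch needs. */
interface ContentRepository {
    String describe();
}

/** Embedded deployment: the store lives in the same JVM. */
class LocalRepository implements ContentRepository {
    public String describe() { return "local (same JVM)"; }
}

/** Split deployment, "in-store CalDAV": stub that would speak CalDAV to a remote store. */
class CalDavRemoteRepository implements ContentRepository {
    private final String host;
    CalDavRemoteRepository(String host) { this.host = host; }
    public String describe() { return "remote via CalDAV to " + host; }
}

/** Split deployment, "out-of-store CalDAV": stub that would speak RMI to a remote store. */
class RmiRemoteRepository implements ContentRepository {
    private final String host;
    RmiRemoteRepository(String host) { this.host = host; }
    public String describe() { return "remote via RMI to " + host; }
}

/** Chooses the implementation from configuration, so callers never name a concrete class. */
class RepositoryFactory {
    static ContentRepository fromConfig(Properties cfg) {
        String mode = cfg.getProperty("cosmo.repository.mode", "local");
        String host = cfg.getProperty("cosmo.repository.host", "localhost");
        switch (mode) {
            case "local":
                return new LocalRepository();
            case "caldav":
                return new CalDavRemoteRepository(host);
            case "rmi":
                return new RmiRemoteRepository(host);
            default:
                throw new IllegalArgumentException("Unknown repository mode: " + mode);
        }
    }
}

class DeploymentSwitchSketch {
    public static void main(String[] args) {
        Properties cfg = new Properties();
        System.out.println(RepositoryFactory.fromConfig(cfg).describe());

        // Moving to a split deployment changes configuration only, not calling code.
        cfg.setProperty("cosmo.repository.mode", "rmi");
        cfg.setProperty("cosmo.repository.host", "store1.example.org");
        System.out.println(RepositoryFactory.fromConfig(cfg).describe());
    }
}
```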
Components
So, we are left with these architectural components. If we provide them all, then we can support both the embedded and split deployment models.
- calendar web UI
- admin web UI
- CalDAV adapter
- content repository API
  - local implementation
  - remote/CalDAV implementation
  - remote/RMI (or something else) implementation
- security API
- content store
- security database
Open Questions
- Which of the split deployment models is preferable - the one that uses CalDAV to communicate between the web applications and the content store, or the one that uses RMI/whatever else? Are there particular reasons we'd want the CalDAV adapter to live in one process or the other? Could/should it live in both places - client CalDAV requests come into the servlet container process which relays them back to the content store? What would be the point?
- How does the content store scale? Are there significant architectural properties that would cause us to choose, say, a clustering approach over, for instance, multiple content store nodes which each "own" their own subset of the overall universe of content?
- What sort of monitoring/management capabilities do we want? Java provides the JMX API which allows clients to monitor and control components. For instance, you can point a JMX browser at a Tomcat instance and browse statistics like the number of active sessions, the utilization of database connection pools, etc. Our components could provide MBeans which could be similarly browsed. This could be extremely useful for automated management tools in a hosted environment.
- How much and what sorts of information do we want to capture in logs? Will Cosmo be used in environments where its operations will need to be auditable? The platform we've chosen will make it easy for individual organizations to provide customized logging, but we should also consider what sorts of logging will be useful to all users.
- Can we imagine needing any sort of cobranding infrastructure? Consider a company that hosts Cosmo, selling accounts to ISPs and enterprises, each of whom want to use their own branding for the web applications. Or even just the guy who runs a small community Linux server and wants to have a Cosmo server with his own "skin". How much of this do we care about?
- Should Cosmo be localized to anything other than US English? Java and the various frameworks we're using give us most of the internationalization we need for free, but we do need to do a little bit of advance planning if we're going to provide any localizations.
- Will Westwood require any sort of portal support? One can easily imagine a campus portal including class schedules, event calendars and so forth as remote portlets. According to Stephane Croisier of Jahia Software, many universities now have "my.university.edu" portal sites similar to My Yahoo built with uPortal or Jahia's CMS and Portal Server. It could be very important to provide JSR 168 compliant calendar portlets to be included in these portals.
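To illustrate the monitoring question above, here is a small Java sketch of a standard MBean registered with the platform MBean server. The `ContentStoreStats` bean, its single counter, and the `org.example.cosmo` object name domain are all hypothetical; only the `javax.management` and `java.lang.management` calls are real JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class CosmoJmxSketch {
    /** Standard MBean interface: each getter becomes a read-only JMX attribute. */
    public interface ContentStoreStatsMBean {
        long getRequestCount();
        void reset();
    }

    /** By the standard MBean convention, the class name is the interface name minus "MBean". */
    public static class ContentStoreStats implements ContentStoreStatsMBean {
        private final AtomicLong requests = new AtomicLong();

        public void recordRequest() { requests.incrementAndGet(); }
        @Override public long getRequestCount() { return requests.get(); }
        @Override public void reset() { requests.set(0); }
    }

    /** Registers the bean, then reads the counter back the way a JMX client would. */
    static long readViaJmx(ContentStoreStats stats, String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // keep the sketch re-runnable
            }
            server.registerMBean(stats, name);
            return (Long) server.getAttribute(name, "RequestCount");
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration or lookup failed", e);
        }
    }

    public static void main(String[] args) {
        ContentStoreStats stats = new ContentStoreStats();
        stats.recordRequest();
        stats.recordRequest();
        long seen = readViaJmx(stats, "org.example.cosmo:type=ContentStoreStats");
        System.out.println("RequestCount as seen through JMX: " + seen);
    }
}
```

Once registered, a bean like this shows up in any JMX browser attached to the JVM, alongside whatever the servlet container itself exposes.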
Software
Having arrived at a candidate architecture, it's worth examining what the open source world has to offer in terms of existing standards and projects in these areas.
JCR
Day Software have proposed a standard content repository API called "JCR" in JSR 170 and have contributed a reference implementation to the Jackrabbit project currently incubating at Apache.
JCR is a good candidate for our content store API. It's a very simple and clear API with two classes of compliance (roughly corresponding to read features and write features) and a series of optional features such as locking, versioning, and SQL-based searching. It was built to allow networked implementations.
Some reservation has been expressed about the ability for those outside the current working group to contribute to the specification process, but I'm not too worried about that. I've been impressed with how communicative the people on the Jackrabbit list have been, and I would be surprised if we would be completely shut out of the process if we truly wanted to be a part of it (but perhaps I'm naive).
Slide
I've built a prototype server over the past few weeks using Slide, mostly to get a feel for the quality of its design and implementation and to gauge the level of support from its community. I have been disappointed in just about every area. I find its APIs confusing and mostly poorly documented. The community seems stagnant and unwilling to offer much help (with the notable exception of one Daniel Florey who made several helpful suggestions). I am still unable to figure out what their release plans are; I keep hearing about a mythical Slide 3.0 (which supposedly will include some sort of support for JCR), but I have seen no concrete planning.
In general, I'm extremely skeptical of continuing to base any of our efforts on Slide. The only benefit to doing so is that they have a mature, complete implementation of the WebDAV specifications. Unfortunately I think the negatives outweigh the positives.
Jackrabbit
Jackrabbit aims to provide a complete JCR implementation with RMI and WebDAV network support that can use many different storage technologies. It's still incubating and hasn't yet had a formal release.
As I mentioned above, the trajectory of this project is the opposite of Slide's. There is a quite active (if still small) community that has done a lot of work in a relatively short time (less than 6 months, I believe). Plus, the project leaders seem to be much more interested in building clearly documented, understandable, and most importantly extensible software. I think this is the perfect time to jump in, make some valuable contributions, help grow the community, and further our own goals at the same time.
Acegi WebDAV
This is a nascent project started by Ben Alex of Acegi Security (see below) to provide a WebDAV implementation based on Spring and Acegi Security.
I joined this project as a committer a few days ago, mainly to contribute to the discussions around where the project is going, but also to lend a hand in certain areas such as CalDAV support. One of the first orders of business was to evaluate JSR 170 and its emerging implementations, including Jackrabbit. The current feeling in the group is to build on top of Jackrabbit, leveraging their WebDAV capability and JCR implementation and integrating Spring and Acegi Security.
Spring Framework
Spring is a popular J2EE framework that eschews EJB and other complicated J2EE constructs in favor of Inversion-of-Control and Aspect-Oriented-Programming principles. It provides facilities for integrated configuration and exception handling and decouples software layers from each other, making both development and testing much easier.
I have been using Spring as an essential component of my projects for the last year. It's hard to explain how great Spring is with any brevity. If you don't want me talking your ear off about it, then just accept that it will be an integral part of Cosmo as well.
Acegi Security
This project provides a security module for Spring with extensive authentication and authorization features. It can integrate with servlet containers to provide security features for applications that must be portable across container implementations. It's a well designed and implemented framework, built on practices that are both industry best practices and my personal preferences.