Framing the Sharing Server Discussion
How data-model-aware does the sharing server need to be? This is a spectrum ranging from
- extremely unaware, such as an FTP server, to
- somewhat aware, such as a WebDAV server, which understands the notion of resources with properties, to
- very aware, such as an application server running an implementation of the Chandler repository data model.
A server which understands at least some portion of the data model can be advantageous in a number of ways. For example, imagine a very "active" item collection where lots of clients are adding (or removing) items. As it stands today, a server which doesn't know how to merge collection changes would have a very hard time dealing with this -- the clients would continuously collide with each other, each resubmitting its new copy of the collection list until nobody else was trying. A server which understands item collections could simply merge those otherwise overlapping changes. Another advantage would be a server which could understand the ACL data in the model and enforce it. This doesn't mean the server needs to understand the complete data model; perhaps there is a subset which gets us most of the support we would like. There are also advantages to using a mostly "off the shelf" server, such as being able to choose a different implementation. Hopefully the data-model awareness of the server isn't an all-or-nothing proposition.
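To make the merging idea concrete, here is a minimal Python sketch. It assumes clients submit membership deltas (items added, items removed) rather than a full copy of the collection list. The function name `merge_membership` and the delta shape are hypothetical and are not part of Chandler or of any WebDAV server.

```python
def merge_membership(current, deltas):
    """Apply each client's (added, removed) delta to the server's membership set.

    Hypothetical helper for illustration; not an existing Chandler API.
    """
    members = set(current)
    for added, removed in deltas:
        members |= set(added)      # items this client put into the collection
        members -= set(removed)    # items this client took out
    return members


server = {"item-a", "item-b", "item-c"}
client_1 = ({"item-d"}, set())     # client 1 adds item-d
client_2 = (set(), {"item-b"})     # client 2 removes item-b

merged = merge_membership(server, [client_1, client_2])
print(sorted(merged))  # ['item-a', 'item-c', 'item-d']
```

Had both clients uploaded a full replacement list instead, whichever write landed second would silently undo the other's change. Conflicting deltas (one client adds an item that another removes) still need a policy; in this sketch the later delta simply wins.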
The other question is about the protocol and protocol data model (please read Lisa's excellent post). I don't think we're in disagreement that we will want to send change-logs rather than entire items, which is what our current implementation sends. I base this opinion on past reading about how synchronization is traditionally accomplished, and on conversations with people who have done this sort of thing (like Groove). Our repository already produces change logs which could be sent, in some form. I (admittedly not yet being a WebDAV expert) perceive an impedance mismatch with sending change logs via WebDAV, but this could be a completely bogus perception, and I would like to better understand whether this is a good fit. Being able to transmit change-logs doesn't mean we can't also simply access items as WebDAV resources -- I think both options should be available.
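As a rough illustration of the difference between sending a whole item and sending a change log, the Python sketch below diffs two versions of an item's attributes and applies the result on the receiving side. The format (a `set` map plus a `remove` list) and the function names are assumptions made for this example; they are not the change-log format the Chandler repository actually produces. WebDAV's PROPPATCH request body is likewise built from set and remove instructions.

```python
_MISSING = object()  # sentinel so that a stored None still compares correctly


def make_change_log(old, new):
    """Describe how to turn `old` into `new` (hypothetical format)."""
    set_attrs = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = sorted(k for k in old if k not in new)
    return {"set": set_attrs, "remove": removed}


def apply_change_log(item, log):
    """Return a copy of `item` with the change log applied."""
    result = dict(item)
    result.update(log["set"])
    for key in log["remove"]:
        result.pop(key, None)
    return result


old = {"title": "Lunch", "location": "Cafe", "notes": "bring slides"}
new = {"title": "Lunch", "location": "Office"}

log = make_change_log(old, new)
assert log == {"set": {"location": "Office"}, "remove": ["notes"]}
assert apply_change_log(old, log) == new
```

Only the changed attribute and the removed one cross the wire; the unchanged `title` is never sent.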
I also want to understand how we can get by without transactions. Many items are being modified during a Chandler sharing operation, the chance for overlap is high, and I fear we're going to go through some pain trying to handle this without transactions. WebDAV supports locking of single resources or resource collections (not to be confused with item collections), but I don't think you can lock two separate items atomically (please correct me if I am wrong), and locking is an optional server feature.
I would like to emphasize that this isn't a "WebDAV or Not" debate. I see the benefit of not inventing a new protocol. I agree that the work to create an industrial strength server shouldn't be underestimated. I think it's about what features we need from a server, as well as the protocol data model. I see a lot of benefits to leveraging the work that has been done in our repository, and I don't yet see how we will get by without some amount of that functionality on the server side (merging, transactions, etc.). It is my hope that in the next few days we can be really concrete about how we can implement solutions to these problems using an off the shelf WebDAV server (and what additional work it would require to extend such a server). Plus I really need to become a WebDAV expert quickly.
List of "how do we handle such-and-such", a work in progress
- How do we handle sharing a collection with 1,000,000 items in it?
- How do we handle lots of people modifying a collection?
- How do we handle the case where items need to be modified "transactionally" -- need a better example here
- How do we enforce "business logic" constraints such as how many students can be signed up for a class?
- ...
-- MorgenSagen - 06 Nov 2004
Some feedback on these thoughts: [LD = Lisa Dusseault] [MS = Morgen Sagen]
- [LD]: Merging item collection membership ("a server which didn't know how to merge collection changes would have a very hard time dealing with this [an active collection]"...): This doesn't need to be a problem, but it depends on how we use the WebDAV data model (what mappings we make to the Chandler data model). For example, if we use WebDAV collections to model item collections, then there should be no problem with many users adding and removing resources from the WebDAV collections -- WebDAV servers already handle that.
- [MS]: Yeah, we definitely can't continue storing a collection inside a single attribute. I still haven't seen a great solution for how collections can be stored, though, since putting items into WebDAV collections doesn't quite match our data model (items can be in multiple collections); plus we don't just share item collections, we also share related items (clouds).
- [LD]: Change logs: It's really hard to understand why we'd want to focus on fine-grained performance improvements at this point. A program like sitecopy can synchronize hundreds of MB of data in less time than our sharing currently takes -- we don't know why yet, but it isn't through use of change logs, because sitecopy can only download or upload entire file bodies. We may even be in better shape using properties, because PROPPATCH effectively applies a diff to a large set of properties.
- [MS]: I had brought this up since we already produce change logs internally and are jumping through additional hoops to not send change logs. Let's rephrase the requirement as "we need to be able to send/fetch only those items/properties that have changed and need to efficiently handle changes to an item collection with 1,000,000 items in it." Agreed?
- [LD]: Transactions: The way to lock two resources which need to be changed together is: [1] LOCK A (if it fails, wait until unlocked and start again at 1), [2] LOCK B (if it fails, UNLOCK A, wait until B is unlocked, go back to 1), [3] Do your thing and unlock both. It's not ideal but it works even with multiple clients. UNLOCK is key to avoid deadlocks. Many transaction systems work much this way anyway.
- [MS]: If there are more than a couple of clients trying to modify multiple overlapping items, I think this will not scale. For the case where there are few authors modifying single items at a time, this would work, but I think our model is different: many authors and many items.
- [LD]: Drawbacks of "smart"/aware servers: remember, the more work the server does, the more policy it has, and the more power the administrator has. Likewise this reduces the ability of the client to make choices, innovate, and control "its" data. My principle here is to make servers as dumb and general as they can be and only make the server know details of application semantics when the alternative is truly much worse.
- [MS]: Drawback, or feature? Do you trust the client not to maliciously hold onto the lock of a key resource? And who is going to enforce constraints such as how many students can be added to a class roster? The client?
It seems like there should be an application server (Tomcat, etc.) running an application which manages and enforces these sorts of things.
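The two-resource locking procedure [LD] describes under "Transactions" (LOCK A, LOCK B, release A and start over if B is unavailable) can be sketched in Python as follows. This is an in-process simulation only: `threading.Lock` stands in for WebDAV LOCK/UNLOCK, and `update_both`, `max_attempts`, and the randomized wait are assumptions added for the example. A real client would issue HTTP requests and also have to cope with lock timeouts.

```python
import random
import threading
import time


def update_both(lock_a, lock_b, action, max_attempts=100):
    """Try to hold both locks, run `action`, then release both.

    Returns True if `action` ran, False if every attempt failed.
    """
    for _ in range(max_attempts):
        if not lock_a.acquire(blocking=False):      # [1] LOCK A failed
            time.sleep(random.uniform(0.001, 0.005))
            continue                                # start again at 1
        if not lock_b.acquire(blocking=False):      # [2] LOCK B failed
            lock_a.release()                        # UNLOCK A to avoid deadlock
            time.sleep(random.uniform(0.001, 0.005))
            continue                                # go back to 1
        try:
            action()                                # [3] do your thing
        finally:
            lock_b.release()                        # unlock both
            lock_a.release()
        return True
    return False


lock_a, lock_b = threading.Lock(), threading.Lock()
done = []
ok = update_both(lock_a, lock_b, lambda: done.append("updated"))
```

Every failed attempt costs extra round trips to the server, which is the scaling concern [MS] raises for many authors touching many overlapping items.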