I've been exploring, via beginner's Python, how one might construct an HTTP, WebDAV and CalDAV client library that is more than just a trivial set of helper functions, and one that makes sense from a protocol-model point of view. This page is my attempt to pull these thoughts together.
In this document there are three entities to describe: the server, the client library (my prototype was called 'zanshin') and the application using the library. To avoid confusion between the two parts on the client side, I'll call these the server, the client and the application.
The basic model of many protocol libraries is to instantiate, open and use a Connection object. This is not what HTTP does and the library shouldn't encourage the application to behave as if there is a connection.
Besides being actively misleading, the Connection model is just not helpful. The most useful thing the library can model for the application is an object for each resource it addresses. The application can then address the resource directly, query it for information, or perform operations on it.
Besides resource objects, here's what else might be useful:
- An object representing the server to determine if the server is there, if login succeeds or fails, possibly to cache/learn what namespaces the server offers for features like principal access or timezones
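To make the resource-centred model concrete, here is a minimal sketch. All the names (`Server`, `Resource`, `resource()`, `url`) are illustrative, not zanshin's actual API; the point is only that the application asks for resource objects and never sees a Connection.

```python
class Server:
    """Represents a server as a whole: reachability, login, learned namespaces."""

    def __init__(self, host, port=80):
        self.host = host
        self.port = port
        self._namespaces = {}   # cached feature namespaces (principals, timezones, ...)

    def resource(self, path):
        # The application never opens a Connection; it asks for resources.
        return Resource(self, path)


class Resource:
    """Represents one addressable resource on a server."""

    def __init__(self, server, path):
        self.server = server
        self.path = path

    @property
    def url(self):
        return "http://%s:%d%s" % (self.server.host, self.server.port, self.path)


server = Server("dav.example.com")
cal = server.resource("/calendars/home/")
print(cal.url)   # http://dav.example.com:80/calendars/home/
```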
Abstraction of Interoperability Issues
Features to hide
KeepAlive
: The client should completely abstract the concept of KeepAlive. If the client attempts to keep a connection alive, and the server may or may not cooperate, or the connection drops at some point, none of this should ever reach the attention of the application (as long as the connection can be restored, that is).
Chunking and Transfer-encoding
: The client should abstract this and do the sensible thing.
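One way to hide keep-alive failures is to retry transparently when a persistent connection turns out to be dead. A minimal sketch, where `send_once` stands in for the real socket layer and `ConnectionDropped` is an invented exception, not any real library's:

```python
class ConnectionDropped(Exception):
    """Raised by the transport when the server closed a persistent connection."""
    pass


def request(send_once, method, path, retries=1):
    """Run a request, silently reopening and retrying if the connection dropped."""
    for attempt in range(retries + 1):
        try:
            return send_once(method, path)
        except ConnectionDropped:
            if attempt == retries:
                raise   # connection could not be restored; now the application cares


calls = []

def flaky_send(method, path):
    # Simulated transport: fails once (dropped connection), then succeeds.
    calls.append(method)
    if len(calls) == 1:
        raise ConnectionDropped()
    return 200

status = request(flaky_send, "GET", "/index.html")  # retried transparently
```

The application sees a single successful request; only an unrestorable connection surfaces as an error.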
I think both of these might be potentially interesting to the application. For example, if I am choosing a WebDAV server, I might want to choose it based on whether it supports these features. The application should be able to query whether the server supports them, and should be able to say in the initialization phase that it will only accept servers that do. It should also be able to toggle such requirements on and off. But still, if the application does not care about these, it should not be forced to deal with this complexity - there we agree. -- HeikkiToivonen - 16 Mar 2005
Can the client abstract the support, or lack of support, for ETags? If the server does not support ETags, can the client supply a replacement that works adequately?
Even if that's possible, there's still a lot of implementation dependencies around ETags. For example:
- If the server supports weak ETags, the client should know that and ask ASAP for a strong ETag. A weak ETag is useless in collaboration scenarios.
- If the server does return a strong ETag in response to PUT requests, that's great and the client should cache that. If not, the client should do a HEAD or PROPFIND to ask for the ETag so that it can be cached anyway without the application having to worry about it.
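The PUT fallback in the second point can be sketched as follows. `do_put` and `do_head` are hypothetical transport callables returning response-header dicts; they are stand-ins for whatever the real client uses:

```python
def put_and_cache_etag(cache, path, body, do_put, do_head):
    """PUT a resource and cache its ETag, falling back to HEAD if needed."""
    headers = do_put(path, body)
    etag = headers.get("ETag")
    if etag is None:
        # Server didn't return an ETag on PUT; ask for it explicitly, so
        # the application never has to worry about the difference.
        etag = do_head(path).get("ETag")
    if etag is not None and not etag.startswith("W/"):
        cache[path] = etag   # only a strong ETag is worth caching
    return etag
```

A PROPFIND for the `getetag` property would serve equally well as the fallback; HEAD is just the simpler request to show here.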
If a client library had a good representation of a resource, it would become pretty easy to cache information so that the client doesn't have to make so many round trips on behalf of the application. An application can first ask "Does this resource support WebDAV?" and then ask "Does this resource support locking?", and the client can cache the answers to both questions, since they are answered in the same OPTIONS response.
Some information can be deduced from other information already in the cache. For example, if a resource's parent supports WebDAV, then that resource MUST also support WebDAV. The client can either propagate that information as the cache is filled in or calculate it dynamically, and avoid a round trip.
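Both ideas, answering several questions from one cached OPTIONS response and deducing a child's WebDAV support from its parent, fit in a small sketch. `fetch_options` is a hypothetical transport callable returning the `DAV` response header (e.g. `"1, 2"`); the class name and methods are invented for illustration:

```python
import posixpath

class OptionsCache:
    def __init__(self, fetch_options):
        self._fetch = fetch_options   # callable: path -> DAV header value, e.g. "1, 2"
        self._dav = {}                # normalized path -> set of DAV compliance classes

    @staticmethod
    def _key(path):
        return path.rstrip("/") or "/"

    def _classes(self, path):
        key = self._key(path)
        if key not in self._dav:
            header = self._fetch(path) or ""
            self._dav[key] = set(t.strip() for t in header.split(",") if t.strip())
        return self._dav[key]

    def supports_webdav(self, path):
        # Members of a WebDAV collection must themselves support WebDAV,
        # so a cached answer for the parent saves a round trip for the child.
        parent = posixpath.dirname(self._key(path))
        if "1" in self._dav.get(parent, set()):
            return True
        return "1" in self._classes(path)

    def supports_locking(self, path):
        # Answered from the same cached OPTIONS response as supports_webdav.
        return "2" in self._classes(path)
```

Here the deduction is calculated dynamically on lookup; propagating it eagerly as the cache fills in would work just as well.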
In order for caching to work well, I propose:
- Eventually we'll do some kind of cache timeout and toss out decayed information. The timeout might be short for some information (a resource's lock state or ETag) and long for other information (whether a resource supports WebDAV, for example), or uniform for simplicity's sake.
- Whenever the client does a PROPFIND the client consistently asks for the most common properties needed in the cache, so that the cache can be updated via piggybacking on any property request.
- It's possible that the client might be used by more than one thread or piece of code in the application. If so, it would be great for the cache to be shared and always valid. Otherwise, a naive approach like 'zanshin' might quickly end up with several identical ServerHandle objects used by different parts of the application, each with its own cache largely duplicating the others.
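The per-property decay in the first point can be sketched like this. The TTL values, the property names, and the explicit `now` parameter (passed in rather than read from the clock, to keep the sketch testable) are all illustrative choices, not anything a real library mandates:

```python
# Fast-changing facts expire quickly; stable facts slowly.
TTLS = {"etag": 30.0, "lockstate": 30.0, "dav": 3600.0}

class PropertyCache:
    def __init__(self):
        self._data = {}   # (path, prop) -> (value, stored_at)

    def put(self, path, prop, value, now):
        self._data[(path, prop)] = (value, now)

    def get(self, path, prop, now):
        entry = self._data.get((path, prop))
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > TTLS.get(prop, 60.0):
            del self._data[(path, prop)]   # decayed; force a refetch
            return None
        return value
```

A `get` returning `None` is the client's cue to issue a fresh PROPFIND or OPTIONS, which (per the second point above) should always ask for the full set of commonly cached properties.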
Pipelining is the most difficult feature to support and might not be worth it. I haven't prototyped this kind of thing at all so I have only a rough idea how this might work.
- We'd have to have some kind of queue for Request objects, and some way to notify the application when a Response was available.
- A client method like "createNewChild" would return zero or more Request objects (or put them in the queue). The application calling the method would have to get some kind of callback handle to know when the new object was successfully created (by examining the Response).
- The client might have to examine what's in the queue (already sent and not answered, or not yet sent) to determine whether a new item goes into the queue fully pipelined, blocked on the prior Request, or blocking the subsequent Request. This is tricky to determine, although failure to pipeline appropriately would only rarely cause actual problems.
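The queue-plus-callback shape described in these bullets might look roughly like this. Everything here is invented for illustration; in particular, the scheduling question is reduced to a single crude rule (only idempotent methods pipeline, and only behind other idempotent methods), which is far simpler than a real implementation would need:

```python
from collections import deque

IDEMPOTENT = {"GET", "HEAD", "OPTIONS", "PROPFIND"}

class Request:
    def __init__(self, method, path, callback):
        self.method = method
        self.path = path
        self.callback = callback   # called with the Response when it arrives

class Pipeline:
    def __init__(self):
        self._queue = deque()

    def safe_to_pipeline(self, request):
        # A non-idempotent request, or anything queued behind one,
        # must block rather than pipeline.
        return request.method in IDEMPOTENT and all(
            r.method in IDEMPOTENT for r in self._queue)

    def submit(self, request):
        self._queue.append(request)

    def deliver(self, response):
        # HTTP pipelined responses come back in request order;
        # hand each one to the callback of the oldest outstanding request.
        self._queue.popleft().callback(response)
```

So a "createNewChild" call would build Request objects, submit them, and hand the application a callback that fires from `deliver`.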
We should check with projects that have implemented pipelining to get information about the difficulty of implementing it as well as what the wins were. I'm going to ask Mozilla. -- HeikkiToivonen - 16 Mar 2005
Mark Nottingham pointed me to a library that does pipelining in .NET: http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=268
No matter how good a protocol implementation library is, at some point it needs to be extended. Perhaps the application needs to do something complex like create entirely new methods, headers or bodies, when a major extension to HTTP is used. Or perhaps the application only needs to make minor tweaks -- e.g. the ability to add a certain header to certain otherwise-standard requests. The client library must not preclude this, and should not make it too difficult.
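For the minor-tweaks end of that spectrum, a simple hook mechanism is often enough: let the application register a function that can inspect and mutate outgoing requests. The sketch below invents `Client`, `add_request_hook` and `build_request` purely for illustration; major extensions (new methods, new body types) would need deeper entry points than this:

```python
class Client:
    def __init__(self):
        self._request_hooks = []

    def add_request_hook(self, hook):
        """Register hook(method, path, headers); it may mutate the headers dict."""
        self._request_hooks.append(hook)

    def build_request(self, method, path):
        headers = {"Host": "example.com", "User-Agent": "sketch/0.1"}
        for hook in self._request_hooks:
            hook(method, path, headers)
        return method, path, headers


client = Client()

def add_brief(method, path, headers):
    # Application tweak: add a header to certain otherwise-standard requests.
    if method == "PROPFIND":
        headers["Brief"] = "t"

client.add_request_hook(add_brief)
```

The library stays in charge of the standard request, while the application gets its extra header without subclassing or forking the client.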