Discussion with John
Similarity of indexing on client and server
John's major points:
- that the indexing must be the same on client and server
- that the same work we're doing to make local indexing fast based on our semantics, will have to be redone on the server
- if we didn't have to redo the indexing work on the server, we'd have less of a server project scope
Note that the third point depends on the second, which in turn depends on the first. Also:
- there exist tools (Sleepycat) to make the server and client work the same way
Debatable points
- the indexing shouldn't be the same on client and server because they're responding to different requests (API requests vs. protocol requests have different characteristics, different requirements, different frequencies)
- the work on local indexes will not need to be redone on the server:
  - see first point -- the work would be different anyway
  - servers do have indexes already that mostly work for arbitrary data, even though the server doesn't know which properties are important to Chandler
  - even if the server searches are slow, who cares because the search is dwarfed by the round-trip time
- server project scope lies mostly in work other than index tweaking, so saving work there, even if possible, is a minor saving. Most of the savings come from starting with a server that already works and scales to many users and many pieces of hardware (that's where the big work is)
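The round-trip argument above can be made concrete with a little arithmetic. The sketch below uses made-up numbers (an 80 ms network round trip, a 2 ms Chandler-tuned search, a 10 ms generic search); none of them are measurements, and `search_share` is a hypothetical helper name used only for illustration.

```python
def search_share(search_ms: float, round_trip_ms: float) -> float:
    """Fraction of total request latency spent in the server-side search."""
    return search_ms / (search_ms + round_trip_ms)


# Illustrative, assumed numbers only; not measurements of any real server.
ROUND_TRIP_MS = 80.0      # network round trip between client and server
TUNED_SEARCH_MS = 2.0     # search using an index tuned for Chandler's semantics
GENERIC_SEARCH_MS = 10.0  # search using the server's generic index

tuned_total = TUNED_SEARCH_MS + ROUND_TRIP_MS      # 82.0 ms
generic_total = GENERIC_SEARCH_MS + ROUND_TRIP_MS  # 90.0 ms

# The search itself is 5x slower, but the whole request is under 10% slower.
slowdown = generic_total / tuned_total
```

Under these assumptions the generic search accounts for about a ninth of the request time, so tuning the server index would change what the user sees only slightly. With a much faster network or a much slower search the balance would shift.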
Source of Truth
We discussed whether it's the client, a set of clients, or the server that is the "source of truth" (or some combination) in a WebDAV-based sharing architecture. I argued that we can choose for ourselves regardless of whether we use WebDAV. Some protocols may foster one "source of truth" but WebDAV doesn't particularly. Rather than "source of truth" I prefer to use phrases like "authoritative" and "high-fidelity".
Some analogies to understand:
- POP3 fosters an architecture where the client has a higher-fidelity view of the user's data. When the client moves an email into a folder, that folder must be a local one, so the server no longer has a copy of that email in the correct location. It would be difficult for a client to make the server copy of the data complete, authoritative and high-fidelity. POP3 servers are very easy to write.
- IMAP fosters an architecture where the server is the source of truth. However, some clients download IMAP mail to local folders which aren't replicated to the server, so often this is more of a mixed model. Also note that a client can shift to a more client-authoritative model because the client can choose to overwrite server data with its own data. It's just that IMAP tends to work better when clients overwrite their client copy with the server data when there's a conflict.
WebDAV is more neutral than either POP3 or IMAP. A client like 'sitecopy' actually has two different modes: one where it overwrites the server data with client data, and one where it does the opposite. The client issues the commands and requests to the server, so the client is in control and can choose the replication and conflict-resolution model.
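A rough sketch of what "the client chooses the conflict-resolution model" could look like. Plain dictionaries stand in for the client and server stores (a real client would issue WebDAV requests such as GET and PUT instead), and the names `ConflictPolicy` and `synchronize` are hypothetical. This does not model how sitecopy itself is implemented, and it ignores deletions.

```python
from enum import Enum


class ConflictPolicy(Enum):
    CLIENT_WINS = "client_wins"  # overwrite server data with client data
    SERVER_WINS = "server_wins"  # overwrite client data with server data


def synchronize(client: dict, server: dict, policy: ConflictPolicy) -> None:
    """Make both stores identical, resolving differing values per `policy`.

    Resources present on only one side are copied to the other side.
    Deletions are not handled in this sketch.
    """
    for path in set(client) | set(server):
        if path in client and path in server:
            if client[path] != server[path]:
                # Conflict: the client-side code decides which copy survives.
                if policy is ConflictPolicy.CLIENT_WINS:
                    server[path] = client[path]
                else:
                    client[path] = server[path]
        elif path in client:
            server[path] = client[path]
        else:
            client[path] = server[path]


client_store = {"/cal/a": "client edit", "/cal/b": "only on client"}
server_store = {"/cal/a": "server edit", "/cal/c": "only on server"}
synchronize(client_store, server_store, ConflictPolicy.CLIENT_WINS)
```

The point of the sketch is that the policy is an argument supplied by the client code. Nothing on the server side has to change to switch from a client-authoritative model to a server-authoritative one.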
Later thoughts
- The idea that you can brush local/remote repository differences (and the latency issues of remote repositories) under the rug, and not have performance issues, is wishful thinking.
- An API can be designed for networking, in which case it is (probably) harder to use but performs better, or it can be designed for local data, in which case it is easier to use but slow. I would prefer to design for networking and make the API as good as it can be while still making the right granularity (and round-trip) choices for the network protocol chosen.
- The network protocol is going to be more locked in than anything else in the system -- more so than the development APIs, or the client indexes or server indexes. So it's an important choice and drives other choices, particularly API design which is far more of an open field.
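The granularity point can be illustrated with a toy repository that counts round trips. Everything here is hypothetical (`RemoteRepository`, `get_item`, `get_items`); it is not Chandler's repository API, only a sketch of the difference between an API shaped like local data access and one shaped for the network.

```python
class RemoteRepository:
    """Toy stand-in for a remote store that counts network round trips."""

    def __init__(self, items: dict):
        self._items = dict(items)
        self.round_trips = 0

    def get_item(self, key):
        """Fine-grained call, convenient for local-style code: one round trip per item."""
        self.round_trips += 1
        return self._items[key]

    def get_items(self, keys):
        """Coarse-grained call, designed for the network: one round trip per batch."""
        self.round_trips += 1
        return {key: self._items[key] for key in keys}


repo = RemoteRepository({f"item{i}": i for i in range(100)})
keys = [f"item{i}" for i in range(100)]

# Local-style usage: simple to write, but pays for 100 round trips.
one_by_one = {key: repo.get_item(key) for key in keys}
chatty_trips = repo.round_trips

# Network-style usage: the caller must think in batches, but pays for one round trip.
repo.round_trips = 0
batched = repo.get_items(keys)
batched_trips = repo.round_trips
```

Both calls return the same data. The difference is that the batched call forces the caller to decide up front what it needs, which is the "harder to use" cost of designing for the network.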
--
LisaDusseault - 29 Oct 2004