Cosmo 0.2 with Derby
As I mentioned in last week's notes, Jackrabbit can persist to an RDBMS instead of directly to the filesystem. Stefan from the Jackrabbit team suggested that I try the DerbyPersistenceManager, embedding Derby into the app server.
Also, Cyrus pointed out that I was neglecting to account for the indexes Jackrabbit creates for efficient querying of the repository. That's even more disk overhead.
| After this operation... | DB disk | Index disk | Total workspace disk |
| sign up for account | 1,956kb | 76kb | 2,032kb |
| share 535-event test calendar | 16,120kb | 1,192kb | 17,312kb |
That breaks down to about 32kb per event (17,312kb / 535 events), compared to the 92kb required by ObjectPersistenceManager (and remember, that number doesn't account for indexes).
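As a quick check on the per-event figure, here is a minimal sketch that recomputes it from the table. The variable names are mine, and the "marginal" number (subtracting the freshly created account's footprint first) is an extra calculation not used elsewhere in these notes.

```python
# Figures taken from the table above, in kb.
ACCOUNT_TOTAL_KB = 2_032      # workspace after signing up for an account
CALENDAR_DB_KB = 16_120       # DB disk after sharing the test calendar
CALENDAR_INDEX_KB = 1_192     # index disk after sharing the test calendar
EVENTS = 535

calendar_total_kb = CALENDAR_DB_KB + CALENDAR_INDEX_KB

# Per-event cost as used in these notes: whole workspace / number of events.
per_event_kb = calendar_total_kb / EVENTS

# Hypothetical alternative: exclude the baseline cost of the empty account.
marginal_per_event_kb = (calendar_total_kb - ACCOUNT_TOTAL_KB) / EVENTS

print(f"total workspace: {calendar_total_kb:,}kb")
print(f"per event: {per_event_kb:.1f}kb")
print(f"per event, excluding account baseline: {marginal_per_event_kb:.1f}kb")
```

Either way the result is roughly a third of the 92kb measured with ObjectPersistenceManager.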
So our total disk overhead will be:
32kb/event * 1000 events/calendar * 1 calendar/user * 10k users = 320,000,000kb = ~300gb
Accounting for a 15mb user quota, our total disk requirement becomes:
15mb/user * 10k users = 150,000mb = ~150gb quota
~150gb quota + ~300gb metadata = ~450gb total disk
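The projection above can be sketched as a small script so the inputs are easy to vary. The function names and defaults are illustrative, and I am assuming binary units (1gb = 1024mb = 1,048,576kb), which is what makes 320,000,000kb come out near 300gb.

```python
KB_PER_GB = 1024 * 1024   # assumption: binary units
MB_PER_GB = 1024


def overhead_gb(kb_per_event, events_per_calendar=1000,
                calendars_per_user=1, users=10_000):
    """Projected repository overhead (DB plus indexes) in gb."""
    total_kb = kb_per_event * events_per_calendar * calendars_per_user * users
    return total_kb / KB_PER_GB


def quota_gb(mb_per_user=15, users=10_000):
    """Disk reserved for user quotas in gb."""
    return mb_per_user * users / MB_PER_GB


metadata = overhead_gb(32)
quota = quota_gb()
print(f"metadata: {metadata:.0f}gb")
print(f"quota:    {quota:.0f}gb")
print(f"total:    {metadata + quota:.0f}gb")
```

The unrounded values land slightly off the round numbers in the text (about 305gb and 146gb), which is well within the precision of the estimate.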
Compare that to the ~6tb projected last week.
Now obviously the overhead will increase in 0.3 when we have to index calendar properties and parameters to support CalDAV reports. Cyrus estimates that this index data will require 2-3x the size of the raw iCalendar stream itself. The 535-event calendar uses 4,368kb to store the raw event streams, so let's allot 3x that, or 13,104kb, or 24.5kb per event. Added to the current 32kb, that makes 56.5kb per event, which brings us to ~540gb of disk overhead, almost twice what we require now, and a total disk requirement of 690gb, or two thirds of what we projected with ObjectPersistenceManager without the metadata.
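The 0.3 estimate can be reproduced the same way. This is a sketch under the same assumptions as the numbers above (1000 events per calendar, one calendar per user, 10k users, binary units); the constant names are mine, and the multiplier is the upper end of Cyrus's 2-3x estimate.

```python
RAW_STREAM_KB = 4_368        # raw iCalendar streams for the test calendar
EVENTS = 535
INDEX_MULTIPLIER = 3         # upper end of the 2-3x estimate
CURRENT_KB_PER_EVENT = 32    # measured with DerbyPersistenceManager

EVENTS_PER_USER = 1000       # 1000 events/calendar * 1 calendar/user
USERS = 10_000
QUOTA_MB_PER_USER = 15
KB_PER_GB = 1024 * 1024      # assumption: binary units

extra_kb_per_event = RAW_STREAM_KB * INDEX_MULTIPLIER / EVENTS
kb_per_event = CURRENT_KB_PER_EVENT + extra_kb_per_event

overhead = kb_per_event * EVENTS_PER_USER * USERS / KB_PER_GB
quota = QUOTA_MB_PER_USER * USERS / 1024

print(f"extra per event: {extra_kb_per_event:.1f}kb")
print(f"overhead: {overhead:.0f}gb")
print(f"total:    {overhead + quota:.0f}gb")
```

Setting INDEX_MULTIPLIER to 2 gives the lower bound of the estimate.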
In terms of disk, using Derby for storage is a clear win. When we get automated testing we'll be able to evaluate its performance as well.