If you care about distributed systems, you need to read the paper about Amazon's Dynamo.

Comments:

  • Making node joining/leaving an administrative command is not something most academics consider, but it significantly reduces complexity.  We made a similar decision with the PodServer system for Bloglines.  I believe this is the right decision, since a node changing membership is a rare event over the long term.  Even with our growing blog index, we only add new nodes once every 6 months or so. (Plan ahead :-) )

  • Shout out to BerkeleyDB.  Glad to see other people pushing it hard. Combined with the older white-paper about Google using BerkeleyDB for their Google Accounts system, it just validates my positive feelings on continuing to use it as a core part of the Bloglines architecture.

  • The configurability of N/R/W (the number of replicas, and how many of them must answer a read or a write) is a great idea.  Most systems make N configurable, but skimp on giving that full flexibility to the people using the system.

  • I'm convinced I need to read more about Vector Clocks.  For the Bloglines PodServer, we are blessed with only a single writer per record due to how our crawlers work, so we just 'cheat' on versioning, but this has caused us pain a few times.

  • I wish Amazon would Open Source Dynamo. I can understand the difficulties in doing that, but it's a nice thing to dream about.

  • I think I will propose an Apache Labs project to start something like Dynamo.  For a basic key/value storage system on a consistent hashing ring, without all of the High Availability concerns, you could get something working pretty quickly.  Adding all of the high end features could take time, of course...
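
To give a feel for how small the "basic" version could be, here is a rough sketch of a consistent hashing ring that picks N replicas for a key, plus the R + W > N check that makes read and write sets overlap.  This is not Dynamo's code or PodServer's; the names (Ring, preference_list, quorum_overlaps), the use of MD5, and the 8 virtual points per node are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 is just an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent hashing ring with a few virtual points per node."""

    def __init__(self, nodes, points_per_node=8):
        self._node_count = len(set(nodes))
        # Each node is placed on the ring several times to even out the load.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in set(nodes)
            for i in range(points_per_node)
        )
        self._positions = [position for position, _ in self._points]

    def preference_list(self, key: str, n: int):
        """Return the first n distinct nodes found walking clockwise from the key."""
        if n > self._node_count:
            raise ValueError("n cannot exceed the number of distinct nodes")
        index = bisect.bisect_right(self._positions, _hash(key))
        chosen = []
        while len(chosen) < n:
            # The modulo wraps the walk around the end of the ring.
            node = self._points[index % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
            index += 1
        return chosen


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True when every read set of size r must intersect every write set of size w."""
    return r + w > n


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.preference_list("blog:12345", 3))
    print(quorum_overlaps(3, 2, 2))  # reads and writes always share a replica
    print(quorum_overlaps(3, 1, 1))  # faster, but a read may miss the latest write
```

Everything hard is missing here: membership changes, failure handling, versioning and repair.  That is where the time would go.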

More discussion over on Sam Ruby's blog: Key + Data.

This all ties in nicely with the GeekSessions 1.2 topic of "Designing beyond the database", where I presented last night.



