Archive for October, 2007

new software: mod_timer

Thursday, October 25th, 2007

Do you have a custom logging module?

Ever wondered how long it took to actually finish logging?

At $work I was helping with some Apache problems, and we wanted to know how long it takes before an Apache worker process actually goes back into the accept queue.

So mod_timer was born. It hooks into Apache when the connection is accepted, before we start reading any data, and the timer ends when the connection memory pool is destroyed. It also performs the same measurements on requests inside the connection.

It produces a log file like this:

r:127.0.0.1:51886:1193364411078414:1568
r:127.0.0.1:51886:1193364411077069:3034
r:127.0.0.1:51886:1193364411080117:21150
r:127.0.0.1:51886:1193364411101293:99477
r:127.0.0.1:51886:1193364411200792:1856762
r:127.0.0.1:51886:1193364413057577:7000364
c:127.0.0.1:51886:1193364411077016:8980955
r:127.0.0.1:51887:1193364427909070:96608
r:127.0.0.1:51887:1193364428006034:2031335
r:127.0.0.1:51887:1193364430037392:1086699
r:127.0.0.1:51887:1193364431124508:916482
r:127.0.0.1:51887:1193364432041014:5315190
c:127.0.0.1:51887:1193364427909020:9447211

Log Fields:

  • ‘r’ or ‘c’, indicating whether a request or a connection is being logged
  • Remote IP Address
  • Remote Port
  • Start time, as an apr_time_t (64-bit integer, microseconds since 1970)
  • Run time in microseconds

Using this, it becomes easier to look for ‘evil’ clients that are doing things like sending one byte of a GET request a second.
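As a rough sketch of how such a log could be searched for slow clients, the Python below parses lines in the format shown above and returns the connections that stayed open longer than a threshold. The names `TimerEntry`, `parse_line` and `slow_connections`, and the 5 second default threshold, are made up for this example and are not part of mod_timer.

```python
from collections import namedtuple

# Hypothetical record type for one mod_timer log line.
TimerEntry = namedtuple("TimerEntry", "kind ip port start_us duration_us")


def parse_line(line):
    """Parse one 'kind:ip:port:start:runtime' line into a TimerEntry."""
    kind, rest = line.strip().split(":", 1)
    # Split from the right so the address field may itself contain colons
    # (an assumption about how an IPv6 address would be logged).
    ip, port, start_us, duration_us = rest.rsplit(":", 3)
    return TimerEntry(kind, ip, int(port), int(start_us), int(duration_us))


def slow_connections(lines, threshold_us=5_000_000):
    """Return the connection ('c') entries that lasted at least threshold_us."""
    entries = (parse_line(line) for line in lines if line.strip())
    return [e for e in entries
            if e.kind == "c" and e.duration_us >= threshold_us]


if __name__ == "__main__":
    sample = [
        "r:127.0.0.1:51886:1193364413057577:7000364",
        "c:127.0.0.1:51886:1193364411077016:8980955",
    ]
    for entry in slow_connections(sample):
        print(entry.ip, entry.port, entry.duration_us / 1_000_000, "seconds")
```

A long connection on its own is not proof of a misbehaving client, so in practice you would probably want to compare connection time against the request times inside it before blaming anyone.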

gltail

Monday, October 8th, 2007

We thought gltail sounded pretty cool. So we hooked it into bloglines.com:

[gltail screenshot]

(Screenshot is clipped to protect user data)

It worked okay for 1 webserver. But hooked up to the entire cluster, it was just a little bit too slow, drawing a new frame only once every 8 seconds. Time to port it to C :-)

geeksessions presentation

Thursday, October 4th, 2007

Slides from my GeekSessions 1.2 Presentation

Video from the presentation and panel is supposed to show up here at some point soon.

goodbye RDBMs

Thursday, October 4th, 2007

The End of an Architectural Era (It’s Time for a Complete Rewrite)
[Via Wesley Felter]

With CouchDB gaining traction, and the recent paper on Dynamo, it feels like people everywhere are dropping their relational databases.  Are the database vendors going to figure this out, and change their products in time to matter?

Dynamo

Wednesday, October 3rd, 2007

If you care about distributed systems, you need to read the paper about Amazon’s Dynamo.

Comments:

  • Making node joining/leaving an administrative command is not something most academics consider, but it significantly reduces complexity. We made a similar decision with the PodServer system for Bloglines. I believe this is the right decision, since a node changing membership is a rare event in the long term. Even with our growing blog index, we only add new nodes once every 6 months or so. (Plan ahead :-) )
  • Shout out to BerkeleyDB.  Glad to see other people pushing it hard. Combined with the older white-paper about Google using BerkeleyDB for their Google Accounts system, it just validates my positive feelings on continuing to use it as a core part of the Bloglines architecture.
  • The configurability of N/R/W is a great idea.  Most systems make N configurable, but skimp out on giving full flexibility to the people using the system.
  • I’m convinced I need to read more about Vector Clocks. For the Bloglines PodServer, we are blessed with having only a single writer per record due to how our crawlers work, so we just ‘cheat’ on versioning, but this has caused us pain a few times.
  • I wish Amazon would open source Dynamo. I can understand the difficulties in doing that, but it’s a nice thing to dream about.
  • I think I will propose an Apache Labs project to start something like Dynamo. For a basic key/value storage system on a consistent hashing ring, without all of the High Availability concerns, you could get something working pretty quickly. Adding all of the high end features could take time of course…
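To give a feel for how small the basic version could be, here is a minimal Python sketch of a consistent hashing ring of the kind the Dynamo paper describes, plus the usual R + W > N overlap check for the N/R/W settings. It is an illustration only, not Dynamo's or PodServer's implementation: the `HashRing` class, its method names, the choice of SHA-256 and the 64 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Joining is an explicit, administrative call, as in the first comment.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def preference_list(self, key, n=3):
        """Return up to n distinct nodes found walking clockwise from key."""
        if not self._points:
            return []
        start = bisect.bisect_left(self._points, (_hash(key), ""))
        result = []
        for i in range(len(self._points)):
            node = self._points[(start + i) % len(self._points)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


def quorum_overlaps(n, r, w):
    """True when every read set of size r must intersect every write set of size w."""
    return r + w > n


if __name__ == "__main__":
    ring = HashRing(["pod1", "pod2", "pod3", "pod4"])
    print(ring.preference_list("feed:12345", n=3))
    print(quorum_overlaps(3, 2, 2), quorum_overlaps(3, 1, 1))
```

Replication, failure detection, hinted handoff and versioning are all missing here, which is exactly the part that takes the time.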

More discussion over at Sam Ruby’s blog: Key + Data.

This all ties in nicely with the GeekSessions 1.2 topic of “Designing beyond the database”, where I presented last night.