I will be at ApacheCon EU 2008, in Amsterdam in a week or so. (April 6-11)
Not giving any talks this year.
I will also be at Joost’s Leiden office the week following. (April 12-19)
I spent the last week or so in New York City, at the Joost office there. I kinda forgot to post anything. This was the first time I had been to NYC, and it was a fun trip. Not sure I would ever want to live in NYC.
Returning to San Jose was not so fun.
Yesterday, American Airlines cancelled a couple hundred flights due to problems with their MD-80s.
I was originally going LGA -> ORD -> SJC.
The ORD -> SJC leg got cancelled.
They re-routed me LGA -> DFW -> SJC.
When I landed in DFW, the flight to SJC had been cancelled.
There were no more flights to SJC.
The flights to SFO and OAK were all fully booked.
I luckily got to the top of the standby list for OAK, and got on that flight.
My checked bag, however, is somewhere between New York and California, and American Airlines doesn’t know where it is yet. Sigh.
Monday will be my first day at Joost.
Today is my last day at Bloglines aka Ask.com aka IAC Search and Media aka IAC/Interactive.
It’s been a fun couple years here, and I am very grateful for the great team I helped build at Bloglines, but it is time for me to move on.
In reply to Scoble’s post today, “Bloglines Sucks“…..
I will first try to outline the “issue”.
At the bottom of every post on a wordpress.com blog is a tracker image used for statistics. It includes a rand parameter, which changes every time the feed is fetched over HTTP. The image URL is something like this:
http://stats.wordpress.com/b.gif?host=scobleizer.com&rand=2045631674&blog=3428&post=3957&subd=scobleizer&ref=&feed=1
Because this rand value changes every time we read the feed, we considered the Item ‘Updated‘.
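To make the effect concrete, here is a minimal sketch of how a changing rand value can make an otherwise identical item look edited, and how ignoring that parameter avoids it. This is not the actual Bloglines code: the names VOLATILE_PARAMS, strip_volatile and fingerprint are hypothetical, and it assumes an item is compared by hashing its HTML and that URLs in the item are not entity-escaped.

```python
import hashlib
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters treated as noise when comparing items.
VOLATILE_PARAMS = {"rand"}

# Rough URL matcher for this sketch; stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def strip_volatile(url):
    """Return the URL with the volatile query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in VOLATILE_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def fingerprint(item_html, normalize=True):
    """Hash an item's HTML, optionally after normalizing the URLs in it."""
    if normalize:
        item_html = URL_RE.sub(lambda m: strip_volatile(m.group(0)), item_html)
    return hashlib.sha1(item_html.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    tracker = ("http://stats.wordpress.com/b.gif?host=scobleizer.com"
               "&rand={}&blog=3428&post=3957&subd=scobleizer&ref=&feed=1")
    first = '<p>Post body</p><img src="%s"/>' % tracker.format(2045631674)
    second = '<p>Post body</p><img src="%s"/>' % tracker.format(12345)
    # Without normalization the two fetches look like different items.
    print(fingerprint(first, normalize=False) == fingerprint(second, normalize=False))
    # With the rand parameter ignored they compare equal.
    print(fingerprint(first) == fingerprint(second))
```

A real edit to the post body still changes the fingerprint, so the "Updated" feature would keep working for genuine edits.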
The last 40 posts showed up as updated every time a new post was added because of our use of the HTTP ETag and Last-Modified features. Since Wordpress.com returns a 304 Not Modified for most of our crawls, we would only ‘reparse’ the entire feed when a new post was added.
Now, the reason users do not see this problem in Google Reader is that Google Reader has no concept of an “Updated” item. When a writer edits a blog post later, users in Google Reader would never see the changes. In Bloglines, we have always considered this a feature: showing you, the user, when a blog post is edited.
In Bloglines you can disable this feature, on a per-feed basis:
In Bloglines Beta, click on the feed, then select Edit. Change the “Updated Items:” to “Ignore”.
In Bloglines Classic, click on the feed, then select edit subscription. Change the “Updated Items:” to “Ignore”.
As far as I can tell, the use of a rand parameter in the Wordpress.com statistics image is a new change, also introduced at the same time the inline comment images were added to feeds.
FeedBurner includes similar statistics, tracking images and comment images, but they do not include a constantly changing image URL. This works correctly in Bloglines.
In regards to placing blame, Dana Epp says “Bloglines says it’s not them”. I don’t know who Dana has talked to inside Bloglines. When these types of issues are reported, we generally try to get in touch and investigate with the publisher, and hopefully figure out what is going on together, rather than outright saying it’s not our fault. It is a bad experience for our users, and we always want to be involved and help fix it.
I first heard about this issue on Friday, December 21st from Matt via email. (also my birthday) I forwarded that email onto our internal Bloglines Engineering Mailing list, but frankly, I didn’t expect anyone to work on the issue on the Friday before Christmas. IAC Search and Media, the parent company of Bloglines and Ask.com, also has a mandatory Holiday Shutdown this week for all employees. No one will be in the office officially until January 2nd, 2008.
Luckily or unluckily, depending on your perspective, I took some time this afternoon away from my family to read my feeds. For now the bug^H^H^Hfeature in Bloglines of showing edited posts has been fixed. I have simply turned it off for all users.
I hope you had a Merry Christmas, and have a Happy New Year.
Getting older…
Really getting tired this year. Since ApacheCon in Atlanta on November 11th, I haven’t been home in San Jose for more than 6 days straight.
Thankfully, I have the next 2 weeks in Spokane at my parents’ house to chill for a bit.
Brian McCallister has a new post on a service location technique dubbed “Shredding”. This post started out as a comment on Brian’s site, but it got a little long….
All that said, for the Bloglines FS, we proxy writes to the data storage nodes, but that is mostly to ensure redundancy of data. For reads, we send back a sorted list of the data nodes that have a chunk to the client. The client then connects directly, and will try the other entries on the list if the first one fails.
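The read path described above can be sketched roughly like this. It is only an illustration, not the Bloglines FS client: read_chunk and the fetch callable are hypothetical names, and it assumes a failed node surfaces as an OSError (which covers connection errors and socket timeouts in Python).

```python
def read_chunk(nodes, fetch):
    """Try each data node in the order the server sorted them.

    nodes: list of node addresses that hold the chunk, best candidate first.
    fetch: callable taking one node address and returning the chunk bytes,
           raising OSError if that node cannot be reached.
    """
    errors = []
    for node in nodes:
        try:
            return fetch(node)
        except OSError as exc:
            # Remember the failure and fall through to the next node.
            errors.append((node, exc))
    raise OSError("all %d data nodes failed: %r" % (len(nodes), errors))


if __name__ == "__main__":
    # Fake transport for the sketch: the first node is down, the second works.
    def fake_fetch(node):
        if node == "node-a:9000":
            raise ConnectionRefusedError("node-a is down")
        return b"chunk data from " + node.encode("ascii")

    print(read_chunk(["node-a:9000", "node-b:9000"], fake_fetch))
```

Keeping the retry loop in the client means a dead data node costs one failed connection attempt rather than a round trip back through a proxy.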
See also:
Now in httpd trunk: mod_serf. A reverse proxy module that uses Serf for its HTTP Client. Woot.
Don’t steal music. Thank You Apple for the reminder.
I wonder if new iPhones will include wrappers saying ‘Don’t jail break‘.
Well, of course they won’t; this is Apple we are talking about. It would be more like:
Don’t jail break
No encarcele la rotura
Setzen Sie nicht Bruch gefangen
壊れ目を拘留してはいけない
Do you have a custom logging module?
Ever wondered how long it took to actually finish logging?
At $work I was helping with some problems with Apache, and we wanted to know how long until an Apache Worker Process actually goes back into the Accept Queue.
So mod_timer is born. It hooks into Apache when the connection is accepted, before we start reading any data, and the timer ends when the connection memory pool is destroyed. It also performs the same measurements on requests inside the connection.
It produces a log file like this:
r:127.0.0.1:51886:1193364411078414:1568
r:127.0.0.1:51886:1193364411077069:3034
r:127.0.0.1:51886:1193364411080117:21150
r:127.0.0.1:51886:1193364411101293:99477
r:127.0.0.1:51886:1193364411200792:1856762
r:127.0.0.1:51886:1193364413057577:7000364
c:127.0.0.1:51886:1193364411077016:8980955
r:127.0.0.1:51887:1193364427909070:96608
r:127.0.0.1:51887:1193364428006034:2031335
r:127.0.0.1:51887:1193364430037392:1086699
r:127.0.0.1:51887:1193364431124508:916482
r:127.0.0.1:51887:1193364432041014:5315190
c:127.0.0.1:51887:1193364427909020:9447211
Log Fields:
Using this, it becomes easier to look for ‘evil’ clients that are doing things like sending one byte of a GET request a second.
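For example, a few lines of Python are enough to pull the slow connections out of a log like the one above. The field meanings here are my reading of the sample output and should be treated as assumptions: record type (r for request, c for connection), client IP, client port, start time in microseconds since the epoch, and duration in microseconds. The names Entry, parse_line and slow_connections are made up for this sketch.

```python
from collections import namedtuple

# Assumed layout: kind:ip:port:start_us:duration_us
Entry = namedtuple("Entry", "kind ip port start_us duration_us")


def parse_line(line):
    """Parse one log line into an Entry."""
    kind, rest = line.strip().split(":", 1)
    # Split from the right so an address containing colons would survive.
    ip, port, start_us, duration_us = rest.rsplit(":", 3)
    return Entry(kind, ip, int(port), int(start_us), int(duration_us))


def slow_connections(lines, threshold_s=5.0):
    """Yield connection ('c') entries that lasted at least threshold_s seconds."""
    for line in lines:
        if not line.strip():
            continue
        entry = parse_line(line)
        if entry.kind == "c" and entry.duration_us >= threshold_s * 1_000_000:
            yield entry


if __name__ == "__main__":
    sample = [
        "r:127.0.0.1:51886:1193364413057577:7000364",
        "c:127.0.0.1:51886:1193364411077016:8980955",
        "c:127.0.0.1:51887:1193364427909020:9447211",
    ]
    for entry in slow_connections(sample, threshold_s=9.0):
        print(entry.ip, entry.port, entry.duration_us / 1_000_000)
```

Grouping the results by client IP would be the obvious next step for spotting a client that repeatedly holds workers open.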
We thought gltail sounded pretty cool. So we hooked it into bloglines.com:
(Screenshot is clipped to protect user data)
It worked okay for 1 webserver. But hooked up to the entire cluster, it was just a little bit too slow, drawing a new frame once every 8 seconds. Time to port it to C.
Slides from my GeekSessions 1.2 Presentation
Video from the presentation and panel are supposed to show up here at some point soon.
The End of an Architectural Era (It’s Time for a Complete Rewrite)
[Via Wesley Felter]
With CouchDB gaining traction, and the recent paper on Dynamo, it feels like people everywhere are dropping their relational databases. Are the database vendors going to figure this out, and change their products in time to matter?
If you care about distributed systems, you need to read the paper about Amazon’s Dynamo.
Comments:
More discussion over at Sam Ruby’s blog: Key + Data.
This all ties in nicely with the GeekSessions 1.2 topic of “Designing beyond the database”, where I presented last night.
Amazon’s MP3 store is almost perfect.
The downloader works great on OSX.
The music is DRM free.
It’s fast. (Faster than the iTunes Music Store!)
Only downside so far is the lack of good recommendations. But I trust Amazon to fix that with a little time and data.
Bought 5 Albums so far.
Thank you Amazon for making an awesome Music Store. It’s about time someone did it right.