New Simple MPM in httpd trunk

October 29th, 2008

Earlier this week, I committed a new ‘Simple’ MPM to Apache httpd trunk.

More info on the mailing lists.

I should write more about the Simple MPM, but ApacheCon New Orleans is coming up next week, and I want to get more of the code in before then, since coding during the conference traditionally never works out. I’ll try to write up a blog post about why it’s cool sometime next week during the conference.

chroot in 2.2.10

October 29th, 2008

Apache HTTP Server 2.2.10 was released more than a week ago.

One of the new features I don’t think anyone has mentioned much is that we now have built-in support for chroot. Just add ChrootDir "/srv/my_root", and all I/O in Apache after the initial startup will be inside the chroot.

the economy.

September 24th, 2008

I don’t often write about politics, or anything related to it, on my journal.

However, the SEVEN HUNDRED BILLION DOLLAR buyout deal that Henry Paulson, the Secretary of the United States Treasury, is trying to get done, is insane.

This isn’t Capitalism.

If I didn’t believe in Capitalism, I wouldn’t be working at a startup.

If Joost fails, I don’t expect anyone to help me. I expect it to hurt. But it is my risk to take. Why do giant financial companies, which have apparently done stupid things, get treated differently?

If Joost is successful, I expect rewards; this is the basis of any capitalistic system. But these buyouts are creating a system in which stupid companies can do badly and their employees still get rewarded.

I didn’t live during the Great Depression.  I don’t know how bad it can get. I have lived an ‘easy’ life in modern America.

But I would rather see another depression than this massive buyout.

Whatever is going on, it is not capitalism, and I am sad to see America going in this direction.

Event MPM Updates and mod_dialup

September 20th, 2008

I know, I know, I haven’t written a blog post in months.

Forgive me :-)

Anyways, I figured some people might be interested in what I committed last night for the Apache HTTP Server...

First is the ability to ‘suspend’ an HTTP request from a module.

Normally in Apache, there is a one-to-one relationship between the running thread and the client connection it is servicing. In the Event MPM, we had previously loosened this so that different HTTP requests on the same TCP connection (aka Keep-Alives) could be serviced by different threads.

Now with r697357 in httpd trunk, you can return SUSPENDED from a content handler. The core will assume you have taken over handling of the request, and get out of your way.

This is insanely important, because in the long run it lets you write very cool modules, like a fully async HTTP proxy (something like Varnish) that could easily handle tens of thousands of client connections using only a hundred-odd threads. Doing that with Apache today takes a thread or process for every client, and a huge amount of RAM.

Secondly, as a demonstration that it works, I wrote mod_dialup. It is a module that sends static content at a limited bandwidth, with the rate defined by the various old modem standards. So, you can browse your site as if over a 56k V.92 modem, by adding something like this:

<Location /mysite>

ModemStandard V.92

</Location>

Previously, to do bandwidth rate limiting, modules would have to block an entire thread for each client, and insert sleeps to slow the bandwidth down. Using the new suspend feature, a handler can get a callback N milliseconds in the future, and it will be invoked by the Event MPM on a different thread once the timer hits. From there the handler can continue to send data to the client.
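To make the timer idea concrete, here is a small Python model of it. This is not mod_dialup's code or the Event MPM API; the function names, the 100 ms tick, and the single timer queue are all assumptions made for illustration. Each transfer sends one chunk, then re-arms a timer instead of sleeping, so one loop can drive many slow clients.

```python
import heapq


def bytes_per_tick(bits_per_second, tick_ms):
    """Bytes that may be sent during one timer tick at the given line rate."""
    return max(1, bits_per_second * tick_ms // (8 * 1000))


def simulate(clients, bits_per_second, tick_ms):
    """Drive every transfer from one timer queue (simulated time, no sleeping).

    `clients` maps a client name to the number of bytes it should receive.
    Returns a dict of client name -> simulated finish time in milliseconds.
    """
    chunk = bytes_per_tick(bits_per_second, tick_ms)
    # Each timer entry is (fire_time_ms, client_name, bytes_remaining).
    timers = [(0, name, size) for name, size in clients.items()]
    heapq.heapify(timers)
    finished = {}
    while timers:
        now, name, remaining = heapq.heappop(timers)
        remaining -= min(chunk, remaining)  # "send" one chunk to the client
        if remaining:
            # Not done: schedule the next callback rather than blocking.
            heapq.heappush(timers, (now + tick_ms, name, remaining))
        else:
            finished[name] = now
    return finished
```

At 56,000 bits per second with a 100 ms tick, `bytes_per_tick` gives 700 bytes, so a 2,100 byte response needs three callbacks in this model, and no thread is parked between them.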

in spokane

May 18th, 2008

To the other few people in Spokane with the Internet: I’m back in Spokane until Sunday, May 25th.

Apache HTTP Server development for this week

April 8th, 2008

The last week has seen many new features and improvements made to httpd. Many of them have been accelerated by people at the ApacheCon EU Hackathon this week.

  • mod_session

    On Friday Graham Leggett introduced a series of modules to support generation of sessions from HTTPD. Included is mod_session_crypto, which encrypts the data using AES. This is the first time ‘form based’ authentication has had real support in the Apache Core.
    [docs: mod_session]
    [thread: Apache support for form authentication]

  • mod_socache

    On Tuesday Joe Orton committed the new Small Object Cache modules, which have been under discussion for a couple of months now. The mod_ssl session cache has been changed to use this. Currently supported cache backends are DBM, memcached, and Shared Memory. I expect many other modules will be changed to use this cache API as time goes on.
    [svn: ap_socache.h]
    [thread: [PATCH] ap_socache.h & mod_socache_*]

  • If/Else blocks added

    Nick Kew ported the expression parser from mod_include, and has used this to add If and Else blocks to the core. This provides a viable alternative to mod_rewrite and RewriteCond, and lets you set any module’s configuration values.
    [docs: if]
    [thread: Dynamic configuration for the hackathon?]
    [commit: r644253]

  • Turkish Documentation

    Nilgün Belma Bugüner contributed a complete translation of the Apache HTTP Server documentation in Turkish.
    [docs: Turkish]
    [thread: New Turkish Documents]
    [commit: r645667]

  • Serf Bucket Discussions

    Discussion at the Hackathon covered how Serf Buckets use a “pull” method for both input and output, unlike the current filter stack in httpd, which is pull for input filters but push for output filters. There was general agreement that the experiment of mod_serf should be expanded up the filter stack.
    [svn: mod_serf.c]

  • Simple MPM created

    Paul Querna started work on a new MPM at the Hackathon. The MPM hopes to run on both Unix and Win32 platforms, and keep the same behaviors on both.
    [svn: SIMPLE.README]

ApacheCon EU 2008

March 27th, 2008

I will be at ApacheCon EU 2008, in Amsterdam in a week or so.  (April 6-11)

Not giving any talks this year.

I will also be at Joost’s Leiden office the week following.  (April 12-19) 

traveling.

March 27th, 2008

I spent the last week or so in New York City, at the Joost office there.  I kinda forgot to post anything.  This was the first time I had been to NYC, and it was a fun trip.  Not sure I would ever want to live in NYC.

Returning to San Jose was not so fun.

Yesterday, American Airlines cancelled a couple hundred flights due to problems with their MD-80s.

I was originally going LGA -> ORD -> SJC.

The ORD -> SJC leg got cancelled.

They re-routed me LGA -> DFW -> SJC.

When I landed in DFW, the flight to SJC had been cancelled.

There were no more flights to SJC.

The flights to SFO and OAK were all fully booked.

I luckily got to the top of the standby list to OAK, and got on that flight.

My checked bag however, is somewhere between New York and California, and American Airlines doesn’t know where it is yet. Sigh.

March Madness on Joost

March 20th, 2008

Watch all of the March Madness games LIVE on Joost!

Joining Joost

January 18th, 2008


Monday will be my first day at Joost.

Today is my last day at Bloglines aka Ask.com aka IAC Search and Media aka IAC/Interactive.

It’s been a fun couple years here, and I am very grateful for the great team I helped build at Bloglines, but it is time for me to move on.

in reply to “bloglines sucks”

December 27th, 2007

In reply to Scoble’s post today, “Bloglines Sucks”...

I will first try to outline the “issue”.

At the bottom of every post on a wordpress.com blog, is a tracker image used for statistics. It includes a rand parameter, which changes every time the feed is fetched over HTTP. The image URL is something like this:

http://stats.wordpress.com/b.gif?host=scobleizer.com&rand=2045631674&blog=3428&post=3957&subd=scobleizer&ref=&feed=1

Because this rand value changes every time we read the feed, we considered the Item ‘Updated’.

The behavior of the last 40 posts being shown as updated every time a new post was added was caused by our use of the HTTP ETag and Last-Modified features. Since WordPress.com returns a 304 Not Modified for most of our crawls, we would only ‘reparse’ the entire feed when a new post was added.
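The rand effect can be reproduced in a few lines of Python. This is only an illustration under assumptions: I am not describing how Bloglines actually detects updated items, and `fingerprint` and `normalize_tracker` are hypothetical names. The sketch compares item bodies by hash, then shows one possible way a reader could neutralize the rand value before comparing.

```python
import hashlib
import re

# Assumption: the tracker's query parameter always looks like rand=<digits>.
RAND_VALUE = re.compile(r"\brand=\d+")


def fingerprint(item_html):
    """Hash of an item body; a changed hash is what flags the item as updated."""
    return hashlib.sha1(item_html.encode("utf-8")).hexdigest()


def normalize_tracker(item_html):
    """Replace the ever-changing rand value with a constant before hashing."""
    return RAND_VALUE.sub("rand=0", item_html)
```

With the rand value left in place, two fetches of an untouched post hash differently. After normalizing, they hash the same, while a genuinely edited post still shows up as changed.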

Now, the reason users do not see this problem in Google Reader is that Google Reader has no concept of an “Updated” item. When a writer edits a blog post later, users in Google Reader would never see the changes. In Bloglines, we have always considered this a feature, showing you, the user, when a blog post is edited.

In Bloglines you can disable this feature, on a per-feed basis:

In Bloglines Beta, click on the feed, then select Edit. Change the “Updated Items:” to “Ignore”.

In Bloglines Classic, click on the feed, then select edit subscription. Change the “Updated Items:” to “Ignore”.

As far as I can tell, the use of a rand parameter in the Wordpress.com statistics image is a new change, also introduced at the same time the inline comment images were added to feeds.

FeedBurner includes similar statistics, tracking images and comment images, but they do not include a constantly changing image URL. This works correctly in Bloglines.

In regards to placing blame, Dana Epp says “Bloglines says it’s not them”. I don’t know who Dana has talked to inside Bloglines. When these types of issues are reported, we generally try to get in touch and investigate with the publisher, and hopefully figure out what is going on together, rather than outright saying it’s not our fault. It is a bad experience for our users, and we always want to be involved and help fix it.

I first heard about this issue on Friday, December 21st (also my birthday), from Matt via email. I forwarded that email on to our internal Bloglines Engineering mailing list, but frankly, I didn’t expect anyone to work on the issue on the Friday before Christmas. IAC Search and Media, the parent company of Bloglines and Ask.com, also has a mandatory Holiday Shutdown this week for all employees. No one will be in the office officially until January 2nd, 2008.

Luckily or unluckily, depending on your perspective, I took some time this afternoon away from my family to read my feeds. For now the bug^H^H^Hfeature in Bloglines of showing edited posts has been fixed. I have simply turned it off for all users.

I hope you had a Merry Christmas, and have a Happy New Year.

22->23

December 21st, 2007

Getting older

Really getting tired this year. Since ApacheCon in Atlanta on November 11th, I haven’t been home in San Jose for more than 6 days straight.

Thankfully, I have the next 2 weeks in Spokane at my parents’ house to chill for a bit.

on shedding

November 29th, 2007

Brian McCallister has a new post on a service location technique dubbed “Shredding”. This post started out as a comment on Brian’s site, but it got a little long...

  • Don’t underestimate using load balancers where they make sense. You don’t need to spend tons of money on a commercial one. 2x 1U pizza boxes with modern CPUs + 1/10GigE running {Free,Open}BSD + CARP + pfsync.
  • For ‘dumb clients’: Just proxy it. Perlbal does this for LiveJournal in front of their MogileFS boxes. Or look at Dynamo for another example: the ‘dumb’ clients can connect to any node, and that node proxies to the correct one. Reducing the number of request/response cycles is important to keep client latency down. It’s not so much about the persistent TCP connection as the send/reply of the data just to find something.
  • For ‘smart clients’: I personally prefer a daemon running on each local machine, which uses multicast/gossip communication with other nodes to keep a local ‘cache’ of where services are located and their status. Every couple of seconds, based on the current state of the cluster, it would write it out to a blob file on disk. Clients just slurp up this file to find anything. (You can also do the same thing based on a unix daemon socket, but it’s generally slower.)
  • There is some discussion about RFC issues with 302s and sending a POST to the redirected URL. The larger issue is that almost no HTTP Client Libraries will do this correctly out of the box.

All that said, for the Bloglines FS, we proxy writes to the data storage nodes, but that is mostly to ensure redundancy of data. For reads, we send back a sorted list of the data nodes that have a chunk to the client. The client then connects directly, and will try the other entries on the list if the first one fails.
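The read path described above can be sketched in Python as follows. This is a minimal illustration, not the Bloglines FS client: `read_chunk`, the `fetch` callable, and the node names are hypothetical, and a real client would add timeouts and logging.

```python
def read_chunk(nodes, fetch):
    """Try each data node in the order the server sorted them.

    `nodes` is the sorted list of nodes holding the chunk; `fetch` is any
    callable that returns the chunk bytes or raises OSError on failure.
    Returns (node_used, data) from the first node that answers.
    """
    errors = []
    for node in nodes:
        try:
            return node, fetch(node)
        except OSError as exc:
            # Remember the failure and fall through to the next entry.
            errors.append((node, exc))
    raise OSError(f"all {len(nodes)} data nodes failed: {errors}")


def fake_fetch(node):
    """Stand-in for a network read; pretends that node-a is down."""
    if node == "node-a":
        raise ConnectionError("node-a is unreachable")
    return b"chunk from " + node.encode("ascii")
```

Calling `read_chunk(["node-a", "node-b", "node-c"], fake_fetch)` skips the dead first node and reads from `node-b`, with no extra round trip to the server that handed out the list.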


mod_serf in trunk

November 13th, 2007

Now in httpd trunk: mod_serf, a reverse proxy module that uses Serf for its HTTP Client. Woot.

ipod warning

November 11th, 2007

ipod warning

Don’t steal music. Thank You Apple for the reminder.

I wonder if new iPhones will include wrappers saying ‘Don’t jail break’.

Well, of course they won’t. This is Apple we are talking about; it would be more like:

Don’t jail break

No encarcele la rotura

Setzen Sie nicht Bruch gefangen

壊れ目を拘留してはいけない