On Blog Pinging (again)

Dear Internet,

When a major ping service has problems, I will remind you that Blog Pings are stupid. Some companies rely upon blog pings for their crawling system. What happens when a ping service is down? Your posts don’t get indexed. Oh Snap.

People have noticed that if you want to get higher rankings in some search engines, you should find everyone who has linked to you, and send pings on their behalf. Ping-spoofing. It really feels like email spoofing to me. Maybe SPF could be extended to prevent this…

Anyways, at $work we do not currently accept pings from anyone. Instead we crawl every site in our system, every 30 minutes. It's not that hard, honestly, thanks to memcached and conditional HTTP requests. (Oh, and lots of bandwidth.)
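
For the curious, a conditional request looks roughly like this. It is a minimal sketch in Python, not the crawler we run at $work: a plain dict stands in for memcached, and the names `conditional_headers` and `fetch_if_changed` are made up for illustration.

```python
import urllib.error
import urllib.request

# Stand-in for memcached (an assumption for this sketch):
# url -> {"etag": ..., "last_modified": ..., "body": ...}
cache = {}


def conditional_headers(entry):
    """Build request validators from what the server told us last time."""
    headers = {}
    if entry is None:
        return headers
    if entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]
    if entry.get("last_modified"):
        headers["If-Modified-Since"] = entry["last_modified"]
    return headers


def fetch_if_changed(url, cache):
    """Return the new feed body, or None if the server answered 304."""
    request = urllib.request.Request(
        url, headers=conditional_headers(cache.get(url))
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            # Remember the validators so the next crawl can be conditional.
            cache[url] = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "body": body,
            }
            return body
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None
        raise
```

When nothing has changed, the server sends back a 304 with no body, which is what keeps a 30-minute crawl cycle affordable on both ends.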

Blog Pings have fundamental problems. Until they are both reliable and secure, I hope they will die. (By secure I mean knowing that the person who sent the ping is the person who owns the feed.) FeedTree tries to go down this path. It has many problems, the foremost being its use of Java. If you want everyone in the world to support your protocol, you need support for everyone still using C, Perl, Python, PHP, and Ruby. The other problem with FeedTree is that it makes crypto signing of pings optional. Rule number one of making a spec: Nothing is optional, because anything that is won’t be implemented by half the software out there.

Have a happy Monday.

-Paul

3 Responses to “On Blog Pinging (again)”

  1. Dan Sandler Says:

    Hey, thanks for noticing our project! A couple of follow-up notes on FeedTree:

    1. Yeah, Java is a pain. Unfortunately, most of the leading academic research into scalable peer-to-peer overlays is done in Java, so that’s the platform FeedTree is built on. (Yes, I know BitTorrent is Python; it’s also really the wrong kind of p2p system for this application.) Fortunately, some of the future work of the FreePastry team includes developing a new runtime-independent binary wire protocol, so one could develop a compatible Pastry implementation in some other language (FeedTree then being layered atop that).
    2. It’s true, we could have made digital signatures mandatory on the part of publishers. We chose not to for a couple of reasons, chief among them adoption. We figured it would be daunting enough to get feed owners to install a persistent daemon, let alone messing around with certificates. Maybe it’s a flimsy reason, but we’re eager to do whatever we can to spur adoption. This is a problem domain (feed updates on a push schedule, fixing the “ping crisis”, keeping the bandwidth manageable) that can really benefit from peer-to-peer techniques.

      [It's also worth noting that sometimes you can't sign things. When there's no authoritative publisher pushing feed content to the FeedTree network (what we call a "conventional" or "legacy" feed), FeedTree subscribers self-organize into a collaborative polling scheme (staggering their requests, and sharing new events with one another). The thing is, none of them is really authoritative, so it's not meaningful for them to sign their content; as such, if you use FeedTree, you'll see that these shared events carry no signature and can't be authenticated.]

    If, in the worst (best?) case, FeedTree should become so popular that spammers start to try to fill it with spam pings, there are a couple of properties of the network that should make this a losing proposition, long-term:

    • If you’re not subscribed to the feed, you don’t see its updates. If a user injects a bunch of bogus updates for a splog nobody cares about, those updates won’t reach any eyeballs. There will be some network overhead if this happens, but it should be limited to a small portion of the system (and, again, no users will see any of the spam pings).
    • If spammers start injecting junk into legit feed channels, it’s pretty easy to say, “OK, for this feed I’ll ignore any unsigned updates.” If FeedTree’s ever so popular that this scenario becomes reality, the publishers of those spammed feeds will have incentive to reach their readers using FeedTree, and will hopefully invest the effort to start pushing signed updates.
  2. Ed Kohler Says:

    I don’t think proactive pinging is anything like email spoofing. Spoofing has two victims: those misled by the spoofed email, and the person/business whose email was spoofed.

    Proactive pinging, on the other hand, rewards blogs who’ve been mentioned on other blogs by giving them proper credit for the links, and drives additional traffic to the linker’s previously unpinged blog. Blog search engines also benefit because it adds additional link data into their systems, allowing them to rank sites properly.

  3. Greg Gershman Says:

    “Anyways, at $work we do not currently accept pings from anyone. Instead we crawl every site in our system, every 30 minutes. It's not that hard, honestly, thanks to memcached and conditional HTTP requests. (Oh, and lots of bandwidth.)”

    This does help Bloglines maintain a fresh index, but there are various ways your algorithm could be improved that would save others bandwidth and respect established protocols, such as the RSS 2.0 channel-level ttl element. Intelligent spacing of visits, based on historical frequency of posting, would also not be a bad thing. For systems with lots of feeds, a deluge of 10 requests per second, twice an hour, on top of normal traffic can be crippling. We’ve had to start sending 304s to Bloglines during peak hours.
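
The two suggestions above (honor ttl, space visits by posting history) could be combined along these lines. This is only a sketch in Python under stated assumptions: the function name `next_poll_interval`, the 24-hour ceiling, and the "poll at a quarter of the typical gap" factor are all hypothetical choices, not anything Bloglines is known to do. The one fixed fact is that the RSS 2.0 ttl value is expressed in minutes.

```python
from datetime import datetime, timedelta
from statistics import median

MIN_INTERVAL = timedelta(minutes=30)  # the 30-minute crawl cycle quoted above
MAX_INTERVAL = timedelta(hours=24)    # hypothetical ceiling for quiet feeds


def next_poll_interval(ttl_minutes, post_times):
    """Pick how long to wait before fetching a feed again.

    ttl_minutes: the channel-level <ttl> value, or None if the feed has none.
    post_times:  datetimes of the feed's recent posts, in any order.
    """
    candidates = [MIN_INTERVAL]

    # Never come back sooner than the publisher asked us to.
    if ttl_minutes is not None:
        candidates.append(timedelta(minutes=ttl_minutes))

    # Use the typical gap between posts as a hint: a feed that posts
    # once a day does not need a visit every half hour.
    if len(post_times) >= 2:
        ordered = sorted(post_times)
        gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
        candidates.append(timedelta(seconds=median(gaps) / 4))

    # Take the most patient suggestion, but cap it at the ceiling.
    return min(max(candidates), MAX_INTERVAL)


daily = [datetime(2024, 1, day) for day in (1, 2, 3)]
print(next_poll_interval(60, daily))  # 6:00:00
```

A feed with no ttl and no history still gets the 30-minute default, so nothing is polled more often than it is today.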
