Infoworld.com experiences a "massive surge of RSS newsreader activity at the top of every hour," according to Chad Dickerson, the CTO of Infoworld. "If I didn't know how RSS worked, I would think we were being slammed by a bunch of zombies sitting on compromised home PCs," Dickerson writes. "Our hourly RSS surge has all the characteristics of a distributed DoS attack, and although the requests are legitimate and small, the sheer number of requests in that short time period creates some aggravating scaling issues."
I think this is an encouraging sign for those who hope to have RSS feeds become more widely adopted. I've got feeds on a few sites, and I've yet to see anything resembling a DDoS attack. I wish... ;)
With broadband being "always on", users typically have programs running 24/7 that collect headlines for them. And those programs typically all collect at the same time, synchronized, on the hour.
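One way a reader program could avoid piling onto the top of the hour is to add a random offset to its polling interval. This is only a rough sketch of that idea, not how any particular newsreader works; the function name and the default values are made up for illustration.

```python
import random

def next_poll_delay(interval_s=3600, jitter_fraction=0.25, rng=random):
    """Return the number of seconds to wait before the next poll.

    Hypothetical helper: the base interval plus a random extra delay of up
    to interval_s * jitter_fraction, so that clients started at the same
    moment drift apart instead of all polling on the hour.
    """
    jitter = rng.uniform(0, interval_s * jitter_fraction)
    return interval_s + jitter

if __name__ == "__main__":
    # With the defaults this prints a value between 3600 and 4500 seconds.
    print(round(next_poll_delay()))
```

Because the delay is measured from the last poll rather than from the wall clock, each client ends up on its own schedule after a few cycles.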
Lots of "valid" RSS bots out there too, including MoreOver, News4Sites, Feedster, Technorati, Waypath, Daypop, k-collector, Blogdex, Popdex, etc.
I think part of the problem is that most RSS programs (and bots) do not use the proven technologies that already work for regular web pages, like ETags and cache control. They just suck down the whole feed no matter what.
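For anyone who hasn't looked at how the ETag check works, here is a minimal server-side sketch. It assumes the feed body is already in memory and that the client sends back the ETag it got last time in an If-None-Match header. The function names are hypothetical, and the comparison is simplified to an exact match (no weak validators, no lists of tags).

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    # Hypothetical helper: a strong ETag derived from the feed content.
    return '"' + hashlib.sha1(body).hexdigest() + '"'

def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a feed request.

    If the client's If-None-Match value equals the current ETag, answer
    304 with an empty payload; otherwise send the whole feed with 200.
    """
    etag = make_etag(body)
    if if_none_match is not None and if_none_match.strip() == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/rss+xml"}, body

if __name__ == "__main__":
    feed = b"<rss version='2.0'><channel></channel></rss>"
    status, headers, payload = respond(feed, None)
    print(status, len(payload))            # first fetch: full feed
    status, headers, payload = respond(feed, headers["ETag"])
    print(status, len(payload))            # repeat fetch: 304, nothing sent
```

A client that plays along only costs you the headers on each check, which is the difference between the two users in the bandwidth numbers people post.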
That's why we only publish 5 headlines on our feeds. Plus we gzip it for clients that are using it.
I do the same thing - limit it to five. I've also started testing something. I randomly throw in a 'headline' that points to an 'ad' of sorts, explaining they're getting my content (or at least using my services/bandwidth) and if they want, they can help defray the cost.
Not to get too far OT, but anyone else playing around with RSS ads?
User A consumed 344 KB
User B consumed 11.4 MB
User A's RSS client is checking if the feed has been updated 4 times per hour 24/7. Nothing wrong with that really. IP points to a normal cable internet user.
User B's RSS client downloads the whole feed 6 times per hour almost 24/7. IP points to some "data mining" company. Hmmm...
Guess who got the boot? (IP banned).
Another thing I don't like is RSS clients with no identification ("GET /rss.xml HTTP/1.1" 200 8224 "-" "-"). Thinking of banning them too.
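If you want to pull this kind of per-user number out of your own logs, something like the following would do it. It is a rough sketch that assumes the server writes the common "combined" log format (IP, identity, user, timestamp, request, status, size, referer, user agent); the regex and the function name are mine, so adjust them to your own log layout.

```python
import re
from collections import defaultdict

# Assumed layout: combined log format, one request per line.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Return (bytes sent per IP, set of IPs that sent no user agent)."""
    bytes_by_ip = defaultdict(int)
    anonymous_ips = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        size = 0 if m["size"] == "-" else int(m["size"])
        bytes_by_ip[m["ip"]] += size
        if m["agent"] in ("", "-"):
            anonymous_ips.add(m["ip"])
    return dict(bytes_by_ip), anonymous_ips

if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python summarize.py < access.log
    totals, anonymous = summarize(sys.stdin)
    for ip, total in sorted(totals.items(), key=lambda kv: -kv[1])[:20]:
        flag = " (no user agent)" if ip in anonymous else ""
        print(f"{ip}\t{total} bytes{flag}")
```

Sorting by total bytes puts the heavy downloaders at the top, and the flag shows which of them never identified themselves.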
Providing well-formed and valid RSS with a set TTL will help. It's surprising how many feeds submitted to AHN are simply poorly formed and don't contain a stated TTL.
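For reference, in RSS 2.0 the TTL is a ttl element inside the channel, given in minutes. Here is a small sketch of generating a feed with one using Python's standard XML library, which also takes care of the well-formed part. The function name, the titles, and the example.com links are placeholders, not anyone's real feed.

```python
import xml.etree.ElementTree as ET

def build_feed(title, link, description, ttl_minutes, items):
    """Build a minimal RSS 2.0 document with a <ttl> element.

    items is a list of (title, link) pairs. Hypothetical helper for
    illustration; a real feed would usually carry more fields per item.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # ttl tells clients how many minutes the feed may be cached.
    ET.SubElement(channel, "ttl").text = str(ttl_minutes)
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    print(build_feed(
        "Example headlines", "http://example.com/", "Placeholder feed",
        60, [("First headline", "http://example.com/1")],
    ))
```

Whether a given reader honors the value is a separate question, but at least the well-behaved ones have something to go on.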
Using feed compression techniques will also go a long way to reducing your bandwidth use.
I already ban anyone who sends neither a user agent nor a referer; it's good practice.
I wonder how widespread gzip support is in readers though. I would want to be very sure it's okay to do that before I started compression with mod_gzip, etc. Even robust feeds tend to be under 25k, but I bet the code is very compressible.
Testing one of my feeds:
original 13302 bytes
gzipped (mod_gzip) 4562 bytes
65% compression is nice, but not worth it yet perhaps.
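On the worry about readers that can't handle gzip: the usual approach is to compress only when the client's Accept-Encoding header asks for it, so older readers still get the plain feed. A minimal sketch of that check follows, along with the arithmetic for the sizes quoted above. The function names are made up, and the header parsing is simplified (it ignores q-values such as "gzip;q=0").

```python
import gzip

def maybe_compress(body: bytes, accept_encoding: str):
    """Return (payload, extra_headers), gzipping only if the client allows it."""
    # Simplified parsing: split on commas and drop any ";q=..." parameter.
    encodings = [
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    ]
    if "gzip" in encodings:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}

def savings_percent(original_size: int, compressed_size: int) -> float:
    """Percentage of bytes saved by compression."""
    return 100.0 * (1 - compressed_size / original_size)

if __name__ == "__main__":
    # Sizes from the test above: 13302 bytes down to 4562 bytes.
    print(round(savings_percent(13302, 4562), 1))
```

For the feed tested above this works out to a saving of about 65.7%, in line with the 65% figure.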