
RSS, ATOM, and Related Technologies Forum

RSS traffic becoming a concern for larger publishers
almost like a DDoS attack at times

 3:39 pm on Jul 20, 2004 (gmt 0)

I had predicted this to myself about a year ago. It is no wonder that Google is resisting RSS/Atom feeds for web/news searches. Imagine what Yahoo goes through!

Infoworld.com experiences a "massive surge of RSS newsreader activity at the top of every hour," according to Chad Dickerson, the CTO of Infoworld. "If I didn't know how RSS worked, I would think we were being slammed by a bunch of zombies sitting on compromised home PCs," Dickerson writes. "Our hourly RSS surge has all the characteristics of a distributed DoS attack, and although the requests are legitimate and small, the sheer number of requests in that short time period creates some aggravating scaling issues."



 5:52 pm on Jul 20, 2004 (gmt 0)

Do they mean traffic spikes from the many requests for the RSS feed or traffic spikes from people clicking on the links to their site in their newsreader/website?


 5:54 pm on Jul 20, 2004 (gmt 0)

I bet it's due to RSS aggregators hitting RSS servers every hour on the hour to refresh content. It must be certain software that fetches content at a set time instead of at a random point within the hour.


 5:55 pm on Jul 20, 2004 (gmt 0)

I think it has to do with RSS readers that check for updates every hour.


 7:34 pm on Jul 20, 2004 (gmt 0)

You'd think hourly hits might be somewhat random due to different settings, clock times, etc. Apparently, the common software must default to :00.

I think this is an encouraging sign for those who hope to see RSS feeds become more widely adopted. I've got feeds on a few sites, and I've yet to see anything resembling a DDoS attack. I wish... ;)


 12:11 am on Jul 21, 2004 (gmt 0)

Well, most RSS collectors are servers and not Joe's PC. Most of these servers have atom time or something similar installed to set the clock to world time.

With an RSS check on the hour, that will generate a nice spike.
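One client-side way to avoid the synchronized on-the-hour spike discussed above is to add random jitter to the polling schedule. The following is only a minimal sketch: the function names, the 25% jitter fraction, and the one-hour default interval are assumptions chosen for illustration, not settings taken from any particular reader.

```python
import random


def startup_offset(interval_s=3600, rng=random):
    """Pick a random point inside the first interval for the first poll,
    so clients with synchronized clocks do not all fire at :00."""
    return rng.uniform(0, interval_s)


def next_poll_delay(interval_s=3600, jitter_fraction=0.25, rng=random):
    """Return the wait before the next poll: the base interval plus or
    minus a random offset of up to jitter_fraction of that interval."""
    spread = interval_s * jitter_fraction
    return interval_s + rng.uniform(-spread, spread)
```

A reader would sleep for `startup_offset()` once at launch and then for `next_poll_delay()` between fetches, which spreads requests across the hour even when every client's clock is set to the same time source.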


 3:47 pm on Jul 21, 2004 (gmt 0)

Actually there are a lot of users who are very comfortable with RSS for personal use. Many come from a blogging background, so they are familiar with the format.

With broadband being "always on", they typically will have programs running 24/7 that collect headlines for them. And they typically collect all at the same time, synchronized, on the hour.

Lots of "valid" RSS bots out there too, including MoreOver, News4Sites, Feedster, Technorati, Waypath, Daypop, k-collector, Blogdex, Popdex, etc.

I think part of the problem is that most RSS programs (and bots) do not use the proven mechanisms that regular web pages rely on, like ETags and cache control. They just suck down the whole feed no matter what.
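For reference, a well-behaved reader can use an HTTP conditional GET: it sends back the ETag and Last-Modified values from its previous fetch, and the server answers 304 Not Modified with no body if the feed is unchanged. The sketch below uses Python's standard urllib; the function names are hypothetical and the caller is assumed to store the validators between polls.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

An unchanged feed then costs only a few hundred bytes of headers per poll instead of the full document.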


 10:22 am on Jul 22, 2004 (gmt 0)

That's why we only publish 5 headlines on our feeds. Plus we gzip it for clients who are using it.


 7:49 pm on Jul 23, 2004 (gmt 0)

That's why we only publish 5 headlines on our feeds. Plus we gzip it for clients who are using it.

I do the same thing - limit it to five. I've also started testing something. I randomly throw in a 'headline' that points to an 'ad' of sorts, explaining they're getting my content (or at least using my services/bandwidth) and if they want, they can help defray the cost.

Not to get too far OT, but anyone else playing around with RSS ads?



 10:27 am on Jul 28, 2004 (gmt 0)

One example from my logs. Both have around 1300-1500 hits to my RSS feed this month. (I have a small non-commercial site)

User A consumed 344 KB
User B consumed 11.4 MB

User A's RSS client is checking if the feed has been updated 4 times per hour 24/7. Nothing wrong with that really. IP points to a normal cable internet user.

User B's RSS client downloads the whole feed 6 times per hour almost 24/7. IP points to some "data mining" company. Hmmm...

Guess who got the boot? (IP banned).

Another thing I don't like is RSS clients with no identification ("GET /rss.xml HTTP/1.1" 200 8224 "-" "-"). Thinking of banning them too.
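Requests like the one quoted above can be picked out of an access log automatically. This is a rough sketch that assumes the log ends each line in the common "combined" layout (request, status, size, referer, user agent); the regular expression and function name are illustrative, not taken from any tool.

```python
import re

# Matches the tail of a combined-format log line:
# "REQUEST" STATUS SIZE "REFERER" "USER-AGENT"
LOG_TAIL = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def missing_identity(line):
    """Return (no_referer, no_agent) for a log line, or None if the
    line does not match the expected layout."""
    match = LOG_TAIL.search(line)
    if match is None:
        return None
    no_referer = match.group("referer") in ("", "-")
    no_agent = match.group("agent") in ("", "-")
    return no_referer, no_agent
```

Note that many legitimate feed readers send no referer, so the user-agent flag is the more useful of the two for feeds.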


 11:50 am on Jul 28, 2004 (gmt 0)

The RSS usage issues will only continue to increase, as this technology is just now coming out of its early adopter phase and is a long way from reaching critical mass.

Providing well-formed and valid RSS with set TTLs will help. It's surprising how many feeds submitted to AHN are simply poorly formed and don't contain a stated TTL.

Using feed compression techniques will also go a long way to reducing your bandwidth use.


 11:59 am on Jul 28, 2004 (gmt 0)

RSS actually has a spec for saying how often it's updated. Not all readers obey it though. They should also be looking at ETags and Last-Modified.
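In RSS 2.0 the update hint is the optional `<ttl>` element inside `<channel>`, giving the number of minutes a feed may be cached. A reader could honor it roughly as follows; the function name and the 60-minute fallback are assumptions for this sketch.

```python
import xml.etree.ElementTree as ET


def feed_ttl_minutes(rss_text, default=60):
    """Return the channel's <ttl> in minutes, or default if the element
    is absent, non-numeric, or not positive."""
    root = ET.fromstring(rss_text)
    value = root.findtext("channel/ttl")
    if value is None:
        return default
    try:
        ttl = int(value.strip())
    except ValueError:
        return default
    return ttl if ttl > 0 else default
```

The reader would then wait at least that many minutes before requesting the feed again.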

I already ban anyone who sends neither a user agent nor a referer; it's good practice.

I wonder how widespread gzip support is in readers though. I would want to be very sure it's okay to do that before I started compression with mod_gzip, etc. Even robust feeds tend to be under 25k, but I bet the code is very compressible.

Testing one of my feeds:
original 13302 bytes
(mod)Gzipped 4562 bytes

65% compression is nice, but not worth it yet perhaps.
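The figure above can be checked from the two byte counts: 1 - 4562/13302 is about 0.657, a saving of roughly 65.7%. The sketch below reproduces that arithmetic and measures the saving for any feed body with Python's standard gzip module; the function names are hypothetical, and real results depend on the feed content and the compression level the server uses.

```python
import gzip


def savings_percent(original_bytes, compressed_bytes):
    """Percentage of bytes saved by compression."""
    return 100.0 * (1 - compressed_bytes / original_bytes)


def gzip_savings(payload):
    """Compress a feed body (bytes) in memory and report the saving."""
    return savings_percent(len(payload), len(gzip.compress(payload)))


# Byte counts quoted in the post above.
print(round(savings_percent(13302, 4562), 1))
```

Servers should only send the compressed form when the client's request includes `Accept-Encoding: gzip`, which addresses the worry about readers that cannot decompress.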


 9:49 am on Aug 1, 2004 (gmt 0)

An Infoworld columnist has done a follow-up on the RSS story, and the feedback they got pretty much matched our conclusions here:

