Do they mean traffic spikes from the many requests for the RSS feed or traffic spikes from people clicking on the links to their site in their newsreader/website?
I bet it's due to RSS aggregators hitting RSS servers every hour on the hour to refresh content. Some software must go after content at a set time instead of at a random point within the hour.
I think it has to do with RSS readers that check for updates every hour.
You'd think hourly hits might be somewhat random due to different settings, clock times, etc. Apparently, the common software must default to :00.
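For what it's worth, here is a minimal sketch of how a reader could avoid the :00 pile-up, assuming it polls hourly: pick a random offset once per client and keep it, so requests spread across the hour. The function names are made up for illustration; no particular reader is known to work this way.

```python
import random

POLL_INTERVAL = 3600  # seconds; hourly polling, as discussed in this thread


def pick_offset(rng=random):
    """Choose a per-client offset within the hour (done once, then stored)."""
    return rng.randrange(POLL_INTERVAL)


def next_poll_time(now, offset, interval=POLL_INTERVAL):
    """Return the next poll time in epoch seconds, strictly after `now`.

    Polls land at offset, offset + interval, ... rather than on the hour.
    """
    base = now - (now % interval)  # start of the current interval
    candidate = base + offset
    if candidate <= now:
        candidate += interval      # this interval's slot has already passed
    return candidate
```

With an offset of 600, a client polls at ten past every hour; thousands of clients with different offsets flatten the spike.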
I think this is an encouraging sign for those who hope to have RSS feeds become more widely adopted. I've got feeds on a few sites, and I've yet to see anything resembling a DDoS attack. I wish... ;)
Well, most RSS collectors are servers and not Joe's PC. Most of these servers have atom time or something similar installed to set the clock to world time.
With a RSS check on the hour that will generate a nice spike.
Actually there are a lot of users who are very comfortable with RSS for personal use. Many come from a blogging background, so they are familiar with the format.
With broadband being "always on", they typically will have programs running 24/7 that collect headlines for them. And they typically collect all at the same time, synchronized, on the hour.
Lots of "valid" RSS bots out there too, including MoreOver, News4Sites, Feedster, Technorati, Waypath, Daypop, k-collector, Blogdex, Popdex, etc.
I think part of the problem is that most RSS programs (and bots) do not use the proven technologies for regular web pages, like ETags and cache control. They just suck down the whole feed no matter what.
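To make that concrete, here is a rough sketch of a conditional GET using only Python's standard library. The client remembers the ETag and Last-Modified values from its last fetch and sends them back; a server that supports validators answers 304 with no body when nothing changed. The client name and function names are hypothetical.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on the previous fetch."""
    headers = {"User-Agent": "ExampleReader/0.1"}  # hypothetical client name
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when unchanged."""
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return (
                resp.read(),
                resp.headers.get("ETag"),
                resp.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the cached copy
            return None, etag, last_modified
        raise
```

A 304 costs a few hundred bytes of headers instead of the whole feed, which is where the savings come from.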
That's why we only publish 5 headlines on our feeds. Plus we gzip it for clients who are using it.
|That's why we only publish 5 headlines on our feeds. Plus we gzip it for clients who are using it. |
I do the same thing - limit it to five. I've also started testing something. I randomly throw in a 'headline' that points to an 'ad' of sorts, explaining they're getting my content (or at least using my services/bandwidth) and if they want, they can help defray the cost.
Not to get too far OT, but anyone else playing around with RSS ads?
One example from my logs. Both have around 1300-1500 hits to my RSS feed this month. (I have a small non-commercial site)
User A consumed 344 KB
User B consumed 11.4 MB
User A's RSS client is checking if the feed has been updated 4 times per hour 24/7. Nothing wrong with that really. IP points to a normal cable internet user.
User B's RSS client downloads the whole feed 6 times per hour almost 24/7. IP points to some "data mining" company. Hmmm...
Guess who got the boot? (IP banned).
Another thing I don't like is RSS clients with no identification ("GET /rss.xml HTTP/1.1" 200 8224 "-" "-"). Thinking of banning them too.
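If anyone wants to run the same comparison on their own logs, something like this might do. It's a sketch that assumes Apache "combined" log format and a feed at /rss.xml (as in the log fragment above); the function name is made up.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" format; adjust the pattern for other log formats.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines, path="/rss.xml"):
    """Return {ip: {"hits", "bytes", "anonymous"}} for requests to `path`."""
    stats = defaultdict(lambda: {"hits": 0, "bytes": 0, "anonymous": 0})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        parts = m.group("request").split()
        if len(parts) < 2 or parts[1] != path:
            continue  # not a request for the feed
        entry = stats[m.group("ip")]
        entry["hits"] += 1
        if m.group("bytes") != "-":  # "-" means no body was sent (e.g. a 304)
            entry["bytes"] += int(m.group("bytes"))
        if m.group("agent") in ("", "-"):
            entry["anonymous"] += 1  # client sent no User-Agent
    return dict(stats)
```

Similar hit counts with wildly different byte totals point to a client that ignores 304s, and a high "anonymous" count flags the unidentified ones.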
RSS usage issues will only continue to increase, as this technology is just now coming out of its early adopter phase and is a long way from reaching critical mass.
Providing well-formed and valid RSS with set TTLs will help. It's surprising how many feeds submitted to AHN are simply poorly formed and don't contain a stated TTL.
Using feed compression techniques will also go a long way to reducing your bandwidth use.
RSS actually has a spec for saying how often it's updated. Not all readers obey it though. They should also be looking at ETags and Last-Modified.
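In RSS 2.0 that's the channel-level <ttl> element, given in minutes. A reader that wanted to obey it could do something like the sketch below; the 60-minute fallback is my own assumption, not something the spec sets, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_TTL_MINUTES = 60  # assumption: fall back to hourly when no <ttl> is given


def feed_ttl_minutes(rss_text, default=DEFAULT_TTL_MINUTES):
    """Return the channel's <ttl> in minutes, or `default` if absent or invalid."""
    root = ET.fromstring(rss_text)       # root is the <rss> element
    ttl = root.findtext("channel/ttl")   # None when the element is missing
    if ttl is None:
        return default
    try:
        minutes = int(ttl.strip())
    except ValueError:
        return default                   # non-numeric TTL: ignore it
    return minutes if minutes > 0 else default
```

The reader would then wait at least that many minutes before asking for the feed again.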
I already ban anyone without both a user agent and referer, it's a good practice.
I wonder how widespread gzip support is in readers though. I would want to be very sure it's okay to do that before I started compression with mod_gzip, etc. Even robust feeds tend to be under 25k, but I bet the code is very compressible.
Testing one of my feeds:
original 13302 bytes
(mod)Gzipped 4562 bytes
65% compression is nice, but not worth it yet perhaps.
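For anyone who wants to repeat the test on their own feed without touching the server config, here is a quick sketch using Python's gzip module. The numbers will differ somewhat from what mod_gzip produces, and the Accept-Encoding check is deliberately simplified (it ignores q-values such as "gzip;q=0"). Both function names are made up.

```python
import gzip


def gzip_savings(data):
    """Return (original_bytes, compressed_bytes, fraction_saved) for `data`."""
    if not data:
        raise ValueError("need a non-empty feed to measure")
    original = len(data)
    compressed = len(gzip.compress(data))
    return original, compressed, 1 - compressed / original


def client_accepts_gzip(accept_encoding):
    """Rough check of an Accept-Encoding header value; ignores q-values."""
    tokens = [
        part.split(";")[0].strip().lower()
        for part in accept_encoding.split(",")
    ]
    return "gzip" in tokens


# With the figures quoted above: 1 - 4562 / 13302 is roughly 0.657.
```

Serving the compressed copy only to clients that advertise gzip in Accept-Encoding sidesteps the question of which readers support it.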
An InfoWorld columnist has done a follow-up on the RSS story, and the feedback they got pretty much matched our conclusions here: