[For those in my former position: The idea is, send a Last-Modified response header with the page, and the bot will send an If-Modified-Since request header on a subsequent hit, which allows your server to send a 304 Not Modified response (no body) if the page hasn't changed, saving vast amounts of bandwidth. Do it today!]
It has worked very well.
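For anyone wanting to see the exchange spelled out, here is a minimal sketch in Python of the decision the server makes. The function name conditional_response and its arguments are my own invention for illustration, not part of any framework; it assumes you already know when the page last changed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` is a timezone-aware datetime. `if_modified_since` is the
    raw If-Modified-Since request header value, or None if the client (bot)
    did not send one. Hypothetical helper, for illustration only.
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full 200
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # unchanged: send headers only, no body

    return 200, headers  # changed, or no usable header: send the full page
```

A bot that never sends If-Modified-Since always lands on the last line and gets the full page every time, which is exactly the behaviour complained about below.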
Not only the G_Bot (and M_Bot) implements it; so does Inktomi Slurp! and even the Baiduspider, but not msnbot, ia_archiver, BecomeBot, NutchCVS, SurveyBot, etc. (boo!). It is very surprising, then, to see the G-Media-Bot lumped in with these 3rd-rank spiders.
Also, this wretched bot has now taken 9,575 pages in Sept (humans have taken only 76,677 pages so far), which is 1 bot page for every 8 pages requested by humans.
Here is a handful of examples from the most recent page of my access-log:
66.249.66.48 - - [12/Sep/2005:14:44:52 +0100] "GET /mfcs.php?mid=116&nid=7490 HTTP/1.1" 200 7084 "-" "Mediapartners-Google/2.1" In:28306 Out:7084:25pct.
66.249.66.48 - - [12/Sep/2005:14:44:56 +0100] "GET /mfcs.php?mid=116&nid=7490 HTTP/1.1" 200 7084 "-" "Mediapartners-Google/2.1" In:28306 Out:7084:25pct.
66.249.66.48 - - [12/Sep/2005:14:49:19 +0100] "GET /mfcs.php?mid=276 HTTP/1.1" 200 7849 "-" "Mediapartners-Google/2.1" In:36353 Out:7849:22pct.
66.249.66.48 - - [12/Sep/2005:14:49:22 +0100] "GET /mfcs.php?mid=276 HTTP/1.1" 200 7849 "-" "Mediapartners-Google/2.1" In:36353 Out:7849:22pct.
66.249.66.48 - - [12/Sep/2005:15:02:13 +0100] "GET /search.php?id=PCI%5CAZT_4002 HTTP/1.1" 200 7224 "-" "Mediapartners-Google/2.1" In:28373 Out:7224:25pct.
66.249.66.48 - - [12/Sep/2005:15:02:17 +0100] "GET /search.php?id=PCI%5CAZT_4002 HTTP/1.1" 200 7224 "-" "Mediapartners-Google/2.1" In:28373 Out:7224:25pct.
66.249.66.48 - - [12/Sep/2005:15:06:48 +0100] "GET /mfcs.php?mid=6&nid=14451 HTTP/1.1" 200 7348 "-" "Mediapartners-Google/2.1" In:29577 Out:7348:25pct.
66.249.66.48 - - [12/Sep/2005:15:06:52 +0100] "GET /mfcs.php?mid=6&nid=14451 HTTP/1.1" 200 7348 "-" "Mediapartners-Google/2.1" In:29577 Out:7348:25pct.
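The pattern in the lines above (same URL, same bot, full 200 response, a few seconds apart) can be picked out of a log automatically. A rough sketch in Python, assuming the combined-style log layout shown above; the regular expression, the function name find_repeat_fetches, and the 10-second window are my own assumptions, and the trailing In:/Out: compression fields are simply ignored.

```python
import re
from datetime import datetime, timedelta

# Matches the leading, combined-log part of each line; anything after the
# user-agent field (e.g. the In:/Out: compression stats) is not examined.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def find_repeat_fetches(lines, window=timedelta(seconds=10)):
    """Return (path, seconds_apart) for each full (200) fetch of a URL that
    follows a previous 200 fetch of the same URL by the same user agent
    within `window`."""
    last_seen = {}
    repeats = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["status"] != "200":
            continue
        when = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        key = (m["agent"], m["path"])
        prev = last_seen.get(key)
        if prev is not None and when - prev <= window:
            repeats.append((m["path"], (when - prev).total_seconds()))
        last_seen[key] = when
    return repeats


# Two of the log lines quoted above, used as sample input.
sample = [
    '66.249.66.48 - - [12/Sep/2005:14:44:52 +0100] "GET /mfcs.php?mid=116&nid=7490 HTTP/1.1" 200 7084 "-" "Mediapartners-Google/2.1" In:28306 Out:7084:25pct.',
    '66.249.66.48 - - [12/Sep/2005:14:44:56 +0100] "GET /mfcs.php?mid=116&nid=7490 HTTP/1.1" 200 7084 "-" "Mediapartners-Google/2.1" In:28306 Out:7084:25pct.',
]
print(find_repeat_fetches(sample))
```

Run over a whole month's log, the length of the returned list gives a quick count of how many of the bot's page fetches were immediate repeats.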
FWIW the "Mozillabot" ... doesn't seem to implement If-Modified-Since either.
The M_Bot is a particular bugbear of mine. It was hitting my site at up to 3 times/sec [webmasterworld.com] and, when I asked them to slow it down, they virtually switched it off.
For me the MediaBot seems to come and respider a page right after each visitor to that page, mainly because the detailed ad layout changes each time, IMHO.
Thus the Mediabot probably does not and cannot care about IMS, only about whether it believes the page content or layout has changed and needs to be respidered. After all, you could be wrong (or fibbing) in IMS for cloaking or other black-hat reasons.
All my guesses of course.
For the record I treat the MediaBot just like every other bot to limit bandwidth automatically, and it seems to do me no particular harm.
Rgds
Damon