User-agent: *
Disallow: /stats/stop_spider.htm
where stop_spider.htm is the spider trap page, but this morning I got this notification:
IP address: 66.77.73.32
Navigator user agent: Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com;
referrer: [alltheweb.com...]
Assuming this is a real Yahoo crawler and not a spoof, this seems to indicate that Yahoo didn't pay attention to the robots.txt file, which has been up for over a month now.
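One way to test the "real crawler or spoof" assumption is a forward-confirmed reverse DNS lookup on the logged IP address. The following is only a sketch: the function name `is_forward_confirmed` is made up for illustration, and the list of hostname suffixes that count as a genuine crawler is an assumption that would have to come from the crawler operator's own documentation.

```python
import socket


def is_forward_confirmed(ip, allowed_suffixes,
                         reverse_lookup=socket.gethostbyaddr,
                         forward_lookup=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a host under one of
    allowed_suffixes and that host resolves back to the same ip.

    allowed_suffixes is an assumption supplied by the caller,
    e.g. the domains the crawler operator says its bots use.
    """
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        host = reverse_lookup(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure

    # The reverse name must sit under a trusted domain
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        addresses = forward_lookup(host)[2]
    except OSError:
        return False

    # The forward lookup must lead back to the original address
    return ip in addresses
```

Calling it as `is_forward_confirmed("66.77.73.32", [...])` with the operator's published domains would do the live lookups. The user-agent string alone proves nothing, since any client can send it.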
Last week the Yahoo crawler spidered the whole site without triggering the spider trap.
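To rule out a problem in the robots.txt itself, the two lines quoted above can be fed to Python's standard `urllib.robotparser` to see how a compliant parser reads them. This is a minimal sketch: the helper `is_allowed` is hypothetical, the full `http://mysite.com/...` URLs are assumed from the example paths in this post, and it only shows what a well-behaved crawler should do, not what this one actually did.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines quoted in this post
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/stop_spider.htm
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


UA = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"

# The trap page: blocked for every user agent
print(is_allowed(UA, "http://mysite.com/stats/stop_spider.htm"))  # False

# An ordinary content page: not blocked
print(is_allowed(UA, "http://mysite.com/main/page5.htm"))  # True
```

If the trap is also reachable under any other URL, that URL would not be covered by this single Disallow line.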
The only thing I can think of that might have gone wrong is that I just switched the site to Apache mod_rewrite to convert to search-engine-friendly URLs, so that
mysite.com/index.htm?section=main&page=5
is now mysite.com/main/page5.htm
through this htaccess:
RewriteRule ^(.*)/overview\.htm /index.htm?section=$1&page=0 [NC,L]
RewriteRule ^(.*)/page(.*)\.htm /index.htm?section=$1&page=$2 [NC,L]
RewriteRule ^(.*)/$ /index.htm?section=$1 [NC]
but as far as I can see this shouldn't make a difference. Any ideas? Has anyone else seen this, or is this a fake Yahoo spider?
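To check whether the rewrite rules touch the trap URL at all, the three patterns can be tried against sample paths with Python's `re` module. This is a rough approximation, not Apache itself: it assumes the rules live in a .htaccess file in the document root (so the leading slash is stripped before matching), applies only the first matching rule in a single pass, and the function name `rewrite` is invented for illustration.

```python
import re

# (pattern, substitution) pairs copied from the .htaccess rules,
# with Apache's $1/$2 written as Python's \1/\2
RULES = [
    (r"^(.*)/overview\.htm", r"/index.htm?section=\1&page=0"),
    (r"^(.*)/page(.*)\.htm", r"/index.htm?section=\1&page=\2"),
    (r"^(.*)/$", r"/index.htm?section=\1"),
]


def rewrite(path):
    """Return the rewritten target for path, or None if no rule matches.

    path is given without the leading slash, as in .htaccess context.
    """
    for pattern, target in RULES:
        match = re.search(pattern, path, re.IGNORECASE)  # [NC] flag
        if match:
            return match.expand(target)
    return None


print(rewrite("main/page5.htm"))         # /index.htm?section=main&page=5
print(rewrite("main/overview.htm"))      # /index.htm?section=main&page=0
print(rewrite("main/"))                  # /index.htm?section=main
print(rewrite("stats/stop_spider.htm"))  # None
```

Under these assumptions none of the rules match `stats/stop_spider.htm`, so the trap page is still served from the path that robots.txt disallows.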