Spider not honoring "robots.txt"
Just found ATW crawling through a spider-banned forum. It grabbed the entire forum:
18.104.22.168 - - [01/Mar/2004:12:30:03 -0700] "GET /forum/viewforum.php?f=3& HTTP/1.0" 200 18718 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; htXp://www.alltheweb.com/help/webmaster/crawler"
That's odd, herb. Are you sure your robots.txt is valid?
Otherwise, try disallowing User-agent: Slurp
How old is your robots.txt file?
Maybe the bot isn't aware of the new restrictions yet...
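One way to sanity-check a robots.txt is to run it through a parser and ask whether a given user agent may fetch a given URL. The sketch below uses Python's standard urllib.robotparser. The rules in ROBOTS_TXT are hypothetical (herb's actual file is not shown in this thread), and is_allowed is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the real robots.txt from this
# thread is not shown. The /forum/ path is taken from the log line above.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /forum/

User-agent: *
Disallow: /forum/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"
    for path in ("/forum/viewforum.php?f=3&", "/index.html"):
        print(path, is_allowed(ROBOTS_TXT, agent, path))
```

If the parser reports the forum URL as disallowed for that agent, the file is probably fine and the problem is on the crawler's side. A passing check only shows the rules parse as intended; it cannot show that a particular bot honors them.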
"That's odd, herb. Are you sure your robots.txt is valid?"
The robots file is and has been valid.
Ink has spidered the site for the past year while staying out of the restricted areas. This behavior just started two days ago.
Yesterday it was visiting about once an hour and grabbing only the forums.
Received a message from the folks at Yahoo and they are reviewing the situation.
Glad to hear somebody's on it for you, herb. Let us know how it goes.