Yahoo/Overture/Alltheweb

Spider not honoring "robots.txt"

     

herb

8:26 pm on Mar 1, 2004 (gmt 0)

10+ Year Member



Just found ATW crawling through a spider-banned forum. It grabbed the entire forum. Here is my robots.txt, followed by a sample line from the access log:

User-agent: *
Disallow: /cgi-bin
Disallow: /images
Disallow: /forum

66.77.73.32 - - [01/Mar/2004:12:30:03 -0700] "GET /forum/viewforum.php?f=3& HTTP/1.0" 200 18718 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; htXp://www.alltheweb.com/help/webmaster/crawler"
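One way to double-check that the rules above really cover the URL in the log line is to run them through a parser. A minimal sketch using Python's standard `urllib.robotparser`; the function name `is_allowed` is made up for illustration, and this only shows how Python's parser reads the file, not how the Yahoo/ATW crawler itself interprets it:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /images
Disallow: /forum
"""

def is_allowed(robots_txt, user_agent, path):
    """Return True if Python's parser says user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# User-agent token and path taken from the log line above.
AGENT = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"
print(is_allowed(ROBOTS_TXT, AGENT, "/forum/viewforum.php?f=3&"))  # False
print(is_allowed(ROBOTS_TXT, AGENT, "/index.html"))                # True
```

If a check like this reports the forum URL as disallowed, the file is doing its job and the problem is on the crawler's side.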

Rumbas

8:11 am on Mar 2, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



That's odd, herb. Are you sure your robots.txt is valid?

[searchengineworld.com...]

Otherwise, try disallowing the bot by name with User-agent: Slurp
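A rough sketch of what naming the crawler explicitly could look like, again checked with Python's `urllib.robotparser`. The `Disallow: /` rule for Slurp and the function name `can_crawl` are assumptions for illustration only. Note that the crawler in herb's log does not identify itself as Slurp, so under Python's matching rules a Slurp group would not apply to it, and it would fall back to the `*` group:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a Slurp-specific group plus herb's general rules.
ROBOTS_WITH_SLURP = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow: /cgi-bin
Disallow: /images
Disallow: /forum
"""

def can_crawl(robots_txt, user_agent, path):
    """Return True if Python's parser says user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

ATW_AGENT = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"

print(can_crawl(ROBOTS_WITH_SLURP, "Slurp", "/index.html"))    # False
print(can_crawl(ROBOTS_WITH_SLURP, ATW_AGENT, "/index.html"))  # True
print(can_crawl(ROBOTS_WITH_SLURP, ATW_AGENT, "/forum/"))      # False
```

So a Slurp group on its own may not be enough here. It would probably make sense to also add a group for the user-agent string that actually shows up in the logs.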

marcs

8:18 am on Mar 2, 2004 (gmt 0)

10+ Year Member



How old is your robots.txt file?

Maybe the bot isn't aware of the new restrictions yet...

herb

1:33 pm on Mar 2, 2004 (gmt 0)

10+ Year Member



"That's odd herb. Are you sure your robots.txt is valid?"

The robots file is and has been valid.

Ink has spidered the site for the past year and stayed out of the restricted areas. The problem only started two days ago.

Yesterday it was visiting about once an hour and grabbing only the forums.

Received a message from the folks at Yahoo and they are reviewing the situation.
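For anyone wanting to measure this kind of activity from their own logs, here is a small sketch that counts a crawler's hits on disallowed paths per hour. It assumes the Apache combined log format seen in the log line earlier in the thread; the names `LOG_PATTERN` and `hourly_hits` are invented for this example:

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def hourly_hits(log_lines, agent_substring, prefixes):
    """Count requests per hour by a given agent under the given path prefixes."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if agent_substring not in match.group("agent"):
            continue
        if match.group("path").startswith(tuple(prefixes)):
            # First 14 characters of the timestamp are "dd/Mon/yyyy:HH".
            hits[match.group("time")[:14]] += 1
    return hits

# The log line from the first post.
SAMPLE = [
    '66.77.73.32 - - [01/Mar/2004:12:30:03 -0700] '
    '"GET /forum/viewforum.php?f=3& HTTP/1.0" 200 18718 "-" '
    '"Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture '
    'dot com; htXp://www.alltheweb.com/help/webmaster/crawler"'
]

print(hourly_hits(SAMPLE, "Yahoo-VerticalCrawler", ["/cgi-bin", "/images", "/forum"]))
```

A per-hour count like this is the sort of evidence that is handy to include when reporting a misbehaving crawler to its operator.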

Rumbas

8:46 pm on Mar 2, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Glad to hear somebody's on it for you, herb. Let us know how it goes.