Yahoo/Overture/Alltheweb

Spider not honoring "robots.txt"

     
8:26 pm on Mar 1, 2004 (gmt 0)

Full Member from US 

10+ Year Member

joined:July 12, 2000
posts:323
votes: 4


Just found ATW crawling through a forum that is off-limits to spiders. It grabbed the entire forum, even though robots.txt has had this in it all along:

User-agent: *
Disallow: /cgi-bin
Disallow: /images
Disallow: /forum

66.77.73.32 - - [01/Mar/2004:12:30:03 -0700] "GET /forum/viewforum.php?f=3& HTTP/1.0" 200 18718 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; htXp://www.alltheweb.com/help/webmaster/crawler"
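For anyone who wants to double-check rules like the ones above, here is a minimal sketch using Python's standard-library robots.txt parser. It only shows how that one parser reads the rules; it says nothing about how the crawler itself reads them. The `is_allowed` helper and the `/index.html` test path are made up for illustration, and the user-agent is the product token copied from the start of the log entry.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /images
Disallow: /forum
"""

# Product token from the start of the user-agent string in the log line.
CRAWLER = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if this parser would let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # First path is the request from the log; second is a hypothetical page
    # outside the disallowed directories.
    for path in ("/forum/viewforum.php?f=3&", "/index.html"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, CRAWLER, path) else "blocked"
        print(f"{path}: {verdict}")
```

If the forum URL comes back as blocked here, the file is doing what it should under a standard reading, and the fault is on the crawler's side.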

8:11 am on Mar 2, 2004 (gmt 0)

Moderator from DK 

WebmasterWorld Administrator 10+ Year Member

joined:Oct 23, 2000
posts:2536
votes: 2


That's odd, herb. Are you sure your robots.txt is valid?

[searchengineworld.com...]

Otherwise, try disallowing user-agent: slurp explicitly.
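A rough sketch of what a Slurp-specific record could look like, again checked with Python's standard-library parser. The `Disallow: /` line for Slurp and the `fetch_ok` helper are assumptions for illustration only. Python matches a record by looking for its name inside the crawler's product token, and real crawlers may match differently, so treat the result as a hint rather than proof.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a dedicated Slurp record ahead of the catch-all one.
ROBOTS_WITH_SLURP = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow: /cgi-bin
Disallow: /images
Disallow: /forum
"""


def fetch_ok(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if Python's parser would let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    agents = ("Slurp", "Yahoo-VerticalCrawler-FormerWebCrawler/3.9")
    for agent in agents:
        for path in ("/index.html", "/forum/viewforum.php?f=3&"):
            verdict = "allowed" if fetch_ok(ROBOTS_WITH_SLURP, agent, path) else "blocked"
            print(f"{agent} -> {path}: {verdict}")
```

Under this parser the Slurp record applies only to an agent whose token contains "slurp". The user-agent token from the log does not, so it falls through to the `*` record, where /forum is already disallowed.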

8:18 am on Mar 2, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 11, 2003
posts:442
votes: 0


How old is your robots.txt file?

Maybe the bot isn't aware of the new restrictions yet...

1:33 pm on Mar 2, 2004 (gmt 0)

Full Member from US 

10+ Year Member

joined:July 12, 2000
posts:323
votes: 4


"That's odd herb. Are you sure your robots.txt is valid?"

The robots file is and has been valid.

Ink has spidered the site for the past year and has stayed out of the restricted areas the whole time. The forum crawling only started two days ago.

Yesterday it was visiting about once an hour and grabbing only the forums.

Received a message from the folks at Yahoo and they are reviewing the situation.

8:46 pm on Mar 2, 2004 (gmt 0)

Moderator from DK 

WebmasterWorld Administrator 10+ Year Member

joined:Oct 23, 2000
posts:2536
votes: 2


Glad to hear somebody's on it for you, herb. Let us know how it goes.