
Slurp not respecting robots.txt

pulling down hundreds of files over multiple sites


jcoronella

3:34 am on Dec 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Anyone else seeing Slurp misbehaving? Going to have to ban them another way.

soapystar

1:31 pm on Dec 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's been mentioned before.

Staffa

6:46 pm on Dec 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Had an occurrence today as well.
It accessed a directory that has been off-limits for ages.

internetheaven

4:21 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep, they've completely ignored robots.txt, which means they are indexing all the search results pages of my site (i.e. my internal search program). That makes me think it has something to do with the Yahoo browser, since no one actually links to any of my internal search results pages.
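For reference, keeping a well-behaved crawler out of internal search results is normally done with a Disallow rule like the one below (the /search/ path is a hypothetical example; the complaint in this thread is that Slurp fetched such paths anyway):

```
User-agent: Slurp
Disallow: /search/
```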

artdog

2:13 am on Dec 15, 2004 (gmt 0)

10+ Year Member



I've read here that the search engines don't follow JavaScript links, correct?

If so, could I open a page with JavaScript so Y! doesn't follow it and give me a dup content penalty?

That way I could load a page that's almost identical to a page on my other site.

Am I dreamin'?

cyberprosper

12:42 am on Dec 16, 2004 (gmt 0)

10+ Year Member



My web site was hit by "ahoo Seeker" instead of "Yahoo Seeker" today. Perhaps it is a way to check for cloaking... I checked the IP address, and it was definitely coming from Inktomi.

Yahoo often ignores my robots.txt file. I now block Yahoo from certain directories using .htaccess.
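Blocking a crawler by User-Agent in .htaccess (for servers where robots.txt is being ignored) typically looks like the sketch below. This is a hedged example using Apache 1.3/2.0-era mod_setenvif and mod_access syntax, placed in the .htaccess of the directory to protect; the `bad_bot` variable name is arbitrary:

```
# Match any User-Agent containing "Slurp" (case-insensitive)
SetEnvIfNoCase User-Agent "Slurp" bad_bot
# Deny requests carrying that environment variable
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Matching on User-Agent is easy to spoof, so some people deny Inktomi's IP ranges instead, at the cost of maintaining the range list.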

walkman

12:51 am on Dec 16, 2004 (gmt 0)



maybe they need 8 billion pages ;)

soapystar

10:45 pm on Dec 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



maybe they need 8 billion pages

yup, you may have something there. That would explain the 6.5 billion auto-generated Yahoo Travel template pages.

lizardx

8:59 am on Dec 17, 2004 (gmt 0)

10+ Year Member



I've had Slurp trigger a bot trap before, which means its robots.txt handling leaves something to be desired.
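A bot trap of the kind described usually pairs a robots.txt Disallow with a trap URL that no compliant crawler should ever request; any client that fetches it has, by definition, ignored robots.txt. A minimal sketch of the server-side logic, with all paths and names hypothetical:

```python
# Bot-trap sketch: /trap/ is listed under Disallow in robots.txt and
# linked only from a hidden link, so legitimate crawlers never fetch it.

DISALLOWED = {"/trap/"}  # trap paths mirrored in robots.txt


def check_request(path, user_agent, offenders):
    """Return False and record the client if it hit a trap path."""
    if path in DISALLOWED:
        offenders.add(user_agent)  # candidate for an IP/UA ban list
        return False               # e.g. respond 403 to this request
    return True


offenders = set()
check_request("/index.html", "Mozilla/5.0", offenders)  # normal visitor
check_request("/trap/", "Slurp/3.0", offenders)         # robots.txt violator
```

The real ban would then be enforced in .htaccess or a firewall using the collected list.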