
Slurp not respecting robots.txt

pulling down hundreds of files over multiple sites


jcoronella

3:34 am on Dec 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Anyone else seeing Slurp misbehaving? Going to have to ban them another way.

soapystar

1:31 pm on Dec 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's been mentioned before.

Staffa

6:46 pm on Dec 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Had an occurrence today as well.
It accessed a directory that has been off-limits for ages.

internetheaven

4:21 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep, they've completely ignored robots.txt, which means they are indexing all the search results pages of my site (i.e. my internal search program). That makes me think it has something to do with the Yahoo browser, since no one actually links to any of my internal search results pages.
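For reference, keeping a well-behaved crawler out of internal search results is normally done with a Disallow rule like the one below (the /search/ path is a hypothetical example; the complaint in this thread is that Slurp fetched such paths anyway):

```
User-agent: Slurp
Disallow: /search/
```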

artdog

2:13 am on Dec 15, 2004 (gmt 0)

10+ Year Member



I've read here that the search engines don't follow JavaScript links, correct?

If so, could I open a page with JavaScript so Y! doesn't follow it and give me a dup content penalty?

That way I could load a page that's almost identical to a page on my other site.

Am I dreamin'?

cyberprosper

12:42 am on Dec 16, 2004 (gmt 0)

10+ Year Member



My web site was hit by "ahoo Seeker" instead of "Yahoo Seeker" today. Perhaps it is a way to check for cloaking... I checked the IP address, and it was definitely coming from Inktomi.

Yahoo often ignores my robots.txt file. I now block Yahoo from certain directories using .htaccess.
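Blocking a crawler by User-Agent in .htaccess (for servers where robots.txt is being ignored) typically looks like the sketch below. This is a hedged example using Apache 1.3/2.0-era mod_setenvif and mod_access syntax, placed in the .htaccess of the directory to protect; the `bad_bot` variable name is arbitrary:

```
# Match any User-Agent containing "Slurp" (case-insensitive)
SetEnvIfNoCase User-Agent "Slurp" bad_bot
# Deny requests carrying that environment variable
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Matching on User-Agent is easy to spoof, so some people deny Inktomi's IP ranges instead, at the cost of maintaining the range list.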

walkman

12:51 am on Dec 16, 2004 (gmt 0)



maybe they need 8 billion pages ;)

soapystar

10:45 pm on Dec 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



maybe they need 8 billion pages

yup, you may have something there. That would explain the 6.5 billion auto-generated Yahoo Travel template pages.

lizardx

8:59 am on Dec 17, 2004 (gmt 0)

10+ Year Member



I've had Slurp trigger a bot trap before, which means its robots.txt handling leaves something to be desired.
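A bot trap of the kind described usually pairs a robots.txt Disallow with a trap URL that no compliant crawler should ever request; any client that fetches it has, by definition, ignored robots.txt. A minimal sketch of the server-side logic, with all paths and names hypothetical:

```python
# Bot-trap sketch: /trap/ is listed under Disallow in robots.txt and
# linked only from a hidden link, so legitimate crawlers never fetch it.

DISALLOWED = {"/trap/"}  # trap paths mirrored in robots.txt


def check_request(path, user_agent, offenders):
    """Return False and record the client if it hit a trap path."""
    if path in DISALLOWED:
        offenders.add(user_agent)  # candidate for an IP/UA ban list
        return False               # e.g. respond 403 to this request
    return True


offenders = set()
check_request("/index.html", "Mozilla/5.0", offenders)  # normal visitor
check_request("/trap/", "Slurp/3.0", offenders)         # robots.txt violator
```

The real ban would then be enforced in .htaccess or a firewall using the collected list.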