
Sitemaps, Meta Data, and robots.txt Forum

    
Scooter 3.3 ignores robots.txt?
this is ridiculous
berli

10+ Year Member



 
Msg#: 23 posted 11:13 pm on Jun 24, 2003 (gmt 0)

Just found Scooter/3.3.vscooter grabbing a bunch of files disallowed to all ("*") in my robots.txt file. The only thing I can think of is that the directories were named things like

img/
foo/images/

and I used a trailing slash. I saw here that apparently some browsers misinterpret that? Removing the slash somewhat bothers me, because what's stopping some *other* stupid robot from deciding that "foo/images" is a text file called images?

Does this mean I should block Scooter from certain directories in .htaccess and let it chew on 403s?

Or is this robots.txt-ignoring bot not really Scooter at all...?

IPs: 216.39.50.114, 216.39.50.13

 

outrun

10+ Year Member



 
Msg#: 23 posted 11:23 pm on Jun 24, 2003 (gmt 0)

There should be a slash at the front. Instead of
foo/images/
it should be
/foo/images/

regards,
Mark

berli

10+ Year Member



 
Msg#: 23 posted 12:39 pm on Jun 25, 2003 (gmt 0)

My mistake when I posted the example. The following is a real line from my robots.txt file:

Disallow: /img/

Other spiders, such as Googlebot and -- get this -- ia_archiver, have obeyed this directive.

mcavic

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 23 posted 3:09 am on Jun 26, 2003 (gmt 0)

I'd try:
Disallow: /img

That way, whether the robot requests the directory with or without the trailing slash, the rule should match and the request should be disallowed.

Quote: "...stupid robot from deciding that "foo/images" is a text file called images?"

As far as robots.txt is concerned, it doesn't matter what "images" is. A robot is supposed to just compare the Disallow string with the beginning of the URL path.
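To make the prefix comparison concrete, here is a minimal Python sketch. The function name is_disallowed is made up for illustration, and it assumes plain prefix matching only (no wildcards, no Allow lines); it is not how Scooter or any particular robot is known to be implemented.

```python
def is_disallowed(path, disallow_rules):
    """Return True if the URL path starts with any non-empty Disallow value.

    Hypothetical helper: plain prefix matching, nothing more.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Compare the two spellings of the rule against a few request paths.
for rule in ("/img/", "/img"):
    for path in ("/img", "/img/", "/img/logo.gif", "/imgs/logo.gif"):
        print(rule, path, is_disallowed(path, [rule]))
```

Under this reading, "Disallow: /img/" does not cover a request for "/img" itself, while "Disallow: /img" covers both forms. The shorter rule is also broader: it would match "/imgs/logo.gif" or "/img.html" as well, so it only makes sense if nothing else on the site starts with "/img".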

Scooter seems to be obeying on my site.

cespedes

10+ Year Member



 
Msg#: 23 posted 7:50 pm on Jun 26, 2003 (gmt 0)

I have EXACTLY the same problem. I have the following robots.txt:

User-agent: *
Disallow: /img

Every robot obeys that except one that identifies itself as "Scooter/3.3.vscooter" and connects from 216.39.50.0/24.

Does anyone know how to solve it and/or notify the culprits?
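Before writing to anyone, it may help to confirm which requests really match both the user agent and the address range. A rough Python sketch follows; the sample records and the names is_suspect and violations are invented for illustration (only the two IPs, the /24 range, the user-agent string, and the /img rule come from this thread), and it assumes you have already parsed your access log into (ip, path, user agent) tuples.

```python
import ipaddress

SUSPECT_NET = ipaddress.ip_network("216.39.50.0/24")
SUSPECT_UA = "Scooter/3.3.vscooter"
DISALLOWED_PREFIX = "/img"  # the Disallow value from robots.txt


def is_suspect(ip, user_agent):
    """True if the request comes from the /24 and carries the Scooter UA."""
    return ipaddress.ip_address(ip) in SUSPECT_NET and SUSPECT_UA in user_agent


# Made-up sample records, standing in for parsed access-log lines.
records = [
    ("216.39.50.114", "/img/logo.gif", "Scooter/3.3.vscooter"),
    ("216.39.50.13", "/index.html", "Scooter/3.3.vscooter"),
    ("10.0.0.5", "/img/logo.gif", "ExampleBot/1.0"),
]

# Keep only suspect requests that hit a disallowed path.
violations = [
    (ip, path)
    for ip, path, ua in records
    if is_suspect(ip, ua) and path.startswith(DISALLOWED_PREFIX)
]
print(violations)
```

A list like this (with real timestamps added) is the sort of evidence worth attaching to a complaint, and it also shows whether the offending requests come from the claimed range or from something merely borrowing the Scooter name.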

mcavic

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 23 posted 2:39 am on Jun 27, 2003 (gmt 0)

Apparently, vscooter is AltaVista's image indexer.

See here: [webmasterworld.com...]

and here: [photodude.com...]

I don't know why it isn't obeying, but maybe try writing to AltaVista?
