
Search Engine Spider and User Agent Identification Forum

    
Scooter: Not Good
wilderness




msg:402087
 10:06 am on Jul 6, 2003 (gmt 0)

216.39.51.5 - - [06/Jul/2003:00:29:00 -0700] "GET /robots.txt HTTP/1.0" 200 2390 "-" "Scooter/3.3.vscooter"
216.39.51.5 - - [06/Jul/2003:00:29:01 -0700] "GET /deniedFolder/DeniedSubFolder/denied.jpg HTTP/1.0" 200 27743 "mypage.html" "Scooter/3.3.vscooter"

 

killroy




msg:402088
 10:15 am on Jul 6, 2003 (gmt 0)

Please change the title to "Scooter seems to violate robots.txt", thanks.

Would be a lot more useful.

SN

mack




msg:402089
 10:57 am on Jul 6, 2003 (gmt 0)

Scooter seems to respect robots.txt on my sites; it's unusual for it to disrespect one. Not being funny, but are you sure you used the correct syntax in your robots.txt file?

Mack.

Romeo




msg:402090
 3:40 pm on Jul 6, 2003 (gmt 0)

I see a lot of ...sv.av.com Scooter bot activity in my logs; they are very decent bots and have always respected my robots.txt so far.

Regards,
R.

wilderness




msg:402091
 3:57 pm on Jul 6, 2003 (gmt 0)

are you sure you used the correct syntax in your robots.txt file?

syntax, /deniedFolder/DeniedSubFolder/

Actually mack, you're sort of half-correct.
My robots.txt contains a deny for "/deniedFolder".
There is no mention of "/DeniedSubFolder", which sits inside the aforementioned denied folder.

So apparently that makes everything in sub-folders fair game according to robots.txt?

tschild




msg:402092
 5:54 pm on Jul 6, 2003 (gmt 0)

Subfolders are covered by a Disallow directive on the parent folder.

Your robots.txt seems to be quite long at 2390 bytes; might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there.
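
For example, a single Disallow on the parent folder is enough (just a sketch, using the folder names from your log):

User-agent: *
Disallow: /deniedFolder/
# a compliant robot should then also skip /deniedFolder/DeniedSubFolder/denied.jpg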

Rumbas




msg:402093
 6:08 pm on Jul 6, 2003 (gmt 0)

SearchEngineWorld Robots.txt Validator [searchengineworld.com]

chiyo




msg:402094
 6:14 pm on Jul 6, 2003 (gmt 0)

I'm pretty sure robots.txt is there to let web site owners advise which folders not to index, not which folders not to be spidered. So you can respect robots.txt even if you do spider such folders, as long as you don't index them.

If you are worried about bandwidth, there are better ways to solve that problem, but I wouldn't classify Scooter as a disrespectful bot just because it had a peek!

mack




msg:402095
 7:31 pm on Jul 6, 2003 (gmt 0)

wilderness, hope you didn't think I was being funny when I made that suggestion. It is the same thing that happened to me a few months ago. Very easily done.

Mack.

wilderness




msg:402096
 8:21 pm on Jul 6, 2003 (gmt 0)

I wouldn't classify Scooter as a disrespectful bot just because it had a peek!

chiyo
IMO respect or even courtesy hasn't a thing to do with it.
Personally, and considering the precautions I have taken (by assigning the majority of my "image files" numerals rather than names), I consider any reading of images by a bot an intrusion.
Whether the SEs or IPs regard it as such or not does not matter to me.
It does present me with an urgent need to make a correction in my .htaccess to prevent both an expanded intrusion and future intrusions.
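
Something along these lines in my .htaccess should do it (a rough, untested mod_rewrite sketch; the user-agent names are the ones from my logs):

# deny image requests from AltaVista's crawlers (untested sketch)
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Scooter [NC,OR]
RewriteCond %{HTTP_USER_AGENT} vscooter [NC]
RewriteRule \.(gif|jpe?g|png)$ - [NC,F]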

Don

wilderness




msg:402097
 8:43 pm on Jul 6, 2003 (gmt 0)

Your robots.txt seems to be quite long at 2390 bytes; might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there.


SearchEngineWorld Robots.txt Validator

113 Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
user-agent: szukacz
114 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
disallow: /
131 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots. (eg: User-agent)
User-Agent: Whizbang

Regarding lines 113 & 114:
szukacz honors the request to disallow, and yet a simple syntax error should make the entire file invalid and allow Scooter to override?
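
For comparison, the capitalized form the validator is asking for would simply be:

User-agent: szukacz
Disallow: /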

(edited by wilderness 07/06/03 17:00 EST)
I might add that this image Scooter grabbed doesn't even show on the page it linked from. Rather, the page has a thumbnail which links to this larger image.

[edited by: wilderness at 8:56 pm (utc) on July 6, 2003]

wilderness




msg:402098
 8:48 pm on Jul 6, 2003 (gmt 0)

wilderness, hope you didn't think I was being funny

Mack, no harm or even vengeance on puns :-)

Aside from that massive misbehaviour when Scooter 1.0 reactivated early in 2003, I'm not sure I can recall a Scooter disregard.
Although it's entirely possible and it has just slipped my memory.

Don

WarmGlow




msg:402099
 8:55 pm on Jul 6, 2003 (gmt 0)

The vscooter robot is AltaVista's image thief. It requests image files which are used by AltaVista to create and archive thumbnail images. The vscooter robot does not obey the robots.txt exclusion standard. Both Scooter and vscooter are denied access to my image files by .htaccess directives because of copyright violations and disregard for my robots.txt denied directories.

# enable Apache mod_rewrite 
RewriteEngine on
# deny access to JPEG, GIF and png files from known harvesters and
# external referrers except language translators
RewriteCond %{HTTP_USER_AGENT} ^ArribaPacketRat [OR]
RewriteCond %{HTTP_USER_AGENT} ^Digimarc [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST-WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mercator-2\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} vscooter [OR]
# exclude requests with empty referrer string from RewriteRule
RewriteCond %{HTTP_REFERER} !^$
# exclude requests by Norton proxy from RewriteRule
RewriteCond %{HTTP_REFERER} !^Blocked\ by\ Norton$
# exclude known language translators from RewriteRule
RewriteCond %{HTTP_REFERER} !fets\.freetranslation\.com
RewriteCond %{HTTP_REFERER} !babel\.altavista\.
RewriteCond %{HTTP_REFERER} !babelfish
RewriteCond %{HTTP_REFERER} !translate
# exclude my domain from RewriteRule
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
# match requests for GIF, JPEG and PNG files
RewriteRule \.(gif|jpe?g|png)$ - [NC,F,L]
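
If mod_rewrite is not available, a rough alternative for just the two AltaVista image grabbers (an untested sketch using mod_setenvif plus the standard allow/deny directives) would be:

# flag AltaVista's crawlers by user-agent (untested sketch)
SetEnvIfNoCase User-Agent "scooter" bad_image_bot
SetEnvIfNoCase User-Agent "vscooter" bad_image_bot
# refuse them access to image files only
<FilesMatch "\.(gif|jpe?g|png)$">
  Order Allow,Deny
  Allow from all
  Deny from env=bad_image_bot
</FilesMatch>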

carfac




msg:402100
 9:05 pm on Jul 6, 2003 (gmt 0)

Chiyo:

>>> So you can respect robots.txt even if you do spider such folders, as long as you don't index them

No, I think you are wrong there.... a deny in robots.txt is NOT just an exclusion from indexing with permission to peek (there are spiders that do not index at all!). If it is denied in robots.txt, that means do NOT go there at all, not merely do not index anything there.

Don:

I have also found that Scooter itself, the web bot, is usually OK. But they have a problem with their image bot. I had not noted the name change to vscooter yet (so thanks!)... but they did have an older Scooter that just went after images and did not respect robots.txt.

For some reason I do not know, AV does not seem to like my sites. One of my sites has ONLY its main page in AV, the others have none at all. These are sites that have been around since 1995 and place well everywhere else. So I personally couldn't care less about AV at all...

dave
