
Forum Moderators: Ocean10000 & incrediBILL


Scooter

Not Good

     
10:06 am on Jul 6, 2003 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



216.39.51.5 - - [06/Jul/2003:00:29:00 -0700] "GET /robots.txt HTTP/1.0" 200 2390 "-" "Scooter/3.3.vscooter"
216.39.51.5 - - [06/Jul/2003:00:29:01 -0700] "GET /deniedFolder/DeniedSubFolder/denied.jpg HTTP/1.0" 200 27743 "mypage.html" "Scooter/3.3.vscooter"
10:15 am on Jul 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please change the title to "Scooter seems to violate robots.txt", thanks.

It would be a lot more useful.

SN

10:57 am on Jul 6, 2003 (gmt 0)

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Scooter seems to respect robots.txt on my sites. It would be unusual for it to disrespect it. Not being funny, but are you sure you used the correct syntax in your robots.txt file?

Mack.

3:40 pm on Jul 6, 2003 (gmt 0)

10+ Year Member



I see a lot of ...sv.av.com Scooter bot activity in my logs; they are very decent bots and have always respected my robots.txt so far.

Regards,
R.

3:57 pm on Jul 6, 2003 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



are you sure you used the correct syntax in your robots.txt file?

syntax, /deniedFolder/DeniedSubFolder/

Actually mack, you're sort of half-correct?
My robots.txt contains a deny for "/deniedFolder."
There is no mention of "/DeniedSubFolder", which sits inside the aforementioned denied folder.

So apparently that makes everything in sub-folders fair game according to robots.txt?

5:54 pm on Jul 6, 2003 (gmt 0)

10+ Year Member



Subfolders should be covered by a Disallow directive.

Your robots.txt seems to be quite long at 2390 bytes, might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there.
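
As a rough illustration of why subfolders are covered, here is a sketch using Python's standard urllib.robotparser, which matches Disallow values as path prefixes. The two-line robots.txt below is an assumed stand-in, not the actual file from this thread, and the result says nothing about how Scooter itself parses the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed stand-in for the poster's file: one rule denying the parent folder only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /deniedFolder
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    agent = "Scooter/3.3.vscooter"  # user-agent string taken from the log lines
    for path in ("/deniedFolder/DeniedSubFolder/denied.jpg", "/mypage.html"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, path) else "disallowed"
        print(path, "->", verdict)
```

Under prefix matching, a single deny for the parent folder is enough; the subfolder does not need its own line.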

6:08 pm on Jul 6, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



SearchEngineWorld Robots.txt Validator [searchengineworld.com]
6:14 pm on Jul 6, 2003 (gmt 0)

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I'm pretty sure robots.txt is there so web site owners can advise which folders should not be indexed, not which folders should not be spidered. So you can respect robots.txt even if you do spider such folders, as long as you don't index them.

If you are worried about bandwidth, there are better ways to solve that problem, but I wouldn't classify Scooter as a disrespectful bot just because it had a peek!

7:31 pm on Jul 6, 2003 (gmt 0)

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



wilderness, I hope you didn't think I was being funny when I made that suggestion. The same thing happened to me a few months ago. Very easily done.

Mack.

8:21 pm on Jul 6, 2003 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I wouldn't classify Scooter as a disrespectful bot just because it had a peek!

chiyo
IMO respect or even courtesy hasn't a thing to do with it.
Personally, and considering the precautions I have taken (assigning the majority of my "image files" numerals rather than names), I consider any reading of images by a bot an intrusion.
Whether the SEs or IPs regard it as such or not is not important to me.
It does present me with an urgency to make a correction in my .htaccess to prevent both an expanded intrusion and future intrusions.

Don

8:43 pm on Jul 6, 2003 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Your robots.txt seems to be quite long at 2390 bytes, might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there.


SearchEngineWorld Robots.txt Validator

113 Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
user-agent: szukacz
114 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
disallow: /
131 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots. (eg: User-agent)
User-Agent: Whizbang

Regarding lines 113 & 114?
szukacz honors the request to disallow, and yet a simple syntax error is supposed to make the entire file invalid and allow Scooter to override it?
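
For what it is worth, at least some parsers match field names case-insensitively. The sketch below feeds the lowercase record flagged by the validator to Python's standard urllib.robotparser. It shows that one parser's behaviour only and is not evidence of how Scooter or szukacz read the file; `SomeOtherBot` is a made-up agent name for comparison.

```python
from urllib.robotparser import RobotFileParser

# The lowercase record flagged by the validator (lines 113 and 114 of the file).
LOWERCASE_RECORD = [
    "user-agent: szukacz",
    "disallow: /",
]


def parse_rules(lines):
    """Build a parser from an iterable of robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


parser = parse_rules(LOWERCASE_RECORD)
szukacz_allowed = parser.can_fetch("szukacz", "/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "/index.html")  # hypothetical agent

print("szukacz allowed:", szukacz_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

With a parser like this one, lowercase field names still apply to the named robot and do not affect records for other robots.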

(edited by wilderness 07/06/03 17:00 EST)
I might add that this image Scooter grabbed doesn't even show on the page it linked from. Rather the page has a thumbnail which links to this larger image.

[edited by: wilderness at 8:56 pm (utc) on July 6, 2003]

8:48 pm on Jul 6, 2003 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



wilderness, I hope you didn't think I was being funny

Mack, no harm or even vengeance on puns :-)

Aside from that massive misbehaviour when Scooter 1.0 reactivated early in 2003, I'm not sure I can recall a Scooter disregard?
Although it's entirely possible and it has just slipped my memory?

Don

8:55 pm on Jul 6, 2003 (gmt 0)

10+ Year Member



The vscooter robot is AltaVista's image thief. It requests image files which are used by AltaVista to create and archive thumbnail images. The vscooter robot does not obey the robots.txt exclusion standard. Both Scooter and vscooter are denied access to my image files by .htaccess directives because of copyright violations and disregard for my robots.txt denied directories.

# enable Apache mod_rewrite 
RewriteEngine on
# deny access to JPEG, GIF and png files from known harvesters and
# external referrers except language translators
RewriteCond %{HTTP_USER_AGENT} ^ArribaPacketRat [OR]
RewriteCond %{HTTP_USER_AGENT} ^Digimarc [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST-WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mercator-2\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} vscooter [OR]
# exclude requests with empty referrer string from RewriteRule
RewriteCond %{HTTP_REFERER} !^$
# exclude requests by Norton proxy from RewriteRule
RewriteCond %{HTTP_REFERER} !^Blocked\ by\ Norton$
# exclude known language translators from RewriteRule
RewriteCond %{HTTP_REFERER} !fets\.freetranslation\.com
RewriteCond %{HTTP_REFERER} !babel\.altavista\.
RewriteCond %{HTTP_REFERER} !babelfish
RewriteCond %{HTTP_REFERER} !translate
# exclude my domain from RewriteRule
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
# forbid (403) whatever is left when it asks for a GIF, JPEG or PNG file
RewriteRule \.(gif|jpe?g|png)$ - [NC,F,L]
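
As a quick sanity check on the file-extension pattern, here is a sketch in Python. Python's re module is not Apache's regex engine, but the constructs used here (groups, alternation, `$`) behave the same way in both. It compares a pattern that places three `$`-anchored groups back to back with one that uses alternation. The first sample path comes from the log at the top of the thread (without the leading slash, as mod_rewrite sees it in .htaccess context); the second is made up.

```python
import re

# Three end-anchored groups in sequence: a path would have to end three times.
CONCATENATED = re.compile(r"(.*\.gif$)(.*\.jpe?g$)(.*\.png$)", re.IGNORECASE)

# Alternation: the path must end in any one of the extensions.
ALTERNATION = re.compile(r"\.(gif|jpe?g|png)$", re.IGNORECASE)

SAMPLES = [
    "deniedFolder/DeniedSubFolder/denied.jpg",  # from the log excerpt
    "thumbs/0001.GIF",                          # hypothetical path
    "mypage.html",
]


def matches(pattern, path):
    """Return True if the pattern matches anywhere in the path."""
    return pattern.search(path) is not None


for path in SAMPLES:
    print(path, "concatenated:", matches(CONCATENATED, path),
          "alternation:", matches(ALTERNATION, path))
```

The concatenated form matches none of the samples, so a rule built on it would never fire; the alternation form matches the two image paths and leaves the HTML page alone.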
9:05 pm on Jul 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Chiyo:

>>> So you can respect robots.txt even if you do spider such folders, as long as you don't index them

No, I think you are wrong there... A deny in robots.txt does NOT mean "don't index, but it is OK to peek" (there are spiders that do not index at all!). If it is denied in robots.txt, that means do NOT go there at all, not merely do not index anything there.

Don:

I have also found that scooter itself, the web bot, is usually OK. But they have a problem with their image bot. I had not noted the name change to vscooter yet (so thanks!)... but they did have an older scooter that just went after images, and did not respect robots.txt.

For some reason I do not know, AV does not seem to like my sites. One of my sites has ONLY its main page in AV, others have none at all. These are sites that have been around since 1995, and place well everywhere else. So I personally couldn't care less about AV at all...

dave

 
