


Scooter

Not Good

     
10:06 am on Jul 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


216.39.51.5 - - [06/Jul/2003:00:29:00 -0700] "GET /robots.txt HTTP/1.0" 200 2390 "-" "Scooter/3.3.vscooter"
216.39.51.5 - - [06/Jul/2003:00:29:01 -0700] "GET /deniedFolder/DeniedSubFolder/denied.jpg HTTP/1.0" 200 27743 "mypage.html" "Scooter/3.3.vscooter"
10:15 am on July 6, 2003 (gmt 0)

Senior Member from MT 

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 1, 2003
posts:1843
votes: 0


Please change the title to "Scooter seems to violate robots.txt", thanks.

It would be a lot more useful.

SN

10:57 am on July 6, 2003 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7617
votes: 21


Scooter seems to respect robots.txt on my sites, so it would be unusual for it to disregard it. Not being funny, but are you sure you used the correct syntax in your robots.txt file?

Mack.

3:40 pm on July 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 28, 2002
posts:505
votes: 0


I see a lot of ...sv.av.com Scooter bot activity in my logs; they are very well-behaved bots and have always respected my robots.txt so far.

Regards,
R.

3:57 pm on July 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


>>> are you sure you used the correct syntax in your robots.txt file?

The path in question is /deniedFolder/DeniedSubFolder/

Actually mack, you're sort of half-correct.
My robots.txt contains a deny for "/deniedFolder".
There is no mention of "/DeniedSubFolder", which sits inside the aforementioned denied folder.

So apparently that makes everything in sub-folders fair game according to robots?

5:54 pm on July 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 27, 2003
posts:166
votes: 0


Subfolders should be covered by a Disallow directive: it applies to every URL path that begins with the disallowed value.

Your robots.txt seems to be quite long at 2390 bytes; might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there.
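A quick way to see how Disallow lines are matched is Python's standard-library `urllib.robotparser`, which, like the original robots.txt standard, treats each Disallow value as a path prefix. This is a sketch only: it assumes a robots.txt whose single record is `Disallow: /deniedFolder/` for all agents (a hypothetical stand-in for the real 2390-byte file). The subfolder image from the log comes out as disallowed even though the subfolder itself is never named.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the parent folder is listed, not the subfolder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /deniedFolder/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow values are prefixes, so anything below /deniedFolder/ is covered.
for path in (
    "/deniedFolder/DeniedSubFolder/denied.jpg",  # the request from the log
    "/deniedFolder/DeniedSubFolder/",
    "/mypage.html",
):
    verdict = "allowed" if parser.can_fetch("Scooter/3.3.vscooter", path) else "disallowed"
    print(path, verdict)
```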

6:08 pm on July 6, 2003 (gmt 0)

Moderator from DK 

WebmasterWorld Administrator 10+ Year Member

joined:Oct 23, 2000
posts:2538
votes: 4


SearchEngineWorld Robots.txt Validator [searchengineworld.com]
6:14 pm on July 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 21, 2000
posts:3170
votes: 0


I'm pretty sure robots.txt exists to let website owners advise which folders not to index, not which folders not to spider. So you can respect robots.txt even if you do spider such folders, as long as you don't index them.

If you are worried about bandwidth, there are better ways to solve that problem, but I wouldn't classify Scooter as a disrespectful bot just because it had a peek!

7:31 pm on July 6, 2003 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7617
votes: 21


wilderness, I hope you didn't think I was being funny when I made that suggestion. The same thing happened to me a few months ago. Very easily done.

Mack.

8:21 pm on July 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


>>> I wouldn't classify Scooter as a disrespectful bot just because it had a peek!

chiyo,
IMO respect or even courtesy hasn't a thing to do with it.
Personally, and considering the precautions I have taken (giving the majority of my image files numbers rather than names), I consider any reading of my images by a bot an intrusion.
Whether the SEs or IPs regard it as such is of no concern to me.
It does give me an urgent reason to correct my .htaccess, to prevent both a wider intrusion now and future intrusions.

Don

8:43 pm on July 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


>>> Your robots.txt seems to be quite long at 2390 bytes; might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there.

>>> SearchEngineWorld Robots.txt Validator

113 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
user-agent: szukacz
114 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
disallow: /
131 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots. (eg: User-agent)
User-Agent: Whizbang

Regarding lines 113 & 114:
szukacz honors the request to disallow, and yet a simple syntax error like this is supposed to make the entire file invalid and let Scooter override it?
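On whether one lowercase field name can invalidate the whole file: Python's `urllib.robotparser` lowercases field names before comparing them, so the flagged lines are still read as ordinary records. It is used here only as one example of a parser, not as a model of how Scooter or szukacz actually behave. The `capitalization_warnings` helper is hypothetical and merely mimics the kind of style warning the validator printed; the robots.txt lines are a shortened, assumed stand-in for the real file.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical excerpt mirroring the lines the validator flagged.
ROBOTS_LINES = [
    "user-agent: szukacz",
    "disallow: /",
    "",
    "User-Agent: Whizbang",
    "Disallow: /deniedFolder/",
]

def capitalization_warnings(lines):
    """Hypothetical lint: report field names not written as 'User-agent'/'Disallow'."""
    canonical = {"user-agent": "User-agent", "disallow": "Disallow"}
    warnings = []
    for number, line in enumerate(lines, start=1):
        field, sep, _ = line.partition(":")
        name = field.strip()
        expected = canonical.get(name.lower())
        if sep and expected and name != expected:
            warnings.append((number, line))
    return warnings

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # field names are matched case-insensitively

print(capitalization_warnings(ROBOTS_LINES))
print(parser.can_fetch("szukacz", "/any/page.html"))             # szukacz is shut out
print(parser.can_fetch("Whizbang", "/deniedFolder/x.jpg"))       # Whizbang record still applies
print(parser.can_fetch("Scooter/3.3.vscooter", "/mypage.html"))  # no record for Scooter here
```

In this parser the capitalization warnings are purely stylistic: each record keeps working on its own.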

(edited by wilderness 07/06/03 17:00 EST)
I might add that the image Scooter grabbed doesn't even appear on the page it was linked from. Rather, the page has a thumbnail that links to this larger image.

[edited by: wilderness at 8:56 pm (utc) on July 6, 2003]

8:48 pm on July 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


>>> wilderness, I hope you didn't think I was being funny

Mack, no harm done, and no vengeance over puns :-)

Aside from that massive misbehaviour when Scooter 1.0 reactivated early in 2003, I can't recall Scooter ever disregarding robots.txt.
Although it's entirely possible that one has just slipped my memory.

Don

8:55 pm on July 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 23, 2003
posts:77
votes: 0


The vscooter robot is AltaVista's image thief. It requests image files, which AltaVista uses to create and archive thumbnail images. The vscooter robot does not obey the robots.txt exclusion standard. Both Scooter and vscooter are denied access to my image files by .htaccess directives because of copyright violations and their disregard for the directories denied in my robots.txt.

# enable Apache mod_rewrite 
RewriteEngine on
# deny access to JPEG, GIF and png files from known harvesters and
# external referrers except language translators
RewriteCond %{HTTP_USER_AGENT} ^ArribaPacketRat [OR]
RewriteCond %{HTTP_USER_AGENT} ^Digimarc [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST-WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mercator-2\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} vscooter [OR]
# exclude requests with empty referrer string from RewriteRule
RewriteCond %{HTTP_REFERER} !^$
# exclude requests by Norton proxy from RewriteRule
RewriteCond %{HTTP_REFERER} !^Blocked\ by\ Norton$
# exclude known language translators from RewriteRule
RewriteCond %{HTTP_REFERER} !fets\.freetranslation\.com
RewriteCond %{HTTP_REFERER} !babel\.altavista\.
RewriteCond %{HTTP_REFERER} !babelfish
RewriteCond %{HTTP_REFERER} !translate
# exclude my domain from RewriteRule
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
# forbid (403) any GIF, JPEG or PNG request that met all of the conditions above
RewriteRule \.(gif|jpe?g|png)$ - [NC,F,L]
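To make the condition logic above easier to follow, here is a small Python model of what these directives are meant to do. It is an approximation, not Apache itself: it assumes the final rule matches any path ending in .gif, .jpg, .jpeg or .png, keeps `example.com` as the placeholder domain, and applies mod_rewrite's chaining, where a condition flagged [OR] is ORed with the next one and all other conditions are ANDed. Because the vscooter line also carries [OR], the "referrer is not empty" test joins the user-agent list as a single OR group.

```python
import re

# Approximate model of the .htaccess block above (not Apache itself).
# User-agent conditions as (pattern, flags); only Slurp carries [NC].
HARVESTERS = [
    (r"^ArribaPacketRat", 0),
    (r"^Digimarc", 0),
    (r"^FAST-WebCrawler", 0),
    (r"grub-client", 0),
    (r"^InfoSeek", 0),
    (r"^Mercator-2\.0", 0),
    (r"^MIIxpc", 0),
    (r"^psbot", 0),
    (r"Slurp", re.IGNORECASE),
    (r"^Scooter", 0),
    (r"vscooter", 0),
]
TRANSLATORS = [r"fets\.freetranslation\.com", r"babel\.altavista\.", r"babelfish", r"translate"]
OWN_SITE = re.compile(r"^http://(www\.)?example\.com", re.IGNORECASE)
IMAGE = re.compile(r"\.(gif|jpe?g|png)$", re.IGNORECASE)

def is_forbidden(path, user_agent, referer=""):
    """Return True if the modelled rules would answer 403 Forbidden."""
    if not IMAGE.search(path):
        return False
    # OR group: every user-agent line has [OR], including the last one,
    # which chains it to the following "referrer is not empty" condition.
    or_group = any(re.search(p, user_agent, f) for p, f in HARVESTERS) or referer != ""
    # The remaining conditions have no [OR], so they are ANDed with the group.
    return (
        or_group
        and referer != "Blocked by Norton"
        and not any(re.search(p, referer) for p in TRANSLATORS)
        and not OWN_SITE.search(referer)
    )

print(is_forbidden("/deniedFolder/DeniedSubFolder/denied.jpg", "Scooter/3.3.vscooter"))
print(is_forbidden("/images/1.gif", "Mozilla/4.0"))
print(is_forbidden("/images/1.gif", "Mozilla/4.0", "http://www.example.com/mypage.html"))
print(is_forbidden("/images/1.png", "Mozilla/4.0", "http://elsewhere.example.net/"))
```

One consequence worth checking under this model: a listed harvester that sends one of your own pages as its referrer, e.g. `is_forbidden("/a.jpg", "Scooter/3.3.vscooter", "http://www.example.com/mypage.html")`, is let through, because the own-domain exclusion is ANDed with the whole user-agent group.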
9:05 pm on July 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 1, 2002
posts:774
votes: 0


Chiyo:

>>> So you can respect robots.txt even if you do spider such folders, as long as you don't index them

No, I think you are wrong there. A deny in robots.txt is NOT merely an exclusion from the index that still makes it OK to peek (there are spiders that do not index at all!). If it is denied in robots.txt, that means do NOT go there at all, not just do not index anything there.
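This reading (a Disallow line governs whether a URL is requested at all, not merely whether it is indexed) is also what makes a log audit meaningful: any request for a disallowed path counts, whatever happens to it afterwards. Below is a minimal Python sketch that assumes Apache's combined log format and a hypothetical `Disallow: /deniedFolder/` record for all agents; it replays the two Scooter log lines from the first post.

```python
import re
from urllib.robotparser import RobotFileParser

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def robots_violations(robots_lines, log_lines):
    """List (host, path, agent) for requests that robots.txt disallows for that agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    hits = []
    for line in log_lines:
        match = COMBINED.match(line)
        if not match or match["path"] == "/robots.txt":
            continue  # unparseable line, or the robots.txt fetch itself
        if not parser.can_fetch(match["agent"], match["path"]):
            hits.append((match["host"], match["path"], match["agent"]))
    return hits

ROBOTS = ["User-agent: *", "Disallow: /deniedFolder/"]  # hypothetical excerpt
LOG = [
    '216.39.51.5 - - [06/Jul/2003:00:29:00 -0700] "GET /robots.txt HTTP/1.0" 200 2390 "-" "Scooter/3.3.vscooter"',
    '216.39.51.5 - - [06/Jul/2003:00:29:01 -0700] "GET /deniedFolder/DeniedSubFolder/denied.jpg HTTP/1.0" 200 27743 "mypage.html" "Scooter/3.3.vscooter"',
]

for hit in robots_violations(ROBOTS, LOG):
    print(hit)
```

The log only shows the fetch, not whether anything was indexed, which is exactly the part robots.txt covers under this reading.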

Don:

I have also found that Scooter itself, the web bot, is usually OK, but they have a problem with their image bot. I had not noticed the name change to vscooter yet (so thanks!), but they did have an older Scooter that went only after images and did not respect robots.txt.

For some reason I do not know, AV does not seem to like my sites. One of my sites has ONLY its main page in AV; the others have none at all. These sites have been around since 1995 and place well everywhere else, so I personally couldn't care less about AV...

dave

 
