killroy

msg:402088 | 10:15 am on Jul 6, 2003 (gmt 0) |
Please change the title to "Scooter seems to violate robots.txt", thanks. That would be a lot more useful. SN
|
mack

msg:402089 | 10:57 am on Jul 6, 2003 (gmt 0) |
Scooter seems to respect robots.txt on my sites. It would be unusual for it to disrespect it. Not being funny, but are you sure you used the correct syntax in your robots.txt file? Mack.
|
Romeo

msg:402090 | 3:40 pm on Jul 6, 2003 (gmt 0) |
I see a lot of ...sv.av.com Scooter bot activity in my logs; they are very decent bots and have always respected my robots.txt so far. Regards, R.
|
wilderness

msg:402091 | 3:57 pm on Jul 6, 2003 (gmt 0) |
| are you sure you used the correct syntax in your robots.txt file? |
| syntax, /deniedFolder/DeniedSubFolder/ Actually mack, you're sort of half-correct? My robots.txt contains a deny for "/deniedFolder." There is no mention of "/DeniedSubFolder", which sits inside the aforementioned denied folder. So apparently that makes everything in sub-folders fair game according to robots.txt?
|
tschild

msg:402092 | 5:54 pm on Jul 6, 2003 (gmt 0) |
Subfolders should be covered by a Disallow directive for the parent folder. Your robots.txt seems to be quite long at 2390 bytes; might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there.
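The prefix behaviour can be checked locally. This is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `is_allowed` helper, and the file names are assumptions for illustration and not wilderness's actual file. It only shows how one parser reads the rule, not how Scooter itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a single deny for the parent folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /deniedFolder/
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Disallow is a path-prefix match, so the sub-folder is covered too.
    print(is_allowed(ROBOTS_TXT, "Scooter", "/deniedFolder/DeniedSubFolder/0001.jpg"))  # False
    print(is_allowed(ROBOTS_TXT, "Scooter", "/public/index.html"))  # True
```

If the real file gives a different answer for the same path, that points at a syntax problem earlier in the file rather than at the sub-folder not being listed.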
|
Rumbas

msg:402093 | 6:08 pm on Jul 6, 2003 (gmt 0) |
SearchEngineWorld Robots.txt Validator [searchengineworld.com]
|
chiyo

msg:402094 | 6:14 pm on Jul 6, 2003 (gmt 0) |
I'm pretty sure robots.txt is there to let web site owners advise which folders should not be indexed, not which folders should not be spidered. So a bot can respect robots.txt even if it does spider such folders, as long as it doesn't index them. If you are worried about bandwidth, there are better ways to solve that problem, but I wouldn't classify Scooter as a disrespectful bot just because it had a peek!
|
mack

msg:402095 | 7:31 pm on Jul 6, 2003 (gmt 0) |
wilderness, I hope you didn't think I was being funny when I made that suggestion. The same thing happened to me a few months ago. Very easily done. Mack.
|
wilderness

msg:402096 | 8:21 pm on Jul 6, 2003 (gmt 0) |
| I wouldn't classify Scooter as a disrespectful bot just because it had a peek! |
| chiyo, IMO respect or even courtesy hasn't a thing to do with it. Personally, and considering the precautions I have taken (by assigning the majority of my "image files" numerals rather than names), I consider any reading of images by a bot an intrusion. Whether the SEs or IPs regard it as such or not is not important to me. It does present me with an urgency to make a correction in my htaccess to prevent both an expanded intrusion and future intrusions. Don
|
wilderness

msg:402097 | 8:43 pm on Jul 6, 2003 (gmt 0) |
| Your robots.txt seems to be quite long at 2390 bytes, might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there. |
| | SearchEngineWorld Robots.txt Validator |
| 113 Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
user-agent: szukacz
114 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
disallow: /
131 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots. (eg: User-agent)
User-Agent: Whizbang
Regarding lines 113 & 114: szukacz honors the request to disallow, and yet a simple syntax warning should make the entire file invalid and allow Scooter to override it?
(edited by wilderness 07/06/03 17:00 EST)
I might add that the image Scooter grabbed doesn't even show on the page it was linked from. Rather, the page has a thumbnail which links to this larger image.
[edited by: wilderness at 8:56 pm (utc) on July 6, 2003]
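For what it's worth, a tolerant parser does not treat lower-case field names as fatal. The sketch below feeds the two flagged lines (113 and 114) to Python's standard `urllib.robotparser`; the `can_agent_fetch` helper and the image path are assumptions for illustration, and the result says nothing about how Scooter's own parser behaves.

```python
from urllib.robotparser import RobotFileParser

# The two lines the validator flagged, written with lower-case field names.
FLAGGED_RECORD = [
    "user-agent: szukacz",
    "disallow: /",
]


def can_agent_fetch(lines, agent, path):
    """Parse robots.txt lines and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The lower-case record is still recognised and applied to szukacz.
    print(can_agent_fetch(FLAGGED_RECORD, "szukacz", "/images/0001.jpg"))  # False
    # An agent with no matching record (and no "*" record) is not restricted by it.
    print(can_agent_fetch(FLAGGED_RECORD, "Scooter", "/images/0001.jpg"))  # True
```

So with this parser the capitalization warning alone would not void the rest of the file; each record is matched on its own.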
|
wilderness

msg:402098 | 8:48 pm on Jul 6, 2003 (gmt 0) |
| wilderness, I hope you didn't think I was being funny |
| Mack, no harm or even vengeance on puns :-) Aside from that massive misbehaviour when Scooter 1.0 reactivated early in 2003, I'm not sure I can recall a Scooter disregard? Although it's entirely possible and it has just slipped my memory? Don
|
WarmGlow

msg:402099 | 8:55 pm on Jul 6, 2003 (gmt 0) |
The vscooter robot is AltaVista's image thief. It requests image files which are used by AltaVista to create and archive thumbnail images. The vscooter robot does not obey the robots.txt exclusion standard. Both Scooter and vscooter are denied access to my image files by .htaccess directives because of copyright violations and disregard for my robots.txt denied directories.
# enable Apache mod_rewrite
RewriteEngine on
# deny access to JPEG, GIF and png files from known harvesters and
# external referrers except language translators
RewriteCond %{HTTP_USER_AGENT} ^ArribaPacketRat [OR]
RewriteCond %{HTTP_USER_AGENT} ^Digimarc [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST-WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mercator-2\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} vscooter [OR]
# exclude requests with empty referrer string from RewriteRule
RewriteCond %{HTTP_REFERER} !^$
# exclude requests by Norton proxy from RewriteRule
RewriteCond %{HTTP_REFERER} !^Blocked\ by\ Norton$
# exclude known language translators from RewriteRule
RewriteCond %{HTTP_REFERER} !fets\.freetranslation\.com
RewriteCond %{HTTP_REFERER} !babel\.altavista\.
RewriteCond %{HTTP_REFERER} !babelfish
RewriteCond %{HTTP_REFERER} !translate
# exclude my domain from RewriteRule
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example.com [NC]
RewriteRule (.*\.gif$)|(.*\.jpe?g$)|(.*\.png$) - [NC,F,L]
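To reason about which requests these directives would refuse, here is a rough Python model of the conditions. It is a sketch, not a replacement for testing on a real server: the `is_forbidden` helper, the sample agent strings, and the sample referrers are assumptions, and the grouping reflects my reading of mod_rewrite, where consecutive `[OR]` conditions form one group that is then ANDed with the conditions that follow.

```python
import re

# User-Agent patterns copied from the RewriteCond lines; only Slurp carries [NC].
AGENT_PATTERNS = [
    (r"^ArribaPacketRat", 0), (r"^Digimarc", 0), (r"^FAST-WebCrawler", 0),
    (r"grub-client", 0), (r"^InfoSeek", 0), (r"^Mercator-2\.0", 0),
    (r"^MIIxpc", 0), (r"^psbot", 0), (r"Slurp", re.IGNORECASE),
    (r"^Scooter", 0), (r"vscooter", 0),
]

# Referrer patterns that exempt a request (each RewriteCond negates one of them).
EXEMPT_REFERRERS = [
    (r"^Blocked by Norton$", 0),
    (r"fets\.freetranslation\.com", 0),
    (r"babel\.altavista\.", 0),
    (r"babelfish", 0),
    (r"translate", 0),
    (r"^http://(www\.)?example.com", re.IGNORECASE),
]

# Same file types as the RewriteRule pattern, matched case-insensitively ([NC]).
IMAGE_PATH = re.compile(r"\.(gif|jpe?g|png)$", re.IGNORECASE)


def is_forbidden(path: str, user_agent: str, referrer: str = "") -> bool:
    """Model whether the rule set would answer an image request with 403."""
    if not IMAGE_PATH.search(path):
        return False  # the RewriteRule only targets image files
    listed_agent = any(re.search(p, user_agent, f) for p, f in AGENT_PATTERNS)
    # The trailing [OR] on the vscooter line pulls the "referrer is not empty"
    # test into the same OR group as the User-Agent tests.
    first_group = listed_agent or referrer != ""
    exempt = any(re.search(p, referrer, f) for p, f in EXEMPT_REFERRERS)
    return first_group and not exempt


if __name__ == "__main__":
    print(is_forbidden("/img/0001.jpg", "Scooter/3.3"))  # listed harvester
    print(is_forbidden("/img/0001.jpg", "Mozilla/4.0"))  # browser, empty referrer
    print(is_forbidden("/img/0001.jpg", "Mozilla/4.0",
                       "http://other.example.net/page.html"))  # hotlink
```

One consequence of this reading: a listed harvester that sends a referrer from your own domain or from a translator would pass, because the referrer exemptions apply to every request, not only to browsers.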
|
carfac

msg:402100 | 9:05 pm on Jul 6, 2003 (gmt 0) |
Chiyo: >>> So you can respect robots.txt even if you do spider such folders, as long as you dont index them No, I think you are wrong there.... A deny in robots.txt does NOT mean "don't index, but it is OK to peek" (there are spiders that do not index at all!). If a path is denied in robots.txt, that means do NOT go there at all, not merely do not index anything there. Don: I have also found that Scooter itself, the web bot, is usually OK. But they have a problem with their image bot. I had not noted the name change to vscooter yet (so thanks!)... but they did have an older Scooter that just went after images, and did not respect robots.txt. For some reason I do not know, AV does not seem to like my sites. One of my sites has ONLY its main page in AV, others have none at all. These are sites that have been around since 1995, and place well everywhere else. So I personally could care less about AV at all... dave
|
|