are you sure you used the correct syntax in your robots.txt file?
syntax, /deniedFolder/DeniedSubFolder/
Actually mack, you're sort of half-correct.
My robots.txt contains a deny for "/deniedFolder".
There is no mention of "/DeniedSubFolder", which falls under the aforementioned deny.
So apparently that makes everything in sub-folders fair game according to robots.txt?
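For what it's worth, under the usual prefix-matching reading of Disallow, a deny for /deniedFolder also covers everything beneath it. Here is a minimal sketch using Python's standard urllib.robotparser. The two-line rule set, the bot name passed in, and the file names are assumptions made up for illustration, not the actual robots.txt under discussion:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: a single deny for the parent folder only.
rules = [
    "User-agent: *",
    "Disallow: /deniedFolder",
]

parser = RobotFileParser()
parser.parse(rules)

# Paths are matched by prefix, so the sub-folder is covered as well.
subfolder_allowed = parser.can_fetch(
    "Scooter", "/deniedFolder/DeniedSubFolder/0001.jpg"
)
elsewhere_allowed = parser.can_fetch("Scooter", "/publicFolder/index.html")
print(subfolder_allowed, elsewhere_allowed)
```

This only shows how one well-behaved parser reads the rule. It says nothing about what a particular crawler actually does with it.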
If you are worried about bandwidth, there are better ways to solve that problem, but I wouldn't classify Scooter as a disrespectful bot just because it had a peek!
chiyo
IMO respect or even courtesy hasn't a thing to do with it.
Personally, and considering the precautions I have taken (assigning the majority of my "image files" numerals rather than names), I consider any reading of images by a bot an intrusion.
Whether the SE's or IP's regard it as such or not is not imperative to me.
It does present me with an urgency to make a correction in my .htaccess to prevent both an expanded intrusion and future intrusions.
Don
Your robots.txt seems to be quite long at 2390 bytes; might a syntax error have crept in somewhere? There are several robots.txt syntax checkers out there.
SearchEngineWorld Robots.txt Validator
113 Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
user-agent: szukacz
114 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots.
disallow: /
131 warning Field names of robots.txt maybe case insensitive, but do capitalize field names to account for challenged robots. (eg: User-agent)
User-Agent: Whizbang
Regarding lines 113 & 114?
szukacz honors the request to disallow, and yet a simple syntax error should make the entire file invalid and allow Scooter to override it?
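As a rough check on the validator warning, here is a small sketch of how a tolerant parser treats lowercase field names, again with Python's standard urllib.robotparser. The two-line record is copied from the flagged lines. The test path is a made-up assumption, and the result describes this parser only, not Scooter's own:

```python
from urllib.robotparser import RobotFileParser

# Record mirroring the lowercase field names flagged at lines 113 and 114.
lowercase_rules = [
    "user-agent: szukacz",
    "disallow: /",
]

parser = RobotFileParser()
parser.parse(lowercase_rules)

# The lowercase record is still recognised here, so szukacz is shut out.
szukacz_allowed = parser.can_fetch("szukacz", "/images/0001.jpg")
# No record in this excerpt names Scooter, so it is treated as allowed.
scooter_allowed = parser.can_fetch("Scooter", "/images/0001.jpg")
print(szukacz_allowed, scooter_allowed)
```

If a parser like this one accepts the lowercase names, the warning by itself would not invalidate the rest of the file. A stricter robot might still stumble, which is what the validator is cautioning about.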
(edited by wilderness 07/06/03 17:00 EST)
I might add that this image Scooter grabbed doesn't even show on the page it linked from. Rather the page has a thumbnail which links to this larger image.
[edited by: wilderness at 8:56 pm (utc) on July 6, 2003]
# enable Apache mod_rewrite
RewriteEngine on
# deny access to JPEG, GIF and png files from known harvesters and
# external referrers except language translators
RewriteCond %{HTTP_USER_AGENT} ^ArribaPacketRat [OR]
RewriteCond %{HTTP_USER_AGENT} ^Digimarc [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST-WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mercator-2\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} vscooter [OR]
# exclude requests with empty referrer string from RewriteRule
RewriteCond %{HTTP_REFERER} !^$
# exclude requests by Norton proxy from RewriteRule
RewriteCond %{HTTP_REFERER} !^Blocked\ by\ Norton$
# exclude known language translators from RewriteRule
RewriteCond %{HTTP_REFERER} !fets\.freetranslation\.com
RewriteCond %{HTTP_REFERER} !babel\.altavista\.
RewriteCond %{HTTP_REFERER} !babelfish
RewriteCond %{HTTP_REFERER} !translate
# exclude my domain from RewriteRule
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
RewriteRule (.*\.gif$)|(.*\.jpe?g$)|(.*\.png$) - [NC,F,L]
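To make the grouping of those conditions easier to follow, here is a rough Python model of the rule set above. It assumes the [OR] chain runs from the first user-agent test through the empty-referrer test, with every later condition ANDed on. The helper name is_forbidden and the sample requests are hypothetical, and this is a sketch of the logic rather than a substitute for testing on the server:

```python
import re

# Case-sensitive user-agent patterns copied from the RewriteCond lines.
HARVESTER_PATTERNS = [
    r"^ArribaPacketRat", r"^Digimarc", r"^FAST-WebCrawler", r"grub-client",
    r"^InfoSeek", r"^Mercator-2\.0", r"^MIIxpc", r"^psbot",
    r"^Scooter", r"vscooter",
]
# Referrer patterns that exempt a request (Norton proxy, translators).
EXEMPT_REFERRERS = [
    r"^Blocked by Norton$", r"fets\.freetranslation\.com",
    r"babel\.altavista\.", r"babelfish", r"translate",
]
OWN_DOMAIN = re.compile(r"^http://(www\.)?example\.com", re.IGNORECASE)
IMAGE_PATH = re.compile(r"\.(gif|jpe?g|png)$", re.IGNORECASE)


def is_forbidden(path, user_agent, referrer=""):
    """Hypothetical helper: True if the rules would answer 403 Forbidden."""
    if not IMAGE_PATH.search(path):
        return False  # the RewriteRule pattern does not match at all
    harvester = (
        any(re.search(p, user_agent) for p in HARVESTER_PATTERNS)
        or re.search(r"Slurp", user_agent, re.IGNORECASE) is not None
    )
    # OR group: known harvester, or any request carrying a referrer.
    if not (harvester or referrer != ""):
        return False
    # Remaining conditions are ANDed: each exemption lets the request through.
    if any(re.search(p, referrer) for p in EXEMPT_REFERRERS):
        return False
    return OWN_DOMAIN.search(referrer) is None


print(is_forbidden("/images/123.jpg", "Scooter/3.3"))
print(is_forbidden("/images/123.jpg", "Mozilla/4.0", "http://www.example.com/page.html"))
```

Read this way, a listed harvester with a blank referrer is refused, while an ordinary browser with a blank referrer or a referrer from the site itself gets the image.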
>>> So you can respect robots.txt even if you do spider such folders, as long as you don't index them
No, I think you are wrong there. A deny in robots.txt is NOT merely an exclusion from indexing that leaves it OK to peek (there are spiders that do not index at all!). If it is denied in robots.txt, that means do NOT go there at all, not just do not index anything there.
Don:
I have also found that scooter itself, the web bot, is usually OK. But they have a problem with their image bot. I had not noted the name change to vscooter yet (so thanks!)... but they did have an older scooter that just went after images, and did not respect robots.txt.
For some reason I do not know, AV does not seem to like my sites. One of my sites has ONLY its main page in AV; others have none at all. These are sites that have been around since 1995, and place well everywhere else. So I personally couldn't care less about AV at all...
dave