Forum Moderators: open
I've looked through my whole file to be sure that everything is space-delimited, and it appears to be as it should be.
I thought that since I had the "^" I wouldn't need to include the "_Commons" part of it...
Should I include that as well, then?
Or should I rewrite it to include [NC]? Or not?
I thought maybe the [NC] wasn't necessary if I had the "^", as none of my others have the [NC], just the "^".
This Jakarta user-agent is something that has only recently started showing up. I added the .htaccess rules after I realised that robots.txt didn't work; then, after noticing that the .htaccess might not be doing the trick either, I began to deny the IP 64.94.163.*** before I finally started logging the 403s for it.
Rewrite Info
Conditions only affect the rule immediately following them.
[NC] = No Case : to match jakarta OR Jakarta OR JakaRta use [NC]
^ = Beginning of a Line : independent of [NC] : without [NC], if the UA does not begin with a lowercase j, nothing will be blocked by this condition.
$ = End of a Line : Not mentioned but good to know.
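To make the flags above concrete, here is a minimal sketch (assuming mod_rewrite is enabled and that "jakarta" is the string being blocked, as in this thread):

```apache
RewriteEngine On

# ^ without [NC]: only a UA beginning with lowercase "jakarta" would match
# RewriteCond %{HTTP_USER_AGENT} ^jakarta

# [NC] without ^: jakarta, Jakarta, JaKaRtA match anywhere in the UA string
RewriteCond %{HTTP_USER_AGENT} jakarta [NC]

# a matching request is answered with 403 Forbidden
RewriteRule .* - [F]
```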
I always use this style:
RewriteCond %{HTTP_USER_AGENT} jakarta [NC,OR]
If I want someone blocked, I do not want throwing a character before the name or changing the case of a letter to be all they have to do to get by.
You can also combine and shorten patterns for efficiency.
E.g. I don't know of any user agents starting with or containing "jaka" that I want to let through, so there's no point in checking for the whole string; "jaka" is enough to know I don't really need them on the site.
Also, if I were blocking jaka and joke, there would be no point in two rules. Of course, sometimes I block a couple of extras this way because of how the rules work out, but as long as they are not real user-agents, there is not much concern. In this example, I also block the UAs jake and joka:
RewriteCond %{HTTP_USER_AGENT} j(a|o)k(a|e)
This blocks a j, followed by an a or an o (ja or jo), followed by a k (jak or jok), followed by an a or an e: jaka, joke, joka and jake are all blocked.
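Dropped into a working ruleset, that combined pattern would look something like this (a sketch; the [NC] flag and the [F] rule are assumptions consistent with the style described above):

```apache
# one condition covers jaka, jake, joka and joke, in any case
RewriteCond %{HTTP_USER_AGENT} j(a|o)k(a|e) [NC]
# return 403 Forbidden to anything that matched
RewriteRule .* - [F]
```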
Hope this helps the rewriters.
Justin
RewriteCond %{HTTP_USER_AGENT} ^jakarta [OR]
and the last one doesn't contain the [OR]
If I understand you right, I add the [NC,OR] to eliminate any possibility of an upper/lower case letter change.
Basically, what I have written only targets that *exact* name and no other variations?
If eliminating the potential for variations to get through means that I've got to do a bit of rewriting, well then, I'm all up for that.
[google.com...]
If I understand you right, I add the [NC,OR] to eliminate any possibility of an upper/lower case letter change. Basically, what I have written only targets that *exact* name and no other variations?
Yes and yes, and it targets the exact name at the beginning of the UA string. To block the name anywhere in the UA string, remove the ^ from the beginning of the pattern. Not having the [OR] on the last line is correct.
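Put together, a chain like the one described above might look like this (a hedged sketch; the UA names are just the examples from this thread, and the 403 rule is assumed):

```apache
RewriteEngine On

# every condition except the last carries [OR]; the chain feeds one rule
RewriteCond %{HTTP_USER_AGENT} jakarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xenu [NC]

# any UA that matched one of the conditions gets a 403 Forbidden
RewriteRule .* - [F]
```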
Here are a couple of lines from my file:
# Web followed by any of the strings
RewriteCond %{HTTP_USER_AGENT} Web(Account|Capt|Copier|Rank|Whack|Strip|Zip|ster|Bandit) [NC,OR]
# Wget
RewriteCond %{HTTP_USER_AGENT} Wget [NC,OR]
# Begins exactly with User-Agent
RewriteCond %{HTTP_USER_AGENT} ^User-Agent [OR]
# Xenu
RewriteCond %{HTTP_USER_AGENT} Xenu [NC,OR]
# robot or abot that is not gigablast, gigabot or walhello - these two conditions must be in this order, and the first must not contain an [OR], so an implied AND joins them
RewriteCond %{HTTP_USER_AGENT} (Ro|a)bot [NC]
RewriteCond %{HTTP_USER_AGENT} !(Giga(blast|bot)|Walhello) [NC]
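Read as a complete unit, that implied-AND pair would combine with a blocking rule like this (a sketch; the [F] rule is an assumption, as the thread never shows it):

```apache
# no [OR] on the first condition, so an implied AND joins the two:
# the UA must contain robot/abot AND must NOT be one of the exceptions
RewriteCond %{HTTP_USER_AGENT} (Ro|a)bot [NC]
RewriteCond %{HTTP_USER_AGENT} !(Giga(blast|bot)|Walhello) [NC]
RewriteRule .* - [F]
```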
Hope this helps.
Justin
did not want to accidentally block a good one
Hey Justin,
Nothing good begins with "Web" or contains the word "spider" ;)
Although I have noticed an old SetEnv rule on "spider" catching the Lycos mod spider in the last week. It's the only exception I've seen to the word.
(Before this recent round of activity from Lycos, I can't recall when I last saw their bot active.)
Don
From what I've seen online, this isn't a spider or a bot, but a wrapper for Java developers. I suppose someone could wrap a bot/spider in it, but I'd think that people can use it in normal ways as well and just be browsing a site.
Or, am I wrong? I personally don't care about bad bots unless they are actually causing problems. And, if this can be a legitimate person viewing the site, I definitely don't want them banned.
Normal people browsing a site have no need for a Java wrapper. They use a regular browser. The Java wrapper strongly implies that it is a Java program being used to fetch pages from our sites. And a program that fetches pages is what we call a robot or a spider. If it checks and obeys robots.txt, that might be OK, but I've never seen one do it.
The problem with a lot of library functions and open source robots is that they're written by programmers who are utterly naive to the abuse that takes place on the Web. They provide very powerful tools for both good and bad.
But when all the Webmaster sees is the bad, the tool gets widely banned, which makes it useless for the good. So, without an effective and enforceable terms-of-use agreement for all users, many initially useful tools end up banned, because the abuse greatly exceeds the good use.
Look at Indy Library. It's just a useful HTTP function library that anyone might have a good use for, but it's so widely banned because of abuse that it's now unusable. The same goes for LWP::Simple and many others.
Jim