Since he was taking up a fair amount of bandwidth to no evident end, I decided to ban him. I updated robots.txt and, lo and behold, in my next database check (I now use a database/cookie/session system to track visitors) there he was again, still sucking away. I then went to my .htaccess and added a new rewrite, and he came back AGAIN using another ID string, this time with a 'buddy' robot ('PlantyBot') from an adjacent IP. I expanded my rewrite list accordingly and he's gone... for now.
Anyone else have a similar experience with this one?
Jim
NaverBot-1.0 (NHN Corp. / +82-2-3011-1954 / nhnbot@naver.com)
Here's how I tried to block him:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} nhnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} naver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NHN.Corp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverBot [NC,OR]
...
RewriteCond %{HTTP_USER_AGENT} Sleipnir [NC]
RewriteRule ^.* - [F,L]
How can he make it past that .htaccess? He did it all the same.
RewriteCond %{REMOTE_ADDR} ^61\.(7[89]¦8[0-5])\. [OR]
RewriteCond %{REMOTE_ADDR} ^218\.(14[4-9]¦15[0-9])\. [OR]
Don't forget to correct the pipe character: the forum displays the solid pipe "|" as a broken bar "¦", so change each one back before using the code.
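With solid pipes restored, the two address conditions above might be used as a standalone block like this. It is only a sketch: the closing RewriteRule is borrowed from the user-agent rules earlier in the thread, and the [OR] is dropped from the last condition on the assumption that no further conditions follow it.

```apache
RewriteEngine On
# Match 61.78.x.x through 61.85.x.x
RewriteCond %{REMOTE_ADDR} ^61\.(7[89]|8[0-5])\. [OR]
# Match 218.144.x.x through 218.159.x.x
RewriteCond %{REMOTE_ADDR} ^218\.(14[4-9]|15[0-9])\.
# Return 403 Forbidden for any request from those ranges
RewriteRule ^.* - [F]
```

If more conditions are appended later, every condition except the last needs its own [OR] flag.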
A Simple Beginning
[webmasterworld.com...]
Use of UAs in denials is a first preference for many. The results can be both good and bad.
In some instances a particular portion of a UA may only be used by a solitary user or small group. In other instances an unidentified bot will travel using a standard UA, which prevents denial on those grounds without taking out many innocents.
I've read that some bots are using a system which circumvents denials and redirects. I have no clue if it's possible or true. Only that I read it. (I don't believe naver fits this exception though, as they are easily denied.)
Your code should have worked, but did not. However, in your listing, you show "..." indicating that there is more code that is not shown. Look for your problem there. Assuming that any mod_rewrite code works for you, then what you showed above should have worked, too.
I don't believe in any "magic work-around" for bypassing .htaccess denials. In every case I've ever seen, it has been a coding error or regular-expressions error -- some very, very subtle, such as a missing ")" or "]".
BTW, since you are using unanchored regex patterns, your fourth RewriteCond is redundant, as anything that will match it will have already matched the second RewriteCond. Also, the flag [F,L] is redundant: [F] carries an implied [L], as do [G] and [P]. However, neither of these details would have stopped your code from working.
Jim
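Applying the two cleanups Jim describes to the rules posted earlier gives something like the following. It is a sketch rather than a tested replacement: the "..." in the original listing is kept as a comment because those conditions were never shown.

```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} nhnbot [NC,OR]
# "naver" is unanchored and case-insensitive, so it also covers "NaverBot"
RewriteCond %{HTTP_USER_AGENT} naver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NHN.Corp [NC,OR]
# ... (further conditions not shown in the original post)
RewriteCond %{HTTP_USER_AGENT} Sleipnir [NC]
# [F] already implies [L], so the extra flag is dropped
RewriteRule ^.* - [F]
```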
A summary: All of this came about because of a faulty server configuration, namely in the Apache httpd.conf file.
When I first noted that several supposedly banned bots were making it into my site, I went through the code again and found no errors. I then (erroneously) thought that somehow name blocking wasn't as 'hard-core' as blocking IP ranges. In reality they are equally effective; the only difference is the conditions set, so no more delusions there, either.
jdMorgan came to the rescue with info about the loading/execution order of server modules loaded through the Apache httpd.conf file: what is loaded last executes first. In my case, jdMorgan's theory was that the PHP module was loading after, thus executing before, the Rewrite functions, so the server was sending my PHP-generated pages before it had a chance to execute the RewriteConds I had set. This turned out to be true. After my hosting service had flipped things around, the bots were effectively stopped at root level.
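For readers who have not seen this part of httpd.conf, the ordering in question looks roughly like the fragment below. It assumes an Apache 1.3-style configuration with PHP 4 built as a loadable module; the module names and file paths are illustrative guesses, not taken from the actual server in this thread.

```apache
# Hypothetical httpd.conf fragment (Apache 1.3 style).
# Modules listed later get higher priority, so they run first.
# Listing PHP before mod_rewrite lets the rewrite rules run ahead of PHP.
LoadModule php4_module     libexec/libphp4.so
LoadModule rewrite_module  libexec/mod_rewrite.so

# The AddModule list follows the same order.
AddModule mod_php4.c
AddModule mod_rewrite.c
```

On shared hosting this file is normally editable only by the host, which is why the change had to go through the hosting service.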
But then the bots tried getting in through a lower directory, and were making it in.
Again the httpd.conf file was involved, but this time the fix was the still-fairly-new RewriteOptions directive of mod_rewrite (best to look it up rather than have me take the space to explain it here). As far as I understand it, a lower-directory .htaccess that has its own rewrite settings does not pick up the rules from the directory above unless told to. This still is not clear to me, but I know for certain that when I added a 'RewriteOptions inherit' line to all of my lower-directory .htaccess files the bots were blocked for good. My pleasure now, watching from the ramparts, as the badBots are deflected from their path to my gateway to crash headlong into the stone of my 403-error wall. Picturesque, non? Satisfying, for sure.
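A lower-directory .htaccess using the inherit setting could be as small as the sketch below. It assumes the blocking rules live in the parent directory's .htaccess and that the host allows mod_rewrite directives in .htaccess files.

```apache
# .htaccess in a subdirectory
RewriteEngine On
# Pull in the rewrite conditions and rules defined in the parent directory
RewriteOptions inherit
```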
My hosting service, because of the above debacle, will be conducting a head-to-toe configuration and security recap on their servers at the end of this month. Many thanks, from me as well as my hosting service, to jdMorgan for his help in both exposing and clearing up this matter. I hope that this thread in some way helps others seeing the same sort of 'breach' in their sites.
One last thought: I wonder what they do with all the information they gather, as they certainly don't put it into their search engine.