Forum Moderators: phranque

blocking a visitor

blocking bluecoat sitereview visits

         

revrob

10:39 am on Nov 11, 2011 (gmt 0)

10+ Year Member



Based on this log entry:

2.97.66.#*$! - - [10/Nov/2011:21:34:27 +0100] "GET / HTTP/1.1" 302 232 www.mysite.org.uk "http://sitereview.bluecoat.com/sitereview.jsp" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.186 Safari/535.1" "-"
2.97.66.#*$! - - [10/Nov/2011:21:34:55 +0100] "GET /mypage.html HTTP/1.1" 200 7278 www.mysite.org.uk "http://www.mysite.org.uk/index.html" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.186 Safari/535.1" "-"
2.97.66.#*$! - - [10/Nov/2011:21:34:56 +0100] "GET /robotsrestrictedmediafolder/mediafile.jpg HTTP/1.1" 200 469067 www.mysite.org.uk "http://www.mysite.org.uk/mypage.html" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.186 Safari/535.1" "-"

How would I use .htaccess to block visits whose log entries reference bluecoat and sitereview in the way these do?

Am I correct in assuming that those strings don't occur in the useragent part of the log - and if so, what sort of redirect or rewrite would block them (I would want to serve up a 403 response)?

Many thanks in advance.

wilderness

5:51 pm on Nov 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are thousands of these examples here at Webmaster World, even in your own previous inquiries.

Use either of the following methods.

mod_setenvif [httpd.apache.org]
SetEnvIfNoCase Referer



RewriteCond %{HTTP_REFERER}

Mod Rewrite Anti-Leech Solution in the forum library [webmasterworld.com]
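
For readers who want the two methods spelled out, here is a minimal sketch of each, assuming Apache 2.2-style access directives (current when this thread was written) and an .htaccess file in which these modules may be used. The variable name block_referer is made up for illustration, and the referer string is taken from the log extract above. On Apache 2.4 the Order/Allow/Deny lines would normally be replaced by Require directives.

```apache
# Method 1: mod_setenvif plus access control (Apache 2.2 syntax).
# "block_referer" is an illustrative variable name, not a built-in.
SetEnvIfNoCase Referer "sitereview\.bluecoat\.com" block_referer
Order Allow,Deny
Allow from all
Deny from env=block_referer

# Method 2: mod_rewrite. Use this instead of Method 1, not in addition.
RewriteEngine On
RewriteCond %{HTTP_REFERER} sitereview\.bluecoat\.com [NC]
RewriteRule .* - [F]
```

Either method answers a matching request with 403 Forbidden. Neither pattern is anchored, so each matches the host name anywhere in the Referer header.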

revrob

6:33 pm on Nov 11, 2011 (gmt 0)

10+ Year Member



Thank you for your reply.
I'm sorry for troubling you.

My problem is that I don't know what kind of instruction I need to create - even after reading your kind reply and the threads you point me to.

If there is a more appropriate help forum where those unskilled in Apache can get help, I will gladly go there and not trouble you further - but I've always found this place a very helpful one up to now.

Thanks in advance for any gracious offers of assistance.

wilderness

6:43 pm on Nov 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The forum library [webmasterworld.com] offers examples for these very simple procedures.

FWIW, the forum library link is within the following and located within the header of this page:

Forum Library : Charter : Moderators: jdMorgan

wilderness

6:47 pm on Nov 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The Close to Perfect htaccess [webmasterworld.com] is a very old thread.
It also contains some invalid syntax from participants who carelessly copied their entire file (most of which they had copied and pasted from another source). Despite that, and despite its length, it remains a good tutorial for beginners.

revrob

11:38 pm on Nov 11, 2011 (gmt 0)

10+ Year Member



Thank you.

I've had a look at
[webmasterworld.com...]

Would this cut the mustard for the visitor given in my log extract above?

RewriteCond %{HTTP_REFERER} ^http://sitereview.bluecoat.com$
RewriteRule !^http://[^/.]\.mysite.org.uk.* - [F]

thanks again

wilderness

11:51 pm on Nov 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Please note: literal periods in RewriteCond patterns should be escaped, because an unescaped period matches any character.

A more effective denial would be the following:

#Referer contains either, then deny access
RewriteCond %{HTTP_REFERER} (sitereview|bluecoat)
RewriteRule .*$ - [F]

Simple and clean

lucy24

12:09 am on Nov 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Would this cut the mustard for the visitor given in my log extract above?

RewriteCond %{HTTP_REFERER} ^http://sitereview.bluecoat.com$
RewriteRule !^http://[^/.]\.mysite.org.uk.* - [F]

Only if the robot is asking for

http://www.example.com/http://[^/.]\.mysite.org.uk.*

RewriteRules start with the path.

RewriteRule .*$ - [F]

Simple and clean

Except that Apache now has to check the Conditions for every single request it ever receives. I think bluecoat only looks at pages, so you can constrain the Rule itself to

RewriteRule \.html$ - [F]


using whatever extension your files actually have.
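
Putting the points above together, a corrected version of the original attempt might look like the sketch below. It assumes the rules live in the .htaccess file at the document root and that pages end in .html. Two details matter. First, the referer in the log is followed by /sitereview.jsp, so a $ placed directly after "com" would never match. Second, Apache tests the RewriteRule pattern first and evaluates the RewriteCond lines only when that pattern matches, which is why narrowing the pattern saves work.

```apache
RewriteEngine On

# The logged referer is http://sitereview.bluecoat.com/sitereview.jsp,
# so anchor the start and stop at the slash after the host name.
RewriteCond %{HTTP_REFERER} ^http://sitereview\.bluecoat\.com/ [NC]

# In .htaccess the rule pattern sees only the path, without scheme,
# host or leading slash (for example "mypage.html").
# A request for "/" has an empty path here, so ^$ is included to
# cover the front page as well as anything ending in .html.
RewriteRule (^$|\.html$) - [F]
```

In the log extract only the request for "/" carries the bluecoat referer, so a pattern limited to \.html$ alone may not catch it.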

wilderness

2:25 am on Nov 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule .*$ - [F]

Simple and clean


Except that Apache now has to check the Conditions for every single request it ever receives. I think bluecoat only looks at pages, so you can constrain the Rule itself to


Many thanks for the tip, Lucy. However, I used that exact line in a 2300-line htaccess for more than a decade, and there was no delay or excessive server load.

lucy24

3:04 am on Nov 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



2300?! I guess that means I can keep merrily adding Deny from IP lines :)

I'm only up to around 300 lines combined. (I've got two domains in my userspace, so they share an htaccess for core and setenvif directives, and each have their own htaccess for mod_rewrite. I was going bonkers trying to keep two partly-identical files updated and putting them in the right places. There's still some overlap, but it isn't as crucial.)

wilderness

3:10 am on Nov 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I guess that means I can keep merrily adding Deny from IP lines


As long as you keep the lines in numerical order and somewhat divided, so that you don't pull your hair out and pop your eyes out of focus when looking for syntax errors.

FWIW, my lines would have been three times that number if I hadn't learned how to condense and combine IP ranges, which I did on two or three occasions.
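
As an illustration of condensing ranges, here is a small sketch in Apache 2.2 syntax. The addresses are reserved documentation ranges chosen for the example, not ranges anyone in this thread actually blocks.

```apache
Order Allow,Deny
Allow from all

# Two adjacent /25 blocks ...
#   Deny from 192.0.2.0/25
#   Deny from 192.0.2.128/25
# ... cover the same addresses as a single /24:
Deny from 192.0.2.0/24

# A partial address denies everything beneath it.
# This line is equivalent to "Deny from 203.0.113.0/24".
Deny from 203.0.113
```

Keeping the entries in numerical order makes adjacent blocks like these easy to spot and merge.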

lucy24

6:32 am on Nov 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Strict and absolute numerical order, except that AmazonAWS and China are each listed separately. Ahem. The BrowserMatches are likewise in strict alphabetical order.

And yes, it gets to the point where I haven't the energy to sort out the Mostly Harmless from the Actively Malignant, and there goes all of 46.4 out the window. At least until I find someone I personally know, parked squarely in the middle of what I'd assumed was solid server-farm territory. (This is really true. There are also at least two* humans on this planet who still use MSIE 5 for Mac, so I had to modify that Rewrite. It now lets 5 and 6 use the same loophole.)


* That's assuming the one I came across is not the same one that someone hereabouts posted about recently. That would be funny, but definitely stretching probability :)

wilderness

8:24 am on Nov 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've still some devotees using the early WEB-TV's ;)

revrob

10:12 am on Nov 12, 2011 (gmt 0)

10+ Year Member



that looks like what I want - thanks.

#Referer contains either, then deny access
RewriteCond %{HTTP_REFERER} (sitereview|bluecoat)
RewriteRule .*$ - [F]

I sympathise with the maintenance issue - I too have 2 parallel .htaccess files to maintain on the one host, and keep promising myself to spend a few hours tidying them up to make them easier to maintain.

My last headbanging moment was discovering I had created a redirect loop because my line describing a file NOT to be subject to redirection had two letters the wrong way round in the file name. However, it was the bot that suffered the loop, not me, although it made for a long log that day.

I did spot it eventually.

Thanks again for the help. It is much appreciated. And as a result I've learnt a bit more on the syntax.

wilderness

4:12 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mod_Rewrite & Regular Expressions [webmasterworld.com]

^ defines the beginning of a 'line' (starting anchor). Remember, ^ also designates 'not' when it is the first character inside square brackets, so please don't get confused.

$ defines the ending of a 'line' (ending anchor), and when followed by a number from 1 to 9, also references a variable defined in the RewriteRule pattern (used for variables on the right side of the equation or to match a variable from the rule in a condition, see example below).

contains is the absence of any anchors
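
A short sketch of those symbols in use, assuming an .htaccess file at the document root. The paths old/, new/, private/, docs/ and docs.php are hypothetical and exist only for illustration.

```apache
RewriteEngine On

# $1 in a condition refers to the first group captured by the
# RewriteRule pattern below; skip anything under private/.
RewriteCond $1 !^private/
# ^ and $ anchor the pattern; (.+) captures the rest of the path,
# and $1 on the right side inserts whatever was captured.
RewriteRule ^old/(.+)$ /new/$1 [R=301,L]

# ^ inside square brackets means "not": [^/]+ is one or more
# characters that are not a slash.
RewriteRule ^docs/([^/]+)/?$ /docs.php?page=$1 [L]
```

The same character therefore plays two roles: ^ at the start of a pattern is an anchor, while ^ at the start of a bracketed class is a negation.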

lucy24

8:37 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Was that a reference to the superfluous $ in the Rule? At least that anchor-- unlike some-- won't do any harm :)

You would think, wouldn't you, that RegEx could manage to come up with a different symbol for each function. Looking at the top of my keyboard I see a perfectly good @ and # which aren't used for anything.

wilderness

9:01 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Was that a reference to the superfluous $ in the Rule? At least that anchor-- unlike some-- won't do any harm


lucy,
misuse of the anchors will certainly not cause "any harm", however neither will the rules function as desired.

Let us assume that the referer and/or UA is:

some crap smells 1.0

#begins with
Using ^(crap|smells|1\.)
will certainly fail.

#ends with
Using (some|crap|smells)$
will also fail.

#begins and ends with
Using anything except
^some\ crap\ smells\ 1\.0$
will also fail.

#contains
Using (some|crap|smells|1\.0)
catches any of them.

This simple comprehension of anchors is vital before all other understanding of regex.
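
The four cases can be written as RewriteCond lines, sketched here against the User-Agent header and assuming it is exactly "some crap smells 1.0". Only the last condition is active; the others are left as comments so the block can be pasted without stacking conditions.

```apache
RewriteEngine On

# Begins with: no match, because the string starts with "some".
#   RewriteCond %{HTTP_USER_AGENT} ^(crap|smells|1\.)

# Ends with: no match, because the string ends with "1.0".
#   RewriteCond %{HTTP_USER_AGENT} (some|crap|smells)$

# Begins and ends with: matches only this exact string.
#   RewriteCond %{HTTP_USER_AGENT} ^some\ crap\ smells\ 1\.0$

# Contains: matches, because no anchors restrict the position.
RewriteCond %{HTTP_USER_AGENT} (some|crap|smells|1\.0)
RewriteRule .* - [F]
```

Swapping %{HTTP_USER_AGENT} for %{HTTP_REFERER} applies the same patterns to the referer instead.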

I'm struggling to utilize the KISS regex that I do, thus making suggestions to the Apache folks for alternative special characters is certainly beyond my scope or interest.