Is it possible and advisable to block HTTP/1.0 requests?
Before trying to block generic requests, need advice as to unwanted effects
not2easy

Msg#: 4420156 posted 9:02 pm on Feb 21, 2012 (gmt 0)

I am seeing more and more things in my raw access logs that I'm sure are not good for my site. One that I am seeing more of is "GET / HTTP/1.0" followed by someone's URL as the referer. I have been researching here for days but possibly searching for the wrong terms. Here is the problem:

nnn.137.129.75 - - [18/Feb/2012:15:09:55 -0600] "GET / HTTP/1.0" 200 7638 "http://example.dir.ru/" "Mozilla/5.0 (Windows NT 5.1; U; en) Opera 8.00"
nnn.137.129.75 - - [18/Feb/2012:15:10:01 -0600] "GET / HTTP/1.0" 200 7638 "http://example.dir.ru/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
nnn.137.129.75 - - [18/Feb/2012:15:10:08 -0600] "GET / HTTP/1.0" 200 7638 "http://example.dir.ru/" "Mozilla/4.0 (compatible; MSIE 6.0; Update a; AOL 6.0; Windows 98)"

You can see that these three requests, a few seconds apart, are automated; the UAs are part of the script. There are other requests that start out the same way and add "somebrandname-HttpClient/3.1" after a blank UA.

From what I have been reading, the request just delivers the entire homepage, but I can't see a good reason to request it that way and it appears that it is done only to be able to spam my logs.

Is it a bad idea to block all requests for "GET / HTTP/1.0" and "GET / HTTP/1.1"? I mean, is there a downside? I apologize for asking a basic-newb question, but before I try to answer these requests with a 403 I need to know whether I should.

 

g1smd

Msg#: 4420156 posted 9:09 pm on Feb 21, 2012 (gmt 0)

If you block GET / HTTP/1.1 requests, you block all visitors and bots from accessing the root of your site.

Valid HTTP/1.0 requests are few and far between, though.
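
For what it's worth, a block aimed only at HTTP/1.0 requests can be written without touching HTTP/1.1 traffic at all. A minimal sketch, assuming Apache with mod_rewrite enabled in the site's root .htaccess (whether doing this is advisable is exactly what this thread is debating):

RewriteEngine On
# THE_REQUEST is the raw request line, e.g. "GET / HTTP/1.0"
RewriteCond %{THE_REQUEST} \ HTTP/1\.0$
RewriteRule ^ - [F]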
incrediBILL

Msg#: 4420156 posted 9:20 pm on Feb 21, 2012 (gmt 0)

Valid HTTP/1.0 requests are few and far between, though.


Huh?

I've gotten many thousands of them already today; most are perfectly valid.

wilderness

Msg#: 4420156 posted 10:09 pm on Feb 21, 2012 (gmt 0)

nnn.137.129.75


FWIW!

If you hadn't obscured the Class A octet, and had instead obscured the Class D octet (the usual practice on multiple forums), somebody might have been able to provide some worthwhile insight into the internet provider (i.e., a server farm or other pest worthy of denying access).

wilderness

Msg#: 4420156 posted 10:13 pm on Feb 21, 2012 (gmt 0)

I have this in a more complex set; however, this may work.

RewriteCond %{HTTP_REFERER} ^http://.*\.ru{2}/
RewriteRule .*$ - [F,L]

Perhaps another may clean up the syntax for you.

g1smd

Msg#: 4420156 posted 10:53 pm on Feb 21, 2012 (gmt 0)

Don't use .* at the beginning or in the middle of a RegEx pattern. It is incomprehensibly inefficient in its use of server processor cycles.

[F,L] can be replaced by [F]. It's one of the few occasions when 'L' can be omitted. Another is when using [G].
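
Applied to the rule above, a leaner equivalent per those two notes might look like this (an illustrative rewrite, assuming the intent is simply to forbid requests whose referer is under .ru):

# No leading anchor or .* needed; an unanchored match is cheaper here
RewriteCond %{HTTP_REFERER} \.ru/ [NC]
RewriteRule ^ - [F]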

wilderness

Msg#: 4420156 posted 11:05 pm on Feb 21, 2012 (gmt 0)

Perhaps another may clean up the syntax for you.


Don't use .* at the beginning or in the middle of a RegEx pattern. It is incomprehensibly inefficient in its use of server processor cycles.

[F,L] can be replaced by [F]. It's one of the few occasions when 'L' can be omitted. Another is when using [G].


g1smd,
Those lines (and the accompanying other lines) were in place and functioning for more than ten years and never caused any issue.

FWIW, my reference to "perhaps another" was in regard to the ru{2} part.

I believe the {2} is not necessary because the "ru" is specific, rather than random characters within a range (i.e., [a-z]).

not2easy

Msg#: 4420156 posted 11:12 pm on Feb 21, 2012 (gmt 0)

g1smd - Thank you, that is what I thought but wasn't quite sure. I didn't think it would be that easy or I would have found that already.

wilderness, that sounds like a better way to fight this battle. They are not all .ru, but at least 95% are. I was going to block by referrer, but there are too many domains (over a hundred a week), and they are not really referrers; there are no links that were followed.

I appreciate the quick help, I will see how that syntax fits. I am not adept at all this, but have been picking up bits and pieces, and can look up what baffles me - or ask. Thank you!

g1smd

Msg#: 4420156 posted 12:35 am on Feb 22, 2012 (gmt 0)

\.ru{2} matches .ruu - but why?

wilderness

Msg#: 4420156 posted 12:45 am on Feb 22, 2012 (gmt 0)

\.ru{2} matches .ruu - but why?


I've no clue; as I've stated previously and on multiple occasions, these types of wildcard expressions (for lack of a better term) are not my cup of tea.

The entire line (one of three, with two of the four being for exceptions) is:

RewriteCond %{HTTP_REFERER} ^http://.*\.[a-z]{2}/

g1smd

Msg#: 4420156 posted 12:57 am on Feb 22, 2012 (gmt 0)

Well, \.[a-z]{2} matches any two-letter TLD, but only two-letter TLDs.
wilderness

Msg#: 4420156 posted 1:05 am on Feb 22, 2012 (gmt 0)

That's correct and precisely what it is intended to do.
The first line is for "leading two-letter" subdomains.

The referers provided in this thread had trailing two letters (ru).

lucy24

Msg#: 4420156 posted 1:08 am on Feb 22, 2012 (gmt 0)

If you hadn't obscured the Class A octet, and had instead obscured the Class D octet (the usual practice on multiple forums), somebody might have been able to provide some worthwhile insight into the internet provider (i.e., a server farm or other pest worthy of denying access).

Yes, I checked my existing IP list and found three different robots calling themselves \d+\.137-- including my Ukrainian pals at 178.136.0.0/15. The crystal ball strongly suggests that's who we're dealing with here. But I don't see how a request for the front page would get you any closer to the logs.

Those lines (and the accompanying other lines) were in place and functioning for more than ten years and never caused any issue.

It isn't that they don't work, it's that they make the server do a lot of extra work. For comparison purposes: when I use Regular Expressions in text editing I can throw together almost any old thing, because it only has to work once, and then it doesn't matter if it takes ten seconds or fifteen. But when your htaccess is processing a million requests a day, the nanoseconds add up.

I believe the {2} is not necessary because the "ru" is specific, rather than random characters within a range (i.e., [a-z]).

It's a good syntax for country codes: \.[a-z]{2}. Until you have to distinguish between Colombia (.co) and a vanilla British corporation (.co.uk) :( I do have one place that checks for \.(ru|ua|mobi) because for some reason nobody except those same Ukrainians ever gives .mobi as a referer.

wilderness

Msg#: 4420156 posted 1:13 am on Feb 22, 2012 (gmt 0)

g1smd,
your sticky is full?
what's up with that?

g1smd

Msg#: 4420156 posted 1:16 am on Feb 22, 2012 (gmt 0)

(Yeah, it gets battered: Twitter is more immediate)

wilderness

Msg#: 4420156 posted 1:16 am on Feb 22, 2012 (gmt 0)

processing a million requests a day


how many visitors does that translate to ;)

wilderness

Msg#: 4420156 posted 1:19 am on Feb 22, 2012 (gmt 0)

Don't do Twitter, FB, LinkedIn or any of that other crap.

I wished to send you the completed set.

DeeCee



 
Msg#: 4420156 posted 1:32 am on Feb 22, 2012 (gmt 0)

\.ru{2} matches .ruu because that is specifically what you are asking for.

X{2} means "exactly two repetitions of the character 'X'", just as '[a-z]{2}' means "exactly two characters, each drawn from the a-z set".
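
A few illustrative patterns (not from the thread) showing how the quantifier binds, written as they would appear in a RewriteCond:

# \.ru{2}/   matches ".ruu/"  - the {2} applies only to the preceding 'u'
# \.(ru){2}/ matches ".ruru/" - grouping makes the {2} apply to "ru" as a unit
# \.ru/      matches ".ru/"   - what was actually intended in this case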

wilderness

Msg#: 4420156 posted 1:37 am on Feb 22, 2012 (gmt 0)

Thus, for this instance, excluding the {2} would match just the "ru"?

DeeCee



 
Msg#: 4420156 posted 1:51 am on Feb 22, 2012 (gmt 0)

wilderness,
Yes, except for whatever might come after it that could match the second 'u'.
In the case of '^http://.*\.ru{2}/' (which should merely be '^http://.*\.ru/'), such extra matching is blocked by the '/' character, which the URI must match as well.

Similarly, in a more general sense, '\.ru[^u]*' would match .ru followed by "zero or more characters that are NOT (^) a 'u' character", and '\.ru[^u]+' would match .ru followed by "one or more characters that are not 'u'".

not2easy

Msg#: 4420156 posted 3:54 am on Feb 22, 2012 (gmt 0)

Sorry, I disguised the wrong part of the IP I quoted; lucy24 is only off by one number, it was 178. I just thought I should change a part of the IP before posting it. I have blocked that IP and dozens of others as they appear.

I started to look into the code offered by wilderness, but wonder now if it would have the outcome I want because I don't believe that all these weird referrers are actual links to my site. There is no reason to believe that a student forum in India would have members whose profiles need to link to a website about fixing ceilings. An awful lot of "referers" are from forum profiles in Poland and Pakistan and hundreds more from domains with names that refer to jackets or jerseys or handbags or investing.

I had started by blocking referrers with more than 3 "referrals" since all of them come in only with the "GET / HTTP/1.0" and never request an actual page, but there are hundreds. It is referrer spam (why?) and will blocking all the .ru, .in, .pl, .pk referrers stop it or put a big dent in it? In that case I should opt for something like
RewriteCond %{HTTP_REFERER} ^http://.*\.(A-Z 0-9){2}/
to snag them all I think.

lucy24

Msg#: 4420156 posted 5:51 am on Feb 22, 2012 (gmt 0)

If it's a known bad IP, block it:

Deny from 31.214.128.0/17
Deny from 38.100

(I don't do RegExes on IPs if I can help it, because they're such an awful mess.)

If it's a known hinky UA, block it:

RewriteCond %{HTTP_USER_AGENT} MSIE\ [56]
RewriteCond %{HTTP_REFERER} !\? {other stuff edited out}
RewriteRule (\.html|/)$ boilerplate/goaway.html [L]

(Obvious compromise on that one.)

If it's an obviously bogus referer, block it:

RewriteCond %{HTTP_REFERER} \.(ru|ua)/ [NC]
RewriteCond %{HTTP_REFERER} !(google|yandex)\.
RewriteRule (\.html|/)$ - [F]

RewriteCond %{HTTP_REFERER} fun/AlonzoMelissa\.html
RewriteRule fun/AlonzoMelissa\.html - [F]

(Random examples.)

If they don't behave themselves, but have nice roommates, bring on the combinations:

RewriteCond %{REMOTE_ADDR} ^(207\.46|157\.5[4-9]|157\.60)\.
RewriteCond %{HTTP_USER_AGENT} MSIE\ 7\.0;\ Windows\ NT\ 5\.[12]
RewriteRule (\.html|/)$ - [F]

There's no single fix.

Edit: I hardly get any bogus requests for the front page, and most of those count as No Skin Off My Nose. It's the requests for big fat pages that I really try to stop.

g1smd

Msg#: 4420156 posted 7:48 am on Feb 22, 2012 (gmt 0)

Don't use .* at the beginning or in the middle of a RegEx pattern.

(A-Z 0-9) is a literal string: it matches exactly those characters, space and hyphens included. If you wanted to specify a character group, use [a-z0-9] instead.
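
Put together, a corrected condition for "any two-letter TLD" might read as follows (illustrative only; the (google|yandex) line is just an example of letting wanted referers through):

# Referer hostname ends in a two-letter TLD (.ru, .pl, .in, .pk, ...)
RewriteCond %{HTTP_REFERER} \.[a-z]{2}/ [NC]
# ...but don't forbid referers you actually want
RewriteCond %{HTTP_REFERER} !(google|yandex)\. [NC]
RewriteRule ^ - [F]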

not2easy

Msg#: 4420156 posted 2:49 pm on Feb 22, 2012 (gmt 0)

OK, I think I know what I need to do now. I have a trap for bots that ignore robots.txt, and it automatically adds individual IPs like this:

SetEnvIf Remote_Addr ^123\.238\.10\.71$ getout

I clean it out when there is an obvious bad neighborhood and block the IP group. It sets an environment variable to block on and allows everything that is not in that environment. I have had trouble trying to block with additional rewrite rules further down the .htaccess file, and I think it is because of the inverse Env set in this process. I'm thinking that I can add this referrer block into the environment that gets blocked as:

SetEnvIf Referer ^http://.*\.ru/ getout
SetEnvIf Referer ^http://.*\.pl/ getout

rather than as a new rewrite rule.

I don't know how the whole thing works, but these crappy domains end up showing in my GWT account as inbound links. My stats/logs are not public. I commonly read that referrer spam cannot hurt your site but I don't think it is helping anything if non-existent links appear as links to your site and the list keeps growing.
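
For anyone reading along, the usual way a getout flag like this ties into the allow/deny side is roughly the following (a sketch in Apache 2.2 syntax, using only the IP and TLD examples already shown in this thread):

# Flag unwanted requests by IP or by referer...
SetEnvIf Remote_Addr ^123\.238\.10\.71$ getout
SetEnvIf Referer ^http://.*\.ru/ getout
SetEnvIf Referer ^http://.*\.pl/ getout

# ...then refuse anything carrying the flag.
Order Allow,Deny
Allow from all
Deny from env=getout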

wilderness

Msg#: 4420156 posted 2:57 pm on Feb 22, 2012 (gmt 0)

SetEnvIf Referer ^http://.*\.(pl|ru)/ getout

You may add as many as you desire.
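
Extended to the full set of TLDs mentioned earlier (.ru, .in, .pl, .pk), it is still one line (illustrative):

SetEnvIf Referer ^http://.*\.(ru|in|pl|pk)/ getout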

not2easy

Msg#: 4420156 posted 4:42 pm on Feb 22, 2012 (gmt 0)

Oh, wow! Of course. Thank you, I am trying to do too many things at the same time right now, and sometimes my brain is like that spinning beachball. I appreciate all the helpful people here that help me learn and sort things out.

wilderness

Msg#: 4420156 posted 4:49 pm on Feb 22, 2012 (gmt 0)

I am trying to do too many things at the same time right now


With .htaccess, that is precisely the time when you need to slow down the most and make sure you have the syntax correct.

Later on, as the size of your file grows, it will take longer and longer to locate errors (especially when they fail to generate a 500 error and simply cause another line or lines not to function as you intended).

I just spent some two weeks going through a 1200 line file that had not been used in 2.5 years and it was a strain on the eyes and the patience.

Don

not2easy

Msg#: 4420156 posted 3:26 am on Feb 23, 2012 (gmt 0)

You are right; that is why I had not implemented anything, although I have been trying to beat back this problem by normal means for months. I was very concerned about performance, blocking over 500 domains by referrer, and that was only the multiple entries. My last look in GWT told me I had better do something, so I started looking for what would work. The more I read, the less I could see what would help; trying to decide how to stop it, I kept running into a wall of "that won't work because...". I took a look at today's access logs and see a few unusual entries since the update, but in several instances I already see 403s where I was trying to put them, so I am encouraged. That one new line is making a difference. I will watch closely for the next few weeks to see if there is another detail that might help. I have much more reading to do.
Thanks to all who helped.

tangor

Msg#: 4420156 posted 3:49 am on Feb 23, 2012 (gmt 0)

I've gotten many thousands of them already today; most are perfectly valid.


@incrediBILL: I'm with the OP on this. My HTTP/1.0s have not only been krapola for the last three years (my tracking years) but have also grown in volume in the last few months. I realize this is a broad claim which needs to be pared down to one (1) or two (2) "perfectly valid" examples, because I'm not seeing it. Most come from Asian, Nigerian, Russian and Baltic areas, and some from the Middle East as far east as India. They might be valid, but none are viable as regards my site(s), because half the hits are for pages that don't exist and never have, and the other half are after images and nothing else.

So I do query, begging knowledge from those with more experience in these things: what the heck difference is there between 1.0 and 1.1, and why should I care?

As far as rewrites and SetEnvIf or SetEnvIfNoCase and the others... that's covered; I just need some advice re: 1.0 and its current purpose/use to webmasters.

wilderness

Msg#: 4420156 posted 5:56 am on Feb 23, 2012 (gmt 0)

tangor,
since you're only looking for feedback:

I reactivated a site on Feb 3 that had been offline for 2.5 years.
The traffic has not been overwhelming, as I'm still restoring pages.

HTTP/1.0:
1) My host (also my previous host) pings every few hours (I had the IP denied previously and it didn't affect my account).
2) I've about three dozen European IPs all requesting the same page, which had been offline for 2.5 years.
3) The hacked version of the Majestic bot (1.4.2) has made a couple of requests.
4) I had a user from Argentina that was part of a group, at least until he ran a downloader.
5) A log spammer from the Netherlands has been hitting, despite denial.
6) Trend.
7) 1.202.218.8 "\"Mozilla/5.0", which has been eating 403s about every 90 minutes.

My "assumption" is that, apart from a small quantity of acceptable users, nothing valid comes from HTTP/1.0 requests.

Don
