
Forum Moderators: Ocean10000 & phranque

Blocking HTTP/1.0 Requests

     
9:15 pm on Jun 24, 2018 (gmt 0) - keyplyr (Moderator from US)


Many of us block HTTP/1.0, which is used only by old bots nowadays and just a couple of beneficial link/file validators. There are many ways to do this. One way is:

RewriteCond %{THE_REQUEST} HTTP/1\.0$
RewriteCond %{REMOTE_ADDR} !^1\.2[34]\.
RewriteCond %{REMOTE_ADDR} !^1\.234\.12[01]\.
RewriteRule - [F]
(IPs are examples only)

[edited by: phranque at 11:47 pm (utc) on Jun 25, 2018]
[edit reason] edited errata in code snippet [/edit]

9:35 pm on June 24, 2018 (gmt 0) - lucy24 (Senior Member from US)


only by old bots nowadays
Every time I think I've got a complete (and very short) list, I find another. Considering only the ones that ask for robots.txt or show similar pretensions to robotitude:

DeuSu
MJ12
Qwantify
ia_archiver
archive.org_bot
CCBot
Findxbot
rogerbot
SafeDNSBot
SEOkicks-Robot

Not necessarily saying everyone on this list is on the side of the angels--in fact I'm currently re-evaluating at least one of the names--but darn, the list keeps growing.
9:45 pm on June 24, 2018 (gmt 0) - Full Member


And your RewriteRule directive is incomplete; it’s missing a pattern (first argument).

[edited by: phranque at 11:56 pm (utc) on Jun 25, 2018]
[edit reason] cleanup [/edit]

11:50 pm on June 24, 2018 (gmt 0) - keyplyr (Moderator from US)


Every site has different needs.

The above example blocks all requests from anything/anyone using HTTP/1.0 and shows how to allow a couple of IP ranges. Nothing additional really needs to be added, although there certainly could be more strict conditions applied. My own rules are pretty strict, but they wouldn't be for everyone.

Lucy24 gives bot examples, so a list of allowed UAs could then be added, but what may be allowed on one site may not be at another.
12:34 am on June 25, 2018 (gmt 0) - lucy24 (Senior Member from US)


:: returning from poring over recent logs ::

HTTP/1.0 may be close to a non-issue. I find around 70% of /1.0 page requests blocked for whatever reason. At least 98% of the remainder were either authorized, or would have been blocked if they visited today. (Even header-based access controls need to be tweaked now and then.) Even among robots who sent no UA header--i.e. the dimmest of the dim--only about 3% used /1.0. I get the impression robots have to go out of their way and make a special effort if they want to use /1.0.

On the other hand I was staggered to find the occasional human using /1.0. Mostly, for some reason, from various places in Europe. (I thought they prided themselves on being cutting-edge?)

But overall I'm reminded of only a few years ago, when we could require user-agents to start with “Mozilla/[45]” except for selected authorized robots, and that would keep almost everyone out. Today robots all claim to be Mozilla anyway, so that gets us nowhere.
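
For the record, a minimal sketch of what that old-style rule might have looked like (the robot names are placeholders only):

# block anything whose UA doesn't start with Mozilla/4 or Mozilla/5,
# unless it identifies itself as one of a few known-good robots
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/[45]
RewriteCond %{HTTP_USER_AGENT} !(goodbot1|goodbot2)
RewriteRule ^ - [F]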
12:41 am on June 25, 2018 (gmt 0) - keyplyr (Moderator from US)


True enough. I brought it up because the last several UAs I documented in the Search Engine Spider & User Agent ID Forum [webmasterworld.com] came in over HTTP/1.0.

If you weren't blocking the usual suspects, an HTTP/1.0 filter might catch a few though.
2:44 am on June 25, 2018 (gmt 0) - phranque (Administrator)


And your RewriteRule directive is incomplete; it’s missing a pattern (first argument).

there are two ways to use the "-" (dash) syntax for the Substitution string.
you can use a placeholder for the Pattern:
RewriteRule ^ - [F]

you can leave the Pattern unspecified:
RewriteRule - [F]

while i've seen the second usage often i haven't found it documented by apache.

regarding the RewriteCond directives:
without the [OR] it means you're excluding HTTP/1.0 except for those two IP patterns.

mods note:edited for thread cleanup

[edited by: phranque at 12:09 am (utc) on Jun 26, 2018]

3:28 am on June 25, 2018 (gmt 0) - lucy24 (Senior Member from US)


without the [OR] it means you're excluding HTTP/1.0 except for those two IP patterns.
Yes, that’s the idea--except, of course, that in real life there would be more exclusions. It translates as “block any requests that use HTTP/1.0 AND don't come from {nice neighborhood #1} AND don’t come from {nice neighborhood #2} AND don’t come from {nice neighborhood #3} AND aren’t named {nice robot #1} AND aren’t named {nice robot #2} AND ” ... et cetera.

It’s AND rather than OR because all the conditions--except the first one, which triggers all the others--are expressed as negatives. If they were all positive conditions, it would be OR instead. “The request doesn’t meet any of these conditions” (requiring you to test all of them) vs. “The request meets at least one of these conditions” (allowing you to stop testing as soon as one is met).

It’s largely a matter of individual coding style whether you want to say
RewriteCond %{REMOTE_ADDR} !^1\.2\.3
RewriteCond %{REMOTE_ADDR} !^4\.5\.6
or whether you instead choose to say
RewriteCond %{REMOTE_ADDR} !^(1\.2\.3|4\.5\.6)
since shaving picoseconds off processing time isn’t the only consideration.
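
For comparison, a sketch of the same allow-list written as positive [OR] conditions--split into a skip rule placed just ahead of the block (example IPs only):

# if the request comes from a friendly range, skip the block rule that follows
RewriteCond %{REMOTE_ADDR} ^1\.2\.3 [OR]
RewriteCond %{REMOTE_ADDR} ^4\.5\.6
RewriteRule ^ - [S=1]

# everything else arriving over HTTP/1.0 gets a 403
RewriteCond %{THE_REQUEST} HTTP/1\.0$
RewriteRule ^ - [F]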
3:34 am on June 25, 2018 (gmt 0) - keyplyr (Moderator from US)


Nicely explained, thanks lucy24
8:08 am on June 25, 2018 (gmt 0) - keyplyr (Moderator from US)


Here's the code with both the allowed IP addresses & allowed UAs:


RewriteCond %{THE_REQUEST} HTTP/1\.0$
RewriteCond %{REMOTE_ADDR} !^1\.2[34]\.
RewriteCond %{REMOTE_ADDR} !^1\.234\.12[01]\.
RewriteCond %{HTTP_USER_AGENT} !(example1|example2|example3)
RewriteRule - [F]


This should not be a very long list for almost anyone. I only allow 6 IPs and 4 UAs.

[edited by: phranque at 11:48 pm (utc) on Jun 25, 2018]
[edit reason] cleanup [/edit]

11:23 am on June 25, 2018 (gmt 0) - Preferred Member


It's a good idea to block the old HTTP/1.0 protocol and, for HTTPS, to accept only TLSv1.2 connections (and above, for browsers that already support TLSv1.3).

I have been doing this for two years, but I do log blocked requests to be sure nothing legitimate is blocked. So far, I haven't seen any valuable visitor (or robot) blocked. Of course, good webmasters will always keep an eye on what they are blocking.
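
For reference, a minimal mod_ssl sketch of that protocol policy, assuming Apache 2.4 (TLSv1.3 additionally requires a build against OpenSSL 1.1.1 or later):

# disable everything below TLS 1.2; TLS 1.2 and, where supported, TLS 1.3 remain enabled
SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1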
12:14 am on June 26, 2018 (gmt 0) - phranque (Administrator)


mods note:
i removed quite a bit of lateral discussion about errata in the OP which was unnecessary once that was clarified.
in order to keep the discussion on topic i removed posts or parts thereof in such a way that would keep the flow of the discussion as well as keep all interested parties subscribed.
i also modified my remaining post to fill in a couple of blanks.
12:17 am on June 26, 2018 (gmt 0) - keyplyr (Moderator from US)


Thanks phranque
12:26 am on June 26, 2018 (gmt 0) - phranque (Administrator)


looks MUCH better now eh?

maybe add a link to this thread from the pinned Blocking Methods [webmasterworld.com] thread in UAID?
12:30 am on June 26, 2018 (gmt 0) - keyplyr (Moderator from US)


Great idea... done.
2:01 am on June 26, 2018 (gmt 0) - Full Member


Sorry to stomp on the mod-cleanup, but if you're going to fix the OP code, then you really do need to fix that RewriteRule directive (as well as some of the newly added misinformation).


RewriteRule - [F]


you can leave the Pattern unspecified


Ahh, no you can't.

while i've seen the second usage often i haven't found it documented by apache.


Yes, I've seen this usage often as well - in support forums, posted by confused, unsuspecting souls who have copy/pasted some "working" directives from the internet and are wondering why it's just not working.

You haven't seen it documented, because well... it's just not a thing.

If you omit the pattern (first argument) then the other arguments effectively just slide over. So the substitution is now the pattern and the flags are the substitution. The result is that this will only match requests that contain a hyphen, which it will attempt to rewrite to "[F]" (which is probably going to fail horribly - a 404 most likely). Any other requests will simply not match - no error - just nothing will happen.
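
Spelled out the way mod_rewrite actually reads it (a sketch for comparison only):

# What was written:
#   RewriteRule - [F]
# is parsed as Pattern = "-" and Substitution = "[F]", i.e. roughly
# "if the URL contains a hyphen, rewrite it to the literal path [F]".
# The intended directive needs an explicit pattern, for example:
RewriteRule ^ - [F]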
2:45 am on June 26, 2018 (gmt 0) - keyplyr (Moderator from US)


With regard to how it is applied in the OP code, I just tested:
RewriteRule - [F]
and
RewriteRule ^ - [F]
and as expected, they perform exactly the same.


Depending on your site's interests, it may be prudent to allow certain files even though other files are blocked. Some examples of allowed files may be:
RewriteRule !^(ads\.txt|custom403\.html|dnt-policy\.txt|robots\.txt)$ - [F]

Allowing these files (and others) may reduce the list of allowed UAs and IP addresses.
6:13 am on June 26, 2018 (gmt 0) - lucy24 (Senior Member from US)


Some examples of allowed files
Well, if we’re going into detail...

All things being equal, pipe-delimited groups (or lists of [OR] conditions*) should be listed in order of likeliest-to-succeed. On most sites that would be your custom 403 page first, followed by robots.txt. Then again, on many sites it would be equally expeditious to start with
RewriteRule \.txt - [L]
rather than go through the whole list of permitted .txt files.
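
A sketch of that ordering, reusing the HTTP/1.0 block from earlier in the thread (IP and UA allow-lists omitted for brevity):

# pass any .txt request through untouched and stop processing further rules
RewriteRule \.txt - [L]

# block everything else arriving over HTTP/1.0
RewriteCond %{THE_REQUEST} HTTP/1\.0$
RewriteRule ^ - [F]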

:: detour to test site for surprising discovery ::

Rules in the form
RewriteRule - [F]
do not throw a 500-class error. On my site (Apache 2.2) they are simply ignored: the rule is not executed, and conditions--if any--are not evaluated. I would not consider that to be “performing exactly the same” as a syntactically correct
RewriteRule ^ - [F]
or
RewriteRule . - [F]


* Not to be confused with default [AND] lists, which start with likeliest-to-fail.
7:28 am on June 26, 2018 (gmt 0) - keyplyr (Moderator from US)


...should be listed in order of likeliest-to-succeed
While I agree in theory, I've never concerned myself with what amounts to a trivial distinction in today's realm of broadband speeds, SSDs and HTTP/2. How many daily requests do you really get coming in on HTTP/1.0?

One site I have gets between 60k and 70k page views per day and it can be weeks before I see a request for HTTP/1.0. Another site with less traffic may see one or two a day.

On my site (Apache 2.2) they are simply ignored...
Since I don't know exactly what/how you tested, I can't really comment. But if that's the case, then certainly do use:
RewriteRule ^ - [F]
I'm not really advocating leaving out the anchor; almost all of my rules have it. It's just not always necessary in my experience.

Two of my servers are also Apache 2.2* (one at the same host as yours), and although not used with an HTTP/1.0 block by itself, I do have a number of rule sets written that way and they've performed as intended for years.

*Apache 2.4 with mod_http2 rocks!
9:23 am on June 26, 2018 (gmt 0) - Full Member


RewriteRule - [F]

On my site (Apache 2.2) they are simply ignored...


It's not strictly "ignored". The pattern is simply a hyphen, so it matches any URL that contains a hyphen. e.g. It matches "/foo-bar", but not "/foobar" or "/", etc.

But if it matches it will internally rewrite the request to "[F]" (i.e. a nonsensical "relative URL" consisting of the characters "[", "F" and "]"). It doesn't issue a 403 subrequest. So, this can't "work" as intended under any test condition (on any version of Apache).

And since there is no "[L]" flag (no flags argument at all), rewriting will continue...
9:40 am on June 26, 2018 (gmt 0) - keyplyr (Moderator from US)


This should appease everyone:

RewriteCond %{THE_REQUEST} HTTP/1\.0$
RewriteCond %{REMOTE_ADDR} !^1\.2[34]\.
RewriteCond %{REMOTE_ADDR} !^1\.234\.12[01]\.
RewriteCond %{HTTP_USER_AGENT} !(example1|example2|example3)
RewriteRule !^(custom403\.html|robots\.txt|ads\.txt|dnt-policy\.txt)$ - [F,L]
6:31 pm on June 26, 2018 (gmt 0) - lucy24 (Senior Member from US)


But if it matches it will internally rewrite the request
Y'know, I did wonder if that's what was really going on, but at the time couldn't think how to test it: the target (the - hyphen) is read as the pattern, and then the [F] flag is read as the target, with--as you observe--no actual [L] flag.

:: quick run to test site ::

I happen to have a few existing pagenames with - in them, so I just loaded up one, activated the (malformed) rule and tried reloading. This led to a 404 at the originally requested URL. This was mystifying until I looked at the error logs--pause for “D’oh!” here--which said
[Tue Jun 26 11:22:43 2018] [error] [client my-own-IP] File does not exist: /physical-file-path/example.com/[F]
I’d actually forgotten that if the target doesn’t start with either http:// or a / slash, then it defaults to the current hostname. Then again, it would not have occurred to me to put [ literal brackets ] in a URL. Does any site use them? In paths, I mean; I’ve definitely seen brackets in query strings.