Forum Moderators: phranque

Is there no way to get rid of this referrer spam?

         

thord

5:25 am on Aug 24, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Because of the 200, the spam referrer below is getting listed as a referring site in the statistics. The first 301 is due to www canonicalisation of the index page. The IPs clearly belong to hacked computers, often in Brazil, so there is no use blocking by REMOTE_ADDR.

*.54.246.68 - - [...] "GET / HTTP/1.1" 301 241 "http://*seo.*/try.php?u=http://mydomain.*"
*.54.246.68 - - [...] "GET /gone.html HTTP/1.1" 200 298 "http://*seo.*/try.php?u=http://mydomain.*"

The condition and rule are:

RewriteCond %{HTTP_REFERER} seo [NC,OR]
...
RewriteRule !^gone\.html - [G]

The reason for the need to use this rule is explained in my previous thread [webmasterworld.com...]

keyplyr

6:43 am on Aug 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Get rid of it? No. Block most of it? Yes.

Block server farm ranges, block bad UAs and ignore the rest. However, the blocked requests will still display in your server logs. No way to stop it. If a request is made, it will be logged.*

It comes with having a website. The target is you looking at it (and hopefully following the links) so ignore it and they fail.

*Exception is if you manage your own server and block via firewall. Then it is logged in a different place.

whitespace

9:23 am on Aug 24, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



The first 301 is due to www canonicalisation of the index page.


You shouldn't be seeing the 301 in your logs. As mentioned in lucy24's last comment in the linked thread, your "blocking" directives should come before any canonicalisation redirects. This would prevent the double/301 entry in your logs (and the double hit on your server).
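The ordering described here could look roughly like the sketch below. It is only an illustration: "seo" and example.com are placeholders rather than the poster's real values, and it assumes the rules live in the .htaccess file in the document root.

```apache
# Sketch only: "seo" and example.com are placeholders.
RewriteEngine On

# 1. Blocking rules first, so a blocked request is answered with a 410 straight away.
#    The negated pattern leaves the error document itself reachable.
RewriteCond %{HTTP_REFERER} seo [NC]
RewriteRule !^gone\.html$ - [G]

# 2. Canonical host redirect afterwards; only requests that were not blocked get this far.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Because [G] stops rule processing as soon as it fires, a blocked request should never reach the redirect in step 2 if the redirect really is issued from this file.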

*.54.246.68 - - [...] "GET /gone.html HTTP/1.1" 200 298 "http://*seo.*/try.php?u=http://mydomain.*"


The 200 status here would appear to be for direct accesses to the error document itself?! You would not expect to see this logged for a genuine 410 error. How is the spammer finding this document? Perhaps a more obscure filename would reduce the number of hits? How is your ErrorDocument configured? How is it being served?

You can configure your 410 error document to return a 410 HTTP status (or maybe a 403 Forbidden) for direct requests. (Presumably this would prevent your stats software from plucking it out as a valid referer?)
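One way this is sometimes done without server-side scripting is to test the REDIRECT_STATUS environment variable, which Apache sets when it internally redirects to an error document. The following is a hedged sketch, not something tested on the poster's host; it assumes ErrorDocument 410 /gone.html is already declared and that the rule sits in the root .htaccess.

```apache
# Sketch only; assumes "ErrorDocument 410 /gone.html" is declared elsewhere in this file.
# REDIRECT_STATUS is empty on the client's original request and is set by Apache
# once it internally redirects to an error document.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^gone\.html$ - [G]
```

A direct request for /gone.html would then trigger a 410. On the internal redirect that follows, REDIRECT_STATUS is no longer empty, so the condition fails and the document is served instead of being blocked again.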

thord

3:21 pm on Aug 24, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



You shouldn't be seeing the 301 in your logs.


Precisely what I have been thinking too, as this referrer is 410ed. That is why I am so perplexed with this issue. I do not understand how this happens. The blocking directives are coming before the canonicalisation redirects.

The 200 status here would appear to be for direct accesses to the error document itself?! ... How is the spammer finding this document?


That is indeed just how it appears! The 200 follows immediately after the 301. In my previous thread I wondered if there is something wrong with my RewriteRule !^gone\.html - [G] , but no. I did change the filename from 410.html to gone.html – to no avail. The document itself is not configured to return a 410 or 403.

keyplyr is so right, I should just ignore referrer spammers. It is the very fact that I do not understand what is going on that gives me no peace. I do think I can make the hits invisible by configuring the statistics programme, but the 200s will still occur. Everything was fine for 13 years until mid May, when my host made some server reconfiguration or update. Blocking by IP or UA still works.

whitespace

9:02 pm on Aug 24, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



You shouldn't be seeing the 301 in your logs.


...because there shouldn't be a 301 redirect happening at all, anywhere (if the directives are in the correct order). If you are seeing a 301 in your logs, then something, somewhere is triggering a 301 redirect. Maybe this is in your server config or (another?) .htaccess file? Is this a shared host? What kind of server setup do you have? Apache? Any front end proxies, CDNs? We'd need to see your .htaccess file in its entirety to be sure nothing untoward is happening in there.

The 200 follows immediately after the 301.


Your logs would seem to suggest that your error document is being served by an external redirect, rather than an internal rewrite, which would not be correct. Do you see a 410 status anywhere in your logs?

Just to confirm... in your ErrorDocument directive you have specified a root-relative (starting with a slash) local path to your error document? Not an absolute URL? An absolute URL would indeed trigger a redirect. (Whereas a local path is internally rewritten.)

Just wondering... can the Apache log be configured to actually log the custom ErrorDocument that is served (in the case of an internal rewrite)? The servers I've worked on never have, as I recall, and I can't see any directives that look as if they would enable this "feature"?

keyplyr

9:18 pm on Aug 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just a FYI - I see nothing wrong with the log report. I see different configs at different hosts and they're not all the same.

Seeing the 301 response prior to the 200 usually indicates the server config is doing the redirect and thord's htaccess redirect is probably not needed. Maybe thord chose the www version of the site when the account was set up.

Again, being hypersensitive to log spam is a waste of time IMO.

thord

6:30 pm on Aug 25, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



because there shouldn't be a 301 redirect happening at all

I do agree. Makes perfect sense. It is shared hosting, so this issue may be due to some intentional or unintentional configuration of the server, and not due to my .htaccess at all. Especially as it worked faultlessly before mid May. whitespace, may I send you the RewriteCond %{HTTP_REFERER} section of my .htaccess by Stickymail? The whole file is too big.

Do you see a 410 status anywhere in your logs?

Many. But they are all due to blocking by IP or by UA. It is blocking by referrer that is not working as it should.

specified a root-relative (starting with a slash) local path to your error document?

RewriteRule !^gone\.html - [G] . No slash. The error document is in the root.

#keyplyr: The account with this host was set up last November and everything was fine until mid May. Blocking by IP or by UA still works properly. This is not really hypersensitivity to log spam, but more about frustration over not being able to make the .htaccess work perfectly.

keyplyr

7:25 pm on Aug 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wouldn't it be best to work with their support dept to resolve your issues since they are the ones who know how their server is set up?

whitespace

9:22 pm on Aug 25, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



I see nothing wrong with the log report. I see different configs at different hosts and they're not all the same.


Well, if those log entries are the result of trying to serve a "410 Gone" using mod_rewrite then I would say there is something wrong. Apart from messing up your logs, redirecting to an error document (sometimes the result of developer error) is detrimental to both server resources and potentially SEO (ok, this is just spam, but you might want to serve a 410, 404 or 403 using mod_rewrite for any number of reasons).

I assume you get similar log results for a 404 (ie. [R=404]) and a 403 (ie. [F]) on your RewriteRule? Or is the [G] unique in this respect (although that would be even more weird)?

It is shared hosting, so this issue may be due to some intentional or unintentional configuration of the server


Quite possibly. I have certainly come across some bizarre/unexplainable (shared) server configs in these forums! As keyplyr suggests, it would be a good idea to contact your host's support dept. (Although, best of luck with that!) If this is a server config issue then you should be able to build a concise test case - in fact, you need to do that anyway, if you haven't already.
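A concise test case along these lines might look like the sketch below, tried in an otherwise empty .htaccess. The referrer string "spamtest" is made up for the test; gone.html is the error document named in the thread.

```apache
# Minimal test case for an otherwise empty .htaccess.
# "spamtest" is a made-up referrer string used only for testing.
RewriteEngine On
ErrorDocument 410 /gone.html

RewriteCond %{HTTP_REFERER} spamtest [NC]
RewriteRule !^gone\.html$ - [G]

# Variants to compare against, one at a time, in place of the rule above:
# RewriteRule !^gone\.html$ - [F]       (403 Forbidden)
# RewriteRule !^gone\.html$ - [R=404]   (404 Not Found)
```

Sending a request with a matching Referer header (for instance with curl's -e option) and then checking the access log should show whether a single 410 is recorded or the 301/200 pair reappears.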

may I send you the RewriteCond %{HTTP_REFERER} section of my .htaccess by Stickymail?


You can, but why not post it in the forum?

The whole file is too big.


To be honest, the problem is more likely to be somewhere else in the file (if at all). I somehow doubt that the problem is with that one RewriteCond...RewriteRule block. (But create a concise test case as mentioned above - if you are unable to repeat this issue then you've messed up somewhere else! ;)

RewriteRule !^gone\.html - [G] . No slash. The error document is in the root.


I mean, how is the ErrorDocument declared? Is it like this:


ErrorDocument 410 /gone.html


Or, like this:


ErrorDocument 410 http://example.com/gone.html


The second version (with an absolute URL) would indeed result in an external redirect. (But it would have done this before mid May as well.)

keyplyr

11:29 pm on Aug 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



may I send you the RewriteCond %{HTTP_REFERER} section of my .htaccess by Stickymail?

You can, but why not post it in the forum?

Let's not post sensitive server info in the public forum. Best to do that privately with someone you trust.

thord

4:45 am on Aug 26, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Here is the abridged and depersonalised (with *) .htaccess file. Moderators, delete if improper.

Options +FollowSymLinks
RewriteEngine on

ErrorDocument 404 /notfound.html
ErrorDocument 410 /gone.html

RewriteCond %{REMOTE_ADDR} ^*\.(9[2-5])\. [OR]
...
RewriteRule !^gone\.html - [G]

RewriteCond %{HTTP_REFERER} seo [NC,OR]
...
RewriteRule !^gone\.html - [G]

RewriteCond %{HTTP_USER_AGENT} * [NC,OR]
...
RewriteRule !^gone\.html - [G]

RewriteRule *|* - [L]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?myowndomain\.*(/)?.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://(www\.)?google\. #plus a few others and variations#
...
RewriteRule \.(jpeg?|jpg|gif|wav|pdf|rtf)$ http://www.myowndomain.*/*.png [NC,R,L]

RewriteCond %{REQUEST_URI} ^([^.]+\.html)
RewriteRule \.html. http://www.myowndomain.*/$1 [R=301,L]

RewriteCond %{HTTP_HOST} ^((www\.)?(*|*|*)|*)\.* [NC]
RewriteRule (.*) http://www.myowndomain.*/$1 [R=301,L]

[G] and [R=301] are the only flags of that kind used in my .htaccess. I have contacted my host's support, but whitespace knew the realities (at least in Europe) when saying: "Although, best of luck with that!" I have done the tests suggested by lucy24 in my first thread regarding this strange problem: [webmasterworld.com...]

whitespace

8:48 am on Aug 26, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Many. But they are all due to blocking by IP or by UA. It is blocking by referrer that is not working as it should.


Ah, I hadn't appreciated that your IP and UA blocks were also performed by the very same (mod_rewrite) method as your referer blocks! Really? This does throw the ball back into the area of "developer error"! (For some reason I had assumed you were using mod_setenvif / mod_authz... but then that would have resulted in a 403, not a 410 - doh! Although a 403 is arguably a more appropriate response in this case.)
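For comparison, the mod_setenvif / mod_authz approach mentioned here could be sketched as follows. This assumes Apache 2.4 with mod_setenvif and mod_authz_core available and an AllowOverride setting that permits these directives in .htaccess. The strings "seo" and "badbot" and the variable names are placeholders.

```apache
# Sketch only, assuming Apache 2.4; "seo", "badbot" and the variable names are placeholders.
SetEnvIfNoCase Referer "seo" bad_referer
SetEnvIfNoCase User-Agent "badbot" bad_ua

<RequireAll>
    Require all granted
    Require not env bad_referer
    Require not env bad_ua
</RequireAll>
```

Requests flagged this way are refused with a 403 rather than a 410, which is the difference noted above.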

Here is the abridged and depersonalised (with *) .htaccess file.


Unfortunately, that would seem to have changed it beyond all recognition - we can't debug a pseudo-mockup. Using * to replace "personal info" in the regex - you do realise that * is a special meta char in regex? (Do you really have that much personal stuff in .htaccess?)


RewriteRule (.*) http://www.myowndomain.*/$1 [R=301,L]


Hhhmm, wondering why you would need to "depersonalise" the last part of the substitution? Thing is, that just looks like an error.

not2easy

1:34 pm on Aug 26, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The ideal way to depersonalize your shared details is to use "example.com" in place of actual domain names. I understand that "myowndomain" is meant to do that, but that is a real domain that probably belongs to someone while "example" is an example that can't ever be a real domain name.

You can read about it in this forum's Charter: [webmasterworld.com...]
Please do not post specific details such as domain names, full IP addresses, or personally-identifiable information such as name, e-mail address, IM screen name, etc. Such specifics will be edited or removed in accordance with our Terms of Service [webmasterworld.com...] which may render your post meaningless. Please replace all instances of your domain name with "example.com" before posting.

thord

3:34 pm on Aug 26, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



whitespace, I think it may be improper to disclose much more in this Forum. Revealing my domain would be considered self-promotion. Revealing banned IPs or referrers could offend someone. I am aware of the example.com rule here, but found it important to somehow indicate that this very domain name is my own web site's, not just any site's. How to do that? There is no TLD stated.

The reasons why I am deliberately using 410 and not 403 are OT but can be the subject of an interesting separate discussion. Still, I sincerely hope it will be possible to find some solution to the main problem based on the limited information that can be given in a public forum. Otherwise future readers will also be left in the dark.

blend27

6:23 pm on Aug 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@thord.
/try.php?u=

I am assuming this is all semalt spam you are referring to.

There was/is [webmasterworld.com...] thread.

MSG4801456: <action type="AbortRequest" /> that is on IIS.

I started using that a while back and was able to clean up all requests from making it to log files.

Now I know this is an Apache forum, but take a look, perhaps there is a way to force this on Apache as well.

whitespace

10:06 pm on Aug 26, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Revealing my domain would be considered self-promotion. Revealing banned IP:s or referrers could offend someone.


No one is suggesting you should reveal your real domain or IPs - it shouldn't be necessary anyway. But in "depersonalising" the code you would need to replace like for like (letters with letters and numbers with numbers) whilst still keeping it syntactically valid with the original meaning. For example, take the first line of your "depersonalised" code (at least I assume this is "depersonalised" and not simply an error?):


RewriteCond %{REMOTE_ADDR} ^*\.(9[2-5])\. [OR]


You appear to have removed something towards the start of the regex and replaced it with *. The problem with this is that anyone trying to analyse/debug the code will see a 500 Internal Server Error ("cannot compile regular expression"). Replace original IP segments with 1, 12 or 123 (for example).
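A like-for-like depersonalisation in that spirit might look like the sketch below. Every value is a placeholder: the IP octets are replaced with 123, 12 and 34, and the domain with example.com, so the regexes still compile and the rules keep their original shape.

```apache
# Depersonalised sketch: all IP octets and the domain are placeholders,
# chosen so that the file stays syntactically valid.
RewriteCond %{REMOTE_ADDR} ^123\.(9[2-5])\. [OR]
RewriteCond %{REMOTE_ADDR} ^12\.34\.
RewriteRule !^gone\.html$ - [G]

RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```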

However...


# 1
RewriteCond %{REMOTE_ADDR} ^*\.(9[2-5])\. [OR]
...
RewriteRule !^gone\.html - [G]

# 2
RewriteCond %{HTTP_REFERER} seo [NC,OR]
...
RewriteRule !^gone\.html - [G]

# 3
RewriteCond %{HTTP_USER_AGENT} * [NC,OR]
...
RewriteRule !^gone\.html - [G]


If you are suggesting that #1 (IP blocking) and #3 (UA blocking) work OK and correctly log a 410, but #2 (Referer blocking) is not (for a similar canonical request) then this would seem to be bordering on the "magical". There would indeed appear to be some external forces (eg. server config, additional modules, etc.) in play (coupled with the canonical redirect that appears to precede this). But again, what this could be is anyone's guess. mod_security springs to mind - however - you would not expect a 200 OK response.

MSG4801456: <action type="AbortRequest" /> that is on IIS.


Interesting! I'm sure this must be possible somehow on Apache, however, whether you can do this "easily" in userland code is another matter. Even when Apache "aborts" the request, there is always at least a response header returned - whilst the response body might be empty.

thord

7:08 am on Aug 27, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



I originally said that the 301ed request GET / is immediately (or within the next second according to the log) followed by a 200ed request by the same IP for /gone.html. That is, however, not always so. The interval is sometimes longer. I noticed this now when there was another, legitimate, request between the spammer's ones.

A thorough recheck of the raw logs reveals intervals of up to 8 seconds. That is not how mod_rewrite works. whitespace said: "The 200 status here would appear to be for direct accesses to the error document itself?!"

As the rule is !^gone\.html - [G] a direct request for /gone.html will inevitably result in a 200. The reason for me having to use that awkward rule is explained in the previous thread. (The old rule .* - [G] that had worked fine for 13 years started to produce 500 errors in mid May, and that problem is unsolved.) The spammer or his bot is smart. Yes, blend27, it is semalt. A lot to study in those two threads.

But that leaves the other mystery. As whitespace said: "You shouldn't be seeing the 301 in your logs." The spammer is blocked by RewriteCond %{HTTP_REFERER} but this section of the .htaccess is obviously not functioning properly. I have for the time being reverted to .* - [G]. Let us see if this eliminates the spammer's 301s.

whitespace

10:35 am on Aug 27, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



That is, however, not always so. The interval is sometimes longer. .... intervals of up to 8 seconds. That is not how mod_rewrite works.


That's "normal" for any kind of redirect. Whilst 8 seconds might seem a bit long for a "normal user", it is still quite plausible - the nature of the beast - particularly for a bot. This has nothing to do with mod_rewrite (or your server). The redirect is processed entirely by the client / user-agent / bot. So is dependent on the user-agent and their connection. For a normal user sat at their browser, the redirect is normally perceived as instantaneous (at least within a second). However, a bot (which is no doubt what this traffic is) can choose when to issue the second request (the target of the redirect)... the bot is probably busy pestering many sites and might be on a bad connection as well.

whitespace said: "The 200 status here would appear to be for direct accesses to the error document itself?!"


Yes, it could be, or as the result of a redirect - which equates to the same thing. To be honest, when I said that, I assumed your error document was being served correctly (a 410 via the [G] flag, using the code in your initial post), in which case you would indeed only see a 200 response when there was a direct request to the error document.

As the rule is !^gone\.html - [G] a direct request for /gone.html will inevitably result in a 200.


That rule doesn't really have anything to do with a direct request resulting in a 200. Error documents must be accessible - this is an Apache thing. You can't get Apache to trigger a 410 when the custom 410 document is accessed - that would create an endless loop (ie. 500 error). Trigger 410, serve gone.html, trigger 410, serve gone.html, etc. You need to prevent that loop somehow. This is probably covering the same ground as your other thread, but for your old rule (.* - [G]) to work correctly you must have also had an exception somewhere to permit direct access to the error document (otherwise you'll get a 500 error). (You can, however, trigger the 410 in your server-side script ie. in the 410 error document itself, since this won't trigger Apache to re-serve the custom error document - this all happens long after the Apache config/.htaccess file has been processed.)

For example, the following two rulesets are equivalent...


# Exception "built-in" to the one directive
RewriteRule !^gone\.html - [G]



# Exception for the error document
RewriteRule ^gone\.html - [L]
# ...lots of other stuff here...
RewriteRule .* - [G]


If .* - [G] worked fine and then it didn't then it's more likely that the earlier "exception" got moved/changed/overridden.

This does make me wonder what the following directive really is and whether this should be at the top of your script? ...


RewriteRule *|* - [L]


I have for the time being reverted to .* - [G]. Let us see if this eliminates the spammer's 301s.


But the other two rulesets for IP and UA blocking (on either side of the Referer block) use !^gone\.html - [G] and they appear to work OK? (Which is an even bigger mystery.)

thord

2:47 pm on Aug 27, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



I see – 8 seconds does not prove anything. Some of the hacked computers used by this log spammer have been in Laos. A waste of the communication resources of a poor country.

My RewriteRule *|* - [L] is irrelevant here. It stands for RewriteRule image1|image2 - [L], i.e. those two image files are always to be excepted in the hotlinking protection.

But the other two rulesets for IP and UA blocking (on either side of the Referer block) use !^gone\.html - [G] and they appear to work OK?

I can confirm that. All hits from banned IPs and banned UAs result in a 410, and no 301.

I have now been testing the old rule .* - [G] for the HTTP_REFERER block for half a day and caught the following sample:
*.242.246.23 - - [27/Aug/2016:13:59:56] "GET / HTTP/1.1" 500 - "http://pizza-tycoon.*/" "Mozilla*"

Just that single entry. No 301 hit. No 500 loop (unless one is automatically prevented by the server). No spam entry in the referring site report – fine. But deliberately allowing 500s is improper.

whitespace

4:02 pm on Aug 27, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



No 500 loop


The 500 HTTP status ("500 error") you see in the "access log" is the result of the rewrite loop. (There's no such thing as a "500 loop".) The server is basically "giving up" and bombs out with a 500 error. At that point processing stops.

If you were to examine the server "error log", you should see something like:

Request exceeded the limit of 10 internal redirects due to probable configuration error.


And if you had access to the server config, you could also set "LogLevel debug" and see exactly what those "10 internal redirects" were. (Most probably a repeating series of requests for "gone.html".)
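For anyone who does have access to the server config, a more targeted alternative on Apache 2.4 is to raise the log level for mod_rewrite only. This is a sketch under that assumption; the thread does not say which version the host runs, and on Apache 2.2 the equivalent directives were RewriteLog and RewriteLogLevel.

```apache
# Server or virtual-host config only (not .htaccess). Assumes Apache 2.4.
# Logs mod_rewrite's decisions in detail while other modules stay at "warn".
LogLevel warn rewrite:trace3

# Shown at its default value for reference: this is the limit behind the
# "exceeded the limit of 10 internal redirects" message.
LimitInternalRecursion 10
```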

Just to add... In this instance, the user (although a bot in this case) may well see, what looks like, a system generated 410 error page - but this will be served with a 500 status with an additional note along the lines of: "Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request."

No 301 hit.


Is it possible that this request is already for the canonical "www" version of your site? Otherwise, there shouldn't really be any difference in this respect?

thord

5:04 pm on Aug 27, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



"Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request." That was what I was seeing myself when conducting the tests suggested by lucy24.
Yes, it is possible that the request was for the canonical version. There is no way for me to know. Although the spammer is obviously the same they may use both versions randomly. So this did not bring us any further.

blend27

6:04 pm on Aug 27, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why can't you try

RewriteCond %{HTTP_REFERER} (seo|pizza) [NC]
RewriteRule .? /errorfolder/410.php [L]

where 410.php sends its own 410 status code from within the PHP code?

thord

4:24 am on Aug 28, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



the user (although a bot in this case) may well see, what looks like, a system generated 410 error page - but this will be served with a 500 status with an additional note along the lines of: "Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request."

Yes, that is what happens when the rule is simply .* - [G]. I have seen it all on the screen myself when testing. But why? If Apache has generated its standard 410 page, why does it additionally look for a custom 410 ErrorDocument? If there is none a 500 will result.

But I do have a custom 410 page. Why does Apache not find it?

It looks like the host in mid May made some server reconfigurations causing this new problem.

whitespace

4:27 pm on Aug 28, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



the user (although a bot in this case) may well see, what looks like, a system generated 410 error page - but this will be served with a 500 status with an additional note along the lines of: "Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request."


CORRECTION: Actually, this does appear to be served with a 410 status (not a 500, as I stated) in the response sent to the client. However, only the 500 status is recorded in the access log (no mention of a 410). (I guess I was only looking at the access log when I tested that yesterday!?)

This "system generated" 410 page is different from the default 410 Apache error page.

whitespace

5:06 pm on Aug 28, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month




Why cant You try

RewriteCond %{HTTP_REFERER} (seo|pizza) [NC]
RewriteRule .? /errorfolder/410.php [L]


It's worth a shot. Although why something like this would work and the former [G] method does not would tend to fuel the mystery.

However, for this to work in .htaccess (to not generate a rewrite loop) you would need to either remove the slash prefix on the substitution (make it relative), or include an exception to avoid rewriting the error document itself.

The other potential caveat is that the OP appears to be using plain HTML - there's been no mention of PHP (or any server-side scripting for that matter)?
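The exception variant mentioned above could be sketched like this. The script path /errorfolder/410.php is blend27's hypothetical example, and the script would have to send the 410 status itself.

```apache
# Sketch only: /errorfolder/410.php is a hypothetical script that must set
# the 410 status on its own; "seo" and "pizza" are example referrer strings.
RewriteCond %{HTTP_REFERER} (seo|pizza) [NC]
# Exception: do not rewrite again once the request already targets the script.
RewriteCond %{REQUEST_URI} !^/errorfolder/410\.php$
RewriteRule .? /errorfolder/410.php [L]
```

After the first rewrite the request URI is the script itself, so the second condition fails on the next pass and the loop is avoided.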

whitespace

6:01 pm on Aug 28, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Yes, that is what happens when the rule is simply .* - [G]. I have seen it all on the screen myself when testing. But why? If Apache has generated its standard 410 page, why does it additionally look for a custom 410 ErrorDocument? If there is none a 500 will result.


It's really the other way round. It doesn't "additionally" look for a custom 410 ErrorDocument. It generates its own 410 page when it has already failed to serve the custom 410 ErrorDocument. It fails because when it requests the custom 410 ErrorDocument, that directive tells it to serve the 410 ErrorDocument... again and again. It's like a recursive call without a terminating condition. Only when it has failed 10 times (the default) does it generate its own 410 page.

As mentioned earlier, the custom error documents themselves must be accessible. ie. You must not rewrite the request when a custom ErrorDocument is requested, otherwise you get this "loop".

But I do have a custom 410 page. Why does Apache not find it?


It's not really a case of "not finding it". The rule in .htaccess blocks it, but then requests it again (the [G] flag)... rinse and repeat.

IMPORTANT: However, if you don't have a custom ErrorDocument defined and are instead just relying on the default built-in Apache error message, then you don't get a rewrite loop (and no 500 error). You simply get the default 410 page returned with a 410 status and 410 is recorded in the access log.

It looks like the host in mid May made some server reconfigurations causing this new problem.


This sounds plausible... if the host declared a custom ErrorDocument in the server config then a lone RewriteRule .* - [G] directive in your .htaccess file could potentially start generating these rewrite loops and 500 errors recorded in your access log. (Wasn't this covered in your earlier thread?) However, you can override this by declaring your own ErrorDocument, or even reverting to the default:


ErrorDocument 410 default


However, this does not seem to explain the initial behaviour you are seeing. (A supposed 200 OK response when an ErrorDocument is served.)

----

Just to clarify... there are 3 different error responses we are dealing with here...

1. The "default" Apache error response. Such as when a 410 is triggered and there is no custom error document defined.
2. A "system generated" (for want of a better term) Apache error response. Such as when a 410 is triggered, but an additional error occurred trying to serve the "custom" error document. This notably contains the text, "Additionally, a 500 Internal Server Error error was encountered....".
3. The "custom" error document as defined by the ErrorDocument directive. Such as when a 410 is triggered and there are no further (config) problems.

thord

6:06 am on Aug 29, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



if you don't have a custom ErrorDocument defined and are instead just relying on the default built-in Apache error message, then you don't get a rewrite loop (and no 500 error). You simply get the default 410 page returned with a 410 status and 410 is recorded in the access log.

That is how it was for my site before mid May. Everything was fine. In mid May the 500s started to appear.

if the host declared a custom ErrorDocument in the server config then a lone RewriteRule .* - [G] directive in your .htaccess file could potentially start generating these rewrite loops and 500 errors recorded in your access log.

In my previous thread I presented the only explanation I have gotten from the host (verbatim translation):
"RewriteRule .* - [G] makes the server look also within the .htaccess area for a custom 410 error document [which I did not have at that time], in doing which another 410 is caused as well as a loop resulting in a 500 error in the log".

They suggested I use the following rule:
RewriteRule foobar.html - [G]
which will produce a 404 in the log, they said. If I next define
RewriteRule foobar.html - [G]
ErrorDocument 410 /410.html
a normal 410 will be logged, they said.

I thought that was preposterous and that extra 404 undesired and interfering with analysis, so I started to use RewriteRule !^gone\.html - [G] and informed the host, but they did not comment. I am unable to understand why that works fine for blocking by IP and by UA but not for blocking by referrer.

I have now removed my custom 410 error page and am testing (for all three blocking sections) RewriteRule .* - [G] in combination with declaring ErrorDocument 410 default.

thord

7:55 am on Aug 30, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



I have now for 24 hours been testing this setup in the .htaccess file:

Reverted, for all blockings (by IP, UA, referrer), to my original RewriteRule .* - [G], i.e. the one that in mid May started to produce 500 errors.

Removed my custom 410 page from the server and replaced its declaration with ErrorDocument 410 default as whitespace said I could do. (Before mid May I had no 410 declaration at all and had been relying on the built-in Apache error message, and that functioned perfectly.)

So far everything has worked fine! No 500s, and of course no code 200 hits on an error document. The only thing novel in the .htaccess is the declaration ErrorDocument 410 default. Who can explain this?
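The setup described in this post might be reconstructed roughly as below. It is a sketch, not the poster's actual file: "seo" stands in for the real referrer patterns, and the IP and UA sections are left out.

```apache
# Sketch of the setup described above; "seo" is a placeholder.
# Resets any 410 document declared in the server config to Apache's built-in page.
ErrorDocument 410 default

RewriteEngine On
RewriteCond %{HTTP_REFERER} seo [NC]
RewriteRule .* - [G]
```

With no custom document left to fetch, the catch-all pattern no longer has an error document to block, which would fit the absence of 500s reported here.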

whitespace

9:16 pm on Aug 30, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



In my previous thread I presented the only explanation I have gotten from the host (verbatim translation):
"RewriteRule .* - [G] makes the server look also within the .htaccess area for a custom 410 error document [which I did not have at that time], in doing which another 410 is caused as well as a loop resulting in a 500 error in the log".


I think we're probably covering old ground, but anyway... it would only look for a "custom 410 error document" if one had been declared somewhere (with an ErrorDocument directive) - perhaps in the server config. ("the .htaccess area" is an odd/incorrect phrase - unless that is just a fault of the translation?)

They suggested I use the following rule:
RewriteRule foobar.html - [G]
which will produce a 404 in the log, they said.


It would only do this if they had configured this in the server config. One way to do this is to define a custom 410 ErrorDocument and within that document set a different HTTP response header with a 404 status. (This does seem silly. Sometimes you might want to override a 404 and serve a 410 instead, but not the other way round?)

What does this error document look like?

These examples wouldn't result in a rewrite loop because they only block one file ("foobar.html"). But when you start diverting all files that match some other criteria, then you can potentially run into problems unless there is some kind of exception configured as well.

If I next define
RewriteRule foobar.html - [G]
ErrorDocument 410 /410.html
a normal 410 will be logged, they said.


By setting your own 410 ErrorDocument you would override any that were declared earlier. Further evidence that the host has declared an ErrorDocument in the server config.

I am unable to understand why that works fine for blocking by IP and by UA but not for blocking by referrer.


Yes, that is still a "mystery". Please feel free to PM me a link to your actual .htaccess file. I'm curious. Since you are able to change the .htaccess code to change the result it rather discounts the possibility that another server module (such as mod_security) is intercepting the request.

...replaced its declaration with ErrorDocument 410 default ... So far everything has worked fine! No 500s, and of course no code 200 hits on an error document.


"ErrorDocument 410 default" again overrides any that are declared earlier and resets it to the Apache default (further evidence). "No 500s, and of course no code 200 hits" - and presumably a bunch of 410s?

Who can explain this?


In mid May, your host changed the server config and declared a custom ErrorDocument. However, the "mystery" (above) suggests an error with your .htaccess file.

thord

4:47 pm on Sep 6, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



One week ago I removed my custom 410 page, replaced the declaration with ErrorDocument 410 default and reverted to the original RewriteRule .* - [G] in all three blocking sections. Three days ago I further changed that rule to the more server friendly (^|/|\.html)$ - [G] as recommended by lucy24. No other changes have recently been made to my .htaccess file.

I can now report that my .htaccess is finally working exactly as it should, although it has proved impossible to explain what was wrong and where. I want to say thank you to whitespace, as well as the other members who have taken part in this lengthy but instructive discussion.
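For future readers, the final arrangement reported in this post could be sketched as follows. Only the referrer section is shown and "seo" is a placeholder. The comments on the pattern are an interpretation of why lucy24's version is described as more server friendly.

```apache
# Sketch only; "seo" is a placeholder for the real referrer patterns.
ErrorDocument 410 default

RewriteEngine On
RewriteCond %{HTTP_REFERER} seo [NC]
# Matches the site root (an empty path in a root .htaccess), directory URLs
# ending in "/", and pages ending in ".html"; other requests skip the rule.
RewriteRule (^|/|\.html)$ - [G]
```

Since mod_rewrite tests the RewriteRule pattern before evaluating the conditions, requests for images and other supporting files never reach the referrer check under this pattern.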