
Forum Moderators: Ocean10000 & phranque


Is this a proper rule for 410 Gone?

And an update to a previous thread

     
7:33 am on Jul 31, 2016 (gmt 0)

New User

10+ Year Member

joined:Jan 7, 2007
posts:39
votes: 1


After a long delay a person at my hosting company replied to my ticket regarding the problem described in thread [webmasterworld.com...]

Their explanation was brief. My RewriteRule .* - [G] makes the server look for a custom 410 error document (which I did not have) causing a loop (that is what they called it) resulting in a (single) 500 error in my log. As everything functioned perfectly before mid May I suspect they have made some server reconfiguration or update affecting .htaccess, but they did not comment on that.

So I made a custom 410 error document (410.html) and changed the rule to RewriteRule !^410\.html - [G] This seems to usually work and produces 410s in my log, but is this really a proper solution? (Just adding 410.html but keeping RewriteRule .* - [G] did not help.)

In addition it does not always function for referrer spam.
RewriteCond %{HTTP_REFERER} pizza\-xxxxx\.xxx
RewriteRule !^410\.html - [G]
may result in:
"GET / HTTP/1.1" 301 242 "http://pizza-
and then
"GET /410.html HTTP/1.1" 200 294 "http://pizza-
I.e. a successful request by pizza-xxxxx for 410.html is logged.

Neither does it always work for IPs blocked by {REMOTE_ADDR}
"GET /robots.txt HTTP/1.1" 301
"GET /410.html HTTP/1.1" 200

Those 301s are probably due to my www/non-www canonicalisation or image hotlinking protection. But they are at the end of the .htaccess file.
8:51 am on July 31, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


The [G] flag forces Apache to return a 410 Gone status with the response. This indicates that a resource used to be available, but is no longer available.

If the file has never existed, IMO it is better to not use any code and just let Apache return a 404.

Nothing in Apache says you need a custom 410 document. It is your server config (the hosting company) that is set up that way.

The examples you gave show it *is* working.
1:40 pm on July 31, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5507
votes: 5


The most obvious question about your dilemma, which I don't see anybody asking in the previous thread or this one:

Are your error docs defined?
5:36 pm on July 31, 2016 (gmt 0)

New User

10+ Year Member

joined:Jan 7, 2007
posts:39
votes: 1


ErrorDocument 404 /notfound.html
ErrorDocument 410 /410.html

Must these lines be put before the www/non-www canonicalisation and hotlinking protection? They are now at the very end of the .htaccess file. Requests that I want to serve a [G] may be for html as well as jpg files.
7:24 pm on July 31, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24


My RewriteRule .* - [G] makes the server look for a custom 410 error document (which I did not have) causing a loop (that is what they called it) resulting in a (single) 500 error in my log.


Just to clarify one point... the "G" flag makes the server respond with a 410 status (as keyplyr states). It does not necessarily make the server look for a custom 410 error document, unless this is explicitly stated with an "ErrorDocument 410 ..." directive - that defines the custom error document. Only if the ErrorDocument directive is present and pointing to a non-existent file could a "G" flag result in a 500 error - but this isn't a "loop". The error log should have more detail as to the nature of the 500 error.

However, if you did have a custom 410 error document defined (although you say you didn't), then a directive such as the one above (RewriteRule .* - [G]) would indeed result in a "loop" (and possibly a 500 error), since a request for the 410.html document itself would trigger another 410, and so on. You would then need an exception, such as the one you have included in your later code blocks.
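
A minimal sketch of such an exception, assuming the error document is /410.html as in the posts above and reusing the redacted referrer pattern from the original question. It is an illustration of the idea, not a tested drop-in for any particular host:

```apache
# Assumption: the custom error page lives at /410.html in the site root.
ErrorDocument 410 /410.html

RewriteEngine On

# Let the error page itself through untouched, so that the internal
# request for it is not answered with another 410.
RewriteRule ^410\.html$ - [L]

# Condition copied from the thread (redacted name); substitute
# whatever should actually be blocked.
RewriteCond %{HTTP_REFERER} pizza-xxxxx\.xxx [NC]
RewriteRule ^ - [G]
```

Putting the exemption in its own rule, rather than negating the pattern in every [G] rule, means later blocking rules do not each have to remember the exception.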
8:13 pm on July 31, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15896
votes: 876


Must these lines be put before the www/non-www canonicalisation and hotlinking protection?

It doesn't matter at all in any way whatsoever, because different mods are involved. (ErrorDocument is technically not a mod at all; it's core. But the principle is the same.)

Ordering of directives only matters within any one mod. Otherwise, put things in whatever order is simplest and most convenient for you. For most people, that means putting all your one-liners--such as ErrorDocument directives--at the very beginning. Keep all directives for any one module grouped together, because anything else leads to madness. Madness for you, that is. The server doesn't care.
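
As a rough illustration of that grouping, here is a sketch of an .htaccess layout. The ErrorDocument lines are the ones quoted earlier in the thread; the mod_expires lines are a hypothetical stand-in for "some other module" and are not from anyone's actual file:

```apache
# Core one-liners first. Their position is for the reader's benefit,
# not the server's.
ErrorDocument 404 /notfound.html
ErrorDocument 410 /410.html

# mod_rewrite: keep every RewriteCond/RewriteRule together.
# Order matters inside this group only.
RewriteEngine On
# ... all RewriteRules go here ...

# A different module (hypothetical example): can sit above or below
# the mod_rewrite group without changing the outcome.
ExpiresActive On
ExpiresByType image/png "access plus 1 month"
```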
7:11 am on Aug 1, 2016 (gmt 0)

New User

10+ Year Member

joined:Jan 7, 2007
posts:39
votes: 1


Just to clarify one point... the "G" flag makes the server respond with a 410 status (as keyplyr states). It does not necessarily make the server look for a custom 410 error document,

Exactly. That is how I think too. But the web host's (shared hosting...) customer service person claimed**: "RewriteRule .* - [G] makes the server look also within the .htaccess area for a custom 410 error document [which I did not have at that time], in doing which another 410 is caused as well as a loop resulting in a 500 error in the log". **Verbatim translation.

Only if the ErrorDocument directive is present and pointing to a non-existent file could a "G" flag result in a 500 error - but this isn't a "loop".

Exactly. That all makes total sense to me.

However, if you did have a custom 410 error document defined (although you say you didn't)

I did not have one defined at that time. The rule was simply .* - [G] , and that did work fine during the previous 13 years.

The error log should have more detail as to the nature of the 500 error.

Here is a sample from the time I used .* - [G] :
*.243.55.130 - - [21/Jun/2016:00:21:07 +0300] "GET *.shtml HTTP/1.1" 500 - "-" "Mozilla/5.0 (compatible; *blocked UA*)"
No more hits from *.243.55.130 or *blocked UA* on the (immediately) previous or following lines. That was, however, the server log. I do not now have an error log 500 sample as only the current month is shown in cPanel's error log, which does not look helpful anyway.


I also find my present (principally working) RewriteRule !^410\.html - [G] awkward and would rather use the more server friendly RewriteRule (^|/|\.html)$ - [G] as kindly suggested by lucy24.

Still, the !^410\.html - [G] does not work for {HTTP_REFERER}, as I said above. I suspect that a spammer's request for example.com first gets redirected to www.example.com, hence the 301 in the log. (If the undesired request is for an image, it gets redirected to a blank png.) But then the next line in the log registers a successful request for 410.html. There are no more lines caused by this spammer, just those two. But, as lucy24 said, it does not matter where in the file I put my ErrorDocuments.

That code 200 thwarts the very purpose of referrer blocking: eliminating spam from the statistics. In blocking by IP or UA no hits on 410.html are registered in the log exactly in the way things should be. (Except for that strange example above "GET /robots.txt HTTP/1.1" 301; "GET /410.html HTTP/1.1" 200 perhaps caused by the www redirect.)
8:10 pm on Aug 1, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5507
votes: 5


In addition it does not always function for referrer spam.
RewriteCond %{HTTP_REFERER} pizza\-xxxxx\.xxx
RewriteRule !^410\.html - [G]
may result in:
"GET / HTTP/1.1" 301 242 "http://pizza-
and then
"GET /410.html HTTP/1.1" 200 294 "http://pizza-
I.e. a successful request by pizza-xxxxx for 410.html is logged.

Neither does it always work for IPs blocked by {REMOTE_ADDR}
"GET /robots.txt HTTP/1.1" 301
"GET /410.html HTTP/1.1" 200


You're using these in a non-traditional way.
Try moving a few lower in your htaccess, that is below the domain canonicalisation.
8:25 pm on Aug 1, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


You're using these in a non-traditional way.
Try moving a few lower in your htaccess, that is below the domain canonicalisation.
wilderness, I believe thord is just giving examples here.

Also, as lucy24 says:
Ordering of directives only matters within any one mod. Otherwise, put things in whatever order is simplest and most convenient for you.


The 301s are, as thord discovered, from the first request to the non-www* host, which gets forwarded (301) to the www and then gets the 410 server response. The examples show the code *is* working properly.

*This is often the case with vertical bots (those that work from a list rather than follow web links). If the domain is in fact set up for non-www, the bot need not make a second request.
9:14 pm on Aug 1, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5507
votes: 5


FWIW, lucy was referring to his question regarding the location of ErrorDoc definition.

There are many examples in this forum's archives (even by lucy) of the order, and 410's are always after canonicalisation.

He's not interested in the final result anyway, rather just what the 410 shows up as in his stats.
9:14 pm on Aug 2, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15896
votes: 876


410's are always after canonicalisation

Jeepers, wilderness, I was about to say the opposite: the overall order should be
[F]
[G]
[R] of which the last two are typically index.html and domain-name canonicalization, in that order
[L]
unless there's a specific reason for putting one rule in a different place. (For instance, if you're serving a manual 404 or 410 to select robots, that would go with the [F] rules; if you're rewriting to a script that is to issue a redirect, that would go with the [R] rules.)

The most important of those specific reasons is
RewriteRule ^{all-your-named-error-documents} - [L]

which has to go before all other RewriteRules-- especially the ones with [F] flag.
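
The ordering described above might look roughly like this in practice. This is only a sketch: the error document names come from the ErrorDocument lines earlier in the thread, www.example.com is the placeholder host already used in the thread, and the user-agent string, the old-section/ path and show.php are hypothetical names invented for illustration.

```apache
RewriteEngine On

# 0. Exempt the named error documents before anything else.
RewriteRule ^(notfound|410)\.html$ - [L]

# 1. [F] rules: access denial (hypothetical user-agent string).
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule ^ - [F]

# 2. [G] rules: content that is gone (hypothetical path).
RewriteRule ^old-section/ - [G]

# 3. [R] rules: index.html first, then host-name canonicalisation.
#    THE_REQUEST is checked so that only explicit client requests for
#    index.html are redirected, not the server's own directory-index lookup.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/(.*/)?index\.html[\s?]
RewriteRule ^(.*/)?index\.html$ http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 4. [L] rules: internal rewrites (hypothetical script name).
RewriteRule ^article/([0-9]+)$ /show.php?id=$1 [L]
```

With the blocking rules ahead of the redirects, an unwanted request is refused on its first hit instead of being sent a 301 to the canonical host first.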

I agree that if neither you nor the server has an ErrorDocument directive for a given category of error, there would be no reason for an infinite loop.

:: detour to look something up ::

Yes. The Apache default is "output a simple hardcoded error message". If you want an ErrorDocument, you have to say so. (Many hosts have standard names like "missing.html" or "forbidden.html", but not for every imaginable category of error!)
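
For comparison, a sketch of the options for a 410 response, assuming Apache 2.x syntax. The message wording is an arbitrary example; /410.html is the name used earlier in the thread.

```apache
# Option 1: no ErrorDocument line at all.
# Apache sends its built-in hardcoded 410 page.

# Option 2: inline text. The quoted string is sent as the response
# body, so no separate file is requested.
ErrorDocument 410 "This page has been removed."

# Option 3: a local document. The file must exist and must not be
# caught by the blocking RewriteRules.
# ErrorDocument 410 /410.html
```

Only one ErrorDocument line per status code should be active at a time, which is why the third option is commented out here.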