Forum Moderators: Ocean10000 & phranque

Tracking down internal redirect loop involving index.php and referer G

     
8:33 pm on Oct 17, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 44
votes: 2


Unless this is one of those easy everybody-but-me-already-knows-the-answer things, all I'm really hoping for here is some pointers to prior WebmasterWorld threads that might give me better clues to solving my problem, which is this.

Recently (or maybe for longer), and unpredictably, visitors coming mostly from Google Images have been receiving 500 server errors when trying to access images and the pages containing them on my site. The errors appear to me to be infinite redirect loops involving index.php.

I've tried to follow in their footsteps myself by locating the resource they were seeking in Google Images and seeing what happened to me. What I got when clicking on the image in GI was my default 403:

Forbidden

You don't have permission to access /index.php on this server.

Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request.



My raw access and error logs, respectively, show this:

Access.log (redacted):

IP - - [17/Oct/2018:12:35:54 -0700] "GET /YYYY/MM/DD/slug/ HTTP/1.1" 500 747 "https://www.google.com/" "user agent"

Error.log (redacted):

[Wed Oct 17 12:35:54 2018] [error] [client IP] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace., referer: [google.com...]

In other words, my 403 browser response corresponds to a 500 server error implicating an infinite redirect loop.

I went further and located the same page in the regular Google search engine and had no problems bringing up the URL from within Google Web search results, nor by simply pasting the URL into my browser.

The only references to index in my .htaccess file other than BrowserMatch user agent entries containing "index" are these:

- my caching plugin at the very beginning contains these lines referencing index.html

RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/all/$1/index.html -f [or]
RewriteCond /home/user/example.com/wp-content/cache/all/$1/index.html -f
RewriteRule ^(.*) "/wp-content/cache/all/$1/index.html" [L]


- my .htaccess file contains these two directives at the top

 # Define directory index
DirectoryIndex index.html index.php


 # Disable directory browsing
Options -Indexes


- and this boilerplate WordPress section at the end

# BEGIN WordPress
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress


The problem seems to me to be some conflict originating in Google Images that triggers an infinite redirect loop somehow centered around that most basic Options -Indexes directory-browsing block, but I can't for the life of me figure out how that would be.

Again, any hints appreciated.

(Note: for the next unpredictable hours I will be without electrical power while some work is being done, so if I don't respond it's because I'm temporarily shut down. I'll try if possible to answer any questions that may pop up before then.)
9:04 pm on Oct 17, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11874
votes: 245


If you have sufficient access to your server configuration, you could try logging mod_rewrite for clues.
9:12 pm on Oct 17, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 44
votes: 2


EDIT: No. I'm on a shared server and the only control I have is .htaccess.
9:57 pm on Oct 17, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15937
votes: 889


Crystal ball says that you have neglected to poke a hole for your custom error page, leading to an infinite loop:
request gets a 403 for whatever reason
server asks for error document to accompany the numerical response
this internal request is denied, on the same grounds as the initial 403
server asks for error document to go with this fresh 403
this internal request is...
...et cetera, until the server puts its foot down.

Make sure that every module that can issue a 403 has an exemption for your custom error document. In most cases this involves two things:
a <Files> envelope naming the document, containing an Allow from all (2.2) or a Require all granted (2.4) directive
and
RewriteRule ^forbidden\.html - [L]
(using, ahem, the actual name and URLpath of your 403 document). This rule goes before all other RewriteRules.
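
For illustration, the two exemptions might look something like this in .htaccess. It is only a sketch: forbidden.html is a placeholder for whatever the 403 document is really called, and the <Files> block uses the 2.2 form, with the 2.4 equivalent noted in a comment.

```apache
# Let everyone fetch the error document itself, so a 403 can always be served.
# Apache 2.2 syntax; on 2.4 replace the two lines inside with: Require all granted
<Files "forbidden.html">
Order allow,deny
Allow from all
</Files>

RewriteEngine On
# Placed before every other RewriteRule: leave requests for the error document alone.
RewriteRule ^forbidden\.html$ - [L]
```

If another module (mod_setenvif plus a Deny, for instance) can also issue a 403, it would need its own exemption for the same file.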
10:06 pm on Oct 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 15, 2003
posts:2645
votes: 7


Do you have access to the Apache vhost definition? It sounds like that is where the problem lies.
10:56 pm on Oct 17, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 44
votes: 2


Lucy24, I don't have a custom 403 page. My theme does have a custom 404.php page.

Demaestro, no I don't have access on my host's shared server to vhost. I did some quick Googling after Phranque's question and determined I would need such access to log mod_rewrite, hence my edit.

It seems there must be something obvious here involving index.php when provoked specifically from within Google images, probably including how WordPress works, but I just don't know enough to frame that conceptual perception.
11:21 pm on Oct 17, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15937
votes: 889


Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request.
This is the part that should never happen. The server is trying to bring up the ErrorDocument, and is being denied. Does your host have a default ErrorDocument name? They're often built-in, even if an individual site may choose not to use it; you'll find the name somewhere in your host's documentation. (That's why I used "forbidden.html" above.) Put in the suggested RewriteRule, using the host's default name, and see if there is any change in the response you get.

[Wed Oct 17 12:35:54 2018] [error] [client IP] Request exceeded the limit of 10 internal redirects due to probable configuration error.

:: detour to test site to check something ::

Oh, right. It isn't logged as an error until you hit the 10-internal-redirects mark, so there's no way to see what is happening in the meantime. (External redirects--the kind where the browser has to step in--are easy to trace because you see all the separate requests in access logs.) RewriteLogs--which can only be set in config, not in htaccess--are only useful if the loop is, in fact, the result of rewriting rather than some other activity.
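
For anyone who does have access to the server config, rewrite logging could be switched on roughly as follows. The log path is a made-up example, and the directives differ between 2.2 and 2.4; neither form is accepted in .htaccess.

```apache
# Apache 2.2: server config or <VirtualHost> only
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 3

# Apache 2.4: RewriteLog no longer exists; per-module LogLevel is used instead
# LogLevel alert rewrite:trace3
```

Switch it off again once the loop has been traced, since higher levels slow the server down considerably.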
11:39 pm on Oct 17, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 44
votes: 2


Lucy24, there's no 403, ErrorDocument, or forbidden anything anywhere in my shared server installation I can get to, nor have I invoked any textually from within my .htaccess file.

I was under the impression Apache had some sort of on board default 403 response, and I was assuming the text I quoted, which is all I have ever seen when I've tripped a 403 myself (and different from the equally boilerplate 500 text I periodically get when giving my .htaccess the vapors) was it.

This is the only 403 response I have ever seen on any of the shared Apache 2.2 servers I've occupied at this same host continuously since 2012.
11:08 am on Oct 18, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 44
votes: 2


I may have solved my immediate problem, though not the larger one of the redirect loop nor of the phantom error document Lucy24 highlighted.

Why were visitors referred by Google being forbidden? Because Google wasn't exempted in my anti-hotlinking code, it seems:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$ [OR]
RewriteCond %{HTTP_REFERER} !^http(s)?://(.*)?\.example\.com
RewriteRule \.(bmp|gif|jpe?g|png)$ - [F]


Since I temporarily killed the code, the problem has abruptly stopped. Only correlation, of course, not causation, but I'll take it for now.

Which begs several questions. Have I simply been using disco era anti-hotlinking code while the rest of the world has moved on? What is the preferred form these days?

Should I simply welcome Google into yet one more aspect of my life, now as a trustee for my images?

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$ [OR]
RewriteCond %{HTTP_REFERER} !^http(s)?://(.*)?\.google\.com [OR]
RewriteCond %{HTTP_REFERER} !^http(s)?://(.*)?\.example\.com
RewriteRule \.(bmp|gif|jpe?g|png)$ - [F]


Tangential question: I have seen this same code boilerplate where other file types - .pdf, .zip, css, etc. - are included along with images. Is this good practice?

Those are my questions about the immediate anti-hotlinking situation. In the larger picture, the redirect loop 403/500 error problem seems connected to the way my host handles mod_rewrite on these Apache 2.2 shared servers: every time I've been able to replace mod_rewrite code with a SetEnvIf alternative, I then get proper 403s in my logs rather than the redirect loop 500s. Or, again, that has at least seemed to be the correlation to me.

So: immediate problem solved, and until anyone presents a more elegant solution I'm going to simply add a Google exemption into my anti-hotlinking rewrite block.

The larger "existential" question of why mod_rewrite seems to provoke redirect loop 500s rather than 403s I'll leave open. Surely I'm not the only one to experience this, and, again, any Webmasterworld links addressing these phenomena would be appreciated.

(A little less than an hour now, and I'm without power for the rest of the day.)
5:04 pm on Oct 18, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15937
votes: 889


!^http(s)?://(.*)?\.example\.com
Overkill. All you need to say is
\bexample\.com
or at most
^https://example\.com
If you are getting fake referers--very unlikely in this situation--the last thing you want to do is permit incorrect forms, like http when your site is https, or with-www when your site is without.

If you change from http to https, put in an optional s? for a few weeks and then make it non-optional.

If you allow search engines to index your images, you must poke holes for all of them in your anti-hotlinking code.

:: shuffling papers ::

RewriteCond %{HTTP_REFERER} !^http://example\.com/
RewriteCond %{HTTP_REFERER} .
<snip>
RewriteCond %{HTTP_REFERER} !\b(google|translate|bing|duckduckgo|ecosia|search\.yahoo|yandex)\b
ymmv, obviously, notably in the case of “translate”. The <snip> here refers to
#1 one group of images that are handled differently,
and #2 a couple of external sites that are allowed to hotlink, such as forums where I post an image.
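
Assembled into a complete block, that approach might look like the sketch below. The domain, the list of search engines and the file extensions are placeholders to adapt. Note that there is no [OR] between the conditions: all of them have to be true at once (referer present, not your own site, not a search engine) before the request is refused.

```apache
RewriteEngine On
# Referer is not empty...
RewriteCond %{HTTP_REFERER} .
# ...and is not the site itself (s? left optional, as during an http-to-https changeover)...
RewriteCond %{HTTP_REFERER} !^https?://example\.com/
# ...and does not name one of the search engines allowed to show the images
RewriteCond %{HTTP_REFERER} !\b(google|bing|duckduckgo|yandex)\b
RewriteRule \.(bmp|gif|jpe?g|png)$ - [F]
```

With [OR] between negated conditions, a request only has to satisfy one of them, and a non-empty referer from any site, including your own, already satisfies "referer is not empty".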

I was under the impression Apache had some sort of on board default 403 response
On shared hosting there will often be a meta-default where the config file specifies an ErrorDocument, and a corresponding <Files> in config makes sure everyone can see it. This means that error logs of sites without the specified custom 403 page will show two lines for every 403: one for “request denied” and a second for “file not found”. But no infinite loops, and the request receives the intended 403.
8:31 pm on Oct 18, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 44
votes: 2


Thanks, Lucy24.

I'm not sure I understand this, though:

On shared hosting there will often be a meta-default where the config file specifies an ErrorDocument, and a corresponding <Files> in config makes sure everyone can see it. This means that error logs of sites without the specified custom 403 page will show two lines for every 403: one for “request denied” and a second for “file not found”. But no infinite loops, and the request receives the intended 403.


It sounds like you're saying my host should be making me aware of the location of its meta-default error document in some way, but where should I be looking for it?

I just ran a test to see exactly what the basic forbidden response would be without the complications assumed above by adding a simple files deny to my .htaccess file

<files offlimits.txt>
order allow,deny
deny from all
</files>


and the response was basically identical to the one above

Forbidden

You don't have permission to access /offlimits.txt on this server.

Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request.


So what is generating this message? My server? My browser?

The associated (redacted) error log lines are

[Thu Oct 18 13:10:20 2018] [error] [client IP] client denied by server configuration: /home/user/example.com/offlimits.txt
[Thu Oct 18 13:10:20 2018] [error] [client IP] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.


IOW, another of those which shows as a 500 server error in the access log, even though it's a simple mod_authz deny, and obviously different from this (redacted) error line near it

[Thu Oct 18 13:12:51 2018] [error] [client IP] client denied by server configuration: /home/user/example.com/
[Thu Oct 18 13:12:51 2018] [error] [client IP] client denied by server configuration: /home/user/example.com/forbidden.html


Except there simply is no forbidden.html, hidden or unhidden, under example.com or anywhere else I can access.

I'm sorry if I seem stupid, but frankly I was expecting the latter response to my offlimits.txt test, not the former.
8:54 pm on Oct 18, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4562
votes: 364


Your WP basic snippet is missing part of the standard form:
 # BEGIN WordPress
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress


Normally it has this form:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress


I understand that it may appear that you can remove the <IfModule mod_rewrite.c> </IfModule> tags and it should work just fine, but in an experiment years ago I found that is not the case. It may make no sense, but see whether adding back the actual default makes a difference in your other issues. When I tried it the way you're using it, I learned not to treat it logically.

I'm not saying that this will magically resolve your error page issues, but my experience taught me to leave it alone or you may notice strange behavior.

9:16 pm on Oct 18, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 44
votes: 2


Hmmm...couldn't hurt.
11:17 pm on Oct 18, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15937
votes: 889


Listen to not2easy. She speaks fluent WordPress ;)

So what is generating this message? My server? My browser?
The server. In fact it's pretty inept of the server to be putting the part about the ErrorDocument out in the open, but that's what servers do. The internal redirects, on the other hand, do point to some kind of misconfiguration. If it were your own server you certainly wouldn't want it handling eleven internal requests (1 + 10) on every single 403, since that means it's doing ten times as much work as necessary, on just those requests where you don't want it doing any work at all.

Not all hosts have a default error document. If they do, it should be pretty easy to find in the documentation. If they don't, they don't. (But if so, why does the server think there's an ErrorDocument, as indicated by the error logs? not2easy, this isn't something in WP is it?)

In any case it may be worth it to make one of your own. (Tangent: A lot of experienced webmasters think there's no point in a nice 403 page, because nobody gets blocked except unwanted robots anyway. But it simply ain't so. For years and years before I had websites of my own, I thought of 404 as “no page” and 403 as “no directory” because that’s where I, as a human, would most often encounter a 403. Nothing malign about it.)
12:44 am on Oct 19, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 44
votes: 2


not2easy: implemented

Lucy24: Yep, that did it. It turns out, though, that my host has not fully enabled/has crippled the possibilities available according to Apache:

[httpd.apache.org ]

That is, a custom 500 error page or even a simple .htaccess ErrorDocument 500 directive is simply not allowed, and, moreover, its absence - i.e., host setup only - triggers the same syntax as the previous host 403 boilerplate: "Additionally, a 500 server error was encountered while trying to use an ErrorDocument to handle the request." C'est la host.

I was, however, able to put a simple text message highlighted with h1 html markup as one line in my .htaccess ErrorDocument 403, which both stopped the redirect loop and reduced the bandwidth hit to a frugal 573 bytes. I may get more elaborately accommodating in due course.
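
A one-line inline message of that kind might look something like this. The wording is invented for illustration; the actual text used was not posted.

```apache
# Inline text rather than a file, so there is no second request that could itself be denied
ErrorDocument 403 "<h1>403 Forbidden</h1>"
```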

Together with not getting skinned alive financially by the electrical trades, today goes in the win column.

Thanks to all who helped.