Forum Moderators: phranque

Trying to find a hidden redirect

     
3:12 pm on Jun 22, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 12, 2006
posts:2637
votes: 89


Hello. I've got a little annoying problem that I'm hoping someone can help me with.

I ran my site through SEMrush the other day and it came up with a few redirect errors on my homepage.

the homepage contains some links to other pages on my website that seem to work perfectly fine when you click on them, but SEMrush is insisting that they're a "redirect loop" which "never resolve to their destination".
they do -- they work fine when you click on them.
but they also say that "if you can’t spot a redirect chain with your browser, then your website probably responds to crawlers’ and browsers’ requests differently, and you still need to fix the issue."

I'm totally stumped. The only redirects i've set up for those pages are in the .htaccess file, sending the non-www version to the www version, and the http version to https -- but the links are all
https://www
anyway, so I don't think it has got anything to do with that.

(here's the code in my htaccess file anyway)
RewriteEngine on
# send people from non-www to www
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
# send people from http to https
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

I suppose I'm just wondering whether anyone knows of a tool which will display each URL in the loop, so I can see the chain.
3:51 pm on June 22, 2017 (gmt 0)

Senior Member from NL 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:1498
votes: 198


You can use the Chrome developer tools to see the chain. Open the Network tab and check "Preserve log", then browse to the initial URL.

Or you could use a web-based tool like Webpagetest.org for a similar result.
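
If you'd rather see the chain from a script, here is a minimal Python sketch that requests a URL without following redirects and records each hop. It is only an illustration: `fetch_head` and `trace_redirects` are made-up helper names, the default User-Agent string is a placeholder, and HEAD requests are an assumption (some servers answer HEAD differently from GET).

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_head(url, user_agent="Mozilla/5.0 (redirect-trace sketch)"):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_head, max_hops=10):
    """Return a list of (url, status) hops.

    Stops on a non-redirect response, on a URL already visited
    (recorded with the status "LOOP"), or after max_hops requests.
    """
    hops, seen = [], set()
    while len(hops) < max_hops:
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        seen.add(url)
        url = urljoin(url, location)  # Location may be relative
        if url in seen:
            hops.append((url, "LOOP"))
            break
    return hops
```

Calling `trace_redirects("https://www.example.com/some-page")` returns the list of hops. To check whether the server answers a crawler differently from a browser, wrap `fetch_head` with a different `user_agent` value (for example via `functools.partial`) and compare the two results.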
4:27 pm on June 22, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3451
votes: 181


Most of us are using one rule to cover both of your rules there, and the [L] flag instructs the server to stop processing on that request. That first rule would even rewrite requests for "www.example.com" and the [NC] flag makes the server test for all variations - think of eXample.com, exAmple.com, ExaMple.com, etc., etc. - you get the idea. If you know for certain that there may be incoming links for some case other than 'example.com' it would be simpler to add a line for that known variation. Using %{HTTP_HOST} in the Rule target will simply use whatever form was requested. It can be used as a condition, but is best avoided in the target.

Something like this combines the conditions into one rule:
RewriteCond %{SERVER_PORT} =80 [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
And if you expect different cased requests, you can add a line for that with the [OR] flag.

If you want to test what is happening to a request, you can use various header tools. I like the FF Extension 'HTTP Live Headers' for that task, but there are many others. Examining the headers lets you see each request and response between the browser and the server during the loading of a page. They can be very long conversations as each image and script and cookie gets a paragraph or so. Have patience and preferably record it to a .txt file so you can examine it with care.
4:53 pm on June 22, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14028
votes: 521


Edit: Darn, not2easy, you keep typing faster than me.


Incidentally, you can combine those two redirects into one uber-canonicalization redirect:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]

Some sources recommend !on instead of off, though I haven't seen the explanation. There's also been some recent discussion about [NC] in the www part of the rule; it's probably only an issue if your preferred hostname is not all-lower-case.
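
To sanity-check which requests that combined rule catches, here is a rough Python stand-in for the two OR'ed conditions. It is only a sketch: Python's `re` module is not Apache's regex engine, `needs_redirect` is a made-up helper name, and the host is assumed to arrive exactly as the client sent it (no port suffix).

```python
import re

# Same pattern as the RewriteCond: the exact canonical host, or an empty
# Host header. Without [NC] the comparison is case-sensitive.
CANONICAL_HOST = re.compile(r"^(www\.example\.com)?$")


def needs_redirect(host, https_on):
    """Mirror the rule's conditions.

    True when the host is NOT canonical (the '!' negation), OR when
    HTTPS is off.
    """
    host_is_wrong = CANONICAL_HOST.search(host) is None
    return host_is_wrong or not https_on
```

With this stand-in, `needs_redirect("example.com", True)` and `needs_redirect("www.example.com", False)` are both true, while the canonical host over HTTPS is left alone. A mixed-case host such as "WWW.Example.com" also counts as non-canonical here, so it would be redirected to the lower-case form rather than slipping through.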

if you can’t spot a redirect chain with your browser, then your website probably responds to crawlers’ and browsers’ requests differently, and you still need to fix the issue

This strikes me as gibberish. Each request is an independent entity, whether it comes from a browser or a robot; your server doesn't know what happened on the previous request, a millisecond earlier. That's assuming they're not accusing you of some form of cloaking (intentionally sending different responses to different visitors).

Now, some rare robots will take a redirected request and make that into the referer of the fresh request, which may create problems. (It's also flat-out wrong, because a redirect is supposed to keep the original referer, whatever it was, but try telling them that.) For example: if they request example.com/pagename and get redirected to www.example.com/pagename, that second request will give example.com/pagename as referer. Your access logs will tell you if this is going on. In rare cases you'll have to poke holes so a robot can bypass rules that were intended to block auto-referers.

Log check suggests that the SemrushBot doesn't engage in this particular behavior, but it was worth checking. They do seem to be very fond of requesting /directory without trailing slash, so make sure you've got a redirect in place for any URL that doesn't correspond to a real, physical directory.
7:04 pm on June 22, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3451
votes: 181


Edit: Darn, not2easy, you keep typing faster than me.
Just started earlier.

Besides, yours are usually more thorough and useful. I would not have thought to check what Semrush does.
7:56 pm on June 22, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 12, 2006
posts:2637
votes: 89


thanks for the htaccess code tips, i've changed all that over.
can't see what the problem is with the redirect though -- i've tried robzilla's suggestions of looking in Chrome developer tools and Webpagetest.org but neither of them show any redirects at all (although they did show up how slow my supposedly asynchronous adsense is -- it stops everything else loading for nearly 2 seconds! ...but that's a whole other problem i can put off for another day)
i think I'm just going to ignore the SEMrush error, because nothing else is seeing the same problem.
cheers for the help
9:20 pm on June 22, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3451
votes: 181


I believe that the redirect problem is this:
# send people from http to https
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

The use of %{HTTP_HOST} in the target URL puts in whatever was originally requested. If a person requested "http://example.com/page.html", the second rule on its own would redirect to "https://example.com/page.html", still without the www. Since both of those rules use the [L] flag, which tells the server to stop additional processing after processing the request, the second Rule might not have ever been used. The first rule's condition includes only the domain while the second rule includes the original (possibly non-www) request. I'm guessing that the first rule protected you from the effects of the second rule.

This may not apply in this case, but I see it a lot so I'll mention it here. The asynchronous AdSense script should be just before the </head> tag to work best. They give you all the code in one chunk, but the part between the <script> tags belongs in the head.

I can also say that even under perfect conditions, it can take longer to load than it should, particularly when "something" on a page has been changed.
10:47 pm on June 22, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 12, 2006
posts:2637
votes: 89


Alright, cool. I've changed it now anyway. I think the reason I had it like that in the first place was because I was using the same htaccess file on my localhost test server and the real server, and I thought that would work nicely for both... shows how expert I am with htaccess. A little knowledge is a dangerous thing

I can only run the SEMrush test once a month (free version), so i'll see if the changes fix it
6:24 pm on Aug 21, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 12, 2006
posts:2637
votes: 89


sorry to re-open this old thread, but i've just discovered that the new .htaccess code i've started using above has stopped redirecting example/index.php to example/

(I must have deleted the bit of code that did it, and now I can't find it)

so far the combined code redirects
http to https
non-www to www
and seems to add on a trailing slash to directories without one (although if I'm perfectly honest I can't work out how!)
so I just need it to redirect example/index.php to example/ and then I can retire a happy man

RewriteEngine on
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
6:52 pm on Aug 21, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10980
votes: 84


try this:

RewriteEngine on

RewriteRule ^index\.php$ https://www.example.com/ [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]


seems to add on a trailing slash to directories without one (although if I'm perfectly honest I can't work out how!)

https://httpd.apache.org/docs/current/mod/mod_dir.html#directoryslash
The DirectorySlash directive determines whether mod_dir should fixup URLs pointing to a directory or not.

Typically if a user requests a resource without a trailing slash, which points to a directory, mod_dir redirects him to the same resource, but with trailing slash for some good reasons...
7:24 pm on Aug 21, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14028
votes: 521


the new .htaccess code i've started using above has stopped redirecting example/index.php to example/

Your original redirect, involving www and https, would never have done that anyway. At most, it would redirect
http://example.com/index.php
to
https://www.example.com/index.php
which would then require a separate rule to redirect to
https://www.example.com/

On almost all sites, your last two redirects (where "last" means the physical order of RewriteRules) are
#2 redirect all /directory/index.html (or index.php or whatever you use) to /directory/ not just for the root but for all directories
#1 fix the domain name, including both www and protocol.
Rule #2 should have an explicit target: https://www.example.com/rest-of-path --i.e. none of that {HTTP_HOST} business--so no request would ever invoke both #1 and #2.
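
As a rough illustration of what the index redirect has to do for every directory and not just the root, here is a Python stand-in for the path mapping. It is not mod_rewrite code: `index_redirect` is a hypothetical helper, the path is assumed to have no leading slash (as a RewriteRule pattern in .htaccess sees it), and the list of index filenames is an assumption. A real rule commonly also checks %{THE_REQUEST}, so that an index file served internally by mod_dir is not redirected again.

```python
import re

# Optional directory prefix ending in "/", then the index filename.
INDEX_FILE = re.compile(r"^(.*/)?index\.(?:php|html)$")


def index_redirect(path):
    """Return the canonical directory URL for an index-file request.

    Returns None when the path is not an index file and should be
    left alone.
    """
    match = INDEX_FILE.match(path)
    if match is None:
        return None
    directory = match.group(1) or ""  # empty for the site root
    return "https://www.example.com/" + directory
```

So "index.php" maps to the site root and "blog/index.php" maps to the /blog/ directory URL, while ordinary pages fall through to the canonicalization rule that comes after it.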

and seems to add on a trailing slash to directories without one (although if I'm perfectly honest I can't work out how!)

That's got nothing to do with mod_rewrite, assuming you're talking about real, physical directories. That's mod_dir doing its job. Its other job is to serve up the content of
/directory/index.php
when a visitor (correctly) requests
/directory/
alone.

Have you recently moved to a new server? I'm wondering if your overall DirectoryIndex settings have changed. This would be inherited from the config file, but can be changed in htaccess.


[edited by: not2easy at 1:33 am (utc) on Aug 22, 2017]
[edit reason] requested edit [/edit]