Forum Moderators: Robert Charlton & goodroi


Undoing a 301 redirect... WMT warns of possible infinite loop

         

LuckyLiz

7:48 pm on May 26, 2013 (gmt 0)

10+ Year Member



I just discovered (using the fetch as Google tool in webmaster tools) that a programmer set up something in our CMS so that if someone types in sitename/topic (no slash) there is a 301 redirect to sitename/topic/ (with a slash).
The fetch as Google result has a warning message that reads: "The page seems to redirect to itself. This may result in an infinite redirect loop."

I'm going to have the programmer take out that 301 redirect, but I'm wondering if at the same time the programmer should also remove a 301 that redirects from sitename.com/topic/index.htm to sitename.com/topic/
We have a canonical link tag on the pages.

Anyone have any thoughts on whether or not to remove the /index.htm redirect?

tedster

12:11 am on May 27, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In my opinion you should not remove either redirect. If you know you do not have an infinite loop in either case, a "warning" is just saying "Take a look", and not "There is a problem". In both these cases, Google used to have duplicate problems when the redirect was NOT in place. I wouldn't risk that, even today.

lucy24

2:18 am on May 27, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the page redirects to itself, any browser will let you know. A quick test should be enough to tell you if gwt is delirious. It would not be the first time.
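One way to run that quick test without waiting for a browser's error page is to request the URL once, without following redirects, and compare the Location header with the URL that was asked for. The following is only a sketch in Python using the standard library; the helper names `is_self_redirect` and `fetch_once` are made up for illustration, and the example.com URLs are placeholders.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_self_redirect(requested_url, status, location):
    """True if the response redirects to exactly the URL that was requested."""
    if status not in REDIRECT_CODES or not location:
        return False
    # A Location header may be relative, so resolve it against the request URL.
    return urljoin(requested_url, location) == requested_url


def fetch_once(url):
    """Make a single GET request without following redirects.

    Returns (status, Location header or None). Hypothetical helper; it does
    no retries and ignores the response body.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_once("http://www.example.com/topic")` and passing the result to `is_self_redirect` shows whether the no-slash URL points back at itself or at the slash form. A redirect from /topic to /topic/ is not a self-redirect, because the two URLs differ by the trailing slash.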

In this specific case: The name of the page either has a trailing slash or it does not. Pick one and stay with it. It really ought to be no-slash, so it doesn't look as if every one of your pages is the index page of a nonexistent directory, but for practical purposes it doesn't matter. Use the one that seems to make your CMS happier ;)

I'm wondering if at the same time the programmer should also remove a 301 that redirects from sitename.com/topic/index.htm to sitename.com/topic/

NO, NO, a thousand times no :) That one really does become Duplicate Content if you permit both forms. Keep the redirect. Your programmer can presumably be trusted to set it up so this one doesn't become an infinite loop.

LuckyLiz

2:50 am on May 27, 2013 (gmt 0)

10+ Year Member



What behavior would I see from the browser if the page did redirect to itself? Would the browser give me an error message or act strangely - like continually reloading the page?


The problem with the trailing slash isn't so much how we use it, but how people link to us. Sometimes they'll use a trailing slash when they link to a directory index page and sometimes they don't.


It really ought to be no-slash, so it doesn't look as if every one of your pages is the index page of a nonexistent directory


I don't know what you mean by nonexistent directory. We have things divided into different topics, each of which is set up in its own directory.

lucy24

5:33 am on May 27, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Nonexistent" = it doesn't physically exist. If it were a real, physical directory, the directory-slash redirect would be a non-issue because the server itself takes care of it. Inevitably some people will link to the wrong form of a URL, so you have to redirect them. Same as with people linking to "goodstuff/index.html" instead of "goodstuff/" alone.

It works in the other direction too. If you look way down at the bottom of the gwt list of places that link to you, you'll find pairs or even large sets that are obviously all the same page. Those are people who don't redirect. But even though they're listed multiple times, I kinda doubt that google is fool enough to count them as two or more separate links.


All current browsers recognize an infinite redirect and will nip it in the bud.

:: detour for business with test site to refresh memory ::

The error message from your browser will look like this:
Redirect Loop

Redirection limit for this URL exceeded. Unable to load the requested page. This may be caused by cookies that are blocked.

Camino has stopped trying to retrieve the requested item. The site is redirecting the request in a way that will never complete.

* Have you disabled or blocked cookies required by this site?
* NOTE: If accepting the site's cookies does not resolve the problem, it is likely a server configuration issue and not your computer.


Well, in this case I know it is a "server configuration issue" because I intentionally added the line

RewriteRule ^dunnykin/ http://www.example.com/dunnykin/ [R=301,L]


just so I could get the exact wording of the error message ;)

Site logs tell me the browser heroically tried ten consecutive times before giving up. This type of error has to come from the browser because the server doesn't know it's going on; each request is an island. (Conversely, an internal rewrite leading to an infinite loop will get you a 500 error from the server. This time it's the browser that doesn't know it's going on.)

The browser never gets as far as reloading the page, because every time it puts in a request it is told to ask for something else-- even if the "something else" is the exact same thing it has asked for ten times already.
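The give-up behavior described above can be modeled in a few lines. This is a toy simulation in Python, not real browser code: `follow_redirects`, `looping_server` and `healthy_server` are invented names, the "servers" are plain functions, and the limit of ten hops is taken from the log observation above (actual limits differ between browsers).

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(start_url, respond, max_hops=10):
    """Follow redirects the way a browser does, giving up after max_hops requests.

    respond(url) stands in for the server and returns (status, location).
    Returns (last_url, requested_urls, gave_up).
    """
    requested = []
    url = start_url
    for _ in range(max_hops):
        requested.append(url)
        status, location = respond(url)
        if status not in REDIRECT_CODES:
            return url, requested, False  # a real page: stop here
        url = location                    # told to ask for something else
    return url, requested, True           # limit reached: "Redirect Loop"


def looping_server(url):
    # Mimics the deliberately broken rule: everything under /dunnykin/,
    # including /dunnykin/ itself, is redirected to /dunnykin/.
    if url.startswith("http://www.example.com/dunnykin/"):
        return 301, "http://www.example.com/dunnykin/"
    return 200, None


def healthy_server(url):
    # A normal slash redirect: one hop, then a page.
    if url == "http://www.example.com/topic":
        return 301, "http://www.example.com/topic/"
    return 200, None
```

Each call to `respond` knows nothing about the previous ones, which is the "each request is an island" point: only the client, which keeps the list of requests, can notice that it is going in circles.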

If browsers did not do this, every badly coded www site would lead to people having to force-quit their browsers. HTML and CSS are both designed to be extremely forgiving.

LuckyLiz

5:26 pm on May 27, 2013 (gmt 0)

10+ Year Member



Thanks for the explanation! I wish I knew more about programming and how things worked (sigh).

The directories I was talking about in the example I gave are real directories. They'd be things like site.com/goodstuff/ and site.com/prettythings/ etc. So if I understand what you were saying, we didn't need a 301 from site.com/goodstuff to site.com/goodstuff/? What about the site.com/goodstuff/index.htm then? If it matters, it's an asp.net site that's using a CMS and some kind of url rewrite to make page names have real names instead of pageid numbers.

We also don't show up in the first few pages of Google for the title tag that is on any of those directory pages anymore. I don't know when that happened. I don't know if it's related to when the 301 redirects got put in or whether it is just a result of the Google change that's promoting big name companies over independent sites that used to rank well.

lucy24

8:13 pm on May 27, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The directory-slash redirect is automatic unless you have intentionally turned it off. So if you ask for
example.com/directory
and that's a real, physical directory, the server will step in and 301 redirect you to
example.com/directory/

Whoops! This is assuming Apache. I don't know whether That Other Server does the same thing. But your programming geek will know.

The "index.html" (or extension of your choice) only works in one direction. If someone asks for
www.example.com/directory/
the server will look in /directory/ and try to come up with a file called "index.html". User's address bar won't change. But if user asks for
www.example.com/directory/index.html
the server will proceed directly to the requested page unless you have an explicit redirect. (If you've done hanky-panky with content negotiation, i.e. MultiViews in mod_negotiation, the user may even get to the right place by requesting "/directory/index" and that's all. But let's not make trouble.)
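To make the two redirects concrete, here is a small Python model of the decision a server (or a CMS) makes for each requested path. It is a sketch under assumptions, not Apache's actual logic: `redirect_target`, `INDEX_NAMES` and the `real_dirs` set are illustrative names, and the directory list is supplied by hand rather than read from disk.

```python
import posixpath

# Index file names to fold into the bare directory URL (an assumption;
# use whatever names the site really serves).
INDEX_NAMES = ("index.html", "index.htm")


def redirect_target(path, real_dirs):
    """Return the path a request should be 301-redirected to, or None.

    real_dirs holds directory paths without the trailing slash,
    e.g. {"/goodstuff"}. The decision looks only at the path the
    visitor actually requested.
    """
    head, tail = posixpath.split(path)
    # /goodstuff/index.htm -> /goodstuff/
    if tail in INDEX_NAMES:
        return head if head.endswith("/") else head + "/"
    # /goodstuff -> /goodstuff/  (only for real directories)
    if path in real_dirs:
        return path + "/"
    return None
```

The property worth checking is that the target of any redirect is itself never redirected again: `/goodstuff/` returns `None`, so neither rule can loop. A loop appears when the rule is applied to the internally resolved file (`/goodstuff/index.htm`) rather than to what the visitor typed.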

a CMS and some kind of url rewrite to make page names have real names instead of pageid numbers

That makes all the difference in the world. If all your pages and directories are "really" just
example.com/index.php?long-complicated-query-here
then all those index redirects won't happen, because the server doesn't see the pages and directories involved.