Web crawl > URLs not followed > "Redirect error"

         

Marfola

11:18 am on Apr 6, 2009 (gmt 0)

10+ Year Member



Google Webmaster Tools is showing numerous "Redirect errors" for URLs that are properly 301 redirected.

Web crawl > URLs not followed > "Redirect error"
http://www.example.com/category/subcategory/page/

Note that:

1. The URLs shown above as redirect errors are neither on my site nor in my sitemap.

2. The URLs on my site and in my sitemap are all final target URLs (each returns a 200 response); not one of them redirects.

3. I do not have any chained redirects. Each 301 is a single redirect that goes straight to the final URL. I have tested every single URL (as well as every possible permutation) in Websniffer and get the desired header for each: a 301 permanent redirect (to the final URL) for redirected URLs, a 404 not found for pages that don't exist, and a 200 for my target URLs, i.e. all of the URLs in my sitemap and site.

4. There are no conflicts between my .htaccess and httpd.conf or any other underlying server configuration.
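
For reference, the "single redirect, no chain" claim in point 3 can be checked mechanically rather than by eye. The following is only a minimal sketch: the names `trace_redirects` and `is_single_301` are hypothetical, and the `fetch` argument is an assumed callable that returns a status code and Location header for a URL, so the same logic can run against a live server or against canned data.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Assumption: a fetcher takes a URL and returns (status_code, Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url: str, fetch: Fetcher, max_hops: int = 10) -> List[Tuple[str, int]]:
    """Follow Location headers one hop at a time.

    Returns the (url, status) pairs visited, stopping at the first
    non-redirect response, at a loop, or after max_hops requests.
    """
    hops: List[Tuple[str, int]] = []
    seen = set()
    while len(hops) < max_hops:
        status, location = fetch(url)
        hops.append((url, status))
        seen.add(url)
        if status not in REDIRECT_CODES or not location:
            break
        next_url = urljoin(url, location)  # tolerate relative Location values
        if next_url in seen:  # redirect loop
            break
        url = next_url
    return hops


def is_single_301(hops: List[Tuple[str, int]]) -> bool:
    """True when the trace is exactly one 301 followed directly by a 200."""
    return [status for _, status in hops] == [301, 200]
```

A trace of three or more entries means a chain; a trace whose last status is itself a redirect means a loop or a missing Location header.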

The post [google.com] suggests I'm not the only webmaster with this problem. Can someone please help me find an answer?

FYI: I've excluded the canonical link element as a solution to the problem. Per Matt Cutts' recent presentation [googlewebmastercentral.blogspot.com] the 'tag' is for sites unable to manage redirects at the source.

I'm posting this in WebmasterWorld as I have yet to receive an answer in Google Support Forum. (I've posted the question several times in the past 2 months).

jdMorgan

2:08 pm on Apr 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I have tested every single URL (as well as every possible permutation) in Websniffer and get the desired header for each.

It sounds like you've been very, very thorough, both in your 301 handling and in your testing. But since you also say that this problem has persisted for months (which wouldn't likely be the case if this were just a GoogleGlitch), I have to ask: Did you test your URLs with a Googlebot user-agent?

We've had several reports here at WebmasterWorld of .htaccess hacks that 'cloak' the attacked sites, serving alternate content and links to Googlebot and a few other major search engines' robots. While I'm only aware of .htaccess hacks, this malicious cloaking could also be accomplished by hacking a script on the victim site.

I'd suggest testing URLs that should and should not redirect and URLs that are and are not valid on your site, using a genuine GoogleBot User-Agent header string copied from your raw access log file. Alternately, a line-by-line review of all code in the actual .htaccess file(s) on your server, looking for search engine User-agent substrings (e.g. "Googlebot", "oglebo"), would be a good idea.
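
The line-by-line .htaccess review can be partly automated. This is a rough sketch only: `find_suspect_lines` is a hypothetical helper, the first two substrings come from the examples above, and `http_user_agent` is an added assumption (it flags any rule that branches on the User-Agent at all, which will include legitimate rules).

```python
def find_suspect_lines(htaccess_text, needles=("googlebot", "oglebo", "http_user_agent")):
    """Return (line_number, stripped_line, matched_needles) for every line
    of the given .htaccess text that contains one of the needles,
    compared case-insensitively."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        lowered = line.lower()
        matched = [needle for needle in needles if needle in lowered]
        if matched:
            hits.append((number, line.strip(), matched))
    return hits


# Illustrative input only; not taken from any real site.
SAMPLE_HTACCESS = """RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule ^(.*)$ http://other.example/$1 [R=301,L]
"""

for number, text, matched in find_suspect_lines(SAMPLE_HTACCESS):
    print(number, matched, text)
```

An empty result is not proof of a clean file: injected rules can be obfuscated with regex tricks or encoded strings that a plain substring search will miss, so a manual read is still worthwhile.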

Hopefully, this is just a rare long-lived GoogleGlitch, but you're right to take it seriously.

Jim

Marfola

11:50 am on Apr 7, 2009 (gmt 0)

10+ Year Member



jdmorgan,

Thanks for your comments.

I've tested my URLs with the User Agent Switcher Firefox plugin set to the Googlebot user-agent. They all serve the desired content.

I've reviewed my .htaccess file(s) - line by line - and did not find any search engine User-agent substrings (e.g. "Googlebot", "oglebo").

> I'd suggest testing URLs that should and should not redirect and URLs that are and are not valid on your site, using a genuine GoogleBot User-Agent header string copied from your raw access log file.

I'm not sure if I've interpreted this point correctly. Here's what I've done:

I've reviewed all the URLs in my raw access log file that were requested with the GoogleBot User-Agent header string. Invalid URLs show a 404. URLs that should redirect show a 301. Valid URLs that should not redirect show a 200. I've also copied and tested a few of the URLs myself. The status shown in my log file is indeed the status returned.
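
A log review like this one can be scripted. The sketch below assumes the server writes Apache's "combined" log format, which the thread does not confirm; the sample lines, IP addresses, and helper names (`googlebot_hits`, `status_summary`) are made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Return (path, status) for each parsable line whose User-Agent
    mentions Googlebot, in log order."""
    hits = []
    for line in lines:
        match = LOG_RE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            hits.append((match.group("path"), int(match.group("status"))))
    return hits


def status_summary(hits):
    """Count how many Googlebot requests received each status code."""
    return Counter(status for _, status in hits)


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [07/Apr/2009:10:00:01 +0000] "GET /category/subcategory/page/ HTTP/1.1" '
    '301 245 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [07/Apr/2009:10:00:02 +0000] "GET /category/subcategory/page/index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.7 - - [07/Apr/2009:10:00:03 +0000] "GET /missing HTTP/1.1" '
    '404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.0.8"',
]

print(status_summary(googlebot_hits(SAMPLE_LOG)))
```

Matching on the User-Agent string alone also counts anything that merely claims to be Googlebot, so the totals are an upper bound on genuine crawler requests.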

The only oddity is the following:

Of the URLs with the GoogleBot User-Agent header string that should redirect and show a 301, five appear in the log file in rapid succession. These are the URLs that appear under the heading: HTTP Error: 301 (Moved permanently) in Webmaster Tools.

Any thoughts?

As an aside, I don't think the following applies to my site:

[johnmu.com...]

While I'm suffering from the second symptom ("warnings for URLs that redirect"), the final URLs are all correct per Websniffer, i.e. none redirect to an unwanted destination. What's more, I'm not suffering from the first symptom ("URLs from my website are simply not indexed anymore"). I have not noticed a change in indexed URLs.

jdMorgan

12:41 pm on Apr 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I'd suggest testing URLs that should and should not redirect and URLs that are and are not valid on your site, using a genuine GoogleBot User-Agent header string copied from your raw access log file.

What I meant was:
Test known URLs that should redirect (you've done this)
Test known URLs that should not redirect (you've done this)
Test unknown (non-canonical) URLs that should redirect (not sure)

For example, if your site is at www.example.com, then test
example.com ---301--> www.example.com/
www.example.com/index.php ---301--> www.example.com/
www.example.com/?fake-query ---301--> www.example.com/?correct-query-or-blank (or 404 response)
example.com/index.php ---301--> www.example.com/

etc.
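
A table of expectations like the one above can be turned into a repeatable check. This is a sketch under assumptions: the `GOOGLEBOT_UA` value is a commonly published Googlebot string and should be replaced with one copied from your own access log, the `?fake-query` case is left out because its expected result depends on the site, and `head` and `check` are hypothetical helper names. `http.client` is used because it does not follow redirects on its own, so the raw 301 and its Location header are visible.

```python
import http.client
from urllib.parse import urlsplit

# Assumption: replace with the exact string found in your own raw access log.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Expected (status, Location) per URL, taken from the examples above.
EXPECTED = {
    "http://example.com/": (301, "http://www.example.com/"),
    "http://www.example.com/index.php": (301, "http://www.example.com/"),
    "http://example.com/index.php": (301, "http://www.example.com/"),
}


def head(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Send one HEAD request without following redirects and
    return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check(expectations, fetch=head):
    """Return (url, expected, actual) for every URL whose response
    differs from the expectation; an empty list means all passed."""
    failures = []
    for url, expected in expectations.items():
        actual = fetch(url)
        if actual != expected:
            failures.append((url, expected, actual))
    return failures
```

Calling `check(EXPECTED)` issues live requests; passing a different `fetch` callable lets the comparison logic be exercised without touching the network. Some servers answer HEAD differently from GET, so switching the method to GET is a reasonable variation if results look odd.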

Also, be aware that their error-reporting terminology is a bit inaccurate: A 301 *may* be an error in their eyes, but it is not an error at the HTTP protocol level. The "error" that Google is talking about is really not at that level, because what they are really saying is that "Hey, somebody is linking to a URL that gets a 301 response, but they should be linking to the correct URL -- the same URL that the 301 redirect goes to."

So their error message is misleading because it is terse; It should say, "Error: HTTP 301-Moved Permanently response code received when linked URL is followed" if that's what they mean.

Just make sure all of the links on your site point to the final, canonical URL, and that none link to URLs which will trigger your 301 redirect.

However, if all your on-site links are correct then this sounds like a GoogleGlitch.

Jim

Marfola

1:50 pm on Apr 7, 2009 (gmt 0)

10+ Year Member



We've tested all possible permutations. All return a 301 redirect. The only anomaly is our homepage, www.example.com/index.php. As we use TYPO3, www.example.com/index.php must return a 200 response for the site to work. In any case, this does not affect the internal pages, which are the ones showing redirect errors in Webmaster Tools.

> Also, be aware that their error-reporting terminology is a bit inaccurate: A 301 *may* be an error in their eyes, but it is not an error at the HTTP protocol level. The "error" that Google is talking about is really not at that level, because what they are really saying is that "Hey, somebody is linking to a URL that gets a 301 response, but they should be linking to the correct URL -- the same URL that the 301 redirect goes to."

Not all of the pages with a “Redirect error” have backlinks, so this doesn't explain our case either.

Did you see this?

> Of the URLs with the GoogleBot User-Agent header string that should redirect and show a 301, five appear in the log file in rapid succession. These are the URLs that appear under the heading: HTTP Error: 301 (Moved permanently) in Webmaster Tools.


Of note, not all of the URLs with the GoogleBot User-Agent header string that should redirect and show a 301 appear in webmaster tools.

If this is a Google Glitch - for example, Google doesn't like URLs ending in index.html - is there a way to point this out?

[edited by: Marfola at 1:52 pm (utc) on April 7, 2009]

jdMorgan

3:40 pm on Apr 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Of the URLs with the GoogleBot User-Agent header string that should redirect and show a 301, five appear in the log file in rapid succession. These are the URLs that appear under the heading: HTTP Error: 301 (Moved permanently) in Webmaster Tools.

I saw that, but cannot ascribe any problem-specific meaning to it. What were the response codes, and exactly how fast were the "rapid" requests?

You say your index page is /index.php, but then ask about Google not liking URLs ending in .html -- Please clarify.

Jim

Marfola

10:53 am on Apr 8, 2009 (gmt 0)

10+ Year Member



Jim,

> Of the URLs with the GoogleBot User-Agent header string that should redirect and show a 301, five appear in the log file in rapid succession. These are the URLs that appear under the heading: HTTP Error: 301 (Moved permanently) in Webmaster Tools.


On any given day, there are more than 5 URLs with the GoogleBot User-Agent header string that correctly show a 301 response code in my log file. Only 5 appear in Webmaster Tools.

There are 2 things to note:

1. The five that appear in Webmaster Tools are listed one right after the other in my log file, i.e. with no entries in between. Those that show elsewhere in my log file are not listed in Webmaster Tools.

2. The path pattern of the five URLs that appear in Webmaster Tools is the same as that of the ones that don’t. All of the URLs with the GoogleBot User-Agent header string showing a 301 response code in my log file end in / (http://www.example.com/category/subcategory/page/) rather than /index.html (http://www.example.com/category/subcategory/page/index.html). The 301 response code is correct and is not part of a chain.

> You say your index page is /index.php, but then ask about Google not liking URLs ending in .html -- Please clarify.


These are 2 separate issues.

1. The canonical URL for our home page is www.example.com. Because we use TYPO3, www.example.com/index.php must also return a 200 response header. If it didn’t, our site wouldn’t work.

Thus the only canonical redirect we do not execute is www.example.com/index.php ---301--> www.example.com/. Both www.example.com/index.php and www.example.com return a 200 response header. We have never had an issue with this. www.example.com/index.php does not appear in our site or sitemap. It is not indexed by any of the search engines. It has a PageRank of zero.

2. Google not liking URLs ending in /index.html.

All of my internal page URLs end in /index.html. I do not have chained redirects. All of my redirects return a 301 response header and list the correct final URL. My site and sitemap include final URLs only. URLs ending in / do not appear in either. Some of the URLs showing as redirect errors in Webmaster Tools have external links, but not all do.

All of the redirect errors showing in Webmaster Tools end in /.

If this is a Google Glitch – i.e. Google should have no problem following URLs that are properly redirected to /index.html - is there a way to point this out?

jdMorgan

1:45 pm on Apr 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't know of any 100%-effective way to report a problem to Google. They likely have hundreds of employees in the crawling/indexing/ranking/reporting area, but there are millions of Webmasters with 'problems' of varying degrees -- So unless you know someone there and have their e-mail/phone number or can corner them at PubCon, the outlook is a bit bleak -- Keep making noise in their forum, drop relevant comments in their blogs where appropriate, etc.

Let me ask, though: Have you considered adding a 301 redirect from /blah/ to /blah/index.php to see if this affects the reported errors?

At some level, Google seems to "want" to use and list /blah/ URLs instead of /blah/index.xyz URLs -- The fact that they are trying to use those "/" URLs to spider your site even when you never link to "/" URLs is an indicator of this. So I'm wondering if a 301 redirect from "/" to "/index.xyz" to force them away from those "/" URLs might do anything positive in this case.

Other than that, I'm afraid I'm out of ideas.

Jim

Marfola

10:57 am on Apr 20, 2009 (gmt 0)

10+ Year Member



> At some level, Google seems to "want" to use and list /blah/ URLs instead of /blah/index.xyz URLs -- The fact that they are trying to use those "/" URLs to spider your site even when you never link to "/" URLs is an indicator of this. So I'm wondering if a 301 redirect from "/" to "/index.xyz" to force them away from those "/" URLs might do anything positive in this case.

Jim, are you suggesting we remove the 301 redirect from non-index (/) to index (/blah/index.html)? If so, non-index would return a 404.