Forum Moderators: Robert Charlton & goodroi
Web crawl > URLs not followed > "Redirect error"
http://www.example.com/category/subcategory/page/
Note that:
1. URLs shown above as redirect errors are not in my site nor are they in my sitemap.
2. The URLs in my site and sitemap are all final, target URLs (they return a 200 header); none of them redirects.
3. I do not have any chain redirects. I manage the 301 in a single redirect and redirect to the final url. I have tested every single URL (as well as every possible permutation) in Websniffer and get the desired header for each. I get a 301 permanent redirect (to the final URL) for URLs redirected, 404 not found for pages that don't exist, and 200 for my target URLs, ie all of the URLs in my sitemap and site.
4. There are no conflicts between my .htaccess and httpd.conf or any other underlying server configurations.
The post [google.com] suggests I'm not the only webmaster with this problem. Can someone please help find an answer?
FYI: I've excluded the canonical link element as a solution to the problem. Per Matt Cutts' recent presentation [googlewebmastercentral.blogspot.com] the 'tag' is for sites unable to manage redirects at the source.
I'm posting this in WebmasterWorld as I have yet to receive an answer in Google Support Forum. (I've posted the question several times in the past 2 months).
I have tested every single URL (as well as every possible permutation) in Websniffer and get the desired header for each.
It sounds like you've been very, very thorough, both in your 301 handling and in your testing. But since you also say that this problem has persisted for months (which wouldn't likely be the case if this were just a GoogleGlitch), I have to ask: Did you test your URLs with a Googlebot user-agent?
We've had several reports here at WebmasterWorld of .htaccess hacks that 'cloak' the attacked sites, serving alternate content and links to Googlebot and a few other major search engines' robots. While I'm only aware of .htaccess hacks, this malicious cloaking could also be accomplished by hacking a script on the victim site.
I'd suggest testing URLs that should and should not redirect and URLs that are and are not valid on your site, using a genuine GoogleBot User-Agent header string copied from your raw access log file. Alternately, a line-by-line review of all code in the actual .htaccess file(s) on your server, looking for search engine User-agent substrings (e.g. "Googlebot", "oglebo"), would be a good idea.
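For the line-by-line review, a small script can do a first pass before you read the file yourself. This is only a rough sketch: the list of crawler substrings and the function name are my own assumptions, and a clean result does not prove the file is clean, since a hack could also live in a script rather than in .htaccess.

```python
# Substrings that suggest a rule is keyed on a search-engine crawler.
# This list is an assumption for illustration; extend it as needed.
SUSPECT_PATTERNS = ["googlebot", "oglebo", "slurp", "msnbot"]

# Apache constructs that inspect the User-Agent header at all.
USER_AGENT_TESTS = ["http_user_agent", "user-agent", "browsermatch"]


def find_suspect_lines(htaccess_text, patterns=SUSPECT_PATTERNS):
    """Return (line_number, line) pairs worth a manual look."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are never executed
        lowered = stripped.lower()
        if any(p in lowered for p in USER_AGENT_TESTS + list(patterns)):
            hits.append((number, stripped))
    return hits
```

Feed it the contents of every .htaccess file from the document root down, and treat any hit as a line to read carefully rather than as proof of a hack.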
Hopefully, this is just a rare long-lived GoogleGlitch, but you're right to take it seriously.
Jim
Thanks for your comments.
I've tested my URLs with the User Agent Switcher Firefox plugin, using the Googlebot agent. They all serve the desired content.
I've reviewed my .htaccess file(s) - line by line - and did not find any search engine User-agent substrings (e.g. "Googlebot", "oglebo").
I'd suggest testing URLs that should and should not redirect and URLs that are and are not valid on your site, using a genuine GoogleBot User-Agent header string copied from your raw access log file.
I've reviewed all the urls in my raw access log file with the GoogleBot User-Agent header string. Invalid urls with the GoogleBot User-Agent header string show a 404. Urls that should redirect show a 301. Valid urls that should not redirect show a 200. I've copied and tested a few of the URLs. The header showing in my log file is indeed the header returned.
The only oddity is the following:
Of the URLs with the GoogleBot User-Agent header string that should redirect and show a 301, five appear in the log file in rapid succession. These are the URLs that appear under the heading: HTTP Error: 301 (Moved permanently) in Webmaster Tools.
Any thoughts?
As an aside, I don't think the following applies to my site:
[johnmu.com...]
While I'm suffering from the second symptom (warnings for URLs that redirect), the final URLs are all correct per Websniffer, i.e. none redirects to an unwanted destination. What's more, I'm not suffering from the first symptom (URLs from my website are simply not indexed anymore): I have not noticed a change in indexed URLs.
What I meant was:
Test known URLs that should redirect (you've done this)
Test known URLs that should not redirect (you've done this)
Test unknown (non-canonical) URLs that should redirect (not sure)
For example, if your site is at www.example.com, then test
example.com ---301--> www.example.com/
www.example.com/index.php ---301--> www.example.com/
www.example.com/?fake-query ---301--> www.example.com/?correct-query-or-blank (or 404 response)
example.com/index.php ---301--> www.example.com/
etc.
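A list like the one above can be checked mechanically. Here is a minimal sketch, assuming plain HTTP(S) and a commonly seen Googlebot User-Agent string (better to paste in the exact one from your own raw access log). The function names and the shape of the expectations table are mine, not from any tool. It sends one GET per URL, does not follow redirects, and compares the status code and Location header with what you expect.

```python
import http.client
from urllib.parse import urlsplit

# Assumption: replace with the exact string from your own access log.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_once(url, user_agent=GOOGLEBOT_UA):
    """Send one GET without following redirects; return (status, location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check_redirects(expectations, fetch=fetch_once):
    """expectations maps a URL to (expected_status, expected_location).

    Returns a list of (url, expected, actual) for every mismatch.
    """
    problems = []
    for url, expected in expectations.items():
        actual = fetch(url)
        if actual != expected:
            problems.append((url, expected, actual))
    return problems
```

For example, the first line of the list would become the entry `"http://example.com/": (301, "http://www.example.com/")`. An empty result means every tested URL answered in a single hop with the expected code and target.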
Also, be aware that their error-reporting terminology is a bit inaccurate: A 301 *may* be an error in their eyes, but it is not an error at the HTTP protocol level. The "error" that Google is talking about is really not at that level, because what they are really saying is that "Hey, somebody is linking to a URL that gets a 301 response, but they should be linking to the correct URL -- the same URL that the 301 redirect goes to."
So their error message is misleading because it is terse. It should say, "Error: HTTP 301-Moved Permanently response code received when linked URL is followed" if that's what they mean.
Just make sure all of the links on your site point to the final, canonical URL, and that none link to URLs which will trigger your 301 redirect.
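If the site is too big to check links by eye, something along these lines can help. It is a sketch only: it assumes you already have a set of URLs known to answer with a 301 (for example, the ones Webmaster Tools is reporting), and the class and function names are made up for illustration. It collects the href of every anchor on a page, resolves each against the page URL, and reports any that appear in that set.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_redirecting_urls(page_url, html, redirecting_urls):
    """Return the absolute links on the page that are in redirecting_urls."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = [urljoin(page_url, href) for href in collector.hrefs]
    return [url for url in absolute if url in redirecting_urls]
```

Any URL it returns is an on-site link that should be changed to point straight at the final, canonical URL.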
However, if all your on-site links are correct then this sounds like a GoogleGlitch.
Jim
Also, be aware that their error-reporting terminology is a bit inaccurate: A 301 *may* be an error in their eyes, but it is not an error at the HTTP protocol level. The "error" that Google is talking about is really not at that level, because what they are really saying is that "Hey, somebody is linking to a URL that gets a 301 response, but they should be linking to the correct URL -- the same URL that the 301 redirect goes to."
Not all of the pages with a “Redirect error” have backlinks, so this isn't our case either.
Did you see this?
Of the URLs with the GoogleBot User-Agent header string that should redirect and show a 301, five appear in the log file in rapid succession. These are the URLs that appear under the heading: HTTP Error: 301 (Moved permanently) in Webmaster Tools.
If this is a Google Glitch - for example, Google doesn't like URLs ending in index.html - is there a way to point this out?
[edited by: Marfola at 1:52 pm (utc) on April 7, 2009]
I saw that, but cannot ascribe any problem-specific meaning to it. What were the response codes, and exactly how fast were the "rapid" requests?
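To answer that precisely, it may be easiest to pull the numbers straight from the raw log. The following is a sketch that assumes the Apache combined log format; if your LogFormat differs, the regular expression will need adjusting. The function name is my own. It lists each Googlebot request with its status code, path, and the number of seconds since the previous Googlebot request.

```python
import re
from datetime import datetime

# Assumption: Apache "combined" log format.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_requests(lines):
    """Return (timestamp, status, path, seconds_since_previous) tuples."""
    rows = []
    previous = None
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "googlebot" not in match["agent"].lower():
            continue  # unparseable line or a different user-agent
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        gap = (when - previous).total_seconds() if previous is not None else None
        rows.append((when, int(match["status"]), match["path"], gap))
        previous = when
    return rows
```

Filtering the result for status 301 and looking at the gaps shows exactly how "rapid" the succession was and which URLs were involved.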
You say your index page is /index.php, but then ask about Google not liking URLs ending in .html -- Please clarify.
Jim
Of the URLs with the GoogleBot User-Agent header string that should redirect and show a 301, five appear in the log file in rapid succession. These are the URLs that appear under the heading: HTTP Error: 301 (Moved permanently) in Webmaster Tools.
You say your index page is /index.php, but then ask about Google not liking URLs ending in .html -- Please clarify.
Let me ask, though: Have you considered adding a 301 redirect from /blah/ to /blah/index.php to see if this affects the reported errors?
At some level, Google seems to "want" to use and list /blah/ URLs instead of /blah/index.xyz URLs -- The fact that they are trying to use those "/" URLs to spider your site even when you never link to "/" URLs is an indicator of this. So I'm wondering if a 301 redirect from "/" to "/index.xyz" to force them away from those "/" URLs might do anything positive in this case.
Other than that, I'm afraid I'm out of ideas.
Jim
At some level, Google seems to "want" to use and list /blah/ URLs instead of /blah/index.xyz URLs -- The fact that they are trying to use those "/" URLs to spider your site even when you never link to "/" URLs is an indicator of this. So I'm wondering if a 301 redirect from "/" to "/index.xyz" to force them away from those "/" URLs might do anything positive in this case.
Jim, are you suggesting we remove the 301 redirect from non-index (/) to index (/blah/index.html)? If so, non-index would return a 404.