|Google Webmaster Tools Reporting Old Data As New?|
Two weeks ago I discovered that Google was indexing thousands of my pages incorrectly, resulting in duplicate versions of the same page. The same page would be indexed under both the /mypage/ and /mypages/ versions of its URL.
This shouldn't have mattered, because I was using canonical tags to point to the right page. But since the first version of that URL wasn't really supposed to exist, some of the links on that incorrect version of the page were broken.
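For reference, the canonical hint I'm talking about is just a link element in the head of the page; a minimal sketch (example.com and the path are placeholders for my actual URLs):

```html
<!-- Tells search engines which URL is the preferred version of this page -->
<link rel="canonical" href="http://example.com/mypages/widget" />
```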
I thought: no problem. So two weeks ago I set up a 301 redirect from all the /mypage/ URLs to /mypages/. Problem solved, I thought.
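For what it's worth, a minimal sketch of what such a redirect might look like in an Apache .htaccess file, assuming mod_alias is available (the /mypage/ and /mypages/ paths are from my site; your setup may differ):

```apache
# Permanently redirect every URL under /mypage/ to the same
# path under /mypages/, preserving the rest of the path.
RedirectMatch 301 ^/mypage/(.*)$ /mypages/$1
```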
Except it's been two weeks and Google keeps adding more and more and more of the /mypage/ versions to the HTML suggestions area, complaining of duplicate titles and all sorts of other things.
But if you try to visit any of the pages it's complaining about, they basically don't exist; they redirect to the correct page. Yet Google Webmaster Tools continues to complain about them anew, as if it still sees them.
What's going on here? I'm totally confused. How can Webmaster Tools give new HTML suggestions based on pages that don't exist and have been redirected for two weeks?
You're right that a canonical link element was not going to fix that situation. As long as those 301 redirects are in place, you are probably just getting old information in the report.
Can you check your own server logs to see whether Googlebot is still requesting those incorrect URLs at the present time, and whether your server is always responding with the 301? If so, you can just ignore the messages.
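If it helps, here's a rough sketch of how you might scan an Apache combined-format access log for exactly this: Googlebot requests to the old /mypage/ URLs and the status code returned for each. The /mypage/ prefix and the sample log line are illustrative, and the field positions assume the default combined format:

```python
import re

# Matches the leading fields of an Apache combined-format log line:
# host, identd, user, [timestamp], "METHOD path protocol", status
LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def googlebot_statuses(lines, prefix="/mypage/"):
    """Return (path, status) pairs for Googlebot requests under prefix."""
    hits = []
    for line in lines:
        if "Googlebot" not in line:
            continue
        m = LINE.match(line)
        if m and m.group("path").startswith(prefix):
            hits.append((m.group("path"), m.group("status")))
    return hits

sample = [
    '66.249.66.1 - - [01/Jan/2011:10:00:00 +0000] '
    '"GET /mypage/foo HTTP/1.1" 301 0 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(googlebot_statuses(sample))  # [('/mypage/foo', '301')]
```

If every hit under /mypage/ shows a 301, the redirect is working and WMT is just slow to catch up; any 302s or 200s in that list would point to a configuration problem.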
Actually, how often is WMT data updated? I see that "Links to Your Site" contains links from pages that no longer link to my site (or never did?). As much as I check certain sites and pages that supposedly link to me (according to WMT), I cannot find a link back.
@tedster Pretty tough to tell from my server logs. I'm pretty sure it's returning the 301, but not 100%. I guess my question would be: why wouldn't it be returning the 301?
The info isn't old though, it keeps adding new ones on a daily basis. That's the really confusing thing.
Actually... hmm. In the log files it looks like the server might be returning a 302 redirect to Googlebot, which makes no sense, since I have it set up as a 301. If Googlebot is getting a 302, would that explain it?
Only an error in your server configuration. For example, if you're on Windows, 302 is the default and you need to explicitly choose "Permanent" to generate a 301 status. Microsoft has made the actual status quite obscure in many versions of IIS; it feels like an attempt to dumb down server technology. Microsoft has long had a culture of doing its own thing rather than aligning with accepted standards. (It's a "favorite", not a "bookmark"... etc.)
If you're on Apache, how did you set up the redirect?
And finally, your server logs can always be set to record the status code of the server response. If you can't see it easily right now, you will probably need to find that setting in the logging configuration and enable it.
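On Apache, for instance, the %>s token in a LogFormat directive is what records the final status code; a sketch of the stock "combined" format, which already includes it (paths are illustrative):

```apache
# %>s records the final HTTP status code of each response.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined
```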
I see old data too. Like Zoltan, I see "links to your site" from pages that don't exist on the web (the domains are no longer alive).
I have also noticed that some 404 information is dated years earlier (as in, 404s associated with internal links on pages that I discarded years ago).