All URLs are properly formed and return a 200 OK code. Everything looks perfectly fine with SimSpider et al. The static-URL pages are properly indexed by every major SE, with one exception.
Google, unfortunately, sees things weirdly. It continues to index the old dynamic URLs, apparently from memory. I've gone through the site and hunted down any old links to the query string URLs, and there appear to be no external links to them. Even if Google loads a page using the query string format from memory or a buried link somewhere, all links on the page are static, so GB shouldn't find additional query string links to follow.
My assumption was that once all links to the dynamic format were eliminated, the static format would push the old pages out of the index. That hasn't happened at Google. I suppose I could exclude GB from the query string pages with a well-crafted robots.txt statement, but I fear that the effects of that might be even worse, i.e., the currently indexed URLs get replaced by worse performers or nothing at all.
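For anyone who wants to see what such a robots.txt statement would and wouldn't block before putting it live, here is a minimal sketch using Python's standard-library parser. The script name /page.php, the static path /widgets/12.html and the example.com host are made up for illustration, and Python's parser only does simple prefix matching, so it is not a guarantee of how Googlebot itself will read the rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: the script name /page.php is an assumption.
# The rule is a plain path prefix, so it also covers /page.php?id=12.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /page.php
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/page.php?id=12",    # old dynamic URL
        "http://www.example.com/widgets/12.html",   # new static URL
    ):
        print(url, allowed("Googlebot", url))
```

Note that this only tests what gets blocked. It says nothing about whether the static pages would take over the rankings afterwards, which is the real worry here.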
Anyone have an idea why Google might be ignoring the well-linked, well-formed static URLs and instead choose to index dynamic pages that seem to have no internal or external links? This might be more of a server question, but are there any other server response issues that might be confusing GB but not everyone else?
If a URL exists, should it then be expected to always be valid?
Back when I was learning my ABCs, I got the impression that this was one of the principles of the web. Where I got that from I'm not sure, but I do think it's a good idea.
Are all pages that include a "?" dynamic? Maybe Google got confused.
my 2cents
A 301 redirect from the old URLs to the new ones is a pretty standard thing that should be done when implementing rewrites, for anyone who may stumble upon this thread in the future.
I even have one case where Google continues to show a page title that hasn't been in existence for more than a month, on a page it has fresh tagged ten times in the meantime. It shows the new page title for other searches.
The page used to be titled "How to Widget Well" and is now titled "PageMoved". It has been crawled ten times and shows for "pagemoved" searches with the new title, but still shows up with the old title for "widget well" (no quotes) searches.
Bottom line, getting pages out of the index and replacing them with new pages with the same content is very difficult right now.
"getting pages out of the index and replacing them with new pages with the same content is very difficult right now"
My problem exactly. And I'm afraid to try to completely remove the old URLs from Google lest the traffic they generate go away without being replaced by the new pages. I had expected a more organic replacement process to occur based on PR (how much PR can an unlinked page have?), but clearly this isn't happening the same way it used to. I think the 301 redirect is worth a try, though. It's a bit tricky on the site in question because some dynamic URLs are still valid, but I believe it's doable without adding excessive code.
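Here is a rough sketch of the kind of logic I have in mind for the 301, written in Python only to show the decision and not the actual server config. Everything specific in it is an assumption: the old script name /page.php, the id parameter, the /widgets/ID.html static pattern, and "search" as the one query parameter that must stay dynamic.

```python
import re
from urllib.parse import parse_qs, urlsplit

# All names below are hypothetical placeholders for the real site layout.
OLD_SCRIPT = "/page.php"            # retired dynamic script
STILL_DYNAMIC_PARAMS = {"search"}   # requests with these stay dynamic


def redirect_for(url: str):
    """Return (301, new_location) for a retired dynamic URL, else None."""
    parts = urlsplit(url)
    if parts.path != OLD_SCRIPT or not parts.query:
        return None  # not one of the old query string URLs
    params = parse_qs(parts.query)
    if STILL_DYNAMIC_PARAMS.intersection(params):
        return None  # still a valid dynamic URL, serve it as before
    ids = params.get("id")
    if not ids or len(ids) != 1 or not re.fullmatch(r"[0-9]+", ids[0]):
        return None  # nothing we can map to a static page
    return 301, f"/widgets/{ids[0]}.html"


if __name__ == "__main__":
    print(redirect_for("/page.php?id=12"))
    print(redirect_for("/page.php?search=blue&id=12"))
```

The point of returning None for anything that doesn't match exactly is that the dynamic URLs which are still valid fall through untouched, so only the retired ones get the permanent redirect.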
One thing to check out is the cache being displayed for your new pages compared to your old pages. My pages were bizarrely displaying the same cache long after various changes: the new pages would be showing a "PageMoved" cache! Naturally this killed the new page for multi-keyword matches.
Also, if I run a search that excludes the old page's filename (-thefilename), all the new pages but one rank higher than the old pages, so a defunct #23 page is preventing a new #7 page from showing in the results. Very weird. So while this is kinda clinically interesting, it is driving me crazy too. (For reasons that aren't important, I can't use a 301.)