Google's memory and rewritten URLs

         

rogerd

11:18 pm on Dec 4, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I'm working with a site that has dynamic product pages with a short query string. The site runs ASP on a Windows server. In an effort to create friendly URLs for users and spiders, we implemented a custom error page handler that turns a URL that looks like
- domain.com/store/dynamic.asp?brand=Acme%20Widgets
into a friendlier
- domain.com/mall/Acme_Widgets/ or
- domain.com/mall/Acme_Widgets/index.htm
The URLs create a nice directory structure, too, so that product XY555 from Acme would look like
- domain.com/mall/Acme_Widgets/XY555.htm
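For readers who want the mapping spelled out, here is a minimal Python sketch of the URL scheme described above (the site itself runs ASP, so this only illustrates the idea). The function name `static_path` and the separate `product` argument are assumptions; the post does not say how a product ID appears in the query string.

```python
from urllib.parse import urlsplit, parse_qs


def static_path(dynamic_url, product=None):
    """Map an old query-string URL to the friendly /mall/ path.

    `product` is a hypothetical extra argument; the original post does not
    show how product IDs are passed to dynamic.asp.
    """
    query = parse_qs(urlsplit(dynamic_url).query)
    brand = query["brand"][0]            # parse_qs decodes %20 to a space
    folder = brand.replace(" ", "_")     # "Acme Widgets" -> "Acme_Widgets"
    page = f"{product}.htm" if product else ""
    return f"/mall/{folder}/{page}"


print(static_path("domain.com/store/dynamic.asp?brand=Acme%20Widgets"))
print(static_path("domain.com/store/dynamic.asp?brand=Acme%20Widgets", "XY555"))
```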

All URLs are properly formed and return a 200 OK code. Everything looks perfectly fine with SimSpider et al. The static-URL pages are properly indexed by every major SE, with one exception.

Google, unfortunately, sees things weirdly. It continues to index the old dynamic URLs, apparently from memory. I've gone through the site and hunted down any old links to the query string URLs, and there appear to be no external links to them. Even if Googlebot (GB) loads a page using the query string format from memory or a buried link somewhere, all links on the page are static, so GB shouldn't find additional query string links to follow.

My assumption was that as all links to the dynamic format were eliminated, the static format would push the old pages out of the index. That hasn't happened at Google. I suppose I could exclude GB from the query string pages with a well-crafted robots.txt statement, but I fear that the effects of that might be even worse, i.e., the currently indexed URLs get replaced by worse performers or nothing at all.
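To make the robots.txt option concrete, here is a hedged sketch that uses Python's standard `urllib.robotparser` to check what such a rule would block. The two-line rule set is an assumption about what a "well-crafted" statement might look like for this site, not a recommendation; it only stops crawling of the old script and does nothing to move rankings to the new URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: block Googlebot from the old dynamic script only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /store/dynamic.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def googlebot_may_fetch(url_path):
    """Return True if the rules above would let Googlebot fetch url_path."""
    return parser.can_fetch("Googlebot", url_path)


print(googlebot_may_fetch("/store/dynamic.asp?brand=Acme%20Widgets"))
print(googlebot_may_fetch("/mall/Acme_Widgets/"))
```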

Anyone have an idea why Google might be ignoring the well-linked, well-formed static URLs and instead choosing to index dynamic pages that seem to have no internal or external links? This might be more of a server question, but are there any other server response issues that might be confusing GB but not everyone else?

rogerd

3:22 pm on Dec 5, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Have I stumped the stars?

davidpbrown

3:40 pm on Dec 5, 2003 (gmt 0)

10+ Year Member



I spotted that Google was retaining old (static) URLs, and I now have a permanent-redirect PHP script to catch them.

If a URL exists, should it then be expected to always be valid?

Back when I was learning my ABCs, I got the impression that this was one of the principles of the web. I'm not sure where I got that from, but I do think it's a good idea.

Are all pages that include "?" dynamic? Maybe Google got confused.

my 2cents

iJeep

3:57 pm on Dec 5, 2003 (gmt 0)

10+ Year Member



You need to put some code at the top of the page that checks the URI. If the request is for the dynamic page, return a 301 so that Google will know the page has moved.

rogerd

4:21 pm on Dec 5, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Hmmm... so, iJeep, you are suggesting including ASP code to parse the requested URL, and, if it is in the dynamic format, to display the static URL and return a 301 server header?

bakedjake

4:27 pm on Dec 5, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



rogerd: Exactly right. On all of your dynamic pages (that you are rewriting), you should grab the requested URI. If it's in the old form, shoot the client a 301 redirect with the new form.
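As a rough illustration of the check described here, the following Python sketch decides whether a request should get a 301 (the site in question would do the equivalent in ASP). The function name `redirect_for` and the constant `OLD_SCRIPT` are assumptions, and the brand-to-folder rule is taken from the example URLs in the first post.

```python
from urllib.parse import parse_qs

OLD_SCRIPT = "/store/dynamic.asp"   # old dynamic path from the first post


def redirect_for(path, query_string):
    """Return (status, headers) for a 301 if the request uses the old form.

    Returns None when the request should be served normally.
    """
    if path.lower() != OLD_SCRIPT:
        return None
    brand = parse_qs(query_string).get("brand", [""])[0]
    if not brand:
        return None                  # nothing to map, so do not redirect
    location = "/mall/" + brand.replace(" ", "_") + "/"
    return "301 Moved Permanently", [("Location", location)]


print(redirect_for("/store/dynamic.asp", "brand=Acme%20Widgets"))
print(redirect_for("/mall/Acme_Widgets/", ""))
```

The check has to look at the URI the client originally requested. If the error-page handler serves the static URLs by internally calling the dynamic script, testing the internal path instead would presumably cause a redirect loop.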

It's a pretty standard thing that should be done when implementing rewrites, for anyone that may stumble upon this thread in the future.

steveb

9:59 pm on Dec 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You may have gotten the solution above, but Google has been behaving the same way when one static page replaces another. Old pages continue to show up despite having no links to them, while the replacement pages don't register, except as indented results under the old page or for searches where the words are not on the old page.

I even have one case where Google continues to show a page title that hasn't been in existence for more than a month, on a page it has fresh tagged ten times in the meantime. It shows the new page title for other searches.

The page used to be titled "How to Widget Well" and is now titled "PageMoved". It has been crawled ten times and shows for "pagemoved" searches with the new title, but still shows up with the old title for "widget well" (no quotes) searches.

Bottom line, getting pages out of the index and replacing them with new pages with the same content is very difficult right now.

rogerd

10:30 pm on Dec 5, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



"getting pages out of the index and replacing them with new pages with the same content is very difficult right now"

My problem exactly. And I'm afraid to try to completely remove the old URLs from Google lest the traffic they generate go away without being replaced by the new pages. I had expected a more organic replacement process to occur based on PR (how much PR can an unlinked page have?), but clearly this isn't happening the same way it used to. I think the 301 redirect is worth a try, though. It's a bit tricky on the site in question because some dynamic URLs are still valid, but I believe it's doable without adding excessive code.
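One way to handle the "some dynamic URLs are still valid" wrinkle is to redirect only when every query parameter has a static equivalent. The Python sketch below shows that test; the set `REWRITTEN_KEYS` and the `sort` parameter in the usage lines are purely hypothetical, since the thread does not say which dynamic URLs remain valid.

```python
from urllib.parse import parse_qs

# Assumption: only brand-only URLs have a static replacement.
REWRITTEN_KEYS = {"brand"}


def should_redirect(query_string):
    """True only if the query uses nothing but parameters that were rewritten."""
    keys = set(parse_qs(query_string, keep_blank_values=True))
    return bool(keys) and keys <= REWRITTEN_KEYS


print(should_redirect("brand=Acme%20Widgets"))       # brand only
print(should_redirect("brand=Acme&sort=price"))      # hypothetical extra parameter
```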

steveb

10:55 pm on Dec 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sounds like we have the exact same issue, and all my pages are static html. I've been experimenting with removing pages; linking to dead/404 locations; changing titles; etc.

One thing to check out is the cache being displayed for your new pages compared to your old pages. My pages were bizarrely displaying the same cache long after various changes.... the new pages would be showing a "PageMoved" cache! Naturally this killed the new page for multi-keyword matches.

Also, if I do a search -thefilename of the old page, all the new pages but one rank higher than the old pages... so a defunct #23 page is preventing a new #7 page from showing in the results. Very weird. So while this is kinda clinically interesting it is driving me crazy too. (For reasons that aren't important I can't use a 301.)

rogerd

12:27 am on Dec 6, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



It's truly weird. I have PR4 pages that don't seem to be in the index at all, while the PR0 versions of the same pages are in the index and perform marginally well for searches.