My concern is that I want the rewritten URLs to stay in the index, and I definitely don't want the duplicate content filter to kick in and leave only the original pages (since they're older). I don't want any visitors to get 404s, but based on the SERPs that doesn't appear to be a danger. My plan is to remove the files for the original URLs and watch for 404s, and probably make a custom 404 page with a link to the home page just in case. Is there any danger in doing this? I'm a little concerned that the rewritten pages might have come from the freshbot and could disappear at some point... or are all current www-ex results from the last deepcrawl? I suppose I should check my logs to see whether the deepcrawl ever hit my rewritten URLs. When the deepcrawl comes looking for the original pages again, will there be any problems with them not being there, or will it simply not find them since there aren't any links to them?
I'm almost embarrassed to ask, since it seems a bit unethical, but would there be any advantage to keeping the original pages live, say, to direct PR back to my homepage? I could easily modify the pages so they don't duplicate any other page on my site. I really wouldn't do this; it's just too spammy and I don't want to risk a penalty, but I'm still curious whether this is a vulnerability in the algorithm. The thing is, the original pages don't have any links pointing to them, so on the next deepcrawl they might get thrown out, or at least they'd be PR0. So I guess I answered my own question there.
That's probably why you were seeing both. The old URLs were in the permanent index, and the new ones were freshbotted.
If the rewrite is a 301 or a transparent redirect, there is no need to keep the old files. As far as HTTP (web) access is concerned, they no longer exist. They exist only in your server's file system, and have essentially been "disconnected" from any access by URL.
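To illustrate why the old files stop mattering, here is a minimal Python sketch. The rule table (`RULES`), the page names, and the in-memory `docroot` dict standing in for the server's file system are all hypothetical, and reading "transparent redirect" as an internal rewrite is an assumption. The point it shows is that the rewrite rules are checked before the file system, so a file left at the old path is never read either way.

```python
from http import HTTPStatus

# Hypothetical rewrite rules (assumption): old URL path -> new URL path.
RULES = {"/old-page.html": "/new-page.html"}


def handle(path, docroot, external=True):
    """Resolve a request path, applying rewrite rules before the file system.

    external=True  -> 301 redirect: the client is told to fetch the new URL.
    external=False -> internal ("transparent") rewrite: the new page's content
                      is served directly in response to the old URL.
    In both cases a file still sitting at the old path is never consulted.
    """
    if path in RULES:
        target = RULES[path]
        if external:
            return HTTPStatus.MOVED_PERMANENTLY, target
        path = target  # internal rewrite: continue resolving with the new path
    if path in docroot:
        return HTTPStatus.OK, docroot[path]
    return HTTPStatus.NOT_FOUND, None


# The old file is still "on disk" here, but no URL can reach it.
docroot = {"/old-page.html": "OLD", "/new-page.html": "NEW"}
print(handle("/old-page.html", docroot))                  # 301 to /new-page.html
print(handle("/old-page.html", docroot, external=False))  # 200 with "NEW"
```

Deleting `"/old-page.html"` from `docroot` changes neither result, which is the sense in which the old files can simply be removed.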
If you're still worried, you might want to post which kind of redirect you used; it would help keep the discussion on track.
HTH,
Jim
Yes, the old URLs would be replaced if you use a 301.
Just as a matter of best practice, whenever possible return a 301 or a 410 for pages you intentionally remove. That way, a 404 in your logs is much more likely to mean you have an on-site problem. This makes the error log much more useful by reducing the number of junk entries in it. It also gives a better user experience: the user is either redirected to a relevant page, or told unambiguously that the page has been removed and that he/she should delete or correct any bookmarks, or ask the owner of the site with the bad link to update it.
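A rough Python sketch of that policy follows. The three URL sets (`MOVED`, `REMOVED`, `LIVE`) and the page names are hypothetical, and a real site would express this in its server configuration rather than in a function like this. It shows how answering moved pages with 301 and deliberately removed pages with 410 leaves only genuinely unexpected misses counted as 404s.

```python
from collections import Counter
from http import HTTPStatus

# Hypothetical URL inventories (assumptions for illustration only).
MOVED = {"/old-page.html": "/new-page.html"}  # content now lives at a new URL
REMOVED = {"/discontinued.html"}              # deliberately taken down for good
LIVE = {"/", "/new-page.html"}


def status_for(path):
    """Choose a status so that a 404 means something is unexpectedly missing."""
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY  # 301: send visitor/bot to MOVED[path]
    if path in REMOVED:
        return HTTPStatus.GONE               # 410: removed on purpose
    if path in LIVE:
        return HTTPStatus.OK
    return HTTPStatus.NOT_FOUND              # 404: likely a broken link to fix


def unexpected_404s(requested_paths):
    """Count only 404s; moved or removed pages no longer clutter this list."""
    return Counter(p for p in requested_paths
                   if status_for(p) == HTTPStatus.NOT_FOUND)


requests_seen = ["/old-page.html", "/discontinued.html", "/typo.html", "/typo.html", "/"]
print(unexpected_404s(requests_seen))  # only "/typo.html" (twice) is reported
```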
Jim