One of my sites is about a year old. Recently (within the last several months) I changed the names of some of its files and added 301 redirects from the old pages to the new ones. That got me thinking, and I have some basic questions about redirects based on observations I have made...
Some of the pages the redirects point to seem to be spidered right away, with the log files showing the new file names. Those pages take on a PR of 0 (white bar) and are listed in the index - pretty much what I expected. Other pages still have not changed in the index, and when they are crawled the log files show the old file names, even with a redirect in place. Why do you think this is?
Even though the logs show googlebot crawling an old page that has a redirect, how can I be sure the bot is actually reaching the new page? Or is it at all?
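One way to see what the bot is actually being served is to pull the status codes straight out of the access log. A minimal sketch, assuming an Apache-style combined log format - the function name, field positions, and file path are assumptions, and IIS logs use a different (W3C) layout, so adjust accordingly:

```python
# Tally (status code, URL) pairs for Googlebot requests from an
# access log. Assumes Apache combined log format, where the request
# path is whitespace-separated field 7 and the status code field 9.
from collections import Counter

def googlebot_statuses(log_path):
    """Return a Counter mapping (status, url) -> request count."""
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            # crude user-agent filter: keep only Googlebot lines
            if "googlebot" not in line.lower():
                continue
            parts = line.split()
            if len(parts) > 8:
                # parts[8] is the status code, parts[6] the URL
                counts[(parts[8], parts[6])] += 1
    return counts
```

If the redirects are being served and followed, requests for the old pages should show up with a 301, and matching requests for the new pages with a 200 should appear soon after.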
I tested all the redirects and they are working. I have double- and triple-checked all the links on my site, and every link points to the new pages. There are no references or links to the old pages anywhere, so how does Google find these old pages during a deep crawl - even after the redirects have been up for several months?
So this brings up another question...
How long should I leave a 301 redirect up?
I am almost to the point of thinking I should just delete the old pages, because when I am deep crawled I get a smattering of old and new file names in the logs. The new pages that do not seem to be crawled have a blank PR, the new pages that ARE crawled (according to the logs) take on a PR of 0, and still other pages have taken on the PR of their old counterparts.
I am confused and wondering how long it will take to work this mess out.
Don't you think that in the long run it would be better to just delete all the old files, which I assume would force the bot to crawl only the existing pages?
Anyway, thanks for any thoughts on this...
WebDude
I would leave the redirect in place as long as valued visitors are assumed to still have the old page URLs in their favourites, or at least until the old pages are gone from Google's cache - or Yahoo's, MSN's, etc. - and of course until incoming links have been updated.
[edited by: Patrick_Taylor at 1:43 pm (utc) on Aug. 11, 2004]
You may have backlinks to those old pages, and users may have bookmarked them. For these reasons you should keep the redirects in place for as long as possible - 12 months at a minimum. Backlinks may never be updated to the new page names, so it is best, if you can, to keep the redirects indefinitely.
I mean, I thought a redirect physically redirects the bot to the new page regardless of cache. Is that not the way a redirect works? There should be no referral to the old pages at all. All the other spiders are crawling the new pages.
My one piece of advice is to check the logs carefully to make sure they really are sending 301s. I thought I had 301s, but after months of assuming it worked I discovered that my PHP code that should have generated a 301 was actually generating a 302.
I had to go through a fairly convoluted process to fix it, but my server is now sending 301s.
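For anyone who wants to verify this directly rather than dig through logs, a raw HTTP request shows the status code as sent, because Python's `http.client` does not follow redirects on its own. A minimal sketch - the widget.com URL is just the placeholder used in this thread:

```python
# Check the raw status code a URL returns, without following the
# redirect, so a 302 cannot masquerade as a 301.
import http.client

def redirect_status(host, path, port=80):
    """Return (status_code, Location header) for a single HEAD request."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location

# Example (placeholder URL from this thread):
# status, target = redirect_status("widget.com", "/old-page.html")
# A permanent redirect should report 301; a 302 means temporary.
```

The same check can be done with any tool that reports the initial response rather than the final page.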
Also, why some and not others? All the redirects are set up exactly the same way (IIS 5: properties > file > a redirection to http://widget.com/page.html, exact URL, permanent).