301 Redirect Length

and other questions

webdude

12:59 pm on Aug 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just wondering...

One of my sites is about a year old. Within the last several months I renamed some of its files and added 301 redirects from the old pages to the new ones. That got me thinking, and I have some basic questions about redirects based on what I have observed...

Some of the pages that the redirects point to seem to be spidered right away, with the log files showing the new file name. These pages take on a PR of 0 (white bar) and are listed in the index. This is pretty much what I expected. Other pages still have not changed in the index, and when they are crawled the log files show the old file names, even with a redirect in place. Why do you think this is?

Even though the logs show Googlebot requesting an old page that has a redirect, how can I be sure the bot actually goes on to crawl the new page? Or does it at all?

I tested all the redirects and they are working. I have double and triple checked every link on my site, and they all point to the new pages. Nothing references or links to the old pages, so how does Google still find them during a deep crawl, even after the redirects have been up for several months?

So this brings up another question...

How long should I leave a 301 redirect up?

I am almost at the point of just deleting the old pages, because when I am deep crawled I get a smattering of old and new file names in the log files. The new pages that do not seem to be crawled have blank PR, while the new pages that ARE crawled (according to the logs) take on PR 0. And some pages have taken on the PR of the pages they replaced.

I am confused and wondering how long it will take to work this mess out.

Don't you think in the long run it would be better to just delete all the old files, which I assume would force the bot to crawl just the existing pages?

Anyway, thanks for any thoughts on this...

WebDude

Patrick Taylor

1:32 pm on Aug 11, 2004 (gmt 0)

I'm not the world's most experienced with this (only been through the process once), but isn't it the case that the redirect means the old page can (and even should - duplicate content?) be deleted straightaway? After all, it can no longer be viewed because any attempt to do so is diverted by the redirect to the new page. I would assume that the continued appearance of the old page in Google's index is just that it still exists in their cache.

I would leave the redirect in place as long as valued visitors are assumed to still have the old page URL in their favourites, or at least until the old pages are gone from Google's cache.

<added>or Yahoo's and MSN's etc cache</added>
<added>and of course - incoming links</added>

[edited by: Patrick_Taylor at 1:43 pm (utc) on Aug. 11, 2004]

encyclo

1:39 pm on Aug 11, 2004 (gmt 0)

When you use a 301 permanent redirect, the old pages become inaccessible and so you can remove the files immediately. You may, however, see the old filenames in the results for some time to come.

You may have backlinks to those old pages, and users may have bookmarked them. For these reasons, you should keep the redirects in place for as long as possible; 12 months is a minimum. Backlinks may never be updated to the new page names, so it is best to keep the redirects indefinitely if you can.

webdude

1:43 pm on Aug 11, 2004 (gmt 0)

Well this is my point. The redirects were originally put there for old bookmarks and such. Strictly for the customer. But I find it odd that after several months the bot would still be crawling old pages with redirects.

I mean, I thought a redirect would physically send the bot to the new page regardless of cache. Is that not the way a redirect works? There should be no reference to the old pages at all. All other spiders are crawling the new pages.
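
For reference, a redirect is a two-step exchange: the server answers the request for the old URL with a 301 status and a Location header, and the client then makes a second, separate request for the new URL. So the old file name is expected to show up in the log (with status 301) whenever anything requests it. Below is a minimal local sketch of that exchange in Python. The paths /old.html and /new.html and the handler name are made up for illustration, and the standard-library test server stands in for a real web server; it is not how IIS is configured.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

request_log = []  # (path, status) pairs, in the order the server saw them


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old.html is permanently moved to /new.html."""

    def do_GET(self):
        if self.path == "/old.html":
            request_log.append((self.path, 301))
            self.send_response(301)
            self.send_header("Location", "/new.html")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new.html":
            request_log.append((self.path, 200))
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            request_log.append((self.path, 404))
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the default stderr access log


def run_demo():
    """Fetch /old.html with a redirect-following client; return what happened."""
    request_log.clear()
    server = HTTPServer(("127.0.0.1", 0), RedirectDemoHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Ignore any proxy settings so the request really goes to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://127.0.0.1:{port}/old.html") as resp:
            final_url = resp.geturl()
    finally:
        server.shutdown()
        server.server_close()
    return final_url, list(request_log)
```

Calling run_demo() returns the final URL the client ended up on and the server-side request list. The old path appears in that list first, followed by the new one, which is the pattern to look for in a real log when a crawler follows the redirect.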

webdude

1:44 pm on Aug 11, 2004 (gmt 0)

There are no backlinks to these pages. All backlinks are going to the index page.

diamondgrl

1:55 pm on Aug 11, 2004 (gmt 0)

webdude,

My one piece of advice is to check the logs carefully to make sure the server really is sending 301s. I thought I had 301s, but after months of assuming they worked I discovered that the PHP code that should have generated a 301 was actually generating a 302.

I had to go through a reasonably convoluted process to fix it, but my server is now sending 301s.
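
One way to check this independently of the logs is to request the old URL once without following the redirect and look at the raw status code. The Python sketch below is only an illustration under assumptions: the function names are made up, and the example URL in the note afterwards is a placeholder, not a page from this thread.

```python
import http.client
from urllib.parse import urlsplit


def classify_redirect(status, location):
    """Describe what a status code / Location pair means for a moved page."""
    if status == 301:
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location} (not a 301)"
    if status == 308:
        return f"permanent redirect (308) to {location}"
    return f"no redirect, status {status}"


def fetch_status(url, timeout=10):
    """Request `url` once, without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `classify_redirect(*fetch_status("http://www.example.com/oldpage.html"))` would report whether that (placeholder) old page answers with a 301, a temporary redirect, or no redirect at all.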

webdude

2:02 pm on Aug 11, 2004 (gmt 0)

I have checked and rechecked this. The 301s are working. All other spiders are crawling the new pages. Only Googlebot seems confused. As far as I understood, a redirect would eliminate all references to the old pages in the log, except maybe the initial hit (not sure about this).

Also, why some and not others? All the redirects are set up exactly the same way (IIS5 / Properties / File / Redirect to: http://widget.com/page.html, with the "exact" and "permanent" options selected).
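
To see which pages Googlebot is requesting and what status it gets back, the log can be tallied by URL and status code. The Python sketch below assumes a W3C extended log whose "#Fields:" line includes cs-uri-stem, sc-status and cs(User-Agent); the sample lines are invented for illustration and are not taken from a real log.

```python
from collections import Counter

# Made-up sample in W3C extended format (spaces in the user agent appear as '+').
SAMPLE_LOG = [
    "#Fields: date time cs-method cs-uri-stem sc-status cs(User-Agent)",
    "2004-08-11 12:00:01 GET /oldpage.html 301 Googlebot/2.1+(+http://www.google.com/bot.html)",
    "2004-08-11 12:00:02 GET /newpage.html 200 Googlebot/2.1+(+http://www.google.com/bot.html)",
    "2004-08-11 12:00:03 GET /newpage.html 200 Mozilla/4.0+(compatible;+MSIE+6.0)",
]


def tally_bot_hits(log_lines, bot_token="googlebot"):
    """Count (URL, status) pairs for requests whose user agent contains bot_token.

    Column positions are read from the '#Fields:' directive, so the order of
    fields in the log does not matter.
    """
    fields = []
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip blanks, other directives, and data before any header
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines
        row = dict(zip(fields, values))
        if bot_token in row.get("cs(User-Agent)", "").lower():
            counts[(row.get("cs-uri-stem"), row.get("sc-status"))] += 1
    return counts
```

An old file name that shows up with status 301 means the bot asked for the old URL and was told where the page moved. An old file name with status 200 would mean the redirect is not being applied to that page.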

Patrick Taylor

2:02 pm on Aug 11, 2004 (gmt 0)

"php code"

Isn't the textbook way with .htaccess?

redirect 301 /filename.htm h**p://w**.domain.com/newfilename.htm

webdude

2:30 pm on Aug 11, 2004 (gmt 0)

.htaccess is an Apache thing. It is not supported by IIS.

Patrick Taylor

2:36 pm on Aug 11, 2004 (gmt 0)

Point taken.

webdude

3:14 pm on Aug 11, 2004 (gmt 0)

Well, this thread bombed and I am no closer to an answer. After looking at some other threads here, I have come to the conclusion that Googlebot is broken. I think the only course would be to delete all the old content and thus eliminate all the redirects. It seems, at least for me, that Googlebot is not following them on certain pages. Not sure why.

webdude

8:26 pm on Aug 11, 2004 (gmt 0)

Anyone?