| 7:21 pm on Sep 25, 2008 (gmt 0)|
I am not all that happy about funnelling a lot of URLs to a single page, but there shouldn't be a major problem with it.
Is there any way you can identify the ones that produce the most incoming traffic and cater just for those?
| 7:44 pm on Sep 25, 2008 (gmt 0)|
thanks for the quick reply, g1smd. We didn't use to get much traffic on those dynamic pages, as they were buried very deep in the website. So you're suggesting that 301ing all 11,000 of those 404 pages to one of our main pages, which holds numerous top 10 rankings, shouldn't be a problem?
| 7:46 pm on Sep 25, 2008 (gmt 0)|
I'd only do those that are on-topic for where it takes you.
I would wait for Tedster's opinion on this, too.
| 7:59 pm on Sep 25, 2008 (gmt 0)|
thanks g1smd! All eyes on you now tedster :)
| 7:59 pm on Sep 25, 2008 (gmt 0)|
If you actually moved the content then a 301 is right... if you have removed the content then a 404 is right.
Adding a sitemap as the content to your 404 page is nice for users.
[edited by: Demaestro at 8:03 pm (utc) on Sep. 25, 2008]
| 11:42 pm on Sep 25, 2008 (gmt 0)|
My advice is to let those urls be 404 and just say goodbye. Trying to squeeze every last drop out of old dead urls with 301 redirects is problematic, and at a high enough volume it can cause trust issues. The urls are gone - let them return 404/410 and get back to developing and marketing the live content.
A robots.txt disallow, in addition, might be a good idea - no reason to let Google spend part of its crawl budget looking for urls you have chucked out the window.
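For example, assuming the retired urls all share a common directory prefix (the /old-dynamic/ path here is hypothetical), a disallow rule could be as simple as:

```
# robots.txt - keep crawlers away from the retired section
User-agent: *
Disallow: /old-dynamic/
```

Disallow rules match by url-path prefix, so individual files can be listed the same way even when they sit at the root rather than in a subfolder.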
| 11:59 pm on Sep 25, 2008 (gmt 0)|
awesome. Thanks a lot everyone.
| 1:50 am on Sep 26, 2008 (gmt 0)|
So what I'm gathering from this on a major site change is to let the 404 urls die out and control the flow of googlebot by using robots.txt. What can be done with pages that can't be disallowed by robots.txt, such as those that are on the root domain, and not in a subfolder or subdomain?
Particularly if I have a lot of IBL's pointing to some of those pages that will be moved?
| 2:01 am on Sep 26, 2008 (gmt 0)|
If there is a page with a lot of links I still like to redirect that incoming flow.
| 2:37 am on Sep 26, 2008 (gmt 0)|
Personally, I'd use 410 (GONE) with a custom error page for visitors.
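A sketch of how that might look on Apache, assuming mod_rewrite is available and the retired urls share a /old-section/ prefix (both the prefix and the /gone.html page are hypothetical names):

```apache
# .htaccess - return 410 Gone for the retired section
RewriteEngine On
RewriteRule ^old-section/ - [G]

# Show visitors a friendly custom page instead of the bare server error
ErrorDocument 410 /gone.html
```

The [G] flag makes Apache answer with 410 Gone; the ErrorDocument directive controls what human visitors see alongside that status.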
[edited by: Marcia at 2:39 am (utc) on Sep. 26, 2008]
| 2:59 am on Sep 26, 2008 (gmt 0)|
|If there is a page with a lot of links I still like to redirect that incoming flow. |
Absolutely - don't throw away the good landing pages, whether the traffic comes from search, direct links or type-ins. The BEST thing to do is either have content right there (no redirect at all) or redirect to a url that has essentially the same content. This question, though, was about redirecting 11,000 urls to one target url.
I also agree with Marcia that 410 is the most technically correct http status for a url that used to exist. Right now, Google treats 404 and 410 in the exact same fashion. But if you are up to it, 410 is still the clearest signal your server can give.
| 5:18 am on Sep 26, 2008 (gmt 0)|
>>We didn't use to get much traffic on those dynamic pages as it was structured very deep into the website.
So there probably isn't much in the way of PageRank for those pages, and more than likely not much in the way of inbound linking either, but it's a waste to keep getting hammered by bots that keep requesting those 404 urls.
>>So you suggest 301ing all those 11,000 404 pages to one of our main pages which holds numerous top 10 rankings shouldn't be a problem?
I personally would be very uncomfortable redirecting that many to an important page with good rankings, as a "just in case" precaution.
But how about creating a brand new "user friendly" page to 301 redirect those urls to - one that can guide any possible visitors to the important pages on the site? Kind of a transitional mini sitemap page, to stop the 404 activity from going on and on.
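One way to sketch that in Apache (the /old-section/ prefix and the /transition.html target are hypothetical - substitute whatever pattern actually matches the retired urls):

```apache
# .htaccess - send all retired urls, in one pattern, to a
# transitional mini-sitemap page with a permanent redirect
RedirectMatch 301 ^/old-section/.* /transition.html
```

RedirectMatch (from mod_alias) takes a regex, so a single rule can cover all 11,000 urls without listing them individually.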
| 5:40 am on Sep 26, 2008 (gmt 0)|
thanks for all the help people. Marcia, we already have a custom 404 page that users get which guides them to important pages of the site.
should we still make a new "user friendly" page and then 301 all those 11,000 URLs to that page?
can i just block that whole directory in robots.txt so that the bots can't hit em again?
Or should we just give them a 410 code, and not use robots.txt to block them?
| 6:43 am on Sep 26, 2008 (gmt 0)|
>>Marcia, we already have a custom 404 page that users get which guides them to important pages of the site.
That's the point - they're 404's. They're all returning a 404 Page Not Found - which means the bots might be back for them. Really, it's not a sign of a quality site to have that many missing pages. That's why it's flagged in Webmaster Central; 404's don't do crawlers any good, they just waste resources - for the engines, and for webmasters through wasted bandwidth and bloated error logs. A custom 404 page is only for missing pages (404), so it has nothing to do with a 301.
>>should we still make a new "user friendly" page and then 301 all those 11,000 URLs to that page?
I'm "chicken", so that's what I'd do - or maybe a 303 (page replaced), or a 410 if it didn't matter.
A 301 is completely different from a 404: it means the page has moved, not that it's just missing. Actually, the most accurate would be a 303 (see other), which means the page has been replaced by something else, but I haven't heard much mention of that outside the documentation.
This is an old thread, but still a good one, where all 3 (actually, 4) are referenced. Pay particular attention to jdMorgan's comment on 404's:
[BTW, jdMorgan is Apache web server deity, IMHO; to me, his opinions and posts are like webmaster scripture.]
[edited by: Marcia at 6:56 am (utc) on Sep. 26, 2008]
| 7:39 am on Sep 26, 2008 (gmt 0)|
In this case, why would you even want Google to spend crawl budget on those old urls that you're getting rid of - even if you have a brand new url to receive all the 301 redirects? I gave it some more thought and I'm even more sure that a robots.txt disallow rule is the way to go. It can only benefit the good urls that you still have by getting them crawled more frequently.
| 5:22 pm on Sep 26, 2008 (gmt 0)|
Thanks a lot Marcia and tedster for your inputs. It certainly helped us a lot to decide.
WebmasterWorld rules :)