
Google SEO News and Discussion Forum

    
404s - should I use a 301, robots.txt, or leave it alone?
darkroom
6:40 pm on Sep 25, 2008 (gmt 0)

Hey Guys,

We operate a website that was registered back in the 1990s and holds a lot of authority. We have also ranked in the top 10 for 500+ keywords in Google for quite a few years and still hold all of those rankings. But we are in a bit of a dilemma right now, so any help we can get will be greatly appreciated.

We recently (2 months ago) did a complete changeover of our website to a new design, which also included a somewhat new URL structure for some sections (for example, old URL: http://www.example.com/sub-page , new URL: http://www.example.com/folder/sub-page/). Note: the new URL structure does not cover the whole site; I would say about 50% of it is on the new structure. All the proper 301s were in place as well.
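
A move like that maps to a one-line mod_alias rule in .htaccess - a minimal sketch using the example paths above, not our real URLs:

  # Permanently redirect the old flat URL to its new folder-based home
  RedirectPermanent /sub-page http://www.example.com/folder/sub-page/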

As one would expect after a complete site changeover, the site experienced some drops in rankings, but we were back to our regular rankings in a matter of days. We were going over our Google Webmaster Tools account last week and noticed that the number of 404 requests has been growing like crazy. It used to be around 1,000; now it's about 3,000. We went through all those URLs and found that quite a few dynamic URLs were left out when preparing the 301 list. An example of one of those: http://www.example.com/folder/folder/page.aspx?section=blahblah. Our new design didn't include that section, so all of these pages now come up as 404. We have a page in mind that we could 301 them to, but it is one of our most important pages, holding top 10 rankings for numerous competitive keywords.

Now the question I have is: if we redirect all those dynamic pages that used to exist (now 404s) to that one page, how will Google look at it? Can it harm our rankings for that page, since we would be 301ing a bunch of 404s to it?

Here are the options that we have:

a) Leave the 404s as they are
b) 301 them all to that one page (a rough sketch of such a rule follows this list)
c) Disallow that section in robots.txt (Google currently has 11,000 of those dynamic URLs in the index, but there is no cache of them)
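
For reference, option (b) would need mod_rewrite, since plain Redirect directives can't test the query string. A minimal sketch, assuming the example path above and a hypothetical target of /important-page/:

  RewriteEngine On
  # Match the retired dynamic page whenever a section= query is present
  RewriteCond %{QUERY_STRING} ^section=
  # The trailing "?" on the target drops the old query string from the redirect
  RewriteRule ^folder/folder/page\.aspx$ http://www.example.com/important-page/? [R=301,L]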

Any help would be greatly appreciated!

Thanks!
the darkroom.

[edited by: tedster at 6:44 pm (utc) on Sep. 25, 2008]
[edit reason] switch to example.com - it can never be owned [/edit]

 

g1smd
7:21 pm on Sep 25, 2008 (gmt 0)

I am not all that happy about funnelling a lot of URLs to a single page, but there shouldn't be a major problem with it.

Is there any way you can identify the ones that produce the most incoming traffic and cater just for those?
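
Assuming standard Apache combined-format access logs (the log path here is illustrative), a quick tally like this would show which of the old urls still get requested most:

  # Tally hits on the retired dynamic section, busiest urls first
  grep 'page.aspx?section=' /var/log/apache2/access.log \
    | awk '{print $7}' | sort | uniq -c | sort -rn | head -20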

darkroom
7:44 pm on Sep 25, 2008 (gmt 0)

Thanks for the quick reply, g1smd. We never got much traffic on those dynamic pages, as they were buried very deep in the website. So you're suggesting that 301ing all 11,000 of those 404 pages to one of our main pages, which holds numerous top 10 rankings, shouldn't be a problem?

g1smd
7:46 pm on Sep 25, 2008 (gmt 0)

I'd only redirect the ones that are on-topic for where the redirect takes you.

I would wait for Tedster's opinion on this, too.

darkroom
7:59 pm on Sep 25, 2008 (gmt 0)

Thanks g1smd! All eyes on you now, tedster :)

Demaestro
7:59 pm on Sep 25, 2008 (gmt 0)

If you actually moved the content, then a 301 is right... if you removed the content entirely, then a 404 is right.

Adding a sitemap as the content of your 404 page is nice for users.
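
In Apache, that can be as simple as an ErrorDocument pointing at a friendly page (the filename is hypothetical):

  # Serve a custom page (with mini-sitemap content) for missing URLs;
  # because the target is a local path, Apache still sends the 404 status
  ErrorDocument 404 /404.html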

[edited by: Demaestro at 8:03 pm (utc) on Sep. 25, 2008]

tedster
11:42 pm on Sep 25, 2008 (gmt 0)

My advice is to let those urls be 404 and just say goodbye. Trying to squeeze every last drop out of old, dead urls with 301 redirects is problematic, and at a high enough volume it can cause trust issues. The urls are gone - let them return 404/410 and get back to developing and marketing the live content.

A robots.txt disallow in addition might be a good idea - no reason to let Google spend part of its crawl budget looking for urls you have chucked out the window.
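
A minimal robots.txt along those lines, using the example path from earlier in the thread:

  # Stop compliant crawlers from requesting the retired dynamic section
  User-agent: *
  Disallow: /folder/folder/page.aspx

Note that this only stops the crawling - urls already in the index can linger for a while before they drop out.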

darkroom
11:59 pm on Sep 25, 2008 (gmt 0)

Awesome. Thanks a lot, everyone.

youfoundjake
1:50 am on Sep 26, 2008 (gmt 0)

So what I'm gathering from this for a major site change is to let the 404 urls die out and control the flow of googlebot by using robots.txt. What can be done with pages that can't be disallowed as a group in robots.txt, such as those that sit at the root of the domain rather than in a subfolder or subdomain?
Particularly if I have a lot of IBLs (inbound links) pointing to some of those pages that will be moved?

g1smd
2:01 am on Sep 26, 2008 (gmt 0)

If there is a page with a lot of links I still like to redirect that incoming flow.
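
For instance, one targeted one-to-one 301 per well-linked url (both paths hypothetical):

  # One-to-one 301 for an old URL that still earns links and visits
  RedirectPermanent /old-popular-page http://www.example.com/new-equivalent-page/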

Marcia
2:37 am on Sep 26, 2008 (gmt 0)

Personally, I'd use 410 (GONE) with a custom error page for visitors.

[edited by: Marcia at 2:39 am (utc) on Sep. 26, 2008]

tedster
2:59 am on Sep 26, 2008 (gmt 0)

>>If there is a page with a lot of links I still like to redirect that incoming flow.

Absolutely - don't throw away the good landing pages, whether the traffic comes from search, direct links or type-ins. The BEST thing to do is either have content right there (no redirect at all) or redirect to a url that has essentially the same content. This question, though, was about redirecting 11,000 urls to one target url.

I also agree with Marcia that 410 is the most technically correct http status for a url that used to exist but is now permanently gone. Right now, Google treats 404 and 410 in exactly the same fashion, but if you are up to it, 410 is still the clearest signal your server can give.
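
Apache can send that signal with mod_alias's "gone" keyword - a sketch using the thread's example path:

  # Return 410 Gone for the retired section; mod_alias matches the
  # URL path, so every ?section= variant is covered
  Redirect gone /folder/folder/page.aspx
  # Optional: a friendly page for humans, still served with the 410 status
  ErrorDocument 410 /gone.html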

Marcia
5:18 am on Sep 26, 2008 (gmt 0)

>>We never got much traffic on those dynamic pages, as they were buried very deep in the website.

So there probably isn't much in the way of PageRank for those pages, and more than likely not much in the way of inbound linking, but it's a waste to keep getting hammered by bots that keep coming back looking for those missing pages.

>>So you're suggesting that 301ing all 11,000 of those 404 pages to one of our main pages, which holds numerous top 10 rankings, shouldn't be a problem?

I personally would be very uncomfortable redirecting that many URLs to an important page with good rankings.

As a "just in case" precaution, how about creating a brand new "user friendly" page to 301 redirect those URLs to - one that can guide any remaining visitors to the important pages on the site? Kind of a transitional mini-sitemap page, to stop the 404 activity from going on and on.
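
A variation on the earlier rewrite sketch would do it, with a hypothetical /transitional-sitemap.html as the landing page:

  RewriteEngine On
  # Sweep the whole retired section to one transitional mini-sitemap page;
  # the trailing "?" drops the old query string
  RewriteRule ^folder/folder/page\.aspx$ http://www.example.com/transitional-sitemap.html? [R=301,L]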

darkroom
5:40 am on Sep 26, 2008 (gmt 0)

Thanks for all the help, people. Marcia, we already have a custom 404 page that guides users to the important pages of the site.

Should we still make a new "user friendly" page and then 301 all those 11,000 URLs to it?

or

Can we just block that whole directory in robots.txt so that the bots can't hit them again?

or

Just give them a 410 code and do nothing with robots.txt?

Marcia
6:43 am on Sep 26, 2008 (gmt 0)

>>Marcia, we already have a custom 404 page that guides users to the important pages of the site.

That's the point - they're 404s. They're all returning "404 Page Not Found" - which means the bots will be back. Really, it's not a sign of a quality site to have that many missing pages; that's why it gets flagged in Webmaster Central. 404s don't do crawlers any good - they just waste resources, for the engines and for webmasters alike, in wasted bandwidth and bloated error logs. And a custom 404 page is only for missing pages (404), so it has nothing to do with a 301.

>>Should we still make a new "user friendly" page and then 301 all those 11,000 URLs to it?

I'm "chicken" so that's what I'd do - or maybe a 303 page replaced (or 410 if it didn't matter).

A 301 is completely different from a 404: it means that the page has moved, not that it's just missing. But actually, the most accurate status here is a 303 (See Other), which means the page has been replaced by something else - though I haven't heard much mention of that outside the documentation.
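
mod_alias actually has a keyword for that one, too - a sketch with hypothetical paths:

  # "seeother" makes Apache answer with 303 See Other
  Redirect seeother /old-page http://www.example.com/replacement-page/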

This is an old thread, but still a good one, where all 3 (actually, 4) are referenced. Pay particular attention to jdMorgan's comments on 404s:

[webmasterworld.com...]

[BTW, jdMorgan is an Apache web server deity, IMHO; to me, his opinions and posts are like webmaster scripture.]

[edited by: Marcia at 6:56 am (utc) on Sep. 26, 2008]

tedster
7:39 am on Sep 26, 2008 (gmt 0)

In this case, why would you even want Google to spend crawl budget on old urls that you're getting rid of - even if you have a brand new url to receive all the 301 redirects? I gave it some more thought, and I'm even more sure that a robots.txt disallow rule is the way to go. It can only benefit the good urls you still have, by getting them crawled more frequently.

darkroom
5:22 pm on Sep 26, 2008 (gmt 0)

Thanks a lot, Marcia and tedster, for your input. It certainly helped us decide.

WebmasterWorld rules :)
