|Ethical 301 redirect or not|
Redirecting thousands of deprecated URLs to one page
I have never been able to get a straight answer on this one.
We have thousands of deprecated URLs, no longer in use, and no longer serving a 200 status on the server.
These URLs are still appearing in the main index and I want them removed.
* We were able to redirect all of these URLs (there are about 400 of them) to a single index page: not the main site index page, but a category index page that then drills down into subcategory pages.
* All pages removed were in the root/preferred domain
* All redirects were to a page which exists in the root/preferred domain, so there is no redirecting to another domain.
I am obviously concerned that hundreds of redirects to a single page would be considered a black hat/doorway page routine. Of course, this is not the intent, and it's all part of a site cleanup. But is this an ethical use of a 301, or not? If not, what else can be done?
I had a similar situation a while back. Here's what I did.
I went through my logs to determine if any of the deprecated pages were getting any referrals, not so much from search engines, but from natural links.
If a page was getting referrals, I would look at the context of the links, and find an appropriate existing page to do the 301 redirect to (not necessarily the root page).
If the page wasn't getting any referrals, but I was able to determine that there were a number of external links to it, I would also do the 301 to the appropriate page.
For the remainder, I returned a 404 response with a custom error page / mini site map.
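The triage above could be sketched roughly like this in Python. This is only an illustration of the idea, not the script I actually used: the log format (pairs of requested path and referrer), the host names, and the `replacements` map are all assumptions made up for the example.

```python
from collections import Counter
from urllib.parse import urlparse


def count_external_referrals(log_entries, own_host, ignore_hosts=()):
    """Count hits per path that arrived via a link on some other host.

    log_entries is an iterable of (path, referrer) pairs. Referrers from
    own_host, from hosts in ignore_hosts (e.g. search engines), or empty
    referrers are not counted.
    """
    counts = Counter()
    for path, referrer in log_entries:
        ref_host = urlparse(referrer).netloc
        if ref_host and ref_host != own_host and ref_host not in ignore_hosts:
            counts[path] += 1
    return counts


def triage(deprecated_paths, referral_counts, known_backlinks, replacements):
    """Return {path: (status, target)} for each deprecated path.

    A path gets a 301 only if it has referrals or known external links AND
    a relevant replacement page exists; everything else gets a 404.
    """
    plan = {}
    for path in deprecated_paths:
        has_links = referral_counts.get(path, 0) > 0 or path in known_backlinks
        target = replacements.get(path)
        if has_links and target:
            plan[path] = (301, target)
        else:
            plan[path] = (404, None)
    return plan
```

The `replacements` map is the part that needs a human: it should point each old URL at the page that best matches the context of the incoming links, not at one catch-all page.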
If your pages are still in the main index after all this time, are you sure that there are no links somewhere on your site still pointing to them? I've seen cases where a delinked page that was left on the server was still being spidered by the SEs, and causing problems.
*I have never been able to get a straight answer on this one.*
Neither have I, but suspect it's down to nobody knowing for sure.
Using 301s is too obvious a promotional method to have been ignored by G IMO, but I've only seen one "official" mention of it, in a recent post by MC where he says non-relevant links from 301s can be "dangerous".
Where this leaves relevant links from a network of on-topic 301ed domains is anyone's guess.
Google is getting better at cleaning out the supplementals index - and if the pages are that old, they should always appear below your current pages.
Unless you do have some links still working, as suggested above.
Personally, rather than clutter your server with loads of unnecessary 301s, all pointing to one page, why not save that one page as your 404? Less work, less clutter, future proof.
But do check your site navigation - xenu is your friend. And if you have a database site that throws up multiple URLs, robots.txt will enable you to avoid reoccurrence. Clone pages with unique URLs are often part of the problem.
FWIW, I doubt the 301s will either help or hinder your site in SEO terms. Strictly neutral, and therefore of no interest to Google either way.
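On the navigation check: Xenu does this properly by crawling the live site, but the basic idea can be sketched in a few lines of Python if you already have the HTML of your pages to hand. Everything here (the `pages` dictionary, the paths, the `own_host` parameter) is a made-up example, and it only looks at `<a href>` links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pages_linking_to(pages, retired_paths, own_host=""):
    """Return {page_path: [retired paths that page still links to]}.

    pages maps a page's path to its HTML source. Links to hosts other
    than own_host are skipped.
    """
    offenders = {}
    for page_path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        hits = set()
        for href in collector.hrefs:
            # Resolve relative links against the page they appear on.
            parts = urlsplit(urljoin(page_path, href))
            if parts.netloc and parts.netloc != own_host:
                continue
            if parts.path in retired_paths:
                hits.add(parts.path)
        if hits:
            offenders[page_path] = sorted(hits)
    return offenders
```

Any page that shows up in the result is still feeding the spiders a retired URL and needs its navigation fixed.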
> Where this leaves relevant links from a network of on-topic 301ed domains is anyone's guess.
It leaves 301 redirects as the proper thing to do with a URL that is no longer valid. This is part of the HTTP protocol [w3.org], and the major SEs are not going to "penalize" things at that level without strong analysis to confirm inappropriate use.
In simple terms, using a 301 to redirect a dead URL to its logical replacement is the correct thing to do. However, killing off thousands of URLs is the wrong thing to do, and should be avoided. With today's server-side technology (e.g. mod_rewrite, ISAPI Rewrite, MySQL), there's little excuse to ever have to kill off or change a URL [w3.org], and well-managed sites don't do it.
To put this in perspective, assume that the major search engines see the Web as a public library, and not as a weekend bookseller's stall at a flea market. If your site has indexable documents always popping in and out, it's a flea market, and they can't be blamed for giving it less emphasis than they would to a well-maintained library. Although the Web has moved from a primarily research and academic focus to a more-commercial focus, the SEs still see the former as the 'ideal,' and only tolerate the latter.
jonrichd's plan is a good one.
Added quote for context.
[edited by: jdMorgan at 1:49 pm (utc) on Jan. 17, 2007]
Are you still getting human traffic for those pages? If you are, a 301 is good.
If it's only spider traffic, you might want to block via robots.txt.
*It leaves 301 redirects as the proper thing to do with a URL that is no longer valid.*
Right but I was talking about artificially inflating a site through use of multiple domain 301s, would that simply fall into the "links scheme" category?
You would think it falls into a similar category as link schemes. But 301 redirects are not links, and for the moment I do see some 301 networks getting results when the topic/theme/keywords involved are in sync. I also wonder about some of the new drastic changes this Dec/Jan and whether this might be one of the factors under Google attack.
For me, the best practice is pretty much what JD said - if there's a true replacement page or domain, then 301. Otherwise 404.
That would depend on how many domain and subdomain variants were redirected and whether you tried to actively promote all of them. If you've got www- and non-www of the .com and .co.uk redirected to one of those domains as the canonical, I wouldn't give it a second thought.
It's quite reasonable to assume that at some level the SEs don't appreciate Webmasters wasting their bandwidth, CPU time, and disk space on a bunch of duplicates. On the other hand, I'm not a big believer in "penalties" -- I'd suspect the duplicates would be filtered (ignored), ranked into obscurity because of PR/link-pop splitting across all those domains, or simply dropped into G's Supplemental index.
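For the harmless case mentioned above (www and non-www variants of a couple of domains folded into one canonical host), the logic amounts to something like the following. This is a minimal sketch in Python rather than the mod_rewrite rule you would normally use, and the host names are placeholders.

```python
# Placeholder host names for illustration only.
CANONICAL_HOST = "www.example.com"
VARIANT_HOSTS = {"example.com", "www.example.co.uk", "example.co.uk"}


def canonical_redirect(host, path_and_query):
    """Return (301, location) if the request came in on a variant host.

    Returns None when the host is already canonical (or unknown), in which
    case the request should be served normally. The path and query string
    are preserved, so each old URL maps to its exact counterpart.
    """
    host = host.lower().split(":")[0]  # ignore case and any port number
    if host in VARIANT_HOSTS:
        return 301, f"http://{CANONICAL_HOST}{path_and_query}"
    return None
```

The point is that each URL on a variant host goes to the same path on the canonical host, which is quite different from pointing hundreds of unrelated URLs at one page.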
*You would think it falls into a similar category as link schemes.*
Deliberately creating sites to garner links that would later be redirected to the "real" site through 301s is obviously a links scheme, but I've never seen a "Rankings down in G, could it be my multiple 301s?" post.
I'm not sure why Google would penalize an action that may be sloppy and is usually unnecessary (better housekeeping is a much more elegant solution), but does you no good, and others no harm.
They have enough to do already, I suspect ;)
*.. does you no good..*
301s are supposed to pass on PR/link juice from IBLs. G has to have looked at the potential abuse of 301s when deciding they should transfer PR/value, don't you think?
This is the way I look at it: if the old URLs are not getting any human traffic, 404 them, block them via robots.txt and move on. They will be deindexed.
Remember, make your website for humans, not search engines. PR really does not apply to SERPs so people really need to stop worrying about it.
Hundreds of 301s could throw some flags up at Google. Why take the chance, especially if there are no human visitors hitting the pages?
You really have to look at the traffic that is going to those pages.
|PR really does not apply to SERPs so people really need to stop worrying about it. |
I'm sure you've overstated that one, trinorth - let's call it poetic license or hyperbole. Real-time PR (not the toolbar report) has a definite impact on SERPs as one of many factors. But I agree that people are much too obsessed about their toolbar greenies.
[edited by: tedster at 6:32 pm (utc) on Jan. 18, 2007]
In the original poster's case, where he is taking hundreds of URLs, I do not think that the PR would be worth potentially damaging his site's trust rank...
Especially if the pages are not getting human traffic via search.
I completely agree with that. I just consulted on a major redevelopment where every URL changed except for the home page. We spotted about 70 critical URLs that needed a 301 and all the rest we let go 404. Their rankings for the site never showed even a little hiccup, and within a few weeks the improvements started showing up.
So is there such a thing as 301 "abuse"?
Thanks for the great discussion. It's really appreciated.
Our case is more one that required an immediate call to action, where URLs needed to be deprecated. If it was a standard site cleanup, I agree that if there is human traffic, 301 'em, and if it's just bot traffic, then serve up the 404 error page.
In our case, our content management system was cloning pages, such that widgets.html also existed as widgets.html?page=1, widgets.html?page=2, etc., and thousands of pages were being indexed while the meta descriptions, titles, and most of the other content on the page were exactly the same for page=1, page=2, page=3, etc. Yes, I know that sounds ugly. Not only did these pages need to be deprecated, so did our CMS.
I was considering writing a function that would 301 all query string pages, so that widgets.html?page=1, page=2, etc. would 301 to widgets.html. In the end, just to be safe, I ended up just serving up the 404 page for the thousands of indexed query string pages. I then ran Xenu to make sure there were no links to any of these pages still. Thanks.
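For anyone curious, the function I had in mind (and did not deploy) would have looked something like this. It is a rough Python sketch; the parameter name `page` comes from our CMS, and the function name is just for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_page_param(url, param="page"):
    """Return (301, clean_url) if url carries the cloning parameter.

    Only the named parameter is removed; any other query parameters are
    kept. Returns None when the parameter is absent, meaning no redirect
    is needed.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key != param]
    if len(kept) == len(pairs):
        return None  # nothing to strip
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return 301, clean
```

So widgets.html?page=2 would map back to widgets.html, which is at least a one-to-one redirect to the real page rather than a pile of URLs pointing at an unrelated index.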
|For me, the best practice is pretty much what JD said - if there's a true replacement page or domain, then 301. Otherwise 404. |
tedster, jd, have you done anything recently with 410s? Or, are the SEs handling 410s the same way as 404s?
I know if I had the opportunity to keep that bot from requesting that URI again in the future and not wasting valuable resources, I'd surely jump on the bandwagon and strongly advocate the use of 410 over 404 for URIs that are Gone.
There is an inherent problem with serving a 404 for a URI that you know is Gone. The bot will continue to request that URI for quite some time.
>> These URLs are still appearing in the main index and I want them removed. <<
If the topic has moved, then a 301 is the right thing to do.
If it has gone then 404 or 410 should be used, serving a custom error page that contains site navigation to get the user on their way again.
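As a rough illustration of that rule, here is a minimal WSGI sketch in Python. The `MOVED` and `GONE` tables and the error page markup are invented for the example, and it only handles retired URLs; a real site would route its live pages first and fall through to something like this.

```python
# Hypothetical lookup tables for the example.
MOVED = {"/old-widgets.html": "/widgets.html"}  # topic moved: 301 to new home
GONE = {"/discontinued.html"}                   # removed on purpose: 410

# Custom error page with site navigation, served for both 404 and 410.
ERROR_PAGE = (
    b"<html><body><h1>Page not available</h1>"
    b'<p><a href="/">Home</a> | <a href="/sitemap.html">Site map</a></p>'
    b"</body></html>"
)


def app(environ, start_response):
    """Answer a request for a URL that is no longer served."""
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        start_response(
            "301 Moved Permanently",
            [("Location", MOVED[path]), ("Content-Length", "0")],
        )
        return [b""]
    # Known-gone URLs get 410; anything else unknown gets a plain 404.
    status = "410 Gone" if path in GONE else "404 Not Found"
    start_response(
        status,
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(ERROR_PAGE))),
        ],
    )
    return [ERROR_PAGE]
```

The important detail is that the error page goes out with the real 404 or 410 status, not a 200, so the navigation helps the human visitor while the status code tells the spider the URL is dead.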
You could leave the linking structure like it is and use the noindex, nofollow meta tag. That would clean them up as well.
We recently did that to a sort-by-price function that was causing us issues but was still necessary for the site's customers.
> tedster, jd, have you done anything recently with 410s? Or, are the SEs handling 410s the same way as 404s?
I've always used 410-Gone for pages which had to be removed and had no logical replacement. This is a very rare occurrence, as I subscribe to Tim Berners-Lee's philosophy that "Cool URIs Don't Change." So, in eleven years on the Web, I've 410'ed under two dozen URLs.
As far as any strong evidence that the SEs treat 410-Gone any differently from 404-Not Found, I don't have any. But according to the HTTP/1.1 protocol, it's the right thing to do, so if and when they support it properly, my sites are ready. In the meantime, they may treat it as a 404, or perhaps as a generic 400 error -- I have certainly not had any problems using it.
About 404 vs. 410:
I have a forum that is constantly being used for spam posting, so I delete the offending posts and 410 their URLs, because they're really, really gone after I've deleted them. I have noticed that my Google WMT account counts 410s as HTTP errors and 404s as a (slightly less problematic?) separate category. So I asked Google if this is a potential problem and, amazingly, I got a human reply stating that Google treats 410 and 404 as the same response. Now, that's still confusing because they ARE different HTTP responses, each serving a different purpose. Also, I know that Y! Slurp is notorious for coming back for 404-ed URLs literally years after they've gone, so I use 410 in the hope of saving some Slurp bandwidth. I'm not sure it's helping, but at least there's hope.
Anyways, back to the subject: as far as Google is concerned, I think it's safe to assume they treat 404 and 410 exactly the same way.
Hit hard by the so-called Google 950 penalty [webmasterworld.com], I took a look at the potential causes. The first thing that jumped out at me was that I had several 301 directives for defunct webpages pointing at my index page. Over the last few years, I just kept adding. I even added a few non-existent page names that were often tried by users, just to keep them landing on my index page - my bad.
I removed the 301s, and after a couple days my site returned to normal ranking for all terms associated. Make of it what you will.
*Make of it what you will.*
Couple of days seems a bit quick if there was a penalty....