Forum Moderators: Robert Charlton & goodroi
* We were able to redirect all of these URLs (there are about 400 of them) to a single index page (not the main site index page, but a category index page that then drills down into subcategory pages).
* All pages removed were in the root/preferred domain
* All redirects were to a page which exists in the root/preferred domain, so there is no redirecting to another domain.
I am obviously concerned that hundreds of redirects to a single page would be considered a black hat/doorway page routine. Of course, this is not the intent, and it's all part of a site clean-up. But is this an ethical use of a 301, or not? If not, what else can be done?
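For reference, a blanket redirect like the one described is usually just a single mod_alias rule. A rough sketch only, with made-up paths (the /old-widgets/ prefix and the /widgets/ target are placeholders, not the poster's actual URLs):

  # Hypothetical example: send every retired URL under /old-widgets/
  # to the category index page with a permanent (301) redirect
  RedirectMatch 301 ^/old-widgets/ http://www.example.com/widgets/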
I went through my logs to determine if any of the deprecated pages were getting any referrals, not so much from search engines, but from natural links.
If a page was getting referrals, I would look at the context of the links, and find an appropriate existing page to do the 301 redirect to (not necessarily the root page).
If the page wasn't getting any referrals, but I was able to determine that there were a number of external links to it, I would also do the 301 to the appropriate page.
For the remainder, I returned a 404 response with a custom error page / mini site map.
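In .htaccess terms, that plan boils down to a handful of one-off 301s plus a custom error document. A rough sketch, assuming a mod_alias setup and invented page names:

  # One-off 301s for deprecated pages that still get referrals or have external links
  Redirect 301 /blue-widgets-2004.html /widgets/blue/
  Redirect 301 /old-pricelist.html /widgets/prices.html

  # Everything else returns a 404 with a custom error page / mini site map
  ErrorDocument 404 /notfound.html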
If your pages are still in the main index after all this time, are you sure that there are no links somewhere on your site still pointing to them? I've seen cases where a delinked page that was left on the server was still being spidered by the SEs, and causing problems.
Neither have I, but I suspect that's down to nobody knowing for sure.
Using 301s is too obvious a promotional method to have been ignored by G IMO, but I've only seen one "official" mention of it, in a recent post by MC where he says non-relevant links from 301s can be "dangerous".
Where this leaves relevant links from a network of on-topic 301'ed domains is anyone's guess.
Unless you do have some links still working, as suggested above.
Personally, rather than clutter your server with loads of unnecessary 301s, all pointing to one page, why not serve that one page as your custom 404? Less work, less clutter, future-proof.
But do check your site navigation - Xenu is your friend. And if you have a database site that throws up multiple URLs, robots.txt will enable you to avoid a recurrence. Clone pages with unique URLs are often part of the problem.
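To illustrate that suggestion (hypothetical paths; the wildcard Disallow pattern is understood by Google but not by every crawler):

  # .htaccess - serve that one page as the custom 404
  ErrorDocument 404 /widgets/index.html

  # robots.txt - keep the database-generated duplicate URLs out of the crawl
  User-agent: *
  Disallow: /search/
  Disallow: /*?page=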
FWIW, I doubt the 301s will either help or hinder your site in SEO terms. Strictly neutral, and therefore of no interest to Google either way.
It leaves 301 redirects as the proper thing to do with a URL that is no longer valid. This is part of the HTTP protocol [w3.org], and the major SEs are not going to "penalize" things at that level without strong analysis to confirm inappropriate use.
In simple terms, using a 301 to redirect a dead URL to its logical replacement is the correct thing to do. However, killing off thousands of URLs is the wrong thing to do, and should be avoided. With today's server-side technology (e.g. mod_rewrite, ISAPI Rewrite, MySQL), there's little excuse to ever have to kill off or change a URL [w3.org], and well-managed sites don't do it.
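As an illustration of that point, a single mod_rewrite rule can keep a long-established URL alive even after the content behind it moves into a database; the public URL never changes, only the internal mapping. A sketch with invented names, not anyone's actual setup:

  RewriteEngine On
  # Internally map the original static-looking URL onto the new script,
  # so the URL that visitors and SEs already know never has to change
  RewriteRule ^widgets/([a-z0-9-]+)\.html$ /catalog.php?item=$1 [L]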
To put this in perspective, assume that the major search engines see the Web as a public library, and not as a weekend bookseller's stall at a flea market. If your site has indexable documents always popping in and out, it's a flea market, and they can't be blamed for giving it less emphasis than they would to a well-maintained library. Although the Web has moved from a primarily research and academic focus to a more commercial focus, the SEs still see the former as the 'ideal,' and only tolerate the latter.
jonrichd's plan is a good one.
Jim
For me, the best practice is pretty much what JD said - if there's a true replacement page or domain, then 301. Otherwise 404.
It's quite reasonable to assume that at some level the SEs don't appreciate Webmasters wasting their bandwidth, CPU time, and disk space on a bunch of duplicates. On the other hand, I'm not a big believer in "penalties" -- I'd suspect the duplicates would be filtered (ignored), ranked into obscurity because of PR/link-pop splitting across all those domains, or simply dropped into G's Supplemental index.
Jim
Remember, make your website for humans, not search engines. PR really does not apply to serps so people really need to stop worrying about it.
Hundreds of 301s could throw up some flags at Google. Why take the chance, especially if there are no human visitors hitting the pages?
You really have to look at the traffic that is going to those pages.
PR really does not apply to serps so people really need to stop worrying about it.
I'm sure you've overstated that one, trinorth - let's call it poetic license or hyperbole. Real-time PR (not the toolbar report) has a definite impact on SERPs as one of many factors. But I agree that people are much too obsessed about their toolbar greenies.
In our case, our content management system was cloning pages, such that widgets.html also existed as widgets.html?page=1, ?page=2, etc., and thousands of pages were being indexed while the meta descriptions, titles, and most of the other content on the page were exactly the same for page=1, page=2, page=3, and so on. Yes, I know that sounds ugly. Not only did these pages need to be deprecated, so did our CMS.
I was considering writing a function that would 301 all of the query-string pages, so that widgets.html?page=1, ?page=2, etc. would 301 to widgets.html. In the end, just to be safe, I served up the 404 page for the thousands of indexed query-string pages. I then ran Xenu to make sure there were no links still pointing to any of these pages. Thanks.
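For the record, the query-string 301 described above would look roughly like this in mod_rewrite (a sketch only; widgets.html and the page= parameter name are taken from the example above, not from any tested configuration):

  RewriteEngine On
  # 301 any ?page=N variant of widgets.html back to the plain URL;
  # the trailing "?" in the target strips the query string
  RewriteCond %{QUERY_STRING} ^page=[0-9]+$
  RewriteRule ^widgets\.html$ /widgets.html? [R=301,L]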
For me, the best practice is pretty much what JD said - if there's a true replacement page or domain, then 301. Otherwise 404.
tedster, jd, have you done anything recently with 410s? Or are the SEs handling 410s the same way as 404s?
I know if I had the opportunity to keep that bot from requesting that URI again in the future and not wasting valuable resources, I'd surely jump on the bandwagon and strongly advocate the use of 410 over 404 for URIs that are Gone.
There is an inherent problem with serving a 404 for a URI that you know is Gone. The bot will continue to request that URI for quite some time.
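Serving the 410 is straightforward in Apache, either with mod_alias or with mod_rewrite's [G] flag. A sketch with made-up paths:

  # mod_alias: mark a single retired URL as Gone (410)
  Redirect gone /old-report-2003.html

  # mod_rewrite: mark a whole retired section as Gone
  RewriteEngine On
  RewriteRule ^discontinued/ - [G,L]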
I've always used 410-Gone for pages which had to be removed and had no logical replacement. This is a very rare occurrence, as I subscribe to Tim Berners-Lee's philosophy that "Cool URIs Don't Change." So, in eleven years on the Web, I've 410'ed under two dozen URLs.
As far as any strong evidence that the SEs treat 410-Gone any differently from 404-Not Found, I don't have any. But according to the HTTP/1.1 protocol, it's the right thing to do, so if and when they support it properly, my sites are ready. In the meantime, they may treat it as a 404, or perhaps as a generic 400 error -- I have certainly not had any problems using it.
Jim
I have a forum that is constantly being used for spam posting, so I delete the offending posts and 410 their URLs, because they really are gone after I've deleted them. I have noticed that my Google WMT account counts 410s as HTTP errors and 404s as a (slightly less problematic?) separate category. So I asked Google if this is a potential problem and, amazingly, I got a human reply stating that Google treats 410 and 404 as the same response. Now, that's still confusing, because they ARE different HTTP responses, each serving a different purpose. Also, I know that Y! Slurp is notorious for coming back for 404'ed URLs literally years after they've gone, and so I use 410 in the hope of saving some Slurp bandwidth. I'm not sure it's helping, but at least there's hope.
Anyways, back to the subject: as far as Google is concerned, I think it's safe to assume they treat 404 and 410 exactly the same way.
Hit hard by the so-called Google 950 penalty [webmasterworld.com], I took a look at the potential causes. The first thing that jumped out at me was that I had several 301 directives for defunct webpages pointing at my index page. Over the last few years, I just kept adding them. I even added a few non-existent page names that were often tried by users, just to keep those visitors landing on my index page - my bad.
I removed the 301s, and after a couple days my site returned to normal ranking for all terms associated. Make of it what you will.