301s will also occur if you redirect www. to non-www or vice versa, so the ones you are seeing are not necessarily just because of your page redirects.
It should be easy to tell from your raw (full-text) HTTP server logs where the requests are coming from. The well-known robots use user-agent strings that clearly identify themselves, and if in doubt, you can look up the IP addresses at a WhoIs service, which will show them to be from Google or Yahoo or wherever.
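As a rough illustration, here is a minimal Python sketch of that kind of check. The user-agent substrings and hostname suffixes in the table are illustrative assumptions, not an official list; the reverse-DNS lookup is the usual way to confirm an IP really belongs to the crawler it claims to be.

```python
import socket

# Illustrative table only - not an official or exhaustive list.
# Maps a user-agent substring to the hostname suffixes that the
# crawler's IPs should reverse-resolve to.
KNOWN_BOTS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
    "slurp": (".crawl.yahoo.net",),  # Yahoo! Slurp
}

def bot_name(user_agent):
    """Return the crawler key if the user-agent identifies a well-known robot."""
    ua = user_agent.lower()
    for name in KNOWN_BOTS:
        if name in ua:
            return name
    return None

def ip_matches_bot(name, ip):
    """Reverse-resolve the IP and check the hostname suffix (network call)."""
    try:
        host = socket.gethostbyaddr(ip)[0].lower()
    except OSError:
        return False
    return host.endswith(KNOWN_BOTS[name])
```

Anything whose IP doesn't reverse-resolve to the expected domain is just pretending to be that bot.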
The way to indicate that a page does not exist anymore (and to force the page out of search engine indexes) is a 410 rather than a 404. But I personally would not return either a 404 or a 410 for any page that has been renamed or moved (such that the old URL *does* have a new location) and whose ranking you are concerned about. And certainly not until you are sure where those requests are coming from.
If the page is moved, the correct response is 301, no matter how long it takes others to catch on.
I have returned 410 for a couple of pages that I either moved or collapsed into other pages, but only after a period of returning 301, and I *didn't care* about SEO or whether the PageRank of the page would be transferred to the new page.
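For what it's worth, that 301-for-a-while-then-410 approach can be sketched roughly like this. The table entries and the 180-day window are made-up illustrations, not recommendations:

```python
from datetime import date

# Hypothetical data: maps a retired URL to the date it was retired
# and an interim 301 target; the window length is arbitrary.
RETIRED = {"/old-article": (date(2010, 1, 15), "/articles/")}
WINDOW_DAYS = 180

def status_for(path, today):
    """Keep the 301 during a transition window, then switch to 410.

    Unknown URLs (never existed here) get a plain 404.
    """
    entry = RETIRED.get(path)
    if entry is None:
        return 404, None
    retired_on, interim_target = entry
    if (today - retired_on).days <= WINDOW_DAYS:
        return 301, interim_target
    return 410, None
```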
In addition, what if those incoming requests are referrals from other sites whose links point to your old page URLs? Currently they get redirected to the correct new page. If you do a 404 or 410, those visitors won't reach the correct page, and the webmasters, if they bother to check their outgoing links at all, will probably just delete the links rather than try to track down your new corresponding page name.
Thanks for the response Steve.
We actually took over the domain from the previous company that was operating it. A lot of the old content does not have any suitable replacement, so it is 301 redirected to the homepage. Do you think this could be a problem?
I think it could be. The home page is not a new version of the old URL - that is, the content has not "moved", it's just gone. I know of several websites that got into ranking trouble by 301 redirecting many pages to home (or a top level page of some kind) instead of returning a 404 or 410.
It doesn't pay to squeeze on the PR too tightly. I'd say look for important backlinks that point to problematic pages and create appropriate content at the same URL - even if it's just an explanation about the change to the website.
|I know of several websites that got into ranking trouble by 301 redirecting many pages to home (or a top level page of some kind) instead of returning a 404 or 410. |
I have to ask you, how confident are you in that assessment? 50% confident? 75% confident? 100% confident?
I have about two or three dozen product pages where the product has been discontinued, so they 301 to the main category page for that product.
If you are certain of this, then I know what I will be doing this weekend...
I'd agree with Tedster. Redirecting multiple old URLs to root is not a good thing to do.
You want a figure? Hard to say, but I'd say 90% or more.
Does anybody else want to weigh in on this?
The reason that 301 to the Home Page (instead of 404) can cause trouble is pretty straightforward - it's technical deception.
A 301 says that the content has "Moved Permanently". But in the case of removed content, it hasn't moved, it's just plain old not available any more. I've seen Webmaster Tools warnings about too many 301s of this kind.
With several variations, this kind of thing has even been a part of those practices that Google considers "black hat". This history (along with poor technical execution) means Google needs to trust-check every 301 before its effect is allowed into the rankings.
Thank you for the clarification.
|I'd say look for important backlinks that point to problematic pages and create appropriate content at the same URL - even if it's just an explanation about the change to the website. |
Just to clarify, do you recommend returning a 200 OK for URLs you don't want to handle, instead of a 301? Because I find that more deceptive than a standard 301.
Technically, you would think a 301 would force the spider to update its index with the target URL ASAP. But as you know, what actually happens is that an old or even non-existent URL keeps getting accessed indefinitely. All it takes is another site that posts a link to your site.
So if I assume the spider doesn't keep a record of invalid URLs, which seems logical to me, it keeps accessing the same URL because it is listed someplace else, even outside the domain. And that, I believe, is the problem.
Instead, the spiders should only access, list and update URLs in their index that are found inside the domain itself - in other words, URLs discovered starting from the root of the domain. This may be happening to an extent with their index, but not with the access or update.
|do you recommend to return a 200 OK for URLs you don't want to handle instead of a 301 |
Not exactly - I meant that if you are trying to preserve that backlink's equity then you should have content related to the original content for that URL. Then you can return a 200 OK. In some cases, it is conceivable that the home page could serve that function, but in my experience that's a rare situation.
If you put yourself in the place of a user who clicks on the link, I think that communicates my idea a bit more clearly. Such a user would not be well served by ending up on a page that doesn't relate to what the link promised. And if a site routinely serves many 301s, pointing to the same target page for diverse backlinks, that's the situation where I sometimes see ranking problems coming up.
olly, are you using GWT? If so, is GWT reporting any Soft 404s?
Google Displaying 'Soft 404' Errors in Webmaster Tools
2010-06-06 - [webmasterworld.com...]
|I meant that if you are trying to preserve that backlink's equity then you should have content related to the original content for that URL |
Yes, but I think the issue, from what I understand, is that the URLs were removed years ago and yet the spiders insist on accessing them today, and presumably will for years to come. And IMO this usually happens because the URLs are posted outside the domain. I see it on my site too, btw.
The best I have come up with is to dynamically try to locate the closest existing URL that matches the request and 301 there instead of to the root. The original request won't always be relevant to the redirect target, but it may work for some cases.
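A minimal sketch of that closest-match idea, using Python's `difflib` as the similarity measure; the URL list and the 0.6 cutoff are illustrative assumptions, not part of the original post:

```python
from difflib import get_close_matches

# Hypothetical stand-in for the site's real list of valid URLs.
VALID_URLS = [
    "/widgets/blue-widget",
    "/widgets/red-widget",
    "/gadgets/overview",
]

def redirect_target(requested_path, cutoff=0.6):
    """Return (301, closest_url) when a near-match exists, else (404, None).

    Below the similarity cutoff, a plain 404 is safer than
    redirecting an unrelated request to the root.
    """
    matches = get_close_matches(requested_path, VALID_URLS, n=1, cutoff=cutoff)
    if matches:
        return 301, matches[0]
    return 404, None
```

The cutoff is the important knob: set it too low and you are back to redirecting unrelated requests somewhere arbitrary, which is the problem being discussed.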
If I were returning a 404, that wouldn't change anything in the spider's index. The next time it crawls the same external site's page, it will come in again and get another 404.
The result of all this is that the spider wastes bandwidth from me on the one hand and, on the other, wastes its resources when instead it could crawl a valid page.
|wastes its resources instead, it could crawl a valid page |
That does seem like a reasonable assumption - but I haven't seen the crawl budget work that way. And a 404 doesn't use up much bandwidth anyway.
|The best I have come up with is to dynamically try and locate the closest URL that matches the request and redirect there with a 301 |
That sounds good to me.
I have 301s that are more than 6 years old.
Once you put up a 301, don't change it. It's called *permanent* redirect for a reason ;)
|The result of all this is the spider wastes b/w from me on the one hand and on the other, wastes its resources instead, it could crawl a valid page. |
I gotta' ask...
Is there a way to make it into a "valid" page?
By that, I mean making it into a page that will somehow help you generate revenue?
I have some old (discontinued) products on my ecommerce site and used to have them redirect to the home page (which has been pointed out is NOT something you should do). So I am going to redirect them to a "product archive" page (which would be the same as the original page except that it clearly states the item is no longer available) and then add links to the most relevant products that ARE still available.
Agreed, it would be better if google spent its resources crawling only the pages we want it to, but on the other hand, if the page still has the potential to bring in visitors, I think there might be a way to leverage that.
|I know of several websites that got into ranking trouble by 301 redirecting many pages to home (or a top level page of some kind) instead of returning a 404 or 410 |
Have you seen any successful remedial steps taken, or is it just a case of waiting?
One suggestion for reversal that I've heard is to steadily remove the 301's and replace them with 404's, but not all at once. Thoughts?
[edited by: Whitey at 3:11 am (utc) on Aug 17, 2010]
The sites that I know of that started ranking again undid the excessive 301 redirects.
What's the prognosis for those that do nothing, and is a reconsideration request another avenue?
I've never heard of success in that situation, Whitey. Doesn't mean it hasn't happened, but I never heard of it.
Reconsideration requests tend to be successful more often when you've made real changes. The other successful situation is when there was a false positive penalty.
If you've got a pile of 301s pointing to the same page from diverse former URLs, I'd just let them all go 404. Do you have a true 404 response set up for bogus URLs, or does every not found url 301? That's the worst situation to be in. A savvy competitor can knock your rankings for a loop.
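A quick way to test which situation you're in is to request a URL that cannot possibly exist on the site and see what status comes back. A sketch, with the fetcher passed in as a callable (e.g. a thin wrapper around `urllib.request`) so it isn't tied to any particular HTTP library:

```python
import uuid

def has_true_404(fetch_status, base_path="/"):
    """Probe the site with a URL that cannot exist and report whether it 404s.

    fetch_status is any callable that takes a path and returns the HTTP
    status code. If a clearly bogus URL comes back 301 (or 200), every
    not-found URL is probably being redirected - the risky configuration
    described above.
    """
    bogus = base_path + "does-not-exist-" + uuid.uuid4().hex
    return fetch_status(bogus) == 404
```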
Can you explain - I've not heard of this phrase.
|Do you have a true 404 response set up for bogus URLs, or does every not found url 301 |
As far as I know, all old URLs were either 404'd or 301'd only. What concerns me is that some old sites had all their URLs either 301'd to the root domain or, in some redirected domains, to a subdirectory URL. I'll have it checked out though.
A false positive - as in a heuristic for the algo tries to catch some practice in order to automate a type of penalty. But sometimes the heuristic test flags a few sites by accident that were not actually doing what the heuristic was trying to catch.
For a classic example from the early days of search: a page might have a white background and also use white text inside a div somewhere. That div happens to have a dark colored background image, so the text is not actually hidden from visitors. But in the early days of search, attempts to catch hidden text in an automated fashion sometimes flagged this kind of thing as hidden text - and that is a false positive.
There was a publicized case on the Google Forums last year where someone got penalized because Google's page layout simulation showed "too much white space" - but that assumed white space was actually filled by an iframe.
|Redirecting multiple old URLs to root is not a good thing to do |
So would you say it's also not a good idea to redirect them all to a single sub-folder?
I personally would redirect pages that
(a) no longer exist and have an equivalent (best example would be where two pages have merged) - to the equivalent
(b) no longer exist and have no equivalent (best example here would be an ecommerce site with a discontinued product) - to their parent
If a page has never existed or does not yet exist (unused prod ids or article ids or invalid character strings) then serve a 404.
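Those rules could be sketched as a simple dispatcher. The three tables are invented examples standing in for a site's real URL data, not anything from the post above:

```python
# Hypothetical data illustrating the (a)/(b)/never-existed policy.
MERGED = {"/old-page": "/new-combined-page"}     # (a) merged into an equivalent
DISCONTINUED = {"/products/old-widget"}          # (b) gone, no equivalent
PARENT = {"/products/old-widget": "/products/"}  # parent category for (b)

def respond(path):
    """Return (status, location) following the policy above."""
    if path in MERGED:
        return 301, MERGED[path]       # (a) redirect to the equivalent page
    if path in DISCONTINUED:
        return 301, PARENT[path]       # (b) redirect to the parent category
    return 404, None                   # never existed: serve a plain 404
```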
If you 301 everything and Googlebot decides to go ape on your site and starts requesting possible URLs, then you're telling it that you have (or had) potentially thousands of pages that you don't. As it's been confirmed that 301s do drain at least some PageRank, you are potentially weakening your site in at least one way.
You also leave your site open to attack. If someone were to maliciously link to loads of non-existent (but valid) urls and get Google to crawl those links, this would create the same issue.
Firstly, thanks for all the responses. There is some great insight here. To reiterate: we took over the site a couple of years back and have since changed the URL structure, and therefore 301'ed all the old URLs to the homepage, as we do not have content for a lot of them. We are not using 301's for pages that have never existed; we are doing what FranticFish said and using 404s.
From what I have gathered here, I think the best route forward would be to get rid of these 301's and use 404's instead. However, before I go ahead and do that, I was wondering a few things. A few of these URLs do have content which we could 301 to. But now comes another problem: say I have a few URLs which I 301 to the same page (not the root), would that count against us? Even if they are pointing to relevant content?
|One suggestion for reversal that I've heard is to steadily remove the 301's and replace them with 404's , but not all at once. Thoughts ? |
Can I do them all at once or should I go for what Whitey said above?
The timing of this thread is perfect. I was checking one of my domains and found that about a year ago the URL structure was changed.
Most of the site was changed to reflect the new URL Structure, however the bottom navigation was hard coded into a module that the developer was too lazy to change, so he 301'ed 15 menu items back to home.
15 x 100's of pages = BAD NEWS. Yesterday we rectified the issue and I am interested to see the ramifications. The site is hovering 14 at the moment for a highly competitive keyword. I will keep you posted, but not sure how long it will take to reflect.
|Say I have a few urls which I 301 to the same page (not the root), would that count against us? |
I've never noticed any problems doing this myself.