
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

Site Restructuring, 404s and Google
Roaming Gnome

 4:43 am on Sep 15, 2012 (gmt 0)

After a CMS change I 301 redirected every part of the site I wanted to keep to the appropriate new URL. Google is still spidering the old pages that no longer exist, and so it is getting 404 errors.

How does one go about informing Google that the pages it is looking for are no longer valid?




 6:19 am on Sep 15, 2012 (gmt 0)

A 404 or 410 status is what you need - sounds like you're fine to me. It just takes a while for the crawling to slow down on the legacy URLs that aren't re-published elsewhere.

In fact, Googlebot will occasionally request those old URLs for years, but at a much lower frequency. Don't worry about it unless you see a 404 status when you think that URL should be a 301.

And by the way - a 404 is a kind of "error" for a crawler - but it's an error you intended to happen. So it's not the kind of "error" that you need to fix, unless you intended that URL to resolve. It's just included in the report for your information.

[edited by: tedster at 9:30 pm (utc) on Sep 15, 2012]

Roaming Gnome

 1:48 pm on Sep 15, 2012 (gmt 0)

Thanks, that makes me feel better. *Thumbs Up*


 9:19 pm on Sep 15, 2012 (gmt 0)

There's a persistent rumor that Google gets the message faster with 410s than with 404s. A 410 is intentional; a 404 is the generic "can't find it". So if you can do it without making your .htaccess balloon to thousands of lines, include explicit 410s for the pages you're not redirecting. And make sure to specify a nice 410 page for the humans. It can even be the same physical page as the 404 page. Don't let them get the Apache default; it's scary. (And the IIS default is probably scarier. Their error messages always make me think something went seriously wrong in the deepest recesses of the server.)
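As a rough sketch of that advice, assuming Apache with mod_rewrite enabled: one pattern can cover a whole retired section, and the 404 and 410 responses can share a single friendly page. The directory name, file pattern, and page name below are made up for illustration.

```apache
# Hypothetical shared error page, given as a local path so the
# original 404/410 status is kept.
ErrorDocument 404 /missing.html
ErrorDocument 410 /missing.html

RewriteEngine On

# One pattern answers 410 Gone for every URL under a retired
# directory, instead of one line per page.
# "old-catalog" is a placeholder name.
RewriteRule ^old-catalog/ - [G]

# A second placeholder pattern: retired numbered pages from the old CMS.
RewriteRule ^page-[0-9]+\.html$ - [G]
```

In .htaccess the pattern is matched without the leading slash, and the [G] flag stops further rule processing for that request.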


 9:56 pm on Sep 15, 2012 (gmt 0)

404 - the server can't find it, doesn't know if it ever was here, and has no idea whether it might come back in the future.

410 - it's gone and it ain't coming back (though Google comment that an unnervingly large number of URLs that have returned 410 in the past do at some point come back to life again - and that's why they spider them forever).

Roaming Gnome

 5:36 am on Sep 16, 2012 (gmt 0)

Thanks for the additional info. I went ahead and knocked out the unwanted pages with a RedirectMatch 410 entry in .htaccess. (To a custom page just in case any human eyes hit it.)


 7:41 am on Sep 16, 2012 (gmt 0)

If you have any RewriteRules in your .htaccess file, you should not use any Redirect or RedirectMatch directives. Directives are processed in "per module" order, not in the order written in the .htaccess file, so you cannot guarantee the order in which the two modules' rules run. There was a much longer discussion on these points in another thread here only yesterday.
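To make that concrete: Redirect and RedirectMatch belong to mod_alias, while RewriteRule belongs to mod_rewrite. A minimal sketch of keeping everything in mod_rewrite might look like the following. The paths and hostname are placeholders, not taken from the original poster's site.

```apache
RewriteEngine On

# Instead of: RedirectMatch 410 ^/old-catalog/
RewriteRule ^old-catalog/ - [G]

# Instead of: Redirect 301 /about.php http://www.example.com/about/
RewriteRule ^about\.php$ http://www.example.com/about/ [R=301,L]

# Internal rewrites for the new CMS (if any) would follow the
# external redirects, so a retired or moved URL is answered first.
```

With every rule handled by one module, the rules run in the order they are written, which is what makes the ordering predictable.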


 8:52 am on Sep 16, 2012 (gmt 0)

There have also been multiple discussions about the (un)wisdom of mixing up redirects-- regardless of mechanism-- with error documents. You want the page to return a 410, not a 302.
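For Apache's ErrorDocument directive, the difference comes down to how the error page is named. A small illustration, with a placeholder page name and hostname:

```apache
# Local path: the visitor sees /gone.html and the response
# keeps its 410 status.
ErrorDocument 410 /gone.html

# Full URL: Apache sends the client a redirect (302) to that page,
# so the crawler never sees the 410. Shown commented out as the
# form to avoid.
# ErrorDocument 410 http://www.example.com/gone.html
```

Checking the response headers for one of the retired URLs is a quick way to confirm which status is actually being returned.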

Roaming Gnome

 5:52 pm on Sep 16, 2012 (gmt 0)

RedirectMatch was used for regex matching. I could not get the test pages to work with RewriteRule.


 6:08 pm on Sep 16, 2012 (gmt 0)

That points either to a syntax error or rules in the wrong order, or both.

Roaming Gnome

 6:39 pm on Sep 16, 2012 (gmt 0)

Thanks for the advice. I will check my syntax.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved