Site Restructuring, 404s and Google

     

Roaming Gnome

4:43 am on Sep 15, 2012 (gmt 0)



After a CMS change I 301 redirected every part of the site I wanted to keep to the appropriate new URL. Google is still spidering the old pages that no longer exist, and so is getting 404 errors.

How does one go about informing Google that the pages it is looking for are no longer valid?

Thanks

tedster

6:19 am on Sep 15, 2012 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



A 404 or 410 status is what you need - sounds like you're fine to me. It just takes a while for the crawling to slow down on the legacy URLs that aren't re-published elsewhere.

In fact, Googlebot will occasionally request those old URLs for years, but at a much lower frequency. Don't worry about it unless you see a 404 status where you think that URL should be a 301.

And by the way - a 404 is a kind of "error" for a crawler - but it's an error you intended to happen. So it's not the kind of "error" that you need to fix, unless you intended that URL to resolve. It's just included in the report for your information.

[edited by: tedster at 9:30 pm (utc) on Sep 15, 2012]

Roaming Gnome

1:48 pm on Sep 15, 2012 (gmt 0)



Thanks, that makes me feel better. *Thumbs Up*

lucy24

9:19 pm on Sep 15, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



There's a persistent rumor that Google gets the message faster with 410s than with 404s. A 410 is intentional; a 404 is the generic "can't find it". So if you can do it without making your htaccess balloon to thousands of lines, include explicit 410s for the pages you're not redirecting. And make sure to specify a nice 410 page for the humans. It can even be the same physical page as the 404 page. Don't let them get the Apache default; it's scary. (And the IIS default is probably scarier. Their error messages always make me think something went seriously wrong in the deepest recesses of the server.)
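As a rough sketch of the above, assuming an Apache .htaccess file that contains no RewriteRules: the paths /old-catalog/ and /errors/gone.html are made-up placeholders, not anything from the original poster's site.

```apache
# Hypothetical paths throughout; substitute your own.

# Serve the same friendly page for both "not found" and "gone".
ErrorDocument 404 /errors/gone.html
ErrorDocument 410 /errors/gone.html

# One pattern covers a whole retired section, so the file stays short.
# No target URL is given because 410 is not a redirect status.
RedirectMatch 410 ^/old-catalog/
```

The error page sits outside the retired section here, so the 410 pattern does not also match the error document itself.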

g1smd

9:56 pm on Sep 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



404 - the server can't find it, doesn't know if it ever was here, and has no idea whether it might come back in the future.

410 - it's gone and it ain't coming back (though Google comment that an unnervingly large number of URLs that have returned 410 in the past do at some point come back to life again - and that's why they spider them forever).

Roaming Gnome

5:36 am on Sep 16, 2012 (gmt 0)



Thanks for the additional info. I went ahead and knocked out the unwanted pages with a RedirectMatch 410 entry in .htaccess. (With a custom page just in case any human eyes hit it.)

g1smd

7:41 am on Sep 16, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If you have any RewriteRules in your htaccess file you should not use any Redirect or RedirectMatch directives. Directives are processed in "per module" order, not in the order written in the htaccess file, so you cannot guarantee the order in which the modules execute. There was a much longer discussion of these points in another thread here only yesterday.
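For illustration only, here is one way the same 410 could be expressed in mod_rewrite so that everything stays in a single module. The patterns and the /news/ target are hypothetical, and putting the 410 rules ahead of the redirects is a common convention rather than something stated in this thread.

```apache
RewriteEngine On

# 410 Gone for the retired section. In .htaccess the pattern is matched
# without the leading slash; "-" means no substitution, [G] returns 410.
RewriteRule ^old-catalog/ - [G]

# 301 for pages that moved, listed after the 410 rules (hypothetical paths).
RewriteRule ^old-news/(.*)$ /news/$1 [R=301,L]
```

With both kinds of rule in mod_rewrite, they are evaluated in the order written, which is the guarantee that mixing in Redirect or RedirectMatch takes away.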

lucy24

8:52 am on Sep 16, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



There have also been multiple discussions about the (un)wisdom of mixing up redirects-- regardless of mechanism-- with error documents. You want the page to return a 410, not a 302.
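A small sketch of the difference, assuming Apache and a hypothetical /errors/gone.html page: a local path keeps the original status, while a full URL makes Apache answer with a redirect to that URL instead.

```apache
# Local path: the page body is served along with the original 410 status.
ErrorDocument 410 /errors/gone.html

# Full URL (avoid): Apache sends a redirect to this address, so the
# crawler sees the redirect rather than the 410.
# ErrorDocument 410 http://www.example.com/errors/gone.html
```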

Roaming Gnome

5:52 pm on Sep 16, 2012 (gmt 0)



RedirectMatch was used for regex matching. I could not get the test pages to work with RewriteRule.

g1smd

6:08 pm on Sep 16, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



That points either to a syntax error or rules in the wrong order, or both.

Roaming Gnome

6:39 pm on Sep 16, 2012 (gmt 0)



Thanks for the advice. I will check my syntax.
 
