Forum Moderators: Robert Charlton & goodroi
Does Google ever get the message?
(there you go ted no mention of hijacks lol)
[edited by: Pirates at 2:40 am (utc) on Oct. 10, 2006]
Resource
A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.
The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content, the entities to which it currently corresponds, changes over time, provided that the conceptual mapping is not changed in the process.
A URI can be further classified as a locator, a name, or both.
[w3.org...]
And a properly applied 301 means:
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.
[w3.org...]
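The quoted 301 semantics can be seen end to end with a throwaway local server: the old path answers with a `Location` header, and a client with "link editing capabilities" (here, `urllib`, which follows redirects) ends up at the new URI. This is a minimal sketch; the paths `/old-page` and `/new-page` are invented for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-page":
            # 301: the resource has been assigned a new permanent URI
            self.send_response(301)
            self.send_header("Location", "/new-page")
            self.end_headers()
        elif self.path == "/new-page":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"moved here")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

# Bind to an ephemeral port so the sketch runs anywhere
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

with urllib.request.urlopen(f"http://127.0.0.1:{port}/old-page") as resp:
    final_url = resp.geturl()  # the client has re-linked to the new URI
    body = resp.read()

server.shutdown()
```

After the request, `final_url` points at `/new-page`, which is exactly the "automatically re-link references to the Request-URI" behavior the spec describes.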
It appears proper use of a 301 applies to any resource, which would include URIs, URLs, and URNs.
Justin
There are plenty of legitimate reasons for using 301's and 410's. I'm tired of hearing all these lame excuses for Google. Their constant failure to handle these correctly is just plain poor.
If the new target URL for a 301 is in the index, then that's what matters in my opinion.
I think it's good if googlebot wants to double check the old url for a while, and even a long while. If a 301 meant that never again would googlebot ask for a URL after a 301 was in place, that could be a big problem for many.
Agreed, but that's not what I said. As was pointed out in the thread, if there is a link pointing to a URL, it should be crawled; that doesn't mean it should be indexed. By definition, a URL is a "resource locator". If there's no resource at that location, then what is there to index? When I decide to put something back at that URL I'll let you know! How? By linking it to the rest of my "web", so when you crawl it you'll see that there's something there. After all, it is my website!
And you're absolutely right, it's a major headache for a large dynamic site.
<sorry for the rant>
They scan more URLs than they index. They index more URLs than they show in their SERPs.
There is no way that it could be done any differently.
If they didn't store data about "gone" and "moved" URLs they would revisit them every time they "rediscovered" a link pointing to that URL, thinking they were new. So, of course they store the status of those.
However, if they marked "gone" and "moved" URLs to never be visited again, they would never spot if that URL came back into use with a new site owner. That would also be a gross error, so they recheck them every now and again, "just in case".
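The revisit policy described above can be sketched as a small crawl-memory structure: remember the status of "gone" (410) and "moved" (301) URLs so that rediscovered links don't look new, but still recheck them after an interval in case the URL has come back into use. This is an illustration of the reasoning, not Google's actual implementation, and the 30-day recheck interval is an assumption.

```python
import time

RECHECK_INTERVAL = 30 * 24 * 3600  # assumed recheck period: 30 days

class CrawlMemory:
    def __init__(self):
        self.seen = {}  # url -> (last status, time last checked)

    def should_fetch(self, url, now=None):
        now = now if now is not None else time.time()
        if url not in self.seen:
            return True  # genuinely new link: fetch it
        status, last_checked = self.seen[url]
        if status in (301, 410):
            # Known moved/gone: only recheck "just in case" after the interval
            return now - last_checked >= RECHECK_INTERVAL
        return True  # live URLs follow normal crawl scheduling

    def record(self, url, status, now=None):
        self.seen[url] = (status, now if now is not None else time.time())

memory = CrawlMemory()
memory.record("http://example.com/gone", 410, now=0)
memory.should_fetch("http://example.com/gone", now=1000)             # recently checked: skip
memory.should_fetch("http://example.com/gone", now=RECHECK_INTERVAL)  # due for a recheck
```

Storing the status is what stops every rediscovered link from triggering a fresh fetch, while the recheck keeps a URL from being written off forever.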
1. As far as I know there are no links into these pages from the outside. The few that I did find by reviewing the 301 logs for referrers have a 301 on them already.
2. My Google sitemap is complete and does not list these.
I would prefer not to have 404's on changed pages from search engine visits.
I would prefer not to have a 1,000 line .htaccess file for 5 years.
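One way to avoid carrying a 1,000-line .htaccess for years is to keep the old-URL mappings in a lookup table and answer 301 or 410 from application code instead of rewrite rules. This is a sketch of that idea, not something prescribed in the thread, and the example paths are invented.

```python
# Old-URL -> new-URL mappings; in practice this could live in a
# database or a file instead of a giant .htaccess
REDIRECTS = {
    "/products/old-widget": "/products/new-widget",  # moved permanently
}
GONE = {"/products/discontinued-widget"}  # removed for good

def resolve(path):
    """Return (status, location) for a requested path."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]   # Moved Permanently, with the new URI
    if path in GONE:
        return 410, None              # Gone: don't expect this back
    return 200, None                  # serve the page normally

resolve("/products/old-widget")            # (301, "/products/new-widget")
resolve("/products/discontinued-widget")   # (410, None)
```

A table like this is also easier to age out: once the search engines have stopped requesting an old URL, its entry can simply be dropped.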