Forum Moderators: Robert Charlton & goodroi
Does Google ever get the message?
(there you go ted no mention of hijacks lol)
[edited by: Pirates at 2:40 am (utc) on Oct. 10, 2006]
Resource
A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.
The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content, the entities to which it currently corresponds, changes over time, provided that the conceptual mapping is not changed in the process.
A URI can be further classified as a locator, a name, or both.
[w3.org...]
And a properly applied 301 means:
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.
[w3.org...]
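The quoted 301 semantics can be seen end to end with a throwaway local server: the old path answers with a `Location` header, and a client with "link editing capabilities" (here, `urllib`, which follows redirects) ends up at the new URI. This is a minimal sketch; the paths `/old-page` and `/new-page` are invented for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-page":
            # 301: the resource has been assigned a new permanent URI
            self.send_response(301)
            self.send_header("Location", "/new-page")
            self.end_headers()
        elif self.path == "/new-page":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"moved here")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

# Bind to an ephemeral port so the sketch runs anywhere
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

with urllib.request.urlopen(f"http://127.0.0.1:{port}/old-page") as resp:
    final_url = resp.geturl()  # the client has re-linked to the new URI
    body = resp.read()

server.shutdown()
```

After the request, `final_url` points at `/new-page`, which is exactly the "automatically re-link references to the Request-URI" behavior the spec describes.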
It appears proper use of a 301 applies to any resource, which would include URIs, URLs, and URNs.
Justin
There are plenty of legitimate reasons for using 301's and 410's. I'm tired of hearing all these lame excuses for Google. Their constant failure to handle these correctly is just plain poor.
If the new target URL for a 301 is in the index, then that's what matters in my opinion.
I think it's good if googlebot wants to double check the old url for a while, and even a long while. If a 301 meant that never again would googlebot ask for a URL after a 301 was in place, that could be a big problem for many.
Agreed, but that's not what I said. As was pointed out in the thread, if there is a link pointing to a URL, it should be crawled; that doesn't mean it should be indexed. By definition, a URL is a "resource locator". If there's no resource at that location, then what is there to index? When I decide to put something back at that URL I'll let you know! How? By linking it to the rest of my "web", so when you crawl it you'll see that there's something there. After all, it is my website!
And you're absolutely right, it's a major headache for a large dynamic site.
<sorry for the rant>
They scan more URLs than they index. They index more URLs than they show in their SERPs.
There is no way that it could be done any differently.
If they didn't store data about "gone" and "moved" URLs they would revisit them every time they "rediscovered" a link pointing to that URL, thinking they were new. So, of course they store the status of those.
However, if they marked "gone" and "moved" URLs to never be visited again, they would never spot if that URL came back into use with a new site owner. That would also be a gross error, so they recheck them every now and again, "just in case".
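The revisit policy described above can be sketched as a small crawl-memory structure: remember the status of "gone" (410) and "moved" (301) URLs so that rediscovered links don't look new, but still recheck them after an interval in case the URL has come back into use. This is an illustration of the reasoning, not Google's actual implementation, and the 30-day recheck interval is an assumption.

```python
import time

RECHECK_INTERVAL = 30 * 24 * 3600  # assumed recheck period: 30 days

class CrawlMemory:
    def __init__(self):
        self.seen = {}  # url -> (last status, time last checked)

    def should_fetch(self, url, now=None):
        now = now if now is not None else time.time()
        if url not in self.seen:
            return True  # genuinely new link: fetch it
        status, last_checked = self.seen[url]
        if status in (301, 410):
            # Known moved/gone: only recheck "just in case" after the interval
            return now - last_checked >= RECHECK_INTERVAL
        return True  # live URLs follow normal crawl scheduling

    def record(self, url, status, now=None):
        self.seen[url] = (status, now if now is not None else time.time())

memory = CrawlMemory()
memory.record("http://example.com/gone", 410, now=0)
memory.should_fetch("http://example.com/gone", now=1000)             # recently checked: skip
memory.should_fetch("http://example.com/gone", now=RECHECK_INTERVAL)  # due for a recheck
```

Storing the status is what stops every rediscovered link from triggering a fresh fetch, while the recheck keeps a URL from being written off forever.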
1. As far as I know there are no links into these pages from the outside. The few that I did find by reviewing the 301 logs for referrers have a 301 on them already.
2. My Google sitemap is complete and does not list these.
I would prefer not to have 404's on changed pages from search engine visits.
I would prefer not to have a 1,000 line .htaccess file for 5 years.
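One way to avoid carrying a 1,000-line .htaccess for years is to keep the old-URL mappings in a lookup table and answer 301 or 410 from application code instead of rewrite rules. This is a sketch of that idea, not something prescribed in the thread, and the example paths are invented.

```python
# Old-URL -> new-URL mappings; in practice this could live in a
# database or a file instead of a giant .htaccess
REDIRECTS = {
    "/products/old-widget": "/products/new-widget",  # moved permanently
}
GONE = {"/products/discontinued-widget"}  # removed for good

def resolve(path):
    """Return (status, location) for a requested path."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]   # Moved Permanently, with the new URI
    if path in GONE:
        return 410, None              # Gone: don't expect this back
    return 200, None                  # serve the page normally

resolve("/products/old-widget")            # (301, "/products/new-widget")
resolve("/products/discontinued-widget")   # (410, None)
```

A table like this is also easier to age out: once the search engines have stopped requesting an old URL, its entry can simply be dropped.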