How Do Search Engine Robots Work?


pageoneresults - 8:49 pm on Jan 9, 2007 (gmt 0)


Most search engines use the Last-Modified header to determine if a page has been updated. It makes little or no sense to re-index pages that have not changed.

Okay, what if the server does not support the 304 Not Modified response? I'll assume Googlebot will then reindex that page? Does it have any sort of "compare" functionality? I mean, would it compare the new page to the old page, determine what changed, and use that if 304 was not supported?
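To make the "compare" idea concrete, here is a minimal sketch of how a crawler could do it by keeping a hash of each page from the last crawl. This is only an illustration of the technique, not a description of what Googlebot actually does; the names content_digest and has_changed are made up for the example.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a SHA-256 fingerprint of the raw page bytes."""
    return hashlib.sha256(body).hexdigest()


def has_changed(stored_digest: str, new_body: bytes) -> bool:
    """True if the freshly fetched page differs from the stored fingerprint."""
    return content_digest(new_body) != stored_digest


# Hypothetical usage: remember the digest at crawl time, compare on the next visit.
previous = content_digest(b"<html>version 1</html>")
print(has_changed(previous, b"<html>version 1</html>"))
print(has_changed(previous, b"<html>version 2</html>"))
```

Note that a compare like this still requires downloading the whole page on every visit, so it would save re-indexing work but not bandwidth. That is the part a 304 takes care of.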

RFC 2616, 10.3.5 304 Not Modified
If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code. The 304 response MUST NOT contain a message-body, and thus is always terminated by the first empty line after the header fields.
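As a rough sketch of what that rule asks of a server, the function below decides between 200 and 304 from the If-Modified-Since request header and the page's last change time. The names http_date and respond are invented for this example, and it assumes the page's modification time is available as a timezone-aware UTC datetime; real servers such as Apache or IIS handle this internally and also consider other headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(moment):
    """Format an aware UTC datetime as an HTTP date, e.g. for Last-Modified."""
    return format_datetime(moment.replace(microsecond=0), usegmt=True)


def respond(if_modified_since, last_modified, body):
    """Return (status, headers, body) for a GET on one page.

    if_modified_since: raw request header value, or None if absent.
    last_modified: aware UTC datetime of the page's last change.
    """
    headers = {"Last-Modified": http_date(last_modified)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a normal 200
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates only have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""  # no message body on a 304
    return 200, headers, body
```

A request without the header, or with a date older than the page, gets the full 200 response; only a conditional request for an unchanged page gets the empty 304.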

If Googlebot doesn't do the compare, and a 304 is not supported, that means a page that has not changed since the last indexing gets indexed again.

Wouldn't it be to my advantage to make sure that my server supports 304 Not Modified? :)

[google.com...]

304 (Not modified)
The requested page hasn't been modified since the last request. When the server returns this response, it doesn't return the contents of the page.

You should configure your server to return this response (called the If-Modified-Since HTTP header) when a page hasn't changed since the last time the requestor asked for it. This saves you bandwidth and overhead because your server can tell Googlebot that a page hasn't changed since the last time it was crawled.
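One way to check whether a server already behaves this way is to fetch a page, then ask for it again with the Last-Modified value it just sent. The sketch below does that with Python's standard http.client; the function names are made up for the example, and it only tests the If-Modified-Since path (not ETags or anything else a given bot might send).

```python
import http.client
from urllib.parse import urlsplit


def _get(url, headers=None):
    """Issue one GET and return (status, Last-Modified header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target, headers=headers or {})
        resp = conn.getresponse()
        resp.read()  # drain the body (empty for a 304)
        return resp.status, resp.getheader("Last-Modified")
    finally:
        conn.close()


def supports_conditional_get(url):
    """True if a repeat request with If-Modified-Since comes back as 304."""
    status, last_modified = _get(url)
    if status != 200 or not last_modified:
        return False  # nothing to build a conditional request from
    status, _ = _get(url, {"If-Modified-Since": last_modified})
    return status == 304
```

Calling supports_conditional_get with the address of one of your own static pages should return True if the server honors the header. Dynamically generated pages often send no Last-Modified header at all, in which case this returns False.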

And to harness those bots? I really only want them to index my freshest and most relevant content. Don't I? What if that bot is programmed to retrieve only so much information?


Thread source: http://www.webmasterworld.com/new_web_development/3206921.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com