I couldn't tell you what they actually look for, but you can download a program called curl or Wget to see what's available to them.
Both work from the command line. These are WebmasterWorld's headers:
C:\Documents and Settings\Richard Lees>c:/curl -I www.webmasterworld.com
HTTP/1.1 200 OK
Date: Fri, 07 Jan 2005 07:41:20 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510
Cache-Control: max-age=0
Pragma: no-cache
X-Powered-By: BestBBS v3.15
Content-Type: text/html
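I think the first three are standard in any HTTP response, although strictly only the status line is always required: origin servers must send Date (with a few exceptions, such as servers without a clock), and Server is optional.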
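If you'd rather script it than use curl, here is a minimal Python sketch using the standard library's http.client. It sends a HEAD request, which is roughly what `curl -I` does. The function name `fetch_head` is just an illustrative name, and the sketch assumes plain HTTP only (no HTTPS, no redirect following).

```python
import http.client


def fetch_head(host, path="/", port=80, timeout=10):
    """Send a HEAD request (roughly what `curl -I` does) and return the
    status code, reason phrase and response headers as a dict."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # A HEAD response carries headers only, so there is no body to read.
        # Note: dict() keeps only the last value of any repeated header.
        return resp.status, resp.reason, dict(resp.getheaders())
    finally:
        conn.close()


# Example (needs network access):
# status, reason, headers = fetch_head("www.webmasterworld.com")
# print(status, reason)
# for name, value in headers.items():
#     print(f"{name}: {value}")
```

Any header the server doesn't send (Content-Length, for instance) simply won't appear in the returned dict.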
Some people prefer to hide the "X-Powered-By" header to disguise the fact that they are using a server-side language.
Others deliberately alter the "Last-Modified" header to make the page appear fresher than it is.
Most of it can be faked. I doubt the search engines give any special attention to headers other than the caching ones, though I'm probably missing something obvious :)
Maybe I missed it, but in the WebmasterWorld example you gave, I did not see the file length in bytes (the Content-Length header).
Of course, if Google or Yahoo spiders the whole page, they can simply measure the file length themselves.
My reason for asking is that I'd like to know which fields or information might be relevant to one's rankings.
Anyone can falsify their meta tags, but a change in file length might be a better indicator of an actual revision.
Best wishes - Larry
Possibly, but there would be occasions where you've updated a page and it just so happens to be the same length in bytes.
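To make that concrete, here is a small Python sketch comparing a length check with a content hash. The two page bodies are made-up examples, and hashing is a swapped-in technique for illustration, not something anyone here claims the search engines actually do. An edit that keeps the byte count the same slips past the length check, while a SHA-256 digest of the body still changes.

```python
import hashlib


def fingerprint(body: bytes):
    """Return (length in bytes, SHA-256 hex digest) for a page body."""
    return len(body), hashlib.sha256(body).hexdigest()


old_page = b"<p>Price: 10 USD</p>"
new_page = b"<p>Price: 12 USD</p>"  # edited, but the same number of bytes

old_len, old_hash = fingerprint(old_page)
new_len, new_hash = fingerprint(new_page)

print("length changed:", old_len != new_len)     # False: the edit is invisible to a length check
print("content changed:", old_hash != new_hash)  # True: the hash picks it up
```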
I guess the HTTP spec introduced headers like "Last-Modified" for exactly this kind of thing. In an ideal world (where the data was always accurate), you could just use that to tell whether a page had changed or not.
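In that ideal world, a crawler could decide whether a page changed with a comparison like the sketch below. HTTP also lets the client send an If-Modified-Since request header and receive a 304 Not Modified response instead of the full page. `modified_since` is a hypothetical helper, and the timestamps are invented apart from borrowing the Date value from the curl output above. The result is only as trustworthy as the server's Last-Modified value.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def modified_since(last_modified: str, previous_crawl: datetime) -> bool:
    """Return True if a Last-Modified header value is newer than the time of
    the previous crawl. Only meaningful if the server reports it honestly."""
    # HTTP dates such as "Fri, 07 Jan 2005 07:41:20 GMT" parse to aware UTC datetimes.
    return parsedate_to_datetime(last_modified) > previous_crawl


last_crawl = datetime(2005, 1, 7, 7, 41, 20, tzinfo=timezone.utc)
print(modified_since("Fri, 07 Jan 2005 09:00:00 GMT", last_crawl))  # True
print(modified_since("Thu, 06 Jan 2005 12:00:00 GMT", last_crawl))  # False
```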