Which Fields do G and Y read when crawling pages?

And which are available to them on your host ISP?

         

larryhatch

1:07 pm on Dec 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's say Google or Yahoo crawls a number of pages on your site.
Which fields do you think they download?

Obviously, they pull in the Title, the Head, the Body (if indexing the page) and so on.

1) I presume they download the Date, i.e. when the page file was most recently updated.

2) How about the Time-of-day? That might indicate something about the webmaster.

3) File-length in bytes? That's a quick easy check for changes in a page.

4) I'm sure I missed others. Please add them in.

Does anyone know, or have some evidence that G or Y checks all these fields on a crawl?

Are any of these typically hidden from the SEs by the host ISP,
or do they have the same access that I do when I read the remote (host) directory for my site?
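
For what it's worth, a spider fetches pages over HTTP, so it sees whatever response headers the server chooses to send rather than the directory listing you get over FTP. Two standard headers line up with the fields above: Last-Modified (date and time of day) and Content-Length (size in bytes). Here is a rough sketch of pulling those out. The function name `crawl_metadata` and the sample header values are made up for illustration, and nothing here says G or Y actually store these values.

```python
from email.utils import parsedate_to_datetime


def crawl_metadata(headers):
    """Extract date, time of day and byte length from HTTP response headers.

    `headers` is a plain dict of header name -> value. Header names are
    matched case-insensitively, and missing headers yield None.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    modified = lowered.get("last-modified")
    length = lowered.get("content-length")
    when = parsedate_to_datetime(modified) if modified else None
    return {
        "last_modified": when,                       # full date and time
        "time_of_day": when.time() if when else None,
        "length_bytes": int(length) if length is not None else None,
    }


# Hypothetical headers, as a server might return them for one page.
sample = {
    "Last-Modified": "Mon, 06 Dec 2004 13:07:00 GMT",
    "Content-Length": "18432",
    "Content-Type": "text/html",
}
info = crawl_metadata(sample)
print(info["last_modified"], info["time_of_day"], info["length_bytes"])
```

A server is free to omit either header (dynamic pages often do), in which case the sketch returns None for that field.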

I'm not talking about robots.txt or .htaccess exclusions; please presume there are none.

Can the Almighty G (or Y) see absolutely everything on the servers when they spider?

And ... what have I left out above? (It's late here, I'm groggy.)

- Larry

larryhatch

2:11 am on Dec 7, 2004 (gmt 0)


Maybe I should simplify my question.

During an ordinary crawl (no robots.txt or .htaccess exclusions):

1) Does Google or Yahoo note the Length of HTML (etc.) files in Bytes?
2) How about the Date the file was last saved?
3) How about the Time of day?
4) The number/percentage of pages updated within a period of time, for a given site?

Does anyone know, or have some indication of this?
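
To make item 4 concrete, here is a minimal sketch of how that percentage could be worked out from per-page last-modified dates. The function name `percent_updated`, the URLs and the dates are all hypothetical, and this is only an illustration of the arithmetic, not a claim about what Google or Yahoo compute.

```python
from datetime import datetime, timedelta, timezone


def percent_updated(last_modified_by_url, now, days):
    """Percentage of pages whose last-modified date falls within `days` of `now`."""
    if not last_modified_by_url:
        return 0.0
    cutoff = now - timedelta(days=days)
    recent = sum(1 for when in last_modified_by_url.values() if when >= cutoff)
    return 100.0 * recent / len(last_modified_by_url)


# Hypothetical site with four pages and made-up modification dates.
utc = timezone.utc
pages = {
    "/index.html": datetime(2004, 12, 1, tzinfo=utc),
    "/about.html": datetime(2004, 6, 15, tzinfo=utc),
    "/links.html": datetime(2003, 11, 2, tzinfo=utc),
    "/contact.html": datetime(2004, 1, 20, tzinfo=utc),
}
now = datetime(2004, 12, 7, tzinfo=utc)
print(percent_updated(pages, now, days=30))  # 25.0
```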

- Larry