Depth of each Google crawl?

Reno

2:45 pm on Aug 11, 2002 (gmt 0)

In my other posting, vitaplease gave me a way to check to see the date when our pages were cached by Google. I did that, and found something interesting: one page that I checked was cached in the past week, the other on June 28th. Both pages are on the same site and have been online the same amount of time. However, the one cached in the last week is "higher" in the hierarchy.

So using this as the "hierarchy" set-up to my question:

top level = index.html
2nd level = pages linked directly off of index.html
3rd level = pages linked off second level, but not off top
4th level = pages linked off third level, but not off second or top, etc etc.
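As an illustration only, the levels described above amount to the shortest click distance from index.html. A minimal Python sketch, assuming the site's links are already available as a dictionary (the `link_depths` function, the `site` mapping, and its page names are hypothetical and say nothing about how Google itself measures depth):

```python
from collections import deque

def link_depths(links, root="index.html"):
    """Return {page: level}, with the root page at level 1.

    A page's level is one more than the level of the shallowest
    page that links to it (breadth-first search from the root).
    """
    depths = {root: 1}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shallowest one
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site structure: page -> pages it links to
site = {
    "index.html": ["products.html", "about.html"],
    "products.html": ["widget.html", "index.html"],
    "widget.html": ["widget-specs.html", "about.html"],
}

depths = link_depths(site)
for page, level in sorted(depths.items(), key=lambda item: item[1]):
    print(level, page)
```

Here about.html stays at the 2nd level even though a 3rd-level page also links to it, which matches the "but not off top" wording in the list.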

Q. When Google does a crawl, do they *always* crawl deep (into the 3rd and 4th level for example)? Or do they reserve the "deep crawl" to only certain times?

Thanks...

bcc1234

7:49 pm on Aug 11, 2002 (gmt 0)

How do you check when the page was crawled?
I just insert the timestamp in the comments at the end of the page and then check the cached version. Is there a better way?

Reno

11:19 pm on Aug 11, 2002 (gmt 0)

I am not familiar with "timestamp". When a site has hundreds of pages, is that an automatic insertion? What is the code that is added to each page to do this?

bcc1234

12:40 am on Aug 12, 2002 (gmt 0)

>> I am not familiar with "timestamp". When a site has hundreds of pages, is that an automatic insertion? What is the code that is added to each page to do this?

Ok, try <url with timestamps>, go to any page and view the source (at the very end).

I place timestamps on all pages.
I just thought there might be a way to check the crawl date of a page directly on the Google site.

Reno

2:16 am on Aug 12, 2002 (gmt 0)

In another thread here at WebmasterWorld, I asked about whether I could see the date of a page in the Google cache. One of the forum members - vitaplease - told me to go to:

search2.cometsystems.com/

and check the "archived copy". vitaplease said that it gives you a date stamp of the Google cache. I do not understand the relationship between cometsystems and Google, but there is in fact a date on the archived copy at comet.

So for example, when I go there and check:

<bcc1234's site>

I get:

<page title>
... <snippet text> ...
<page url> - Archived copy - Related pages

Clicking the "Archived copy" link shows: "It is an archived copy dated Jun 28, 2002."

I checked your source code - how did you get it to put that exact time/date at the bottom of the html page like that? Would appreciate any guidance...

[edited by: ciml at 3:04 pm (utc) on Aug. 12, 2002]
[edit reason] url snipped. [/edit]

bcc1234

2:45 am on Aug 12, 2002 (gmt 0)

>> I checked your source code - how did you get it to put that exact time/date at the bottom of the html page like that? Would appreciate any guidance...

I don't know of any standard way to do it.
In that case I used JSP, so the easiest solution was:

It all depends on the technology that you are using.

In my case, the bottom is a separate file that is included in all pages. So by adding that line to my page_bottom.inc, I got the timestamp on every page of the site.
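The JSP line itself is not shown in this post. As a rough sketch of the same shared-footer idea, written in Python rather than JSP, the footer can emit the generation time inside an HTML comment so it only shows up in the page source. The function names `page_bottom` and `render_page` and the `generated:` comment format are assumptions made for illustration:

```python
from datetime import datetime, timezone

def page_bottom(now=None):
    """Shared footer: an HTML comment with the generation time (UTC)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"<!-- generated: {stamp} -->\n</body>\n</html>\n"

def render_page(body_html, now=None):
    """Wrap page content and append the shared footer to it."""
    return f"<html>\n<body>\n{body_html}\n{page_bottom(now)}"

# Every page rendered through render_page() carries the timestamp.
print(render_page("<p>Hello</p>"))
```

Because the footer lives in one place, changing or removing the timestamp later means touching only `page_bottom`.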

Another way would be to configure Apache to add to the content.
Check the mods; I remember going over one that takes a response and adds to it before passing it to the client.

That way it would not matter what technology you are using. The timestamp would be added to the response.

I don't remember the name of the mod, but it does exist.
That's how free hosting sites add their ads to people's pages.
Search for it on Google :)
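The Apache module is not named above, so the following is not that module. It is only a sketch of the same response-rewriting idea as Python WSGI middleware: a wrapper that sits in front of any application and appends a timestamp comment to HTML responses. The class name `TimestampMiddleware`, the `clock` parameter, the `served:` comment format, and `demo_app` are all assumptions, and the sketch assumes UTF-8 pages and applications that do not use the legacy `write()` callable:

```python
from datetime import datetime, timezone

class TimestampMiddleware:
    """Append an HTML comment with the serving time to text/html responses."""

    def __init__(self, app, clock=None):
        self.app = app
        # clock is injectable so the output can be made deterministic
        self.clock = clock or (lambda: datetime.now(timezone.utc))

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the body is final.
            captured["status"] = status
            captured["headers"] = list(headers)
            return lambda data: None  # legacy write() is not supported here

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        is_html = any(
            name.lower() == "content-type" and value.lower().startswith("text/html")
            for name, value in headers
        )
        if is_html:
            stamp = self.clock().strftime("%Y-%m-%d %H:%M:%S UTC")
            body += f"\n<!-- served: {stamp} -->\n".encode("utf-8")

        # The body length changed, so Content-Length must be recomputed.
        headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]

def demo_app(environ, start_response):
    """Hypothetical application standing in for any page technology."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", "12")])
    return [b"<p>hello</p>"]

wrapped = TimestampMiddleware(demo_app)
```

The wrapped application needs no changes, which is the point made above: the timestamp is added to the response regardless of what produced it.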

Key_Master

2:57 am on Aug 12, 2002 (gmt 0)

Reno,

On SSI-enabled servers you can insert the following code into the pages of your site.

-->

When the page is viewed it will return something similar to:

vitaplease

8:20 am on Aug 12, 2002 (gmt 0)

>> Q. When Google does a crawl, do they *always* crawl deep (into the 3rd and 4th level for example)? Or do they reserve the "deep crawl" to only certain times?

Reno,

It is generally so that a higher Pagerank (throughout your site) will give you more frequent deeper crawls.

Also check this thread: [webmasterworld.com...]

Google's reasons for deep-spidering one site or another are often unclear.

In this thread: [webmasterworld.com...] you will find that, for example, Google indexes fewer pages of WebmasterWorld than Alltheweb claims to.

Google also frequently spiders a few pages of a high-PageRank site, then indexes each page and caches that version for a few days.

This will help with that: [researchbuzz.com...]

web_india

9:03 am on Aug 12, 2002 (gmt 0)

>> It is generally so that a higher Pagerank (throughout your site) will give you more frequent deeper crawls.

How high a PR ensures frequent deep crawls, vitaplease?

Reno

1:25 pm on Aug 12, 2002 (gmt 0)

Thanks vitaplease for the forum references - I read the threads, which answered my original question. Thanks too Key_Master and bcc1234 for the timestamp tips. Regrettably they don't work at my site (I tried)...

Brett_Tabke

1:52 pm on Aug 12, 2002 (gmt 0)

You really don't want to do that and give Google the impression a page has changed every time it is viewed. It gets marked as dynamic content - not something you want.
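To make the concern concrete: a page that embeds the current time is byte-for-byte different on every request, even when its visible content is unchanged. The sketch below only demonstrates that difference with a hash; it does not show how Google decides whether a page has changed. The `generated:` comment format, the function names, and the two sample pages are assumptions:

```python
import hashlib
import re

# Assumed format of the timestamp comment embedded in each page
STAMP = re.compile(r"<!-- generated: .*? -->")

def fingerprint(html):
    """Hash of the page exactly as served."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def fingerprint_without_stamp(html):
    """Hash of the page with the timestamp comment removed."""
    return fingerprint(STAMP.sub("", html))

# The same page requested five seconds apart
page_a = "<p>Same content</p>\n<!-- generated: 2002-08-12 13:52:00 UTC -->"
page_b = "<p>Same content</p>\n<!-- generated: 2002-08-12 13:52:05 UTC -->"

print(fingerprint(page_a) == fingerprint(page_b))
print(fingerprint_without_stamp(page_a) == fingerprint_without_stamp(page_b))
```

Any comparison based on the raw bytes sees two different documents here, while the comparison that ignores the timestamp sees the same one.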

incywincy

2:03 pm on Aug 12, 2002 (gmt 0)

hi brett,

does that mean that including a counter or current date on your pages will have an adverse effect on your Google ranking?

Reno

2:34 pm on Aug 12, 2002 (gmt 0)

I also thank you Brett - that is an angle that had not occurred to me.

bcc1234

3:40 pm on Aug 12, 2002 (gmt 0)

>> You really don't want to do that and give Google the impression a page has changed every time it is viewed. It gets marked as dynamic content - not something you want.

I thought the same, but I had one site where about 20% of the content changed with every request, and all URLs were dynamic (...?par=val...).
The site had about 100 pages, and all of them got indexed.