So using this as the "hierarchy" set-up to my question:
top level = index.html
2nd level = pages linked directly off of index.html
3rd level = pages linked off second level, but not off top
4th level = pages linked off third level, but not off second or top, etc etc.
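To make the levels above concrete: a page's level is one more than the smallest number of clicks needed to reach it from index.html, which a breadth-first walk over the link graph gives you. This is only a rough sketch of the definition, not of how Google crawls; the class name, the method name and the page names (a.html, b.html and so on) are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class LinkDepth {
    /**
     * Breadth-first walk from the start page. The start page is level 1,
     * pages linked directly from it are level 2, and so on. Each page keeps
     * the smallest level at which it is reachable, so a page linked from
     * both the top and the second level counts as second level.
     */
    static Map<String, Integer> levels(Map<String, List<String>> links, String start) {
        Map<String, Integer> level = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        level.put(start, 1);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String target : links.getOrDefault(page, List.of())) {
                if (!level.containsKey(target)) {   // first visit is the shallowest
                    level.put(target, level.get(page) + 1);
                    queue.add(target);
                }
            }
        }
        return level;
    }

    public static void main(String[] args) {
        // Hypothetical site: page -> pages it links to.
        Map<String, List<String>> links = Map.of(
            "index.html", List.of("a.html", "b.html"),
            "a.html", List.of("c.html", "index.html"),
            "c.html", List.of("d.html", "a.html"));
        System.out.println(levels(links, "index.html"));
    }
}
```

Here a.html and b.html come out as 2nd level, c.html as 3rd and d.html as 4th. The links back to index.html and a.html do not change anything, because a page keeps the level at which it was first reached.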
Q. When Google does a crawl, do they *always* crawl deep (into the 3rd and 4th level for example)? Or do they reserve the "deep crawl" to only certain times?
Thanks...
I am not familiar with "timestamp". When a site has hundreds of pages, is that an automatic insertion? What is the code that is added to each page to do this?
Ok, try <url with timestamps>, go to any page and view the source (at the very end).
I place timestamps on all pages.
I just thought there might be a way to check the crawl date of a page directly on the Google site.
search2.cometsystems.com/
and check the "archived copy". Vitaplease said that it gives you a date stamp of the Google cache. I do not understand the relationship between cometsystems and Google, but there is in fact a date on the archived copy at comet.
So for example, when I go there and check:
<bcc1234's site>
I get:
<page title>
... <snippet text> ...
<page url> - Archived copy - Related pages
Clicking the "Archived copy" link shows: "It is an archived copy dated Jun 28, 2002."
I checked your source code - how did you get it to put that exact time/date at the bottom of the html page like that? Would appreciate any guidance...
[edited by: ciml at 3:04 pm (utc) on Aug. 12, 2002]
[edit reason] url snipped. [/edit]
I don't know of any standard way to do it.
On that site I used JSP, so the easiest solution was:
<!-- <%= new java.util.Date(System.currentTimeMillis()).toString() %> -->
It all depends on the technology that you are using.
In my case, the page bottom is a separate file that is included in all pages, so adding that line to my page_bottom.inc put the timestamp on every page of the site.
Another way would be to configure Apache to add the timestamp to the content.
Check the Apache modules. I remember going over one that takes a response and adds to it before passing it to the client.
That way it would not matter what technology you are using. The timestamp would be added to the response.
I don't remember the name of the module, but it does exist.
That's how free hosting sites add their ads to people's pages.
Search for it on Google :)
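Just to illustrate the "take a response and add to it" idea, here is a rough sketch in plain Java. It is not the Apache module (I still don't know its name) and it is not wired into any server. The class and method names are made up, and it assumes the finished HTML is already in hand as a string.

```java
import java.util.Date;

final class ResponseStamper {
    /** Builds the same kind of HTML comment as the JSP one-liner. */
    static String stampComment(Date when) {
        return "<!-- " + when + " -->";
    }

    /** Case-insensitive search for the last occurrence of needle in text. */
    static int lastIndexOfIgnoreCase(String text, String needle) {
        for (int i = text.length() - needle.length(); i >= 0; i--) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Inserts the timestamp comment just before the last closing body tag,
     * or appends it on a new line if the markup has no closing body tag.
     */
    static String stamp(String html, Date when) {
        String comment = stampComment(when);
        int idx = lastIndexOfIgnoreCase(html, "</body>");
        if (idx < 0) {
            return html + "\n" + comment;
        }
        return html.substring(0, idx) + comment + "\n" + html.substring(idx);
    }

    public static void main(String[] args) {
        // Stamp a tiny page with the current time and print the result.
        System.out.println(stamp("<html><body>Hello</body></html>", new Date()));
    }
}
```

Whatever produces the page (static files, PHP, JSP) would hand its output to something like stamp() as the last step, which is why the page technology stops mattering.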
Q. When Google does a crawl, do they *always* crawl deep (into the 3rd and 4th level for example)? Or do they reserve the "deep crawl" to only certain times?
Reno,
Generally, a higher PageRank (throughout your site) will get you more frequent deep crawls.
Also check this thread: [webmasterworld.com...]
Google's reasons for deep-spidering one site rather than another are often unclear.
In this thread: [webmasterworld.com...] you will find, for example, that Google indexes fewer pages of WebmasterWorld than Alltheweb claims to.
Google also frequently spiders a few pages of a (high-PageRank) site, then indexes each page and caches that version for a few days.
This will help for that: [researchbuzz.com...]
You really don't want to do that and give Google the impression a page has changed every time it is viewed. It gets marked as dynamic content - not something you want.
I thought the same, but I had one site where about 20% of the content changed with every request. And all URLs were dynamic (...?par=val...).
The site had about 100 pages, and all of them got indexed.