'spidering' 'indexing' and 'caching'

there are no stupid questions, only stupid answers

Proust

12:37 pm on Dec 4, 2003 (gmt 0)

Hello everybody,
I am new here on the board.
I would like to ask the distinguished experts here some (I suppose newbie) questions about 'spidering', 'indexing' and 'caching'.
Google spiders once a month (right?); it then 'indexes' the page and at the same time 'caches' the page.
Since my cache as seen in the Google bar changes every 2 days (I think due to my meta tag <meta name="REVISIT-AFTER" content="2 DAYS">), how does that relate to the spidering and indexing?
I reckon Google computes the PR from the 'indexing'.
So am I correct that the cache is 'only' a service by Google to show surfers 'how the page most recently looked', and that spidering is the 'hard work' done by Google in order to 'index' the page and calculate the PR?
Thank you

Jbrookins

2:41 pm on Dec 4, 2003 (gmt 0)

Google spiders as often as it wants to. Our site gets hit pretty frequently, and some people may see visits nearly every day.

Other than that: yep, pretty much! Though don't confuse PR with SERPs. They're different things: PR is (very basically) a numeric representation of the value of the page as determined by inbound links, while your position in the SERPs (search engine results pages) is the actual location of the page in any given search. Interconnected, but completely different things.
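
To make the "determined by inbound links" idea concrete, here is a minimal sketch of a simplified PageRank-style calculation in Python. It is only an illustration: the toy link graph, the page names, the damping factor of 0.85 and the fixed iteration count are all assumptions made for the example, and this is not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small base share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: everything links to "home",
# and nothing links to "orphan".
toy_links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
    "orphan": ["home"],
}
scores = pagerank(toy_links)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph "home" ends up with the highest score because every other page links to it, and "orphan" ends up lowest because nothing links to it. Note that the score says nothing about which search phrase the page ranks for, which is why PR and position in the SERPs are separate things.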

geebee2

4:26 pm on Dec 4, 2003 (gmt 0)

I'm no expert, but it is worth understanding that there are various operations that a spidered page undergoes, and it seems that Google does not do all these at the same time.

(1) Page is fetched
(2) Page appears in 'cache'
(3) Page is analysed/indexed
(4) PR of page is calculated (PR0 if duplicate content)
(5) PR of page propagates to toolbar PR servers

It isn't very clear what the timing is on all these, and there may be other complications, such as partial indexing, approximate estimation of PR, etc.
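
As a rough way to picture the five steps above as separate operations that complete at different times, here is a small Python sketch. The stage names, the `PageRecord` class and the day numbers are all hypothetical, invented for illustration; nothing here reflects how Google actually tracks pages.

```python
from dataclasses import dataclass, field

# Stage names mirror the five steps listed above; the identifiers are made up.
STAGES = ["fetched", "cached", "indexed", "pr_calculated", "pr_on_toolbar"]


@dataclass
class PageRecord:
    url: str
    completed: dict = field(default_factory=dict)  # stage name -> day it finished

    def advance(self, stage, day):
        """Mark the next stage as done; stages must complete in order."""
        if len(self.completed) == len(STAGES):
            raise ValueError("all stages already completed")
        expected = STAGES[len(self.completed)]
        if stage != expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.completed[stage] = day

    def pending(self):
        """Stages that have not happened yet for this page."""
        return STAGES[len(self.completed):]


page = PageRecord("http://www.example.com/")
page.advance("fetched", day=1)
page.advance("cached", day=1)
# The cache already shows the new copy, but indexing and PR have not caught up.
print(page.pending())
```

The point of the sketch is only that a fresh copy in the cache does not by itself mean the page has been re-indexed or that its PR has been updated.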

George

Proust

10:57 am on Dec 5, 2003 (gmt 0)

Voila .. thanks for that, very much appreciated :)
but,
Where do 'the crawl' and 'the Google dance' stand in relation to 'spidering', 'indexing' and 'caching'?
What's the difference between 'crawling' and 'spidering'?

And just for my own curiosity: how on earth can Google 'cache' billions of internet pages?
Doesn't that mean they have a 'back-up' of the whole (spiderable) internet? Where do they keep it?

Thanks again,
Proust

jbinbpt

11:04 am on Dec 5, 2003 (gmt 0)

Hi Proust, and welcome to WebmasterWorld.

> every 2 days (i think due to my meta <meta name="REVISIT-AFTER" content="2 DAYS">

Most people here think that this meta tag, like most meta tags, does nothing. I for one include them on the index page just in case. Perhaps your observation is accurate.

Spidering and crawling are the same function.

Good questions. Keep asking them.

geebee2

1:54 pm on Dec 5, 2003 (gmt 0)

> And just for my own curiosity: how on earth can Google 'cache' billions of internet pages?
> Doesn't that mean they have a 'back-up' of the whole (spiderable) internet? Where do they keep it?

See

[www-db.stanford.edu...]

although that of course describes the original much smaller prototype, with about 24 million pages.

But yes, Google has huge numbers of machines with big disks.

Proust

11:09 am on Dec 11, 2003 (gmt 0)

Thanks again for the answers and the useful link :)

Reading other threads on this board, I'd be curious to know what the point is of doing searches on www2 and www3.
Do surfers take that route?
(Or is it 'a trick' to see which datacenters know what)

Can we say that only the SERPs on www.google matter?

And why is 'all this talk' called the "Florida update"?
What has Florida got to do with it? (Isn't it a Californian thing?)

Thank you