Veni, vidi... fugit.

The PR barometer and crawling

         

Josefu

8:59 am on Aug 21, 2003 (gmt 0)

10+ Year Member



...I've read in many places here that Google sends its little automated arachnoid quite often once a site has passed a certain 'in the clover' level of 'Google quality' - I understand that the latter means content and incoming links. My question is: what exactly is that level, how good the content and how many the links? I am a bit confused because after getting quite a few solid links from mid-PR sites, mister googlebot just grabbed (once again) my robots.txt and index page.

This is not whining, I would really like to know even if I have to bite the bullet and make changes. I do not count on Google alone, but it sure would help : )

Perplexed

12:07 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



My understanding ( which is usually wrong ) was that the frequency of visit did not rely so much on how much was on the site, or the number of backlinks, but on the frequency that you add new pages, or change old ones. It is sort of like - the harder you work, the harder googlebot will work.

Josefu

12:26 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



Hee hee hee - well at this rate I'll have to change my index page every day - that's all mister googlebot will look at. Perhaps I can hope for a second visit, followed m~a~y~b~e by a deeper crawl, through the URL of one of my newly-established backlinks?

dmorison

1:40 pm on Aug 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"My understanding ( which is usually wrong ) was that the frequency of visit did not rely so much on how much was on the site, or the number of backlinks, but on the frequency that you add new pages, or change old ones. It is sort of like - the harder you work, the harder googlebot will work."

This raises an interesting point regarding the opportunity for a new site that changes frequently to get indexed at all.

Given:

* A search engine only has so many resources, and has to decide how to allocate those resources.

and

* The results returned by a search engine have to be relevant; otherwise users will lose confidence.

Therefore:

A new site that changes every 5 minutes may never get indexed because it has not earned the call of the crawler every 5 minutes; and because it changes so often the search engine cannot guarantee relevancy to a user's query.

Conclusion:

If you're a new site that is changing frequently; at least have a relevant and static front page that does not.


As an aside; there seems to be some misunderstanding as to what warrants a frequent visitation / re-index by a search engine spider.

The view that "the more important a search engine considers your page the more frequently their spider will visit" is way too simplistic.

Firstly, regardless of your considered importance, if your content doesn't change there is absolutely no point in re-indexing your page every 5 minutes.

The Privacy Statement page at microsoft.com has a displayed Google PageRank of 9; but I doubt that Googlebot visits any more than once a month, if that. Why not? No need; waste of resources. Doesn't detract from its considered importance though and a justified PageRank of 9.

Away from the lofty heights of PageRank 9 (of which most of us here only dream), a search engine still has to manage resources. Regardless of what a search engine thinks of a page (à la PageRank), there is no point in re-crawling a page that doesn't change very often.

Now given that a search engine's results have to be relevant; the best use of a spider's resources is to try and visit pages with about the same frequency with which changes are made.

This is easy for a spider to "learn". You visit for the first time, and then come back in, say, a week. If changes haven't been made then you increase the time before your next visit. If changes have been made then you might decrease your time before the next visit.

By repeating this simple algorithm indefinitely (incrementing and decrementing the time between visits) the spider will after a few visits be just about in sync with the frequency at which the page updates.

Most efficient use of resources and almost guaranteed relevant results for your users.
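
To make the idea concrete, here is a minimal sketch of that loop in Python. It is only an illustration of the reasoning above, not how Googlebot (or any real spider) is known to work. The names `next_interval` and `simulate`, the halve/double factor and the 1 to 60 day limits are all assumptions picked for the example.

```python
def next_interval(current_days, changed, factor=2.0, min_days=1.0, max_days=60.0):
    """Return how many days to wait before the next visit.

    Assumption: the wait is halved when the page changed and doubled
    when it did not, then clamped to [min_days, max_days].
    """
    if changed:
        proposed = current_days / factor
    else:
        proposed = current_days * factor
    return max(min_days, min(max_days, proposed))


def simulate(true_period_days, start_interval=7.0, visits=8):
    """Simulate a spider revisiting a page that updates every true_period_days.

    Returns the list of waits (in days) used before each visit.
    """
    interval = start_interval
    history = []
    for _ in range(visits):
        # The spider sees a change if at least one update fell inside the wait.
        changed = interval >= true_period_days
        history.append(interval)
        interval = next_interval(interval, changed)
    return history


if __name__ == "__main__":
    print("daily page:  ", simulate(1.0))
    print("monthly page:", simulate(30.0))
```

With these assumed settings the wait for a page that updates daily shrinks toward the one-day floor, while the wait for a page that updates monthly grows until it brackets the real update period and then hovers around it.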

[edited by: dmorison at 2:02 pm (utc) on Aug. 21, 2003]

Josefu

1:57 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



Thanks for that. Could be a problem for my site because I had a front 'splash' page and got rid of it for a 'real' page that presents the user with a welcome message and a choice of whether to take advantage of the 'whizz' fullscreen mode my site offers or to proceed in normal 'chromed' mode. Before, the 'splash' intro would call up a new fullscreen window - which would be blocked for all users using a popup blocker : )

Perhaps I've traded the good of my site for the good of my users - at least until Google visits again next month.

[added]

Your addendum was as informative as the body of your message, but it appeared while I was writing; I didn't get it the first time around so I'll add this in one of my own. Thanks.

Josefu

2:10 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



...wait, alright, this brings up another dumb question. My wife has a web Journal that she adds to every day. Positive?

rogerd

2:16 pm on Aug 21, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Hi, Josefu, nice Latin... I think your wife's blog should be fine, since you have a home page that changes daily and a larger quantity of static content.

Josefu

2:48 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



...hehe - actually I screwed up the Latin; it should be 'fugi', as in 'I ran away', instead of 'fugit', 'it runs away' (as in 'tempus fugit', time flies)...

What do you mean by 'should be fine'? Thanks for the reassurance all the same : )

Yet another dumb question: I read (here, from a senior member) someone saying that they could have freshbot 'on (someone else's) site in under an hour' - what was THAT all about? Sounds enticing but one must be careful...

amazed

3:21 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



my impressions from watching googlebot on my site:

googlebot is working two shifts:

1) reindexing changed pages: seems to be a continuous process now and - my impression, nothing else - appears to be driven by google's index and not by anything I do. Maybe I am wrong but googlebot does not grab all the pages I change but seems to follow its own route.

2) adding new pages to the index: frequency depends on page-rank, so new sites will have to wait for a while i.e. maybe one month and getting as many incoming links as possible will help.

taxpod

3:47 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



I really do not think the frequency with which pages change has anything to do with Googlebot behavior. I have a few sites which get heavily spidered. One has pages which have added information and another has static pages which almost never change. They both get spidered in a like manner. A few other sites of mine, some of which change frequently and some not, have low PR and few inbound links. Those sites get very little spidering regardless of how frequently they are updated.

So there must be another answer.

glengara

3:50 pm on Aug 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seems a good time to remind members of this thread on setting up "if modified since".....
[webmasterworld.com...]
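
For anyone who hasn't met it, "if modified since" refers to the HTTP `If-Modified-Since` request header: a spider sends the date of the copy it already holds, and the server can answer `304 Not Modified` with no body if the page is unchanged. Below is a minimal sketch of the server-side decision in Python. The function name `conditional_response` is made up for this example, and it assumes you already know the page's last-modified time as a timezone-aware UTC `datetime`; a real setup would normally let the web server handle this.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status_code, headers) for a GET of a page.

    last_modified: timezone-aware UTC datetime of the page's last change
                   (assumed to be known by the caller).
    if_modified_since: raw value of the If-Modified-Since header, or None.
    """
    # HTTP dates only have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the page
        if since is not None:
            if since.tzinfo is None:
                # No usable zone information: treat the date as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # unchanged: no body needs to be sent

    return 200, headers  # changed (or no valid header): send the full page


if __name__ == "__main__":
    page_time = datetime(2003, 8, 21, 8, 59, tzinfo=timezone.utc)
    print(conditional_response(page_time, "Thu, 21 Aug 2003 08:59:00 GMT"))
    print(conditional_response(page_time, "Wed, 20 Aug 2003 08:59:00 GMT"))
```

The point for this thread is that a 304 lets a spider check a page cheaply, so frequent checks on an unchanged page cost both sides very little.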

Josefu

9:32 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



...well, the 'have I updated and does it change how deep he goes' question isn't really like my 'I'm a virgin and waiting to be deflowered, what will bring my desired suitor' problem : )

MonkeeSage

10:22 pm on Aug 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was gonna ask something but I fugit (forget). ;)

Actually, I know what it was...how would Google know how often your content changed unless it came and looked as often as you changed it? Seems kinda paradoxical...maybe that's what they have two bots for...but someone recently said that they don't have two anymore or something to that effect. "My brain hurts, my brain hurts, my brain hurts...today."

Jordan

Josefu

6:11 am on Aug 22, 2003 (gmt 0)

10+ Year Member



I must have a talent for asking the unanswerable. First the 'will google read the <noembed> tag' thingy and now this.

Perhaps if incoming links bring Googlebot to my door more than once in a web crawl, he'll send a message to 'DeepGoogle' to pay me a visit. (Oooo! Careful!)

Perplexed

6:23 am on Aug 22, 2003 (gmt 0)

10+ Year Member



"This is easy for a spider to "learn". You visit for the first time, and then come back in, say, a week. If changes haven't been made then you increase the time before your next visit. If changes have been made then you might decrease your time before the next visit."

I think that summed up my thoughts fairly well, but as I said, I could be wrong. Thinking about it, I doubt that that is the only factor. Perhaps it is one of many, including the frequency of backlink additions.

My logs do seem to indicate that the more changes I have made recently, the more I get visited, but I have also been adding backlinks.

Perhaps googlebot has "flavours of the month"!