[webmasterworld.com...]
Now to work on more incoming links!
How does Google actually index a site? I understand there are multiple bots that perform different tasks (finding links, indexing the page, etc.). Is there any documentation on this? In what order does each bot do its work?
I keep having to ban and unban that IP!
Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)
on IP 66.249.66.51
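Before banning an IP that claims to be Googlebot, one way to check whether it is genuine is the two-step DNS test Google describes. First, reverse-resolve the IP and confirm the hostname ends in a Google crawler domain such as googlebot.com or google.com. Then forward-resolve that hostname and confirm it maps back to the same IP. A user-agent string alone proves nothing, since any client can send one. Below is a minimal Python sketch of that check. The function name is hypothetical, and the lookup functions are parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

# Reverse-DNS suffixes treated as genuine Google crawlers in this sketch.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_genuine_googlebot(
    ip: str,
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Return True if `ip` reverse-resolves to a Google crawler hostname
    that forward-resolves back to the same IP (hypothetical helper)."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record or lookup failure: not verified
    if not hostname.rstrip(".").endswith(GOOGLE_DOMAINS):
        return False  # PTR name is not in a Google crawler domain
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    # The forward lookup must include the original IP, otherwise the PTR
    # record could have been set by whoever controls the IP's reverse zone.
    return ip in addresses
```

Calling `is_genuine_googlebot("66.249.66.51")` performs live DNS lookups through the `socket` module, so the result depends on the network you run it from.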
Personally, I think Google has finally developed a system that overcomes the space limitations of its previous version, and has now begun a full crawl in earnest, using a newly developed crawler (one that tries to gauge the speed and capacity of a site's server on the fly, for maximum indexing speed), to rebuild its entire index from the ground up.
I think that in the next 3-6 weeks there will be both a MAJOR update and an announcement from Google saying "now searching XXX billion and/or trillion pages".
[END SPECULATION]
-- Rich
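Nobody outside Google knows how its crawler paces itself, but the on-the-fly adjustment Rich speculates about could, in its simplest form, look like the toy class below. Each host gets its own delay between requests; the delay doubles when the server answers slowly or with an error, and shrinks by 20% when it answers quickly. The class name, thresholds, and constants are all hypothetical illustrations, not anything Google has published.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveDelay:
    """Toy per-host politeness delay that reacts to server response time.

    Hypothetical illustration only; this is not Google's algorithm.
    """
    delay: float = 1.0           # seconds to wait between requests to this host
    min_delay: float = 0.1       # never fetch faster than this
    max_delay: float = 60.0      # never back off further than this
    target_latency: float = 0.5  # response time (s) considered "healthy"

    def record(self, response_seconds: float, server_error: bool = False) -> float:
        """Update the delay after one fetch and return the new delay."""
        if server_error or response_seconds > 2 * self.target_latency:
            self.delay *= 2    # server looks strained: back off quickly
        elif response_seconds < self.target_latency:
            self.delay *= 0.8  # server is fast: speed up gently
        # Responses between the two thresholds leave the delay unchanged.
        self.delay = min(self.max_delay, max(self.min_delay, self.delay))
        return self.delay
```

A crawler would keep one such object per host and wait `delay` seconds between fetches. Backing off faster than it speeds up keeps a struggling server from being pushed harder, and the clamps bound the request rate in both directions.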
One of our sites was getting hit hard and fast by G. It seemed to build throughout October. Then on 11/1 it all but stopped: on 11/1, G requested only about 3% of what it had on 10/31. OK, now the embarrassing part: I made a small (really small) change to the index page on 10/31 that caused it not to validate. Could the page failing validation be holding back G and the other bots?
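To put numbers on a drop like this, you can count Googlebot requests per day straight from the raw access log and compare consecutive days. The sketch below assumes a simplified, hypothetical log layout of `date time client-ip user-agent`, with spaces inside the user agent written as `+` (as in the log line quoted earlier in the thread). Real IIS or Apache logs have more fields, so the field positions would need adjusting. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def googlebot_hits_per_day(log_lines: Iterable[str]) -> Dict[str, int]:
    """Count requests per date whose user-agent field mentions Googlebot.

    Assumes a hypothetical layout: 'date time client-ip user-agent',
    with spaces inside the user agent written as '+'.
    """
    counts: Counter = Counter()
    for line in log_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#Fields:'-style directives
        parts = line.split()
        if len(parts) < 4:
            continue  # malformed line
        date, user_agent = parts[0], parts[3]
        if "googlebot" in user_agent.lower():
            counts[date] += 1
    return dict(counts)


def day_over_day_ratio(counts: Dict[str, int], previous: str, current: str) -> float:
    """Fraction of the previous day's Googlebot volume seen on the current day.

    Returns NaN when the previous day has no recorded hits.
    """
    prev = counts.get(previous, 0)
    return counts.get(current, 0) / prev if prev else float("nan")
```

With, say, 1,000 Googlebot requests on one day and 30 on the next, `day_over_day_ratio` returns 0.03, the kind of 3% figure described above. Matching on the user-agent string alone also counts impostors, so this measures claimed Googlebot traffic only.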
I run a search engine, and besides your various algorithms, the two most important things are the size of your index and its freshness. If I were Google, that's the first thing I'd throw money at if I had spare cash and was worried about Microsoft and Yahoo on my heels. I'd upgrade my crawler farm and add massive capacity to the servers that hold my indices. Then I'd crawl as hard and as deep as possible.
And I'd make sure I had a small team lurking on the boards, checking whether webmasters start squeaking about bandwidth and load, as above. I might even contact the owner of the board and call in a favor to bump the 'Google hits' discussion to the home page. ;)
m.