Part 2:
The crawler page obviously just has links to other pages. How do I ensure that the pages linked to from the crawler page get indexed, while the crawler page itself does not? Does that make sense? I just don't want the crawler page to show up in search results.
thanks!
aron hoekstra
[edited by: WebGuerrilla at 7:34 pm (utc) on June 18, 2003]
[edit reason] no urls please [/edit]
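One commonly used way to get the behaviour asked about above is the robots meta tag with the value `noindex,follow`, which asks compliant crawlers to keep the page itself out of the index while still following its links. A minimal sketch, assuming the crawler page is generated by a script; the function name `build_crawler_page` and the example URLs are hypothetical and not from this thread:

```python
from html import escape


def build_crawler_page(links):
    """Return HTML for a link-only page that asks robots not to index
    the page itself but still to follow the links on it."""
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for url, title in links
    )
    return (
        "<html>\n"
        "<head>\n"
        # The robots meta tag: do not index this page, do follow its links.
        '  <meta name="robots" content="noindex,follow">\n'
        "  <title>Crawler page</title>\n"
        "</head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical example URLs, for illustration only.
page = build_crawler_page([
    ("/widgets/red.html", "Red widgets"),
    ("/widgets/blue.html", "Blue widgets"),
])
print(page)
```

Whether the linked pages actually get indexed still depends on the crawler deciding to fetch them; the tag only states a preference.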
Secondly, you'll have quite a server drain if Google grabs hundreds of 100k+ pages every day. Good luck with that.
I suggest you FIRST think of a site structure WITHOUT a sitemap and check whether Google and visitors can get everywhere. THEN you can add a sitemap linking to certain subtopic pages, but don't link to all of them; after all, Google has a crawler, not just a page fetcher.
SN
There is no fixed limit, IMO. Certainly not 100. Matt Cutts at Pubcon said he would have been happier if that Google guidance page had mentioned 101 kb instead of 100 links. Googleguy reconfirmed the non-limit in an earlier thread as well.
A fixed limit would not make sense either. Take a page with the periodic table: would Google have to discriminate between the elements?
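To see where a given page stands on both numbers mentioned above (link count and size in KB), a rough check can be scripted. This is only a sketch: the class `LinkCounter`, the function `page_stats`, and the sample page are made up for illustration, and the 101 KB figure is the one reported in this thread, not a documented limit.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> tags that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1


def page_stats(html_text):
    """Return (number of links, size in KB) for an HTML string."""
    counter = LinkCounter()
    counter.feed(html_text)
    size_kb = len(html_text.encode("utf-8")) / 1024
    return counter.links, size_kb


# Hypothetical sample: one link per element, like a periodic table page.
sample = "<html><body>" + "".join(
    f'<a href="/element/{n}.html">Element {n}</a> ' for n in range(1, 119)
) + "</body></html>"

links, size_kb = page_stats(sample)
print(links, "links,", round(size_kb, 1), "KB")
```

A page like this has more than 100 links yet is only a few KB, which fits the point that size is the more meaningful number of the two.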
However the number of links could have an effect on crawling preferences and countering spam-traps:
Breadth first and spam: slide 17 and limiting the number of links on a page:
[stanford.edu...]
Very high-Pageranked pages would be less likely to have spammy links if they place more than 100 links on their page, so the risk of spidering all those links would be lower.
Crawling patterns:
This is an older thread with some discussions and links to papers on crawling/Pagerank:
[webmasterworld.com...]
I do remember Ciml observing that, of his various sites, the higher Pageranked ones got (deep)crawled first, but that was a while ago.
It makes sense to start (deep)crawling with, let's say, Yahoo.com and Dmoz.org.
In effect that does follow some higher Pagerank first model.
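As an illustration of what a "higher Pagerank first" crawl order means, here is a toy sketch using a priority queue as the crawl frontier. It is not a description of how Google actually crawls; the function `crawl_order`, the link graph, and the scores are all invented for the example.

```python
import heapq


def crawl_order(seeds, links, score):
    """Visit pages starting from the seeds, always taking the
    highest-scored known page next (a best-first crawl)."""
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-score.get(url, 0.0), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    order = []
    while frontier:
        _, url = heapq.heappop(frontier)
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-score.get(target, 0.0), target))
    return order


# Hypothetical link graph and importance scores.
links = {"hub": ["a", "b"], "a": ["a1"], "b": ["b1"]}
score = {"hub": 0.9, "a": 0.5, "b": 0.2, "a1": 0.4, "b1": 0.1}

print(crawl_order(["hub"], links, score))
# → ['hub', 'a', 'a1', 'b', 'b1']
```

Note that the deep page `a1` is fetched before `b` because its score is higher, so pages linked from a strong page get reached sooner than a plain breadth-first order would reach them.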
Now with Freshness playing a bigger role with Freshbot, I would say pages getting fresh (new) inbound links would/should get some crawling preference as well.
Also: [webmasterworld.com...]