
GoogleBot Question(s)


nullvalue

7:27 pm on Jun 18, 2003 (gmt 0)

10+ Year Member



Part 1:
I have a page that I will be using as my "crawler page"; it contains nearly 80,000 links to pages within my web site. The file is over 5 MB in size and I'm on a rather slow DSL line. My question is: will GoogleBot time out if it takes more than a minute or so to load the page? I have heard about people breaking their crawler pages into roughly 100 KB files. Is this necessary?

Part 2:
The crawler page obviously just has links to other pages. How do I ensure that the pages linked to from the crawler page get indexed, while the crawler page itself does not? Does that make sense? I just don't want the crawler page to show up in search results.
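For example, something along these lines is what I had in mind -- "urls.txt" is just a placeholder for wherever the 80,000 URLs actually live, and I'm assuming a noindex,follow robots meta tag on each chunk is the right way to handle Part 2:

# Rough sketch: split one huge crawler page into ~100 KB HTML chunks, each
# marked noindex,follow so the linked pages can be followed and indexed but
# the chunk pages themselves stay out of the results. "urls.txt" (one URL
# per line) is just a placeholder for wherever the 80,000 links come from.

CHUNK_BYTES = 100 * 1024  # rough target size per chunk page

HEAD = ('<html><head><meta name="robots" content="noindex,follow">'
        '<title>Site links</title></head><body>\n')
FOOT = '</body></html>\n'

def write_chunk(index, lines):
    with open(f'crawler_{index}.html', 'w') as out:
        out.write(HEAD)
        out.writelines(lines)
        out.write(FOOT)

chunk, size, index = [], 0, 1
with open('urls.txt') as urls:
    for url in urls:
        url = url.strip()
        if not url:
            continue
        line = f'<a href="{url}">{url}</a><br>\n'
        if size + len(line) > CHUNK_BYTES and chunk:
            write_chunk(index, chunk)
            chunk, size, index = [], 0, index + 1
        chunk.append(line)
        size += len(line)
if chunk:
    write_chunk(index, chunk)

That would turn the one 5 MB page into fifty-odd ~100 KB pages, each of which Google can follow but (as I understand the noindex,follow tag) won't list in the results.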

thanks!

aron hoekstra

[edited by: WebGuerrilla at 7:34 pm (utc) on June 18, 2003]
[edit reason] no urls please [/edit]

killroy

8:52 am on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First off, I have an 80k-page site without ANY sitemap. Google will automatically find more important pages sooner than less important pages, since that is how the structure should ideally work. Furthermore, with a sitemap it becomes almost impossible to distribute PR nicely. On my site I have a most-visited list on the home page, which automatically gives higher PR to those popular pages and brings them up in the SERPs. With large, flat sitemaps you're simply going to wash out your site's PR landscape and make it boring and unintuitive.
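To put a rough number on the "wash out" point, here is a toy PageRank iteration -- the page names are made up and this is only textbook PageRank, not a claim about what Google actually runs:

# Toy PageRank power iteration (damping 0.85) on a tiny internal-link graph.
def pagerank(links, iters=50, d=0.85):
    n = len(links)
    pr = {p: 1.0 / n for p in links}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in links}
        for p, outs in links.items():
            for q in outs:
                new[q] += d * pr[p] / len(outs)
        pr = new
    return pr

# Flat sitemap: home links to every page equally, so a, b and c all end up
# with identical scores -- the "washed out" landscape.
flat = {"home": ["a", "b", "c"], "a": ["home"], "b": ["home"], "c": ["home"]}

# Most-visited list: home links only to the popular page a, which then
# clearly outranks b and c.
popular = {"home": ["a"], "a": ["home"], "b": ["home"], "c": ["home"]}

print(pagerank(flat))
print(pagerank(popular))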

Secondly, you'll have quite a server drain if Google grabs hundreds of 100 KB+ pages every day; good luck with that.

I suggest you FIRST think of a site structure WITHOUT a sitemap and consider whether Google and visitors can get everywhere. THEN you can add a sitemap linking to certain subtopic pages, but don't link to all of them; after all, Google has a crawler, not just a page fetcher.

SN

mil2k

11:08 am on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't buy the PR argument either for how deep Google will crawl.

Very interesting, Marval and BigDave. I have read some very senior members agreeing with the PR argument. I think we need to start a new thread for this. :)

vitaplease

11:28 am on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Number of links:

There is no fixed limit, IMO; certainly not 100. Matt Cutts at Pubcon said he would have been happier if that Google guidance page had mentioned 101 KB instead of 100 links. GoogleGuy reconfirmed the non-limit in an earlier thread as well.

A hard limit would not make sense either. Take a page showing the periodic table: would Google have to discriminate between the elements?

However, the number of links could have an effect on crawling preferences and on countering spam traps:

Breadth-first crawling and spam: slide 17, on limiting the number of links followed per page:
[stanford.edu...]

Pages with very high PageRank would be less likely to carry spammy links, so even if they place more than 100 links on a page, the risk in spidering all those links would be lower.
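Roughly, the idea from that slide as I read it, sketched with made-up helper functions (fetch_links() and pagerank_of() are stand-ins, not anything Google has published):

import collections

# Sketch: breadth-first crawl that limits the links followed per page,
# but trusts high-PR pages with a larger cap.
def breadth_first_crawl(seeds, fetch_links, pagerank_of,
                        base_cap=100, trusted_cap=1000, trust_threshold=0.7):
    seen = set(seeds)
    queue = collections.deque(seeds)
    while queue:
        url = queue.popleft()
        # High-PR pages are unlikely to be spam traps, so follow more of
        # their links; cap everything else at base_cap.
        cap = trusted_cap if pagerank_of(url) >= trust_threshold else base_cap
        for link in fetch_links(url)[:cap]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen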

Crawling patterns:

This is an older thread with some discussions and links to papers on crawling/Pagerank:
[webmasterworld.com...]

I do remember Ciml observing that, of his various sites, the higher-PageRanked ones got (deep-)crawled first, but that was a while ago.

It makes sense to start (deep-)crawling with, let's say, Yahoo.com and Dmoz.org.
In effect that does follow some higher-PageRank-first model.
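In pseudo-code, a "higher PageRank first" crawl order would look something like this (estimated_pr() and fetch_links() are again made-up stand-ins):

import heapq

# Sketch: crawl frontier ordered by an (assumed, precomputed) PageRank
# estimate, so hubs like Yahoo.com or Dmoz.org get expanded before
# low-PR pages.
def pr_first_crawl(seeds, estimated_pr, fetch_links, budget=10000):
    frontier = [(-estimated_pr(u), u) for u in seeds]  # max-heap via negation
    heapq.heapify(frontier)
    seen = set(seeds)
    order = []
    while frontier and len(order) < budget:
        _, url = heapq.heappop(frontier)
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-estimated_pr(link), link))
    return order  # pages in the order they would be crawled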

Now, with freshness playing a bigger role with Fredbot, I would say pages getting fresh (new) inbound links would/should get some crawling preference as well.

Also: [webmasterworld.com...]
