
Page rank and googlebot crawl schedule

What's your Page rank and when will googlebot deep crawl your site?


latimer

6:58 pm on Jun 5, 2002 (gmt 0)

10+ Year Member



Last month it seemed the theory that a lower PageRank puts you lower on Googlebot's spidering schedule was confirmed (for existing sites; new sites seem to get special attention). It also seemed that a higher PageRank results in more of a large site's pages being crawled. Anyone care to share this month's crawl dates for their sites, the number of pages crawled, and PageRank? We currently have a lowly PageRank of 3 on the index page and 0 on other levels, and haven't seen Googlebot yet.

indigojo

8:41 am on Jun 8, 2002 (gmt 0)

10+ Year Member



Surely the ingredient that most determines how much a site gets crawled is high-value, informative content. Just a thought.

The Contractor

10:21 am on Jun 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jack,
<<You countered with assertions that large sites should not be indexed simply because they have lots of pages. And your posts carry a strong suggestion that you think that any site having a large number of pages must be spam. That, it seems to me, misses the point.>>

When did I state any of the above, use the word spam, and say google should not index sites that are large? Whew!

This thread started on PR vs Crawl.
I "am" saying that higher PR sites should be crawled first/deeper.
I "am" saying that if I dump news, public records, one paragraph of words taken from somewhere and call them a page, and any other information that is freely available elsewhere into a database, "yes" it is useful information - but "no" I should not be given special attention since I have made 100K pages of information that is available everywhere.
And "no" I do "not" believe Google should crawl my entire site if I do a data dump into my web.
Should google crawl every page of sites that use the DMOZ data dump?
Wouldn't that be kind of stupid of Google to replicate data over and over again?
I am "not" saying that because a site uses data found elsewhere it is useless information - I am saying Google's algo "should" recognize this for what it is.
One of my own sites is a directory type. Do you realize how easily I could dump a lot of the DMOZ data into it by using my spider to crawl certain areas of DMOZ?
I could have almost guaranteed instant PR/success by doing so - I would rather build it up over the course of a couple of years or more and try to get some content that is available at very few if any other sites.
I don't believe in it - that is my OPINION ;)

P.S. I am not saying that anyone's website in this thread is a data dump or that it should not be crawled completely by Google.

taxpod

12:20 pm on Jun 8, 2002 (gmt 0)

10+ Year Member



So everyone assumes that a site with 100K pages must either be spam or links? Why is that? I have a genealogy site, and one very useful tool in conducting genealogy research is city directories from the late 1800s / early 1900s. I have many such books. I scan the pages and post them to my site as JPGs. One page might contain the names of 100 people with their address, place of employment, and profession. It may include other items as well. One directory for one year might include 1,000 pages. So if you have 100 such directories, you have 100,000 pages. And this is only one aspect of the site. So how is this links or spam?

Another feature of my site is cemetery transcriptions. I have 50,000 individuals listed, each with a photo of the headstone. I also have 10,000 obituaries, 100,000 census records, 20,000 marriages, 5,000 alumni of colleges and universities, 500,000 military records, etc. etc. I also have links to 100,000 sites.

So when you imply that Google shouldn't deep crawl sites with so many pages because they are somehow junk, I take offense to that. I'm a lousy HTML writer and I have self-taught what little I know but Google deep crawls my site because I do, in fact, have a lot of information that is useful. I'd like to have ten times as much data and will, god willing, in a few years.

I'm not somehow saying that I have a great site but I still think the deep crawl is deserved.

The Contractor

7:12 pm on Jun 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Taxpod,

First: Read my message above - no, I mean read it!

Second: Do not sticky me, say what you want to in public!

Third: I have NOT called your site or any other site SPAM, please feel free to quote me when you find those words in a message in this thread.

You said <<So when you imply that Google shouldn't deep crawl sites with so many pages because they are somehow junk, I take offense to that.>>

I never called a site "junk" that has a large # of pages, again - please feel free to quote me when you find those words in a message in this thread.

I think you need to read this entire thread (maybe print it out) before making accusations and putting words in my mouth or posts that simply are not there.

Have a great day ;)

world end soon

9:34 pm on Jun 8, 2002 (gmt 0)

10+ Year Member



Here comes a dumb question:

How do you guys know which of your pages are crawled by google? Do you put a counter/tracker on every page or do you have some fancy software?

I'll try not to ask too many like that one :)

rfgdxm1

11:45 pm on Jun 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Err...we know Googlebot came to our site when we see things like this in our website logs:

Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)

Kinda obvious. ;)
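
For anyone who would rather not eyeball the raw logs, here is a minimal Python sketch that pulls out the Googlebot requests and groups the crawled pages by day. It assumes the server writes the Apache "combined" log format (an assumption, since nobody in the thread has said what format their logs use), and the function name `googlebot_pages_per_day` is just an illustrative choice.

```python
import re

# Assumption: Apache "combined" log format, e.g.
# host - - [02/Jun/2002:10:15:32 +0000] "GET /page HTTP/1.0" 200 512 "-" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_pages_per_day(lines):
    """Map each day (as written in the log) to the set of paths Googlebot requested."""
    pages = {}
    for line in lines:
        match = LOG_LINE.match(line)
        # Keep only requests whose user-agent string mentions Googlebot.
        if match and "Googlebot" in match.group("agent"):
            pages.setdefault(match.group("day"), set()).add(match.group("path"))
    return pages
```

Calling it with `open("access.log")` (a hypothetical file name) and taking `len()` of each set gives the number of distinct pages crawled per day. A user-agent string can be faked, so treat this as a rough count rather than proof that the visitor really was Google.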

24bit

4:10 am on Jun 9, 2002 (gmt 0)

10+ Year Member



Wow, in the last week, I've been getting crawled just about every day. I've gotten crawled by crawl1.googlebot.com, 2, 3, 4, 5, 6, 7, 8, and 9!! All my main pages are PR4, and I'm expecting/hoping that they'll be PR5 in the next dance. I think it's possible that I'm getting so many visits from Googlebot because I was able to get my link onto the homepage of the major music site on the net which is a high PR7. I think Googlebot crawls that page all the time and then follows that link to my site.

ann

4:30 am on Jun 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Only seen 2 signs of miss google picking up one each of my lesser pages but msn has been crawling my entire site for two days! I seem to rank the highest in msn for all my keywords and phrases.

I'm getting a little burned out with watching miss prima donna googlebot. I seem to please most every search engine out there but them....

I am of a good mind to delete my googlebar!

Ann

chaitan

6:43 am on Jun 9, 2002 (gmt 0)

10+ Year Member



It seems that Googlebot is crawling my PR7 site every day:
May/23: 167 pages
May/24: 94
May/25: 447
May/26: 100
May/27: 180
May/28: 201
May/29: 245
May/30: 178
May/31: 104
Jun/01: 991
Jun/02: 23291
Jun/03: 85390
Jun/04: 118938
Jun/05: 125
Jun/06: 117
Jun/07: 159
Jun/08: 2810
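
To see where the daily counts above jump out of their usual range, here is a rough Python sketch that flags days far above the median. The cutoff of 10 times the median is an arbitrary assumption chosen for illustration, not anything Google publishes, and `heavy_crawl_days` is a made-up helper name.

```python
from statistics import median

# Daily Googlebot page counts as posted above.
crawl_counts = {
    "May/23": 167, "May/24": 94, "May/25": 447, "May/26": 100,
    "May/27": 180, "May/28": 201, "May/29": 245, "May/30": 178,
    "May/31": 104, "Jun/01": 991, "Jun/02": 23291, "Jun/03": 85390,
    "Jun/04": 118938, "Jun/05": 125, "Jun/06": 117, "Jun/07": 159,
    "Jun/08": 2810,
}

def heavy_crawl_days(counts, factor=10):
    """Return the days whose count exceeds `factor` times the median daily count."""
    typical = median(counts.values())
    return [day for day, pages in counts.items() if pages > factor * typical]

print(heavy_crawl_days(crawl_counts))
```

Lowering `factor` catches smaller bumps; raising it leaves only the biggest days.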

danny

8:32 am on Jun 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Doofus, your site has 100 000+ pages of useful content, no argument about that (it's a great site, been around for ages). But the problem for Google is how to distinguish useful content from junk ("spam") - and the method it has for doing that involves PR.

I agree the system is not perfect (it is, as you point out, biased against new sites), but what are the alternatives? Strong AI would be needed to distinguish "good" from "bad" intrinsically, and what extrinsic measures are there except link pop?

Danny.

Doofus

3:43 am on Jun 10, 2002 (gmt 0)



chaitan:

It looks like you have a combination of the regular monthly crawl and the so-called "fresh" crawl in those statistics. Is the "fresh" crawl getting the same pages from one day to the next?

With your PR7, it started getting serious on the monthly stuff on June 2. On my PR6 site, it started getting serious on June 5.

Thanks for your input. I don't think there's any doubt whatsoever that "deep" sites on the Web don't have a snowball's chance in hell of getting all their content in Google unless they have a healthy PageRank to begin with.

Case closed.

This 41 message thread spans 2 pages.