Is it normal for a page to be indexed so quickly... and so well?
on a popular SEO forum
That's a big part of the picture.
Pretty amazing bit of mind reading on Google's part, too, since robots.txt here disallows all bots...
Just for the record, this statement is not true. It allows Jeeves, Slurp, Googlebot and MSNbot.
The useful part of robots.txt here reads:
I don't want to hijack this thread, but I also don't want anyone to be misguided. If you really want to see what Brett is doing with robots.txt, you need to actually read the first paragraph of that file. ;)
Let's be clear here: I am fine in principle with a webmaster deciding which bots may crawl his site, and I am all for obeying robots.txt. In this case, however, robots.txt states that no bot is allowed to crawl this site. Therefore, if Google crawls this forum (which it does), then either they are mind readers, OR robots.txt shows different content depending on IP/user-agent (known as cloaking), OR there is some other agreement with Google.
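Whether a blanket `Disallow: /` actually blocks Google depends on how the file is grouped. Under the robots exclusion convention, a crawler obeys the group that names its own user-agent and falls back to the `User-agent: *` group only when no specific group matches. Here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a hypothetical file (not this forum's real robots.txt) that whitelists the bots named earlier and disallows everything else. The agent tokens and the `example.com` URL are illustrative placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt showing the whitelist pattern; NOT the forum's real file.
# An empty "Disallow:" means nothing is disallowed for the agents in that group.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Jeeves
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/forum/"  # placeholder URL
    # Named agents match their own group; anything else falls through to "*".
    for agent in ("Googlebot/2.1", "Slurp", "msnbot/1.0", "SomeOtherBot/1.0"):
        print(f"{agent:18} allowed={rp.can_fetch(agent, url)}")
```

In this sketch, the three whitelisted agents come out allowed and the unknown bot is refused. A file can therefore "disallow all bots" in its `*` group and still admit Googlebot, without any cloaking being involved.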
I haven't seen that correlation lately. PR doesn't seem to have much to do with it. Universal search seems to play a big part. I can make a blog post on a low PR site and see it in the index in minutes.
PageRank has absolutely nothing to do with query relevance. Google has struggled with that fact for some time now. Now it seems they've figured out that PR shouldn't have anything to do with indexing speed either.
I think they took a page out of Clinton's playbook and now have a sign displayed prominently at the 'Plex. It reads:
It's the relevance, stupid!
I can make a blog post on a low PR site and see it in the index in minutes.
Blogs use an approach that is effectively "direct submission" of data into search: the data does not need to be crawled to be discovered, which is why Google and others can index it quickly. Perhaps this forum uses the same approach.
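One common form of this "direct submission" is a ping. On publish, blog software sends a small XML-RPC call, conventionally `weblogUpdates.ping(siteName, siteURL)`, to one or more ping services, so they learn about new content before any crawler comes by. The sketch below only builds and decodes such a request with Python's standard `xmlrpc.client`. The blog name and URL are hypothetical, and it deliberately sends nothing, because which ping endpoints a given engine accepts is an assumption outside this thread.

```python
import xmlrpc.client

# Hypothetical blog details, for illustration only.
BLOG_NAME = "Example Blog"
BLOG_URL = "https://blog.example.com/"


def build_ping(name: str, url: str) -> str:
    """Serialize a weblogUpdates.ping XML-RPC request body (not sent anywhere)."""
    # dumps() takes the call parameters as a tuple.
    return xmlrpc.client.dumps((name, url), methodname="weblogUpdates.ping")


if __name__ == "__main__":
    body = build_ping(BLOG_NAME, BLOG_URL)
    print(body)
    # Decode it again to show what a receiving ping service would see.
    params, method = xmlrpc.client.loads(body)
    print(method, params)
```

To actually send it, `xmlrpc.client.ServerProxy(endpoint).weblogUpdates.ping(name, url)` would POST an equivalent body, where `endpoint` is whatever ping service you choose.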
A low-PageRank URL that's trusted could get updated up to several times a day (and at least once every day or two), while a high-PageRank page on a not-so-trusted domain might not see Googlebot for days, sometimes even weeks.
Of course, there are many other factors. For example, orphaned pages get checked about as often as a 404 URL, even on the MOST trusted domains, and new high-profile IBLs (inbound links) will provide a temporary boost. But TrustRank x PageRank decides speed for the most part.
Also note that trust usually runs more or less parallel to PageRank for *any* family-safe or news-related topic, especially if the theme has an academic wing on the net. Topics with a stronger academic side usually have more trust, while topics on the shady side usually have far less trust than their PageRank would indicate (even if they're white-hat all over).
3 characters less and a whole paragraph can lose its meaning *grin*[/edit]
[edited by: Miamacs at 2:28 pm (utc) on Jan. 15, 2008]
[edited by: Robert_Charlton at 7:18 pm (utc) on April 12, 2008]
[edit reason] removed specifics [/edit]
What is interesting to me, though, is that Google.COM seems far more "dynamic", or far faster to index, than a regional Google. Why is this?
To be more specific: we are talking about results from my blog postings that make it into the top ten on google.com within 8 to 10 minutes. The main site has a PR of 5, and the blog is on a subdomain of this site but is too new to have a Toolbar PR, though logic suggests it has a PR of 4 pending. The fact that it shows "zero" sandbox and close to real-time results is amazing.
One last thing: this is NOT related to the content being "family" content. We... have lots of comments about adult-related matters and post using a lot of tags, which is almost black hat.
[edited by: Robert_Charlton at 7:46 pm (utc) on April 12, 2008]
[edit reason] consolidated several posts [/edit]