| 11:49 pm on Jan 14, 2008 (gmt 0)|
I can tell you that it happens a lot here on WebmasterWorld. I see a new question, try to research it a bit, and instead of an answer there's the WebmasterWorld new post right on the first page. It happens more commonly on 4 or more word searches and not so much on commercial, competitive search.
That's a big part of the picture.
| 1:06 am on Jan 15, 2008 (gmt 0)|
PR = crawling/indexing speed, it's simple.
| 1:17 am on Jan 15, 2008 (gmt 0)|
|I can tell you that it happens a lot here on WebmasterWorld. |
Pretty amazing bit of mind reading too on behalf of Google as robots.txt here disallows all bots...
| 1:54 am on Jan 15, 2008 (gmt 0)|
Google does this better than any other search engine. When our site was placed in the Yahoo directory, it took a day for Google to show the page. MSN took nearly three weeks.
|Pretty amazing bit of mind reading too on behalf of Google as robots.txt here disallows all bots... |
Just for the record, this statement is not true. It allows Jeeves, Slurp, Googlebot and MSNbot.
| 2:19 am on Jan 15, 2008 (gmt 0)|
The useful part of robots.txt here reads:
Bizarre decision to cloak robots.txt which does not do anything about bots that disregard it anyway.
| 1:14 pm on Jan 15, 2008 (gmt 0)|
|The useful part of robots.txt here reads: |
I don't want to hijack this thread, but I also don't want anyone to be misguided. If you really want to see what Brett is doing with robots.txt you need to actually read the first paragraph of that file. ;)
| 1:29 pm on Jan 15, 2008 (gmt 0)|
I am not posting misleading information. Anyone can check this site's robots.txt and see that all crawling is disallowed. It references some other "real" robots.txt files, but no good bot should take those into account: they sit in comment fields, and while Brett's creative use of them is amusing, it has no bearing on the robots.txt standard.
Let's be clear here: I am fine in principle with a webmaster deciding which bots may crawl his site, and I am all for obeying robots.txt. In this case, however, robots.txt states that no bot is allowed to crawl this site. Therefore, if Google crawls this forum (which it does), then either they are mind readers, OR robots.txt shows different content depending on IP/user-agent (known as cloaking), OR there is some other agreement with Google.
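To illustrate the point about comments, here is a minimal sketch using Python's standard urllib.robotparser. The two robots.txt bodies are made up for illustration and are NOT this site's actual file; example.com and SomeOtherBot are placeholder names. It only shows how one standard parser treats comment lines, not how Googlebot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: the "allow Googlebot" rule exists only inside comments.
COMMENTED_RULES = """\
# User-agent: Googlebot
# Disallow:
User-agent: *
Disallow: /
"""

# Hypothetical file: the same rule written as a real record.
REAL_RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/forum/thread.htm"  # placeholder URL
    for name, body in (("commented", COMMENTED_RULES), ("real", REAL_RULES)):
        for agent in ("Googlebot", "SomeOtherBot"):
            print(name, agent, allowed(body, agent, url))
```

With the commented version every agent is refused, because anything after "#" is dropped before the rules are read. Only the second version, where the Googlebot record is a real rule, lets Googlebot through while still blocking everyone else.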
| 1:44 pm on Jan 15, 2008 (gmt 0)|
>> PR = crawling/indexing speed, it's simple.
I haven't seen that correlation lately. PR doesn't seem to have much to do with it. Universal search seems to play a big part. I can make a blog post on a low PR site and see it in the index in minutes.
PageRank has absolutely nothing to do with query relevance. Google has struggled with that fact for some time now. Now it seems they've figured out that PR shouldn't have anything to do with indexing speed either.
I think they took a page out of Clinton's playbook and they now have a sign displayed prominently at the 'Plex. It reads:
|It's the relevance, stupid! |
| 1:57 pm on Jan 15, 2008 (gmt 0)|
|I can make a blog post on a low PR site and see it in the index in minutes. |
Blogs use an approach that is effectively a "direct submission" of data to search engines. The data does not need to be crawled to be discovered, which is why Google and others can index it quickly. Perhaps this forum uses the same approach.
| 2:08 pm on Jan 15, 2008 (gmt 0)|
Doesn't matter whether it's a blog or not. On sites I update frequently, I see the results in the index in under an hour. Even on new sites that haven't been assigned any inaccurate, green chunks of graphics.
| 2:21 pm on Jan 15, 2008 (gmt 0)|
Perhaps the real PR of those sites is high, but it is not shown to be? I have no doubt that some sites are checked at least hourly - I'd expect this site to be one of those.
| 2:26 pm on Jan 15, 2008 (gmt 0)|
In general (that is, if a site is not featured in Universal Search-style 'verticals', e.g. news (incl. forums), blogs, video (YouTube, Metacafe), etc.), TrustRank decides crawl and cache speeds.
A low-PageRank URL that's trusted could get updated up to several times a day ( but at least daily, once every two days ). While a high-PageRank page on a not so trusted domain might not see Googlebot for days, sometimes even weeks.
Of course there are many other factors to this, for example orphaned pages get checked with about the same frequency as a 404 URL even on the MOST trusted domains, and new high profile IBLs will provide a temp boost. But TrustRank x PageRank decides speed for the most part.
Also note that Trust is usually more or less parallel to PageRank for *any* family-safe / news related topic, especially if the theme has an academic wing on the net. Stuff that has more to its academic side usually has more, stuff that's on the shady side usually has way less trust than its PageRank would indicate ( even if it's white-hat all-over ).
3 characters less and a whole paragraph can lose its meaning *grin*
[edited by: Miamacs at 2:28 pm (utc) on Jan. 15, 2008]
| 2:36 pm on Jan 15, 2008 (gmt 0)|
Miamacs - this sounds like a reasonable approach, ie: combination of TrustRank and (real) PageRank.
| 5:28 pm on Jan 15, 2008 (gmt 0)|
We have recently rebuilt our company's website using the same domain, which has low PageRank but gets indexed ridiculously fast. Whenever I add new content it gets indexed very fast and ranks well for keywords.
| 6:37 pm on Apr 12, 2008 (gmt 0)|
System: The following message was spliced on to this thread from: http://www.webmasterworld.com/google/3625438.htm [webmasterworld.com] by robert_charlton - 11:20 am on April 12, 2008 (PST -8)
I am getting Google.COM results in 8 minutes for <a keyword phrase> from a blog which I am running on a 10 year old site. Is this close-to-real-time indexing something people know about?
[edited by: Robert_Charlton at 7:18 pm (utc) on April 12, 2008]
[edit reason] removed specifics [/edit]
| 7:26 pm on Apr 12, 2008 (gmt 0)|
PR does NOT have anything to do with crawling speed in my view. What counts is the age of the site.
What is interesting to me though is that Google.COM is far more "dynamic", or seems far faster to index, than a regional Google. Why is this?
In more detail, to be clear: we are talking about results which make it to the top ten on google.com in 8 to 10 minutes from my blog postings. The main site has a PR of 5 and the blog is on a sub-domain of this site but is too new to have a toolbar PR, though logic would say that it has a PR of 4 pending. The fact that it has "zero" sandbox and close to real time results is something amazing.
Last thing about this: it is NOT related to content being "family" content. We... have lots of comments about adult related matters and post using a lot of tags which is almost black hat.
[edited by: Robert_Charlton at 7:46 pm (utc) on April 12, 2008]
[edit reason] consolidated several posts [/edit]
| 1:37 pm on Apr 13, 2008 (gmt 0)|
PR of course is still the main factor for crawling speed; this has been proven many, many times. Not only for crawling speed but also for the number of URLs crawled.
[edited by: SEOPTI at 1:39 pm (utc) on April 13, 2008]