Forum Moderators: open
Not working for me. Did Google feel this revealed too much?
bummer
Incidentally, I don't think it's crawl order either. If it was then the results would be quite similar to PageRank order.
News: Michael Jackson Loses $5.3 Million Jury Verdict - Reuters - 5 minutes ago
Shortfall in Ont. power fund to hit $300M - Toronto Star - 5 minutes ago
New AIDS drug spurs anxiety, anger, hope - San Jose Mercury News - 8 minutes ago
Try Google News: Search news for -link:mydomain.com or browse the latest headlines
Appeared when searching for -link:mydomain.com
If you issue the same "link:" search 10 times in a row for a given domain, it gives the same list, with the same ordering.
That doesn't even look random ;)
Doesn't work for me. How do you do it?
Google has its own checksum for every URL in its database. If you're using the toolbar, you can find it by looking in your Temporary Internet Files: for each URL accessed there's a search? file in there, and the checksum is one of the parameters ("ch=xxxxxxxx"), provided the page has some PR.
If I were Google, I would simply order the "link:domain.tld" results by checksum, since that's "ready to use" data that only needs to be computed once.
If this is not the case, I guess I deserve a reward from Google for that "brilliant idea" - LOL
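To make the idea concrete, here's a minimal sketch of ordering backlink results by a per-URL checksum. Google's actual checksum algorithm (the "ch=" value) is not public, so CRC32 stands in for it here, and the URLs are made up; the point is only that a checksum sort is deterministic and unrelated to PR.

```python
import zlib

def link_result_order(urls):
    """Return URLs sorted by a per-URL checksum (CRC32 as a stand-in
    for Google's private checksum). The ordering is stable and
    repeatable -- consistent with the same "link:" query returning
    the same list in the same order every time -- but has nothing
    to do with PageRank."""
    return sorted(urls, key=lambda u: zlib.crc32(u.encode("utf-8")))

# Hypothetical backlink list for illustration
backlinks = [
    "http://example.com/a",
    "http://example.org/b",
    "http://example.net/c",
]
print(link_result_order(backlinks))
```

However the input list is shuffled, the output order is always the same, which matches the repeated-query observation above.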
Dan
Incidentally, I don't think it's crawl order either. If it was then the results would be quite similar to PageRank order.
I'm not arguing that it *is* crawl order, but it certainly could be and still not look like PageRank order.
As I understand it, Google starts each crawl with seed sites: PR10 directory sites (DMOZ, Yahoo! and Google). It puts those three home pages in the queue, crawls them, and adds all their links to the queue.
Next it runs through those new pages, doing the same thing; those will most likely all be PR9s and PR10s. While it's still inside the directories, it should continue the same way.
But some of those directory pages have 15 links and some have 150. A branch of the directory with few links would enter the queue at the same rate as a branch with thousands.
Once you get outside the directories it becomes even more complicated. Consider a magazine that gets crawled out of DMOZ very early. With all its articles archived, you might end up with a PR4 page within three levels of links from what will eventually be a PR9 magazine.
There will also be cases where a site has high PR but gets most of it from low-PR pages. If "Joe Bob's Mailing List Archive" software puts a link on the bottom of every archive page, and it is used a lot, it might get the equivalent of a PR7 just from all the PR1 and PR2 archive-page links, even if its best incoming link is only a PR4.
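A rough sketch of that effect, using the published PageRank formula PR(A) = (1 - d) + d * SUM(PR(T_i) / C(T_i)). The raw PR values, outlink counts, and page counts below are invented purely for illustration (toolbar PR is only a rough, roughly logarithmic view of raw PR), but they show how thousands of weak links can outweigh one stronger one.

```python
d = 0.85  # damping factor from the PageRank paper

def pr_contribution(pr, outlinks):
    # PR that a single linking page passes to one of its outlinks
    return d * pr / outlinks

# One moderately strong page linking out (assumed raw PR 50, 20 outlinks)
single_strong = pr_contribution(50.0, 20)

# 5000 archive pages, each with low raw PR (1.2) and 5 outlinks apiece
many_weak = sum(pr_contribution(1.2, 5) for _ in range(5000))

print(single_strong, many_weak)  # the mass of archive links dominates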
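A rough sketch of that effect, using the published PageRank formula PR(A) = (1 - d) + d * SUM(PR(T_i) / C(T_i)). The raw PR values, outlink counts, and page counts below are invented purely for illustration (toolbar PR is only a rough, roughly logarithmic view of raw PR), but they show how thousands of weak links can outweigh one stronger one.

```python
d = 0.85  # damping factor from the PageRank paper

def pr_contribution(pr, outlinks):
    # PR that a single linking page passes to one of its outlinks
    return d * pr / outlinks

# One moderately strong page linking out (assumed raw PR 50, 20 outlinks)
single_strong = pr_contribution(50.0, 20)

# 5000 archive pages, each with low raw PR (1.2) and 5 outlinks apiece
many_weak = sum(pr_contribution(1.2, 5) for _ in range(5000))

print(single_strong, many_weak)  # the mass of archive links dominates
```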
It is safest to say that pages are crawled relative to the amount of PR they receive from the PR10 seed pages. Crawling based on the previous month's PR just seems like a lot of extra processing to me.
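The crawl described above can be sketched as a plain breadth-first traversal from the seed pages. The link graph here is invented; the point is that queue order tracks link distance from the seeds, so a deep, eventually-low-PR page linked early can be crawled before a directory branch with few links.

```python
from collections import deque

# Toy link graph (hypothetical): seed directory, two branches, and a
# magazine linked early whose archive pages will end up with low PR.
links = {
    "dmoz": ["dir-a", "dir-b"],
    "dir-a": ["magazine", "dir-a2"],
    "dir-b": ["dir-b1"],
    "magazine": ["archive-1"],
    "dir-a2": [],
    "dir-b1": [],
    "archive-1": [],
}

def crawl_order(seed):
    """Breadth-first crawl: pop a page, record it, queue its unseen links."""
    queue, seen, order = deque([seed]), {seed}, []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in links.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(crawl_order("dmoz"))
```

Note that "magazine" comes out of the queue before "dir-b1" simply because it sits on the fast-moving branch, regardless of what PR either page will eventually have.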