Forum Moderators: open
They are not just crawling and updating, they are verifying their database. If not, we wouldn't be seeing the error pages from "bots" spidering a site. Both these bots (slurp and freshbot) are working from a current listing of some sort.
Anne
Does anyone have a PR6 or above page that doesn't get fresh updates regularly?
Yes. I've rearranged major amounts of content, but no fresh tags for this site.
It would be especially interesting to see if a PR6 page that hasn't been changed for months gets updated by freshie.
On another site, I've made tweaks and seen the fresh tag within 24-36 hours.
What's the difference between the two? The one that gets freshened up receives more traffic. That's probably not the reason, though. I'm trying to think of more reasons.
[edited by: martinibuster at 5:09 am (utc) on April 1, 2003]
Is there logic to this?
If I were Google I would Fresh crawl:
1. The highest Pageranked pages within a site - regardless of change of content, because these pages most likely will eventually contain a link to a really new internal page. (How else would it find really fresh content, if not by fresh crawling important pages that could contain new links?)
2. Pages that look like sitemaps and active directories (for the obvious reasons)
3. Pages to which new inbound links occur, because people are voting there is something new/topical going on there (hence - also the interest for real time blogging links).
4. Pages which rank in the top 200 Google SERPS for search queries that suddenly show a "burst" [webmasterworld.com] in usage. (A Zeitgeist like database, only much larger and more immediate)
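The four signals above can be combined into a single crawl-priority score. This is a hypothetical sketch of that idea, not Google's actual system: all the field names, weights, and thresholds are assumptions for illustration.

```python
# Hypothetical fresh-crawl priority combining the four signals above.
# Weights and thresholds are made up for illustration only.

def fresh_crawl_priority(page):
    """Score a page for fresh-crawl scheduling; higher = crawl sooner."""
    score = 0.0
    # 1. High-PageRank pages likely link to brand-new internal pages.
    score += 2.0 * page["pagerank"]
    # 2. Sitemap-like pages (many outlinks) get a boost.
    if page["outlinks"] > 50:
        score += 3.0
    # 3. Recently acquired inbound links suggest something new is there.
    score += 1.5 * page["new_inbound_links_7d"]
    # 4. Page ranks for a query whose volume is suddenly bursting.
    if page["query_burst_ratio"] > 2.0:
        score += 5.0
    return score

pages = [
    {"pagerank": 6.0, "outlinks": 120, "new_inbound_links_7d": 0, "query_burst_ratio": 1.0},
    {"pagerank": 3.0, "outlinks": 10, "new_inbound_links_7d": 4, "query_burst_ratio": 2.5},
]
crawl_order = sorted(pages, key=fresh_crawl_priority, reverse=True)
# Here the lower-PR page with new links and a query burst outranks the PR6 page.
```

The point of the sketch is just that the signals are additive: a low-PR page can still win a fresh crawl if enough of the other signals fire.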
This is the big one. Remember, the purpose of freshbot is to find new pages. That they update the cache of old ones isn't the main idea. Makes sense that a webmaster will add links to new content from the home page, or other high traffic pages. Particularly if the webmaster considers it important, as Google would want to find the most important new pages.
Pages which rank in the top 200 Google SERPS for search queries that suddenly show a "burst" in usage. (A Zeitgeist like database, only much larger and more immediate)
Hmm... that makes a lot of sense, however, I assume that this relates to people who are searching for your product name, or site name.
I've wondered about this, but wouldn't the new link represent a change in content to Googlebot anyway? Or would that be different with dynamic setup or content? I'm not familiar with that.
I would guess the Freshbot does a more or less daily check for the highest Pageranked pages of very high Pageranked sites.
The lower the Pagerank of your site and the longer you go without adding new links, the bigger the chance Freshbot will decide to look elsewhere more often (unless others have linked to your important pages recently - remember, Google has logged the datestamp of every link since 2000 or so).
But I believe point 3 works in conjunction with point 1. I used to have a site whose main page was PR5, and all its PR5 and PR4 pages used to get freshbotted. Now it gets freshbotted for all its PR7 and PR6 pages. HOWEVER, there is a second site, which is now PR5, and its PR4 pages do not get freshbotted. I believe this is because its inside pages have no links coming to them from other websites, whereas in site no. 1 there are links coming to many of its level-2 pages.
The question I want to ask is how to get level 3 pages freshbotted?
Do any of you have level 3 pages freshbotted?
Vitaplease - 1. The highest Pageranked pages within a site ... because these pages most likely will eventually contain a link to a really new internal page ... fresh content if you do not fresh crawl important pages that could contain new links
They even have a paper on the subject
Effective URL Ordering [www-db.stanford.edu]
A related one based on pagerank
Figure 1 shows the average PageRank of all pages downloaded on each day of the crawl. The average score for pages crawled on the 1st day is 7.04, more than three times the average score of 2.07 for pages crawled on the second day. The average score tapers from there down to 1.08 after the first week, 0.84 after the second week, and 0.59 after the fourth week.
Breadth-First Search Crawling Yields High-Quality Pages [citeseer.nj.nec.com]
I'm sure some of you have read it :-), good to know that they wrote about it too....
I think your .pdf print library must be as overcrowded as mine!
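The crawl-ordering idea from those papers - fetch the highest-PageRank URLs first, so the best pages come out early in the crawl - boils down to a priority queue over the frontier. A minimal sketch, with made-up URLs and PageRank estimates:

```python
import heapq

# Toy crawl frontier ordered by estimated PageRank, illustrating the
# URL-ordering idea from the Stanford papers. Scores here are invented.
frontier = []  # min-heap; scores are negated so the highest-PR URL pops first

def enqueue(url, pagerank_estimate):
    heapq.heappush(frontier, (-pagerank_estimate, url))

def next_url():
    neg_score, url = heapq.heappop(frontier)
    return url

enqueue("/", 7.0)
enqueue("/deep/page.html", 0.8)
enqueue("/sitemap.html", 5.5)

order = [next_url() for _ in range(3)]
# The homepage comes out first, the deep low-PR page last.
```

That ordering is exactly why day-one pages in the quoted figure average a much higher score than pages fetched in later weeks.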
The question I want to ask is how to get level 3 pages freshbotted?
Do any of you have level 3 pages freshbotted?
Namaste,
IMO, it's not the level that's important for fresh-crawling. What I find is that one link from my index page, or from another high-level internal page with Pagerank, is enough to get the new page fresh-crawled and indexed.
I believe there are basically two forms of Fresh pages [webmasterworld.com](msg#:8): those that get checked for new links to new pages (the highest-Pageranked site-internal pages, points 1/2), and the really new content/pages themselves (point 3).
PS. Do a Google search for "feedback" and notice how many pages have the Freshtag. It's not that these "feedback-form" pages get changed/updated a lot; they just get a lot of new links (internal, in this case) towards them on a continuous basis (and/or are probably among the highest Pageranked within the site).
Also mentioned here: [webmasterworld.com]msg128
The obvious answer you are not looking for would be to get an external page with a fresh listing to link to your internal page.
Maybe someone else has tried this; otherwise it's worth a trial: give your third-level PR5 page a link from your index or second/third-best page for only two days and see if it gets fresh indexed before the update.
Although not very surprising:
A page's previous change rate is a good predictor of its future change rate.
from Rubble's posting on Microsoft related papers: [webmasterworld.com...]
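That "change rate predicts change rate" observation maps directly onto a recrawl scheduler: track how often a page has changed between visits, and revisit fast-changing pages sooner. A minimal sketch - the smoothing constant and interval bounds are my own assumptions, not from the Microsoft papers:

```python
# Sketch: estimate a page's change rate from past crawl observations and
# turn it into a recrawl interval. Alpha and the day bounds are assumptions.

def update_change_rate(prev_rate, changed, alpha=0.3):
    """Exponential moving average of a 0/1 'did the page change?' observation."""
    return alpha * (1.0 if changed else 0.0) + (1 - alpha) * prev_rate

def recrawl_interval_days(change_rate, min_days=1, max_days=30):
    """Frequently changing pages get short intervals; stable pages get long ones."""
    if change_rate <= 0:
        return max_days
    return max(min_days, min(max_days, round(1.0 / change_rate)))

rate = 0.0
for changed in [True, True, False, True]:  # outcomes of successive crawls
    rate = update_change_rate(rate, changed)
# A page that changed 3 times out of 4 ends up on a ~2-day recrawl cycle.
```

The same logic would explain the fresh tags above: a page that keeps changing earns itself ever more frequent visits.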
4. Pages which rank in the top 200 Google SERPS for search queries that suddenly show a "burst" in usage. (A Zeitgeist like database, only much larger and more immediate)
Would google look for the word bursts to find the news or find the news and examine the word bursts? ;)
I've seen others (and read myself) mention competitive terms and how the SERPs are more adaptive; it sounds like something like that could be going on.
Care to elaborate Vitaplease? :)
Extra freshness for these high-spam-reporting/competitive-industry/high-search-volume/high-dollar areas?
Would google look for the word bursts to find the news or find the news and examine the word bursts? ;)
Chicken and egg again.
I'd say journalists are generally slower in grasping (non-news) popularity trends than searchers.
I'd say searchers are generally slower in grasping trends than bloggers.
Google can select their word bursts for all three now.
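Whichever group grasps the trend first, the detection side is straightforward: flag a query when today's volume jumps well above its recent baseline. A minimal sketch over daily query counts - the window and threshold are my own assumptions, and real burst models are far more sophisticated than this:

```python
# Minimal burst detector over daily query counts: flag a day whose volume
# exceeds a multiple of the trailing average. Window/threshold are assumptions.

def detect_bursts(daily_counts, window=7, threshold=3.0):
    bursts = []
    for i in range(window, len(daily_counts)):
        baseline = sum(daily_counts[i - window:i]) / window
        if baseline > 0 and daily_counts[i] >= threshold * baseline:
            bursts.append(i)  # record the index of the bursting day
    return bursts

counts = [10, 12, 9, 11, 10, 10, 12, 95, 120, 11]
# Days 7 and 8 burst; day 9 does not, because the burst inflated the baseline.
```

A Zeitgeist-scale version would run this per query, then feed the bursting queries back into crawl prioritisation for the pages that rank for them.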
high dollar areas of keywords?
They could randomise results to show site owners the advantage of ranking high for a moment - hence advertising AdWords afterwards. IMO, that would lead to even more SEO-type spam in regular results, which Google does not want.
The question I want to ask is how to get level 3 pages freshbotted?
Some of my third-level pages get fresh tags and some don't. All the third-level pages I've checked so far are PR5, linked from PR5 level-two pages. The difference seems to be that the third-level pages with even one outside link have fresh tags from yesterday. Those with only internal links don't.
I don't have time right now to check enough to be really sure but I did check several.
Has anyone else noticed this?
Anne