Grumpus - 11:23 am on Oct 5, 2002 (gmt 0)
FreshBot prefers NEW pages over updated ones. That's not to say it doesn't like updated pages; it just seems that an updated page needs good PR and good incoming links (I guess those two are almost one and the same).

Here's how I came to this conclusion. My site has millions of pages, and about 25K of them are in the main index. When a page is added or updated, it gets slapped onto a "New and Updated" page, and my robots.txt (and internal linking structure) tries to guide all the web spiders to use these pages to seed their index of the site rather than my Alphabetical Listings. My hope has been that, since nothing will crawl my entire site, at least the new and pertinent stuff will get into the index. (With Google, this plan never worked until six weeks or so ago, when the "minty freshness" factor became much more prevalent. In the past, it would hit the front page and wander around my site in a manner I couldn't possibly guess. It made a bull in a china shop look like it had a plan.)

As Freshbot goes through the "New and Updated" listings, usually all of the NEW pages are slapped into the index within a few days (and even if they don't get crawled again, they have been STAYING in the index until the next dance). The updated pages, though, have only about a 1-in-8 or 1-in-10 chance of getting put up at Google. Some do, some don't.

That won't seem odd to many, but as someone with a big site, I have to ask: how does it know whether a page is new or updated if the page wasn't in the index before? I've got roughly 2 million pages that aren't in the index now. Half of them are new, since I just added a whole bunch of new features, but a million of them have been there for ages, and most have never been seen by a living soul, not to mention the Googlebot. So, if I update a page that has never been crawled, how does Google know whether to call it a "new" page or an "updated" page? The only thing I can figure is that all my pages use an "?ID=#" parameter to determine what the page is.
Is the Googlebot logging the highest "#" it has come across so that it can tell what's new? <shrug> Who knows?

G.
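Purely to illustrate that guess (nobody outside Google knows how Freshbot actually decides), here is a minimal Python sketch of a crawler that remembers the highest `?ID=` number it has seen on a site and treats anything above it as new. The class name, method names and example.com URLs are hypothetical, made up for this sketch.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


class MaxIdClassifier:
    """Hypothetical heuristic: remember the highest ?ID= value seen so far.

    A URL whose ID exceeds that maximum is labelled "new"; any other ID is
    labelled "updated", even if that page was never actually crawled.
    """

    def __init__(self) -> None:
        self.max_seen_id: Optional[int] = None

    @staticmethod
    def extract_id(url: str) -> Optional[int]:
        # parse_qs("ID=500") -> {"ID": ["500"]}; the key is case-sensitive.
        values = parse_qs(urlparse(url).query).get("ID")
        if not values or not values[0].isdigit():
            return None
        return int(values[0])

    def classify(self, url: str) -> str:
        page_id = self.extract_id(url)
        if page_id is None:
            return "unknown"  # no numeric ID, so this heuristic can't decide
        if self.max_seen_id is None or page_id > self.max_seen_id:
            self.max_seen_id = page_id  # raise the high-water mark
            return "new"
        return "updated"


if __name__ == "__main__":
    crawler = MaxIdClassifier()
    for url in (
        "http://example.com/page?ID=500",
        "http://example.com/page?ID=1200",
        "http://example.com/page?ID=42",  # old, never-crawled page
    ):
        print(url, crawler.classify(url))
```

Run in that order, the first two URLs come out "new" and the ID=42 page comes out "updated", even though the crawler has never seen it before. That is exactly the case described above: under a scheme like this, an old page that was simply never crawled would be treated as merely updated.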
One thing I've noticed (I haven't been able to do enough research to confirm it, so it's still a THEORY at this point):