| 11:39 am on Oct 3, 2002 (gmt 0)|
We all know about the main deep crawling Googlebot, and we know that they've recently added a Freshbot that hangs around your site every few days, just waiting to index your new and exciting content (I take it personally of course. I believe the Google Freshbot genuinely enjoys reading my new content :)).
But I just discovered something brand new. I added a new page to my site on the 1st of this month. Hence, it was created and posted AFTER the main monthly index update.
However, this new page already has appeared in Google. It appears that the Freshbot has taken a fancy to it, and has added it in, even though it was never in Google originally.
Has anyone else seen this? Is this a recent skill added to the Freshbot?
| 11:51 am on Oct 3, 2002 (gmt 0)|
I can verify that; similar experience here. It seems the new page has to be linked from the index page for a fresh crawl to happen.
| 11:53 am on Oct 3, 2002 (gmt 0)|
| 12:06 pm on Oct 3, 2002 (gmt 0)|
Mac, yes, it sounds like everflux. The only difference (or change) is that everflux would only update pages that had already been included in the dance. This is a new spin on everflux: it will follow links from the index page to completely new pages.
| 12:24 pm on Oct 3, 2002 (gmt 0)|
Yes, it looks like Google's attempt at freshening up its index has been taken to a new level.
There's no greater satisfaction than adding a new page today, and having it listed in the top SERPS by tomorrow. Now that's service for you!
| 12:28 pm on Oct 3, 2002 (gmt 0)|
Google will even add a new site with a new domain via the fresh bot. Of course, there will be no PageRank, but it will rank in the SERPs.
| 12:30 pm on Oct 3, 2002 (gmt 0)|
Googlebot is in action on my site now, but it's the good old Googlebot.
| 12:32 pm on Oct 3, 2002 (gmt 0)|
If my theory that freshness is taking more weight in the Google algo is correct, then it would not surprise me to see Google adding pages as it goes, keeping the guessed PR and then recalculating it monthly.
Which Google crawler is the fresh one? I often see crawl12, 4, 10, etc. Do we have any idea which one is the fresh Googlebot?
| 1:08 pm on Oct 3, 2002 (gmt 0)|
This is not something recent. For several months now, Google has been adding new pages if it finds them linked from pages it "fresh crawls". They appear for around 3 days and then disappear, to reappear permanently after the next monthly update. If it keeps finding the new page again on later "fresh crawls", the page may reappear for further 3-day periods before the next update, from all appearances here.
| 2:02 pm on Oct 3, 2002 (gmt 0)|
Someone said in a recent thread (can't find it now) that the fresh bot is 64.68.82.* and the deep-crawling bot is 216.239.46.* . (Thanks to whoever it was that posted the info!)
And that seems to jibe with what I'm seeing in my log files.
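For anyone who wants to check their own logs the same way, here is a minimal sketch. It assumes access-log lines where the client IP is the first whitespace-separated field, and it uses the two prefixes quoted in the post above, which are reports from this thread and not anything confirmed by Google. The function names and sample lines are hypothetical.

```python
from collections import Counter

# Prefixes as reported in this thread (unverified; they may change at any time).
FRESH_PREFIX = "64.68.82."
DEEP_PREFIX = "216.239.46."


def classify_ip(ip):
    """Label an IP as 'fresh', 'deep', or 'other' by its reported prefix."""
    if ip.startswith(FRESH_PREFIX):
        return "fresh"
    if ip.startswith(DEEP_PREFIX):
        return "deep"
    return "other"


def count_crawler_hits(log_lines):
    """Count hits per crawler label; assumes the IP is the first field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[classify_ip(fields[0])] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    '64.68.82.5 - - [03/Oct/2002:11:39:00 +0000] "GET /new.html HTTP/1.0" 200 512',
    '216.239.46.20 - - [03/Oct/2002:12:30:00 +0000] "GET / HTTP/1.0" 200 1024',
    '10.0.0.1 - - [03/Oct/2002:12:31:00 +0000] "GET / HTTP/1.0" 200 1024',
]
print(count_crawler_hits(sample))
```

Matching on a string prefix keeps the sketch short; if the ranges turn out to be wider than a single /24, the prefixes would need adjusting.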
| 2:10 pm on Oct 3, 2002 (gmt 0)|
That would be warmasol in [webmasterworld.com...]
| 2:31 pm on Oct 3, 2002 (gmt 0)|
chiyo, you're right. Every 3-4 days my page gets a fresh date and the links from that page are updated, but then it is gone again for 3-4 days.
| 4:02 pm on Oct 3, 2002 (gmt 0)|
New site went live on Monday 30th Sept, the site was added to the dmoz directory on Tuesday, and the entire site has been added to the Google database today.
This is the fastest inclusion I have experienced, probably down to the dmoz inclusion.
| 4:06 pm on Oct 3, 2002 (gmt 0)|
I don't know if this means anything but here is what I have noticed.
Last month during the fresh crawl they added my new site and it looked like this:
www.domain.net CACHED 9-22
(then this vanished in a few hours from the index until the dance)
With this fresh crawl I see:
www.domain.net (but it doesn't have CACHED "DATE" this time). Well, it did initially, but not now.
(still indexed 2 days later)
I wonder: if you see sites that don't have the CACHED "date" version, are they here to stay?
| 4:10 pm on Oct 3, 2002 (gmt 0)|
I've been seeing the 'freshbot' adding new pages for some time. But they often get missed because they are not being looked for. ;) In my experience any 'freshbot' content - new pages or old - drops out after about 48hrs unless recrawled.
| 4:11 pm on Oct 3, 2002 (gmt 0)|
After a month, the deep crawler came to my site last night. It just grabbed 2 pages and went away. It came back again this morning and grabbed another 2 pages. It is hitting my login page (when trying to access the secured pages) and going away from that. I do not want the bot to crawl my secure pages. From the index page, it just went to one of the inside links and got the login page.
Can you guys suggest anything about this behavior?
| 4:22 pm on Oct 3, 2002 (gmt 0)|
Google honors the robots.txt file. Disallow the pages you don't want crawled.
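As a rough illustration of that advice, the sketch below feeds a small robots.txt to Python's standard `urllib.robotparser` and checks which URLs a crawler would be allowed to fetch. The `/secure/` and `/login` paths and the example.com URLs are made up for the example; substitute whatever paths your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secure/
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow rules match by path prefix, so everything under /secure/ is blocked.
print(parser.can_fetch("Googlebot", "http://www.example.com/secure/account.html"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.com/login"))                # False
print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))           # True
```

Keep in mind that robots.txt only asks well-behaved crawlers to stay away; it does not protect the pages, so the login itself still has to do that job.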
| 10:24 pm on Oct 3, 2002 (gmt 0)|
I had a site drop out of Google this last update. The entire site is gray-barred... no penalty. On the morning of Sun. the 29th, three pages were back, and by the afternoon about 27 pages were back, all with a Fresh tag of Sept. 28. By Monday the 30th, 43 pages were in the index, all with a Fresh tag of Sept. 28. No pages are showing in the Google cache and they all still have no PR. Today, they lost their Fresh tag, but they are still in the index. I'm hoping they stay in the index for the rest of the month, but I'm skeptical. Is there a chance?
| 6:20 pm on Oct 4, 2002 (gmt 0)|
I thought that new pages were added during an update?
I added a new page to a site at the weekend and now I'm ranking No. 1 for its keyphrase, with a date of Wednesday this week...
er... can someone set me straight please, I appear to be working under some misconceptions here..?
| 6:55 pm on Oct 4, 2002 (gmt 0)|
| 6:58 pm on Oct 4, 2002 (gmt 0)|
I'd not considered it as (correct me if I'm wrong) that deals with indexed pages. ie. pages added during an update?
| 7:02 pm on Oct 4, 2002 (gmt 0)|
Check your logs for an ip that looks like this:
I believe this may be the spider that is indexing pages between updates.
| 7:07 pm on Oct 4, 2002 (gmt 0)|
We have had new pages added many times during updates. If a new page is found linked from at least our home page during our daily fresh crawl, it appears for at least 2 to 3 days, searchable from the main index.
This most probably happens for new pages linked from any other page that is crawled daily too, but we can't be absolutely sure, as we have never really tested it, and most new pages are linked from our index page anyway, which makes it hard to assess.
Our index page changes at least 5 times to maybe 10 times a week with new pages added.
| 7:12 pm on Oct 4, 2002 (gmt 0)|
|it appears for at least 2 to 3 days, searchable from the main index. |
Meaning it disappears after that?
I've been constantly crawled (everyday) for well over a month now... Good sign?
It's PR5, I thought that I would just get the standard 1-3 crawls a month?
...and yes Quinn, it is that IP: Significance?
| 7:20 pm on Oct 4, 2002 (gmt 0)|
Sorry Nick_W, I should have elaborated a bit....
This thread can probably explain this better than I can.
| 7:25 pm on Oct 4, 2002 (gmt 0)|
Right, got it! Thanks. Check out my next thread ;)
| 9:40 am on Oct 5, 2002 (gmt 0)|
|It is hitting my login page (when trying to access the secured pages) and going away from that. I do not want the bot to crawl my secure pages |
QNetwork, Googlebot is clearly trying to hack into your site ;). Before you know it, your private content will be duplicated on Google's main site ;)
| 11:23 am on Oct 5, 2002 (gmt 0)|
One thing I've noticed, though I haven't been able to do enough research to confirm it, so it's still a THEORY, at this point:
FreshBot prefers NEW over Updated.
Now, that's not to say that it doesn't like "updated" pages. It just seems that the Updated Page needs to have good PR and good incoming links (I guess those two are almost one and the same).
Here's how I come to this conclusion:
My site has millions of pages, and about 25K of them are in the main index. When a page is added or updated, it gets slapped onto a "New and Updated" page, and my robots.txt (and internal linking structure) tries to guide all the web spiders to use these pages to seed their index of the site rather than my Alphabetical Listings. It's been my hope that, since nothing will crawl my entire site, at least the new and pertinent stuff will get into the index. (With Google, this plan never worked until 6 weeks or so ago, when the "minty freshness" factor became much more prevalent. In the past it would hit the front page and wander around my site in a manner I couldn't possibly guess. It made the Bull in a China Shop look like it had a plan.)
As Freshbot goes through the "new and updated" listings, usually all of the NEW pages are slapped into the index within a few days (and even if they don't get crawled again, they have been STAYING in the index until the next dance). The updated pages, though, have only about a 1 in 8 or 1 in 10 chance of getting put up at Google. Some do, some don't.
Now, that won't seem odd to many, but for someone with a big site, I ask - How does it know if it's new or updated if the page wasn't in the index before? I've got roughly 2 million pages that aren't in the index now (half of them are new as I just added a whole bunch of new features, but a million of them have been there for ages and most have never been seen by a living soul, not to mention the Googlebot). So, if I update a page and it's never been crawled, how does google know to call it a "new" page or an "updated" page? The only thing I can figure is that my pages all have an "?ID=#" to determine what the page is. Is the googlebot logging the highest "#" it came across so that it can tell what's new? <shrug> Who Knows?
| 11:49 am on Oct 5, 2002 (gmt 0)|
Based on what I see, you have 22,800 pages indexed in Google, so I'd say you have your fair share of Google's index :)