Everflux - October 2002 mid-month Google spidering and minor updates

Tracking Google activity between regular monthly updates

     
7:28 pm on Oct 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 29, 2000
posts:12095
votes: 0


Every month we have a lot of reports and questions about mid-month additions and changes in the Google index. Last month we started consolidating them into one thread so all the information would be easier to find and be especially helpful to those new to Google or our Google News forum.

You can refer to September's Everflux / mid-month changes [webmasterworld.com] to see the previous pattern, which is one of the primary reasons we're trying to keep them all together.

To recap:

1. The Monthly Update
There's one major Google update per month, starting somewhere between the 20th and the 28th. It generally takes several days to settle, at which time the data migrates to Google's partner sites.

2. The Monthly Crawl
The major crawl takes place right around this time. For some sites it's been reported to start at the same time as the update or just before it begins, while others are not crawled until afterward. This continues for several days, and there's been some conjecture that PR might have something to do with the timing.

3. Mid-month activity, referred to as Everflux.
All during the month between updates there has been some spidering, sites added, and minor position changes. This is related to Google keeping their search results "minty fresh" but while minor ranking changes occur, this does not affect the entire index.

It's worked well, so we'll continue to do this now, in October. This is where we can post what we experience and observe, to compare and discuss it among ourselves. We'll again confine it all to one thread so it's simpler to keep track of the information and observations we share that are so important to us all.

Note: We'll be moving some threads onto this one, as we did last month, as well as for the update itself.

We sincerely thank you for your contributions to our collective knowledge and understanding and wish you all good luck with your sites.

11:39 am on Oct 3, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:July 16, 2002
posts:53
votes: 0


We all know about the main deep crawling Googlebot, and we know that they've recently added a Freshbot that hangs around your site every few days, just waiting to index your new and exciting content (I take it personally of course. I believe the Google Freshbot genuinely enjoys reading my new content :)).

But I just discovered something brand new. I added a new page to my site on the 1st of this month. Hence, it was created and posted AFTER the main monthly index update.

However, this new page already has appeared in Google. It appears that the Freshbot has taken a fancy to it, and has added it in, even though it was never in Google originally.

Has anyone else seen this? Is this a recent skill added to the Freshbot?

11:51 am on Oct 3, 2002 (gmt 0)

New User

10+ Year Member

joined:Aug 31, 2002
posts:39
votes: 0


I can verify that, similar experience here. Seems as if the new page has to be linked from the index page for a fresh to happen.
11:53 am on Oct 3, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member macguru is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 30, 2000
posts:3300
votes: 0


Everflux [searchengineworld.com]...
12:06 pm on Oct 3, 2002 (gmt 0)

New User

10+ Year Member

joined:Aug 31, 2002
posts:39
votes: 0


Mac, yes, sounds like Everflux. The only difference (or change) is that Everflux used to update only pages that had been included in the dance. This is a new spin on Everflux: it will follow links from the index page to completely new pages.
12:24 pm on Oct 3, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:July 16, 2002
posts:53
votes: 0


Yes, it looks like Google's attempt at freshening up its index has been taken to a new level.

There's no greater satisfaction than adding a new page today, and having it listed in the top SERPS by tomorrow. Now that's service for you!

12:28 pm on Oct 3, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 28, 2001
posts:1380
votes: 0


Furthermore,
Google will even add a new site with a new domain with the fresh bot. Of course, there will be no page rank, but it will rank in the SERPs.
12:30 pm on Oct 3, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 28, 2002
posts:3443
votes: 1


Googlebot is in action on my site now, but it's the good old Googlebot.

zeus

12:32 pm on Oct 3, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 29, 2002
posts:1819
votes: 0


If my theory of freshness taking more weight in the Google algo is correct, then it would not surprise me to see Google adding pages as it goes, keeping a guessed PR and then recalculating it monthly.

Which Google crawler is the fresh one? I often see many: crawl12, 4, 10, etc. But do we have any idea which one is the fresh Googlebot?

1:08 pm on Oct 3, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 21, 2000
posts:3170
votes: 0


This is not something recent. For several months, Google has been adding new pages when it finds them linked from pages it "fresh crawls". They appear for around 3 days and then disappear, reappearing permanently after the next monthly update. If it keeps finding the new page on further "fresh crawls", it may reappear for 3-day periods before the next update, from all appearances here.
2:02 pm on Oct 3, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 2, 2002
posts:446
votes: 0


Visit Thailand,

Someone said in a recent thread (can't find it now) that the fresh bot is 64.68.82.* and the deep-crawling bot is 216.239.46.* . (Thanks to whoever it was that posted the info!)

And that seems to jibe with what I'm seeing in my log files.

Beth

2:10 pm on Oct 3, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:July 16, 2002
posts:53
votes: 0


bether2

That would be warmasol in [webmasterworld.com...]

2:31 pm on Oct 3, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 28, 2002
posts:3443
votes: 1


chiyo, you're right. Every 3-4 days my page gets a fresh date and the links from that page are updated, but then it is gone for 3-4 days.

zeus

mog

4:02 pm on Oct 3, 2002 (gmt 0)

New User

10+ Year Member

joined:Dec 4, 2001
posts:14
votes: 0


New site went live on Monday 30th Sept, the site was added to the dmoz directory on Tuesday, and the entire site has been added to the Google database today.

This is the fastest inclusion I have experienced, probably down to the dmoz inclusion.

4:06 pm on Oct 3, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 25, 2002
posts:77
votes: 0


I don't know if this means anything but here is what I have noticed.
Last month during the fresh crawl they added my new site and it looked like this:
Title
description
www.domain.net CACHED 9-22
(then this vanished in a few hours from the index until the dance)

With this fresh crawl I see:
Title
description
www.domain.net (but it doesn't have CACHED "DATE" this time). Well, it did initially, but not now.
(still indexed 2 days later)

I wonder if you see sites that don't have the CACHED "date" version, if they are here to stay?

4:10 pm on Oct 3, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 26, 2001
posts:164
votes: 0


I've been seeing the 'freshbot' adding new pages for some time. But they often get missed because they are not being looked for. ;) In my experience any 'freshbot' content - new pages or old - drops out after about 48hrs unless recrawled.
4:11 pm on Oct 3, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 20, 2002
posts:81
votes: 0


After a month, the deep crawler came to my site last night. It just grabbed 2 pages and went away. It came back again this morning and grabbed another 2 pages. It is hitting my login page (when trying to access the secured pages) and going away from that. I do not want the bot to crawl my secure pages. From the index page, it just went to one of the inside links and got the login page.

Can you guys suggest anything about this behavior?

4:22 pm on Oct 3, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 26, 2001
posts:164
votes: 0


Google honors the robots.txt file. Disallow the pages you don't want crawled.
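
A minimal sketch of what such a rule set could look like, checked with Python's standard `urllib.robotparser`. The paths `/secure/` and `/login.html` are hypothetical placeholders, not the poster's real layout, and the check only shows how a compliant parser reads the rules; it says nothing about how any particular crawler behaves in practice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: protected pages live under /secure/ and the login
# form is /login.html. Substitute the real paths from your own site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /secure/
Disallow: /login.html
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Paths under a Disallow rule are off limits for the named user agent.
print(parser.can_fetch("Googlebot", "/secure/account.html"))  # False
print(parser.can_fetch("Googlebot", "/login.html"))           # False

# Anything not matched by a rule stays crawlable.
print(parser.can_fetch("Googlebot", "/index.html"))           # True
```

Running a check like this before uploading the file is a cheap way to catch a mistyped path. Note that the rules above apply only to the `Googlebot` user agent; other robots would need their own record or a `User-agent: *` record.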
10:24 pm on Oct 3, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 24, 2001
posts:113
votes: 0


I had a site drop out of Google this last update. The entire site is gray-barred... no penalty. On the morning of Sun. the 29th, three pages were back and by the afternoon about 27 pages were back, all with a Fresh tag of Sept. 28. By Monday the 30th, 43 pages were in the index, all with a Fresh tag of Sept. 28. No pages are showing in the Google cache and they all still have no PR. Today, they lost their Fresh tag, but they are still in the index. I'm hoping they stay in the index for the rest of the month, but I'm skeptical. Is there a chance?
6:20 pm on Oct 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


Hi all,

I thought that new pages were added during an update?

I added a new page to a site at the weekend and now I'm ranking No. 1 for its keyphrase with a date of Wednesday this week...

er... can someone set me straight please, I appear to be working under some misconceptions here..?

Thanks

Nick

ppg

6:55 pm on Oct 4, 2002 (gmt 0)

Full Member

10+ Year Member

joined:Mar 14, 2002
posts:328
votes: 0


everflux [searchengineworld.com]?
6:58 pm on Oct 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


I'd not considered it, as (correct me if I'm wrong) that deals with indexed pages, i.e. pages added during an update?

Nick

7:02 pm on Oct 4, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 26, 2002
posts:402
votes: 0


Check your logs for an ip that looks like this:

64.68.82.*
I believe this may be the spider which is indexing pages between updates.
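
For anyone who wants to run that check over a whole log file, here is a rough sketch. It uses the two address ranges reported by posters in this thread (64.68.82.* for the fresh bot, 216.239.46.* for the deep crawler); those are observations from October 2002, not an official list. The function names and the assumption that the client IP is the first field on each line (as in Apache common/combined logs) are mine.

```python
import re
from collections import Counter

# Prefixes as reported by posters in this thread; treat them as
# observations, not as anything confirmed by Google.
CRAWLER_PREFIXES = {
    "64.68.82.": "freshbot",
    "216.239.46.": "deepbot",
}

# Assumes the client IP is the first whitespace-delimited field.
LEADING_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def classify_ip(ip):
    """Return 'freshbot', 'deepbot', or None for any other address."""
    for prefix, label in CRAWLER_PREFIXES.items():
        if ip.startswith(prefix):
            return label
    return None


def count_crawler_hits(lines):
    """Count log lines per crawler label, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LEADING_IP.match(line)
        if match is None:
            continue
        label = classify_ip(match.group(1))
        if label is not None:
            counts[label] += 1
    return counts
```

To use it, pass any iterable of log lines, for example `count_crawler_hits(open("access.log"))`, where `access.log` stands in for wherever your server writes its log. Matching on the trailing dot of each prefix keeps an address like 64.68.820.1 from being miscounted.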

7:07 pm on Oct 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 21, 2000
posts:3170
votes: 0


We have got new pages added many times during updates. If it is found linked from at least our home page during our daily fresh crawl, it appears for at least 2 to 3 days, searchable from the main index.

This most probably occurs for all other pages that are crawled daily too, but we can't be absolutely sure as we have never really tested it, and most new pages are linked from our index page anyway, making it hard to assess.

Our index page changes at least 5 times to maybe 10 times a week with new pages added.

7:12 pm on Oct 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


it appears for at least 2 to 3 days, searchable from the main index.

Meaning it disappears after that?

I've been constantly crawled (every day) for well over a month now... Good sign?

It's PR5, I thought that I would just get the standard 1-3 crawls a month?

...and yes Quinn, it is that IP: Significance?

Nick

7:20 pm on Oct 4, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 26, 2002
posts:402
votes: 0


Sorry Nick_W, I should have elaborated a bit....

This thread can probably explain this better than I can.

[webmasterworld.com...]

7:25 pm on Oct 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


Right, got it! Thanks. Check out my next thread ;)

Nick

9:40 am on Oct 5, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:July 16, 2002
posts:53
votes: 0


It is hitting my login page (when trying to access the secured pages) and going away from that. I do not want the bot to crawl my secure pages

QNetwork, Googlebot is clearly trying to hack into your site ;). Before you know it, your private content will be duplicated on Google's main site ;)

11:23 am on Oct 5, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 13, 2002
posts:676
votes: 0


One thing I've noticed, though I haven't been able to do enough research to confirm it, so it's still a THEORY, at this point:

FreshBot prefers NEW over Updated.

Now, that's not to say that it doesn't like "updated" pages. It just seems that the Updated Page needs to have good PR and good incoming links (I guess those two are almost one and the same).

Here's how I come to this conclusion:

My site has millions of pages - about 25K of them are in the main index. When a page is added or updated, it gets slapped onto a "New and Updated" page, and my robots.txt (and internal linking structure) tries to guide all the web spiders to use these pages to seed their index of the site rather than my Alphabetical Listings. It's been my hope that, since nothing will crawl my entire site, at least the new and pertinent stuff will get into the index. (With Google, this plan never worked until 6 weeks or so ago, when the "minty freshness" factor became much more prevalent - in the past it would hit the front page and wander around my site in a manner I couldn't possibly guess. It made the Bull in a China Shop look like it had a plan).

As Freshbot goes through the "new and updated" listings, usually all of the NEW pages are slapped into the index within a few days (and even if they don't get crawled again, they have been STAYING in the index until the next dance). The updated pages, though, have only about a 1 in 8 or 1 in 10 chance of getting put up at Google. Some do, some don't.

Now, that won't seem odd to many, but for someone with a big site, I ask - How does it know if it's new or updated if the page wasn't in the index before? I've got roughly 2 million pages that aren't in the index now (half of them are new as I just added a whole bunch of new features, but a million of them have been there for ages and most have never been seen by a living soul, not to mention the Googlebot). So, if I update a page and it's never been crawled, how does google know to call it a "new" page or an "updated" page? The only thing I can figure is that my pages all have an "?ID=#" to determine what the page is. Is the googlebot logging the highest "#" it came across so that it can tell what's new? <shrug> Who Knows?

G.

11:49 am on Oct 5, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:July 16, 2002
posts:53
votes: 0


Grumpus

Based on what I see, you have 22,800 pages indexed in Google, so I'd say you have your fair share of Google's index :)

This 94 message thread spans 4 pages: 94
 
