You can refer to September's Everflux / mid-month changes [webmasterworld.com] to see the previous pattern, which is one of the primary reasons we're trying to keep them all together.
1. The Monthly Update
There's one major Google update per month, starting somewhere between the 20th and the 28th. It generally takes several days to settle, at which time the data migrates to Google's partner sites.
2. The Monthly Crawl
The major crawl takes place right around the same time. Some sites have reported that it starts at the same time as, or just before, the update begins, while others are not crawled until afterward. It continues for several days, and there has been some conjecture that PR has something to do with the timing.
3. Mid-month activity, referred to as Everflux.
Throughout the month between updates there is some spidering, with sites added and minor position changes. This is Google keeping its search results "minty fresh": minor ranking changes occur, but they do not affect the entire index.
It's worked well, so we'll continue to do this now, in October. This is where we can post what we experience and observe, to compare and discuss it among ourselves. We'll again confine it all to one thread so it's simpler to keep track of the information and observations we share that are so important to us all.
Note: We'll be moving some threads onto this one, as we did last month, as well as for the update itself.
We sincerely thank you for your contributions to our collective knowledge and understanding and wish you all good luck with your sites.
But I just discovered something brand new. I added a new page to my site on the 1st of this month. Hence, it was created and posted AFTER the main monthly index update.
However, this new page already has appeared in Google. It appears that the Freshbot has taken a fancy to it, and has added it in, even though it was never in Google originally.
Has anyone else seen this? Is this a recent skill added to the Freshbot?
Which Google crawler is the fresh one? I often see crawl12, crawl4, crawl10, etc. Do we have any idea which one is the fresh Googlebot?
Someone said in a recent thread (can't find it now) that the fresh bot is 64.68.82.* and the deep-crawling bot is 216.239.46.* . (Thanks to whoever it was that posted the info!)
And that seems to jibe with what I'm seeing in my log files.
With this fresh crawl I see:
www.domain.net (but it doesn't have a CACHED "date" this time). Well, it did initially, but not now.
(still indexed 2 days later)
I wonder, when you see sites that don't have the CACHED "date" version, whether they are here to stay?
Can you guys suggest anything about this behavior?
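A quick way to see which bot is hitting you is to tally access-log lines by those IP prefixes. Here's a minimal sketch in Python, assuming common log format (the client IP is the first field on each line); the prefixes are just the ones reported in this thread and could change at any time:

```python
# Classify Googlebot visits in an access log by IP prefix.
# These prefixes come from this thread's reports, not from any official source.
FRESHBOT_PREFIX = "64.68.82."    # reported "freshbot" range
DEEPBOT_PREFIX = "216.239.46."   # reported monthly deep-crawl range

def classify(ip):
    """Label a client IP as freshbot, deepbot, or other."""
    if ip.startswith(FRESHBOT_PREFIX):
        return "freshbot"
    if ip.startswith(DEEPBOT_PREFIX):
        return "deepbot"
    return "other"

def tally(log_lines):
    """Count hits per bot; assumes common log format (IP is the first field)."""
    counts = {"freshbot": 0, "deepbot": 0, "other": 0}
    for line in log_lines:
        ip = line.split(" ", 1)[0]
        counts[classify(ip)] += 1
    return counts
```

Run it over your raw log and compare the freshbot count day by day against the deepbot burst around the monthly update.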
I thought that new pages were added during an update?
I added a new page to a site at the weekend and now I'm ranking No. 1 for its keyphrase, with a cache date of Wednesday this week...
er... can someone set me straight please, I appear to be working under some misconceptions here..?
This most probably occurs for all other pages that are crawled daily too, but I can't be absolutely sure, as we have never really tested it, and most new pages are linked from our index page anyway, which makes it hard to assess.
Our index page changes at least 5, maybe 10, times a week as new pages are added.
It appears for at least 2 to 3 days, searchable from the main index.
Meaning it disappears after that?
I've been constantly crawled (every day) for well over a month now... Good sign?
It's PR5; I thought I would just get the standard 1-3 crawls a month?
...and yes Quinn, it is that IP: Significance?
It is hitting my login page (when trying to access the secured pages) and then going away. I do not want the bot to crawl my secure pages.
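A robots.txt rule is the usual way to ask Googlebot to stay out of a login/secure area. The paths below are placeholders; substitute whatever your secure URLs actually are:

```
User-agent: Googlebot
Disallow: /login
Disallow: /secure/
```

Keep in mind robots.txt is only a request to well-behaved bots, not access control; anything truly private should still sit behind authentication.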
QNetwork, googlebot is clearly trying to hack into your site ;). Before you know it, your private content will be duplicated on Google's main site ;)
FreshBot prefers NEW over Updated.
Now, that's not to say that it doesn't like "updated" pages. It just seems that the Updated Page needs to have good PR and good incoming links (I guess those two are almost one and the same).
Here's how I come to this conclusion:
My site has millions of pages - about 25K of them are in the main index. When a page is added or updated, it gets slapped onto a "New and Updated" page, and my robots.txt (and internal linking structure) tries to guide all the web spiders to use these pages to seed their index of the site rather than my Alphabetical Listings. It's been my hope that, since nothing will crawl my entire site, at least the new and pertinent stuff will get into the index. (With Google, this plan never worked until 6 weeks or so ago, when the "minty freshness" factor became much more prevalent - in the past it would hit the front page and wander around my site in a manner I couldn't possibly guess. It made the Bull in a China Shop look like it had a plan.)
As Freshbot goes through the "new and updated" listings, usually all of the NEW pages are slapped into the index within a few days (and even if they don't get crawled again, they have been STAYING in the index until the next dance). The updated pages, though, have only about a 1 in 8 or 1 in 10 chance of getting put up at Google. Some do, some don't.
Now, that won't seem odd to many, but for someone with a big site, I ask - How does it know if it's new or updated if the page wasn't in the index before? I've got roughly 2 million pages that aren't in the index now (half of them are new as I just added a whole bunch of new features, but a million of them have been there for ages and most have never been seen by a living soul, not to mention the Googlebot). So, if I update a page and it's never been crawled, how does google know to call it a "new" page or an "updated" page? The only thing I can figure is that my pages all have an "?ID=#" to determine what the page is. Is the googlebot logging the highest "#" it came across so that it can tell what's new? <shrug> Who Knows?
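To make that guess concrete: if the bot really did log the highest "?ID=#" it has crawled, anything above that watermark would look "new." This is purely an illustration of the conjecture above, not a claim about how Googlebot actually works:

```python
# Sketch of the conjecture: remember the highest ?ID= value crawled so far,
# then treat any URL with a higher ID as a "new" page. Hypothetical only.
from urllib.parse import urlparse, parse_qs

def max_id_seen(urls):
    """Return the highest integer ?ID= value among the crawled URLs."""
    highest = None
    for url in urls:
        qs = parse_qs(urlparse(url).query)
        if "ID" in qs:
            try:
                n = int(qs["ID"][0])
            except ValueError:
                continue
            highest = n if highest is None else max(highest, n)
    return highest

def looks_new(url, highest):
    """A page whose ID exceeds anything previously crawled would look 'new'."""
    qs = parse_qs(urlparse(url).query)
    if "ID" not in qs or highest is None:
        return False
    return int(qs["ID"][0]) > highest
```

Of course, this scheme would misclassify an old-but-never-crawled page with a high ID as "new," which is consistent with what the poster is seeing.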