| This 54 message thread spans 2 pages: < < 54 ( 1  ) || |
|Mozilla Googlebot Crawling Deep and Fast|
G Storm-Surge has presaged SERPs Updates in the past
There is some slight evidence from my site, and also from some others [webmasterworld.com], that G is beginning extensive, deep and fast crawls of websites across the board.
This kind of activity has preceded a SERPs Update in the recent past [webmasterworld.com].
The G Mozilla-bot (the one that mostly carries out this kind of behaviour) is restricted in speed on my site, so I find it difficult to make an accurate appraisal. Any other confirmations/denials?
Earlier today I had Mozilla Googlebot grab 350 pages (35% of the site) over a 3 hour period. After seeing this thread I just checked my logs and Googlebot 2.1 has grabbed about 300 pages over the last hour or so.
Up until today:
Googlebot 2.1 = 3,043 pages
Mozilla Googlebot = 2,559 pages
Despite all this spidering, I continue to lose pages in BD's index. From a high of 1,020 pages I'm now down to 312 (with about 100 of those being supplemental) - an all time low for the last two years.
Ever the optimist, I think this is a good sign:
- I'm hoping for a big refresh
- A massive jump in pages indexed in the BD centers
- A huge jump in positions
- So much traffic that my host hits me with bandwidth penalties
Same thing here. I have never seen so few pages indexed by Google for my site in the last two years. Hoping you are right. Like a tsunami the Ocean retracts from the shore line prior to the tidal wave.
Now that March is here I've got access to the full stats for February. In what follows, understand that the figures actually only relate to 2 weeks, since the Google Mozilla-bot began to bang away at this site on 16 Feb.
The Google M-Bot has gone from a typical 50 hits/month to 1,000/day (200¦301¦304¦404) (with a further 40,000 attempted page-requests stopped by the site's Unruly-Bot Blocking routines). Everything else is normal, although it should be noted that, because the Adsense-bot often shared the same IP as the Mozzie-bot, it also got blocked.
Typical month:
Adsense-bot crawls are ~25,000/month.
Google-bot crawls are ~1,000/month.
Google Mozzie-bot crawls are ~60/month.
503 Server-Busy hits are 50-1,000/month (varies enormously).
February:
Adsense-bot crawls are 18,565.
Google-bot crawls are 1,141.
Google Mozzie-bot crawls are 11,511.
503 Server-Busy hits are 111,919.
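For anyone wanting to pull comparable per-bot figures out of their own logs, here is a minimal sketch in Python. It assumes the common Apache/Nginx "combined" log format, and the family labels and the `bot_family` / `tally` names are purely illustrative; it is not the stats package used for the numbers above.

```python
import re
from collections import Counter

# Assumed "combined" log format; adjust the pattern if your server logs differently.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def bot_family(user_agent):
    """Map a user-agent string to a rough bot family (labels are illustrative)."""
    ua = user_agent.lower()
    if "mediapartners-google" in ua:
        return "adsense-bot"
    if "googlebot-mobile" in ua:
        return "googlebot-mobile"
    if "googlebot" in ua:
        # The Mozilla-style crawler sends "Mozilla/5.0 (compatible; Googlebot/2.1; ...)".
        return "mozilla-googlebot" if ua.startswith("mozilla/") else "googlebot"
    return "other"

def tally(lines):
    """Count hits per (bot family, HTTP status) pair; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            key = (bot_family(match.group("ua")), int(match.group("status")))
            counts[key] += 1
    return counts
```

Feeding it an open log file (`tally(open("access.log"))`, file name hypothetical) gives a counter you can sum per family or per status code, which makes the 200/301/304/404 versus 503 split easy to see.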
I had some major crawls last week; I have also noticed PR updates on a lot of my pages.
Supplemental club: Big Daddy coming - Part 1 [webmasterworld.com] (msg#23 + others):
|mozilla bot is crawling big time |
(this thread is the first part (this is part 2 [webmasterworld.com]) of a vast discussion on sites suddenly going supplemental under Big Daddy Data-Centres, starting Feb 21).
Added very much later:
This page on Matt Cutts' blog [mattcutts.com] gives the means to distinguish between a Big Daddy Data-Centre and a normal Data-Centre (search for [sf giants] - "giants.mlb.com" as first result means you have hit BigDaddy; this info is going to be important in the next post).
Algo Update watch:
I am going to declare that the new Google-algorithm is both in-place and active. G have snuck it in behind everyone's backs whilst our attention was elsewhere.
Now, my site is a hyphenated domain name, on both .com and .co.uk:
Using this McDar tool [mcdar.net] to check each site for keyword1, using Check across Datacenter IP C-Block(Group A) gives results across a range of DCs. Using the [ sf giants ] test (previous post) will ID each DC as BigDaddy/not-BigDaddy. The contrast in results between the two is astonishing:
Site position on a keyword1 search (BigDaddy/not-BigDaddy):
|.com: site position: 99/31 (BD/not-BD) |
.co.uk: site position: 14/661 (BD/not-BD)
So, the .com site drops from position 31 to position 99, whilst the UK site leaps from position 661 to 14. My stats began showing movement on Monday 6. The visitor numbers are sharply up, and the percentage of visitors from G as opposed to Y has increased by 16%.
The update is already in place. It is simply a question of how many of the G-IPs are featuring it at any one time, and that is increasing on a daily basis.
G Mozilla bot watch:
This wretched bot continues to attempt to rape my site. It is crawling as hard as the site bot-prevention routines will allow (1,000 pages per IP per day) and has racked up 52,535 503 Server-busy responses in 7 days. This behaviour has continued unabated for 21 days.
Mine went for 28 days averaging around 60k pages. It stopped 2 days ago and I am back to normal bot traffic.
As I post this, Moz has been hitting my server at 3 pages per second for a total of 1550 pages so far in the last hour or so and still going strong...
I have been receiving around 500-1,000 page views from him daily, but this is the most intense I have ever seen any bot from Google eating up my sites.
anywho - goodmorning, good day or goodnight (whichever is apt for your neck of the woods) - i am off to bed ;-)
I saw Mozillabot crawling parts of one of my sites that is currently de-listed on BD DCs for unknown reasons (not supplemental, but gone) with up to 25 (twenty-five) requests per second.
My site has just made a big bada boom return back to SERPs across all DCs. After languishing for 6 months.
Tomorrow is another day, though.
Mozilla Googlebot very active again today, grabbing 350 pages (30% of site) in a bit over an hour.
Googlebot activity is normal on most of my sites except for one, which has been mostly supplemental since August last year. I had my first deep crawl of the supplemental pages of this site on February 23 this year, and my second today. Looking at my sites, it seems that the Googlebot surges concentrate on sites/pages that are supplemental; the other pages are crawled at their normal rate. This could explain why some people see increased activity and others don't.
lammert, very nice observation, thanks.
Many pages in the supplemental index (not all) are in there because of canonical issues. Increased spidering of pages in the supplemental index would fit with the belief that Google is, hopefully, going to fix the canonical issue soon.
Continuing from Msg#36 [webmasterworld.com]:
G Mozilla bot watch:
13,053 visits in 13 days (restricted by the site Unruly-bot prevention routines [webmasterworld.com] to 1,000/day, and thus 30,000 visits in the last month, which is slightly up on the 60/month of the Autumn + Winter period). 503 Server Busy responses are about 250,000 for the same period. Phew!
Algo Update watch:
The McDar tool is currently offline. Visitor numbers are up on my site yet again compared to last week (about 5% last week, another 5% yesterday making 10% total).
>>>Many pages in the supplemental index (not all) are in there because of canonical issues.
Indeed - one of the major problems of a homepage canonical problem is crawling depth (GG has confirmed this himself - although probably 2/3 years ago - cant find post at mo)
lammert - did those crawled pages make it into the index? - my site which has canonical problems also gets crawled by this Mozilla Googlebot thing - but only a very very few pages make the index.
Dayo_UK - Although Mozilla bot has crawled these supplementals now multiple times, they are still supplemental with cache dates from August 2005, both on BD and non-BD data-centers.
same situation here. around 30'000 hits within 12 hours...
hope that is a good sign....
noticed the Googlebot v2.1 for the first time - getting lots of pages fast...
|my site ... also gets crawled by this Mozilla Googlebot thing - but only a very very few pages make the index. |
|Mozilla bot has crawled these supplementals now multiple times, they are still supplemental |
You may find the research in this thread (msg#7) [webmasterworld.com] and here (msg#59+60) [webmasterworld.com] interesting reading.
It is, unfortunately, dense and not that easy to follow. In addition, it concerns URL-only SERPs rather than Supplemental. If I write down the headlines to those results, you may get the point:
- a URL has to be parsed 3 TIMES before it will get a title + snippet
- Specifically, 3 x times by the "standard" G-Bot
- If it got parsed once by the M-Bot it went URL-only.
I found those to be frightening stats.
AlexK, thanks for the links. I remembered the thread from some time ago mentioning that there is a minimum number of crawl actions before a page appears in the index but couldn't find it. If your information is correct I have to wait for the next Googlebot surge on these pages and then the first changes in the SERPs could be observed.
I'll keep you updated.
Has anyone noticed a pattern of googlebot only indexing pages that are linked from the homepage, or pages linked externally, and mozzybot indexing the rest?
Continuing from Msg#44 [webmasterworld.com]:
G Mozilla bot watch:
30,000 (200 + 304) pages taken from my site in 35 days (adding 301 + 404 hits gives 21,634 for March, 11,511 for 15 days in Feb = 33,000 total).
The site Unruly-bot Prevention Routines [webmasterworld.com] limit each IP to 1,000/day (503 Server Busy responses are 111,919 + 138,011 = 250,000 for the same period above) and, therefore, the M-Bot has essentially now been pounding my site continuously on a day-in day-out basis for 35 days.
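To illustrate the kind of per-IP daily cap described here, a minimal Python sketch follows. It is not the linked Unruly-bot routine; the class name `DailyIpLimiter`, the in-memory counters and the calendar-day reset are all assumptions made for the example, and `200` simply stands in for "serve the page normally".

```python
from collections import defaultdict
from datetime import date

class DailyIpLimiter:
    """Allow at most `limit` requests per IP per calendar day; answer 503 beyond that."""

    def __init__(self, limit=1000):
        self.limit = limit
        self._day = None
        self._counts = defaultdict(int)

    def status_for(self, ip, today=None):
        """Return the HTTP status to send for one more request from `ip`."""
        today = today or date.today()
        if today != self._day:
            # A new day has started: forget yesterday's counters.
            self._day = today
            self._counts.clear()
        self._counts[ip] += 1
        return 200 if self._counts[ip] <= self.limit else 503
```

With a limit of 1,000, a crawler that keeps requesting from one IP gets its first 1,000 pages that day and 503 Server Busy for everything after, which is how a blocked-request count can dwarf the pages actually served.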
Other threads also indicate heavy crawls: (some examples): msg #:181 [webmasterworld.com] and msg #:195 [webmasterworld.com].
Algo Update watch:
I spotted a rise in visitor numbers on Monday Mar 6, although other Google-watch threads have settled on Wed March 8 (remember! you read it here first!). For my site this rise has continued slow-but-sure on a day-by-day basis. The increase is not large (yesterday was about 11% up on "normal" visitor numbers), though most welcome after 2 years of freefall.
Relative numbers referred to my site from Google and Yahoo! are interesting:
- Jan: 13.0 : 1
- Feb: 13.8 : 1
- Mar: 16.6 : 1
(That is a 27% rise)
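As a quick check of that arithmetic, here is a short Python sketch. It assumes the quoted rise is measured from the January ratio to the March ratio, which the post does not state explicitly; `percent_rise` is just an illustrative helper.

```python
def percent_rise(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100.0

# Google : Yahoo! referral ratios quoted above (Yahoo! normalised to 1).
ratios = {"Jan": 13.0, "Feb": 13.8, "Mar": 16.6}

rise = percent_rise(ratios["Jan"], ratios["Mar"])
print(f"Jan -> Mar rise: {rise:.1f}%")  # prints "Jan -> Mar rise: 27.7%"
```

Under that assumption the rise works out to just under 28%, so the quoted 27% looks like the same figure rounded down.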
Algo Update watch:
Visitor numbers rose yesterday yet again, marking 18 straight days of an upward graph. Amazing. The increase appears to be solely from increased Google-referrals (the March G:Y! ratio is now 16.7 : 1).
The most sensible interpretation of this seems to be that:
- Google's new BigBrother (sorry! BigDaddy) infra-structure favours my site compared to the previous data-centre infrastructure, and
- BD is being steadily, and slowly, rolled out across all DCs.
- changes made 13 months ago to fix canonical-URL problems are only now being reflected in the SERPs.
G Mozilla bot watch:
This bot continues to rain down on my site (38 days) and, since it looks like this will become 40 days and nights of being drenched by its continuous attention, I am going to name this update the Noah Update.
Others (particularly Brett) can make their own decision about that, but that is my personal name for it from now on.
Algo Update watch:
My site's experience of this rolling-update has been of a steady upsurge in G-referrals (it continues:- now 21 days, 12% rise). Then I come across this thread [webmasterworld.com] which, near the end of the second page [webmasterworld.com] starts to sound like a Google update of old ("poof my site is gone").
In fact, one of the reasons for making this posting is to make note of the most marvellous coinage by seochristine of the word "searchquake" [webmasterworld.com], (msg#34) something which deserves to enter the lexicon of Google-isms. At the same link is a further report of a site reappearing on 8 March after tanking for 6 months.
The attention from the Google Mozilla-bot stopped on Sunday. Exactly at 40 days! You could not make this up, could you?
Significant events within the continuing Noah update warrant another posting:
The old, much-loved (?) G_Bot (identified by the UA "Googlebot/2.1 (+http://www.google.com/bot.html)") appears to have drowned [webmasterworld.com] (msg#18). It was last spotted swimming around my site on Sunday, March 26, although others have quoted March 28 [webmasterworld.com] as the last date. I would have spotted this sooner if some Bat-ty prat on a German-Uni IP had not been using a forged G_Bot UA whilst browsing my site. At least it will be easier to spot such deception in the future.
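On spotting forged G_Bot user-agents: the check Google itself documents is a reverse DNS lookup on the requesting IP, a test that the host name ends in googlebot.com or google.com, and then a forward lookup to confirm the name resolves back to the same IP. A minimal Python sketch of that check follows; the function name and the injectable `reverse` / `forward` parameters are assumptions added so it can be exercised without network access, and `gethostbyname_ex` only covers IPv4.

```python
import socket

# Host-name suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the host name, then forward-resolve to confirm."""
    try:
        host = reverse(ip)[0]
    except OSError:
        # No PTR record, or the lookup failed.
        return False
    if not host.lower().endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    # The forward lookup must lead back to the IP that made the request.
    return ip in addresses
```

A visitor on a university IP sending the Googlebot UA fails at the host-name test, and someone who controls their own PTR record fails at the forward lookup.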
The King is dead; long live the King:
As noted above, the M_Bot stopped on my site on the same day as the G_Bot. In fact, it turned out to be only a pause - it was immediately back, full-tilt, and has not stopped (although the swim-rate compared to March is a little less - perhaps). Watch out for this bot! Whereas the old bot was slow and congenial, this new bot is vigorous and indifferent - it will take pages at a rate which may have your server (and bandwidth) groaning. Thankfully, it does accept compressed pages.
The new Princes, BTW, appear to be a phalanx of Nokias: I have spotted:
- "Nokia6010/1.0 (8.62) Profile/MIDP-1.0 Configuration/CLDC-1.0 (Google WAP Proxy/1.0)"
- "Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
- "Nokia3510i/1.0 (03.40) Profile/MIDP-1.0 Configuration/CLDC-1.0 UP.Link/1.1 (Google WAP Proxy/1.0)"
- "Nokia3120/1.0 (05.30) Profile/MIDP-1.0 Configuration/CLDC-1.0 (Google WAP Proxy/1.0)"
...all on Google IPs.
The Google Noah Update has been rolled-back:
This one is more contentious [webmasterworld.com], and I cannot fully understand what has happened.
As noted above, other threads have settled on December 27, March 8 and March 28 as significant moments in this rolling-update. The first date was insignificant for my site, but March 6 saw 2 keyphrases (actually single words) suddenly achieve vast importance in the G-SERPs for my site, and visitor numbers rose significantly all through March. On the third date that all stopped and, once again, in April they do not feature at all:
(from March AWStats results):
- Keyphrase1 : 11,281
- Keyphrase2 : 5,000
- Keyphrase3 : 445 (a 'normal' Keyphrase results number)
My site is currently #1 in the SERPs for Keyphrase1, yet no April results (nor pre-March). Keyphrase2 is currently >100 in the SERPs. Goodness knows what is happening with Google, but there can be no doubt that the March changes have been rolled-back.