Google stopped crawling many sites Jun 15 AM

   
3:34 pm on Jun 15, 2010 (gmt 0)



Hey,

Is anyone else seeing crawling issues that started this morning at around 4:00 to 4:30 (Central European Time)?

I checked the crawling of big sites in different verticals, all mainly in the European market. On all of these sites, crawling by Google decreased by around 98%.

Thank you
7:19 pm on Jun 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seems to be pretty widespread

[twitter.com...]

If you scroll way back to yesterday morning, you can see the first confirmations of this starting in the exact same timeframe as ours.

Several threads are ongoing over at Google's support forum as well.

Plex hasn't commented yet as far as I can see.

[edited by: drall at 7:26 pm (utc) on Jun 16, 2010]

7:25 pm on Jun 16, 2010 (gmt 0)

WebmasterWorld Senior Member lame_wolf is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Personally, I am not bothered. All my pages were cached ages ago. There have been no changes to the site. If they ain't crawling, then I am saving BW :)
8:18 pm on Jun 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, now I am getting really weirded out. I thought, hmm, maybe they are ramping up mediabot as the primary crawler, as was mentioned; since we have AdSense on the sites, this would make sense.

I grepped all of the mediapartners-google lines out of my log, and a few of the tens of thousands are on obviously different IP addies.

I trace them back and they lead to Microsoft Corp?

Microsoft is spoofing Mediapartners-Google? The crazy drop in indexing started less than 1 hour later. I see how your friend came up with Bingbot-Googlebot going out on a date. This is really getting strange now.

These IP addies are showing in my logfiles labeled as Mediapartners-Google

65.55.218.36
65.55.215.168

Less than 1 hour later, almost all regular GBOT activity dies.
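
For anyone who wants to repeat the check described above, here is a minimal Python sketch of the same idea: pull every log line whose user agent claims to be Mediapartners-Google and count the requests per client IP, so that odd addresses stand out. It assumes the Apache/Nginx "combined" log format, and the function name `ips_for_agent` and the sample lines are made up for illustration (only 65.55.218.36 is taken from the post).

```python
import re
from collections import Counter

# Assumption: "combined" log format. Adjust the pattern for other layouts.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def ips_for_agent(lines, needle="mediapartners-google"):
    """Count requests per client IP for lines whose user agent contains `needle`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts


# Hypothetical sample lines; 192.0.2.10 is a documentation-only address.
SAMPLE = [
    '192.0.2.10 - - [15/Jun/2010:03:10:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mediapartners-Google"',
    '65.55.218.36 - - [15/Jun/2010:03:12:00 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mediapartners-Google"',
    '192.0.2.10 - - [15/Jun/2010:03:13:00 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mediapartners-Google"',
    '10.0.0.5 - - [15/Jun/2010:03:14:00 +0000] "GET /d HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    # Least frequent addresses first, since those are the ones worth tracing.
    for ip, hits in sorted(ips_for_agent(SAMPLE).items(), key=lambda kv: kv[1]):
        print(ip, hits)
```

A user agent string is trivial to fake, so an unexpected IP in this list only shows that someone sent that header, not who operates the crawler.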
9:23 pm on Jun 16, 2010 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Don't be worried drall, just observe. Observation makes good science of things.

More observations from my site
~ crawling is nonexistent right now save for the index page.
~ Google cache of the index page is outdated by a week.
~ My top content is still ranked well and getting search visitors, less important content is getting none and rankings are gone for those.
~ The site isn't banned or de-indexed.

Eliminating what makes no sense leaves what must be going on even if that too makes no sense.

Right now I think a good majority of websites are getting "cherry picked": their best content is returning in search like always, but how many more pages rank well depends on site SIZE. Auto-generated content doesn't count towards site size right now.
It seems to be percentage based. If a site's size is deemed on topic, related, and of high enough quality, the site may be entitled to x% more top-ranked pages. Sites like WW that are full of such content get more longtail traffic, but not as much as before, because the % is smaller than before. Crappy sites or small sites get their "BASE" quota and nothing more; they've lost the most longtail traffic.

What bothers me about this, if it's true, is that mashup sites also get their quota and it comes at the expense of the sites they scrape.

Gone are the days of auto-generated top rankings and going with 500 thin sites vs one good one, a good thing.

Speculations - nothing more. Incoming links overpower anything, work on those (carefully, lower quality pages could start replacing your best stuff if a quota is in play), link out freely and consider doing it WITHOUT nofollow tags since those are Google only. Relying solely on Google, well, look at the mess that's created.
10:18 pm on Jun 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh, I'm not worried, Google put me on Xanax years ago :)

Seeing a small sputtering to life in the last hour. I think all the posts over at Google's forums and here got someone to look into it.

I still don't understand how mediabot is coming from a Microsoft IP.
10:23 pm on Jun 16, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



my crawls are down 50%
10:24 pm on Jun 16, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



How about this explanation?

Google security has banned the use of Windows.

No one told them that googlebot is run from Windows machines.

Google bot-herders discovered bots weren't working and asked MS to cover for them.

:) :) :)

Well, it's as daft as a lot of things google is doing nowadays.
10:29 pm on Jun 16, 2010 (gmt 0)

10+ Year Member



This Caffeine rollout has been interesting. I heard from many customers that crawling had come to almost a complete stop yesterday (and is down quite a bit today for most of them). I, on the other hand, had several hours where my large site was getting 100K hits or more from gbot. This is by far a new record for hits/hour from them, so I'm still up in the air as to what is going on.
11:09 pm on Jun 16, 2010 (gmt 0)



Me too.

I've posted about this here: [webmasterworld.com...]

My site was launched 4 months ago. In the last 15 days, Googlebot had been crawling 300,000-500,000 pages every day.

But since 08/06/2010, Googlebot has mostly been crawling only my homepage, every 10-20 minutes:

"GET / HTTP/1.1" 200

Occasionally, Googlebot repeatedly crawls 3-4 other pages, such as:

............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............


Googlebot has also been continuously requesting some URLs that obviously do not belong to my site:

"GET /include/setup.exe HTTP/1.1" 404
"GET /author/*_klayiv_*/pisma_*/download.*.prc.zip HTTP/1.1" 404
"GET /usenext/1213093/*+*+5.0.15+*+Full+Version.exe.html HTTP/1.1" 404
"GET /download/projects/vcpp/*_screen.zip HTTP/1.1" 404

Then I changed my site's IP. Googlebot fetched robots.txt, but it continues to crawl only my site's home page:

"GET /robots.txt HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............


I've checked Google Webmaster Tools and did not find anything abnormal.

Google indexing has dropped somewhat, indexing over the last 24 hours is zero, and my site's traffic has dropped 1/3.

When I search for my site name in Google, it is first.

When I search "site:example.com" in Google, the domain root, example.com, is not the first result.
1:08 am on Jun 17, 2010 (gmt 0)

5+ Year Member



After about 40 hours of Google crawling at about 2% of the usual volume on my hobby site, I suddenly see the crawl rate returning to its normal levels. Hopefully it will stay.

Is anyone else seeing the same?
1:33 am on Jun 17, 2010 (gmt 0)

5+ Year Member



Before any of these Google changes, my site had been ranking #1 for the majority of our major keywords for months. We slowly moved to #12, and we have now been at #16 on our major keyword for the past two or three weeks.

So I made some changes to the content, and I have noticed this lag in Googlebot activity as well. Our main landing page has not been cached since Jun 02. For 14 days we have been waiting, but no updates. I have no clue what is going on.
1:47 am on Jun 17, 2010 (gmt 0)

5+ Year Member



Hey guys, we have been seeing pages drop on two of our sites for the last 60 days. We are down to 20,000 pages in Google, from 300k 3 months ago. Has anyone else seen this happen recently?
2:34 am on Jun 17, 2010 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yes, I've been seeing numbers like that on some sites. But when I dig into the data it doesn't hold water. URLs that are not included as indexed anymore are still getting search traffic! I think it's a data bug and not a reality.
3:52 am on Jun 17, 2010 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



we normally run 50-75k page views from gbot a day here. Two days ago it was 5k. Today 35k.
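
To put figures like these side by side from your own logs, a rough Python sketch follows. It counts lines mentioning Googlebot per calendar day and expresses a day's count as a percentage drop from a baseline. It assumes a combined-format timestamp such as `[15/Jun/2010:03:10:00 +0000]`; the names `hits_per_day` and `percent_drop` are illustrative, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Pulls the date part (e.g. "15/Jun/2010") out of a combined-format timestamp.
DATE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def hits_per_day(lines, needle="googlebot"):
    """Count log lines per day that mention `needle` anywhere in the line.

    This is a plain substring match, so it also counts Googlebot-Image,
    Googlebot-Mobile, and any request whose URL happens to contain the word.
    """
    counts = Counter()
    for line in lines:
        if needle in line.lower():
            found = DATE.search(line)
            if found:
                counts[found.group(1)] += 1
    return counts


def percent_drop(baseline, observed):
    """Return how far `observed` is below `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - observed) / baseline


if __name__ == "__main__":
    # The figures quoted in the post above: 50-75k normal, 5k two days ago.
    print(percent_drop(50_000, 5_000))
    print(round(percent_drop(75_000, 5_000), 1))
```

With the numbers quoted above, 5k against a 50k baseline works out to a 90% drop, which is in the same range as the other reports in this thread.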
3:53 am on Jun 17, 2010 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



The google cache version of one of my index pages was 7 days old this morning, now it's dated May 1st. I'm going in the wrong direction here, lol. Google's making quite a mess of things.

I agree with that last post tedster but I will caution that it depends on time of day and day of week. Midnight to 3 am I'm 9999 with no traffic for some of my best keywords and then top of page 1 for the next few hours with associated traffic... My 3rd party tracking graphs show a strong ON/OFF pattern. My top keywords however, steady as a brick (for now).

edit: Brett, we posted at the same time, those numbers are eye opening! I have to tell you my faith in what Google is doing has never been lower, I'm not just saying that either, a mashup site is now one of my toughest competitors and it's not even their content!
8:32 am on Jun 17, 2010 (gmt 0)

5+ Year Member



It looks like everything is back to normal. At least for all my sites, I see regular Googlebot traffic again as of today...
8:47 am on Jun 17, 2010 (gmt 0)



hi first of all

it seems like GOOGLEBOT IS BACK - please confirm. today 17th June 2010 at 01:00am (+/- 1 hour for the different domains) CET (central european time) googlebot showed up again.

crawling at the same rate as before the outage, verified over 12 domains.

only strange thing is the IP of some (only some) of these requests is 109.169.26.7, which does not verify as googlebot with this method [google.com...] - so these could be scrapers.

also we found out that the remaining one percent of googlebot requests during the downtime were in fact either googlebot mobile, googlebot image, the googlebot that fetches sitemap.xml, or googlebot requesting the home page. normal googlebot activity was actually 0.

but hey, googlebot is now back - i like.

this was the first webmaster world thread i actually posted actively - nice experience, but
- i don't think it makes sense to mix up crawling with indexing with caching with search-referred traffic with ranking with caffeine discussions - these are all interesting aspects of the google economy, but they are measured differently (or not at all, like caffeine) and so should be discussed separately. just my 2 c.

does anybody else see 109.169.26.7 requests (a few hundred per hour) with googlebot headers?
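
The verification method referred to above is, as far as Google documents it, a reverse DNS lookup on the requesting IP, a check that the resulting hostname is under googlebot.com or google.com, and then a forward lookup to confirm the hostname resolves back to the same IP. A minimal Python sketch of that two-step check is below. The function name `is_verified_googlebot` and the injectable `reverse`/`forward` parameters are illustrative choices (they make the logic testable without network access), not anything Google provides.

```python
import socket

# Hostname suffixes accepted for a genuine Googlebot, per the documented check.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Reverse-resolve `ip`, check the domain, then forward-confirm the host.

    `reverse` maps an IP to a hostname and `forward` maps a hostname to a list
    of IPs. Both default to real DNS lookups via the socket module.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse(ip)
    except OSError:
        # No PTR record, or the lookup failed: treat as unverified.
        return False

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must include the original IP, otherwise the
        # PTR record could simply have been set by whoever owns the IP.
        return ip in forward(host)
    except OSError:
        return False


if __name__ == "__main__":
    # Performs live DNS lookups; the result depends on current DNS data.
    print(is_verified_googlebot("109.169.26.7"))
```

The forward-confirm step matters because anyone who controls reverse DNS for their own IP range can make it claim a googlebot.com hostname.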
11:33 am on Jun 17, 2010 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Back to normal - since 12:01 am until now anyway, the old average for that time period is back.
12:40 pm on Jun 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Back to normal it seems since 1am.
1:09 pm on Jun 17, 2010 (gmt 0)

5+ Year Member



Isn't a drop in individual crawl rates expected with their new emphasis on lots more crawling in parallel?
1:58 pm on Jun 17, 2010 (gmt 0)

5+ Year Member



I'm seeing the same thing in the cache aging - it's been 14 days for us, and we work hard at new content, as well!

judy
2:07 pm on Jun 17, 2010 (gmt 0)

5+ Year Member



seems like googlebot is back...had about 5000 hits in the last 8 hours.
4:33 am on Jun 18, 2010 (gmt 0)



My site has not returned to normal.
6:26 am on Jun 18, 2010 (gmt 0)

5+ Year Member



We had a good start, but ours seems to be slower yet again. On our main site we had about 5000 hits at night, but after 8am (PST) we only had 3000, which is pretty low. Seems like it's going away or working part time... Yahoo had over 15000, and our norm used to be around 15000 from Google while Yahoo was only around 8000. Seems like Yahoo is picking up where Google is slacking :)
9:28 am on Jun 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't have raw logs to confirm details, but we put changes online on Monday and this morning (Friday) google's cache for our pages has changed and the resultant rankings have also changed.
10:22 am on Jun 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Flat zero crawl on many UK sites from the beginning of June.

Many new sites still not indexed and new pages on old sites (PR4-6's with thousands of backlinks each) not being indexed either.

Of course, the new twitter accounts for all those new sites have been indexed ... and the MFAs we created have been indexed fine too ... Caffeine seems to be faster indexing of web noise.
2:45 pm on Jun 19, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The google cache version of one of my index pages was 7 days old this morning, now it's dated May 1st.

I have seen that happen many times over the last few years. Google reverts to older data for a short while just before the latest crawl data appears. It is likely you'll show a brand new less-than-24hours-old cache date again in just a few days time.