On another site, also 100,000+ pages, with just a few new links here and there (but from the same high-PR8 network as site 1 above), Gbot took in nearly 50,000 pages yesterday. Its rankings have jumped!
Is it possible that I over-did it with new inbound links? Why is one site getting all the crawls and the other nothing?
I don't want to get too off topic here, but I'm wondering why a PR8 site with hundreds of PR8 links from diverse and VERY well-respected networks is not getting crawled as would be expected.
Am I wrong to be looking at off-page factors, and is it possibly something on-page? Is Googlebot the bellwether of a bigger problem? Dupe content?
Prior to Nov 1, Google had 15,000 pages in its index. As of today, the number of pages in the index has ACTUALLY DROPPED by 3,000. On top of that, the majority of pages that do appear in the index are 'undigested' NO DESCRIPTION/NO TITLE pages.
I am at a point where I am completely exasperated with Google.
Has anyone with partially indexed sites actually seen an increase in Google index of their site as a result of increased spidering?
But things are changing rapidly. My indexed count changes very fast; every minute I get a new number, sometimes more, sometimes less. Yes, it is dancing. When the number dropped, some pages went URL-only, but a few minutes later they came back and the count even increased.
Hard to tell what will happen, but I am now considering one possibility: Google is filling its database with new data, but before it does, it drops the old data first. That takes time, so we are facing some temporary problems.
I hope it is.
But here's what happened on a non-commercial site I look after. Over 50,000 pages (mainly a forum). PR6.
Googlebot normally drops by and samples 1000 or so a day. That's fine.
Yesterday, it hit pretty much every single page, and a lot of them twice. Peak rate was 7 a second (probably would have been higher, but it's a slow server, and we are not alone on it).
That's a couple of gig in bandwidth gone in a day and a complaint from the ISP about our CGIs "running wild" (they thought our code was spawning runaway processes).
What we saw on the site yesterday was a denial of service attack.
These things should be illegal.
MSNbot obeys the (admittedly non-standard) crawl-delay directive in the robots.txt and drops by at a regular and managable rate.
Googlebot, on the other hand, decided to shut us down -- at least, that's how it looks from here.
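For anyone who hasn't tried it, the msnbot delay is just two lines in robots.txt (the 10 seconds below is only an example figure; Googlebot, as far as I can tell, simply ignores this directive):

User-agent: msnbot
Crawl-delay: 10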
If it is a serialized "attack" from just one bot, perhaps it can be slowed down by some added PHP code like
"if ($visitor == googlebot) then hold and just wait 5 seconds before giving out that page"?
However, I don't know how multiple PHP waits would impact the web server.
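Something along these lines, perhaps (only a rough sketch: the user-agent match is naive, since anything can claim to be Googlebot, and the 5 seconds is an arbitrary figure):

<?php
// Rough sketch: delay page generation when the visitor claims to be Googlebot.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stristr($ua, 'Googlebot')) {
    sleep(5); // hold this request for 5 seconds before building the page
}
// ...normal page code continues here...
?>

The catch, as you say, is the server impact: every sleeping request still ties up a PHP process (or Apache child) for those 5 seconds, so under a heavy crawl this could make the load problem worse rather than better.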
Regards,
R.
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
But I don't notice any of the pages crawled by that bot indexed at Google... I guess a big update is going to happen soon.
Do you have any pages indexed at Google and crawled by that version of the bot?
This is the trend, and it's not going to go away. There are only going to be more bots looking for more data more frequently. And this is a good thing!
"Bots Gone Wild."
It triggered the V-chip in my tv.
One thing I did notice is that G was looking for listings of both www.thedomain.com and thedomain.com.
Without the "www" is how many of us had our pages listed years ago; I hope it doesn't evolve into a penalty, because my rankings have recently improved.
oh, oh... I just jinxed myself.
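On the www/non-www thing: the usual cure is a permanent (301) redirect so only one hostname gets indexed. A minimal PHP sketch, assuming the pages are served through PHP and using thedomain.com purely as a placeholder:

<?php
// 301-redirect the bare domain to the www hostname so Google only sees one version.
if (isset($_SERVER['HTTP_HOST']) && strcasecmp($_SERVER['HTTP_HOST'], 'thedomain.com') == 0) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.thedomain.com' . $_SERVER['REQUEST_URI']);
    exit;
}
?>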
I have not seen any movement in the SERPs for 3 months; probably an off-topic issue, though.
My theory: Google has a major change in filter or algo every 3 months. All the daily updates are based on that algo or filter.
Based on the money I made, I can trace the quarterly changes at least since March. The last major update was in the first week of August, and another one should be coming within days.
Perhaps they were maxed out, and in an attempt to list all the pages, showed only URLs where Googlebot was behind.
With an inflow of finds to the crawl, now they can remedy the situation. This could account for the hard crawl.
Well, if you don't want Googlebot to crawl your site, simply block it. It's really simple.
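For the record, the block itself is only two lines in robots.txt:

User-agent: Googlebot
Disallow: /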
I don't think anyone is saying they don't want to be crawled.
We all want Googlebot (and any others) to visit at a civilised rate.
That rate will vary by site, of course, so it should be settable by the webmaster in some way (MSNBot already has such a mechanism).
Googlebot not only lacks such a mechanism, it has simply gone wild in the last few days.
That's doing evil -- something Google used to claim they don't do. I don't find that acceptable.
That would certainly prevent webmasters from having to block Googlebot if it gets too aggressive.
When you have new content you could increase the crawl rate, and then when the pages are indexed, ease back on it.
I am sure a PhD at Google can add a line of code to the algo that handles robots.txt to determine the crawl rate.
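Purely hypothetical, since Googlebot doesn't read it today, but it could be as simple as honouring the same non-standard directive MSNbot already does:

User-agent: Googlebot
Crawl-delay: 10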