Gbot running hard

ncw164x

9:04 am on Sep 23, 2004 (gmt 0)

Googlebot is requesting between 2 and 5 pages a second. I haven't seen this type of spidering for a long time.

idf03

2:52 pm on Sep 28, 2004 (gmt 0)

I'm also getting pounded by this 'googlebot'.

An example line:

66.249.65.236 - - [28/Sep/2004:15:48:24 +0100] "GET /tps_page.html HTTP/1.1" 404 335 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The IP does appear to belong to Google, but whereas googlebot normally spreads its load across several bot machines, these requests are all coming from one IP address.
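To put numbers on reports like "2 to 5 pages a second, all from one IP", one option is to tally the access log per client IP. The sketch below assumes the Apache "combined" log format that the example line above appears to use; the function name `crawl_stats` and the choice to count 404s separately are illustrative, not something from the thread.

```python
import re
from collections import Counter, defaultdict

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawl_stats(lines, agent_substring="Googlebot"):
    """Tally hits, 404s and the busiest single second per client IP,
    counting only lines whose user-agent contains agent_substring."""
    hits = Counter()
    not_found = Counter()
    per_second = defaultdict(Counter)  # ip -> {timestamp string: hits}
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or agent_substring not in m.group("agent"):
            continue  # unparseable line or a different client
        ip = m.group("ip")
        hits[ip] += 1
        if m.group("status") == "404":
            not_found[ip] += 1
        # The log timestamp has one-second resolution, so the raw string
        # can serve directly as the key for a per-second count.
        per_second[ip][m.group("time")] += 1
    peak = {ip: max(seconds.values()) for ip, seconds in per_second.items()}
    return hits, not_found, peak
```

Passing it an open log file (for example `crawl_stats(open("access_log"))`, with the path assumed) and looking at `hits.most_common()` shows whether a single IP dominates, while the 404 count flags requests for pages that never existed.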

Also, they are requesting docs which do not exist and have never existed on my site. They do appear to be docs which exist on sites I link to. It's as though the bot has followed links but has not changed the server part of the link.

sri_gan

3:00 pm on Sep 28, 2004 (gmt 0)

Yes, it's possible to spoof IP ranges with a network program, but it's a very rare occurrence.

It could be something that could take control of the Big 2's networks.

I will keep watching it...

If your web servers cannot handle it, I suggest blocking the IP ranges we detect here.

When there is a mass block, there will be a solution.

petehall

3:10 pm on Sep 28, 2004 (gmt 0)

idf03 tracert for you:

17 94 ms 94 ms 92 ms reserved.above.net [209.249.73.70]
18 93 ms 92 ms 92 ms 66.249.65.236
Trace complete.

Almost all pages, across all of our websites, are indexed by Google on a daily basis.

Up until the 14th/15th of this month, this had been happening from the IP range 64.68.*

We are still being indexed, but not from the above IP range!

idf03

3:18 pm on Sep 28, 2004 (gmt 0)

Point taken Petehall, I get same:

15 so-3-0-0.mpr2.iad2.us.above.net (64.125.28.214) 89.643 ms * 88.532 ms
16 reserved.above.net (209.249.73.70) 94.271 ms 94.594 ms 93.38 ms
17 66.249.65.236 (66.249.65.236) 107.996 ms 90.158 ms 90.918 ms

However, whois clearly shows Google. Is this a spoofed whois entry?

sri_gan

3:32 pm on Sep 28, 2004 (gmt 0)

I verified the whois information and they all belong to the Big 2, but the way they are requesting pages is still really weird.
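Whois shows who owns an address block; it does not confirm that a particular request came from that owner's crawler. A stronger check, which Google later documented for verifying Googlebot, is a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname to confirm it maps back to the same IP. The forward step matters because whoever controls an IP block controls its reverse (PTR) records. The sketch below uses Python's standard `socket` module; the hostname suffixes and the function name `is_verified_crawler` are assumptions to check against Google's current documentation, and the resolvers are parameters only so the logic can be tested offline.

```python
import socket

# Hostname suffixes assumed for Google's crawlers (verify before relying on them).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, suffixes=GOOGLE_SUFFIXES,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname's domain, then forward-resolve
    that hostname and confirm the original ip is among its addresses."""
    try:
        host = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:                    # socket.herror / gaierror subclass OSError
        return False
    if not host.lower().endswith(suffixes):
        return False
    try:
        addresses = forward(host)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses
```

Calling `is_verified_crawler("66.249.65.236")` performs live DNS lookups, so the answer reflects today's DNS records rather than what was configured when this thread was written.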

sri_gan

3:37 pm on Sep 28, 2004 (gmt 0)

Flooding IPs and agents:

GOOGLEBOT (MOZILLA)
66.249.66.0

GOOGLEBOT (MOZILLA)
66.249.65.0

MSNBOT
207.46.98.0

After blocking the above 3 ranges, Google still crawls with its regular user agent from the IP below:

66.249.64.xx Googlebot/2.1 (+http://www.google.com/bot.html)

MSN crawls with

65.54.188.xx
msnbot/0.3 (+http://search.msn.com/msnbot.htm)
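The blocks described above amount to a membership test against a few network ranges. A minimal sketch with Python's standard `ipaddress` module, assuming each listed address (e.g. 66.249.66.0) stands for a /24 network; the names `BLOCKED` and `is_blocked` are hypothetical, and in practice the block would usually live in the web server or firewall configuration rather than in application code.

```python
import ipaddress

# The three flooding ranges listed above, read as /24 networks (assumption:
# the post gives only the network address).
BLOCKED = [ipaddress.ip_network(n)
           for n in ("66.249.66.0/24", "66.249.65.0/24", "207.46.98.0/24")]


def is_blocked(ip, networks=BLOCKED):
    """True if ip falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

With these ranges, the flooding address 66.249.65.236 from the earlier log line is blocked, while the regular crawlers in 66.249.64.* and 65.54.188.* are not.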

jnmconsulting

3:49 pm on Sep 28, 2004 (gmt 0)

I believe that Google has set up new bots and filters and is building a new index. It appears to be an across-the-board change: new IPs, new bots, new index, new filters.

DaveN

3:50 pm on Sep 28, 2004 (gmt 0)

Let's all ban that pesky little fellow :)

DaveN

petehall

3:53 pm on Sep 28, 2004 (gmt 0)

sri_gan, 66.249.64.* is indeed the new IP range from which we are being indexed.

As mentioned previously, this change took place on the 14th of September 2004.

sri_gan

4:15 pm on Sep 28, 2004 (gmt 0)

Pete,

66.249.64.* seems to crawl in the usual way, so I left it to crawl.

The other 3 ranges are way too crazy when they start crawling. If it's the Big 2's code, then they need to hire some good code writers :); if not, they need to know they are being misused.

BillyS

4:46 pm on Sep 28, 2004 (gmt 0)

This robot has been around since at least August 10th, see this thread:

[webmasterworld.com...]

Reading through some information on this robot, at least one person states that they have confirmed with Google that it is their agent.

It's just not the bot you want to see if you do affiliate marketing... just a feeling and a few tests ;)

Speculation is (and I think there might be something to this) that Google may be stepping up the fight against cloaking.

jmwebguy

5:35 pm on Sep 28, 2004 (gmt 0)

Here's my question. Starting on Monday I noticed that the cache that showed on Google was of my newly designed site, which I was very happy about. Today though, it's of my old design.

Anyone know why?

defone

5:52 pm on Sep 28, 2004 (gmt 0)

It seems that googlebot is requesting LOTS of pages that don't exist on my server. I noticed that some of the pages googlebot is trying to request belong to other sites on the same topic. Why is this happening?

defone

6:01 pm on Sep 28, 2004 (gmt 0)

Aah, it seems that googlebot is trying to find, on my server, the pages I link out to.

So when I have links like this:

[mysite.com...]

gbot tries to request this from my site:

[mysite.com...]

Is this a bug, or is something weird going on?

sri_gan

6:06 pm on Sep 28, 2004 (gmt 0)

Speculation about detecting cloaking has nothing to do with this kind of mass request; cloaking certainly cannot be identified this way.

There are better ways to identify cloaking. Personally, I rule out that possibility.

Apart from that, what would it have to do with MSNBOT flooding at the same time?

One more person in this very thread has confirmed that he also sees mass requests from MSNBOT.

idf03

6:16 pm on Sep 28, 2004 (gmt 0)

aah it seems that googlebot is trying to find pages from my server where I am (out) linking.
So when I have these kind of links:
[mysite.com...]
gbot is trying to request from my site:
[mysite.com...]
is this a bug or is there something weird going on?

That's precisely what's happening to me. Methinks there's a bug.

macdave

8:22 pm on Sep 28, 2004 (gmt 0)

So when I have these kind of links:
[mysite.com...]
gbot is trying to request from my site:
[mysite.com...]

Google may be probing to find out how various redirects are being used. They may be trying to distinguish, for example, redirects intended to canonicalize a domain name (where the same page would exist on both domains) from redirects used to count clicks (as in your example), and both from cloaking and hijacking redirects.

dataguy

10:19 pm on Sep 28, 2004 (gmt 0)

Here's another theory on the new Googlebot: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

I manage a few search engine websites, and two months ago we were getting hit hard by the new MSNbot. Their bot was actually posting search terms to our search results pages, and many of the 'search terms' it used were just gibberish. Some of it set off SQL injection attack alarms, so I decided I had to do something about it. I contacted MS about the issue, and they said their bot would only crawl URLs they knew had been viewed by humans, which I had to question. Somewhere their data got corrupted, because the garbage they were searching for on my sites was barely text. I figured that, in their zeal to add as many pages to their SERPs as quickly as possible, they were probably taking info from their toolbar users and crawling those URLs.

For the past week or so I've noticed an approximate 20% increase in search traffic on my search engine sites, so I decided to investigate. Sure enough, I'm getting searched by the new Googlebot. I don't recall ever having Googlebot (or any other bot besides MSNbot) crawl my SERPs.

I also see in this thread that others are reporting the new Googlebot crawling pages that no longer exist, or perhaps never existed. Getting URLs to crawl from the Google Toolbar could explain this, since users often type incorrect URLs before they type the correct one.

This is just a theory, but with my small sampling, it makes sense. Can anyone add anything that helps prove or disprove this theory?

BillyS

10:48 pm on Sep 28, 2004 (gmt 0)

I'm going to add to the list of Google problems:

1. AdSense stats not available for over 1 day
2. AdWords stats not showing up (on the same day AdSense had problems)
3. Gbot Running hard
4. PR not showing up on the toolbar

I have never seen the PR bar blank before - maybe it is just my connection.

Coincidence? I think not.

steveb

10:56 pm on Sep 28, 2004 (gmt 0)

"3. Gbot Running hard"

I wouldn't call that a "problem". It is a great thing. The underlying reason may be because of a problem, but heavy crawling is a good thing.

Powdork

11:09 pm on Sep 28, 2004 (gmt 0)

It's Gbot on Viagra,... or Levitra,... or Cialis,... or Vardenafil,... or SuperViagra or ... anyway, you get the idea. Gbot is running hard for a long time ;)

sabai

11:58 pm on Sep 28, 2004 (gmt 0)

So when I have these kind of links:
[mysite.com...]

It's off topic, but you should be URL-encoding the second URL, like this, since it's a GET variable...

http://www.mysite.com/index.php?out=http%3A%2F%2Fwww.othersite.com%2Fpage1.html
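The encoding shown above can be produced with Python's standard `urllib.parse`: `urlencode` percent-encodes reserved characters such as `:` and `/` inside parameter values. The `out` parameter and both URLs come from the example above; the helper name `tracking_link` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tracking_link(base, target):
    """Build a click-tracking URL carrying `target` as the 'out' GET variable
    (hypothetical helper; 'out' is the parameter name from the example)."""
    return base + "?" + urlencode({"out": target})


link = tracking_link("http://www.mysite.com/index.php",
                     "http://www.othersite.com/page1.html")
print(link)
# http://www.mysite.com/index.php?out=http%3A%2F%2Fwww.othersite.com%2Fpage1.html

# The redirect script can recover the original target with parse_qs.
print(parse_qs(urlsplit(link).query)["out"][0])
# http://www.othersite.com/page1.html
```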

Nuttzy99

12:09 am on Sep 29, 2004 (gmt 0)

It's Gbot on Viagra,... or Levitra,... or Cialis,... or Vardenafil,... or SuperViagra or ... anyway, you get the idea. Gbot is running hard for long time.;)

I posted msg #7 of this topic. At the time, GBot was going nuts, to the point of putting quite noticeable and prolonged spikes on my bandwidth graph on 3 days last week. However, no such spike has occurred since Thursday.

I'm going to go ahead and say GBot running normal. Or perhaps slightly elevated.

-Nuttzy

Spine

12:37 am on Sep 29, 2004 (gmt 0)

My site isn't very large (about 100 pages) and Gbot has spidered each page 4x today. The extra spidering attention just began on Sunday for me.

Also, my site is one of the ones affected by the Sept 23 traffic drop problem.

Liane

1:43 am on Sep 29, 2004 (gmt 0)

I wouldn't say Google Bot is running normally at all. I just fished this out of today's logs:

Page Views: 154
Percent: 5.98%
Spider: Google: 66.249.65.111 Mozilla/5.0 (compatible; Googlebot/2.1;+http://www.google.com/bot.html)

After that ... there were another 68 additional hits at a more reasonable rate. Not a very polite spider!

Come on Google, rein in this bot. Spidering is one thing. Slamming the site is another.

johnnyb

3:00 am on Sep 29, 2004 (gmt 0)

I have a website with nearly 200,000 pages, but Googlebot has crawled only:

4,000 pages yesterday
and 5,500 pages so far.

In my case I wouldn't call this a very deep crawl. Has anyone with more than 100,000 pages had all their pages crawled in the past 3-4 days?

cabbagehead

3:35 am on Sep 29, 2004 (gmt 0)

Geeze - well, I have a new site I launched 2.5 weeks ago that I submitted to Google last week and I haven't seen any signs of gBot on my door step yet.

:-\

coosblues

3:52 am on Sep 29, 2004 (gmt 0)

Good point - I quit my full time job to do just that (among other things)

Nice :). I have little doubt it's not a computer or SEO book either.

Lessons learned here brought my site to page 1, and all the better for those of us who do care when others shun the tidbits and morsels to be found on WebmasterWorld.

How many times, since I've been a member here, have I heard rants, raves and conspiracy theories abound.

All I can tell you is that Google has been very good to me, and no, "the sky is not falling".

Nuttzy99

3:52 am on Sep 29, 2004 (gmt 0)

My site isn't very large (about 100 pages) and Gbot has spidered each page 4x today.

Page Views 154
After that ... there were another 68 additional hits at a more reasonable rate. Not a very polite spider!

No. You people are talking about a couple hundred hits. Give me a break! Look at the first post of this topic...

googlebot requesting between 2 - 5 pages a second, not seen this type of spidering for a long time

...we're talking magnitudes higher. I had a few hundred visits this month before last week. Last week I had 20,000 visits. Now it is back to being much more reasonable.

I'm pretty sure that's the magnitude we are talking about, people... not an extra 100 here and there.

-Nuttzy

ronburk

4:22 am on Sep 29, 2004 (gmt 0)

It seems that googlebot is requesting LOTS of pages that are not existing on my server.

One possibility is this:

If you have a page that does an HTTP 301 Location: redirect to an HTML page on another domain, Googlebot for the last few days (at least on my site) has started using the following new behavior: it will assume the base URI is the URL of the page that issued the redirect.

Thus, if www.A.com/misc/links?dipsy_doodle issues a 301 redirect to www.dipsydoodle.com/links.html, and that HTML contains an anchor with an href of "./fred.html", the Googlebot will issue an HTTP GET for "www.A.com/misc/fred.html". As another example, if the foreign page contains an href like "/sammy.html", then Googlebot will issue a GET for "www.A.com/sammy.html".

This is a change in behavior, and is generating a lot of 404s for a lot of webmaster logs, since this mechanism is a common way to get reliable click tracking.
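The behavior described above comes down to which base URL is used to resolve relative links. RFC 3986 (section 5.1.3) says that when retrieval was the result of a redirect, the last URI used, i.e. the one that actually returned the page, is the base. The sketch below uses Python's `urllib.parse.urljoin` with the URLs from the example (the `http://` scheme is an assumption) to show the expected resolution next to the one Googlebot appears to be using.

```python
from urllib.parse import urljoin

# URLs from the example above; the http:// scheme is an assumption.
redirecting_url = "http://www.A.com/misc/links?dipsy_doodle"  # issues the 301
final_url = "http://www.dipsydoodle.com/links.html"           # page actually served

for href in ("./fred.html", "/sammy.html"):
    expected = urljoin(final_url, href)        # base = URL that returned the page
    observed = urljoin(redirecting_url, href)  # base = URL that issued the redirect
    print(f"{href}: expected {expected}, observed {observed}")
# ./fred.html: expected http://www.dipsydoodle.com/fred.html, observed http://www.A.com/misc/fred.html
# /sammy.html: expected http://www.dipsydoodle.com/sammy.html, observed http://www.A.com/sammy.html
```

Resolving against the redirecting URL reproduces exactly the www.A.com/misc/fred.html and www.A.com/sammy.html requests, which is why they surface as 404s in the linking site's logs.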

This 176-message thread spans 6 pages.