Forum Moderators: open
Put a new site up 30 hours ago with one PR5 link to it. Gbot crawled it and the cache showed up within 12 hours. A new fresh tag appeared 8 hours after that.
In that time it requested robots.txt 4 times and is still crawling a directory that's off limits.
Whatever this 'panic crawling' is, they'd better get their act together.
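For reference, the off-limits directory is blocked with nothing more exotic than a standard exclusion rule (the directory name here is a placeholder, not the real one):

    User-agent: Googlebot
    Disallow: /offlimits/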
Why are they hitting everybody that hard? With billions of websites in their index, one could assume they'd easily be able to spread the load over enough websites that a single site doesn't get hit so hard at any one point in time.
If they spider a site with 1 request every 2 seconds, they'd still be able to get > 2.5 million pages over 24 hours from a single site. Why request 5-6 pages per second from the same website? Are we at the point where there are more webcrawlers than webservers?
I really hope this won't become standard behaviour. Some sites have scripts that prevent people from downloading complete sites by serving 503s if too many requests come in during a short time period...
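Those scripts don't have to be fancy, either. A minimal sketch of the idea (Python purely for illustration; the window size and limit are made-up numbers):

    import time
    from collections import defaultdict, deque

    WINDOW = 10   # hypothetical: sliding window, in seconds
    LIMIT = 20    # hypothetical: max requests allowed per window

    recent = defaultdict(deque)   # client IP -> timestamps of recent hits

    def should_serve_503(ip):
        """True once a client exceeds the limit and should get a 503."""
        now = time.time()
        q = recent[ip]
        q.append(now)
        # drop timestamps that have aged out of the window
        while q and now - q[0] > WINDOW:
            q.popleft()
        return len(q) > LIMIT

A crawler doing 5-6 requests a second trips a limit like that almost immediately.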
If they spider a site with 1 request every 2 seconds, they'd still be able to get > 2.5 million pages over 24 hours from a single site.
Huh?
There are 86,400 seconds in a day. One page every two seconds would be 43,200 pages in a day, not 2.5 million.
You're a little off. :)
I think the keyword here is site and not page?
He said spider a site with a single request every 2 seconds. If 43,200 sites were spidered at an average of 57 pages per site, that would equal about 2.5 million pages.
Having said that, I myself am a little confused about the statement and its true meaning :-\
I presume an entire site cannot be indexed through a single request... :-)
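Either way, the arithmetic is easy to check (the 57 pages/site average is just the figure that makes the two numbers meet):

    SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
    requests_per_day = SECONDS_PER_DAY // 2   # one request every 2 seconds

    print(requests_per_day)        # 43200 -- pages/day from a single site
    print(requests_per_day * 57)   # 2462400 -- roughly the 2.5 million quoted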
Googlebot blasted my site yesterday after not visiting it since the 14th of Sept. We were starting to get a bit worried, but I kept coming here and reading about others getting the same treatment (no visits, or very few, then out of the blue a very deep crawl).
Just wanted to chime in and more or less introduce myself to everyone and share what Googlebot did to our site today.
Googlebot is trying to spider pages that don't exist on the domain but do exist on another domain on the same server, and thus the same IP, I think.
I can even see this happening between my sites.
Site 1 is getting requests for documents which only exist on Site 2, and Sites 1 and 2 are on completely different servers, with different ISPs, locations, and IP ranges.
What I can say is that the links are of the type: link.php?url=www.abcd.com
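If you want to double-check from the outside, a quick sketch with Python's standard library does it (hostname and path are placeholders, not my real sites):

    import http.client

    # Ask Site 1 for a path that only exists on Site 2.
    conn = http.client.HTTPConnection("www.site1.example")
    conn.request("GET", "/some-site2-only-page.html")
    resp = conn.getresponse()
    print(resp.status)   # 404 means the bot is guessing, not following real links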
Provided that the bots were inappropriately attributing these dynamic redirects as belonging to site 1 rather than site 2, where the content actually resides... the bots now have to determine *if* the page is on site 1 or not. I suspect the site 1 serp will be delisted and *hopefully* site 2 will now show the serp *and* get the appropriate p.r. transfer it is rightfully due from the incoming link from site 1.
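For anyone unfamiliar with the pattern, a link.php?url=... redirect usually amounts to no more than this (sketched as a Python CGI purely for illustration; the real script would be PHP, and the 301-vs-302 point is my own assumption about what matters for attribution):

    import os, sys
    from urllib.parse import parse_qs

    # Pull the target out of the query string, e.g. link.php?url=www.abcd.com
    query = parse_qs(os.environ.get("QUERY_STRING", ""))
    target = query.get("url", [""])[0]

    # A 301 tells the bot the content permanently lives at the target;
    # a plain 302 leaves room to credit the content to the redirecting site.
    sys.stdout.write("Status: 301 Moved Permanently\r\n")
    sys.stdout.write("Location: http://%s\r\n\r\n" % target)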
Provided that the bots were inappropriately attributing these dynamic redirects as belonging to site 1 rather than site 2, where the content actually resides...
Just to confirm - the link from Site 1 is to Site 2's root only, not to any document on Site 2, but Gbot is looking on Site 1 for documents which exist on Site 2 but are not linked directly.
If I had a link to webmasterworld, I would be getting requests for the control panel, site search, glossary, etc.
A page modified on Sept 8th, and cached daily, showed up in the index within 48 hours for new search terms from the page's new content, but was still findable for search terms no longer on the page until only 3 days ago. Even though the cache reflected the new content, the snippet still contained the old content when running a search for the old terms.
On the day it stopped being findable for the old content, Googlebot arrived one hour earlier than its daily spidering time over the previous 3 weeks or more. Additionally, a fresh date appeared on the day of the change, even though the fresh content had been online, indexed, and cached daily for 3 weeks; until then, the new content result had not included a fresh date.
Over 10k pages cached, and 320MB of bandwidth used in 4 or 5 straight days of constant crawling. My entire website uses phpBB2 as a backend, and the forums reported this as of yesterday:
Most users ever online was 238 on 27 Sep 2004 06:18 pm
So it's been going at me quite hard, it seems. Hopefully the new linking of the forums has caused the crawler to get my whole site, instead of the measly 35 pages it crawled before.
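For what it's worth, the per-page cost of that crawl works out sane (rough arithmetic on the numbers above; Python just as a calculator):

    pages = 10_000        # "over 10k pages cached"
    bandwidth_mb = 320    # reported over the 4-5 days
    print(bandwidth_mb * 1024 / pages)   # ~33 KB per page on average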
DS
Provided that the bots were inappropriately attributing these dynamic redirects as belonging to site 1 rather than site 2, where the content actually resides...
This is the crux of what is happening, I believe.
So it's been going at me quite hard, it seems. Hopefully the new linking of the forums has caused the crawler to get my whole site, instead of the measly 35 pages it crawled before.
I have the exact same scenario. I have a forum on one of my sites that has never been fully crawled. Recently every page in the forum was crawled. In the past, a few pages of the forum would actually make it into the SERPs, but only after a month or two. I check the SERPs now and it seems that about 30% of the forum pages are listed; that is way up from last month. Some of these pages are also very recent.
Stand by for the holiday cheers and jeers... if history is any indication, Google's about to shake things up in a big way ;-)
My site has also moved up in many of our targeted results, which is nice. Hopefully I can track this change tonight and determine exactly where I moved up, whether anything moved down, etc.
DS