On another site, also 100,000+ pages, with just a few new links here and there (but from the same high-PR8 network as site 1 above), Gbot took in nearly 50,000 pages yesterday. Its rankings have jumped!
Is it possible that I over-did it with new inbound links? Why is one site getting all the crawls and the other nothing?
I don't want to get too off topic here, but I'm wondering why a PR8 site with hundreds of PR8 links from diverse and VERY well-respected networks is not getting crawled as would be expected.
Am I wrong to be looking at off-page factors, and is it possibly something on-page? Is Googlebot the bellwether of a bigger problem? Dupe content?
Prior to Nov 1, Google had 15,000 pages in its index. As of today, the number of pages in the index has ACTUALLY DROPPED by 3,000. On top of that, the majority of pages that do appear in the index are 'undigested' NO DESCRIPTION/NO TITLE pages.
I am at a point where I am completely exasperated with Google.
Has anyone with partially indexed sites actually seen an increase in Google index of their site as a result of increased spidering?
But things are changing rapidly. My indexed count changes very fast; every minute I get a new number, sometimes more, sometimes less. Yes, it is dancing. When the number dropped, some pages went URL-only, but a few minutes later they came back and the count even increased.
Hard to tell what will happen, but I am now considering one possibility: Google is filling its database with new data, but before it does, it drops the old data first. That takes time, so we are facing some temporary problems.
I hope it is.
But here's what happened on a non-commercial site I look after. Over 50,000 pages (mainly a forum). PR6.
Googlebot normally drops by and samples 1000 or so a day. That's fine.
Yesterday, it hit pretty much every single page, and a lot of them twice. Peak rate was 7 a second (probably would have been higher, but it's a slow server, and we are not alone on it).
That's a couple of gig in bandwidth gone in a day and a complaint from the ISP about our CGIs "running wild" (they thought our code was spawning runaway processes).
What we saw on the site yesterday was a denial of service attack.
These things should be illegal.
MSNbot obeys the (admittedly non-standard) crawl-delay directive in the robots.txt and drops by at a regular and managable rate.
Googlebot, on the other hand, decided to shut us down -- at least, that's how it looks from here.
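For anyone who hasn't tried it, the msnbot delay is just two lines in robots.txt (the 10 seconds below is only an example figure; Googlebot, as far as I can tell, simply ignores this directive):

User-agent: msnbot
Crawl-delay: 10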
If it is a serialized "attack" from just one bot, perhaps it can be slowed down by some added PHP code like
"if ($visitor == googlebot) then hold and just wait 5 seconds before giving out that page"?
However, I don't know how multiple PHP waits would impact the web server.
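Something along these lines, perhaps (only a rough sketch: the user-agent match is naive, since anything can claim to be Googlebot, and the 5 seconds is an arbitrary figure):

<?php
// Rough sketch: delay page generation when the visitor claims to be Googlebot.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stristr($ua, 'Googlebot')) {
    sleep(5); // hold this request for 5 seconds before building the page
}
// ...normal page code continues here...
?>

The catch, as you say, is the server impact: every sleeping request still ties up a PHP process (or Apache child) for those 5 seconds, so under a heavy crawl this could make the load problem worse rather than better.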
Regards,
R.
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
But I don't notice any of the pages crawled by that bot indexed at Google... I guess a big update is going to happen soon.
Do you have any pages indexed at Google and crawled by that version of the bot?
This is the trend, and it's not going to go away. There are only going to be more bots looking for more data more frequently. And this is a good thing!
"Bots Gone Wild."
It triggered the V-chip in my tv.
One thing I did notice is that G was looking for listings of both www.thedomain.com and thedomain.com.
Without the "www" is how many of us had our pages listed years ago; I hope it doesn't evolve into a penalty, because my rankings have recently improved.
oh, oh... I just jinxed myself.
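On the www/non-www thing: the usual cure is a permanent (301) redirect so only one hostname gets indexed. A minimal PHP sketch, assuming the pages are served through PHP and using thedomain.com purely as a placeholder:

<?php
// 301-redirect the bare domain to the www hostname so Google only sees one version.
if (isset($_SERVER['HTTP_HOST']) && strcasecmp($_SERVER['HTTP_HOST'], 'thedomain.com') == 0) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.thedomain.com' . $_SERVER['REQUEST_URI']);
    exit;
}
?>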
I have not seen any movement in the SERPs for 3 months; probably an off-topic issue, though.
My theory: Google has a major change in filter or algo every 3 months. All the daily updates are based on that algo or filter.
Based on the money I made, I can trace the quarterly changes at least since March. The last major update was in the first week of August, and another one should be coming within days.
Perhaps they were maxed out, and in an attempt to list all the pages, showed only URLs where Googlebot was behind.
With an inflow of finds to the crawl, now they can remedy the situation. This could account for the hard crawl.
Well, if you don't want Googlebot to crawl your site, simply block it. It's really simple.
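For the record, the block itself is only two lines in robots.txt:

User-agent: Googlebot
Disallow: /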
I don't think anyone is saying they don't want to be crawled.
We all want Googlebot (and any others) to visit at a civilised rate.
That rate will vary by site, of course, so it should be settable by the webmaster in some way (MSNBot already has such a mechanism).
Googlebot not only lacks such a mechanism, it has simply gone wild in the last few days.
That's doing evil -- something Google used to claim they don't do. I don't find that acceptable.
That would certainly prevent webmasters from having to block Googlebot if it gets too aggressive.
When you have new content you could increase the crawl rate, and then when the pages are indexed, ease back on it.
I am sure a PhD at Google can add a line of code to the algo that handles robots.txt to determine the crawl rate.
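Purely hypothetical, since Googlebot doesn't read it today, but it could be as simple as honouring the same non-standard directive MSNbot already does:

User-agent: Googlebot
Crawl-delay: 10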