If the irrelevant pages are from your own site, a robots.txt file or a few judicious robots meta tags with noindex should help out.
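For example, a minimal setup might look like this — the /printer-friendly/ path is just a hypothetical placeholder:

```shell
# Hypothetical example: keep compliant crawlers out of a /printer-friendly/
# section via robots.txt.
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /printer-friendly/
EOF

# The per-page alternative goes in the <head> of the page itself;
# noindex,follow lets the bot follow links without indexing the page.
echo '<meta name="robots" content="noindex,follow">'
```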
Kukenan & Robster124
Either one would be interesting, but the second one would be mighty impressive as well. :)
I can't believe how many pages Gbot is pulling down, and at what rate. It grabbed 5K pages per hour for four straight hours. I haven't seen this kind of speed in a long time; it used to be a steady stream of 1 or 2K pages. This rate is making me worry about server overload.
5K pages/hour is less than 2 pages per second.
Last big crawl I peaked at 170 pages per second.
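For scale, the arithmetic behind those two rates:

```shell
# Convert 5,000 pages/hour to pages/second, and 170 pages/second to pages/hour
awk 'BEGIN { printf "5K/hr = %.2f pages/sec\n", 5000/3600
             printf "170/s = %d pages/hr\n",    170*3600 }'
# → 5K/hr = 1.39 pages/sec
# → 170/s = 612000 pages/hr
```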
Sweet dreams. :)
Wow! I was just looking at my logs the other day and saw Google last hit me around the 18th. After reading this, I processed my logs and saw google's been hitting me on and off on the 24th and 25th. I hope it keeps up the pace today..hehe
Keep on truckin' Googlebot!
> Where is the significance in this Mozilla-bot thing?
> Maybe the moz-5.0 version accepts newer standards?
See this thread [webmasterworld.com] where GG offers some info.
I just had a closer look at my logs spanning all of my sites, and I am a bit surprised. While it is true that GoogleBot activity on Jan 23 was remarkably high, on an overall scale for January Yahoo's Slurp out-crawled every other spider!
Slurp accounted for 73% of spider traffic (more than 8000 requests), whereas GoogleBot only accounted for 20%. The Mozilla version of GoogleBot made up only 0.15%.
The funny thing is though, that I'm not doing particularly well on Yahoo.
Same here, can't recall Gbot being so greedy, I wonder why?
Also msnbot is going crackers as well
Hah! Yahoo bot? Puh-lese.
Ask Jeeves' bot *regularly* crawls 10 times the number of pages on my site compared to any other bot, including Googlebot.
I was testing some updates to one of my sites a little while ago, and noticed that PR (according to one of my Firefox extensions) had shot up on a number of pages. Then it occurred to me that one of those pages had been at its current URL less than 24 hours. I restructured some things and set up 301 redirects, and the new URL is already PR7 (as I recall, the old one was PR2 yesterday).
So I checked a few searches, and noticed one of my pages had a 1/25/2005 date in the SERPS while showing a title that was changed on 1/26/2005. Go figure. Looks like a lot of things are updating.
GB is in a total frenzy on my site. Very, very deep crawl here. Two days ago my site was in their index for the first time; now it's been reduced to URL-only again. What's going on here?
At least on the 23rd, Googlebot was using HTTP 1.1. This protocol requests dynamic GZIP-compressed web page content. If your webhost supports GZIP (all can), you will see in your log files that the byte count for each file read is typically reduced by a factor of 4. Only 6% of webhosts support this free dynamic GZIP compression functionality, which cuts Internet bandwidth usage by a factor of 4.
Googlebot could crawl your site 4X faster with 4X less bandwidth usage if your site supports GZIP. So few sites support GZIP that Google has no real incentive to switch over to a faster crawler. When properly set up GZIP would speed up many, many websites.
HTTP 1.1 does *not* request gzip encoding by default or as part of the protocol behavior. Googlebot is, however, sending an Accept-Encoding header with its request indicating that gzip encoding is acceptable; the server then answers with a Content-Encoding: gzip header if it compresses the response.
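A quick way to see what a given server actually negotiates, plus a rough local estimate of the saving. example.com and the sample markup below are placeholders, and the 4x figure will vary with real pages:

```shell
# To check a live host (needs network; example.com is just a placeholder):
#   curl -sI -H 'Accept-Encoding: gzip' http://example.com/ | grep -i '^content-encoding'
# A "Content-Encoding: gzip" line in the response means compression is on.

# Local estimate of the saving on repetitive HTML:
yes '<tr><td>row data</td></tr>' | head -n 500 > page.html
gzip -c page.html > page.html.gz
echo "raw: $(wc -c < page.html) bytes, gzipped: $(wc -c < page.html.gz) bytes"
```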
Googlebot is also devouring my site.
Could this portend a serp update or a PR update?
Thanks, critter, for the specific clarification.
There have been several questions regarding the differences between the two bots, and GZIP compression is a big difference that few take advantage of.
To date, the optional GZIP request correlates with Googlebot's use of the HTTP 1.1 protocol.
When Googlebot uses HTTP 1.0 it is definitely not requesting GZIP-compressed content.
The protocol indicator (1.0/1.1) sits almost adjacent to the page size in bytes, so it's very convenient to use as a "GZIP" flag when reviewing your logs.
Even though this capability is available in virtually all web server software, only about 6% of web hosts, and therefore webmasters, support this virtually free technology that improves performance and cuts bandwidth by a factor of 4.
As a 56K modem user, I'd sure like to see dynamic GZIP compression fully supported. WebmasterWorld unfortunately does not GZIP; Google does for SERPs.
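The log-review trick — protocol indicator next to the byte count — can be sketched against an Apache combined-format line (the log line below is fabricated for illustration):

```shell
# In Apache's combined log format, field 8 is the protocol and field 10 the
# response size in bytes, so awk can pull them out side by side.
line='66.249.66.1 - - [23/Jan/2005:10:00:00 +0000] "GET /page.html HTTP/1.1" 200 3120 "-" "Googlebot/2.1"'
proto_bytes=$(echo "$line" | awk '{gsub(/"/, "", $8); print $8, $10}')
echo "$proto_bytes"   # → HTTP/1.1 3120
```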
would the pages crawled by the Mozilla googlebot be indexed if they were not crawled by the other googlebot?
Ok - something is going on with GoogleBot. For my site it's normal that they try to catch up on crawling at the end of the month, as they always start very slowly. But this time they seem to have some more punch, as reported here. Let's see what it means... Otherwise, as assumed, they seem to have been short on computing power last year. Maybe they put up some more clusters...
But to temper all our hopes here a little bit: Google is really sick in many ways this time (SandBox, overrating links - link farm impact, hilltop oligarchy, big sites oligarchy, 2x32...). And if they are not able to sort this out in some way, they will drive it into the wall.
It would be time to put in some cure now.
Or they should at least abandon their ugly sandbox.
Simply search your keyword again with 13x -adfs and see how good the SERPs could be...
So why are the results different when adding 13x -adfs than with a normal search? Is there a thread I can read that explains this theory? I just tried it, and for my keyword term I am ranked 25th without the 13x -adfs and 1st with them, which is where I was prior to doing a 301 redirect.
|SandBox, overrating links - link farm impact, hilltop oligarchy, big sites oligarchy, 2x32 |
also: 301 redirects not working, 302 page hijacking...
It would be nice to make some sort of wishlist.
Who knows? Maybe Google would listen.
|would the pages crawled by the Mozilla googlebot be indexed if they were not crawled by the other googlebot? |
No. Not in the "public" index.
I made it!
Effective January 29, I am now listed as #1 for my most important keyword. Before the recent deep crawl I was the runner-up for almost a year, with an on-topic non-commercial site at #1.
I still need to check other SERPs, but it seems the recent deep crawl is finding its way into the results.
Is there any software I can use to see Googlebot crawling on my site live?
Assuming that you are running Apache on Linux, the easiest solution is to open a terminal window and issue the command
tail -f access.log | grep -i googlebot
There are more sophisticated solutions though. Some log analysis tools can do "live stats" for example. There are some CRM packages which offer website-visitor-chat-functionality, which give you a live view on your sites visitors. But these tools usually exclude spiders.
I personally use a tool called "What's on?", which monitors all of my sites' current visitors and which I keep constantly open. It's the only one I found that does this, and it has a few bugs and glitches, especially when it comes to DNS grouping and geotargeting. There are probably other tools as well.
The tool's name is actually "Who's on", and not "What's on".
Can anyone tell me in layman's terms what they think 1,000 hits from Google relates to in the number of pages it will cache?
How many Google hits per average page of text?
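As a rough sketch of the distinction: hits and distinct pages can be counted separately from a log, and a single page view can generate several hits (images, stylesheets, etc.). The three log lines below are fabricated for illustration:

```shell
# Fabricated sample log: two fetches of the same page plus one stylesheet.
cat > access.log <<'EOF'
66.249.66.1 - - [25/Jan/2005:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 1000 "-" "Googlebot/2.1"
66.249.66.1 - - [25/Jan/2005:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 200 "-" "Googlebot/2.1"
66.249.66.1 - - [25/Jan/2005:10:00:02 +0000] "GET /a.html HTTP/1.1" 200 1000 "-" "Googlebot/2.1"
EOF
hits=$(grep -ci googlebot access.log)                                      # every request counts
urls=$(grep -i googlebot access.log | awk '{print $7}' | sort -u | wc -l)  # distinct URLs fetched
echo "hits=$hits distinct_urls=$urls"   # → hits=3 distinct_urls=2
```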