Around the end of August this year, my site went down, and a person at the hosting company told me the server was down because of a foreign spider. He said he had blocked the spider and the server was restored to normal operation.
This got me thinking about the Google problem, so I asked if he had blocked Googlebot. To my astonishment, the hosting company was blocking Googlebot because it had caused server performance issues. This completely shocked me. I've never heard of a hosting company blocking a spider so important to most, if not all, of the 200 sites on this server. Their thinking is that it's best to have the server up and running... apparently at the expense of killing half (or more) of the traffic going to all of these sites.
My site was moved to a new server, and within days, it was back in Google's index. Almost a year of lost traffic and sales resulted because of the hosting company.
If you are having trouble getting into Google's index, or have fallen out for some unknown reason, you might want to check with your hosting company to see whether they are blocking spiders. I know there are at least 199 other web sites blocked from Google because of this hosting company.
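If you want a quick way to test for this yourself, here is a minimal sketch (my own, not anything the hosting company provided): it requests a page twice, once with a browser User-Agent and once with Googlebot's, and compares the status codes. A 403 on the Googlebot request alone points to User-Agent filtering; it will not catch IP-based blocks, and the URL is hypothetical.

from urllib.request import Request, urlopen
from urllib.error import HTTPError

URL = "http://www.example.com/index.cfm"  # hypothetical; use a page of your own

def status_for(user_agent):
    # Return the HTTP status code the server sends for this User-Agent.
    req = Request(URL, headers={"User-Agent": user_agent})
    try:
        with urlopen(req) as resp:
            return resp.getcode()
    except HTTPError as err:  # 4xx/5xx raise, but the code is still what we want
        return err.code

print("Browser UA:  ", status_for("Mozilla/5.0"))
print("Googlebot UA:", status_for("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))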
We knew the importance of Google, of course... so we rebooted the server whenever the load got too heavy... this cleared the millions of shopping carts... but did not kick Googlebot out.
It was a little surprising, since this is the first month we have ever had a problem.
JP
Although I absolutely agree that it's partly your own fault if you don't notice it, I would blame my host for blocking bots without letting me and others know about it.
Fortunately I have my own dedicated network.
>follow any of the links that add items to a cart
erm, yes that'd be silly if you'd allow this. ;)
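For anyone whose cart links are reachable, a couple of robots.txt lines will keep well-behaved spiders like Googlebot away from them. These paths are made up for illustration; substitute whatever your cart script actually uses:

User-agent: *
Disallow: /cart/
Disallow: /index.cfm?action=addtocart

That only helps against bots that honor robots.txt, of course, but Googlebot does.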
----------------------------------------
Address: crawl16.googlebot.com
Browser: Googlebot 2.1 (http://www.googlebot.com/bot.html)
Protocol: HTTP/1.0
GET 20.75k /index.cfm
Anyone reading this thread who can't access the raw log file on their server should start looking for a new hosting company now. If you were looking at the actual log file, you would have been able to see how the server responded to the request.
My guess is they were serving Google a 403 every time it requested a page.
2003-08-02 13:01:03 64.68.85.28 W3SVC69 80 GET /index.cfm - 200 21231 188 Googlebot/2.1+(+http://www.googlebot.com/bot.html) -
I just thought the WebTrends report was more legible.
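For anyone working from the raw log, here is a quick sketch (my own, nothing to do with WebTrends) that tallies the status codes served to Googlebot in a W3C-format IIS log. The field positions match the sample line above, where sc-status is the ninth field; adjust them to your server's #Fields header, and the file name is hypothetical.

from collections import Counter

statuses = Counter()
with open("ex030802.log") as log:          # hypothetical log file name
    for line in log:
        if line.startswith("#"):           # skip the W3C header lines
            continue
        fields = line.split()
        if any("Googlebot" in field for field in fields):
            statuses[fields[8]] += 1       # sc-status column in this layout

print(statuses)  # a pile of 403s means the server is turning Googlebot away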
I don't know exactly how they were blocking Googlebot, but they said they were. They moved the site to a different server (yes... same hosting company until after the end of the year... too busy now) and the site was back in Google's index.
I know nothing about server administration. Maybe someone here can explain how they could do this.
I mean, Googlebot is very active on our servers but does not cause any bandwidth issues, and if it did, I would rather cut down the number of users on the server in question than block our good friend! Now I have heard it all.. <G>
A junior tech at any hosting company without background info might be in a position to block any SE spider almost on a whim. Especially as there are plenty of spiders out there that need that sort of treatment.
So it is vitally important for webmasters to check their logs and make sure that spider visits are being responded to correctly.
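For instance, assuming the box runs Apache, a handful of lines like these in httpd.conf or an .htaccess file would be all it took: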
# Flag Googlebot by User-Agent, then deny it along with its IP range
SetEnvIf User-Agent ^Googlebot keep_out
Order Deny,Allow
Deny from 64.68.85
Deny from env=keep_out
and he may have put rules like those in .htaccess files in subdirectories as well.
Get a Linux dedicated server with plenty of storage space for your log files.