
Google News Archive Forum

    
Knocked out of Google by hosting company
hutchins13




msg:161883
 7:26 pm on Oct 14, 2003 (gmt 0)

I had a site that dropped out of Google after the October 2002 update. I spent countless days... actually months... trying to determine the cause. I even emailed Google several times and always got the response that the site is not being penalized.

This year about the end of August, my site was down and I had a discussion with a person at the hosting company who indicated that the server was down because of a foreign spider. He indicated that he blocked the spider and the server was restored to normal operation.

This got me thinking about the Google problem so I asked if he had blocked Googlebot. To my astonishment, the hosting company was blocking Googlebot because it had caused server performance issues. This completely shocked me. I've never heard of a hosting company blocking a spider so important to most, if not all, of the 200 sites on this server. Their thinking is that it's best to have the server up and running... apparently at the expense of killing half (or more) of traffic going to all of these sites.

My site was moved to a new server, and within days, it was back in Google's index. Almost a year of lost traffic and sales resulted because of the hosting company.

If you are having troubles getting into Google's index or have fallen out for some unknown reason, you might want to check with your hosting company to see if they are blocking spiders. I know there are at least 199 other web sites that are blocked from Google because of this hosting company.

 

dnbjason




msg:161884
 3:57 pm on Oct 15, 2003 (gmt 0)

Man... people can be so stupid! There goes that hosting company's business.

Yidaki




msg:161885
 4:31 pm on Oct 15, 2003 (gmt 0)

>Almost a year of lost traffic and sales

Send them the bill - at least that's what I'd definitely do!

Shak




msg:161886
 5:20 pm on Oct 15, 2003 (gmt 0)

learn from your mistakes and move on.

Shak

jpavery




msg:161887
 5:43 pm on Oct 15, 2003 (gmt 0)

We have a stand-alone server... last month Googlebot brought our site to its knees... opening millions and millions of carts.

We knew the importance of Google, of course... so we rebooted the server whenever the load got too heavy... this cleared the millions of shopping carts... but did not kick Googlebot out.

It was a little surprising since this is the first month we have ever had a problem.

JP

WayneStPaul




msg:161888
 5:55 pm on Oct 15, 2003 (gmt 0)

Had a similar problem once upon a time. Now all of my e-commerce sites have a robots.txt that tells all spiders not to follow any of the links that add items to a cart (and thus create carts). If you have a large site I highly recommend this.
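For reference, a minimal robots.txt along those lines might look like the fragment below. The /cart.cfm and /addtocart paths are placeholders, not WayneStPaul's actual URLs; substitute whatever URL pattern your own add-to-cart links share.

```
User-agent: *
Disallow: /cart.cfm
Disallow: /addtocart
```

Note that Disallow matches by URL prefix, so one rule covers every add-to-cart link under that path.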

oilman




msg:161889
 5:59 pm on Oct 15, 2003 (gmt 0)

logfiles, logfiles, logfiles - you didn't notice googlebot in your logfiles for nearly a year and the blame is to be solely directed at your host?

dirkz




msg:161890
 6:10 pm on Oct 15, 2003 (gmt 0)

Googlebot is such a precious species, I can't believe they kill it.

Yidaki




msg:161891
 6:12 pm on Oct 15, 2003 (gmt 0)

>logfiles, logfiles, logfiles

Although I absolutely agree that it's somehow also your own fault if you don't notice it, I would blame my host for blocking bots without letting me and others know about it.

Fortunately I have my own dedicated network.

>follow any of the links that add items to a cart

erm, yes that'd be silly if you'd allow this. ;)

stuntdubl




msg:161892
 6:47 pm on Oct 15, 2003 (gmt 0)

I've had some hosting nightmares as well. Pay a little bit more and be sure the company "knows the deal". Also, always have a plan B to jump ship in case they start to sink. Don't ever expect a hosting company to be accountable for their actions either.

hutchins13




msg:161893
 6:48 pm on Oct 15, 2003 (gmt 0)

oilman, The page requests from Googlebot did show up in the logs every month.

----------------------------------------
Address: crawl16.googlebot.com
Browser: Googlebot 2.1 (http://www.googlebot.com/bot.html)
Protocol: HTTP/1.0

GET 20.75k /index.cfm
----------------------------------------

Yidaki




msg:161894
 6:53 pm on Oct 15, 2003 (gmt 0)

>oilman, The page requests from Googlebot did show up in the logs every month.

Huh? Then it's not your hoster who caused the trouble. If the requests show in the logs, the pages HAVE BEEN fetched by googlebot.

oilman




msg:161895
 6:53 pm on Oct 15, 2003 (gmt 0)

>>the page requests from Googlebot did show up in the logs every month

ok - so Google was crawling your site? or was it just requesting the the index page and getting denied?

hutchins13




msg:161896
 6:56 pm on Oct 15, 2003 (gmt 0)

Apparently it was requesting the page and was then blocked.

plasma




msg:161897
 7:25 pm on Oct 15, 2003 (gmt 0)

>We have a stand-alone server... last month Googlebot brought our site to its knees... opening millions and millions of carts.

Then you're doing something wrong.
Just don't open a cart if it's a spider :)
(And don't create SID for it)
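plasma's suggestion can be sketched server-side as a simple User-Agent check before any cart or session ID is created. This is a minimal illustration, not anyone's actual implementation, and the bot signature list is illustrative rather than exhaustive:

```python
# Skip cart/session creation for known crawlers, as suggested above.
# The substrings below are illustrative; real bot UA strings vary.
BOT_SIGNATURES = ("googlebot", "slurp", "scooter", "msnbot", "crawler", "spider")

def is_spider(user_agent: str) -> bool:
    """Return True if the User-Agent looks like a search-engine bot."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

def handle_request(user_agent: str, session: dict) -> dict:
    # Only create a shopping cart (and thus session state) for real visitors.
    if not is_spider(user_agent):
        session.setdefault("cart", [])
    return session
```

With a check like this in front of the cart code, a crawl of millions of pages no longer leaves millions of empty carts behind.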

hot_tubs




msg:161898
 7:48 pm on Oct 15, 2003 (gmt 0)

Wouldn't it be an easy fix to get rid of the text links for adding items to the basket and replace them with regular buttons?

WebGuerrilla




msg:161899
 12:37 am on Oct 16, 2003 (gmt 0)

----------------------------------------
Address: crawl16.googlebot.com
Browser: Googlebot 2.1 (http://www.googlebot.com/bot.html)
Protocol: HTTP/1.0

GET 20.75k /index.cfm

Anyone reading this thread who can't access the raw log file on their server should start looking for a new hosting company now. If you were looking at the actual log file, you would have been able to see how the server responded to the request.

My guess is they were serving Google a 403 every time they requested a page.

seofreak




msg:161900
 2:21 am on Oct 16, 2003 (gmt 0)

OUCH! That's horrible stuff. That's why I love my host.. they inform us about everything.. even if they are about to reboot.

hutchins13




msg:161901
 2:30 am on Oct 16, 2003 (gmt 0)

Here is the raw log file data:

2003-08-02 13:01:03 64.68.85.28 W3SVC69 80 GET /index.cfm - 200 21231 188 Googlebot/2.1+(+http://www.googlebot.com/bot.html) -

I just thought the WebTrends report was more legible.

I don't know exactly how they were blocking Googlebot, but they said they were. They moved the site to a different server (yes... same hosting company until after the end of the year... too busy now) and the site was back in Google's index.

I know nothing about server administration. Maybe someone here can explain how they could do this.
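For anyone who wants to scan a raw W3C-format log like the sample line above for Googlebot requests that were refused, a short script along these lines would do it. The field positions are assumed from that sample line (date, time, client IP, site, port, method, URI, query, status, bytes, time-taken, user-agent, referrer); adjust the indexes if your log's #Fields header differs.

```python
def googlebot_errors(lines):
    """Yield (status, uri) for Googlebot requests that were not served a 200.

    Assumes the W3C extended field order shown in the sample log line:
    date time c-ip s-sitename s-port cs-method cs-uri-stem cs-uri-query
    sc-status sc-bytes time-taken cs(User-Agent) cs(Referer)
    """
    for line in lines:
        fields = line.split()
        # Skip short lines and requests from anything other than Googlebot.
        if len(fields) < 13 or "Googlebot" not in fields[11]:
            continue
        status = fields[8]
        if status != "200":
            yield status, fields[6]
```

A month of nothing but 403s from this script would have flagged the block long before the rankings made it obvious.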

jady




msg:161902
 2:41 am on Oct 16, 2003 (gmt 0)

Thanks for the story! WOW! As a hosting firm employee here in Florida, I can assure you that this is not normal nor right to do! Maybe this is a good thing for everyone to confirm with their host or prospective hosting firm - make sure they don't do stupid stuff!

I mean Googlebot is very active on our servers but does not cause any bandwidth issues and if it were to, I would rather cut down the users on the server in question rather than block our good friend! Now I have heard it all.. <G>

BlueSky




msg:161903
 3:53 am on Oct 16, 2003 (gmt 0)

That log entry shows a 200 not a 403 -- he pulled at least that page okay.

AthlonInside




msg:161904
 6:07 am on Oct 16, 2003 (gmt 0)

Look forward, don't look back!

You have paid a great price to learn this lesson, so work smarter for the next year to earn back double what you have lost!

IanTurner




msg:161905
 12:35 pm on Oct 16, 2003 (gmt 0)

The interesting thing is that, as a hosting company, I would think I was perfectly justified in disallowing any IPs that I thought were abusing the network. (I'm not saying I would do it to Googlebot, Slurp, Scooter or any other major SE spider - but that's only because I know about these things.)

A junior tech at any hosting company without background info might be in a position to block any SE spider almost on a whim. Especially as there are plenty of spiders out there that need that sort of treatment.

So it makes it vitally important for webmasters to ensure that they are checking their logs to see that the visits are being responded to correctly.

jady




msg:161906
 1:17 pm on Oct 16, 2003 (gmt 0)

I am just curious whether that hosting company is blocking Googlebot from the server THEIR website is on. Something is telling me that they don't.. :)

asinah




msg:161907
 4:24 pm on Oct 16, 2003 (gmt 0)

If you actually see a 200 response code, I don't think they blocked it. If it is a UNIX box, your provider may have configured something like:

SetEnvIf User-Agent ^Googlebot keep_out
Deny from 64.68.85.
Deny from env=keep_out

and he may have applied it only to some subdirectories of your site.

Get a Linux dedicated server with a lot of storage space for a logfile.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved