
Website Technology Issues Forum

This 39-message thread spans 2 pages.
When Your Host Toasts Your Domains
GBot knocks, but can't get in...
PatrickDeese




msg:668686
 9:08 pm on Aug 20, 2004 (gmt 0)

I have over 100 domains (belonging to clients and myself) hosted at what was perceived to be a good and reliable hosting company (and recommended by a number of WW'ers). However, suddenly every site on their servers is "drying up" in Google: tons of URL-only listings, pages not cached, and sites with DMOZ/Google Directory listings not showing the directory icon in the GTB...

First pass with their tech support: "Maybe google is banning your domains". I say I don't think so - I have 90+ other domains hosted with other companies, and they are a-okay.

Second pass with their tech support: "It's not us - we're not blocking anyone." So I write to webmaster@google.com, and they say that the domains I mentioned are not penalized but weren't accessible in the last crawl(s).

Third pass with their tech support: "Hey, Google says they can't crawl sites hosted by you." Tech support doesn't believe me, so I write back to G and say (pretty please) can you check example.com via a crawler IP - my host says they aren't doing anything to block Google. Google says something is blocking gBot.

Is it time to move these domains? Are these guys being coy with me, or are they clueless?

My guesstimate is that in total, so far, at least 2500 - 3000 pages from 100+ domains have been dropped due to this issue.

How long do you give a host to make things right?

 

SEOMike




msg:668687
 9:13 pm on Aug 20, 2004 (gmt 0)

Wow. With this scale of incompetence, and that big of a loss in the SEs you'd better get out before it's too late. They are costing you a TON of money! If Google says they can't get in, you need to get out.

Try doing a little research and finding another client on their servers... see if they have any Google listings. If not, then you have 100% answered the LAST question I would ask before pulling the plug.

Good luck!

bhartzer




msg:668688
 9:28 pm on Aug 20, 2004 (gmt 0)

Time to get a new host.

Hosts have been doing this lately to save their bandwidth--and as I remember a very popular host/registrar was caught doing this recently.

PatrickDeese




msg:668689
 9:32 pm on Aug 20, 2004 (gmt 0)

Yeah - I remember that thread [webmasterworld.com] all too well.

Ironically - I moved several domains from that company to this company specifically because of this issue.

I am biting the bullet and moving my sites.

Span




msg:668690
 9:35 pm on Aug 20, 2004 (gmt 0)

Maybe you can see what's happening with [wannabrowser.com]?

PatrickDeese




msg:668691
 10:17 pm on Aug 20, 2004 (gmt 0)

> Maybe you can see what's happening ....

Well - that site doesn't seem to be able to spoof Googlebot. At any rate, I changed the UA with Firefox and the site was fine - so if it is them, I can only assume that it is a block on the IP range.

jdMorgan




msg:668692
 12:44 am on Aug 21, 2004 (gmt 0)

> Well - that site doesn't seem to be able to spoof Googlebot

WB can spoof any user-agent; just cut-n-paste any user-agent string from your log file into the user-agent box in the form. The pull-down menu is just for convenience; you are not restricted to using it.

I'd like to know whether it *is* an IP-range block, a user-agent block, or whether they are cloaking your robots.txt.
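The same user-agent test can be run from a script instead of a web form. Below is a minimal Python sketch, assuming you paste the crawler's User-Agent string from your own logs (the strings below are only examples). It still sends requests from your own IP address, so it can expose a user-agent block or a cloaked robots.txt but not an IP-range block.

```python
import urllib.error
import urllib.request

# Example User-Agent strings (assumptions); paste the exact crawler string from your access log.
CRAWLER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)"


def build_probe(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that presents an arbitrary User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(request: urllib.request.Request, timeout: float = 10.0) -> tuple[int, bytes]:
    """Return (status, body), including 4xx/5xx responses, which urllib raises as HTTPError."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()
```

For example, you could fetch `/robots.txt` once with `BROWSER_UA` and once with `CRAWLER_UA` and compare the two `(status, body)` pairs. Different bodies would suggest robots.txt cloaking. Identical 200 responses point away from a user-agent block.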

Jim

Marcia




msg:668693
 12:57 am on Aug 21, 2004 (gmt 0)

Patrick, are you just dealing with first-tier support? If so, make sure to ask for tier-two people if it's a big company.

I'd sooner take Google's word for it that the bot's being blocked than anyone at the host.

GoogleGuy




msg:668694
 1:04 am on Aug 21, 2004 (gmt 0)

I doubt that this problem is user-agent based; the issue appears to be just for your webhost and just from known-Googlebot IP ranges. PatrickDeese, is your site hosted by BlueGravity? I asked someone from our crawl team to try crawling from a known-Googlebot IP vs. a not-as-well-known Googlebot IP. The known-as-a-crawler IP couldn't fetch your pages but the not-well-known-as-Googlebot IP could, which does lend support to your hypothesis.

The issue doesn't appear to be with your sites and doesn't appear to be with Googlebot, but I couldn't say for sure where in the middle the problem might be. Could be the webhost you're using, or it could be upstream of them. PatrickDeese, if you hear of other sites besides yours running into this, would you pass them on? Thanks for mentioning this; I'll keep my ears open for similar reports.

PatrickDeese




msg:668695
 1:20 am on Aug 21, 2004 (gmt 0)

I was talking to a new hosting company today and he suggested that perhaps they had added Google's IPs to a "bad bot" list by accident.
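A host-side "bad bot" list like this would typically be keyed on source addresses, not user-agents. That would fit GoogleGuy's finding that a well-known crawler IP failed while a less-known one succeeded. Below is a minimal sketch, assuming a plain list of CIDR ranges. The ranges are RFC 5737 documentation networks standing in for real addresses; they are not Google's ranges.

```python
import ipaddress

# Hypothetical block list; documentation ranges used as placeholders, NOT real crawler ranges.
BAD_BOT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # a scraper someone meant to block
    ipaddress.ip_network("198.51.100.0/24"),  # a range added "by accident"
]


def is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BAD_BOT_NETWORKS)
```

The check looks only at the address, so changing the User-Agent has no effect on the outcome. Any request from a listed range is refused, and requests from any other address get through.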

> first-tier

It's a small company - I was speaking to the vice president :o

---

Googleguy - I appreciate you checking in - I think that there is something gravely wrong with their set up - frankly I am not too interested in waiting for them to fix it - I spent an hour on the phone with another hosting company and they're going to help me get my sites (and my clients' sites) off of that host and somewhere else ASAP.

Thanks again for all the advice - it is definitely an IP thing, because I was getting 200s no matter what UA I tried.
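The reasoning in this part of the thread amounts to a small decision table. Every UA got a 200 from the poster's own connection, while Google's known crawler IP could not fetch the pages. Below is a hypothetical sketch of that table. The two crawler-IP status codes have to come from Google, as they did here, and `0` stands for "no response at all".

```python
def diagnose(normal_ua: int, spoofed_ua: int, known_ip: int, other_ip: int) -> str:
    """Classify a crawl problem from four HTTP status codes (0 = no response).

    normal_ua  -- status for a browser User-Agent from your own address
    spoofed_ua -- status for a crawler User-Agent from your own address
    known_ip   -- status seen from a well-known crawler address
    other_ip   -- status seen from a lesser-known crawler address
    """
    if normal_ua == 200 and spoofed_ua != 200:
        return "user-agent block"  # same address, only the UA changed
    if spoofed_ua == 200 and known_ip != 200 and other_ip == 200:
        return "ip block"  # UA is fine, only the well-known address fails
    if known_ip == other_ip == 200:
        return "no block detected"
    return "inconclusive"
```

With the results reported in this thread (200s for every UA, failure from the known crawler IP, success from the other one), the function returns `"ip block"`.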

GoogleGuy




msg:668696
 3:34 am on Aug 21, 2004 (gmt 0)

Yup, I doubt it would be something too intentional. Could be a misconfigured file, could be that someone thought that our IP loaded pages too heavily, could be someone upstream from the webhost in some way. I'm glad you mentioned it though; if I find out anything on my end I'll mention it.

robertito62




msg:668697
 3:47 am on Aug 21, 2004 (gmt 0)

> How long do you give a host to make things right?

This is not the issue.
The lesson here is NEVER EVER EVER to have 100 domains in one server or one host or one country or one galaxy.

It is not about deciding whether a host is good, bad or cute. It is not about giving him more time, or giving him another call or a slap on the wrist.

It is about managing your business professionally, which means:

Rule #1: COVER YOUR A_S

There are no rules #2 or #3.

funandgames




msg:668698
 4:33 pm on Sep 14, 2004 (gmt 0)

Check the Google cached page and see if it is up to date. If it is, you are being spidered.

mipapage




msg:668699
 4:48 pm on Sep 14, 2004 (gmt 0)

PatrickDeese,

Could you please sticky me (your box is full) with the name of this company? I went with a company that you use a while back and am now having some trouble with a new site getting indexed.

It may be a 'sandbox' thing or maybe not. Info from you would help me to pinpoint the problem. Thanks.

PatrickDeese




msg:668700
 5:33 pm on Sep 14, 2004 (gmt 0)

> Check the Google cached page and see if it is up to date. If it is, you are being spidered.

Umm. Yeah, no kidding.

As an example, I had a 785-page site that was fully indexed & spidered, and every page went URL-only until I moved it to my new hosting company. Now, unfortunately, only about 120 pages have been re-indexed.

I can only hope that the site gets reindexed in time for the holiday shopping.

iThink




msg:668701
 6:07 pm on Sep 14, 2004 (gmt 0)

> PatrickDeese, is your site hosted by BlueGravity?

PatrickDeese, is that correct, or is it another hosting company?

PatrickDeese




msg:668702
 6:35 pm on Sep 14, 2004 (gmt 0)

> PatrickDeese, is your site hosted by BlueGravity?

> PatrickDeese, is that correct, or is it another hosting company?

Yes - the sites were formerly hosted with them.

internetheaven




msg:668703
 7:48 pm on Sep 14, 2004 (gmt 0)

Okay, now I've got something else to be paranoid about - better go wake up my account manager at my hosting company. I've got all three of my servers with them so I'll be damned if I'm going to sit here and wait to see if this could possibly happen .....

iJeep




msg:668704
 7:53 pm on Sep 14, 2004 (gmt 0)

My experience has told me to switch any time there is a major problem or a minor recurring problem. Either scenario proves that the host is not up to the standards you have for them.

For example, a host I was with a year ago switched to a VDS system. After they switched my account over, the database would crash at least once a day. I was told they didn't know what was causing the crashes, but that it was my fault, even though I hadn't changed anything.

Case #2: my last host got bought out. They disabled the support system and said they had enabled me on the new one, but they hadn't, so I had no way to submit a support ticket. In addition, I have been told I need to change hosting plans, not because I am using too much disk space or bandwidth, but because my site gets too many hits. Finally, their servers have been crashing and running at 15% load on a normal basis since the acquisition.

The main thing is to make the site flexible, so that you can change one file that holds all of your include directories and database connection settings. That way, all you have to do is upload everything and change just that one file to get running again.
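iJeep's tip amounts to keeping everything host-specific in one place. Below is a hypothetical Python sketch of such a settings module. The key names, default paths, and environment-variable names are assumptions, not something from the thread. The rest of the site would take its paths and database details from here, so a move means editing only this file.

```python
import os

# Hypothetical host-specific settings. On a new host, edit these defaults
# (or export the matching environment variables); nothing else should change.
SETTINGS = {
    "include_dir": os.environ.get("SITE_INCLUDE_DIR", "/home/example/includes"),
    "db_host": os.environ.get("SITE_DB_HOST", "localhost"),
    "db_name": os.environ.get("SITE_DB_NAME", "example_db"),
    "db_user": os.environ.get("SITE_DB_USER", "example_user"),
}


def include_path(filename: str) -> str:
    """Resolve a shared include file against the configured include directory."""
    return os.path.join(SETTINGS["include_dir"], filename)
```

Reading overrides from the environment is one design choice among several. It lets the same file run unchanged on both the old and new host while DNS propagates.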

We'll see how this new host does...I'm sure it will be fine until they decide to change something that doesn't need to be changed.

zomega42




msg:668705
 9:52 pm on Sep 14, 2004 (gmt 0)

Check your logs. Is googlebot trying to access your pages? I bet gbot is getting a 406 error because your host screwed up their settings. This happened to me recently. Ask your host whether they recently upgraded to ColdFusion MX, by any chance.
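One quick way to act on this suggestion is to scan the raw access log for Googlebot requests that received a 406. Below is a minimal sketch, assuming Apache's "combined" log format (other formats need a different pattern). It identifies Googlebot by the User-Agent text only.

```python
import re
from typing import Iterable, Iterator, Tuple

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_errors(lines: Iterable[str], status: str = "406") -> Iterator[Tuple[str, str, str]]:
    """Yield (host, time, request) for Googlebot hits that received the given status."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines written in some other format
        if match.group("status") == status and "googlebot" in match.group("agent").lower():
            yield match.group("host"), match.group("time"), match.group("request")
```

Usage: open the log file and pass the file object straight in, e.g. `for hit in googlebot_errors(open("access_log")): print(hit)`. The file name here is a placeholder.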

If you REALLY want to see what googlebot sees, changing your user-agent isn't good enough. You also have to change your HTTP Accept header to match googlebot's. You can do this in Opera, but not in IE as far as I know.

Incidentally, this could also explain GoogleGuy's response. GG might have crawled once with the old version of GoogleBot and once with the new version, and if the two versions handle HTTP Accept headers differently, one would work and the other would not.

mipapage




msg:668706
 10:25 pm on Sep 14, 2004 (gmt 0)

zomega42,

can you explain the details of doing this with Opera, or is it fairly complicated?

I have one site with them that is being a bit troublesome, and no logs available yet (though I do have a little script set up to tell me if gbot visits...).
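The thread doesn't show that script. Below is a hypothetical sketch of one way to do it in Python, as WSGI middleware that reports any request whose User-Agent mentions Googlebot. It matches on the UA text alone, and anyone can send that string, so it is a heuristic rather than proof of a genuine Googlebot visit.

```python
import sys
import time
from typing import Callable


def default_notify(message: str) -> None:
    """Write one line to stderr; swap in email, a log file, etc."""
    print(message, file=sys.stderr)


def googlebot_notifier(app: Callable, notify: Callable[[str], None] = default_notify) -> Callable:
    """Wrap a WSGI application so requests whose User-Agent mentions Googlebot are reported."""
    def wrapped(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if "googlebot" in agent.lower():
            notify("%s %s %s %s" % (
                time.strftime("%Y-%m-%d %H:%M:%S"),
                environ.get("REMOTE_ADDR", "-"),
                environ.get("PATH_INFO", "/"),
                agent,
            ))
        return app(environ, start_response)  # pass the request through unchanged
    return wrapped
```

The middleware never alters the response. Even if notification is slow, the crawler gets the same page a browser would.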

PatrickDeese




msg:668707
 10:45 pm on Sep 14, 2004 (gmt 0)

> Check your logs. Is googlebot trying to access your pages? I bet gbot is getting a 406 error because your host screwed up their settings.

If you read Googleguy's post - you will see that googlebot could successfully crawl a site hosted by them when it used one of their "unknown" IPs - but was blocked whenever it used a "traditional" gBot IP address.

> msg 9: I asked someone from our crawl team to try crawling from a known-Googlebot IP vs. a not-as-well-known Googlebot IP. The known-as-a-crawler IP couldn't fetch your pages but the not-well-known-as-Googlebot IP could, which does lend support to your hypothesis.

At any rate, Yahoo, MSN, et al. were crawling the site fine.

My first warning sign was that Adsense switched to the alternate ads for most of the sites - apparently they even blocked MediaBot initially - then after a day or two, the ads came back.

Then the index pages started showing up with no cache, and URL only.

zomega42




msg:668708
 1:06 am on Sep 15, 2004 (gmt 0)

Yes PatrickDeese, I did read GG's post, but I was suggesting that he might have inadvertently used two different googlebots. No need for the knock; just trying to help here.

The googlebot with user-agent GoogleBot/2.1(...) caused 406 errors on my site, while the one with user-agent Mozilla (compatible; GoogleBot/2.1 ...) does not cause errors. The two bots use different accept headers.

It was only a suggestion anyway, but still, you should check your logs for 406 errors. What you're seeing is completely consistent with the problem I had, including GG's comments.

Mipapage -- to diagnose this problem with Opera, open your Opera6.ini file or whatever it's called, and under [Adv User Prefs] add a new line that says "HTTP Accept=text/html,text/plain,application/*" or whatever you want your accept header to be. I'm not sure exactly what header googlebot uses but this one here was good enough to find the problem on my server.
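The failure described here is ordinary HTTP content negotiation. If a server decides that none of the types the client lists in `Accept` covers what it would send, it answers 406 Not Acceptable. Below is a simplified, hypothetical sketch of that server-side check. It ignores the precedence rules a real server applies (more specific media ranges override less specific ones), so it illustrates the mechanism rather than what ColdFusion MX actually does.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media, q))
    return ranges


def negotiate(content_type: str, accept_header: Optional[str]) -> int:
    """Return 200 if content_type matches some media range with q > 0, else 406."""
    if not accept_header:
        return 200  # no Accept header: the client accepts any type
    main, _, sub = content_type.lower().partition("/")
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue
        m_main, _, m_sub = media.partition("/")
        if media == "*/*" or (m_main == main and m_sub in ("*", sub)):
            return 200
    return 406
```

Under this rule, a client that sent only `application/*` would get a 406 for an HTML page, while one that listed `text/html` (as in the Opera header above) would get a 200. That is the kind of split described between the two Googlebot user-agents. The thread doesn't establish which headers either bot actually sent.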

mipapage




msg:668709
 2:11 pm on Sep 15, 2004 (gmt 0)

zomega42,

Thanks, all good on my end. Same with adsense, I put up some ads and media-bot came right along...

sh0ck




msg:668710
 3:51 pm on Sep 15, 2004 (gmt 0)

Hey guys, this is Tim from Blue Gravity. I have been getting reports of this issue from a few customers and really want to resolve it. I feel that this has to be someone upstream from us, as we do not block Google and have many customers whose sites are spidered just fine.

If I could please get an email or post of the IPs in question, and perhaps a Googlebot IP that this is known NOT to work with, I will do additional searching to determine who exactly along our path is blocking the spider and fix it.

This is not normal or being taken lightly by myself or anyone here at Blue Gravity and I will put this issue to rest. My sincerest apologies for letting this continue at all.

--tim
tim@bluegravity.com

DaveAtIFG




msg:668711
 4:31 pm on Sep 15, 2004 (gmt 0)

Moderator's note: Normally sh0ck's/Tim's email address would be edited out of his post, but this is an important issue to many, so I'm "looking the other way." Your cooperation would be appreciated. If you have a report for Tim, please DO NOT include IPs or other specifics in this thread; use the email address Tim provided. Thanks! :)

mipapage




msg:668712
 4:33 pm on Sep 15, 2004 (gmt 0)

DaveAtIFG,

Thanks for that.

CritterNYC




msg:668713
 4:43 pm on Sep 15, 2004 (gmt 0)

PatrickDeese,

Really sorry to hear about this. At least Google was cool about helping you figure it out.

Since the host *is* blocking the IP and either (a) can't figure out why or (b) is lying about it... I'd say a quick switch to a new host is definitely in order. I think you'll be up, mostly, for the November/December rush if you do it now, though.

If you need hosting recommendations, I'm sure lots of folks here would be happy to offer up our own. I know my host of 4 years has never had a problem with Googlebot.

Regards,
John

PatrickDeese




msg:668714
 8:14 pm on Sep 15, 2004 (gmt 0)

> If you need hosting recommendations

No, thanks.

I have a number of hosting companies that I work with, but I had about half of my domains with BG. Now no host has more than about 20% of my sites.

I still have loads of sites with single companies - but it just isn't feasible to have 200+ domains with 200 different companies.

I appreciate Tim's response - but he lost my business. I started reporting this issue a month ago and lost hundreds of dollars of business because of it, and will continue to lose business until my sites are reindexed by Google - which will take weeks or months for some of the largest sites.

[edited by: PatrickDeese at 8:21 pm (utc) on Sep. 15, 2004]

martinibuster




msg:668715
 8:21 pm on Sep 15, 2004 (gmt 0)

> This is not normal or being taken lightly by myself or anyone here at Blue Gravity...

I find it disturbing that a webhost should be made aware of a problem and, a month later, has yet to identify what the problem is, much less resolve it.

The web host was put on notice about this a month ago. Whether BG were blocking Googlebot or it was being blocked upstream is beside the point. The point is that the issue was not resolved in a timely manner.

I was considering hosting with Blue Gravity at the time this fiasco happened, and I called them myself to find out if they were resolving this. They told me there may have been a bandwidth-saving measure in place that blocked Googlebot.

They offered to email me when this issue was identified and resolved. To date I have not received any communication that the issue has been identified or resolved. Did someone drop a ball?

I have since opened hosting accounts elsewhere.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved