Wow. With this scale of incompetence, and that big a loss in the SEs, you'd better get out before it's too late. They are costing you a TON of money! If Google says they can't get in, you need to get out.
Try doing a little research and finding another client on their servers... see if they have any Google listings. If not, then you have 100% answered the LAST question I would ask before pulling the plug.
Time to get a new host.
Hosts have been doing this lately to save bandwidth, and as I remember, a very popular host/registrar was caught doing it recently.
Yeah - I remember that thread [webmasterworld.com] all too well.
Ironically - I moved several domains from that company to this company specifically because of this issue.
I am biting the bullet and moving my sites.
Maybe you can see what's happening with [wannabrowser.com]?
> Maybe you can see what's happening ....
Well - that site doesn't seem to be able to spoof Googlebot. At any rate, I changed the UA with Firefox and the site was fine, so if it is them, I can only assume that it is a block on the IP range.
> Well - that site doesn't seem to be able to spoof Googlebot
WB can spoof any user-agent: just cut and paste any user-agent string from your log file into the user-agent box in the form. The pull-down menu is just for convenience; you are not restricted to using it.
I'd like to know whether it *is* an IP-range block, a user-agent block, or whether they are cloaking your robots.txt.
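A minimal sketch of that test in Python 3 (standard library only; nobody in the thread posted code, so this is an assumption about how you might automate it): it requests the same page and robots.txt with several User-Agent strings from one machine and compares the status codes. Identical 200s for every UA point away from a user-agent block, but an IP-range block cannot be reproduced from your own connection, and catching a cloaked robots.txt would also require comparing the response bodies. The site URL and the UA strings are placeholders; paste real ones from your logs.

```python
import sys
import urllib.error
import urllib.request

# Example User-Agent strings; swap in the exact strings from your own access log.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; rv:115.0) Gecko/20100101 Firefox/115.0",
    "googlebot": "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "googlebot-compatible": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def build_request(url, user_agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends back for this User-Agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the status code is what we want to compare.
        return err.code


def diagnose(statuses):
    """Interpret a {label: status} mapping collected from a single client IP."""
    codes = set(statuses.values())
    if codes == {200}:
        return "same 200 for every UA: not a UA block (an IP block is still possible)"
    if len(codes) > 1:
        return "status differs by UA: points to a user-agent based block"
    return "same error for every UA: server or network problem, not UA-specific"


if __name__ == "__main__":
    # Usage: python ua_check.py http://www.example.com
    for site in sys.argv[1:]:
        for path in ("/", "/robots.txt"):
            results = {name: fetch_status(site + path, ua) for name, ua in USER_AGENTS.items()}
            print(path, results, "->", diagnose(results))
```

Running it from your own machine only varies the User-Agent; the crawler's IP stays out of reach, which is why a result of "same 200 for every UA" is consistent with an IP block rather than proof against one.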
Patrick, are you just dealing with first-tier support? If so, make sure to ask for tier-two people if it's a big company.
I'd sooner take Google's word for it that the bot's being blocked than anyone at the host.
I doubt that this problem is user-agent based; the issue appears to be just for your webhost and just from known-Googlebot IP ranges. PatrickDeese, is your site hosted by BlueGravity? I asked someone from our crawl team to try crawling from a known-Googlebot IP vs. a not-as-well-known Googlebot IP. The known-as-a-crawler IP couldn't fetch your pages but the not-well-known-as-Googlebot IP could, which does lend support to your hypothesis.
The issue doesn't appear to be with your sites and doesn't appear to be with Googlebot, but I couldn't say for sure where in the middle the problem might be. Could be the webhost you're using, or it could be upstream of them. PatrickDeese, if you hear of other sites besides yours running into this, would you pass them on? Thanks for mentioning this; I'll keep my ears open for similar reports.
I was talking to a new hosting company today and he suggested that perhaps they had added Google's IPs to a "bad bot" list by accident.
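If an accidental "bad bot" entry is the suspect, checking for it is mechanical. This sketch uses Python's standard `ipaddress` module to report which entries on a hypothetical deny list cover a given crawler address. The list and the sample addresses are made up for illustration; 66.249.64.0/19 is included only as an example of a range Googlebot has crawled from, not as an authoritative list of Google's IPs.

```python
import ipaddress

# Hypothetical deny list, as a host might keep in a firewall or "bad bot" file.
DENY_LIST = ["10.0.0.0/8", "66.249.64.0/19", "203.0.113.0/24"]


def blocking_entries(ip, deny_list):
    """Return every deny-list network that contains the given address."""
    addr = ipaddress.ip_address(ip)
    return [net for net in deny_list if addr in ipaddress.ip_network(net)]


if __name__ == "__main__":
    for crawler_ip in ("66.249.66.1", "192.0.2.10"):
        hits = blocking_entries(crawler_ip, DENY_LIST)
        print(crawler_ip, "blocked by" if hits else "not blocked", hits)
```

A broad CIDR entry added to stop one abusive address can silently swallow a whole crawler range, which would match the "known IP fails, lesser-known IP works" pattern described above.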
It's a small company - I was speaking to the vice president :o
Googleguy - I appreciate you checking in. I think that there is something gravely wrong with their setup, and frankly I am not too interested in waiting for them to fix it. I spent an hour on the phone with another hosting company, and they're going to help me get my sites (and my clients' sites) off of that host and somewhere else ASAP.
Thanks again for all the advice - it is definitely an IP thing, because I was getting 200s no matter what UA I tried.
Yup, I doubt it would be something too intentional. Could be a misconfigured file, could be that someone thought that our IP loaded pages too heavily, could be someone upstream from the webhost in some way. I'm glad you mentioned it though; if I find out anything on my end I'll mention it.
> How long do you give a host to make things right?
This is not the issue.
The lesson here is NEVER EVER EVER to have 100 domains on one server or one host or one country or one galaxy.
It is not about deciding whether a host is good, bad or cute. It is not about giving him more time, or giving him another call or a slap on the wrist.
It is about managing your business professionally, which means:
Rule #1: COVER YOUR A_S
There are no rules #2 or #3.
Check the Google cached page and see if it is up to date. If it is, you are being spidered.
Could you please sticky me (your box is full) with the name of this company? I went with a company that you use a while back and am now having some trouble with a new site getting indexed.
It may be a 'sandbox' thing or maybe not. Info from you would help me to pinpoint the problem. Thanks.
> Check the Google cached page and see if it is up to date. If it is, you are being spidered.
Umm. Yeah, no kidding.
As an example, I had a 785-page site that was fully indexed & spidered, and every page went URL-only until I moved it to my new hosting company. Now, unfortunately, I only have about 120 pages re-indexed.
I can only hope that the site gets reindexed in time for the holiday shopping.
|PatrickDeese, is your site hosted by BlueGravity? |
PatrickDeese, is that correct, or is it another hosting company?
Yes - the sites were formerly hosted with them.
Okay, now I've got something else to be paranoid about - better go wake up my account manager at my hosting company. I've got all three of my servers with them so I'll be damned if I'm going to sit here and wait to see if this could possibly happen .....
My experience has taught me to switch any time there is a major problem or a minor recurring problem. Either scenario proves that the host is not up to the standards you have for them.
For example, a host I was with a year ago switched to a VDS system. After they switched my account over, the database would crash at least once a day. I was told that they didn't know what was causing the crashes, but that it was my fault, even though I hadn't changed anything.
Case #2: my last host got bought out. They disabled the support system and said they had enabled me on the new one, but they hadn't, so I had no way to submit a support ticket. In addition, I have been told I need to change hosting plans, not because I am using too much disk space or bandwidth, but because my site gets too many hits. Finally, their servers have been crashing and running at 15% load on a normal basis since the acquisition.
The main thing is to make the site flexible, so that you can change one file that holds all of your include directories and database commands. That way, all you have to do is upload everything and change just that one file to get running again.
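A minimal sketch of that single-file approach, written in Python with the standard-library `configparser` (the poster didn't say what language their sites use, so this is an assumption): every host-specific path and database setting lives in one INI text, and the rest of the code gets its values from `load_settings`. The section names, keys and values are hypothetical.

```python
import configparser
from dataclasses import dataclass

# The one file you edit after moving to a new host (hypothetical values).
SITE_INI = """
[paths]
include_dir = /home/account/public_html/includes

[database]
host = localhost
name = shop
user = shop_user
password = change-me
"""


@dataclass(frozen=True)
class Settings:
    include_dir: str
    db_host: str
    db_name: str
    db_user: str
    db_password: str


def load_settings(text):
    """Parse the host-specific settings; every other module reads them from here."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return Settings(
        include_dir=parser["paths"]["include_dir"],
        db_host=parser["database"]["host"],
        db_name=parser["database"]["name"],
        db_user=parser["database"]["user"],
        db_password=parser["database"]["password"],
    )


settings = load_settings(SITE_INI)
print(settings.include_dir, settings.db_host)
```

In practice the INI text would sit in its own file loaded with `parser.read("site.ini")`, so a move means uploading everything and editing only that file.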
We'll see how this new host does...I'm sure it will be fine until they decide to change something that doesn't need to be changed.
Check your logs. Is googlebot trying to access your pages? I bet gbot is getting a 406 error because your host screwed up their settings. This happened to me recently. Ask your host if they recently upgraded to ColdFusion MX, by any chance.
If you REALLY want to see what googlebot sees, changing your user-agent isn't good enough. You also have to change your HTTP Accept header to match googlebot's. You can do this in Opera, but not in IE as far as I know.
Incidentally, this could also explain GoogleGuy's response. GG might have crawled once with the old version of GoogleBot and once with the new version, and the two versions handle HTTP Accept headers differently, so one would work and the other would not.
Can you explain the details of doing this with Opera, or is it fairly complicated?
I have one site with them that is being a bit troublesome, and no logs available yet (though I do have a little script set up to tell me if gbot visits...).
|Check your logs. Is googlebot trying to access your pages? I bet gbot is getting a 406 error because your host screwed up their settings. |
If you read Googleguy's post - you will see that googlebot could successfully crawl a site hosted by them when it used one of their "unknown" IPs - but was blocked whenever it used a "traditional" gBot IP address.
|msg 9: I asked someone from our crawl team to try crawling from a known-Googlebot IP vs. a not-as-well-known Googlebot IP. The known-as-a-crawler IP couldn't fetch your pages but the not-well-known-as-Googlebot IP could, which does lend support to your hypothesis. |
At any rate - Yahoo, MSN, et al. were crawling the site fine.
My first warning sign was that AdSense switched to the alternate ads for most of the sites - apparently they even blocked MediaBot initially - then after a day or two, the ads came back.
Then the index pages started showing up with no cache, as URL-only listings.
Yes PatrickDeese, I did read GG's post, but I was suggesting that he might have inadvertently used two different googlebots. No need for the knock; I'm just trying to help here.
The googlebot with user-agent GoogleBot/2.1(...) caused 406 errors on my site, while the one with user-agent Mozilla (compatible; GoogleBot/2.1 ...) does not cause errors. The two bots use different accept headers.
It was only a suggestion anyway, but still, you should check your logs for 406 errors. What you're seeing is completely consistent with the problem I had, including GG's comments.
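Checking the logs for this can be scripted, and the same script doubles as a simple "did gbot visit?" check. A minimal sketch, assuming your host gives you a raw access log in the Apache/NCSA combined format (adjust the regular expression if yours differs): it tallies status codes per Googlebot user-agent string and flags the variants that got 406 Not Acceptable. The log path is whatever you pass on the command line.

```python
import re
import sys
from collections import Counter

# Apache/NCSA "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_statuses(lines):
    """Count (user_agent, status) pairs for requests whose UA mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and "googlebot" in m.group("agent").lower():
            counts[(m.group("agent"), int(m.group("status")))] += 1
    return counts


def only_406(counts):
    """Keep just the Googlebot variants that were answered with 406 Not Acceptable."""
    return {agent: n for (agent, status), n in counts.items() if status == 406}


if __name__ == "__main__":
    # Usage: python gbot_log.py /path/to/access_log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            counts = googlebot_statuses(fh)
        for (agent, status), n in sorted(counts.items()):
            print(f"{n:6d}  {status}  {agent}")
        for agent, n in only_406(counts).items():
            print(f"WARNING: {n} x 406 for {agent}")
```

Grouping by the full user-agent string matters here: per the post above, one Googlebot variant could get 406s while the other gets 200s, and a per-UA tally makes that split visible.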
Mipapage: to diagnose this problem with Opera, open your Opera6.ini file (or whatever it's called), and under [Adv User Prefs] add a new line that says "HTTP Accept=text/html,text/plain,application/*" or whatever you want your Accept header to be. I'm not sure exactly what header googlebot uses, but this one was good enough to find the problem on my server.
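Without Opera, the same check can be scripted. A sketch in Python 3 using only `urllib.request`: it sends a Googlebot User-Agent together with the Accept value from the Opera setting above, then repeats the request with `Accept: */*`. As the post says, that Accept value is not known to be Googlebot's actual header; treat it as an example, and the URL as a placeholder.

```python
import sys
import urllib.error
import urllib.request

# Accept value from the Opera setting above; Googlebot's actual header may differ.
NARROW_ACCEPT = "text/html,text/plain,application/*"
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def build_request(url, accept=NARROW_ACCEPT, user_agent=GOOGLEBOT_UA):
    """GET request that imitates a crawler's User-Agent and Accept headers."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent, "Accept": accept})


def status_for(url, accept=NARROW_ACCEPT, timeout=10):
    """Fetch the URL and return the status code, including error codes such as 406."""
    try:
        with urllib.request.urlopen(build_request(url, accept), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Usage: python accept_check.py http://www.example.com/
    for url in sys.argv[1:]:
        narrow, wide = status_for(url), status_for(url, accept="*/*")
        print(f"{url}: narrow Accept -> {narrow}, Accept */* -> {wide}")
        if narrow == 406 and wide == 200:
            print("  content-negotiation problem (406), not an IP block")
```

Keeping the User-Agent fixed and changing only the Accept header isolates that one variable, so a 406/200 split points at the server's content negotiation rather than at who is asking.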
Thanks, all good on my end. Same with AdSense: I put up some ads and media-bot came right along...
Hey guys, this is Tim from Blue Gravity. I have been getting reports of this issue from a few customers and really want to resolve it. I feel that this has to be someone upstream from us, as we do not block Google and have many customers whose sites are spidered just fine.
If I could please get an email or post of the IPs in question, and perhaps a googlebot IP that this is known NOT to work with, I will do additional searching to determine exactly who along our path is blocking the spider and fix it.
This is not normal or being taken lightly by myself or anyone here at Blue Gravity and I will put this issue to rest. My sincerest apologies for letting this continue at all.
Moderator's note: Normally sh0ck's/Tim's email address would be edited out of his post, but this is an important issue to many, so I'm "looking the other way." Your cooperation would be appreciated. If you have a report for Tim, please DO NOT include IPs or other specifics in this thread; use the email address Tim provided. Thanks! :)
Thanks for that.
Really sorry to hear about this. At least Google was cool about helping you figure it out.
Since the host *is* blocking the IP and either (a) can't figure out why or (b) is lying about it... I'd say a quick switch to a new host is definitely in order. I think you'll be up, mostly, for the November/December rush if you do it now, though.
If you need hosting recommendations, I'm sure lots of folks here would be happy to offer up our own. I know my host of 4 years has never had a problem with Googlebot.
> If you need hosting recommendations
I have a number of hosting companies that I work with, but I had about half of my domains with BG. Now no host has more than about 20% of my sites.
I still have loads of sites with single companies - but it just isn't feasible to have 200+ domains with 200 different companies.
I appreciate Tim's response - but he lost my business. I started reporting this issue a month ago and lost hundreds of dollars of business because of it, and will continue to lose business until my sites are reindexed by Google - which will take weeks or months for some of the largest sites.
[edited by: PatrickDeese at 8:21 pm (utc) on Sep. 15, 2004]
|This is not normal or being taken lightly by myself or anyone here at Blue Gravity... |
I find it disturbing that a webhost should be made aware of a problem and a month later has yet to identify what the problem is, much less resolve it.
The web host was put on notice about this a month ago. Whether BG was blocking Googlebot or it was being blocked upstream is beside the point. The point is that the issue was not resolved in a timely manner.
I was considering hosting with Blue Gravity at the time this fiasco happened, and I called them myself to find out whether they were resolving it. They told me there may have been a bandwidth-saving measure in place that blocked GoogleBot.
They offered to email me when this issue was identified and resolved. To date, I have not received any communication that the issue has been identified or resolved. Did someone drop the ball?
I have since opened hosting accounts elsewhere.