First pass with their tech support: "Maybe google is banning your domains". I say I don't think so - I have 90+ other domains hosted with other companies, and they are a-okay.
Second pass with their tech support: "It's not us - we're not blocking anyone". So I write firstname.lastname@example.org - and they say that the domains I mentioned are not penalized - but weren't accessible in the last crawl(s).
Third pass with their tech support: "Hey Google says they can't crawl sites hosted by you". Tech support doesn't believe me - so I write back to G and say - pretty please - can you check example.com via a crawler IP - my host says they aren't doing anything to block Google. Google says - something is blocking gBot.
Is it time to move these domains - are these guys being coy with me - or are they clueless?
My guesstimate is that in total, so far, at least 2500 - 3000 pages from 100+ domains have been dropped due to this issue.
How long do you give a host to make things right?
Try doing a little research and finding another client on their servers... see if they have any Google listings. If not, then you have 100% answered the LAST question I would ask before pulling the plug.
WB can spoof any user-agent; just cut-n-paste any user-agent string from your log file into the user-agent box in the form. The pull-down menu is just for convenience; you are not restricted to using it.
I'd like to know if it *is* an IP-range block, a user-agent block, or if they are cloaking your robots.txt.
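The user-agent half of that question can be checked from any machine. Below is a minimal Python sketch of the idea, not a definitive tool: it requests the same URL under several user-agent strings and reports whether the server answers them differently. The names `USER_AGENTS`, `build_request`, `fetch_status` and `looks_like_ua_block` are made up for illustration, and the user-agent strings are assumed examples; paste the real ones from your own log file.

```python
import urllib.error
import urllib.request

# Assumed example strings; replace them with strings copied from your access log.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Googlebot/2.1 (+http://www.google.com/bot.html)",
}


def build_request(url, user_agent):
    """Build a GET request that announces itself with the given user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends back for this user-agent."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 403, 406 and friends are still answers worth recording.
        return err.code


def looks_like_ua_block(statuses):
    """True if at least one user-agent received a different status than the rest."""
    return len(set(statuses.values())) > 1
```

To use it, call `fetch_status(url, ua)` once per entry in `USER_AGENTS`, collect the results in a dict keyed by name, and pass that dict to `looks_like_ua_block`. Identical statuses for every string suggest that any block is keyed on something other than the user-agent, such as the IP range, which this kind of test cannot spoof.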
The issue doesn't appear to be with your sites and doesn't appear to be with Googlebot, but I couldn't say for sure where in the middle the problem might be. Could be the webhost you're using, or it could be upstream of them. PatrickDeese, if you hear of other sites besides yours running into this, would you pass them on? Thanks for mentioning this; I'll keep my ears open for similar reports.
It's a small company - I was speaking to the vice president :o
Googleguy - I appreciate you checking in - I think that there is something gravely wrong with their setup - frankly I am not too interested in waiting for them to fix it - I spent an hour on the phone with another hosting company and they're going to help me get my sites (and my clients' sites) off of that host and somewhere else ASAP.
Thanks again for all the advice - it is definitely an IP thing because I was getting 200s no matter what UA I tried.
This is not the issue.
The lesson here is NEVER EVER EVER to have 100 domains in one server or one host or one country or one galaxy.
It is not about deciding whether a host is good, bad or cute. It is not about giving him more time, or giving him another call or a slap on the wrist.
It is about managing your business professionally, which means:
Rule #1: COVER YOUR A_S
There are no rules #2 or #3.
Could you please sticky me (your box is full) with the name of this company? I went with a company that you use a while back and am now having some trouble with a new site getting indexed.
It may be a 'sandbox' thing or maybe not. Info from you would help me to pinpoint the problem. Thanks.
Umm. Yeah, no kidding.
As an example, I had a 785 page site that was fully indexed & spidered - and every page went URL only until I moved it to my new hosting company - now unfortunately I only have about 120 pages re-indexed.
I can only hope that the site gets reindexed in time for the holiday shopping.
For example, a host I was with a year ago switched to a VDS system. After they switched my account over, the database would crash at least once a day. I was told they didn't know what was causing the crashes, but that it was my fault, even though I hadn't changed anything.
Case #2: my last host got bought out. They disabled the support system and said they had enabled me on the new one, but they hadn't, so I had no way to submit a support ticket. In addition, I have been told I need to change hosting plans. Not because I am using too much disc space or bandwidth, but because my site gets too many hits. Finally, their servers have been crashing and running at 15% load on a normal basis since the acquisition.
The main thing is to make the site flexible, so that one file holds all of your include directories and database commands. That way, all you have to do is upload everything and change just that one file to get running again.
We'll see how this new host does...I'm sure it will be fine until they decide to change something that doesn't need to be changed.
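The one-file idea above might look something like this. It is only a sketch in Python under assumed names (`site_config.py`, `SiteConfig`, `CONFIG`, and every path and credential are placeholders); the same layout applies in whatever language the site actually runs on.

```python
# site_config.py (hypothetical file name): the only file edited after a host move.
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SiteConfig:
    base_dir: Path      # where the host puts your document root
    db_host: str
    db_name: str
    db_user: str
    db_password: str

    @property
    def include_dir(self):
        """Include directory, derived from base_dir so it moves with the site."""
        return self.base_dir / "includes"

    @property
    def template_dir(self):
        """Template directory, derived from base_dir for the same reason."""
        return self.base_dir / "templates"


# Placeholder values: change these, and only these, on the new host.
CONFIG = SiteConfig(
    base_dir=Path("/home/example/public_html"),
    db_host="localhost",
    db_name="example_db",
    db_user="example_user",
    db_password="change-me",
)
```

Every other script imports `CONFIG` and never hard-codes a path or a database name, so a move comes down to uploading the files and editing the values in this one place.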
If you REALLY want to see what googlebot sees, changing your user-agent isn't good enough. You also have to change your HTTP Accept header to the same as googlebot's. You can do this in Opera but not IE as far as I know.
Incidentally this could also explain GoogleGuy's response. GG might have crawled once with the old version of GoogleBot and once with the new version, and the two versions handle HTTP Accept headers differently, so one would work and the other would not.
Check your logs. Is googlebot trying to access your pages? I bet gbot is getting a 406 error because your host screwed up their settings.
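For anyone who wants to run that log check without scrolling through the file by hand, here is a rough Python sketch. It assumes the access log is in the common Apache "combined" format (your host may use something else), and the function name `googlebot_statuses` and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format; adjust the pattern if your host differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_statuses(lines):
    """Count HTTP status codes for requests whose user-agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[match.group("status")] += 1
    return counts


# Made-up sample lines using documentation IP addresses, not real crawler data.
SAMPLE = [
    '192.0.2.10 - - [15/Sep/2004:20:21:00 +0000] "GET / HTTP/1.0" 406 312 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.11 - - [15/Sep/2004:20:22:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.12 - - [15/Sep/2004:20:23:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Opera/7.54 (Windows NT 5.1; U)"',
]
```

Feed it the lines of your real log, for example `googlebot_statuses(open("access.log"))` with whatever path your host uses. A pile of 406s would support the Accept-header theory. No Googlebot entries at all would suggest the requests are being stopped before they ever reach the web server.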
If you read Googleguy's post - you will see that googlebot could successfully crawl a site hosted by them when it used one of their "unknown" IPs - but was blocked whenever it used a "traditional" gBot IP address.
msg 9: I asked someone from our crawl team to try crawling from a known-Googlebot IP vs. a not-as-well-known Googlebot IP. The known-as-a-crawler IP couldn't fetch your pages but the not-well-known-as-Googlebot IP could, which does lend support to your hypothesis.
At any rate - Yahoo, MSN, et al. were crawling the site fine.
My first warning sign was that Adsense switched to the alternate ads for most of the sites - apparently they even blocked MediaBot initially - then after a day or two, the ads came back.
Then the index pages started showing up with no cache, and URL only.
The googlebot with user-agent GoogleBot/2.1(...) caused 406 errors on my site, while the one with user-agent Mozilla (compatible; GoogleBot/2.1 ...) does not cause errors. The two bots use different accept headers.
It was only a suggestion anyway, but still, you should check your logs for 406 errors. What you're seeing is completely consistent with the problem I had, including GG's comments.
Mipapage -- to diagnose this problem with Opera, open your Opera6.ini file or whatever it's called, and under [Adv User Prefs] add a new line that says "HTTP Accept=text/html,text/plain,application/*" or whatever you want your accept header to be. I'm not sure exactly what header googlebot uses but this one here was good enough to find the problem on my server.
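To make the 406 theory concrete: a server answers 406 Not Acceptable when it decides that nothing it can send matches the client's Accept header. The sketch below is a simplified Python version of that matching step, not what any particular host or Googlebot actually does. It ignores q-values and other parameters, and the function names `media_ranges` and `is_acceptable` are made up for illustration.

```python
def media_ranges(accept_header):
    """Split an Accept header into bare media ranges, dropping parameters like q=0.8."""
    ranges = []
    for part in accept_header.split(","):
        media_range = part.split(";")[0].strip().lower()
        if media_range:
            ranges.append(media_range)
    return ranges


def is_acceptable(accept_header, content_type):
    """True if content_type matches some media range in the Accept header.

    Simplification: q-values are ignored, so a range marked q=0 still counts.
    """
    ranges = media_ranges(accept_header)
    if not ranges:
        # A missing or empty Accept header means the client takes anything.
        return True
    ctype = content_type.split(";")[0].strip().lower()
    main_type = ctype.split("/")[0]
    for media_range in ranges:
        if media_range in ("*/*", ctype, main_type + "/*"):
            return True
    return False
```

With the header quoted above, `is_acceptable("text/html,text/plain,application/*", "text/html")` is true, while a client that only sent `text/plain` would not match a `text/html` page. A strictly configured server could answer that second case with a 406, which is how two bots with different Accept headers can get different results from the same URL.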
If I could please get an email or post of the IPs in question, and perhaps a Googlebot IP that this is known NOT to work with, I will do additional searching to determine who exactly along our path is blocking the spider and fix it.
This is not normal or being taken lightly by myself or anyone here at Blue Gravity and I will put this issue to rest. My sincerest apologies for letting this continue at all.
Really sorry to hear about this. At least Google was cool about helping you figure it out.
Since the host *is* blocking the IP and either (a) can't figure out why or (b) is lying about it... I'd say a quick switch to a new host is definitely in order. I think you'll be up, mostly, for the November/December rush if you do it now, though.
If you need hosting recommendations, I'm sure lots of folks here would be happy to offer up our own. I know my host of 4 years has never had a problem with Googlebot.
I have a number of hosting companies that I work with, but I had about half of my domains with BG. Now no host has more than about 20% of my sites.
I still have loads of sites with single companies - but it just isn't feasible to have 200+ domains with 200 different companies.
I appreciate Tim's response - but he lost my business. I started reporting this issue a month ago and lost hundreds of dollars of business because of it, and will continue to lose business until my sites are reindexed by Google - which will take weeks or months for some of the largest sites.
[edited by: PatrickDeese at 8:21 pm (utc) on Sep. 15, 2004]
This is not normal or being taken lightly by myself or anyone here at Blue Gravity...
I find it disturbing that a webhost should be made aware of a problem and a month later has yet to identify what the problem is, much less resolve it.
The web host was put on notice about this a month ago. Whether BG were blocking Googlebot or it was being blocked upstream is beside the point. The point is that the issue was not resolved in a timely manner.
I was considering hosting with Blue Gravity at the time this fiasco happened, and I called them myself to find out if they were resolving this. They told me there may have been a bandwidth saving measure in place, by blocking GoogleBot.
They offered to email me when this issue was identified and resolved. To date I have not received any communication that the issue has been identified or resolved. Did someone drop a ball?
I have since opened hosting accounts elsewhere.