|You can create Bing and Google Webmaster accounts and configure your domains to have a lower crawl-delay |
OMG, I just had to pinch myself and check the calendar to make sure I remembered which year we're in. It's almost like saying "tell your visitors not to visit every day, our servers can't handle it" :)
It is the hosting. A Bad Gateway (502) error means there's a miscommunication between a proxy server and one of their backend servers: the backend doesn't give the proxy the response it expects. In effect, they're blocking access to your site.
|is the problem on my site or is it the hosting? |
The best plan of action is to check your Google Webmaster account (Crawl -> Crawl Stats section) to make sure there isn't a glitch on Google's side. Check "pages crawled per day" - considering the size of your site, is this number disproportionately high? If it is (I bet it's not), you can limit the crawl rate via the site settings (cogwheel icon). Similar functionality exists for Bing and Yandex.
Also check the raw stats system provided by your host. In most cases it's Awstats, and you can access it from your hosting control panel. Go to the Robots/Spiders section and again check for any unusually high numbers - bandwidth or number of hits.
It's usually spam bots that cause problems. If there are a lot of unidentified bots in your Robots/Spiders section and they're consuming ridiculously high bandwidth, you'll have to block them via .htaccess.
The mainstream bots that identify themselves (including Yandex and Baidu) are unlikely to cause problems. If they do, you should be able to identify them via Awstats.
My guess is: get a new host... but it's difficult to say without having seen the stats.
Yep, I only see this sort of thing on the cheaper hosts.
Hosting matters, folks. I went through SEVEN before I found one I liked, and you couldn't pry me off with a crowbar now. You don't have to pay a fortune for good hosting, but you probably won't get really good hosting for three bones a month.
Or possibly which planet we're on?
|You can create Bing and Google Webmaster accounts and configure your domains to have a lower crawl-delay. We also recommend configuring a robots.txt file. This will reduce the rate that crawlers initiate requests with your site |
Someone should tell this host that Googlebot explicitly ignores the "Crawl-delay" directive in robots.txt. They say so in WMT. You can only change it by adding a WMT account-- and then putting your fingers in your ears and humming loudly while they slather on their don't-change-our-defaults recommendations.
|If you would like to reduce traffic from crawlers such as Yandex or Baidu these typically need to be done utilizing something in the nature of an .htaccess block. |
Again, it seems to have escaped the host's notice that Yandex also has a WMT. They also claim to honor the crawl-delay directive; I don't know if anyone with a large site has personally verified this. (Small sites like mine are irrelevant because search engines just don't crawl that often.) There's also the "Clean-param" directive, which I admit I'd never heard of, but it sounds like a great alternative to individual WMT settings.
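For reference, a robots.txt using both directives might look something like this. This is just a sketch: Yandex documents both directives, Google ignores Crawl-delay entirely, and the parameter name and path here are placeholders:

```
# Sketch only - directives per Yandex's robots.txt documentation.
# Google ignores Crawl-delay; "sessionid" and /forum/ are placeholders.
User-agent: Yandex
Crawl-delay: 5
Clean-param: sessionid /forum/

User-agent: Baiduspider
Crawl-delay: 10
```

Crawl-delay is seconds between requests; Clean-param tells Yandex which URL parameters to ignore so it doesn't crawl duplicate URLs.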
I don't know if this is universal to all hosts. But server-level lockouts (in my case mod_security) come through as 503.
|probably won't get really good hosting for three bones a month. |
Netmeg, what's a "bone" going for these days? $10, $100
(I'm not being snarky, trying to figure out where our server rates at $450/month)
$450/month is what you're personally paying. But that's not the base rate for shared hosting anywhere in the world. Not even in Canada. How much would you be paying as a baseline? Say, if you got two visitors a day and didn't use any significant computing resources. If you don't know, look at the host's front page. It will say something like "hosting as low as..." (I don't know your host. But everyone says this.)
:: detour to check ::
Yikes. What exactly are you getting for that $450? My host's rates don't even go that high-- not even for dedicated physical servers as opposed to VPS-- unless it's hidden in some VIP area that I'm not allowed to see.
|Yikes. What exactly are you getting for that $450? |
managed, quad dual core processors, 12 gig ram, dedicated physical
Yeah, I have an ecommerce client who pays $750/month. That's way above and beyond most people's needs, obviously, but $3/month is too far in the other direction.
Thank you all for your helpful replies. I don't see anything unusual in my webmaster accounts, but I do see a high volume of "Unknown robot (identified by 'bot*')" hits in my Awstats account. What can I do to limit them?
The host recommends adding a caching mechanism to the code of my site. I'm not familiar with this and I don't want to do something I don't understand, but is there anything I can do to block spam bots hitting my site?
From an SEO perspective, can getting a large number of unknown bots have a negative impact on my SEO in Google?
|can getting a large number of unknown bots have a negative impact on my SEO in Google? |
Can't see how-- unless your host is telling the truth and the requests are so overwhelming that it affects page speed as measured by major search engines.
That's assuming for the sake of discussion that the search-engine arm of the operation doesn't trade notes with the analytics arm. ("Hm, why are all those robots clustering around? There must be something shady going on.")
You said it's your own server, right? That means you've even got the option of blocking the most outrageous offenders via a firewall, which consumes fewer resources than a config-level 403. (Or at least it's supposed to, or else why bother with the extra level?)
:: wait, stop, rewind ::
If it's your own server, why is anyone talking about htaccess? There shouldn't be any. Except maybe temporarily if you're making major changes. Otherwise everything should be happening in the config file.
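As a sketch of what that config-level block might look like (assuming Apache 2.4 syntax; "BadBot" and the document root path are placeholders, not anything from this thread):

```apache
# Sketch, assuming Apache 2.4 (mod_authz_core + mod_setenvif).
# "BadBot" is a placeholder user-agent substring; adjust the path.
<Directory "/var/www/example">
    BrowserMatchNoCase "BadBot" bad_bot
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Directory>
```

Because it lives in the vhost/config file rather than .htaccess, Apache reads it once at startup instead of on every request.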
|If it's your own server, why is anyone talking about htaccess? There shouldn't be any. Except maybe temporarily if you're making major changes. Otherwise everything should be happening in the config file. |
I think you misread the same posts I did -- I thought the discussion about the dedicated hosting applied to the OP rather than being an <aside> to the main topic, and posted something very similar to the above with more words, until I went back and reread. Then I realized I'd combined the OP and the <aside> when I said .htaccess was the wrong place for the solution.
You're right. Whew. I was having a hard time working "htaccess" and "$450/month" into the same sentence.
Is anyone in this thread actually paying $3/month or is that just a figure netmeg threw out for illustrative purposes?
|I was having a hard time working "htaccess" and "$450/month" into the same sentence. |
LOL I was too! -- The TL;DR of what I posted was: WTF? Change hosts now!
You can get plans starting that low from some of the major enormo-hosts, yes.
|brotherhood of LAN|
I'd recommend a VPS over shared hosting any day. Budget hosts can range from $20-$50 for the year, but the 'brand' ones are more like $10/m. For $10/m you can get a reliable host IMO. Of course, you pay more if you need more.
I have over 100 VPS accounts in the <$50/year bracket and there are niggling issues like IP migrations, DDoS attacks due to other clients they accept, etc.
|What can I do to limit them? |
As everyone here seems to agree: you only need to take action if your spambot problem is enormously out of proportion. If it isn't and the server still can't cope, you need a new host.
The best way to block spambots is via .htaccess
You need to download your raw access logs, which is easily done via cPanel or if there's no cPanel, directly from the server - just locate the /logs folder
Then unzip and open it in a text editor. Locate the offending bot and either block it by name or by its IP address. You'd need to add this to your .htaccess file:
BrowserMatchNoCase TheBadSpamBotName bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
Or, if you're blocking by IP, it would look like this:
Deny from 192.0.2.1
You can add as many bot names or IP addresses as you need, and you can mix both methods. Just make sure you don't use wildcards when blocking the bots - otherwise you may end up blocking one of the good bots, and that would be really bad for SEO.
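For the "locate the offending bot" step, here's a rough one-liner, assuming your raw logs are in Apache's combined format (the `access.log` filename is a placeholder for whatever your host calls it):

```shell
# In a combined-format log line, the 6th quote-delimited field is the
# User-Agent. This counts hits per user agent, busiest first.
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head
```

Whatever shows up at the top with a huge hit count and a name you don't recognize is your candidate for the .htaccess block above.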
|The host recommends adding a caching mechanism to the code of my site. I'm not familiar with this |
If you're not familiar with caching, it's a good idea not to mess with it. There are many big sites out there not using any caching mechanisms and they're doing just fine.
|There are many big sites out there not using any caching mechanisms and they're doing just fine. |
Unless you're on WordPress. WordPress needs caching.