Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
502 Bad Gateway error - is it my site or the hosting?
virtualreality




msg:4631552
 5:39 pm on Dec 17, 2013 (gmt 0)

My site is down with the following message: 502 Bad Gateway

I contacted my host and they sent me a very technical reply:

It appears that a crawler robot (or robots) was causing a high load on the server, and due to this affecting all of the other accounts on the system, we were forced to take immediate action for the health of the server.

We have blocked the crawler from being able to connect to your site by either an IP block in the .htaccess for this domain, or by specifically blocking the crawlers by UserAgent inside the .htaccess file.

Regarding the recent traffic for your account, it appears to be receiving a large amount of traffic from web crawlers and we would like to pass some details along regarding the crawlers. You can create Bing and Google Webmaster accounts and configure your domains to have a lower crawl-delay. We also recommend configuring a robots.txt file. This will reduce the rate that crawlers initiate requests with your site and reduce the resources it requires from the system allowing for more legitimate traffic to be served. If you would like to reduce traffic from crawlers such as Yandex or Baidu these typically need to be done utilizing something in the nature of an .htaccess block. For more details regarding those topics please reference our support knowledge base articles listed below. Please let us know should you have any further issues or require any additional assistance.


Based on this reply, I don't understand: is the problem on my site or is it the hosting?

They recommend that I "configure your domains to have a lower crawl-delay". Is this a good idea?

What about crawlers such as Yandex or Baidu? Do they have to be limited?

Thanks a lot!

 

adder




msg:4631622
 9:40 pm on Dec 17, 2013 (gmt 0)

You can create Bing and Google Webmaster accounts and configure your domains to have a lower crawl-delay

OMG, I just had to pinch myself and check the calendar to make sure I remember which year we are in. It's almost like saying "tell your visitors not to visit every day, our servers can't handle it" :)

is the problem on my site or is it the hosting?
It is the hosting. A 502 Bad Gateway error means that a gateway or proxy server received an invalid response from the server behind it. Here, one of their servers isn't giving the proxy the response it expects. In other words, they're blocking access to your site.
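For anyone unsure how the status codes that come up in this thread differ, here is a minimal sketch using Python's standard http module. It only prints the standard reason phrases and descriptions; it does not contact any server, and the choice of codes (403, 502, 503) is simply the set mentioned in this discussion.

```python
from http import HTTPStatus

# Codes mentioned in this thread: a config-level block (403),
# a proxy/gateway failure (502), and a server-level lockout (503).
CODES = (403, 502, 503)

# Map each code to its standard reason phrase.
phrases = {code: HTTPStatus(code).phrase for code in CODES}

for code in CODES:
    status = HTTPStatus(code)
    print(code, status.phrase, "-", status.description)
```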

The best plan of action is to check your Google Webmaster account (Crawl -> Crawl Stats section) to make sure there isn't a glitch on Google's side. Check "pages crawled per day" - considering the size of your site, is this number disproportionally high? If it is (I bet it's not), you can limit the crawl rate via the site settings (cogwheel icon). Similar functionality exists for Bing and Yandex.

Also check the raw stats system provided by your host. In most cases it's Awstats, and you can access it from your hosting control panel. Go to the Robots/Spiders section and again check for any unusually high numbers: bandwidth or number of hits.

It is usually the spam bots that cause problems. If there are a lot of unidentified bots in your Robots/Spiders section and they're consuming ridiculously high bandwidth, you'll have to block them via .htaccess.

The mainstream bots that identify themselves (including Yandex and Baidu) are unlikely to cause problems. If they do cause problems, you should be able to identify them via Awstats.
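If Awstats isn't available, roughly the same tally can be pulled from the raw access log. This is only a sketch, assuming the log is in Apache's "combined" format; the sample lines, IP addresses and the tally() helper are made up for illustration and are not taken from anyone's real logs.

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

def tally(lines):
    """Count hits and bytes sent per (user-agent, IP) pair."""
    hits = Counter()
    bandwidth = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        key = (m.group("agent"), m.group("ip"))
        hits[key] += 1
        size = m.group("bytes")
        bandwidth[key] += 0 if size == "-" else int(size)
    return hits, bandwidth

# Hypothetical sample lines standing in for a real access log.
SAMPLE = [
    '203.0.113.7 - - [17/Dec/2013:17:39:00 +0000] "GET /a HTTP/1.1" 200 5120 "-" "TheBadSpamBotName/1.0"',
    '203.0.113.7 - - [17/Dec/2013:17:39:01 +0000] "GET /b HTTP/1.1" 200 2048 "-" "TheBadSpamBotName/1.0"',
    '198.51.100.4 - - [17/Dec/2013:17:40:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

hits, bandwidth = tally(SAMPLE)

# Busiest (user-agent, IP) pairs first.
for (agent, ip), count in hits.most_common(10):
    print(count, bandwidth[(agent, ip)], ip, agent)
```

To run it against a real log, pass an open file object to tally() instead of SAMPLE; the file name depends on your host.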

My guess is: get a new host... but it's difficult to say without having seen the stats.

netmeg




msg:4631627
 10:01 pm on Dec 17, 2013 (gmt 0)

Yep, I only see this sort of thing on the cheaper hosts.

Hosting matters, folks. I went through SEVEN before I found one I liked, and you couldn't pry me off with a crowbar now. You don't have to pay a fortune for good hosting, but you probably won't get really good hosting for three bones a month.

lucy24




msg:4631643
 10:44 pm on Dec 17, 2013 (gmt 0)

which year we are in

Or possibly which planet we're on?

You can create Bing and Google Webmaster accounts and configure your domains to have a lower crawl-delay. We also recommend configuring a robots.txt file. This will reduce the rate that crawlers initiate requests with your site

Someone should tell this host that the googlebot explicitly ignores the "Crawl-Delay" directive in robots.txt. They say so in wmt. You can only change it by adding a wmt account-- and then putting your fingers in your ears and humming loudly while they slather on their don't-change-our-defaults recommendations.

If you would like to reduce traffic from crawlers such as Yandex or Baidu these typically need to be done utilizing something in the nature of an .htaccess block.

Again, it seems to have escaped the host's notice that Yandex also has a WMT. They also claim to honor the crawl-delay directive; I don't know if anyone with a large site has personally verified this. (Small sites like mine are irrelevant because search engines just don't crawl that often.) There's also the "clean-param" directive, which I admit I'd never heard of, but which sounds like a great alternative to individual wmt settings.
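For illustration, this is roughly what a per-crawler crawl-delay rule looks like and how a parser reads it. It is a sketch using Python's standard urllib.robotparser; the robots.txt content and the 10-second value are invented for the example, and whether a given crawler honors the directive is entirely up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a delay for Yandex only, nothing for anyone else.
ROBOTS_TXT = """\
User-agent: Yandex
Crawl-delay: 10
Disallow:

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay for a matching entry, or None if none is set.
yandex_delay = parser.crawl_delay("Yandex")
googlebot_delay = parser.crawl_delay("Googlebot")

print("Yandex:", yandex_delay)
print("Googlebot:", googlebot_delay)
```

Note that this only shows what the file says. A crawler that ignores the directive will not slow down no matter what value is set.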

Edit:
I don't know if this is universal to all hosts. But server-level lockouts (in my case mod_security) come through as 503.

Shepherd




msg:4631664
 12:31 am on Dec 18, 2013 (gmt 0)

probably won't get really good hosting for three bones a month.


Netmeg, what's a "bone" going for these days? $10, $100?

(I'm not being snarky, just trying to figure out where our server rates at $450/month.)

netmeg




msg:4631667
 1:02 am on Dec 18, 2013 (gmt 0)

$1

lucy24




msg:4631674
 1:30 am on Dec 18, 2013 (gmt 0)

$450/month is what you're personally paying. But that's not the base rate for shared hosting anywhere in the world. Not even in Canada. How much would you be paying as a baseline? Say, if you got two visitors a day and didn't use any significant computing resources. If you don't know, look at the host's front page. It will say something like "hosting as low as..." (I don't know your host. But everyone says this.)

:: detour to check ::

Yikes. What exactly are you getting for that $450? My host's rates don't even go that high-- not even for dedicated physical servers as opposed to VPS-- unless it's hidden in some VIP area that I'm not allowed to see.

Shepherd




msg:4631678
 1:36 am on Dec 18, 2013 (gmt 0)

Yikes. What exactly are you getting for that $450?


managed, quad dual core processors, 12 gig ram, dedicated physical

netmeg




msg:4631683
 2:07 am on Dec 18, 2013 (gmt 0)

Yea, I have an ecommerce client who pays $750/month. But that's way above and beyond most people's needs, obviously. But $3/month is too far in the other direction.

virtualreality




msg:4631764
 8:56 am on Dec 18, 2013 (gmt 0)

Thank you all for your helpful replies. I don't see anything unusual in my webmaster accounts, but Awstats shows a high volume of traffic from "Unknown robot (identified by 'bot*')". What can I do to limit them?

The host recommends adding a caching mechanism to the code of my site. I'm not familiar with this and I don't want to do something I don't understand, but is there anything I can do to block the spam bots hitting my site?

From an SEO perspective, can getting a large number of unknown bots have a negative impact on my SEO in Google?

lucy24




msg:4631766
 9:17 am on Dec 18, 2013 (gmt 0)

can getting a large number of unknown bots have a negative impact on my SEO in Google?

Can't see how-- unless your host is telling the truth and the requests are so overwhelming that it affects page speed as measured by major search engines.

That's assuming for the sake of discussion that the search-engine arm of the operation doesn't trade notes with the analytics arm. ("Hm, why are all those robots clustering around? There must be something shady going on.")

You said it's your own server, right? That means you've even got the option of blocking the most outrageous offenders via a firewall, which consumes fewer resources than a config-level 403. (Or at least it's supposed to, or else why bother with the extra level?)

:: wait, stop, rewind ::

If it's your own server, why is anyone talking about htaccess? There shouldn't be any. Except maybe temporarily if you're making major changes. Otherwise everything should be happening in the config file.

JD_Toims




msg:4631768
 9:28 am on Dec 18, 2013 (gmt 0)

If it's your own server, why is anyone talking about htaccess? There shouldn't be any. Except maybe temporarily if you're making major changes. Otherwise everything should be happening in the config file.

I think you misread the same posts I did. I thought the discussion about the dedicated hosting applied to the OP rather than being an <aside> to the main topic, and I posted something very similar to the above, with more words. Then I went back, reread, and realized I had combined the OP and the <aside> when I said .htaccess was the wrong place for the solution.

lucy24




msg:4631772
 10:45 am on Dec 18, 2013 (gmt 0)

You're right. Whew. I was having a hard time working "htaccess" and "$450/month" into the same sentence.

Is anyone in this thread actually paying $3/month or is that just a figure netmeg threw out for illustrative purposes?

JD_Toims




msg:4631799
 12:53 pm on Dec 18, 2013 (gmt 0)

I was having a hard time working "htaccess" and "$450/month" into the same sentence.

LOL I was too! -- The TL;DR of what I posted was: WTF? Change hosts now!

netmeg




msg:4631805
 2:22 pm on Dec 18, 2013 (gmt 0)

You can get plans starting that low from some of the major enormo-hosts, yes.

brotherhood of LAN




msg:4631811
 2:28 pm on Dec 18, 2013 (gmt 0)

I'd recommend a VPS over shared hosting any day. Budget hosts can range from $20-$50 for the year, but the 'brand' ones are more like $10/m. For $10/m you can get a reliable host IMO. Of course, you pay more if you need more.

I have over 100 VPS accounts in the <$50/year bracket, and there are niggling issues like IP migrations, DDoS attacks due to other clients they accept, etc.

adder




msg:4631902
 7:15 pm on Dec 18, 2013 (gmt 0)

Or possibly which planet we're on?
LOL :)

What can I do to limit them?

Everyone seems to agree on this: you only need to take action if your spambot problem is enormously out of proportion. If it isn't, you need a new host.

The best way to block spambots is via .htaccess.
First download your raw access logs, which is easily done via cPanel or, if there's no cPanel, directly from the server: just locate the /logs folder.

Then unzip the logs and open them in a text editor. Locate the offending bot and block it either by name or by IP address. To block by name, you'd add this to your .htaccess file:

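# Flag any request whose User-Agent matches this name (case-insensitive)
BrowserMatchNoCase TheBadSpamBotName bad_bot
# Allow by default, then deny the requests flagged above
Order Deny,Allow
Deny from env=bad_bot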

Or, if you're blocking by IP, it would look like this:

Order Deny,Allow
# 127.0.0.1 is only a placeholder; substitute the offending IP address
Deny from 127.0.0.1

You can add as many bot names or IP addresses as you need, and you can mix both methods. Just make sure you don't use wildcards when blocking the bots; otherwise you may end up blocking one of the good bots, and that would be really bad for SEO.
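To see why a broad pattern is risky, you can test a candidate name against the user-agent strings of bots you want to keep before putting it in .htaccess. A minimal sketch: BrowserMatchNoCase treats its argument as a case-insensitive regular expression, which Python's re module only approximates here, and the GOOD_BOTS list and collateral() helper are illustrative assumptions rather than a complete or authoritative set.

```python
import re

# Sample user-agent strings for crawlers you would not want to block.
GOOD_BOTS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
]

def collateral(pattern, agents=GOOD_BOTS):
    """Return the good-bot user agents that a block pattern would also catch."""
    rx = re.compile(pattern, re.IGNORECASE)
    # Unanchored search: the pattern may match anywhere in the string.
    return [ua for ua in agents if rx.search(ua)]

too_broad = collateral("bot")                # matches every crawler above
specific = collateral("TheBadSpamBotName")   # matches none of them

print(len(too_broad), "good bots caught by 'bot'")
print(len(specific), "good bots caught by 'TheBadSpamBotName'")
```

If the first list is not empty for the pattern you plan to use, pick a more specific name or block by IP instead.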

The host recommends adding a caching mechanism to the code of my site. I'm not familiar with this

If you're not familiar with this, it's a good idea not to mess with caching. There are many big sites out there not using any caching mechanisms and they're doing just fine.

netmeg




msg:4631958
 11:15 pm on Dec 18, 2013 (gmt 0)

There are many big sites out there not using any caching mechanisms and they're doing just fine.


Unless you're on WordPress. WordPress needs caching.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved