Looks like spiders from China?

Receiving response code 500 errors from visitors in China

         

Anders

11:28 pm on Jan 26, 2015 (gmt 0)

10+ Year Member



One month ago I launched an updated/new site that included a domain change.
Everything has been working well; both Google and Bing are now showing results from the new domain.
But a few days after launch I started seeing status code 500 errors, only from IP range 180.76.4.* (Baidu), and they have been appearing almost like clockwork every two to three hours (about 10 times a day) for almost a month now.
The log file says "No Referrer" when it's trying to access the start page "/", with 0 bandwidth reported. The user agent identifies as "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)".
The site is working fine and I have had no complaints of errors.
It's built according to w3.org standards; however, browsing with MSIE 7 would certainly bring JavaScript errors.

The site has no connections to China.

I have googled some of the IPs and it seems others have sometimes seen these as the Baidu spider, but in my log file it's always MSIE 7, which I don't understand.

Any suggestions/help with this would be greatly appreciated.

Thanks!
/Anders

trintragula

8:48 am on Jan 28, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I think it's not unusual for even the popular search engines to visit sometimes using a browser useragent, so if Baidu have taken up doing that, I'm not surprised. I only recently noticed them asking for images. I didn't think they used to do that (though I may be wrong).

Server error 500 is usually a problem with the software running on a webserver. If it's only affecting some visitors, then perhaps there's some bot-filtering or spam-filtering software installed that's malfunctioning? Just a guess. I would be looking at any server-side code you have running, and if it's not that, find out from your hosts (if appropriate) whether it might be something they are doing.

lucy24

4:25 pm on Jan 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You're not really complaining, are you? ;) Another possibility is that your host is using mod_security and returning error code 500. The user-agent kinda looks like one that my host blocks.

keyplyr

9:03 pm on Jan 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



... your host is using mod_security and returning error code 500. The user-agent kinda looks like one that my host blocks.

I agree. The server may be blocking this IP, or this UA or something in the request header.

Server error 500 is usually a problem with the software running on a webserver.

At the admin level any server response code can be used to describe any other server response. Even at the account level, some of this can be done via htaccess if the admin rules allow it, so a 500 doesn't necessarily mean that the server actually had an error performing the task, only that "500" was used to describe it.
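To illustrate that point, here is a minimal sketch of a filtering layer that simply chooses which status code to report for a blocked request. It is not how mod_security or any particular host is configured; the block lists, the function name `response_status` and the `deny_status` parameter are all hypothetical and made up for this example.

```python
# Hypothetical block lists; the real rules on any given host are unknown.
BLOCKED_IP_PREFIXES = ("180.76.4.",)
BLOCKED_AGENT_FRAGMENTS = ("MSIE 7.0; Windows NT 6.0",)


def response_status(ip, user_agent, deny_status=500):
    """Return the status code a filtering layer would report for a request.

    Nothing has to fail for deny_status to be sent: the number is only a
    label chosen by whoever wrote the rule (500 here, but 403 works too).
    """
    if ip.startswith(BLOCKED_IP_PREFIXES):
        return deny_status
    if any(fragment in user_agent for fragment in BLOCKED_AGENT_FRAGMENTS):
        return deny_status
    return 200


if __name__ == "__main__":
    agent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
    print(response_status("180.76.4.10", agent))
    print(response_status("203.0.113.5", "Mozilla/5.0"))
```

Seen from the access log, a request blocked this way is indistinguishable from a genuine server fault, which is why the host is the one to ask.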

Personally I block anything from Baidu. IMO they are a predatory company that ignores web standards and will scrape & sell your property for their own gain. They also have a bad side.

Anders

1:22 pm on Jan 29, 2015 (gmt 0)

10+ Year Member



Thank you so much!

I will contact my host and see what they know about it.

...I don't know if this is relevant but here is more info:

At first I detected this from seeing "File does not exist" on 500.shtml in
cPanel "Last 300 Error Log messages".
When analyzing the log file there were no 500 errors but instead 404s on the start page ("/") (only from the Baidu IP range/user agent specified above).

Then I created 500.shtml and now I don't see any trace of this in cPanel "Last 300 Error Log messages", but in the log file the 404s on the index page are gone and instead there is the 500 error specified above.

/Anders

lucy24

6:49 pm on Jan 29, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ugh. Is there any way you can view the real, raw access logs? It doesn't sound as if cPanel is being helpful.

What you're seeing is this:
-- robot requests a page and receives some intentional error or other: 503, 403, 418, what have you. The request and response are duly recorded in access logs. The response may also be recorded in error logs, depending on selected log level. (On shared hosting, you cannot change this.)
-- to go with the numerical error code, server makes an internal request for the specified Error Document. If it can't find it in the expected location, there will be a supplemental "not found" error referring not to the original request but to the error document itself. This, too, will be recorded in error logs-- but not in access logs, because it's internal.

So if a request receives a 500 response and you don't have a 500 document, there will be two separate errors.
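If you can get at the raw access log, something like the following would show what the server actually answered for that IP range. It is only a sketch: it assumes the log is in Apache "combined" format (your host may use something else), the function name `summarize` is made up, and the sample lines are invented for illustration rather than taken from a real log.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption; the host may log differently).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines, ip_prefix="180.76.4."):
    """Count (status, request, user agent) combinations for one IP range."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("ip").startswith(ip_prefix):
            key = (match.group("status"), match.group("request"), match.group("agent"))
            counts[key] += 1
    return counts


# Hypothetical sample lines, invented for illustration only.
SAMPLE = [
    '180.76.4.10 - - [26/Jan/2015:21:02:11 +0000] "GET / HTTP/1.1" 500 0 "-" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"',
    '180.76.4.22 - - [26/Jan/2015:23:31:40 +0000] "GET / HTTP/1.1" 500 0 "-" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"',
    '203.0.113.5 - - [26/Jan/2015:23:35:02 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0"',
]

if __name__ == "__main__":
    # For a real log, pass an open file instead of SAMPLE.
    for (status, request, agent), n in summarize(SAMPLE).most_common():
        print(n, status, request, agent)
```

Only the original request will show up here; the internal lookup for the error document lives in the error log, so the two have to be read side by side.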