Forum Moderators: phranque


Many server errors (5xx) in Search Console


Tehuti

4:53 pm on Apr 15, 2022 (gmt 0)

10+ Year Member Top Contributors Of The Month



In the 'Coverage' section of Search Console, I can see 244 errors. Most are of the type 'Server error (5xx)'. Clicking through to the details of these errors shows the affected URLs, and most are search pages. It looks like a spam bot is running searches on my site and inserting emojis and spam URLs into the search field, and Google then thinks these are actual pages on my site!

The URLs look like this:

mysite.com/search/emoji+spam+rude+words+emoji+spamURL+emoji+BEST+DATING+SITE+marriage+/feed/rss2

Nearly all end with 'rss2'. Most are really long and packed with emojis and spam links.

Why is Google Search Console picking these up as server errors? Can I fix it by adding something to my htaccess file? Forgive my ignorance, and hopefully this is the right forum as it's a server issue!

lucy24

5:13 pm on Apr 15, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wasn't there a very similar question just a few weeks ago? Google is reporting 500-class errors because it is receiving 500-class responses, so you are right to post in Apache rather than in Google.

Are the errors actually 500 or 503, or something else again? 500 is generally a bona fide error; 503 might be returned intentionally. If it’s the latter, a different response would probably be appropriate. Requests with bogus queries should be getting a 400-class response: most likely 404 (“huh? there’s nothing like that on this site”) or 403 (“begone, foul spammer”) depending on how grumpy you feel. But only your server administrator can figure out what’s going on.
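If you don’t know what the server is actually sending, ask it. A quick check from the command line (the URL here is a placeholder; paste in one of the offenders from the GSC report):

    curl -s -o /dev/null -w "%{http_code}\n" "https://www.example.com/search/spammy+string/feed/rss2"

That prints nothing but the status code, so you’ll know whether you’re dealing with 500, 503, or something else again.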

A possibly unrelated question is where G is getting these URLs in the first place. Sometimes GSC says where they found out about a given URL; sometimes they don’t. I can only think of two situations where G itself spontaneously requests nonexistent URLs: if the site has /directory/ they will periodically ask for /directory without trailing slash; and if they’re testing a site’s handling of 404s they will ask for a string of, I think, 16 arbitrary letters with .html at the end. They shouldn't be requesting random garbage.

Fortunately the question involves G, not bing. The latter does tend to get its database garbled, attaching one site’s paths to a different site’s hostname. When that happens, there's nothing to do but ignore them until they get it sorted.

robzilla

9:03 pm on Apr 15, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Probably best to block crawler access to /search/ through your robots.txt file.
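Assuming the search pages all live under /search/, as in the URL quoted above, something like:

    User-agent: *
    Disallow: /search/

That only stops well-behaved crawlers from requesting those pages, of course; it doesn't fix whatever is breaking.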

As noted, any malformed or non-existent URL should preferably return something like a 404.

To fix the 5xx error, I'd look to the search engine script(s) rather than Apache/htaccess. Check your error logs to see what is going wrong there to produce a 5xx.
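For example (the log path is a guess; it varies by host and distribution):

    tail -n 500 /var/log/apache2/error.log | grep -i search

Whatever script error shows up alongside those requests is the thing to fix.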

phranque

10:05 pm on Apr 15, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



whenever there is a 5XX response there will be a corresponding entry in the web server error log file.

i would look in the access log files for some of these 5XX responses that have been served to googlebot and find the corresponding reported error(s) in the error log files.

it may also be informative to look in the web server access log files for visitors other than googlebot that are requesting these same urls.
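something like this, for instance - assuming a combined log format where the status code is the ninth field, and a debian-style log location (both are assumptions):

    awk '$9 ~ /^5/ && tolower($0) ~ /googlebot/' /var/log/apache2/access.log

the timestamps on the matching lines should lead you to the corresponding entries in the error log.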

Why is Google Search Console picking these up as server errors? Can I fix it by adding something to my htaccess file?

what happens when you request these urls?
if you also get a 5XX response, then that's why.
perhaps your question is "why is googlebot requesting these urls?"

you can't fix the 5XX in .htaccess - you have to fix the underlying problem.
you could possibly configure .htaccess to send a different response (4XX?) for these requests, if appropriate...
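for example, since the reported urls all seem to end in /feed/rss2 under /search/, an untested mod_rewrite sketch for the root .htaccess (adjust the pattern to your actual urls):

    RewriteEngine On
    # spammy search urls ending in /feed/rss2: answer 404 instead of letting the script blow up
    RewriteRule ^search/.+/feed/rss2$ - [R=404]

note that this only changes the response code - the underlying search script is still broken.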

lucy24

1:07 am on Apr 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



you can't fix the 5XX in .htaccess - you have to fix the underlying problem.
you could possibly configure .htaccess to send a different response (4XX?) for these requests, if appropriate...

Continuing this line of thought: If you know that certain types of requests cause the search tool to throw fits, you could intercept them in htaccess and return a manual 404 or 403 (or, heck, 418), whatever response you choose. But this is best seen as a stopgap measure while you figure out the problem with the search tool itself. (Analogy du jour: You’re putting up a secure barrier at the top of a broken staircase. You still need to fix the stairs; you’re just ensuring that nobody fractures a limb in the meantime.)
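A sketch of that barrier, with the caveat that the pattern is a guess based on the URLs quoted above (mod_rewrite matches against the decoded path, so a percent-encoded spam link comes out as plain http://):

    RewriteEngine On
    # a search string that embeds another URL is a reliable spam signal: forbid it
    RewriteRule ^search/.*https?:// - [F]

The [F] flag returns a 403; swap in [R=404] if you’re feeling less grumpy.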

dstiles

8:10 am on Apr 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I regularly get Russian-language (Cyrillic) page strings. Some are apparently from normal visitors and hit a 404. Others come in from G and B and are likewise rejected with a 404.

I have translated a few of these hits and found they are probably film or TV titles or similar - in other words, advertising URLs. Although I've not traced them, I believe they must be listed as links on a website somewhere that G and B can find; otherwise I can't see how the SEs would know to try them.

It occurs to me the OP may find the hits similarly listed.

Edit: I had one this morning. It translates as "new series-2020-season - How to Get Away with Murder". There are several YouTube links.