Hundreds of errors are showing up in my Google Sitemaps panel, caused by links from other websites, namely:

- from some directories that use text snippets from my website and truncate the URL, producing a 404 error;
- and from a malicious-looking archive with file extensions .arc.xml.gz, containing links to multiple pages of my website where a spurious /directory/ is added to the path, again causing a 404.

This archive is accessible on the web and appears to be a shared heap of downloadable material; it has no front page or indeed any viewable web page. The issue has only become apparent now that Google Sitemaps has started showing the pages producing the 404 errors.

What harm, if any, is coming to my website from all these 404s? Is there anything I can do in my robots.txt or .htaccess file to stop these spam referrers (I already have a custom 404.html in place)? Thanks for any advice!
Clearly a site is not responsible when other sites link to malformed URLs, and the proper response to such requests is a 404 error - which is what you're already doing.
"Is there anything I can possibly do in my robots or .htaccess file to stop these spam referrers?"
Generally, the three options for such links are:
- Block them with robots exclusion
- Redirect them where they're supposed to go
- Let them return a 404
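For reference, the robots-exclusion option is just a Disallow rule in robots.txt. This is a minimal sketch; /directory/ here stands in for whatever spurious path segment the bad links actually inject:

```
# robots.txt - minimal sketch; "/directory/" is a placeholder
# for the spurious path segment the bad inbound links inject
User-agent: *
Disallow: /directory/
```

Note that Disallow matches by path prefix, so this keeps compliant crawlers away from every URL under /directory/ - it does nothing about the links themselves or about non-robot traffic.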
I can't see any good reason to robots-exclude them. If they are malformed requests for obviously non-existent URLs, I would let them generate a 404 error. If it's clear from the request which page was intended, 301 redirect to that page; e.g. www.example.com/category/pag could redirect to www.example.com/category/page.
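If you do choose to redirect, a sketch of both cases in .htaccess might look like the following, using the mod_alias directives Redirect and RedirectMatch. The patterns are illustrative stand-ins for the actual malformed URLs, not a definitive rule set:

```apache
# .htaccess sketch - patterns are hypothetical stand-ins

# Strip a spurious /directory/ segment injected by bad inbound links,
# sending visitors to the real page with a permanent (301) redirect
RedirectMatch 301 ^/directory/(.*)$ /$1

# Map a known truncated URL to the page it was meant for
Redirect 301 /category/pag /category/page
```

These are mod_alias directives, so no RewriteEngine is needed; you can verify the 301 status and Location header with something like curl -I http://www.example.com/category/pag.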