Forum Moderators: Robert Charlton & goodroi
mysite.com/suburl1/suburl2/index.ht
mysite.com/suburl1/suburl2/....
mysite.com/suburl/….
mysite.com/…. (this last shown as mysite.com/…./suburl2/index.html in Google's SERP)
[...] (<cite>, <div> and <span>) that appear at the end of the search result snippet and reading them as bad URLs.
tedster wrote:
If the URL actually returns a 404 status in the HTTP header, and it's supposed to be 404, then why is this a "nightmare"?
If Marfola is correct when he says that the links to his pages in the Google SERPs have truncated URLs, then people who try to click through to his pages from the SERPs will get 404s. That would be a nightmare.
It often means that I redirect to a 404
I only allow [a-zA-Z0-9\-\_\.\/] in my file names.
Never redirect to a 404. The 404 status must be returned at the originally requested URL.
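To illustrate the point about not redirecting, here is a minimal Python sketch of a handler that answers an unknown path with a 404 at the requested URL itself, with no 3xx hop and no Location header. The names `KNOWN_PATHS`, `app` and `fetch`, and the example paths, are hypothetical and not taken from the thread; a real site would do the equivalent in its own server configuration.

```python
# Hypothetical set of URLs that really exist on the site.
KNOWN_PATHS = {"/", "/articles/page.html"}


def app(environ, start_response):
    """Tiny WSGI app: 200 for known paths, 404 in place for everything else."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PATHS:
        status, body = "200 OK", b"OK\n"
    else:
        # The 404 is returned for the URL that was requested.
        # No redirect to an error page, so no Location header is sent.
        status, body = "404 Not Found", b"Not Found\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def fetch(path):
    """Call the app directly (no network) and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    for p in ("/articles/page.html", "/articles/...."):
        status, headers, _ = fetch(p)
        print(p, "->", status, "| Location header present:", "Location" in headers)
```

The thing to look for when checking a live site is the same: the first response to the bad URL should carry the 404 status, not a 301 or 302 that leads to an error page.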
You have way too much escaping. Use [a-zA-Z0-9/._-] or, better still, use [a-z0-9/._-] with the [NC] flag. Keep the hyphen last in the class so that it is read as a literal hyphen rather than as a range.
Any literal character except a-zA-Z0-9 should be escaped in regular expressions.
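One detail worth checking in character classes like these is where the hyphen sits. Between two characters it defines a range, so a class containing `.-_` covers everything from `.` to `_` in ASCII order (which includes `:`, `=` and `?`) and does not include the hyphen itself. The Python sketch below compares the two spellings; `re.IGNORECASE` is used here as a rough stand-in for the [NC] flag, and the sample paths are made up for illustration.

```python
import re

# Hyphen between '.' and '_' forms a range from '.' (0x2E) to '_' (0x5F).
RANGE_CLASS = re.compile(r"[a-z0-9/.-_]+", re.IGNORECASE)

# Hyphen placed last is a literal hyphen; no escaping is needed.
LITERAL_CLASS = re.compile(r"[a-z0-9/._-]+", re.IGNORECASE)


def is_allowed(pattern, path):
    """Return True if every character of path is in the pattern's class."""
    return pattern.fullmatch(path) is not None


if __name__ == "__main__":
    samples = ["/suburl1/index.html", "/my-page.html", "/page?id=1", "/a:b"]
    for s in samples:
        print(s, "| range class:", is_allowed(RANGE_CLASS, s),
              "| literal class:", is_allowed(LITERAL_CLASS, s))
```

With the hyphen in the middle, `/my-page.html` is rejected while `/page?id=1` and `/a:b` are accepted; with the hyphen last, the results are the other way round.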
There is a type of auto-generated spam, in which webpages are created from scraped copies of Google SERPs. Is it possible that this is what is happening?
I don't think it will cause any problems for your rankings to return 404s for these.
You'll never see a real visitor requesting these malformed URLs, only Googlebot.
These incoming links are from auto-generated spam, webpages created from scraped copies of Google SERPs.
One question: should 400 errors return a custom error page or the standard error page?