Instead of a 404, the site returns 200 (or, in the case of a recent test, 301 since the page was in cache).
Google WMT is complaining about this. Will this affect the SERP rating in any way? I don't see why it should, as it's for the customer's benefit, not google's. I can falsify the return for google if necessary, but I'm curious as to whether I need to.
(To avoid confusion, I believe that the "301" mentioned in the initial post was actually a 304-Not Modified, based on the "already in cache" qualification in that sentence.)
In fact it IS a 301. I posted in a bit of a hurry: it should NOT be a 200 despite what google says it's seeing.
The 404 handler does its best to track down the requested file (replacing extensions mainly) and if that fails it goes for the home page. This technique was always good enough before but now google seems to be second-guessing what a 404 should be (presumably sending stupid URLs to see what comes back).
Thinking back to my earlier "stupid me" response, there is only a canonical issue if google's robots request invalid URLs. On this site I KNOW there aren't any removed pages in their index 'cause we haven't removed any, so they should not be concerned about such issues.
When I notice an issue I fix it and move on
That is the problem needing fixing. Fix it and my suggestion is moot. Which was actually the point I was making. In fact, the OP in his previous post says:
there is only a canonical issue if google's robots request invalid URLs. On this site I KNOW there aren't any removed pages in their index 'cause we haven't removed any, so they should not be concerned about such issues
I was trying to point out that 404s are supposed to be served when a resource does not exist. Instead, every conceivable URL is returning a duplicate content 200.
I fully agree that the problem should be fixed and the OP move on. I was pointing out the problem should he just carry on as is.
If the latter, then that may well cause you problems. Search engines sometimes request random made-up URLs to test your 404 response. If those redirect, that may well confuse them.
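For anyone wanting to check their own site the same way, here is a rough Python sketch of such a probe. It is not how any search engine actually does it; `random_probe_path`, `classify` and `probe` are made-up names, and the probe assumes the site answers over HTTPS. `http.client` does not follow redirects, so the status reported is the first one the server sends.

```python
import http.client
import uuid


def random_probe_path():
    """Build a path that almost certainly does not exist on the site."""
    return "/" + uuid.uuid4().hex + ".html"


def classify(status):
    """Label the first status code returned for a made-up URL."""
    if status in (404, 410):
        return "ok"          # the server admits the page does not exist
    if 300 <= status < 400:
        return "redirect"    # e.g. a 301 to the home page
    if status == 200:
        return "soft-404"    # content served as if the page were real
    return "other"


def probe(host):
    """Request one made-up URL from host and classify the response."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", random_probe_path())
        return classify(conn.getresponse().status)
    finally:
        conn.close()
```

Calling `probe("www.example.com")` (a placeholder host) should come back as "ok" on a site that serves real 404s; "redirect" or "soft-404" is the situation being discussed in this thread.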
The fact that they are interpreting 301 as 200 is the interesting part. If they didn't send duff URLs in the first place it wouldn't do that anyway: it's for humans not dumb machines.
suzukik - as noted, it was actually a 301 not a 200, although the 200 obviously followed on from the successful 301.
Not sure why google thinks it's confusing to punters, since the punter has (generally) mis-typed a page name and gets the site s/he wants with a menu from which they can choose the correct page. Helpful rather than confusing, since otherwise they'd probably get the basic "That didn't work, what did you do wrong?" type of message, which IS confusing (which bit did I get wrong) and, to me, also VERY annoying.
What I read from your google URL is that google thinks it's confusing to their robot. So stop sending duff URLs.
It is entirely probable that the webmaster who sets up a 301 redirect to the home page for a duff page request is the kind of person who will ensure removed pages are treated with appropriate redirects, hence helping the visitor. People who do not do that almost certainly don't have a clue about setting up 301's in the first place so they just issue 404s.
You can show whatever content you want for that request, but the HTTP Status Code should be 404.
That is, there should not be a 3xx-numbered redirect returned for such a request.
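A minimal sketch of that idea in Python, not the OP's actual handler: `KNOWN_PAGES`, `ALTERNATE_EXTENSIONS`, `resolve` and `not_found_body` are all hypothetical names. It assumes that a request which matches a real page under a different extension may still be redirected with a 301, while anything that matches nothing gets a 404 status alongside a helpful body instead of a redirect to the home page.

```python
import posixpath

# Hypothetical list of pages that really exist on the site.
KNOWN_PAGES = {"/", "/index.html", "/products.html", "/contact.php"}
# Hypothetical extensions the handler is willing to try.
ALTERNATE_EXTENSIONS = (".html", ".htm", ".php")


def resolve(path):
    """Return (status, redirect_target) for a requested path."""
    if path in KNOWN_PAGES:
        return 200, None
    # Try the same file name with each alternate extension.
    stem, _ext = posixpath.splitext(path)
    for ext in ALTERNATE_EXTENSIONS:
        candidate = stem + ext
        if candidate in KNOWN_PAGES:
            return 301, candidate  # a real page exists, so redirect to it
    # Nothing matched: report 404 rather than redirecting to the home page.
    return 404, None


def not_found_body():
    """Helpful content to send with the 404 status: links to real pages."""
    links = "\n".join(
        f'<li><a href="{p}">{p}</a></li>' for p in sorted(KNOWN_PAGES)
    )
    return f"<h1>Page not found</h1>\n<ul>\n{links}\n</ul>"
```

The visitor who mistypes a page name still gets the menu of real pages, but the status line tells user agents that the URL itself does not exist.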
If you play fast and loose with the HTTP specs, don't be surprised if user agents that do follow the specs get confused by your site and take whatever damage control (damage to their systems in terms of avoiding sites that appear to be bot traps or have infinite duplicate content) they feel like taking.
If they didn't send duff URLs in the first place it wouldn't do that anyway: it's for humans not dumb machines.
Humans get urls wrong too. And as g1smd says, Google will run forms and request urls to see what happens. Google will only index a certain amount of content from a site and seems happy to fill up your quota with non-existent pages with dupe/no content at the expense of proper content.
I've always fixed this as part of a site overhaul/optimisation so I can't say 100% that in itself it affects rankings (i.e. cleaned up non-existent pages in isolation and watched rankings improve), but my gut feeling is that it doesn't help the site's overall profile and if you don't have enough IBLs/PR to get all your pages indexed then you are definitely missing out on traffic.
It's the perfect use of a well designed site map page. And your visitor sees one more of your site's well designed pages (and ads).
Then your user can "text search" your site map page if need be to find the page they truly want.
I can't say SERPS has ever been a problem with this technique.
The design of the 404 is ancient and was set up following advice elsewhere, back when google was a twinkle in the Creator's Eye. It's well overdue for a redesign. Trouble is, some of the sites are equally ancient and their owners ain't paying maintenance for 'em. They're a tight-wad lot, customers. :(