| 7:34 pm on Mar 17, 2009 (gmt 0)|
This practice, over time, can destroy your site's rankings. You are essentially telling Google to index the same content for an infinite number of URLs.
You can return a custom error page that is friendly to your visitors and still send a 404 in the server's http header.
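The idea above — a visitor-friendly body paired with an honest 404 status — can be sketched in a few lines. This is a minimal illustration, not anyone's actual setup: the paths and HTML bodies are invented, and `respond` stands in for whatever routing your server or framework does.

```python
# Sketch: serve a friendly error page while still sending a 404 status.
# The paths and HTML bodies here are invented for illustration.
VALID_PATHS = {"/", "/about.html", "/products.html"}

def respond(path):
    """Map a request path to (http_status, html_body)."""
    if path in VALID_PATHS:
        return 200, "<html><body>Real page content</body></html>"
    # Visitor-friendly body, but the status line still says 404,
    # so search engines know this URL does not exist.
    return 404, ("<html><body><h1>Page not found</h1>"
                 "<p>Try the <a href='/'>home page</a> or the menu.</p>"
                 "</body></html>")
```

However this is plugged into a real server, the point is simply that the body and the status code are chosen independently: the human sees a helpful page, the crawler sees a 404.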
| 11:23 pm on Mar 17, 2009 (gmt 0)|
Stupid of me! I'd forgotten the canonical aspect. Thanks.
| 11:59 pm on Mar 17, 2009 (gmt 0)|
I've found that Google's custom 404 widget is actually pretty good at suggesting the proper URL in the case of file name misspellings.
| 1:31 am on Mar 18, 2009 (gmt 0)|
Sticking with strict server response code semantics, you can return either a 404-Not Found or a 301-Moved Permanently redirect to the correct URL in response to a mis-spelled type-in URL.
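Those strict semantics amount to a three-way decision. A sketch, with a hypothetical table of known misspellings (the paths and the typo map are made up; in practice you could only 301 the misspellings you had anticipated):

```python
VALID_PATHS = {"/widgets.html", "/about.html"}
# Hypothetical table of known misspellings -> correct URLs.
KNOWN_TYPOS = {"/wigets.html": "/widgets.html"}

def respond(path):
    """Return (status, location) following strict HTTP semantics:
    200 for a real page, 301 when the intended URL is known,
    404 when it is not."""
    if path in VALID_PATHS:
        return 200, path
    if path in KNOWN_TYPOS:
        return 301, KNOWN_TYPOS[path]   # Moved Permanently to the fix
    return 404, None                    # honestly Not Found
```

Anything not covered by the typo table gets a plain 404 rather than a blanket redirect.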
(To avoid confusion, I believe that the "301" mentioned in the initial post was actually a 304-Not Modified, based on the "already in cache" qualification in that sentence.)
| 2:32 am on Mar 18, 2009 (gmt 0)|
You can only return the correct page if you can pre-guess all the spelling combinations. :)
In fact it IS a 301. I posted in a bit of a hurry: it should NOT be a 200 despite what google says it's seeing.
The 404 handler does its best to track down the requested file (replacing extensions mainly) and if that fails it goes for the home page. This technique was always good enough before but now google seems to be second-guessing what a 404 should be (presumably sending stupid URLs to see what comes back).
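A handler like the one described — try other extensions, then fall back to the home page — might look roughly like this. The extension list and the idea of passing in the set of existing site paths are assumptions for the sketch, not the OP's actual code:

```python
def find_alternative(path, existing_paths):
    """Given a missing path, try the same name with other common
    extensions; fall back to the home page if nothing matches.
    existing_paths is the set of URLs that really exist on the site."""
    stem = path.rsplit(".", 1)[0]
    for ext in (".html", ".htm", ".php"):
        candidate = stem + ext
        if candidate in existing_paths:
            return candidate
    return "/"  # last resort: the home page
```

Whether the fallback is then served with a 301, a 200, or as the body of a 404 is exactly the question this thread is arguing about.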
Thinking back to my earlier "stupid me" response, there is only a canonical issue if google's robots request invalid URLs. On this site I KNOW there aren't any removed pages in their index 'cause we haven't removed any, so they should not be concerned about such issues.
| 9:20 am on Mar 18, 2009 (gmt 0)|
Well, if I were your competitor, I would be pointing as many incorrect links at your site as I could. That many duplicate pages (literally as many as I could get 'discovered') would make you look spammy and kill all those 'hidden' ranking factors like Trust and Authority.
| 12:13 pm on Mar 18, 2009 (gmt 0)|
I'd like you as a competitor, while you scheme and plot against me I ignore you and work on great content (hint hint).
When I notice an issue I fix it and move on, your efforts would then be wasted. :-)
| 12:25 pm on Mar 18, 2009 (gmt 0)|
|When I notice an issue I fix it and move on |
Yep, but the problem is that any invalid URL returns an identical page with a 200 header status.
That is the problem needing fixing. Fix it and my suggestion is moot, which was actually the point I was making. In fact, the OP in his previous post says:
|There is only a canonical issue if google's robots request invalid URLs. On this site I KNOW there aren't any removed pages in their index 'cause we haven't removed any, so they should not be concerned about such issues |
I was trying to point out that a 404 is supposed to be served when a resource does not exist. Instead, every conceivable URL is returning duplicate content with a 200.
I fully agree that the problem should be fixed and the OP move on. I was pointing out the problem should he just carry on as is.
| 9:14 pm on Mar 18, 2009 (gmt 0)|
Contrary to what I said in the thread's title, it's 301 not 200, as I noted here yesterday, so any dead links are redirected permanently to the home page.
| 10:22 pm on Mar 18, 2009 (gmt 0)|
Does it do that for "only links that used to work but no longer do", or for "all URLs that do not exist, even if they have never existed"?
If the latter, then that may well cause you problems. Search engines sometimes request random made-up URLs to test your 404 response. If those redirect, that may well confuse them.
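One way to see exactly what such a probe sees is to request a made-up URL yourself and report the raw status code without following redirects. A sketch using only Python's standard library (the URL you test against is up to you):

```python
import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report 3xx responses instead of silently following them."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # makes the 3xx surface as an HTTPError

def status_for(url):
    """Return the raw HTTP status code for url, redirects included."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        return opener.open(url, timeout=10).status
    except urllib.error.HTTPError as e:
        return e.code
```

On a spec-following site, a request for a never-existed URL should come back 404; a 301 here is the blanket-redirect behaviour being discussed in this thread.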
| 7:11 pm on Mar 20, 2009 (gmt 0)|
|Instead of a 404 the site returns 200 |
They are called "soft 404s". Google does not recommend using them because:
|they can be a confusing experience for users and search engines. |
| 11:22 pm on Mar 20, 2009 (gmt 0)|
g1smd - as far as I know there is no actual problem except that WMT notes that what google thinks should be a 404 returns a 200. I assume they test this using impossible URLs that they expect to return 404. Which frankly is no concern of theirs since the site is designed to retain customers not to play catch with google (yeah, I know!).
The fact that they are interpreting 301 as 200 is the interesting part. If they didn't send duff URLs in the first place it wouldn't do that anyway: it's for humans not dumb machines.
suzukik - as noted, it was actually a 301 not a 200, although the 200 obviously followed on from the successful 301.
Not sure why google thinks it's confusing to punters, since the punter has (generally) mis-typed a page name and gets the site s/he wants with a menu from which they can choose the correct page. Helpful rather than confusing, since otherwise they'd probably get the basic "That didn't work, what did you do wrong?" type of message, which IS confusing (which bit did I get wrong) and, to me, also VERY annoying.
What I read from your google URL is that google thinks it's confusing to their robot. So stop sending duff URLs.
It is entirely probable that the webmaster who sets up a 301 redirect to the home page for a duff page request is the kind of person who will ensure removed pages are treated with appropriate redirects, hence helping the visitor. People who do not do that almost certainly don't have a clue about setting up 301s in the first place, so they just issue 404s.
| 11:31 pm on Mar 20, 2009 (gmt 0)|
If the URL is incorrect, a 404 response should be returned in the HTTP header. That's in the HTTP specs.
You can show whatever content you want for that request, but the HTTP Status Code should be 404.
That is, there should not be a 3xx-numbered redirect returned for such a request.
If you play fast and loose with the HTTP specs, don't be surprised if user agents that do follow the specs get confused by your site and take whatever damage control (damage to their systems in terms of avoiding sites that appear to be bot traps or have infinite duplicate content) they feel like taking.
| 10:17 am on Mar 21, 2009 (gmt 0)|
|If they didn't send duff URLs in the first place it wouldn't do that anyway: it's for humans not dumb machines. |
Humans get URLs wrong too. And as g1smd says, Google will run forms and request URLs to see what happens. Google will only index a certain amount of content from a site, and seems happy to fill up your quota with non-existent pages with duplicate or no content at the expense of proper content.
I've always fixed this as part of a site overhaul/optimisation so I can't say 100% that in itself it affects rankings (i.e. cleaned up non-existent pages in isolation and watched rankings improve), but my gut feeling is that it doesn't help the site's overall profile and if you don't have enough IBLs/PR to get all your pages indexed then you are definitely missing out on traffic.
| 10:49 am on Mar 21, 2009 (gmt 0)|
Return a 404 in the response header and show your site map page as the content. In fact make sure you return a 404; redirecting to your sitemap page and returning a 200 is a big mistake.
It's the perfect use of a well designed site map page. And your visitor sees one more of your site's well designed pages (and ads).
Then your user can "text search" your site map page if need be to find the page they truly want.
| 9:44 pm on Mar 21, 2009 (gmt 0)|
Ok, guys. Thanks for the feedback.
I can't say SERPS has ever been a problem with this technique.
The design of the 404 is ancient and was set up following advice elsewhere, back when google was a twinkle in the Creator's Eye. It's well over-due for a redesign. Trouble is, some of the sites are equally ancient and their owners ain't paying maintenance for 'em. They're a tight-wad lot, customers. :(
| 1:07 am on Mar 22, 2009 (gmt 0)|
The canonical issue is easily resolved using the rel=canonical tag. You could also return a 404 and offer suggestions (or return search results based on the URL) on your 404 page.
| 10:58 pm on Mar 22, 2009 (gmt 0)|
The whole issue is easy even without the canonical tag. It's time that's the issue here, since a) I manage a lot of sites and b) I have no idea off-hand which have this (potential) problem and c) there seem to be far fewer hours in each day than there used to be.
| 3:03 pm on Mar 24, 2009 (gmt 0)|
|The canonical is issue is easily resolved using the rel=canonical tag |
Google don't *guarantee* that using this new tag will sort out such issues. I wouldn't rely on that myself.
| 6:51 pm on Mar 25, 2009 (gmt 0)|
For many problems the canonical tag isn't the easiest method to implement, nor is it the most effective.