Best thing for soft 404s is a hard 404. If the pages don't exist and were not replaced with another, a 404 is the right response.
|not2easy: Best thing for soft 404s is a hard 404. If the pages don't exist and were not replaced with another, a 404 is the right response. |
That's the thing. This was a site migration from one CMS to another. The pages still exist. It's just that the old CMS threw out lots and lots of URLs that had unique identifiers added to the end of them. (Thank god for canonical tags.)
Is there any real benefit in redirecting the URLs we've found that do not have any backlinks? (which is the majority of the soft 404s found in WMT)
When I encounter soft 404s, I fix them, whether there are ten or ten thousand.
If the old pages were indexed, they should have been redirected to the new URLs. That will fix all the soft 404s, which Google regards as a poor user experience. Depending on your old vs. new URL structure, you may be able to fix that with a URL rewrite, using the applicable docs for your server and CMS. If it is something like Drupal or WP, look for a plugin to handle it.
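To make the rewrite idea concrete, here is a minimal Python sketch of mapping old URLs to new ones by stripping a trailing identifier. The pattern is purely an assumption for illustration (a hyphen followed by six or more digits at the end of the path); the real old-CMS pattern is not given in this thread, and `new_path_for` is a hypothetical helper name.

```python
import re
from typing import Optional

# ASSUMPTION: old URLs end in "-" plus six or more digits.
# Replace this with whatever pattern the old CMS actually used.
TRAILING_ID = re.compile(r"-\d{6,}$")


def new_path_for(old_path: str) -> Optional[str]:
    """Return the redirect target for an old path, or None if no redirect is needed."""
    new_path = TRAILING_ID.sub("", old_path)
    return new_path if new_path != old_path else None


if __name__ == "__main__":
    for path in ["/products/blue-widget-123456", "/products/blue-widget"]:
        print(path, "->", new_path_for(path))
```

The resulting old-to-new pairs could then be fed into whatever redirect mechanism your server or CMS provides.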
|Is there any real benefit in redirecting the URLs we've found that do not have any backlinks? (which is the majority of the soft 404s found in WMT) |
Google doesn't usually start throwing around "soft 404" accusations just because one set of parameters redirects to a different set of parameters (or to none). Fine-tooth-comb your logs and you'll see periodic requests for "qjeklrj.html" or similar-- the kind of URL you'd get if the cat walked across the keyboard. This is Google checking whether your site is capable of giving out 404s at all.* If a garbage request leads to a 200, there's a problem and you should fix it.
* Personal experience suggests that this is automatically triggered any time a certain proportion of requests leads to a 301-- regardless of what's at the other end of the 301.
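You can run the same kind of check yourself. Below is a rough Python sketch, not a definitive tool: it builds a random path that almost certainly doesn't exist and interprets the status code that comes back. The function names and the labels they return are made up for illustration; `fetch_status_http` uses the standard library's `http.client`, which does not follow redirects, so you see the raw status.

```python
import http.client
import secrets
from typing import Callable


def random_probe_path(length: int = 12) -> str:
    """Build a path that almost certainly does not exist on the site."""
    return "/" + secrets.token_hex(length // 2) + ".html"


def classify_probe(status: int) -> str:
    """Interpret the status code returned for a deliberately bogus URL."""
    if status in (404, 410):
        return "ok"
    if status == 200:
        return "soft-404 risk"
    if status in (301, 302, 303, 307, 308):
        return "redirected; check the target"
    return "inconclusive"


def probe_site(fetch_status: Callable[[str], int]) -> str:
    """Request one random path through the supplied fetcher and classify the result."""
    return classify_probe(fetch_status(random_probe_path()))


def fetch_status_http(host: str, path: str) -> int:
    """Fetch the raw status code for host + path over HTTPS (no redirect following)."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Against a live site you would call something like `probe_site(lambda p: fetch_status_http("www.example.com", p))`, with your own hostname in place of the placeholder. Anything other than "ok" is worth a closer look.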
|not2easy: If the old pages were indexed, they should have been redirected to the new URLs. That will fix all the soft 404s, which Google regards as a poor user experience. Depending on your old vs. new URL structure, you may be able to fix that with a URL rewrite, using the applicable docs for your server and CMS. If it is something like Drupal or WP, look for a plugin to handle it. |
We are running on Drupal. Any recommendations for a plugin that would be able to do that?
@lucy24: Much appreciated. Thanks for the insight.
Same here, we also redesigned our Drupal site and now google-WT is showing 404's in the thousands. Any recommendation would be appreciated.
You may need to edit your .htaccess file manually. But that's a question for the Apache and/or Content Management subforum.
|google-WT is showing 404's in the thousands |
Actual 404s or "soft 404"s? A real 404 isn't usually a problem-- UNLESS the search engine asked for the page because you yourself have a link to it elsewhere on the site. Then they start blathering about "technical quality".
vlexo and born2run
Properly configured, Drupal should return a proper 404 unless it is very old (like Drupal 5).
If you are getting thousands of hard or soft 404s, it is because you have some type of config problem. I have no idea what or why. Out of the box, Drupal should return a standard 404 for any page that isn't found. Check it with Live HTTP Headers or some similar header-checking tool.
First question: What are these URLs? Do they resemble valid URLs? Are they related to pagination, date stamps, search parameters or anything like that?
A note about URLs.
If you have something that generates valid "native" URLs (like "node/15") but then appends a parameter, you will get the same result as the page with no parameter. For example
This is functionally the equivalent of adding a GET query string to any URL, such as
Now if you are using a Drupal URL alias (set manually or via pathauto or what have you) then you cannot append random stuff and have a valid URL, because in that case it is doing a DB lookup for the entire page and it will not find it.
So if I have
will return a 404 unless such a page exists.
If you are set up with pages using valid URLs with pagers as GET query strings, then you can also end up with thousands of 404s. Again
Are all valid ad infinitum. That's not Drupal-specific. In other words, it's probably not the handling of 404s that is the problem, but the generation of bogus URLs that is the problem.
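To see how many crawled URLs collapse onto the same underlying page, you could group them by everything except the query string. This is only a sketch using Python's standard `urllib.parse`; the example.com URLs are hypothetical, not taken from anyone's logs, and `strip_query` and `group_by_page` are made-up helper names.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_by_page(urls):
    """Map each query-less URL to the list of crawled variants that share it."""
    groups = {}
    for url in urls:
        groups.setdefault(strip_query(url), []).append(url)
    return groups


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    crawled = [
        "https://example.com/news?page=1",
        "https://example.com/news?page=9999",
        "https://example.com/news?sid=abc123",
        "https://example.com/about",
    ]
    for page, variants in group_by_page(crawled).items():
        print(page, len(variants))
```

A page with a large number of variants is a good place to start looking for whatever is generating the extra parameters.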
Special note about Views
Views behaves like the rest of Drupal. If you are getting Views pages that are causing 404s, however, you can set your Contextual Filters to serve a 404 if argument validation fails. Under "More" you can also set a filter to return a 404 if there are *more* arguments than required.
If you don't have a contextual filter, you can use the Global:Null filter which does pretty much nothing except let you set these options.
That said, you may simply be masking a problem (the problem being that you are generating URLs with extra parameters).
As a side note, there are things you can do to improve 404 handling in Drupal. None of these are oriented toward fixing your problem, because that shouldn't be happening, period.
- Make sure you have created and set custom 403 and 404 pages in your site settings. In D7 this is Configuration -> System -> Site Information (at admin/config/system/site-information).
- Fast 404: serve a 404 without bootstrapping the whole system
- Search 404: attempt a search based on keywords in the URL
- Global Redirect: not so much for handling 404s as for a number of things like 301-redirecting URLs that use the page ID to the friendly alias. Should be on all Drupal sites.
Hi Ergophobe, thanks much for the detailed reply!
I have a small related question: I have a Drupal CMS running and there are some links that are set via Drupal. How do I 301 redirect these links to the new updated links?
Can I do it via htaccess or via Drupal?
Your help is appreciated much!
Kip - you can do it either with .htaccess or with the redirect module
Thanks ergophobe. I shall try htaccess first, then drupal.
One nice thing about the redirect module is if you are logged in as admin and you go to a page that is 404, it will ask you right there if you want to add a 301 redirect.
Obviously .htaccess is more efficient (and httpd.conf is even more efficient), but if it's not a high-traffic page, the redirect module has its uses.
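If you go the .htaccess route with more than a handful of URLs, you can generate the lines rather than type them. Here is a small Python sketch that turns an old-to-new mapping into mod_alias `Redirect 301` lines. The paths are hypothetical and `htaccess_redirects` is a made-up helper name; it only handles simple site-relative paths without spaces.

```python
def htaccess_redirects(mapping):
    """Render an {old_path: new_path} mapping as mod_alias 'Redirect 301' lines."""
    lines = []
    for old_path, new_path in sorted(mapping.items()):
        # Both sides must be site-relative paths for this simple sketch.
        if not old_path.startswith("/") or not new_path.startswith("/"):
            raise ValueError(f"paths must start with '/': {old_path!r} -> {new_path!r}")
        # Whitespace would split the directive into the wrong arguments.
        if any(ch.isspace() for ch in old_path + new_path):
            raise ValueError(f"paths must not contain whitespace: {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_path}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    print(htaccess_redirects({
        "/old-about": "/about",
        "/node/15": "/news/launch",
    }))
```

Keep in mind that `Redirect` matches by path prefix, so a rule for /node/15 would also catch /node/15/edit. If you need exact matches, `RedirectMatch` with an anchored pattern is probably the better fit.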
Yes, .htaccess worked, but I'm still facing another .htaccess question, which I asked in the Apache forum; no solution yet.