Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Webmaster Tools showing 2500 soft 404s. What would you do?
vlexo




msg:4640112
 4:30 pm on Jan 25, 2014 (gmt 0)

Hey all,

Webmaster Tools is showing me 2,500 soft 404s.

What's the plan? What would you do next?

I've downloaded all of those URLs into a .txt file and uploaded it to Majestic SEO's Bulk Backlink Checker.

That tool found 97 URLs, out of the original list of 2,500, that have external backlinks of varying quality. Many of these URLs exist only because of the way the old CMS functioned: it spat out a lot of unique URLs with numbers added on the end of them. We've moved away from that CMS to a more modern one that doesn't do this. I've mapped those 97 URLs in a redirect spreadsheet so that we can retain the link equity they possess.

However, there are still 2,403 URLs showing up in WMT as soft 404s for which Majestic SEO's Bulk Backlink Checker didn't find any external links.
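Editor's sketch: the split described above (URLs with external backlinks versus the rest) can be reproduced with a few lines of Python. This is only an illustration; the function name and the example URLs are hypothetical, and it assumes both tools' exports have already been reduced to plain lists of URLs.

```python
def partition_by_backlinks(soft_404_urls, linked_urls):
    """Split reported URLs into those with and without known external backlinks.

    soft_404_urls: URLs reported as soft 404s (assumed: one URL per entry).
    linked_urls:   URLs the backlink checker reported links for (assumed format).
    """
    linked = set(linked_urls)  # set membership keeps the lookup fast
    with_links = [url for url in soft_404_urls if url in linked]
    without_links = [url for url in soft_404_urls if url not in linked]
    return with_links, without_links


# Hypothetical data shaped like the numbers in the post: 2,500 URLs, 97 with links.
reported = [f"https://example.com/page-{i}" for i in range(2500)]
linked = reported[:97]
with_links, without_links = partition_by_backlinks(reported, linked)
print(len(with_links), len(without_links))
```

The first list is the one that feeds the redirect spreadsheet; the second is the set the question is about.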

Is it worth pursuing the rest of those URLs? Moreover, is there any benefit in redirecting these URLs to the most appropriate pages if they don't actually have any external links pointing to them?

Would be interesting to hear your thoughts on this.

 

not2easy




msg:4640114
 4:52 pm on Jan 25, 2014 (gmt 0)

Best thing for soft 404s is a hard 404. If the pages don't exist and were not replaced with another, a 404 is the right response.

vlexo




msg:4640130
 5:42 pm on Jan 25, 2014 (gmt 0)

not2easy: Best thing for soft 404s is a hard 404. If the pages don't exist and were not replaced with another, a 404 is the right response.


That's the thing. This was a site migration from one CMS to another. The pages still exist. It's just that the old CMS threw out lots and lots of URLs that had unique identifiers added to the end of them. (Thank god for canonical tags.)

Is there any real benefit in redirecting the URLs we've found that do not have any backlinks? (which is the majority of the soft 404s found in WMT)

netmeg




msg:4640132
 6:42 pm on Jan 25, 2014 (gmt 0)

When I encounter soft 404s, I fix them, whether there are ten or ten thousand.

not2easy




msg:4640143
 7:25 pm on Jan 25, 2014 (gmt 0)

If the old pages were indexed, they should have been redirected to the new URLs. That will fix all the soft 404s, which G regards as a poor user experience. Depending on your old vs. new URL structure, you may be able to fix that with a URL rewrite, using the applicable docs for your server and CMS. If it is something like Drupal or WP, look for a plugin to handle it.

lucy24




msg:4640164
 9:35 pm on Jan 25, 2014 (gmt 0)

Is there any real benefit in redirecting the URLs we've found that do not have any backlinks? (which is the majority of the soft 404s found in WMT)

Google doesn't usually start throwing around "soft 404" accusations just because one set of parameters redirects to a different set (or to none). Fine-tooth-comb your logs and you'll see periodic requests for "qjeklrj.html" or similar-- the kind of URL you'd get if the cat walked across the keyboard. This is Google checking whether your site is capable of giving out 404s at all.* If a garbage request leads to a 200, there's a problem and you should fix it.


* Personal experience suggests that this is automatically triggered any time a certain proportion of requests leads to a 301-- regardless of what's at the other end of the 301.
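Editor's sketch: the "garbage request" check described above can be approximated by hand. The Python below is a minimal illustration using only the standard library; the function names are hypothetical, and treating a final status of 200 as the warning sign is a simplification of whatever Google actually does.

```python
import secrets
import urllib.error
import urllib.request


def garbage_url(base_url: str) -> str:
    """Build a URL under base_url that should not exist on the site."""
    return base_url.rstrip("/") + "/" + secrets.token_hex(8) + ".html"


def fetch_status(url: str) -> int:
    """Return the final HTTP status for url (urllib follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return err.code


def looks_like_soft_404(status: int) -> bool:
    """A nonexistent page that ends in a 200 is the problem case."""
    return status == 200


# Usage (performs a real request, so it is left commented out):
# status = fetch_status(garbage_url("https://example.com"))
# print(status, looks_like_soft_404(status))
```

Because redirects are followed, a garbage URL that is 301-redirected to the home page also ends in a 200 and is flagged here.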

vlexo




msg:4640197
 11:00 pm on Jan 25, 2014 (gmt 0)

not2easy: If the old pages were indexed, they should have been redirected to the new URLs. That will fix all the soft 404s, which G regards as a poor user experience. Depending on your old vs. new URL structure, you may be able to fix that with a URL rewrite, using the applicable docs for your server and CMS. If it is something like Drupal or WP, look for a plugin to handle it.


We are running on Drupal. Any recommendations for a plugin that would be able to do that?

@lucy24: Much appreciated. Thanks for the insight.

born2run




msg:4640202
 11:08 pm on Jan 25, 2014 (gmt 0)

Same here, we also redesigned our Drupal site and now Google WMT is showing 404s in the thousands. Any recommendation would be appreciated.

lucy24




msg:4640221
 12:06 am on Jan 26, 2014 (gmt 0)

You may need to edit your .htaccess file manually. But that's a question for the Apache and/or Content Management subforum.

Google WMT is showing 404s in the thousands

Actual 404s or "soft 404"s? A real 404 isn't usually a problem-- UNLESS the search engine asked for the page because you yourself have a link to it elsewhere on the site. Then they start blathering about "technical quality".

ergophobe




msg:4640222
 12:15 am on Jan 26, 2014 (gmt 0)

vlexo and born2run

Properly configured, Drupal should return a proper 404 unless it is very old (like Drupal 5).

If you are getting thousands of hard or soft 404s, it is because you have some type of config problem. I have no idea what or why. Out of the box, Drupal should return a standard 404 for any page that isn't found. Check it with Live HTTP Headers or some similar header-checking tool.

First question: What are these URLs? Do they resemble valid URLs? Are they related to pagination, date stamps, search parameters or anything like that?

A note about URLs.

If you have something that generates valid "native" URLs (like "node/15") but then appends a parameter, you will get the same result as the page with nothing appended. For example:

https://drupal.org/node/1432230
https://drupal.org/node/1432230/other-stuff

This is functionally the equivalent of adding a GET query string to any URL, such as
https://www.webmasterworld.com/webmasterworld/4640004.htm?q=sdfaefsdf

Now if you are using a Drupal URL alias (set manually or via pathauto or what have you) then you cannot append random stuff and have a valid URL, because in that case it is doing a DB lookup for the entire page and it will not find it.

So if I have
https://example.com/about

then

https://example.com/about/random-stuff

will return a 404 unless such a page exists.

If you are set up with pages using valid URLs with pagers as GET query strings, then you can also end up with thousands of 404s. Again
https://www.webmasterworld.com/webmasterworld/4640004.htm?page=23
https://www.webmasterworld.com/webmasterworld/4640004.htm?page=24
https://www.webmasterworld.com/webmasterworld/4640004.htm?page=25

are all valid, ad infinitum. That's not Drupal-specific. In other words, the problem is probably not the handling of 404s but the generation of bogus URLs.
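Editor's sketch: one way to see whether a long list of reported URLs is really a handful of pages with endless query-string variants is to group the URLs by everything before the "?". The Python below is only an illustration; the function name is hypothetical and the sample URLs are the pager examples from this post.

```python
from collections import Counter
from urllib.parse import urlsplit


def variants_per_page(urls):
    """Count how many reported URLs share the same scheme, host, and path."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Drop the query string and fragment; keep only the page itself.
        counts[f"{parts.scheme}://{parts.netloc}{parts.path}"] += 1
    return counts


reported = [
    "https://www.webmasterworld.com/webmasterworld/4640004.htm?page=23",
    "https://www.webmasterworld.com/webmasterworld/4640004.htm?page=24",
    "https://www.webmasterworld.com/webmasterworld/4640004.htm?page=25",
]
for page, count in variants_per_page(reported).most_common():
    print(count, page)
```

A few pages with very high counts would point at URL generation (pagers, appended parameters) rather than at 404 handling.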

Special note about Views
Views behaves like the rest of Drupal. If you are getting Views pages that are causing 404s, however, you can set your Contextual Filters to serve a 404 if argument validation fails. Under "More" you can also set a filter to return a 404 if there are *more* arguments than required.

If you don't have a contextual filter, you can use the Global:Null filter which does pretty much nothing except let you set these options.

That said, you may simply be masking a problem (the problem being that you are generating URLs with extra parameters).

-----

As a side note, there are things you can do to improve 404 handling in Drupal. None of these are oriented toward fixing your problem, because that shouldn't be happening, period.

- Make sure you have created and set custom 403 and 404 pages in your site settings. In D7 this is Configuration -> System -> Site Information (at admin/config/system/site-information).

- Fast 404: serve a 404 without bootstrapping the whole system
https://drupal.org/project/fast_404

- Search 404: Attempt to search based on url keywords
https://drupal.org/project/search404

- Global Redirect: not so much for handling 404s as for a number of other things, such as 301-redirecting URLs that use the page ID to the friendly alias. Should be on all Drupal sites.
https://drupal.org/project/global_redirect

born2run




msg:4640228
 1:36 am on Jan 26, 2014 (gmt 0)

Hi Ergophobe, thanks much for the detailed reply!

I have a small related question: I have a Drupal CMS running and there are some links that are set via Drupal. How do I 301 redirect these links to the new updated links?

Can I do it via htaccess or via Drupal?

Your help is appreciated much!

Regards,
Kip

ergophobe




msg:4640229
 1:41 am on Jan 26, 2014 (gmt 0)

Kip - you can do it either with .htaccess or with the redirect module

https://drupal.org/project/redirect
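Editor's sketch: for the .htaccess route, a redirect spreadsheet can be turned into Apache mod_alias "Redirect 301" lines with a short script. The Python below is a minimal illustration; the function name and the example paths are hypothetical, and it assumes the old paths contain no spaces or other characters that would need quoting.

```python
def redirect_lines(mapping):
    """Turn {old_path: new_url} into mod_alias 'Redirect 301' directives.

    Note: Redirect matches the old path as a prefix, so anything below
    /old-path is redirected as well.
    """
    lines = []
    for old_path, new_url in sorted(mapping.items()):
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines


# Hypothetical mapping, e.g. read from the redirect spreadsheet.
mapping = {
    "/old-page-123": "https://example.com/new-page",
    "/about-456": "https://example.com/about",
}
print("\n".join(redirect_lines(mapping)))
```

The generated lines would be pasted into .htaccess or the server config and tested on a few URLs first, since they may interact with rewrite rules that are already there.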

born2run




msg:4640230
 1:45 am on Jan 26, 2014 (gmt 0)

Thanks ergophobe. I shall try .htaccess first, then Drupal.

Regards,
Kip

ergophobe




msg:4640241
 3:51 am on Jan 26, 2014 (gmt 0)

One nice thing about the redirect module is if you are logged in as admin and you go to a page that is 404, it will ask you right there if you want to add a 301 redirect.

Obviously .htaccess is more efficient (and httpd.conf is even more efficient), but if it's not a high-traffic page, the redirect module has its uses.

born2run




msg:4640255
 5:52 am on Jan 26, 2014 (gmt 0)

Yes, .htaccess worked, but I'm still facing another .htaccess question, which I asked in the Apache forum but have no solution for yet.

Thanks again!


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved