Google crawling non-existing old urls and giving soft 404
nitin webdirekt
10:19 am on May 28, 2012 (gmt 0)
Hello, Google is crawling some old URLs that no longer exist. How can I find out where Google is getting these URLs from?
Does anyone know about this? If so, please help me.
Thanks in advance.
tedster
3:22 am on May 29, 2012 (gmt 0)
Hello nitin_webdirektm and welcome to the forums.
You can't always know where Googlebot found a URL that it requests. But if you are seeing a soft 404 listed in WMT (Google Webmaster Tools), then the source link should also be listed there.
By the way, if your site is returning some variety of soft 404, you should fix that and return a true, immediate 404 status.
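To illustrate what "a true and immediate 404" means, here is a minimal sketch using Python's standard `http.server`. The page set (`EXISTING_PAGES`) and the error-page HTML are hypothetical placeholders. The point is that a missing URL gets status 404 on the first response, rather than a redirect to an error page that then answers 200 (the classic soft 404).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical set of URLs that actually exist on the site.
EXISTING_PAGES = {
    "/": b"<h1>Home</h1>",
    "/about": b"<h1>About us</h1>",
}

# A friendly error page; what matters is that it is sent WITH status 404.
NOT_FOUND_BODY = (
    b"<h1>Page not found</h1>"
    b'<p>Try the <a href="/">home page</a>.</p>'
)


def respond(path):
    """Return (status, body) for a request path, never redirecting."""
    path = urlsplit(path).path  # ignore any query string
    if path in EXISTING_PAGES:
        return 200, EXISTING_PAGES[path]
    # Missing URL: answer 404 directly, not 302 -> error page -> 200.
    return 404, NOT_FOUND_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

The same idea applies on any server: the custom error page can be as helpful as you like, as long as the status line says 404.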
JamesWt
11:59 am on May 29, 2012 (gmt 0)
If you are getting soft 404s for your website, then you should redirect those links to some relevant page of your website and get rid of these 404 errors.
g1smd
7:34 pm on May 29, 2012 (gmt 0)
The 'soft 404 problem' can occur when you mass redirect requests for multiple pages to another single page when instead you should have directly returned '404 Not Found' or '410 Gone' for those requests.
'Soft 404' means that the URL should be returning 404 or 410 but is not doing so.
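A rough sketch of the distinction g1smd describes, assuming a hypothetical list of live pages and a hypothetical list of pages that were removed on purpose. Each unknown URL gets its own honest status instead of all of them being funnelled into one page. The `mass_redirect` function is included only to show the anti-pattern that tends to produce soft 404 reports.

```python
# Hypothetical URL lists for illustration only.
LIVE_PAGES = {"/", "/products"}
REMOVED_PAGES = {"/old-catalog.html", "/2009-sale.html"}


def status_for(path):
    """Pick an honest status code for a requested path."""
    if path in LIVE_PAGES:
        return 200
    if path in REMOVED_PAGES:
        return 410  # Gone: removed deliberately and not coming back
    return 404      # Not Found: never existed, or unknown


def mass_redirect(path):
    """Anti-pattern: send every unknown URL to the home page."""
    if path in LIVE_PAGES:
        return 200, None
    # Many unrelated URLs all 301 to one page -> likely soft 404s.
    return 301, "/"
```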
lucy24
9:23 pm on May 29, 2012 (gmt 0)
Quoting JamesWt: "If you are getting soft 404s for your website, then you should redirect those links to some relevant page of your website and get rid of these 404 errors."
If you are getting soft 404s, you are already doing so. That's what a "soft 404" (as opposed to a "real" 404) is.
Robert Charlton
9:47 pm on May 29, 2012 (gmt 0)
Quoting JamesWt: "If you are getting soft 404s for your website, then you should redirect those links to some relevant page of your website and get rid of these 404 errors."
No, you should not redirect such links. As g1smd suggests, such redirection is likely the source of the problem, not the cure. You want 404 errors returned as 404s so the user agent knows that a requested url is a 404, so Google knows a requested url is a 404, and so you know that a requested url is a 404.
nitin_webdirektm - What kind of server do you have, and have you set up any sort of custom error page? The way Microsoft IIS in particular often handles 404s can create considerable problems.
Take a look at this thread for further discussion of some of the issues involved....
Try a server header checker or a tool like HTTP Headers to see what response code the reported urls are actually returning. It is possible, btw, to build a very user-friendly custom 404 error page, but it's got to be delivered as a 404.
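If you'd rather script the check than use an online header checker, here is a minimal sketch using Python's standard `http.client`. It sends a HEAD request and reports the status code and any `Location` header without following redirects, so a 301/302 to an error page shows up as such instead of being hidden behind the final 200. Some servers treat HEAD differently from GET, so if results look odd it may be worth repeating the check with GET. The function name `check_status` and the example URL are assumptions for illustration.

```python
import http.client
from urllib.parse import urlsplit


def check_status(url, timeout=10):
    """Return (status, Location header or None) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical URL taken from a soft 404 report.
    print(check_status("http://www.example.com/old-page.html"))
```

A result like `(302, '/error.html')` for a URL that should not exist is exactly the pattern that gets reported as a soft 404.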
Sgt_Kickaxe
2:16 am on May 30, 2012 (gmt 0)
If pages don't exist and never existed you want them to remain 404 because there isn't a problem. The exception would be if it's an obvious mistyped url link on another site pointing to yours in which case you may want to redirect the url via 301 to the proper page. My advice is to get the link author to fix their link first before doing any redirecting at all, if you can.
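A minimal sketch of the narrow exception described above, assuming a hypothetical table of known mistyped inbound links. Only exact, known-bad URLs get a 301 to a page that really exists; everything else unknown remains a plain 404 with no catch-all redirect.

```python
# Hypothetical data for illustration only.
LIVE_PAGES = {"/", "/products", "/contact-us.html"}

# Known mistyped links on other sites -> the page they meant.
TYPO_REDIRECTS = {
    "/prodcuts": "/products",
    "/contact-us.htm": "/contact-us.html",
}


def route(path):
    """Return (status, location) for a request path."""
    if path in LIVE_PAGES:
        return 200, None
    if path in TYPO_REDIRECTS:
        # 301 only for a specific, known bad link to a live page.
        return 301, TYPO_REDIRECTS[path]
    # Anything else stays a true 404.
    return 404, None
```

Keeping the table explicit makes it easy to remove an entry once the linking site fixes its link.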