Msg#: 4625828 posted 3:56 pm on Nov 25, 2013 (gmt 0)
WMT is showing me 50+ soft 404s for pages it claims are in the sitemap XML file, but there are no such pages either on the website or in the sitemap, and there never have been. Is there a problem with WMT I am not aware of?
Along those lines, is Google getting too big to be accurate?
Msg#: 4625828 posted 5:21 pm on Nov 25, 2013 (gmt 0)
Is it reporting SOFT 404s, or did your fingers type the extra word by mistake? Seems like that's a bigger problem than any imaginary pages.
"in sitemap" doesn't necessarily mean in the current sitemap, just in some sitemap that they've seen at some time in the present geological era. Sometimes this may turn out to mean "in the search engine's fevered imagination".
Msg#: 4625828 posted 5:30 pm on Nov 25, 2013 (gmt 0)
Thanks, Lucy24, for putting the vision of a megalomaniacal search engine with a fevered imagination into my head. Soft 404s in WMT can come from a link on another site that was entered wrong, so a fevered bot might not know what it's looking at and take the owner of that bad link to be the authority on what's actually on your site. Question is, how much credence should one give to an unhealthy robot?
Msg#: 4625828 posted 1:28 am on Nov 26, 2013 (gmt 0)
or did your fingers type the extra word by mistake?
This is a client's e-commerce site and the sitemap is generated from the database of products. The product URLs use the product names, e.g. www.example.com?prod=small-widget, and WMT is showing www.example.com?prod=123456, something which simply does not exist. I cannot even figure out where WMT got that link, regardless of whether it says "in sitemap.xml". It just is not there and never was.
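One way to double-check the "never was in the sitemap" claim is to parse the generated file and look for any numeric prod= values. This is only a rough sketch: the function names and the sample XML are made up for illustration, and it assumes the sitemap uses the standard sitemaps.org namespace.

```python
import re
import xml.etree.ElementTree as ET

# Standard sitemap namespace; assumed here, check your own file's root element.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def numeric_product_urls(urls):
    """Keep only URLs whose prod parameter is all digits, e.g. ?prod=123456."""
    pattern = re.compile(r"[?&]prod=\d+(?:&|$)")
    return [u for u in urls if pattern.search(u)]


# Hypothetical sample standing in for the real sitemap.xml contents.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com?prod=small-widget</loc></url>
  <url><loc>http://www.example.com?prod=123456</loc></url>
</urlset>"""

suspects = numeric_product_urls(sitemap_urls(sample))
print(suspects)
```

If the list comes back empty for the live file (and for any older copies you still have), the numeric URLs did not come from the current sitemap generator.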
Msg#: 4625828 posted 1:44 am on Nov 26, 2013 (gmt 0)
What I'm focusing on is the "soft" element.
404 = page doesn't exist. You may or may not want to spend time figuring out why the search engine thinks the page exists. (In my case it generally points to a typo in some recently added or edited link. Oops.)
Soft 404* = search engine suspects that some other response (a redirect, or a 200 with an error or near-empty page) is taking the place of a 404 or 410.
Search engines will periodically ask for nonexistent pages. I have to assume this behavior is triggered by some fully automated process, because I see it on my personal site every time I've instituted a fresh crop of redirects. The search engine gets anxious and hurries to check whether these are bona fide redirects or a cover-up for missing pages.
* afaik this term was invented by Google. But it's useful so, heck, let's keep it.
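You can run roughly the same test on your own server: ask for a page that cannot possibly exist and see whether you get an honest 404/410 or something else. A minimal sketch, standard library only; classify and probe are names I made up, and the labels are my own rough buckets, not anything the search engine reports.

```python
import urllib.error
import urllib.request
import uuid


def classify(status, requested_url, final_url):
    """Bucket the answer to a request for a page that should not exist."""
    if status in (404, 410):
        return "hard 404/410"
    if final_url != requested_url:
        return "redirected: possible soft 404"
    if 200 <= status < 300:
        return "2xx: possible soft 404"
    return "other"


def probe(base_url):
    """Request a random, made-up path under base_url and classify the result."""
    bogus = base_url.rstrip("/") + "/" + uuid.uuid4().hex
    try:
        # urlopen follows redirects; geturl() reports where we ended up.
        with urllib.request.urlopen(bogus, timeout=10) as resp:
            return classify(resp.status, bogus, resp.geturl())
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers arrive as exceptions carrying the status code.
        return classify(err.code, bogus, bogus)


# Offline examples of the classification (no network needed):
print(classify(404, "http://www.example.com/x", "http://www.example.com/x"))
print(classify(200, "http://www.example.com/x", "http://www.example.com/"))
print(classify(200, "http://www.example.com/x", "http://www.example.com/x"))
```

Calling probe("http://www.example.com") against a real site does the live check. Anything other than "hard 404/410" suggests the server is covering for missing pages.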
Msg#: 4625828 posted 4:40 am on Nov 26, 2013 (gmt 0)
Google will report a 404 on a page that another site has linked to, regardless of whether you have a page there or not. It will also use malformed URLs on other sites to invent pages on your site.
I have had some pages 410'ed for over 18 months, and Google still reports them as errors. No links to them on my site, the links from other sites are long gone, but Google still knows of them and keeps trying to find them.