This 40 message thread spans 2 pages.
|Impressed with Google's soft 404 detection|
Been getting messages in GWMT (Google Webmaster Tools) regarding "soft" 404s on a couple of sites lately.
Looked into them and was fairly impressed. These sites (2 of them) are completely custom: no off-the-shelf code, built from scratch. NO structured data, NO Google Data Highlighter.
Google is identifying pages on the site that are of little use to the visitor: for example, a product that is out of stock, or a product option that was selected but does not exist.
To me this would not be as impressive if ours were an off-the-shelf CMS or e-commerce software, something open source that Google could train the bot on. This is completely unique, custom-to-us stuff, and it changes a lot.
What the machine receives is a 200 status and an HTML page with content (albeit thin), reached via a link to that specific page, not a site search or a made-up URL.
What Google (formerly known as the machine) is doing is somehow making the determination that the content on the page suggests the page should not be there. And they are doing it correctly.
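Nobody outside Google knows how its classifier actually works. Purely as a hypothetical illustration of the kind of signal described here (a 200 response whose page-specific text is tiny, or which says there is nothing to see), a toy check might compare the page against the shared site template and look for "nothing here" phrases. The phrase list, the template comparison, and the `min_unique_words` threshold are all assumptions for the sketch, not Google's method.

```python
import re
from html.parser import HTMLParser

# ASSUMPTION: illustrative "nothing here" phrases; Google's real signals are unknown.
EMPTY_PHRASES = ("out of stock", "no results", "there are currently no",
                 "does not exist", "no longer available")


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def visible_words(html):
    """Lowercased words from the page's visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9']+", " ".join(parser.parts).lower())


def looks_like_soft_404(status, page_html, template_html, min_unique_words=20):
    """Toy check: HTTP 200, but little text beyond the shared template,
    or visible text that says there is nothing here."""
    if status != 200:
        return False  # a real 404/410 is a "hard" error, not a soft one
    words = visible_words(page_html)
    template = set(visible_words(template_html))
    unique = [w for w in words if w not in template]
    text = f" {' '.join(words)} "
    says_empty = any(f" {phrase} " in text for phrase in EMPTY_PHRASES)
    return says_empty or len(unique) < min_unique_words
```

Note that JavaScript-generated content is invisible to a check like this, since it only reads the raw HTML.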
|What Google (formerly known as the machine) is doing is somehow making the determination that the content on the page suggests the page should not be there. And they are doing it correctly. |
Your statement is plain wrong on so many levels, and I don't know where to start, or if I should even start. The worst part of it: Google has no right to decide anything outside of their own rankings. They can opt not to include the page in their index due to thin content, but they cannot "steal" 404 from HTTP and twist its meaning. YOU (the webmaster) are the one who should decide whether the page should be there or not.
|My position is that a page which says, very nicely, "I'm devastated to have to tell you that we don't currently have any articles about the particular subject you're interested in", surrounded by helpful navigation links, is effectively a very high quality 404 page. But it's not a page that you would want anyone to index or link to or include in the site's page count or do any of the other things you do with pages. |
Your position is not always the correct one; you can't use a blanket statement like that.
What if I had a website that lists all NFL teams and focuses mainly on transfer rumors? Let's say I want a page called "Steelers transfer rumors", but right now nothing is going on with the Steelers, so the page is empty and displays "No rumors right now". I want that page to be indexed and to rank, and so do the visitors of my website. They can search for "Steelers transfer rumors", and if the page is blank, then there are no rumors about future transfers. The page being blank (or displaying "No rumors at this time") perfectly answers people's question of whether there are any transfer rumors about the Steelers.
Just because the content is thin or says "no rumors right now" doesn't make it a "fancy 404 page", or a page that I don't want indexed, visited, or linked to, or a page that is not useful to my visitors.
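Both positions can coexist if the server sets the status code deliberately. A minimal sketch, assuming a hypothetical `/rumors/<team>` URL layout and made-up in-memory data: a team the site doesn't cover gets a real 404 with helpful navigation (the "very high quality 404 page" quoted above), while a covered team with nothing to report gets a 200 that says "No rumors right now". Whether Google then flags that 200 page as a soft 404 is still Google's call; the code only shows the webmaster's half of the decision.

```python
from html import escape

# Hypothetical sample data: teams the site covers, and current rumors per team.
TEAMS = {"steelers": "Steelers", "packers": "Packers"}
RUMORS = {"packers": ["Packers said to be shopping a backup QB"]}  # made-up example


def rumors_page(slug):
    """Return (HTTP status, HTML body) for /rumors/<slug>."""
    team = TEAMS.get(slug)
    if team is None:
        # No such team: a real 404, but with a helpful body and navigation.
        links = "".join(f'<li><a href="/rumors/{s}">{escape(n)}</a></li>'
                        for s, n in sorted(TEAMS.items()))
        return 404, f"<h1>No page for that team</h1><ul>{links}</ul>"
    items = RUMORS.get(slug, [])
    if not items:
        # Known team, nothing going on: the page exists and answers "no".
        return 200, f"<h1>{escape(team)} transfer rumors</h1><p>No rumors right now.</p>"
    lis = "".join(f"<li>{escape(r)}</li>" for r in items)
    return 200, f"<h1>{escape(team)} transfer rumors</h1><ul>{lis}</ul>"


def app(environ, start_response):
    """Minimal WSGI wrapper around rumors_page (hypothetical routing)."""
    path = environ.get("PATH_INFO", "")
    slug = path[len("/rumors/"):] if path.startswith("/rumors/") else ""
    status, body = rumors_page(slug)
    reason = {200: "OK", 404: "Not Found"}[status]
    start_response(f"{status} {reason}",
                   [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

The key design choice is that "this team doesn't exist here" and "this team exists but has no news" are different answers, so they get different status codes.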
|Your statement is plain wrong on so many levels, |
It's a bit funny that you use a statement of opinion like that to preface your assertion that Google has no right to form an opinion about my content.
I'm not debating right or wrong here; I'm interested in the technical ability to do what appears to have been done here. I'm impressed with the accuracy of what appears to be an algo that can assess the content of a page as accurately as a human would.
404 is not an opinion.
As far as the algo's ability to find thin-content pages goes, that has been evident for a long time. Just because it is a good decision in your case doesn't make it universally so. See my example above of a thin-content page that serves a purpose.
404 is an area code in Georgia.
Your example above was addressed earlier in this thread. The fact that there are no trade rumors can be addressed in the area where you would link to a page that listed trade rumors. As a user, having to go to another page to see what you could have told me in place of the link to the empty page is annoying, and it pollutes the interweb.
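The alternative suggested here, putting the "no rumors" answer where the link would otherwise go, might look like this in a site's team list. A minimal sketch; the `/rumors/<slug>` URLs and the function names are hypothetical.

```python
from html import escape


def team_nav_item(slug, name, rumor_count):
    """One <li> for a team list: link only when there is something to read."""
    if rumor_count:
        return (f'<li><a href="/rumors/{escape(slug)}">{escape(name)}</a> '
                f'({rumor_count} rumor{"s" if rumor_count != 1 else ""})</li>')
    # Answer the question right here instead of linking to an empty page.
    return f"<li>{escape(name)}: no rumors right now</li>"


def team_list(teams):
    """teams: iterable of (slug, display name, number of current rumors)."""
    return "<ul>" + "".join(team_nav_item(*t) for t in teams) + "</ul>"
```

With this approach the empty per-team page is never linked internally, although it could still exist for visitors arriving from search, which is the counterpoint raised in the next reply.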
And even in Georgia, it's not an opinion but a fact.
|As a user, having to go to another page to see what you could have told me in place of the link to the empty page is annoying, and it pollutes the interweb. |
You don't have to go to another page; you could just land there via search. Your query is "are there any rumors" and the page answers "no". Everyone is happy. I would love to see how you'd answer the question differently. Maybe beam it to my brain? A 404 doesn't mean there are no rumors; it means there is no page to answer "no".
And speaking of polluting the web: this is exactly what Google is doing by twisting your arm to fill pages with unnecessary content just so they don't get hit by Panda. Often the answer to a query is just a sentence or two, but I end up digging through 1,000-word essays to find it.
My thought is that Google labeling a page in WMT as a "soft 404" is Google giving its opinion.
|you could just land there via search. |
That's a very good point, but it's a bit of a slippery slope too, isn't it? I mean, if that's really all the content on the page, the search engine's snippet would likely be "No Steelers transfer rumors", and there would be no reason to click through to your page.
Meh, I don't find them that impressive. They report a lot of our pages that are heavy in JS (a Google Maps implementation). There is plenty of unique content on those pages, but not much of it is in the HTML, so Google calls them soft 404s. They aren't 404s at all.
I've taken to adding a rel=canonical to all those pages now to stop the Google from being upset.
All of the above assumes that Google is even looking at the code. They use image recognition and likely scan for things based on what they believe the page is about, e.g., price, ratings, etc. How it's coded may not matter anymore.
Just saw one of these for a page that was basically the usual site template plus "There are currently no posts in this category." But I can't find a link to this URL on the internal pages where GWT claims it was discovered, and those pages haven't changed since it was "detected".
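One way to double-check a claim like this is to parse the saved HTML of the reported "linked from" pages and look for anchors that resolve to the flagged URL. A minimal sketch, assuming you already have each page's HTML in a dict (how you fetch it is up to you); the function and class names are hypothetical. It only sees static `<a href>` links in the HTML you give it, so a link Google reports could still come from an older version of the page, a sitemap, or JavaScript-generated markup that this won't find.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects absolute href targets of <a> tags, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop any #fragment before comparing.
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                self.links.add(absolute)


def pages_linking_to(target_url, pages):
    """pages: dict of page URL -> HTML source. Returns URLs whose HTML links to target."""
    target, _ = urldefrag(target_url)
    found = []
    for page_url, html in pages.items():
        collector = LinkCollector(page_url)
        collector.feed(html)
        collector.close()
        if target in collector.links:
            found.append(page_url)
    return sorted(found)
```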