Forum Moderators: Robert Charlton & goodroi

Thin Content Manual Action (Partial Site Match): How to Handle?

     
4:56 pm on Jul 18, 2016 (gmt 0)

New User

joined:July 15, 2016
posts:8
votes: 0


A portion of our website was hit with a "Thin Content" Manual Action which we would like to resolve. Unfortunately, the pages cited as being in violation still generate a significant amount of traffic from other search engines, so we do not want to remove / 404 them completely. Furthermore, I am slightly afraid to file a Reconsideration Request because there are a few other income-generating sections of the site that could be interpreted as "thin" from Google's perspective (although user engagement stats suggest otherwise, that's beside the point here).

For those who have experience with these penalties - do you think it would be enough to place a "<meta name="googlebot" content="noindex">" tag on the affected pages, rather than removing them completely, and submit a reconsideration request citing this change? Or do you think anything less than complete removal from the site won't suffice? The volume of pages is too large to "beef up" the content in any meaningful way.

Also, because it's a Partial Site Match penalty, we will only be citing certain directories for reconsideration. Is there a risk that other parts of the site will then be reviewed, or do the reviewers simply stick to the matter (URLs) in question when processing these requests?

Maybe doing nothing about the action is the right move if there is a risk of exposing other sections of the site?

Thanks in advance for your opinions and help.
5:51 pm on July 19, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


Welcome to Webmasterworld!

Manual penalties do expire. How long they last depends on how severe the perceived infraction is. If you have time to wait, you could noindex these pages for Googlebot only and simply wait for the penalty to expire, then see whether it comes back. Google says you should not wait for expiration and that webmasters should submit a reconsideration request, but if you are concerned about other parts of the site coming under scrutiny, then maybe this is the first option you should try.
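
For reference, a minimal sketch of the kind of tag being described, assuming the affected pages are ordinary HTML with an editable <head> (the surrounding markup is only illustrative):

<head>
  <!-- Tells Googlebot not to index this page; engines that only honour the
       generic "robots" meta name should be unaffected -->
  <meta name="googlebot" content="noindex">
</head>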
6:38 pm on July 19, 2016 (gmt 0)

New User

joined:July 15, 2016
posts:8
votes: 0


Hi aakk9999,

Thank you for your reply and kind welcome.

Interesting... I didn't realize these actions could expire. The traffic decline appeared around March 14th, with the manual action message first appearing March 21st. They added another section of the site with another message on April 8th... nothing since then.

It's a multi-lingual site, and only the Japanese version was affected, so I am wondering if the various Google global divisions have different definitions of what "thin content" constitutes. I think we will indeed try to noindex the pages, not only at a page level but also via a robots.txt disallow directive, and see if the issue resolves itself.

I really appreciate your response and advice on this issue.
6:51 pm on July 19, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2112
votes: 122


Why not add more relevant content to the affected pages?
7:52 pm on July 19, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12365
votes: 403


It's a multi-lingual site, and only the Japanese version was affected...
How were these pages translated?

A machine translation, for example, would be very likely to trip the filter.
8:30 pm on July 19, 2016 (gmt 0)

New User

joined:July 15, 2016
posts:8
votes: 0


Why not add more relevant content to the affected pages?

There are simply too many pages to expand the content much further. We do have roughly 2-3 paragraphs' worth of unique content per page, but relative to article lengths of roughly 1500-2000 words, that must not be enough in Google's eyes. It's very difficult to speak at length about the topic each page focuses on.

Users search for these specific topics, so we have a page about each specific topic. Our users are very unsophisticated about the subject matter, and not very computer savvy, so we find it's helpful to have a dedicated page for each topic. Obviously, they land on these pages directly from a Google search query.

I know that "best practices" would dictate having a single listing page with links to the specific topics (and using rel=canonical for the articles), but our users would have difficulty navigating that way, quite honestly. In our opinion, having them land on the page directly from their search query is a much better user experience than having to click two or three times get to the same place.

Also, just to clarify - the pages are basically troubleshooting type articles, not an e-com site with widget purchase pages whose sole intent is generating sales. The troubleshooting steps for each topic are nearly identical, so much of the content ends up being the same. I should also mention that we do not employ any outside advertising such as Adsense or other display advertising... only very "soft" sells of our related products that automate the troubleshooting process that these manual troubleshooting steps explain.

All of our competitors (50+ sites) copied us and employ a similar strategy but with garbage content, yet we were the ones targeted for some reason.

How were these pages translated?

Human only, specifically by translators whose expertise is in our industry. What's funny is that a large competitor of ours uses Google Translate(d) pages that are actually a mix of automated translations and English, yet still ranks very highly for several thousand terms.
1:50 am on July 20, 2016 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 9, 2001
posts:5834
votes: 159


I think we will indeed try to noindex the pages, not only at a page level but also via a robots.txt disallow directive


Whoa ... there's a trap here. If you block a page via robots.txt, the spiders won't be able to read the noindex instruction and won't know to take the page out of the index!

What you'll end up with is the pages still indexed but with this for the snippet:

"A description for this result is not available because of this site's robots.txt"

Blocking via robots.txt won't remove a page that's already in the index. And it won't necessarily keep a page from getting into the index if Google discovers it by following someone else's link. It will end up in the index with that dorky snippet.

If you want a page out of the index, use the noindex code snippet in the <head>...</head> of the page and let the spiders read it freely.

<meta name="robots" content="noindex, follow" />

"Follow" is the default but I like to specify it anyhow.
3:15 am on July 20, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12365
votes: 403


johnnycache, as buckworks correctly describes it, robots.txt and the robots noindex metatag don't play well together, so you'd want one or the other.

What you've described, though, are essentially doorway pages, and IMO neither noindex nor robots.txt, nor rel=canonical, are going to solve your problems. In your current setup, taking these pages out of the index or removing them from spidering, after all, defeats how users are going to find them.

Though you've portrayed these doorway pages as your effort to be "user friendly", I assume that you've simply got too many of them, trying to capture too many different keyword variants.

I'm also guessing that as RankBrain understands the essential queries behind a lot of these variants, more and more of your phrase-based doorways are likely to be seen as superfluous.

Google is going to be looking at your pages via a variety of lenses, and it's likely that the pages with the most link votes will survive the longest, but I'm assuming that Google's going to nibble away at what it sees as unnecessarily similar pages, probably section by section in your site.

My advice would be to try to eliminate the duplication by combining pages. Make your content richer, and make navigating your site more rewarding on a variety of levels. Assume a range of user intent and experience, and provide good background material for all likely users.

About a year ago, Google's intention to go after doorway pages was discussed here and elsewhere on the web....

Doorway Page Algorithm To Be Launched By Google
Mar 17, 2015
https://www.webmasterworld.com/google/4743151.htm [webmasterworld.com]

Not much had developed specifically by the time the thread had closed. The topic of combining pages was discussed only fleetingly, near the end of the thread, and as was noted in a March 20 2015 post by rish3, chances are that combining pages is going to cost some rankings. He noted...

The pattern is pretty undeniable. Optimized, separate pages work better.

Nevertheless, combining pages is where I think the solution is... but you're going to have to create ways within your site of highlighting important variations. It's going to be a lot harder than what you've been doing.

Also, for reference, this video by Matt Cutts on thin content, which cites doorways as the classic thin content example. In your case, the doorways are perhaps less extreme than what Matt describes, but I suspect they're doorways nonetheless....

Thin content with little or no added value
Matt Cutts - Aug 8, 2013 - trt 7:36
https://www.youtube.com/watch?v=w3-obcXkyA4 [youtube.com]
3:59 am on July 20, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10320
votes: 1058


so I am wondering if the various Google global divisions have different definitions of what "thin content" constitutes.


Could be. It might be worth checking whether there's enough variation between the original and the translated pages. Some words simply will NOT translate in all "variations", in which case that certainly could read as thin content.

Remove the pages (410 Gone) if you are giving up on the indexing. This would go a long way toward strengthening site confidence.
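
If you did drop the pages, a 410 can be returned at the server level. A sketch assuming Apache with mod_alias and a purely hypothetical /old-guides/ path:

# .htaccess -- answer 410 Gone for everything under the retired (hypothetical) section
RedirectMatch gone ^/old-guides/.*$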
3:09 pm on July 20, 2016 (gmt 0)

New User

joined:July 15, 2016
posts:8
votes: 0


Whoa ... there's a trap here. If you block a page via robots.txt, the spiders won't be able to read the noindex instruction and won't know to take the page out of the index!

Ahh okay, I see - thank you. The intent was to speed up the process, but I can see that this would be counterproductive. Alternatively, I think we'll just re-submit a sitemap specific to those areas of the site and see if it accelerates the spidering of those pages.
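
A minimal sketch of the kind of section-specific sitemap meant here, with placeholder URLs standing in for the affected Japanese pages:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- placeholder entries for the affected section -->
  <url>
    <loc>https://www.example.com/ja/troubleshooting/topic-one/</loc>
    <lastmod>2016-07-20</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/ja/troubleshooting/topic-two/</loc>
    <lastmod>2016-07-20</lastmod>
  </url>
</urlset>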

My advice would be to try to eliminate the duplication by combining pages. Make your content richer, and make navigating your site more rewarding on a variety of levels. Assume a range of user intent and experience, and provide good background material for all likely users.

Agreed, this is what we've been trying to figure out for quite some time. The problem is that the volume of queries is so large, it's been challenging to figure out a structure that still captures these queries yet dramatically reduces the number of pages.

Remove the pages (410 Gone) if you are giving up on the indexing. This would go a long way toward strengthening site confidence.

If Google were the only game in town, we'd certainly do that. But we get too much traffic from other global engines to completely remove the pages from the site.
10:01 pm on July 30, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2412
votes: 5


> The troubleshooting steps for each topic are nearly identical, so much of the content ends up being the same.

There's your culprit. Noindex those step pages via the robots meta tag and you're back in business. I think.
7:58 am on Aug 1, 2016 (gmt 0)

Preferred Member from BG 

5+ Year Member Top Contributors Of The Month

joined:Aug 11, 2014
posts:547
votes: 174


Hi Johnnycache (nice nick by the way),

With regard to combining the duplicated pages: you have to bite the bullet on that. What I'd suggest is to see whether you have good links (or a good number of them, or both) pointing to the pages you want to consolidate. If you do, 301 them to the page that shows for the most generic keywords relevant to the topic. More often than not you will retain a good portion of the long-tail or niche keywords used by people familiar with your website; for the rest, it's anyone's guess. What I would suggest is to go after the niche "money keywords" and redirect everything else.
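
A sketch of what those consolidation redirects could look like, assuming Apache and purely hypothetical URLs:

# .htaccess -- 301 the narrow variant pages to the broader topic page they are merged into
Redirect 301 /ja/troubleshooting/error-code-123 /ja/troubleshooting/error-codes
Redirect 301 /ja/troubleshooting/error-code-124 /ja/troubleshooting/error-codes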

However, what surprises me is that you already have a pretty decent volume of text, yet you still got a manual penalty. I have done some research on this in the past, and such a penalty is often due to a big chunk of text being loaded onto the page from the server and repeated on all pages of the same category. Say, for example, you have 300 unique, relevant words on the page and another 2000 coming from your "description" or "ToU" sections. This will dilute the uniqueness of the page. Consider placing all large quantities of repeating text on a separate page and linking to that page from your landing pages. It is a hassle to do, but this way you will guarantee uniqueness and not hurt the UX as much.
4:21 pm on Aug 1, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14923
votes: 492


I agree with Nutterum about getting rid of the granular longtail pages and keeping the higher level (in the site architecture) pages that can serve as catchalls.

The root of your problem is that you're optimizing for keywords as if it's 2004. Search engines don't rank web pages by matching user queries to keywords on a web page.

Search engines can recognize what's topically relevant without the keywords being there. So there's no need to get granular and optimize for all the search queries. That's optimizing for a keyword to search query search engine and that kind of search engine no longer exists.

A concept or topic is not the same as a synonym
Some SEOs advise that we should optimize with synonyms, but that's naïve (for thinking the search engine is looking for multiple synonyms), thoughtless (for proposing a back-of-the-napkin solution to an algorithm conceived and tested over millions of web pages), and ultimately a ham-fisted and ignorant way of spamming (Caveman like. Caveman hit with club. Drag you back to cave). Sprinkling web pages with synonyms is a simplistic solution to a complex problem, and only someone who does not understand the problem would suggest such a thing.

We do have roughly 2-3 paragraphs' worth of unique content per page, but relative to article lengths of roughly 1500-2000 words, that must not be enough in Google's eyes.


The solution is closer to what Nutterum suggested. Where I differ is that I am not surprised at all that you were caught. Varying a 2000 word web page with three paragraphs of original content does not make the web page unique. There are processes dating from the early 2000's that can easily spot those. It's simply the easiest thing to catch.

Shoot for 2000 unique words (sans stop words) and you're good.

The problem is that the volume of queries is so large, it's been challenging to figure out a structure that still captures these queries


It's challenging because you're laboring under the false notion that you must optimize for every keyword variation. Setting aside local-search keyword phrases (as that's an entirely different discussion), you don't need to optimize for granular long-tail variations. It's a different situation for highly competitive phrases of three words or fewer, but when it comes to granular long-tail variations you can still rank with a page of original content that scores a direct hit in satisfying the user intent of a keyword phrase.

Good luck!
10:55 am on Aug 4, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 29, 2007
posts:2014
votes: 215


What aakk9999 said above: add a Google-only noindex meta tag to those pages if Google hates them but other search engines like them. I feel Google owes me money for every time they've said "jump" and had us do something that wasn't entirely necessary. Giving an entire site a penalty because you don't like some pages is silly; just don't rank the pages you dislike. I apologize to Google fanatics, but I'm the webmaster and Google is still just the ranking system. They can choose to rank or not rank, but they don't get a say in how I create content (anymore).
2:08 pm on Aug 9, 2016 (gmt 0)

Preferred Member from BG 

5+ Year Member Top Contributors Of The Month

joined:Aug 11, 2014
posts:547
votes: 174


I have yet to see a page or website I've been involved with that is slapped by Google but loved by Bing or DuckDuckGo. In fact, more often than not I am 1-3 positions higher on average on Google compared to the other search engines. Now, if you are running an ad-heavy spam site with 23 paginated images with ads in between, in the hope of scraping a dollar or two from the bottom of the barrel... you might have better luck somewhere outside of Google. (Not saying you do, but I can still see crap-UX websites rank well on other SEs.)
10:12 am on Aug 14, 2016 (gmt 0)

Junior Member from AU 

10+ Year Member Top Contributors Of The Month

joined:June 28, 2003
posts: 178
votes: 24


Regarding thin content: is it only a problem if Google applies a manual action, or does Google silently mark you down anyway?

Reason I ask is that my website contains URLs like /myindustry/news/new-cheese-announced and /myindustry/news/brie-is-good-for-you - where the first of those stories might be a full article, and the second of those stories might be a short paragraph from an RSS feed and a 'read article in full' button which takes the user to the story.

Am I best adding <meta name="robots" content="noindex, follow" /> to stories which are sourced via an RSS feed and just contain a leading paragraph? Would that help my website's score as a whole?

And does <meta name="robots" content="noindex, follow" /> stop AdSense from working properly?
2:07 pm on Aug 14, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 28, 2013
posts:3467
votes: 778


Giving an entire site a penalty because you don't like some pages is silly; just don't rank the pages you dislike.

It isn't silly at all, because it discourages bad behavior (or bad practices, if you prefer). When spewing out doorway pages carries no risk, a lot of people are going to think it's worth a try.

Also, Google may well have found that there's a correlation between large numbers of doorway pages (or other questionable SEO practices) and overall site quality.
6:56 am on Aug 16, 2016 (gmt 0)

Preferred Member from BG 

5+ Year Member Top Contributors Of The Month

joined:Aug 11, 2014
posts:547
votes: 174


What @EditorialGuy said. Penalties are there to discourage, not to police. When there is no financial benefit in doing something, you don't do it. There will still be some who do it by accident, through misinformation, or for some other reason, but those cases will be too few and far between to be considered a thing.
 
