|Removing Thin Content - 410 vs robots.txt|
We have several folders on our site that at one time contained thin content. Everything within has since been set to return a 410, but due to the high volume of pages involved, we are looking for a better way to tell Google everything has been deleted rather than wait for it to recrawl each page.
Google advise using robots.txt to deny access to certain paths, but we have also heard that this can look suspicious, as if trying to hide thin content from them.
Most of the pages were 410ed last year after a Panda strike, the rest earlier this year. WMT is still discovering them. Recent events have led us to suspect that Panda's tolerance is decreasing and that we are falling back into its clutches because of the perception of our site, not the reality.
We would be most grateful for any guidance from others who have experienced similar. Thank you.
It might be a good idea to verify that pages in those folders do actually return a 410 response. If they do, you do need to wait for G to crawl them. The only removal method they offer requires that you do not block those pages or folders in robots.txt.
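A quick way to do that verification in bulk is to request each URL and look at the raw status code. The sketch below uses only Python's standard library; `http.client` never follows redirects, so a `301 -> 410` chain is reported as `301` rather than silently masked. The example URL is hypothetical.

```python
import http.client
from urllib.parse import urlsplit

def status_of(url, timeout=10):
    """Return the raw HTTP status code for url, without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        # http.client reports exactly what the server sent: a 3xx here means
        # the URL redirects somewhere, which is NOT the same as returning 410.
        return conn.getresponse().status
    finally:
        conn.close()

# Example usage (hypothetical retired path):
#   print(status_of("https://example.com/old-folder/page1"))
```

Anything that prints 301 or 302 instead of 410 is worth a closer look before blaming Google's crawl rate.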
Look in your Google Webmaster Tools account for Google's instructions on removing pages, folders and directories from their index. If the pages still exist, make sure their robots meta tag reads "noindex" rather than "index, follow", and do not block them in robots.txt or Google can't crawl them to see that the pages have noindex tags. Once the meta tags are fixed, ask Google via GWT to remove that content from their index. It does not always work on the first try; they still bug me about directories that were properly removed years ago and do not exist anywhere except their imagination.
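If you have a lot of pages to audit, you can script the meta-tag check too. A minimal sketch using Python's stdlib `html.parser` (the function name and the sample usage are illustrative, not anything from GWT):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of the first <meta name="robots"> tag seen."""
    def __init__(self):
        super().__init__()
        self.directives = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.directives is None:
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "robots":
                self.directives = attr_map.get("content")

def robots_directives(html):
    """Return the robots meta directives of an HTML document, or None."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives

# Example: feed this the fetched HTML of each page you expect to be
# deindexed and flag anything that is not "noindex":
#   robots_directives('<meta name="robots" content="noindex, follow">')
#   -> "noindex, follow"
```

A real HTML parse is used instead of a regex so attribute order (`content` before `name`, quoting style, etc.) doesn't produce false negatives.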
Edited to add: Make sure that pages you want to disappear are not showing up in your sitemaps.
Always looking for the easy life: once the URL requests correctly invoke a 410 Gone response, I feel there's nothing more that needs doing. Let Google recrawl at their own rate.
What g1smd said + welcome to WebmasterWorld!
410, no doubt about it. If the pages are gone, that is the only message you want to send Google. If the pages happen to be in a directory, you can use your GWT account to submit a removal request for the entire directory so that you are not causing search visitors to bounce. Beyond that, it's 100% up to Google to catch up with your pages.
Welcome to WW.
Thank you all for taking the time to reply.
The pages are long gone, replaced by a script that sends a 410 header and redirects visitors to a Page Gone notice. The sitemap doesn’t include them, nor do any pages on the site link to them. The pages in question do not show in serps either, just WMT, so not sure if the Remove URL tool would apply.
Comforting to hear not2easy say this can take a while. Google discover a fresh batch of 410s daily, so it seems all we need to apply is a little more patience.
Thanks again, you’ve put my mind at ease.
|sends a 410 header and redirects visitors to a Page Gone notice. |
If the URL shown in the browser address bar changes (is "redirected") then your implementation is broken.
Hopefully, you meant to say "sends a 410 header and shows a notice informing that the page has Gone at the originally requested URL".
Don't take this as a criticism but we are hot on using the right terminology here: a redirect is a very specific thing, one that causes the browser to make a new request for a different URL after receiving a 301, 302 or 307 response to the original request.
g1smd, you are correct, I meant to say it shows a notice, not redirects to one. My wording was poor, I appreciate you seeking confirmation.
Thanks for the clarification. There are plenty of cases where people have been found to 301 redirect and then return a 404 or 410 at the second URL. That's a disaster. Glad you're not affected.
why remove thin content? Google loves it.