Bad Content: If You Can't NOINDEX, Will Robots.txt Be Ok?

Forum Moderators: Robert Charlton & goodroi

Planet13

4:54 pm on Mar 16, 2011 (gmt 0)

Hi there, Everyone:

I am using a CMS directory, and many of my pages don't have any content on them.

Unfortunately, I can't add a NOINDEX meta tag to those pages without noindexing the whole site.

Will blocking them via robots.txt work if they have ALREADY been indexed by Google and the other search engines?

I am doing this because of the Farmer / Panda update. I would not want Google to see a lot of "low quality" pages.

Thanks in advance.

tedster

6:13 pm on Mar 16, 2011 (gmt 0)

Disallowing in robots.txt is one of the suggestions from Google, recommended when you plan to eventually improve that content. See these in-depth recommendations from Google's JohnMu: [google.com...]

As to whether it will "work": so far, no one seems to have reported that any change helped them recover lost rankings.

goodroi

11:06 am on Mar 17, 2011 (gmt 0)

Using robots.txt will eventually get those pages out of Google's index. Since you mention they are low quality, I doubt Google is crawling them often, so it may take weeks or months for all of the low-quality pages to be de-indexed. That is probably why (as tedster points out) no one has yet confirmed that this made rankings bounce back for the rest of their site.

aakk9999

1:25 pm on Mar 17, 2011 (gmt 0)

"Using robots.txt will eventually get those pages out of Google's index"

Yes, but only if there are no external links to these pages. Otherwise they may hang around in the index with no meta description shown.

FredOPC

12:15 am on Mar 21, 2011 (gmt 0)

I feel I must chime in here. Read John Mu's comments carefully. He is NOT suggesting disallowing with robots.txt. He's saying to add a meta robots tag with NOINDEX in cases where you are working on improving content. Those are very different! The former blocks crawling, either of individual pages or of every page matching a wildcard pattern. The latter tells Google to remove the pages completely, and must be applied on a page-by-page basis.

John actually suggests NOT disallowing crawling of those pages, because if you do, Googlebot is blind to them (including the NOINDEX meta tag).
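
To make the "blind to them" point concrete, here is a minimal Python sketch using only the standard library. The robots.txt rules, the example.com URLs, the page markup, and the helper name noindex_visible_to_crawler are all made up for illustration, and this only models a crawler that obeys robots.txt; it says nothing about how Google actually processes or scores pages.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a directory of thin pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/empty/
"""

# Hypothetical thin page that carries a noindex meta tag.
PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow">'
    "</head><body></body></html>"
)


class RobotsMetaFinder(HTMLParser):
    """Records whether the page has <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def noindex_visible_to_crawler(url, robots_txt, html, agent="Googlebot"):
    """True only if a robots.txt-compliant crawler may fetch the page
    AND the fetched page contains a noindex directive."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # The crawler never downloads the page, so it never sees the tag.
        return False
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


blocked_url = "https://example.com/directory/empty/page1.html"
print(noindex_visible_to_crawler(blocked_url, ROBOTS_TXT, PAGE_HTML))
print(noindex_visible_to_crawler(blocked_url, "", PAGE_HTML))
```

With the Disallow rule in place the first call reports that the tag is never seen. With an empty robots.txt the same page is fetched and the noindex is read, which is the setup John describes.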

tedster

12:35 am on Mar 21, 2011 (gmt 0)

Thank you, Fred. I had not read those comments closely enough - you are correct. In fact, he specifically says "make sure that they're not disallowed by the robots.txt file."

Re-thinking the opening question, I now assume that robots.txt will NOT be OK. It sounds like Google might be scoring a site based on the past record of its URLs. It's still a bit ambiguous, because that answer has a certain specific context, but my assumption now is that we need to remove the URLs or enhance their content. If enhancing the content takes time, then use noindex during the process.