Pages are indexed even after blocking in robots.txt


Robert_Charlton - 8:20 pm on Sep 9, 2012 (gmt 0)


That's one reason I like robots.txt for a quick control on query string "sort" parameters and the like. Sorted product URLs are very easily inserted into social media links by well-intentioned fans. The robots.txt file is a down and dirty way to stop crawling from generating a mess of duplicate content, as well as messing up the quality of your site's Googlebot crawl altogether.

I completely agree that robots.txt is a good way to keep such pages from being crawled, particularly on a large site where crawl budget is an issue. And yes, generally "a robots.txt blocked URL is HIGHLY unlikely to get much if any search traffic."
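As a rough illustration of the kind of "sort" parameter blocking discussed above, here is a minimal Python sketch of how Google-style robots.txt patterns (where `*` matches any run of characters and a trailing `$` anchors the end of the URL) would apply to a few URLs. The two Disallow patterns and the example paths are hypothetical, and the matcher is a simplification: it ignores Allow rules, user-agent groups, and rule precedence, so treat it as a way to reason about patterns rather than a replacement for testing against the real crawler.

```python
import re

# Hypothetical rules, as they might appear in robots.txt:
#   User-agent: *
#   Disallow: /*?sort=
#   Disallow: /*&sort=
DISALLOW_PATTERNS = ["/*?sort=", "/*&sort="]


def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the match
    at the end of the URL. Everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query):
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(
        robots_pattern_to_regex(p).match(path_and_query) is not None
        for p in DISALLOW_PATTERNS
    )


if __name__ == "__main__":
    for url in ["/shoes", "/shoes?sort=price", "/shoes?color=red&sort=price"]:
        print(url, "blocked" if is_blocked(url) else "crawlable")
```

Two patterns are used because the parameter can appear either first in the query string (after `?`) or later (after `&`).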

Where I've encountered problems, though, is in very different areas, with different concerns... e.g., syndicated co-branded mirrors of an entire site placed in its own subdirectory on large daily newspapers. In this kind of situation, the pages did attract links, and we found that URLs were being returned in the SERPs for competitive searches.

I've also encountered situations where development pages or information pages that clients wanted to keep out of the index, away from the eyes of competitors, were showing up on site:domain searches... not ranking competitively, but definitely not private.

These are areas, I feel, where you should not use robots.txt. I think it's helpful to understand both the situation and what you're trying to do... whether you want to prevent crawling of a page's contents, or to prevent URLs or "references" from appearing in the index... and to choose your methods accordingly.

There's no easy way to do both at once, because the references/links to a page can occur anywhere on the web.
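For the second goal, keeping URLs out of the index rather than saving crawl, one commonly used alternative is a noindex directive, sent either as a robots meta tag or as an `X-Robots-Tag` HTTP response header. Here is a minimal sketch of the header approach as a plain Python WSGI app. The path prefixes and the `response_headers` helper are hypothetical, made up for illustration. Note the trade-off: the crawler has to be able to fetch the URL to see the header, so these paths must not also be blocked in robots.txt.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical path prefixes that should stay out of the index.
NOINDEX_PREFIXES = ("/dev/", "/partners/")


def app(environ, start_response):
    """Tiny WSGI app that adds 'X-Robots-Tag: noindex' on selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Ask search engines not to index this URL. They can only see this
        # header if they are allowed to crawl the URL in the first place.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"ok\n"]


def response_headers(path):
    """Call the app in-process and return its response headers as a dict."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    list(app(environ, start_response))
    return captured["headers"]


if __name__ == "__main__":
    print(response_headers("/dev/new-page"))
    print(response_headers("/products/"))
```

For pages that truly need to be private, such as development pages, neither method is a substitute for authentication, since both only ask crawlers to cooperate.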


Thread source: http://www.webmasterworld.com/google/4490125.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com