RichTC - 8:40 am on Apr 26, 2011 (gmt 0)
Talk about dancing to Google's ever-changing guidelines.
As I see it, the whole point of robots.txt is to stop bots crawling content that doesn't need to be crawled or indexed.
Say you have an ecommerce site with a blue widgets section: 400 blue widgets, split across 10 pages of 40 items. I would list the first page with its 40 items, nofollow/noindex pages 2-10, and use robots.txt to exclude the individual buying cart pages, since they are duplicates.
Using robots.txt is the simplest way to do this and is what it's designed for. I can't believe for one minute that Google would give some kind of negative signal to a site just because it has thousands of blocked pages.
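For anyone who wants to sanity-check a setup like that, here is a rough sketch using Python's standard `urllib.robotparser`. The `/cart/` and `/blue-widgets/` paths and the example.com domain are made up for illustration, and the stdlib parser only does simple prefix matching, so treat it as an approximation of how a real crawler reads the file rather than a statement of what Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the blue-widgets example.
# The /cart/ path is an assumption, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/blue-widgets/",         # first listing page
        "https://example.com/blue-widgets/?page=2",  # paginated page, left crawlable
        "https://example.com/cart/add?item=17",      # cart page, blocked
    ):
        print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Note the paginated pages are deliberately not blocked in this sketch: a noindex tag on a page can only be seen if the bot is allowed to fetch that page.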
For years, I have simply blocked the long query string URLs via the robots.txt file
I would continue doing just that. If it turned out that Google was, for some strange reason, using it as a negative signal, the net result would be webmasters unblocking those pages and letting Googlebot crawl zillions of extra pages that it doesn't need to. Not a bright idea, or a likely one, IMO.