Forum Moderators: Robert Charlton & goodroi
I am adding a Disallow rule to the robots.txt file that covers the whole /widget-thing/ directory. However, I have heard that Google may not remove the pages; it may only stop revisiting them to update them.
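For anyone wanting to sanity-check a directory-wide Disallow rule before deploying it, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt body, the `example.com` URLs, and the helper name `is_crawl_allowed` are assumptions made up for illustration; only the /widget-thing/ directory comes from the post. Note that this only tells you whether a compliant crawler may fetch a URL. It says nothing about whether the URL stays in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; /widget-thing/ is the directory named in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /widget-thing/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not the poster's site.
    for url in ("https://example.com/widget-thing/page-1", "https://example.com/about"):
        print(url, is_crawl_allowed(ROBOTS_TXT, url))
```

The standard-library parser does simple prefix matching on the path, which is enough for a whole-directory rule like this one; it may not mirror every detail of how Google itself interprets robots.txt.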
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.
Ever seen "A description for this result is not available because of this site's robots.txt"?
You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one).
<snip>
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web.
Google seems to be contradicting themselves. What to do if you don't want your content crawled or indexed? It appears there is no clear policy that is respected.
your site includes *content* that you don't want search engines to index.
[edited by: JD_Toims at 9:03 pm (utc) on Apr 3, 2014]
These automated pages hold little value, as users do not directly use them as part of their navigational path. I then want to remove this section from being link-able on my site.
Google seems to be contradicting themselves. What to do if you don't want your content crawled or indexed? It appears there is no clear policy that is respected.
Wording, meaning and execution are not clear.
Google won't crawl or index the content of pages
I could find unique text on pages which I told them not to crawl and decided enough was enough
You need a robots.txt file only if your site includes content that you don't want search engines to index.
Would it be safe to 301 this entire section of automated query pages back to the ... page?
The ultimate goal is to get these pages out of Google's index, so that Google does not see these URLs as part of my main site.
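As a rough illustration of what "301 the entire section" means in practice, here is a small framework-agnostic sketch. The function name `section_redirect`, the `section_prefix` argument, and the `target` argument are all hypothetical; the destination page is not named in the thread, so it is left as a parameter for the caller to supply. Treat it as a sketch of the routing decision only, not as a recommendation on whether the redirect is safe for this site.

```python
from typing import Optional, Tuple


def section_redirect(path: str, section_prefix: str, target: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) for any path inside the section, otherwise None.

    section_prefix and target are supplied by the caller; nothing here
    assumes which page the section should redirect to.
    """
    # Skip the target itself so a target inside the section cannot loop.
    if path.startswith(section_prefix) and path != target:
        return 301, target
    return None


if __name__ == "__main__":
    # "/placeholder-target" is a made-up destination for demonstration only.
    print(section_redirect("/widget-thing/query-abc", "/widget-thing/", "/placeholder-target"))
    print(section_redirect("/about", "/widget-thing/", "/placeholder-target"))
```

One thing to keep in mind: a crawler can only see a 301 on URLs it is allowed to fetch, so a redirect and a robots.txt Disallow on the same directory work against each other.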