User-agent: *
Disallow: /d/
Disallow: /?g2_view=core.DownloadItem*
When I put the directory /d/ into Google's robots.txt validator, it reports it as "allowed", even though I am sure that when I tested it the first time it was disallowed. Has something changed? I would like to remove all of these URL-only image listings from Google's index, as well as from all the other search engines. Any ideas?
Thanks
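For what it's worth, you can sanity-check those rules locally before trusting any validator. Here is a minimal sketch using Python's standard-library robots.txt parser; example.com and the sample URLs are just placeholders, and note that urllib.robotparser does plain prefix matching, so the trailing * in the second Disallow line is treated literally rather than expanded the way Googlebot expands wildcards.

# Sketch: test the posted rules with Python's built-in parser.
# Caveat: urllib.robotparser matches by simple prefix, so the "*"
# in the second rule is literal here, unlike Googlebot's wildcard.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /d/
Disallow: /?g2_view=core.DownloadItem*
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

for url in ("http://example.com/d/",
            "http://example.com/d/some-image.jpg",
            "http://example.com/?g2_view=core.DownloadItem&itemId=123"):
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "disallowed"
    print(url, "->", verdict)

With this parser, /d/ and anything under it comes back disallowed, which suggests the directory rule itself is fine and the validator result is worth re-checking.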
I believe the suggested method is to remove the Disallow: rule and place a robots meta element on the pages you don't want indexed. I can confirm that this works, as I've been doing it that way for a few years now.
Typically the only way you can find those listings is through a site: search. They will not be shown for regular search queries, or at least I've never been able to get them to appear.
Googlebot has obeyed your robots.txt file and does not index that content. But it does index the URI itself, hence the URI-only listing.
However, there are a lot of clever people out there who could do a site: search, find all of those URLs, and download every image off my site.
All they would have to do is browse to your robots.txt file, unless of course you were cloaking that file. ;)
If there is something you don't want indexed, I find that a simple...
<meta name="robots" content="none">
...does what it is intended to do: it keeps the page from getting indexed, and no links on it are followed.
Listing anything in the robots.txt file is pretty much providing a map to that content. The content itself shouldn't get indexed, and usually doesn't. URI listings are just that: URIs only. You've alerted Google to their existence through the robots.txt file.
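If you do go the meta element route described above, a quick local check that a page actually carries the directive can save some head-scratching. Here is a minimal sketch with the standard library, assuming a local file named page.html (a placeholder, not a file from this thread); note that content="none" is shorthand for noindex plus nofollow.

# Sketch: confirm a page carries a robots meta directive that blocks indexing.
# "page.html" is a placeholder path, not a file from this thread.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]

with open("page.html", encoding="utf-8") as f:
    parser = RobotsMetaParser()
    parser.feed(f.read())

blocked = any(d in ("none", "noindex") for d in parser.directives)
print("robots directives:", parser.directives or "(none found)")
print("indexing blocked:", blocked)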