Why is google indexing this banned directory?

Forum Moderators: goodroi

ichthyous

3:26 pm on Jan 25, 2007 (gmt 0)

I banned my site's image download directory in robots.txt a while ago, but I am seeing all the contents of that directory showing up in the index again. The listings in the index are URL-only. Here is my robots.txt code:

User-agent: *
Disallow: /d/
Disallow: /?g2_view=core.DownloadItem*

When I put the directory /d/ in Google's robots.txt validator it reports it as being "allowed", even though I am sure that when I tested it the first time it was disallowed. Has something changed? I would like to remove all of these URL-only image listings from Google's index as well as from all the other search engines. Any ideas?

Thanks

goodroi

2:15 pm on Jan 26, 2007 (gmt 0)

That should block Google from accessing example.com/d/ UNLESS you have specific instructions for Googlebot. If you do list Googlebot in your robots.txt, then Googlebot will ignore the general instructions and only follow the instructions specific to it.
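A rough way to see this precedence rule in action is Python's standard-library robots.txt parser, which also prefers a matching named group over the `*` group. This is only a sketch: the `/private/` rule and the `allowed` helper are made up for illustration, and the stdlib parser is not Google's own parser (it does not handle `*` wildcards inside paths, for instance).

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Only a general group: every crawler is kept out of /d/.
GENERAL_ONLY = """\
User-agent: *
Disallow: /d/
"""

# Hypothetical file with an extra Googlebot group. The /private/ rule is an
# invented example; the point is that this group does not mention /d/.
WITH_GOOGLEBOT_GROUP = """\
User-agent: *
Disallow: /d/

User-agent: Googlebot
Disallow: /private/
"""

URL = "http://example.com/d/photo.jpg"

print(allowed(GENERAL_ONLY, "Googlebot", URL))          # blocked by the * group
print(allowed(WITH_GOOGLEBOT_GROUP, "Googlebot", URL))  # * group is ignored
print(allowed(WITH_GOOGLEBOT_GROUP, "OtherBot", URL))   # still uses the * group
```

If the second call comes back allowed for your real file, the fix is to repeat the Disallow: /d/ line inside the Googlebot group as well.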

cheers

pageoneresults

2:19 pm on Jan 26, 2007 (gmt 0)

Has something changed? I would like to remove all of these URL-only image listings from Google's index as well as from all the other search engines. Any ideas?

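I believe the suggested method is to remove the Disallow: and place a robots meta element on the pages that are not to be indexed. The Disallow: has to go because a crawler that is blocked from fetching a page never gets to see its meta element. I can confirm that this works, as I've been doing it that way for a few years now.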

Typically the only way you can find those listings is through a site: search. They will not be shown for regular search queries, or at least I've not been able to get them to.

Googlebot has obeyed your robots.txt file and does not index that content. However, it does index the URI, which is why you are seeing URI-only listings.

ichthyous

4:31 pm on Jan 26, 2007 (gmt 0)

Oh, that's interesting. I have been trying to get any of those URLs to come up through searches and haven't seen one yet...just through a site: search. However, there are a lot of clever people out there who could do a site: search and find all of those URLs, which means they can download every image off my site. Unfortunately these pages are dynamically generated and I can't add any robots instructions in the meta tags just for those pages.

pageoneresults

4:55 pm on Jan 26, 2007 (gmt 0)

However, there are a lot of clever people out there who could do a site search and find all of those URLs which means they can download every image off my site.

All they would have to do is browse to your robots.txt file unless of course you were cloaking that file. ;)

If there is something you don't want indexed, I find using a simple...

<meta name="robots" content="none">

...does what it is intended to do: it keeps the page from getting indexed, and no links are followed.
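To check that a dynamically generated page is really sending that element, something like the following could be used. It is a minimal sketch using only Python's standard library; the `RobotsMetaParser` class and `robots_directives` function are names invented for this example, and it treats "none" as shorthand for "noindex, nofollow".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, nofollow".
            self.directives.update(
                token.strip().lower()
                for token in content.split(",")
                if token.strip()
            )


def robots_directives(html):
    """Return the set of robots directives in `html`, expanding "none"."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    directives = set(parser.directives)
    if "none" in directives:
        # "none" is equivalent to "noindex, nofollow".
        directives |= {"noindex", "nofollow"}
    return directives


page = '<html><head><meta name="robots" content="none"></head><body></body></html>'
print(sorted(robots_directives(page)))
```

A page whose result contains "noindex" is asking not to be indexed, but only crawlers that are allowed to fetch the page will ever read it.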

Listing anything in the robots.txt file is pretty much providing a map to that content. It shouldn't get indexed and usually doesn't. URI listings are just that, URIs only. You've alerted Google to their existence through the robots.txt file.