
Sitemaps, Meta Data, and robots.txt Forum

Google indexing blocked content

 6:53 am on Jul 12, 2011 (gmt 0)

A while ago I had the following directive in robots.txt

User-Agent: *
Disallow: /cgi-bin/

but I had a problem with AdSense not showing adverts on pages under /cgi-bin/,

for example /cgi-bin/links/showpicture.cgi?ID=14063

I didn't want any content on the site under /cgi-bin/ indexed, as it is all duplicate content, and the previous directive seemed to work just fine.

I changed the directive to:

User-agent: *
Disallow: /cgi-bin/

User-agent: Mediapartners-Google
Allow: /cgi-bin/

to allow the AdSense bot.

Now Google has started to index 80,000 pages under /cgi-bin/.

Is my directive wrong? I've searched and searched, but I can't find a reason why they are indexing these pages...
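For anyone who wants to check locally how different crawlers would read a file like this, here is a quick sketch using Python's standard urllib.robotparser (the host and URL below are placeholders, and Google's own parser may differ from Python's in edge cases):

```python
# Sketch: test how different user agents match against the robots.txt
# above, using Python's standard-library parser.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: Mediapartners-Google
Allow: /cgi-bin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Placeholder URL in the style of the example above.
url = "http://www.example.com/cgi-bin/links/showpicture.cgi?ID=14063"

# Googlebot has no group of its own, so it falls under "*" and is blocked.
print(rp.can_fetch("Googlebot", url))             # False
# Mediapartners-Google matches its own group, where /cgi-bin/ is allowed.
print(rp.can_fetch("Mediapartners-Google", url))  # True
```

Each crawler obeys only the group that matches it most specifically, which is why the AdSense bot can be let in while everything else stays blocked.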



 7:41 am on Jul 12, 2011 (gmt 0)

You have allowed Google in. Try adding

User-agent: Googlebot
Disallow: /cgi-bin

to what you already have, or add a meta robots noindex tag to all of the pages in /cgi-bin.
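Putting that suggestion together with the file already in place, the combined robots.txt would look something like this (a sketch of the advice above, not a tested file):

```
User-agent: *
Disallow: /cgi-bin/

User-agent: Mediapartners-Google
Allow: /cgi-bin/

User-agent: Googlebot
Disallow: /cgi-bin/
```

Since each crawler follows only the most specific group that matches it, giving Googlebot a group of its own removes any doubt about which rules it picks up. The meta tag alternative is a `<meta name="robots" content="noindex">` line in the head of each page, though a crawler has to be able to fetch a page in order to see that tag.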

 6:49 pm on Jul 20, 2011 (gmt 0)

Then, after your robots.txt and sitemap file and meta tags are good to go, use Google Webmaster Tools to confirm everything. And then remove a directory whole-hog from Googlebot's reach:

-> Site configuration
--> Crawler access (where you can test robots.txt)
---> Remove URL

See also the following page linked-to from "Crawler access": "Do I need to make changes to content I want removed from Google?"


 8:17 pm on Jul 20, 2011 (gmt 0)


Thanks for the replies. What I don't understand is this...

I had /cgi-bin/ blocked for all robots, with an exception for User-agent: Mediapartners-Google.
When I tested a URL under /cgi-bin/ in Webmaster Tools, it said:


Googlebot
Blocked by line 59: Disallow: /cgi-bin

Mediapartners-Google
Allowed by line 21: Allow: /cgi-bin/

so you would think that would be OK.

I've made some changes, but a week later there are still over 100,000 cgi-bin pages showing for a site: query.

As for removing them with the "Remove URL" tool:

Will that work for a directory?

Do I really need to remove them if they are banned in robots.txt?


 9:17 pm on Jul 20, 2011 (gmt 0)

Yes, you can remove whole directories, but not right this second because they're fixing a teeny weeny little bug in the "Remove" tool (different thread, I think over in the Google subforum). You don't have to remove them, but if you don't, they will stick around for months if not years.

The googlebot goes by its own rules ;)


 5:52 pm on Sep 10, 2011 (gmt 0)

Try adding this:

User-agent: Googlebot
Disallow: /cgi-bin
