
Google indexing blocked content

     
6:53 am on July 12, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 22, 2002
posts: 143
votes: 0


A while ago I had the following directive in robots.txt:

User-Agent: *
Disallow: /cgi-bin/

but I had a problem with AdSense not showing adverts on pages below /cgi-bin/, for example:

cgi-bin/links/showpicture.cgi?ID=14063

I didn't want any content on the site under /cgi-bin/ indexed, as it is all dupe content, and the previous directive seemed to work just fine.

I changed the directive to:

User-Agent: *
Disallow: /cgi-bin/

User-Agent: Mediapartners-Google
Allow: /cgi-bin/

to allow the AdSense bot.

Now Google has started to index 80,000 pages under /cgi-bin/.

Is my directive wrong? I've searched and searched, but I can't find a reason why they are indexing these pages...
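
For what it's worth, those records do parse the way you intended, at least under Python's standard-library robots.txt parser (urllib.robotparser, which understands Allow lines). A minimal sketch, with www.example.com standing in for the real hostname; Google's own parser may not treat the file identically:

from urllib.robotparser import RobotFileParser

# The robots.txt described above; the hostname used later is a placeholder.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /cgi-bin/

User-Agent: Mediapartners-Google
Allow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/cgi-bin/links/showpicture.cgi?ID=14063"

# Googlebot has no record of its own, so it falls through to "*": blocked.
print(parser.can_fetch("Googlebot", url))             # False
# Mediapartners-Google matches its own record: allowed.
print(parser.can_fetch("Mediapartners-Google", url))  # True

Note too that robots.txt controls crawling, not indexing: URLs discovered through links can still show up in the index even when crawling is disallowed.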
7:41 am on July 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time, a 10+ Year Member, and a Top Contributor of the Month

joined:July 3, 2002
posts:18903
votes: 0


You have allowed Google in. Try adding

User-agent: Googlebot
Disallow: /cgi-bin

to what you already have, or add the meta robots noindex tag to all of the pages in /cgi-bin.
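
The same standard-library check against the amended file suggests this behaves as intended; a sketch assuming the Googlebot record is simply appended to the existing file, hostname again a placeholder:

from urllib.robotparser import RobotFileParser

# The amended robots.txt with an explicit Googlebot record appended.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /cgi-bin/

User-Agent: Mediapartners-Google
Allow: /cgi-bin/

User-agent: Googlebot
Disallow: /cgi-bin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/cgi-bin/links/showpicture.cgi?ID=14063"

print(parser.can_fetch("Googlebot", url))             # False: explicit record applies
print(parser.can_fetch("Mediapartners-Google", url))  # True: still allowed

Once Googlebot has a record of its own, the "*" record no longer applies to it, which is why the explicit Disallow is needed alongside the Mediapartners-Google exception.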
6:49 pm on July 20, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member, 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Then, after your robots.txt, sitemap file, and meta tags are good to go, use Google Webmaster Tools to confirm everything. And then remove the directory whole-hog from Googlebot's reach:

-> Site configuration
--> Crawler access (where you can test robots.txt)
---> Remove URL

See also the following page linked to from "Crawler access": "Do I need to make changes to content I want removed from Google?"
8:17 pm on July 20, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 22, 2002
posts: 143
votes: 0


Hi

Thanks for the replies. What I don't understand is this...

I had /cgi-bin/ blocked for all robots, with an exception for User-agent: Mediapartners-Google.
When I tested a URL under cgi-bin in Webmaster Tools, it said:


www.?.com/cgi-bin/?ID=?

Googlebot
Blocked by line 59: Disallow: /cgi-bin

Googlebot Mediapartners-Google
Allowed by line 21: Allow: /cgi-bin/


so you would think that would be OK.

I've made some changes, but a week later there are still over 100,000 cgi-bin pages showing in a site: command.

As for removing them with the "Remove URL" tool: will that work for a directory?

Do I really need to remove them if they are banned in robots.txt?
9:17 pm on July 20, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time and a Top Contributor of the Month

joined:Apr 9, 2011
posts:12700
votes: 244


Yes, you can remove whole directories, but not right this second because they're fixing a teeny weeny little bug in the "Remove" tool (different thread, I think over in the Google subforum). You don't have to remove them, but if you don't, they will stick around for months if not years.

The Googlebot goes by its own rules ;)
5:52 pm on Sept 10, 2011 (gmt 0)

Junior Member

joined:Sept 10, 2011
posts:50
votes: 0


Try it like this. As said above, you have allowed Google in:

User-agent: Googlebot
Disallow: /cgi-bin