
Google indexing blocked content

     

fatpeter

6:53 am on Jul 12, 2011 (gmt 0)

10+ Year Member



A while ago I had the following directive in robots.txt:

User-agent: *
Disallow: /cgi-bin/

but I had a problem with AdSense not showing adverts on pages under /cgi-bin/,

for example /cgi-bin/links/showpicture.cgi?ID=14063

I didn't want any content on the site under /cgi-bin/ indexed, as it is all duplicate content, and the previous directive seemed to work just fine.

I changed the directives to

User-agent: *
Disallow: /cgi-bin/

User-agent: MediaPartners-Google
Allow: /cgi-bin/

to allow the AdSense bot.

Now Google has started to index 80,000 pages under /cgi-bin/.

Is my directive wrong? I've searched and searched, but I can't find a reason why they are indexing these pages...
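
For anyone who wants to sanity-check the group matching, here is a minimal sketch using Python's standard urllib.robotparser. Its matching is not identical to Google's (Google applies longest-match precedence), but for simple prefix rules like these it gives the same answer; the domain below is a placeholder.

from urllib.robotparser import RobotFileParser

# The robots.txt in question, as posted above.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: MediaPartners-Google
Allow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

url = "http://www.example.com/cgi-bin/links/showpicture.cgi?ID=14063"

# Googlebot has no group of its own, so it falls back to the * group: blocked.
print(rp.can_fetch("Googlebot", url))             # False

# MediaPartners-Google matches its own group: allowed.
print(rp.can_fetch("MediaPartners-Google", url))  # True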

g1smd

7:41 am on Jul 12, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You have allowed Google in. Try adding

User-agent: Googlebot
Disallow: /cgi-bin

to what you already have, or add the meta robots noindex tag to all of the pages in /cgi-bin.
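
For reference, the noindex route means serving something like this in the <head> of every page under /cgi-bin (a minimal sketch; note that a crawler has to be able to fetch a page to see the tag, so it only takes effect on pages Googlebot is allowed to crawl):

<head>
<!-- tells compliant crawlers not to index this page -->
<meta name="robots" content="noindex">
</head>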

Pfui

6:49 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Then, once your robots.txt, sitemap file, and meta tags are good to go, use Google Webmaster Tools to confirm everything. After that, you can remove a directory whole-hog from Googlebot's reach:

-> Site configuration
--> Crawler access (where you can test robots.txt)
---> Remove URL

See also the page linked to from "Crawler access": "Do I need to make changes to content I want removed from Google?"

fatpeter

8:17 pm on Jul 20, 2011 (gmt 0)

10+ Year Member



Hi

Thanks for the replies. What I don't understand is this...

I had /cgi-bin/ blocked for all robots, with an exception for User-agent: MediaPartners-Google.
When I tested a URL under /cgi-bin/ in Webmaster Tools, it said


www.?.com/cgi-bin/?ID=?

Googlebot
Blocked by line 59: Disallow: /cgi-bin

Googlebot Mediapartners-Google
Allowed by line 21: Allow: /cgi-bin/


so you would think that would be OK.

I've made some changes, but a week later there are still over 100,000 cgi-bin pages showing in a site: search.

As for removing them with the "Remove URL" tool:

Will that work for a directory?

Do I really need to remove them if they are banned in robots.txt?
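
For reference, the check described above is a directory-scoped site: query, something like this (placeholder domain):

site:www.example.com/cgi-bin/

which lists every indexed URL under that path.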

lucy24

9:17 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Yes, you can remove whole directories, but not right this second because they're fixing a teeny weeny little bug in the "Remove" tool (different thread, I think over in the Google subforum). You don't have to remove them, but if you don't, they will stick around for months if not years.

The googlebot goes by its own rules ;)

mikeavery11

5:52 pm on Sep 10, 2011 (gmt 0)



Try it like this, as g1smd suggested above:

User-agent: Googlebot
Disallow: /cgi-bin
