WebmasterWorld
Google indexing blocked content

fatpeter msg:4338356 6:53 am on Jul 12, 2011 (gmt 0)
A while ago I had the following directive in robots.txt:

User-Agent: *
Disallow: /cgi-bin/

but I had a problem with AdSense not showing adverts on pages below the cgi-bin, for example cgi-bin/links/showpicture.cgi?ID=14063

I didn't want any content on the site under the cgi-bin indexed, as it is all dupe content, and the previous directive seemed to work just fine. I changed the directive to

User-Agent: *
Disallow: /cgi-bin/

User-Agent: MediaPartners-Google
allow: /cgi-bin/

to allow the AdSense bot. Now Google has started to index 80,000 pages under the cgi-bin. Is my directive wrong? I've searched and searched but I can't find a reason why they are indexing these pages...
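For anyone who wants to reproduce the per-crawler result locally, here is a minimal sketch using Python's standard-library urllib.robotparser. It parses the second robots.txt quoted above and asks whether each user agent may fetch the example URL. The helper name is_allowed and the leading slash on the path are assumptions made for illustration, and Python's parser only approximates how Google reads the file, so treat the result as a rough check and not as a statement of what Googlebot will actually do.

```python
from urllib.robotparser import RobotFileParser

# The second robots.txt from the post, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: Mediapartners-Google
Allow: /cgi-bin/
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch path.

    Hypothetical helper: it applies Python's matching rules,
    which are not guaranteed to match Google's.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Example URL from the post, with a leading slash added (assumption).
    path = "/cgi-bin/links/showpicture.cgi?ID=14063"
    for agent in ("Googlebot", "Mediapartners-Google"):
        verdict = "allowed" if is_allowed(agent, path) else "blocked"
        print(agent, verdict)
```

A "blocked" verdict only says the crawler should not fetch the URL. It says nothing about whether that URL is already listed in the index.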
g1smd msg:4338369 7:41 am on Jul 12, 2011 (gmt 0)
You have allowed Google in. Try adding

Disallow: /cgi-bin

to what you already have, or add the meta robots noindex tag to all of the pages in
Pfui msg:4341857 6:49 pm on Jul 20, 2011 (gmt 0)
Then, after your robots.txt, sitemap file, and meta tags are good to go, use Google Webmaster Tools to confirm everything. And then remove a directory whole-hog from Googlebot's reach:

-> Site configuration
--> Crawler access (where you can test robots.txt)
---> Remove URL

See also the following page linked-to from "Crawler access": "Do I need to make changes to content I want removed from Google?"
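Before confirming in Webmaster Tools, the meta tag part can be spot-checked locally. The sketch below uses only Python's standard-library html.parser and reports whether a page's HTML carries a robots meta tag containing noindex. The names RobotsMetaParser and has_noindex are hypothetical, and the sample HTML is made up for illustration, not taken from the site in question. Keep in mind that a crawler can only read this tag on pages it is permitted to fetch.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, follow".
        for item in attr.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def has_noindex(html):
    """Return True if the HTML has a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Made-up sample page, for illustration only.
    sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(sample))
```

To check a live page, the HTML would first have to be fetched (for instance with urllib.request) and the decoded text passed to has_noindex.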
fatpeter msg:4341921 8:17 pm on Jul 20, 2011 (gmt 0)
Hi,

Thanks for the replies. What I don't understand is this... I had cgi-bin blocked from all robots with an exception for User-agent: Mediapartners-Google. When I tested a URL under cgi-bin in Webmaster Tools it said

www.?.com/cgi-bin/?ID=?
Googlebot Blocked by line 59: Disallow: /cgi-bin
Googlebot Mediapartners-Google Allowed by line 21: Allow: /cgi-bin/

so you would think that would be O.K. I've made some changes, but a week later there are still over 100,000 cgi-bin pages in a site: command.

As for removing them with the "Remove URL" tool: will that work for a directory? Do I really need to remove them if they are banned in robots.txt?

lucy24 msg:4341950 9:17 pm on Jul 20, 2011 (gmt 0)
Yes, you can remove whole directories, but not right this second because they're fixing a teeny weeny little bug in the "Remove" tool (different thread, I think over in the Google subforum). You don't have to remove them, but if you don't, they will stick around for months if not years. The googlebot goes by its own rules ;)

mikeavery11 msg:4360940 5:52 pm on Sep 10, 2011 (gmt 0)
Try it like this. You have allowed Google in.

User-agent: Googlebot
Disallow: /cgi-bin