Can't succeed in blocking directories in robots.txt
The paths are already in robots.txt, but Google still crawls those pages.

stephang msg:3850502 9:41 am on Feb 16, 2009 (gmt 0)

Hello everybody!
I am trying to block some specific URLs on my website using robots.txt, but when I checked in Google Webmaster Tools, I can see that those URLs are still being crawled by robots.
So I'm trying to block the directory "contact_us", taking into consideration that the path segments before it, aaa and bbb, are variable.
I've inserted the following in my robots.txt, but it does not seem to be working.
Or should I insert this one?
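[The original snippets were not preserved in the archive. A hypothetical reconstruction, inferred from jdMorgan's reply below that the second snippet uses wildcards, might look like this — first, a plain prefix rule:

```
# Hypothetical first attempt: a prefix rule, which only matches
# paths that literally begin with /contact_us/
User-agent: *
Disallow: /contact_us/
```

and second, a wildcard rule:

```
# Hypothetical second attempt: wildcards standing in for the
# variable aaa and bbb path segments
User-agent: *
Disallow: /*/*/contact_us/
```
]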
Thank you all for your kind replies! :)
jdMorgan msg:3850574 12:45 pm on Feb 16, 2009 (gmt 0)
Your second snippet should work -- but only for search engines that explicitly state on their "webmaster help" page that they support wild-cards in robots.txt.
This is NOT part of the Standard for Robot Exclusion, but is a semi-proprietary extension. The standard implementation uses prefix-matching and does not support wild-card URL-paths. For those search engines not supporting wild-card extensions, you will need to state the "aaa" and "bbb" URL-path-parts explicitly, or re-architect your URL structure so that those variables occur at the end of your URL-paths instead of at the beginning. This is something to consider for your next new site or existing site re-design.
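[The difference between the two matching models can be sketched in a few lines of Python. This is an illustration, not any search engine's actual implementation: `prefix_match` follows the original exclusion standard, and `wildcard_match` approximates the documented Googlebot-style extension, where `*` matches any run of characters and a trailing `$` anchors the rule to the end of the path.

```python
import re

def prefix_match(rule: str, path: str) -> bool:
    """Original robots.txt standard: a Disallow rule matches
    any URL-path that it is a literal prefix of."""
    return path.startswith(rule)

def wildcard_match(rule: str, path: str) -> bool:
    """Wildcard extension (sketch): '*' matches any character
    sequence, and '$' at the end of a rule anchors the match
    to the end of the path."""
    anchored = rule.endswith("$")
    pattern = re.escape(rule.rstrip("$")).replace(r"\*", ".*")
    pattern = "^" + pattern + ("$" if anchored else "")
    return re.search(pattern, path) is not None

# A prefix rule cannot reach a directory nested below variable segments:
print(prefix_match("/contact_us/", "/aaa/bbb/contact_us/"))        # False
# The wildcard rule from this thread does match it:
print(wildcard_match("/*/*/contact_us/", "/aaa/bbb/contact_us/"))  # True
```

This is why, for engines without wildcard support, the variable parts must either be listed explicitly or moved to the end of the URL-path, where prefix-matching can reach the fixed part.]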
For the search engines that *do* support wild-carding, you can try something like:
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
Disallow: /*/*/contact_us/
#
User-agent: *
Disallow: /

This would tell the "big three" not to fetch your /contact_us subdirectories, while telling all others not to fetch anything on your site.
stephang msg:3851205 4:09 am on Feb 17, 2009 (gmt 0)
Thank you jdMorgan.
But why would I block all other robots from fetching anything from my site?
tangor msg:3851206 4:13 am on Feb 17, 2009 (gmt 0)
All robots are not equal... and some are worse than others!
In reality, you allow the bots that bring benefit to your site, i.e. visitors. All others need not apply.
choster msg:3851885 10:29 pm on Feb 17, 2009 (gmt 0)
Operators of bad bots are unlikely to conform to robots.txt directives. If a bot is badly behaved, you'll need to block it at the server level by other means.

stephang msg:3852126 6:27 am on Feb 18, 2009 (gmt 0)
I just came back to tell you that the following works perfectly.
Thank you all! :)