I am trying to block some specific URLs on my website using robots.txt, but when I check in Google Webmaster Tools I can see that those URLs are still being crawled by robots.
So I'm trying to block the directory "contact_us", taking into consideration that aaa and bbb are variable:
http://www.example.com/aaa/bbb/contact_us/
I've inserted the following in my robots.txt, but it does not seem to be working:
User-agent: *
Disallow: /contact_us/
Or should I insert this one?
User-agent: *
Disallow: /*/*/contact_us/
Thank you all for your kind replies! :)
Wild-card matching in URL-paths is NOT part of the Standard for Robot Exclusion; it is a semi-proprietary extension. The standard implementation uses prefix-matching and does not support wild-card URL-paths, so "Disallow: /contact_us/" only matches URL-paths that begin with /contact_us/ and never matches /aaa/bbb/contact_us/. For those search engines not supporting the wild-card extension, you will need to state the "aaa" and "bbb" URL-path-parts explicitly, or re-architect your URL structure so that those variables occur at the end of your URL-paths instead of at the beginning. That is something to consider for your next new site or for a re-design of the existing one.
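To make the prefix-matching point concrete, here is a minimal Python sketch. The function name prefix_blocked is made up for illustration and does not come from any robots.txt library; real robots also deal with details such as percent-encoding, which this ignores.

```python
def prefix_blocked(path: str, disallow: str) -> bool:
    """Return True if `path` is blocked by `disallow` under plain prefix-matching.

    An empty Disallow value blocks nothing.
    """
    return disallow != "" and path.startswith(disallow)


# The rule from the question: the URL-path does not *begin* with /contact_us/
print(prefix_blocked("/aaa/bbb/contact_us/", "/contact_us/"))          # False

# The same rule does block a top-level contact_us directory
print(prefix_blocked("/contact_us/form.html", "/contact_us/"))         # True

# Stating the variable parts explicitly works for one aaa/bbb pair at a time
print(prefix_blocked("/aaa/bbb/contact_us/", "/aaa/bbb/contact_us/"))  # True
```

Under these assumptions, a prefix-only robot needs one Disallow line per aaa/bbb combination.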
For the search engines that *do* support wild-carding, you can try something like:
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
Disallow: /*/*/contact_us/
#
# Note: this record excludes every other robot from the entire site
User-agent: *
Disallow: /
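As a rough illustration of how a wild-card-aware robot might read that Disallow line, here is a small Python sketch. It assumes each * stands for any run of characters (slashes included) and that the rule is still matched from the start of the URL-path. wildcard_blocked is a hypothetical helper, and each search engine's real matcher may differ in its details.

```python
import re


def wildcard_blocked(path: str, disallow: str) -> bool:
    """Return True if `path` is blocked by `disallow`, treating * as a wild-card.

    Assumption: * matches any sequence of characters, and the rule is
    anchored at the start of the path only (it still acts as a prefix).
    """
    if not disallow:
        return False  # an empty Disallow value blocks nothing
    # Escape the literal pieces and join them with ".*" where each * was
    regex = ".*".join(re.escape(part) for part in disallow.split("*"))
    return re.match(regex, path) is not None


rule = "/*/*/contact_us/"
print(wildcard_blocked("/aaa/bbb/contact_us/", rule))      # True
print(wildcard_blocked("/aaa/bbb/ccc/contact_us/", rule))  # True, * can span slashes
print(wildcard_blocked("/aaa/contact_us/", rule))          # False, only one level above
print(wildcard_blocked("/contact_us/", rule))              # False
```

Note that under this reading the rule also catches contact_us directories nested more than two levels deep, which may or may not be what you want.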
Jim