User-agent: Googlebot
Disallow: /forum
Disallow: /index.php
Disallow: /nigeria?
Disallow: /?
Disallow: /*msg
---------------
I expected the last directive to prevent Googlebot from downloading any URL containing the string 'msg', but Googlebot seems to be ignoring that directive and downloading files like /forum/topic-12.msg45454.html. What could be wrong?
You have made several mistakes.
1) It should be Disallow: /forum/ (Note the trailing slash).
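2) Wildcards such as /*msg are not part of the original standard. Some bots support them as an extension (Googlebot is one of them), but many do not. Note also that the ? in /? and /nigeria? is not a wildcard; it is matched as a literal character. See the relevant section of the above-mentioned standard: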
"Note also that regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif"."
3) It probably pays to put the User-agent: * section last, as some bots stop at the first section that matches, whether it names their own User-agent or is the * section.
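Point 3 can be illustrated with a minimal sketch of a naive "first match" bot. The function name, the group layout and the /private/ and /forum/ rules are hypothetical, chosen only for illustration; real crawlers differ in how they select a section.

```python
def pick_group_first_match(groups, bot_name):
    """Return the rules of the first section that applies to bot_name.

    groups is a list of (user_agent_token, disallow_rules) pairs in file order.
    This models a naive bot that stops at the first section whose token is '*'
    or is contained in its own name (hypothetical behaviour, for illustration).
    """
    for token, rules in groups:
        if token == "*" or token.lower() in bot_name.lower():
            return rules
    return []


# Hypothetical robots.txt layouts: same sections, different order.
star_first = [("*", ["/private/"]), ("Googlebot", ["/forum/"])]
star_last = [("Googlebot", ["/forum/"]), ("*", ["/private/"])]

# With '*' first, the naive bot never reaches its own section.
print(pick_group_first_match(star_first, "Googlebot"))
# With '*' last, the bot-specific section is found first.
print(pick_group_first_match(star_last, "Googlebot"))
```

With the * section first, this kind of bot applies the generic rules and ignores the section written for it, which is why putting * last is the safer ordering.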
disallow: /forums
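will disallow a directory called forums and a file called forums in the root directory. Matching is by prefix, so it also blocks any other path that begins with /forums, such as /forums.html.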
disallow: /forums/
will disallow a directory called forums but not a file called forums.
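The difference between the two forms comes down to plain prefix matching, which can be sketched as follows. The function name and the sample paths are assumptions made for the example, not part of any library.

```python
def is_disallowed(path, disallow_value):
    """Original-standard matching: a rule blocks every path that starts with it."""
    # An empty Disallow value blocks nothing.
    return bool(disallow_value) and path.startswith(disallow_value)


# Sample paths (hypothetical) to compare the two rules against.
paths = ["/forums", "/forums/", "/forums/topic-1.html", "/forums.html"]

for rule in ("/forums", "/forums/"):
    blocked = [p for p in paths if is_disallowed(p, rule)]
    print("Disallow:", rule, "blocks", blocked)
```

The rule without the trailing slash blocks all four sample paths, while the rule with the trailing slash leaves /forums and /forums.html crawlable.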
Never use wildcards in the directives under User-agent: * because most bots can't handle wildcards in the Disallow line, although Googlebot definitely can.
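To see why the same line behaves differently from bot to bot, here is a rough sketch comparing literal prefix matching with wildcard-aware matching. It assumes the common extension in which * stands for any run of characters and a trailing $ anchors the end of the URL; the function names are made up for the example, and this is not any crawler's actual implementation.

```python
import re


def wildcard_rule_to_regex(rule):
    """Translate a Disallow value using the '*' and '$' extensions into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches_with_wildcards(rule, path):
    """Wildcard-aware bot: match from the start of the path."""
    return wildcard_rule_to_regex(rule).match(path) is not None


def matches_literally(rule, path):
    """Bot without wildcard support: '*' is just another character."""
    return path.startswith(rule)


url_path = "/forum/topic-12.msg45454.html"
print(matches_with_wildcards("/*msg", url_path))  # wildcard-aware bot
print(matches_literally("/*msg", url_path))       # literal prefix bot
```

Under these assumptions the wildcard-aware matcher treats /*msg as covering the example URL, while the literal matcher does not, because no path actually begins with the characters /*msg.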