tangor

msg:4351884 | 5:56 am on Aug 16, 2011 (gmt 0) |
Either is correct, but bad bots will ignore it... and will also use those "hints" as a guide to what to rip. Using feed by itself will work for bots that honor robots.txt...
lucy24

msg:4351885 | 6:08 am on Aug 16, 2011 (gmt 0) |
:: cough, cough :: Both $ forms are incorrect in robots.txt [robotstxt.org], because it doesn't "do" regular expressions:

    Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot".
So if Disallow: /feed/ isn't working, you need to bring out the heavy artillery, starting with .htaccess.
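To make the original semantics concrete, here is a minimal Python sketch of how a parser that follows only the original robotstxt.org rules treats a Disallow value: as a literal path prefix, with '*' and '$' having no special meaning. The function name original_rep_blocks is hypothetical, and the sketch ignores User-agent grouping and Allow lines.

```python
def original_rep_blocks(disallow_value: str, url_path: str) -> bool:
    """Return True if a Disallow value blocks url_path under the original
    robots.txt rules: a plain prefix comparison in which '*' and '$'
    are ordinary characters."""
    if disallow_value == "":
        # An empty "Disallow:" line disallows nothing.
        return False
    return url_path.startswith(disallow_value)


if __name__ == "__main__":
    checks = [
        ("/feed/", "/feed/"),          # the directory itself
        ("/feed/", "/feed/rss.xml"),   # anything below it
        ("/feed/", "/blog/feed/"),     # a prefix rule cannot match mid-path
        ("/feed$", "/feed"),           # '$' is a literal character here
    ]
    for rule, path in checks:
        verdict = "blocked" if original_rep_blocks(rule, path) else "allowed"
        print(f"Disallow: {rule:8} {path:14} -> {verdict}")
```

Under these rules Disallow: /feed/ covers /feed/ and everything beneath it, but not /blog/feed/, and a rule ending in $ only matches paths that literally contain a dollar sign.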
tangor

msg:4351894 | 7:18 am on Aug 16, 2011 (gmt 0) |
Correct, the regex $ is not required. Done. Otherwise, it is correct. The best method is to disallow ALL BOTS and then list which bots ARE ALLOWED (called whitelisting), but that puts me in the minority...
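Whitelisting can be expressed directly in robots.txt: give each allowed bot its own group with an empty Disallow line, and disallow everything for User-agent: *. The sketch below checks such a file with Python's standard urllib.robotparser. The bot name SomeOtherBot and the example.com URL are placeholders, and the file only expresses a policy for crawlers that honor robots.txt; bad bots will still ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist: only Googlebot may crawl; every other bot
# is told to stay out of the whole site.
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(WHITELIST_ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://example.com/feed/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

The empty "Disallow:" line means "nothing is disallowed" for that group, so listed bots get full access while the catch-all group blocks everyone else.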
phranque

msg:4352315 | 4:42 am on Aug 17, 2011 (gmt 0) |
While the robots exclusion protocol doesn't support globbing or regular expressions, many search engines (including G) support pattern-matching extensions:
http://www.google.com/support/webmasters/bin/answer.py?answer=156449 [google.com]

The more important issue for your problem statement is that the Disallow syntax matches the URL path left-to-right. Therefore, if you want to take advantage of the REP pattern-matching extensions, you can disallow a URL ending with "feed" using:

Disallow: /*feed$

However, if you also (or instead) want to disallow a "feed" subdirectory URL (i.e. one ending with "feed/"), you need a different rule:

Disallow: /*feed/$

Also note that without the end anchor in the pattern (the "$"), you will match more than intended: disallowing the pattern "/*feed" will also disallow URLs such as "/feedme", and disallowing the pattern "/*feed/" will also disallow URLs such as "/feed/me".
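The extended matching described above can be approximated by translating each rule into a regular expression: '*' matches any run of characters, a trailing '$' anchors the end of the URL path, and matching always starts at the beginning of the path. This is a minimal sketch under those assumptions; the helper name extended_rule_blocks is hypothetical, and it ignores Allow lines and the rule-precedence logic a real crawler applies.

```python
import re


def extended_rule_blocks(rule: str, url_path: str) -> bool:
    """Return True if a Disallow rule using the '*' / '$' pattern-matching
    extension matches url_path, reading the path left to right from its start."""
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    # Escape literal text, then join the pieces with '.*' wherever '*' appeared.
    pattern = ".*".join(re.escape(piece) for piece in core.split("*"))
    # With '$' the whole path must match; without it, matching a prefix is enough.
    matcher = re.fullmatch if anchored else re.match
    return matcher(pattern, url_path) is not None


if __name__ == "__main__":
    cases = [
        ("/*feed$", "/blog/feed"),    # ends with "feed": blocked
        ("/*feed$", "/feedme"),       # anchored, so "/feedme" is not matched
        ("/*feed$", "/blog/feed/"),   # trailing slash needs its own rule
        ("/*feed/$", "/blog/feed/"),  # the subdirectory rule catches it
        ("/*feed", "/feedme"),        # no anchor: matches more than intended
        ("/*feed/", "/feed/me"),      # no anchor: matches more than intended
    ]
    for rule, path in cases:
        verdict = "blocked" if extended_rule_blocks(rule, path) else "allowed"
        print(f"Disallow: {rule:9} {path:12} -> {verdict}")
```

Using fullmatch only when the rule ends in '$' mirrors the point about the end anchor: without it, any path that merely starts with the pattern is caught.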