
Sitemaps, Meta Data, and robots.txt Forum

    
how to disallow feed in robots.txt readable by all bots
bhavana



 
Msg#: 4351875 posted 5:34 am on Aug 16, 2011 (gmt 0)

How do I disallow feeds in robots.txt? I tried Disallow: /feed/ but it is not working. How do I disallow a url ending with "feed"?
Is it Disallow: /feed$ or Disallow: /feed/$ ?

 

tangor




 
Msg#: 4351875 posted 5:56 am on Aug 16, 2011 (gmt 0)

Either is correct, but bad bots will ignore it... and will also use those "hints" as to what to rip.

Disallow: /feed all by itself will work for bots that honor robots.txt...

lucy24




 
Msg#: 4351875 posted 6:08 am on Aug 16, 2011 (gmt 0)

:: cough, cough ::

Both $ forms are incorrect in robots.txt [robotstxt.org], because it doesn't "do" Regular Expressions.

Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot".


So if

Disallow: /feed/

isn't working, you need to bring out the heavy artillery, starting with .htaccess.
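One way to bring out that artillery, assuming an Apache server with mod_rewrite enabled and feeds living at paths ending in "feed" or "feed/" (a sketch, not a drop-in rule for every setup):

```apache
# Sketch only: return 403 Forbidden for any request whose
# URL path ends in "feed" or "feed/". Adjust the pattern if
# your feed URLs look different (e.g. query-string feeds).
RewriteEngine On
RewriteRule feed/?$ - [F]
```

Unlike robots.txt, this is enforced server-side, so it applies to bots whether or not they honor the exclusion protocol.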

tangor




 
Msg#: 4351875 posted 7:18 am on Aug 16, 2011 (gmt 0)

Correct, the regex $ is not required. Done. Otherwise, it is correct. The best method is to disallow ALL BOTS and then list which bots ARE ALLOWED, but that puts me in the minority (it's called whitelisting)...
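A minimal whitelist-style robots.txt along those lines would look like this (Googlebot is just an example of a bot you might choose to allow; list whichever bots you trust):

```
# Whitelist approach: named bots may crawl everything,
# every other bot is disallowed from the whole site.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```

An empty Disallow line means "nothing is disallowed" for that user-agent; the catch-all * block then shuts out everyone not named above it.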

phranque




 
Msg#: 4351875 posted 4:42 am on Aug 17, 2011 (gmt 0)

while the robots exclusion protocol doesn't support globbing or regular expressions, many search engines (including G) support pattern matching extensions:
http://www.google.com/support/webmasters/bin/answer.py?answer=156449 [google.com]

the more important issue for your problem statement is that the Disallow syntax matches the url path left-to-right.

therefore if you want to take advantage of REP extensions to pattern matching you can disallow a url ending with "feed" using:
Disallow: /*feed$

however if you also/instead want to disallow a "feed" subdirectory url (i.e. ending with "feed/") you need a different rule:
Disallow: /*feed/$

also note that without the end anchor in the pattern (the "$") you will match more than intended: disallowing the pattern "/*feed" will disallow urls such as "/feedme", and disallowing the pattern "/*feed/" will disallow urls such as "/feed/me".
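Those matching rules can be sanity-checked with a small sketch. This is a hypothetical implementation of the pattern-matching extensions described above ("*" matches any run of characters, a trailing "$" anchors the end of the url path), not an official parser:

```python
import re

def rep_match(pattern: str, path: str) -> bool:
    """Sketch of extended robots.txt pattern matching.

    '*' matches any sequence of characters; a trailing '$' anchors
    the pattern to the end of the URL path. Without '$', matching
    is left-to-right from the start of the path (a prefix match).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal segments, turn each '*' into '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

# The examples from the post above:
print(rep_match("/*feed$", "/blog/feed"))   # matches: path ends with "feed"
print(rep_match("/*feed$", "/feedme"))      # no match: "$" anchors the end
print(rep_match("/*feed", "/feedme"))       # matches: no end anchor
print(rep_match("/*feed/", "/feed/me"))     # matches: prefix match
```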

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved