Forum Moderators: goodroi

Regular Expressions, robots.txt, and robotstxt.org

     
8:26 pm on Aug 12, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 11, 2008
posts: 50
votes: 0


The Googlebot page references robotstxt.org as the de facto 'robots exclusion protocol'. However, that site does not list anything related to regular expressions, although Googlebot properly understands asterisks in robots.txt:

Disallow: *add_to_cart*

robotstxt.org is dead...

8:59 pm on Aug 12, 2008 (gmt 0)

Senior Member

jdmorgan

joined:Mar 31, 2002
posts:25430
votes: 0


The robots.txt 'protocol' does not support regular expressions, and neither does Google. Google supports 'wild-cards', denoted by asterisks, but not formal regular expressions.

Several search engines support various 'extensions' to the robots.txt protocol. Webmasters must take care that these proprietary extensions are only used in robots.txt policy records which apply to those specific robots that support them.

The effects of using a wild-card URL-path in a policy record for a robot that doesn't understand wild-cards might range from 'no effect' to 'disastrous'.

Jim
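
To make the 'no effect' end of that range concrete, here is a minimal sketch in Python. It assumes a robot that follows only the original convention, where a Disallow value is a literal path prefix. The robots.txt text, the /cart/ path, the robot name OtherBot, and the helper names rules_for and prefix_blocked are all made up for illustration; real robots may parse records differently.

```python
# Hypothetical robots.txt: the wildcard rule sits only in the Googlebot
# record; every other robot gets a plain path-prefix rule.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: *add_to_cart*

User-agent: *
Disallow: /cart/
"""


def rules_for(robots_txt, agent):
    """Return the Disallow values of the record naming `agent`, else of '*'."""
    records = {}
    # Simplification: records are assumed to be separated by one blank line.
    for block in robots_txt.strip().split("\n\n"):
        agents, rules = [], []
        for line in block.splitlines():
            field, _, value = line.partition(":")
            field, value = field.strip().lower(), value.strip()
            if field == "user-agent":
                agents.append(value.lower())
            elif field == "disallow" and value:
                rules.append(value)
        for name in agents:
            records[name] = rules
    return records.get(agent.lower(), records.get("*", []))


def prefix_blocked(rule, path):
    """Prefix-only matching: the rule is compared literally, '*' included."""
    return path.startswith(rule)


path = "/cart/add_to_cart?item=42"

# A prefix-only robot handed the wildcard rule reads '*' as a literal
# character, so the rule blocks nothing.
print(prefix_blocked("*add_to_cart*", path))

# With the wildcard rule confined to the Googlebot record, the same robot
# falls back to the '*' record and gets a rule it can understand.
print(rules_for(ROBOTS_TXT, "OtherBot"))
print(any(prefix_blocked(rule, path) for rule in rules_for(ROBOTS_TXT, "OtherBot")))
```

This only shows the harmless outcome. How a given robot reacts to syntax it does not understand depends on its parser, which is why keeping the extension inside the record for the robot that supports it is the safer layout.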

9:57 pm on Aug 12, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 11, 2008
posts:50
votes: 0


I know. However, _Google_ itself calls them 'regular expressions':
"Using regular expressions in your robots.txt file can allow you to easily block large numbers of URLs." (see bottom of page)
[google.com...]

'Protocol' is only the de facto name people use for it; there is no associated RFC:
[en.wikipedia.org...]

robotstxt.org started out as a supporting website for the (now closed) robots-request@nexor.co.uk mailing list; its database and information are extremely outdated.

7:24 pm on Aug 14, 2008 (gmt 0)

Senior Member

g1smd

joined:July 3, 2002
posts:18903
votes: 0



You don't need the second * in the rule.

The rule matches from the left anyway.

Wildcards are only needed at the beginning or in the middle, not at the end.
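
A rough way to check this is to model the wildcard matching in a few lines of Python. This is a sketch under stated assumptions, not Google's actual matcher: it assumes '*' means 'any run of characters', that a rule is anchored at the start of the URL path, and that it is left open at the end. The helper names rule_to_regex and blocked and the sample paths are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a wildcard rule into a regex: each '*' becomes '.*'.

    The pattern is used with re.match, so it is anchored at the left and
    open at the right, like a path prefix.
    """
    return re.compile(".*".join(re.escape(part) for part in rule.split("*")))


def blocked(rule, path):
    """True if the rule matches the path starting from its first character."""
    return rule_to_regex(rule).match(path) is not None


paths = ["/shop/add_to_cart", "/shop/add_to_cart?item=42", "/shop/view_item"]

for path in paths:
    # The trailing '*' changes nothing: both rules give the same answer.
    print(path, blocked("*add_to_cart*", path), blocked("*add_to_cart", path))

# The leading '*' does matter, because matching starts at the left edge.
print(blocked("add_to_cart", "/shop/add_to_cart"))
```

Under this model the rule *add_to_cart blocks exactly the same paths as *add_to_cart*, while dropping the leading * stops the rule from matching anything that does not begin with add_to_cart.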