

Sitemaps, Meta Data, and robots.txt Forum

    
Regular Expressions, robots.txt, and robotstxt.org
Funtick




msg:3721433
 8:26 pm on Aug 12, 2008 (gmt 0)

Google's Googlebot page references robotstxt.org as the de facto 'robots exclusion protocol'. However, that site does not list anything related to regular expressions, although Googlebot properly understands asterisks in robots.txt:

Disallow: *add_to_cart*

robotstxt.org is dead...

 

jdMorgan




msg:3721458
 8:59 pm on Aug 12, 2008 (gmt 0)

The robots.txt 'protocol' does not support regular expressions, and neither does Google. Google supports 'wild-cards', denoted by asterisks, but not formal regular expressions.

Several search engines support various 'extensions' to the robots.txt protocol. Webmasters must take care that these proprietary extensions are only used in robots.txt policy records which apply to those specific robots that support them.

The effects of using a wild-card URL-path in a policy record for a robot that doesn't understand wild-cards might range from 'no effect' to 'disastrous'.
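To illustrate the 'no effect' end of that range, here is a minimal Python sketch comparing two hypothetical matchers. It assumes that a robot without wild-card support treats the rule as a literal path prefix, and that a wild-card-aware robot treats '*' as any run of characters matched from the start of the URL-path. The function names and the sample path are made up for this example; real robots may behave differently.

```python
import re


def prefix_match(rule: str, path: str) -> bool:
    """Literal matching: the rule is compared as a plain path prefix."""
    return path.startswith(rule)


def wildcard_match(rule: str, path: str) -> bool:
    """Wild-card matching: '*' stands for any run of characters.

    The rule is anchored at the start of the path only, so anything
    may follow the matched portion.
    """
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None


rule = "*add_to_cart*"
path = "/shop/add_to_cart?item=42"  # hypothetical URL-path

print(prefix_match(rule, path))    # False: no path starts with a literal '*'
print(wildcard_match(rule, path))  # True: the wild-card rule blocks this path
```

Under these assumptions the literal matcher never blocks anything with this rule, so the cart URLs stay crawlable for that robot.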

Jim

Funtick




msg:3721488
 9:57 pm on Aug 12, 2008 (gmt 0)

I know. However, _Google_ itself calls them 'regular expressions':
"Using regular expressions in your robots.txt file can allow you to easily block large numbers of URLs." (see bottom of page)
[google.com...]

'Protocol' is just what people call it de facto; it does not have any associated RFC:
[en.wikipedia.org...]

robotstxt.org was born as a supporting website for the (now closed) robots-request@nexor.co.uk mailing list; its database and information are extremely outdated.

g1smd




msg:3723229
 7:24 pm on Aug 14, 2008 (gmt 0)


You don't need the second * in the rule.

The rule matches from the left anyway.

Wildcards are only needed at the beginning or in the middle, not at the end.
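A quick way to check this is the Python sketch below. It assumes wild-card semantics in which the rule is matched from the left of the URL-path and '*' matches any run of characters; the helper name and the sample paths are hypothetical.

```python
import re


def rule_matches(rule: str, path: str) -> bool:
    """Return True if the wild-card rule matches the path from the left."""
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    # re.match anchors only at the start, so the rule already behaves
    # as if it ended with an implicit wild-card.
    return re.match(pattern, path) is not None


paths = [
    "/add_to_cart",
    "/shop/add_to_cart?item=42",
    "/shop/cart",
    "/add_to_cart_later/x",
]

with_trailing = [p for p in paths if rule_matches("*add_to_cart*", p)]
without_trailing = [p for p in paths if rule_matches("*add_to_cart", p)]

print(with_trailing == without_trailing)  # True: the trailing * changes nothing
```

With these sample paths both rules select the same three URLs and skip /shop/cart.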

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved