Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
wildcards in page names
Disallow: /store/scripts/emailFriend.asp*
jimsthoughts




msg:1527694
 4:58 pm on Jun 15, 2005 (gmt 0)

I've got a new client whose robots.txt has these two lines with wildcards at the end:

Disallow: /store/scripts/emailFriend.asp*
Disallow: /store/scripts/contactUs.asp?emailSubject*

All pages under the "store/scripts/" have been put in the supplemental results, and have no cache.

Could that wildcard at the end confuse Google (there are problems with Yahoo too, but not MSN) into not knowing what "emailFriend.asp*" or "contactUs.asp?emailSubject*" means, and thus just not indexing anything under "scripts/"?

 

Sanenet




msg:1527695
 2:01 pm on Jun 16, 2005 (gmt 0)

* doesn't work as a wildcard in robots.txt

the syntax to disallow a directory is simply:

User-agent: *
Disallow: /store/scripts/

Which would block everybody from indexing anything in scripts.

Disallow: /store/scripts/emailFriend.asp*
Disallow: /store/scripts/contactUs.asp?emailSubject*
Is either going to be ignored, or will lead to just those two pages being ignored.
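
To illustrate the point, here is a minimal Python sketch of literal prefix matching, which is how the original robots.txt standard describes Disallow. The function name is_disallowed is hypothetical, and the sketch assumes a strict parser that treats "*" as an ordinary character; real crawlers may handle it differently.

```python
def is_disallowed(path, disallow_rules):
    """Return True if path starts with any non-empty Disallow value.

    Assumption: the parser does plain prefix matching and treats "*"
    as a literal character, not as a wildcard.
    """
    return any(rule != "" and path.startswith(rule) for rule in disallow_rules)


directory_rule = ["/store/scripts/"]
wildcard_rules = [
    "/store/scripts/emailFriend.asp*",
    "/store/scripts/contactUs.asp?emailSubject*",
]

# The directory rule blocks everything under /store/scripts/.
print(is_disallowed("/store/scripts/emailFriend.asp?id=7", directory_rule))

# Read literally, the "*" rules only match URLs that contain a real "*",
# so in this strict reading they block nothing useful.
print(is_disallowed("/store/scripts/emailFriend.asp?id=7", wildcard_rules))
print(is_disallowed("/store/scripts/cart.asp", wildcard_rules))
```

Under this strict reading the two wildcard lines are effectively dead rules, which matches the "going to be ignored" outcome.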

www.robotstxt.org

jimsthoughts




msg:1527696
 8:28 pm on Jun 16, 2005 (gmt 0)

Is either going to be ignored, or will lead to just those two pages being ignored.

or could it wipe out the whole directory under that page?

Sanenet




msg:1527697
 10:20 pm on Jun 16, 2005 (gmt 0)

It COULD... but it shouldn't. The * should be ignored according to the spec.

Reid




msg:1527698
 3:54 pm on Jun 21, 2005 (gmt 0)

a wildcard at the end of a line is pointless.

Disallow: /store/scripts/emailFriend.asp*
is no different than
Disallow: /store/scripts/emailFriend.asp

since any string matching "/store/scripts/emailFriend.asp" will be disallowed anyway.

Only Googlebot (and a few select others) allow a wildcard in the Disallow line, so wildcard rules should be directed only at those specific robots. You should never use a wildcard in the Disallow field under User-agent: *

This robots.txt would cause an error for all bots except Googlebot, and for Googlebot it would be pointless, since any characters after the end of the query string are included in a match anyway.
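
A rough Python sketch of why the trailing wildcard changes nothing for a bot that does support it. It assumes Googlebot-style semantics ("*" matches any run of characters, an optional trailing "$" anchors the end, and every rule matches from the start of the path). The helper names rule_to_regex and wildcard_match are made up for this example and are not any crawler's actual code.

```python
import re


def rule_to_regex(rule):
    """Translate a wildcard-style Disallow value into a compiled regex.

    Assumed semantics: "*" matches any characters, a trailing "$"
    anchors the end of the URL, otherwise the rule is a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def wildcard_match(rule, path):
    # re.match anchors at the start of the path, giving prefix behaviour.
    return rule_to_regex(rule).match(path) is not None


with_star = "/store/scripts/emailFriend.asp*"
without_star = "/store/scripts/emailFriend.asp"

for path in [
    "/store/scripts/emailFriend.asp",
    "/store/scripts/emailFriend.asp?id=7",
    "/store/scripts/cart.asp",
]:
    # Both rules give the same answer for every path.
    print(path, wildcard_match(with_star, path), wildcard_match(without_star, path))
```

Since a rule is already a prefix match, appending "*" only adds "match anything after this", which the prefix match was doing anyway.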

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved