
Forum Moderators: goodroi


wildcards in page names

Disallow: /store/scripts/emailFriend.asp*

     
4:58 pm on Jun 15, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator 10+ Year Member

joined:Sept 26, 2002
posts:153
votes: 5


I've got a new client whose robots.txt has these two lines with wildcards at the end:

Disallow: /store/scripts/emailFriend.asp*
Disallow: /store/scripts/contactUs.asp?emailSubject*

All pages under "store/scripts/" have been put in the supplemental results and have no cached copy.

Could that wildcard at the end confuse Google (there are problems with Yahoo too, but not MSN) into not knowing what "emailFriend.asp*" or "contactUs.asp?emailSubject*" means, and thus just not indexing anything under "scripts/"?

2:01 pm on June 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 22, 2002
posts:1001
votes: 0


* doesn't work as a wildcard in a Disallow line under the original robots.txt standard.

The syntax to disallow a directory is simply:

User-agent: *
Disallow: /store/scripts/

which would block every robot from indexing anything in scripts.

Disallow: /store/scripts/emailFriend.asp*
Disallow: /store/scripts/contactUs.asp?emailSubject*
Is either going to be ignored, or will lead to just those two pages being ignored.

www.robotstxt.org
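
To make the prefix rule concrete, here is a minimal Python sketch of how a parser that follows the original standard to the letter might compare a Disallow value against a URL path. The function name `is_disallowed` is made up for illustration, and treating `*` as an ordinary character is an assumption about strict parsers; real crawlers may well behave differently.

```python
def is_disallowed(path, disallow_rules):
    """Return True if path starts with any non-empty Disallow value.

    Assumes original-standard behaviour: a plain prefix comparison,
    with no special meaning given to '*'.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


directory_rule = ["/store/scripts/"]
starred_rules = [
    "/store/scripts/emailFriend.asp*",
    "/store/scripts/contactUs.asp?emailSubject*",
]

# The directory rule blocks everything under scripts/.
print(is_disallowed("/store/scripts/emailFriend.asp?id=7", directory_rule))

# Read literally, the starred rules never match a real URL,
# because no requested path contains an actual '*' character.
print(is_disallowed("/store/scripts/emailFriend.asp?id=7", starred_rules))
```

Under that literal reading the starred lines are effectively dead rules, which is one way they could end up "ignored".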

8:28 pm on June 16, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator 10+ Year Member

joined:Sept 26, 2002
posts:153
votes: 5


"Is either going to be ignored, or will lead to just those two pages being ignored."

Or could it wipe out the whole directory under that page?
10:20 pm on June 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 22, 2002
posts:1001
votes: 0


It COULD... but it shouldn't. The * should be ignored according to the specs.
3:54 pm on June 21, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


A wildcard at the end of a line is pointless.

Disallow: /store/scripts/emailFriend.asp*
is no different from
Disallow: /store/scripts/emailFriend.asp

since any URL beginning with "/store/scripts/emailFriend.asp" will be disallowed anyway.

Only Googlebot (and a select few others) allows a wildcard in the Disallow line, so wildcard rules should be directed only at those specific robots. You should never use a wildcard in the Disallow field under User-agent: *

This robots.txt would cause an error for all bots except Googlebot, and for Googlebot it would be pointless, since any characters after the end of the query string are included in a match anyway.
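
The point about the trailing wildcard can be checked with a small Python sketch. It assumes a wildcard-aware matcher that treats `*` as "any run of characters" and anchors the rule at the start of the path; that is only an approximation of the behaviour described above for Googlebot, and the helper names `rule_to_regex` and `wildcard_disallowed` are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow value into a regex, with '*' meaning any characters.

    Assumption: matching is anchored at the start of the path only,
    so every rule already behaves as a prefix pattern.
    """
    parts = (re.escape(piece) for piece in rule.split("*"))
    return re.compile(".*".join(parts))


def wildcard_disallowed(path, rule):
    # re.match anchors at the beginning of the string, not at the end.
    return rule_to_regex(rule).match(path) is not None


paths = [
    "/store/scripts/emailFriend.asp",
    "/store/scripts/emailFriend.asp?item=42",
    "/store/scripts/emailFriend.aspx",
    "/store/scripts/contactUs.asp",
]

for path in paths:
    with_star = wildcard_disallowed(path, "/store/scripts/emailFriend.asp*")
    without_star = wildcard_disallowed(path, "/store/scripts/emailFriend.asp")
    # Both columns agree for every path: the trailing '*' adds nothing.
    print(path, with_star, without_star)
```

A wildcard only changes the result when it sits in the middle of a rule, for example `/store/*/contactUs.asp`.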