
Forum Moderators: goodroi


robots.txt and wildcards

robots.txt wildcards

     

chms

10:22 am on Jul 3, 2010 (gmt 0)

5+ Year Member



Hello,

I want to block urls like /search/?t=blahblah

I have two options, but I don't know which one is correct:

/search/*t*
/search/?t*

Thank you

goodroi

1:32 pm on Jul 3, 2010 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Wildcards (aka pattern matching) are not officially part of the robots.txt protocol. This means most of the big search engines support them, but most of the smaller ones won't.


According to Google's page [google.com...]
To match a sequence of characters, use an asterisk (*). For instance, to block access to all subdirectories that begin with private:

User-agent: Googlebot
Disallow: /private*/
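
To see what the asterisk does in practice, here is a minimal Python sketch of wildcard matching as the quoted Google text describes it. The function name wildcard_rule_matches is made up for illustration, and treating the pattern as anchored at the start of the URL path is an assumption; this is not Google's actual implementation.

```python
import re


def wildcard_rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a Disallow pattern containing * matches url_path.

    Hypothetical helper: * stands for any sequence of characters
    (including none), and everything else is compared literally.
    """
    # Escape each literal piece, then join the pieces with ".*" so that
    # every * in the pattern becomes "any sequence of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match only anchors at the start, so the pattern behaves as a
    # prefix: anything may follow the matched part.
    return re.match(regex, url_path) is not None


if __name__ == "__main__":
    for rule in ("/private*/", "/search/*t*", "/search/?t*"):
        for path in ("/private-files/a.html", "/search/?t=blahblah", "/search/other"):
            print(rule, path, wildcard_rule_matches(rule, path))
```

Under these assumptions, both /search/*t* and /search/?t* match /search/?t=blahblah, but /search/*t* also matches any path under /search/ that contains a "t" anywhere, such as /search/other, so it blocks more than intended.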

phranque

10:45 am on Jul 4, 2010 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



According to the robots exclusion protocol (which doesn't include the wildcard extensions that Google supports), matching occurs left to right from the start of the URL, so the correct option would be:
/search/?t

chms

3:14 pm on Jul 4, 2010 (gmt 0)

5+ Year Member



Without * at the end?

Dijkgraaf

9:20 pm on Jul 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, without the * at the end.
The standard is that any URL which starts with the string you specified will be matched.
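
A minimal Python sketch of that "starts with" rule, for crawlers that follow only the original protocol. The function name prefix_rule_matches is made up for illustration; it treats every character in the Disallow value literally, which is the assumption for a crawler with no wildcard support.

```python
def prefix_rule_matches(disallow_value: str, url_path: str) -> bool:
    """Return True if url_path is blocked by a plain-prefix Disallow value.

    Hypothetical helper: no wildcard handling at all, so a * in the
    value is just an ordinary character that must appear in the URL.
    """
    # An empty Disallow value blocks nothing under the original protocol.
    if not disallow_value:
        return False
    # The rule applies to any URL path that begins with the value.
    return url_path.startswith(disallow_value)


if __name__ == "__main__":
    print(prefix_rule_matches("/search/?t", "/search/?t=blahblah"))
    print(prefix_rule_matches("/search/?t*", "/search/?t=blahblah"))
```

With a plain-prefix crawler, /search/?t already covers /search/?t=blahblah, while /search/?t* would only match URLs that literally contain the asterisk.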

chms

9:36 pm on Jul 4, 2010 (gmt 0)

5+ Year Member



Ok, thank you

chms

2:08 pm on Jul 7, 2010 (gmt 0)

5+ Year Member



Hello,

In the end, Google accepted the wildcards.

Thank you