
Forum Moderators: goodroi

robots.txt and wildcards

10:22 am on Jul 3, 2010 (gmt 0)

5+ Year Member



Hello,

I want to block urls like /search/?t=blahblah

I have two options, but I don't know which one is correct:

/search/*t*
/search/?t*

Thank you
1:32 pm on Jul 3, 2010 (gmt 0)

goodroi (WebmasterWorld Administrator)



Wildcards (aka pattern matching) are not officially part of the robots.txt protocol. In practice, most of the big search engines support them, but most of the smaller ones don't.


According to Google's page [google.com...]:
To match a sequence of characters, use an asterisk (*). For instance, to block access to all subdirectories that begin with private:

User-agent: Googlebot
Disallow: /private*/
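
To make the wildcard behaviour concrete, here is a minimal Python sketch of how a `*` in a Disallow rule could be evaluated. It is only an illustration of the description quoted above, not Googlebot's actual code; the function name `wildcard_rule_matches` is made up for this example.

```python
import re

def wildcard_rule_matches(rule, url_path):
    """Return True if url_path matches a Disallow rule that may contain '*'.

    Hypothetical helper: each '*' stands for any sequence of characters,
    and the rule still has to match from the start of the path.
    """
    # Escape the literal pieces so characters like '?' are not treated as regex syntax,
    # then join them with '.*' where the rule had a '*'.
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    # re.match only anchors at the beginning, so the rule acts as a prefix pattern.
    return re.match(regex, url_path) is not None

print(wildcard_rule_matches("/private*/", "/private-files/index.html"))
print(wildcard_rule_matches("/search/?t*", "/search/?t=blahblah"))
print(wildcard_rule_matches("/search/*t*", "/search/?q=cats"))
```

Under these assumptions, both options from the original question match /search/?t=blahblah, but `/search/*t*` also matches any /search/ URL that has a "t" anywhere after the slash, so it would block more than intended.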
10:45 am on Jul 4, 2010 (gmt 0)

phranque (WebmasterWorld Administrator)



According to the robots exclusion protocol (which doesn't include the wildcard extensions that Google supports), matching occurs left-to-right, so the correct option would be:
/search/?t
3:14 pm on Jul 4, 2010 (gmt 0)

5+ Year Member



Without * at the end?
9:20 pm on Jul 4, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Yes, without the * at the end.
Under the standard, any URL that starts with the string you specify is matched.
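
The prefix rule can be sketched in a few lines of Python. This is only an illustration of the matching described here, not any crawler's real implementation; `is_disallowed` is a hypothetical name, and treating `*` as a literal character is an assumption about how a crawler without wildcard support would behave.

```python
def is_disallowed(url_path, disallow_rules):
    """Plain prefix matching: a rule blocks any URL path that starts with it.

    Hypothetical helper for illustration; no wildcard handling at all.
    """
    # An empty Disallow value blocks nothing, so empty rules are skipped.
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)

print(is_disallowed("/search/?t=blahblah", ["/search/?t"]))
print(is_disallowed("/search/?page=2", ["/search/?t"]))
# Assumption: with no wildcard support, the '*' is compared literally and never matches here.
print(is_disallowed("/search/?t=blahblah", ["/search/?t*"]))
```

With plain prefix matching, a trailing * adds nothing and may stop the rule from matching at all, which is why /search/?t is the safer form.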
9:36 pm on Jul 4, 2010 (gmt 0)

5+ Year Member



Ok, thank you
2:08 pm on Jul 7, 2010 (gmt 0)

5+ Year Member



Hello,

In the end, Google accepted the wildcards.

Thank you