
Forum Moderators: goodroi


wildcards in the Disallow field,

non standard?

6:01 pm on Jun 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 26, 2001
votes: 0

I have a site built in ColdFusion (CF) whose pages I rebuild as static HTML. So, on the server, we have two sets of identical pages, with one URL looking like:


And the other looking like:


As a result, I blocked the ColdFusion pages in robots.txt like this:


Anyhow, Y! has gone and crawled these pages anyway, and I have a pretty good hunch that I have set off a dupe penalty of some sort, as the site is absolutely buried in the SERPs.

Brett's validator shows that wildcards in the Disallow field are nonstandard. If that is the case, how can I block the CF pages easily?

3:33 pm on June 11, 2004 (gmt 0)

Administrator from CA 

WebmasterWorld Administrator bakedjake is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 8, 2003
votes: 61

You can't do it reliably through robots.txt.

Use a 403.
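
One way to read the "Use a 403" advice is to have the server answer requests for the .cfm URLs with 403 Forbidden, so a crawler gets no content no matter how it parses robots.txt. Below is a minimal sketch as a Python WSGI app. The name `app`, the helper `status_for`, and the `.cfm` suffix check are illustrative assumptions, not something from this thread, and a real site would more likely do this in the web server's own configuration.

```python
def app(environ, start_response):
    """Hypothetical WSGI app: refuse any request whose path ends in .cfm."""
    path = environ.get("PATH_INFO", "")
    if path.lower().endswith(".cfm"):
        # Assumption: visitors are meant to use the static HTML copies instead.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]


def status_for(path):
    """Call the app with a fake request and return the status line it sent."""
    seen = []
    app({"PATH_INFO": path}, lambda status, headers: seen.append(status))
    return seen[0]


print(status_for("/products.cfm"))  # hypothetical ColdFusion URL
print(status_for("/products.html"))  # hypothetical static copy
```

This blocks everyone from the .cfm URLs, not just crawlers, so it only fits if nothing legitimate still links to them.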

4:19 pm on June 11, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 27, 2003
votes: 0

Google recognizes the wildcard, but crawlers in general do not; it is not in the robots.txt standard.

You could disallow the .cfm pages by disallowing a left-justified substring (a prefix) that matches the pages you want to block but no other pages. For example, if no other page or directory begins with /p, then
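
To see why a wildcard fails for a crawler that follows only the original standard, here is a small sketch of plain prefix matching in Python. The function name `is_disallowed`, the rule `/*.cfm`, and the path `/products.cfm` are made up for illustration; the rule actually used in the question is not shown in the thread.

```python
def is_disallowed(path, disallow_values):
    """Prefix matching only: a rule applies when the path starts with its value.

    An empty Disallow value blocks nothing, so empty values are skipped.
    """
    return any(path.startswith(value) for value in disallow_values if value)


# The '*' is treated as a literal character, so this rule never matches
# an ordinary .cfm URL.
print(is_disallowed("/products.cfm", ["/*.cfm"]))  # False

# It would only match a path that literally begins with "/*.cfm".
print(is_disallowed("/*.cfm", ["/*.cfm"]))  # True
```

A crawler that implements wildcards as an extension would behave differently, which is why the same robots.txt can work for one engine and not another.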

Disallow: /p

will do the trick.
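
A quick way to check a prefix rule like this before deploying it is Python's standard `urllib.robotparser`. This is only a sketch: the helper `allowed` and the paths `/products.cfm`, `/pricing.html`, and `/index.html` are hypothetical stand-ins, since the real URLs are not given above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the prefix rule suggested above.
RULES = """\
User-agent: *
Disallow: /p
"""


def allowed(rules, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


for path in ("/products.cfm", "/pricing.html", "/index.html"):
    print(path, "allowed" if allowed(RULES, path) else "blocked")
```

Note that `/pricing.html` is blocked too, which is the caveat in the post: the prefix must not match anything you still want crawled.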