Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Wildcards in the Disallow field: non-standard?
pmac




msg:1528002
 6:01 pm on Jun 10, 2004 (gmt 0)

I have a site built in ColdFusion (CF) whose pages I rebuild as static HTML. So, on the server, we have two sets of identical pages, with one URL looking like:

/products.cfm?CatID=38

And the other looking like:

/keyword.html

As a result, I blocked the ColdFusion pages in robots.txt like this:

Disallow:/*.cfm

Anyhow, Yahoo! has gone and crawled these pages regardless, and I have a pretty good hunch that I have set off a duplicate-content penalty of some sort, as the site is absolutely buried in the SERPs.

Brett's validator shows that wildcards in the Disallow field are non-standard. If that is the case, how can I block the CF pages easily?

 

bakedjake




msg:1528003
 3:33 pm on Jun 11, 2004 (gmt 0)

You can't do it reliably through robots.txt.

Use a 403.
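
One possible reading of this advice, sketched in Python: have the server itself refuse crawler requests for the .cfm URLs instead of relying on robots.txt. The function name status_for and the CRAWLER_TOKENS list are hypothetical, chosen only for illustration, and the path is assumed to arrive without its query string (as PATH_INFO does in WSGI). This is a sketch under those assumptions, not a recommended production setup.

```python
# Assumed, illustrative list of user-agent substrings that identify crawlers.
CRAWLER_TOKENS = ("slurp", "googlebot")


def status_for(path, user_agent):
    """Return the HTTP status line to send for a request.

    path is the URL path without the query string, e.g. "/products.cfm".
    """
    is_cf_page = path.lower().endswith(".cfm")
    is_crawler = any(token in user_agent.lower() for token in CRAWLER_TOKENS)
    # Crawlers asking for the ColdFusion duplicates get 403; everything else is served.
    return "403 Forbidden" if is_cf_page and is_crawler else "200 OK"


print(status_for("/products.cfm", "Mozilla/5.0 (compatible; Yahoo! Slurp)"))
print(status_for("/keyword.html", "Mozilla/5.0 (compatible; Yahoo! Slurp)"))
```

Matching on the user-agent string is only a heuristic, since any client can send any value there.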

tschild




msg:1528004
 4:19 pm on Jun 11, 2004 (gmt 0)

Google recognizes the wildcard, but crawlers in general do not; it is not part of the robots.txt standard.

You could disallow the .cfm pages by disallowing a prefix (a left-anchored substring) that matches the pages you want to block but no other pages. For example, if no other page or directory begins with /p, then

Disallow: /p

will do the trick.
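
The difference between the two matching rules discussed in this thread can be sketched in Python. The function names prefix_blocked and wildcard_blocked are hypothetical, and the wildcard version is only a rough model of the extension (it treats "*" as "any run of characters" and nothing more); real crawlers may differ in the details.

```python
import re


def prefix_blocked(path, disallow_rules):
    """Standard semantics: a rule blocks every path that starts with it."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)


def wildcard_blocked(path, disallow_rules):
    """Rough model of the wildcard extension: '*' matches any run of characters."""
    for rule in disallow_rules:
        if not rule:
            continue  # an empty Disallow value blocks nothing
        # Escape the literal pieces and join them with ".*" where '*' appeared.
        pattern = ".*".join(re.escape(part) for part in rule.split("*"))
        if re.match(pattern, path):  # re.match anchors at the start of the path
            return True
    return False


cf_url = "/products.cfm?CatID=38"
static_url = "/keyword.html"

# A crawler that follows only the standard reads "/*.cfm" as a literal prefix.
print(prefix_blocked(cf_url, ["/*.cfm"]))      # False
# A crawler that supports the wildcard extension blocks the page.
print(wildcard_blocked(cf_url, ["/*.cfm"]))    # True
# The prefix rule "/p" blocks the CF page but leaves the static page alone.
print(prefix_blocked(cf_url, ["/p"]))          # True
print(prefix_blocked(static_url, ["/p"]))      # False
```

Under these assumptions, the first result shows why a crawler without wildcard support would still fetch /products.cfm?CatID=38.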

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved