

Robots.txt help


Northstar

4:46 pm on Sep 10, 2006 (gmt 0)

10+ Year Member



I would like to use robots.txt to block some duplicate pages that my script is producing.

I want to block this page: http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

But want to keep this page: http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

Would this work to block the first URL without hurting the second one?

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgi?lv

Or would it be better to write out the full URL of each page I want to block, like this:

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

I need to be very careful not to block the second URL (dirs2.cgi). Is there any danger that either of the above robots.txt Disallow rules would block the second URL?
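One way to sanity-check this before going live is with Python's bundled robots.txt parser. This is only a rough check -- the standard-library parser is just an approximation of how the major search engines match rules, and it assumes the Disallow line above with the "?" in place -- but it does show the plain left-anchored prefix match blocking the dirs.cgi URL while leaving dirs2.cgi alone:

# Rough sanity check using only Python's standard library.
from urllib.robotparser import RobotFileParser

rules = [
    "User-Agent: *",
    "Disallow: /cgi-bin/pseek/dirs.cgi?lv",
]

rp = RobotFileParser()
rp.parse(rules)

# The duplicate page should come back blocked (False)...
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets"))

# ...while the page to keep should stay fetchable (True).
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147"))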

goodroi

7:50 pm on Sep 20, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hi Northstar,

I would not recommend using robots.txt to block individual pages one by one. On a very large site you could end up with a 1 MB robots.txt file, and trust me, search engines don't like that. Have you thought about using .htaccess to resolve the situation?
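For example, something along these lines in the site's root .htaccess would 301 the duplicate URL to the page you want indexed. Treat it as a sketch only: it assumes Apache with mod_rewrite enabled, it assumes dirs2.cgi?cid=147 really is the version you want to keep, and the exact pattern depends on how /cgi-bin/ is mapped on your server (e.g. via ScriptAlias), so you may need to adjust where the rule lives.

# Hypothetical sketch: redirect the duplicate dirs.cgi URL to the canonical page.
# Assumes this file sits in the document root and mod_rewrite is available.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^lv=2&ct=category_widgets$
RewriteRule ^cgi-bin/pseek/dirs\.cgi$ /cgi-bin/pseek/dirs2.cgi?cid=147 [R=301,L]

Because the substitution carries its own query string, the original lv/ct parameters are dropped rather than appended to the redirect target.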