


block all except in robots.txt

     

visualscope

8:40 pm on May 26, 2007 (gmt 0)




hi all,
I have a site where the pages I need to block (duplicate content issues) outnumber the pages I want crawled. Is there a way in robots.txt to achieve this?

I do know how to block pages from being crawled one by one, but since I have more to block than to allow, I was thinking it would probably be easier to do the opposite.
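For example (the page names here are just made up), listing them individually would mean something like:

User-agent: *
Disallow: /dupe-page-1.html
Disallow: /dupe-page-2.html
Disallow: /dupe-page-3.html

...and so on for every duplicate page, which is what I'm hoping to avoid.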

thanks in advance

goodroi

2:26 pm on May 28, 2007 (gmt 0)




The easiest way would be to put all of your blocked pages into one directory and your good pages into another directory. Then you can block the entire bad directory with a single line in your robots.txt file.
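For example, assuming the duplicate pages get moved under a folder named /duplicates/ (just a placeholder, use whatever name fits your site), the whole file could be as short as:

User-agent: *
Disallow: /duplicates/

Everything under /duplicates/ is then off-limits to compliant crawlers, and the rest of the site stays crawlable.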

If you do want to list individual pages in your robots.txt file, be careful that the file doesn't get too big. I once had a client whose robots.txt ran to several hundred KB and the spiders had a hard time reading it. Avoid extreme sizes and you'll be ok.

 
