
Forum Moderators: goodroi


block all except in robots.txt

8:40 pm on May 26, 2007 (gmt 0)

New User

10+ Year Member

joined:Aug 17, 2006
votes: 0

hi all,
I have a site with more pages to block (duplicate-content issues) than to allow.
Is there a way in robots.txt to achieve this?

I know how to block individual pages from being crawled, but since I have more to block than to allow, I was thinking it would probably be easier to do the opposite.

thanks in advance

2:26 pm on May 28, 2007 (gmt 0)

Administrator goodroi from US

joined:June 21, 2004
votes: 217

The easiest way would be to put all of your blocked pages into one directory and your good pages into another directory. Then you only need a single line in your robots.txt file to block the entire bad directory.
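A minimal sketch of that directory-based approach, assuming hypothetical directory names `/blocked/` and `/good/` (substitute your own):

```
User-agent: *
Disallow: /blocked/
```

Everything outside `/blocked/` (including `/good/`) stays crawlable by default, so you never have to list the good pages at all.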

If you do want to list individual pages in your robots.txt file, be careful that the file doesn't get too big. I once had a client whose robots.txt file was several hundred kilobytes, and the spiders had a hard time reading it. Avoid extreme sizes and you'll be OK.
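Before deploying, you can sanity-check your rules locally. Here's a short sketch using Python's standard-library robots.txt parser; the directory name `/blocked/` and the host `example.com` are just placeholders for illustration:

```python
# Verify a robots.txt rule set locally with Python's stdlib parser.
# The /blocked/ directory and example.com host are hypothetical.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /blocked/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Pages under the disallowed directory should be blocked...
print(rp.can_fetch("*", "https://example.com/blocked/dupe.html"))  # False

# ...while everything else remains crawlable by default.
print(rp.can_fetch("*", "https://example.com/good/page.html"))     # True
```

This way you can confirm the one-line rule covers all the duplicate pages without having to wait for a crawler to hit the live site.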