
Sitemaps, Meta Data, and robots.txt Forum

block all except in robots.txt

5+ Year Member

Msg#: 3350803 posted 8:40 pm on May 26, 2007 (gmt 0)

hi all,
i have a site with more pages to block (duplicate content issues) than to allow.
is there a way in robots.txt to achieve this?

I know how to block individual pages from being crawled, but since I have more to block than to allow, it would probably be easier to do the opposite: allow a short list of pages and block everything else.

thanks in advance
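Something like this is what I have in mind (a sketch with made-up paths; I understand `Allow:` is a Google extension rather than part of the original robots.txt standard, so not every crawler will honor it):

```
User-agent: *
Disallow: /
Allow: /index.html
Allow: /products/
```

For crawlers that support it, such as Googlebot, the most specific matching rule wins, so the `Allow` lines override the blanket `Disallow: /` for those paths.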



WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors of the Month

Msg#: 3350803 posted 2:26 pm on May 28, 2007 (gmt 0)

the easiest way would be to put all of your blocked pages into one directory and your good pages into another directory. then you could simply have one line in your robots.txt file blocking the entire bad directory.
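for example, with the duplicate pages moved under a single directory (the directory name here is just an illustration):

```
User-agent: *
Disallow: /duplicates/
```

that one line blocks every URL whose path starts with /duplicates/, so the file stays tiny no matter how many pages live there.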

if you do want to list individual pages in your robots.txt file, be careful that the file doesn't get too big. i once had a client with a robots.txt file of several hundred KB, and the spiders had a hard time reading it. so avoid extreme sizes and you'll be ok.
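a quick local size check is one command (a sketch; the sample file contents are made up, and for reference, crawlers today commonly cap robots.txt parsing around 500 KiB):

```shell
# create a sample robots.txt, then report its size in bytes
printf 'User-agent: *\nDisallow: /\n' > robots.txt
wc -c < robots.txt
```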

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved