Robots.txt Basic question

600 rules

omoutop

8:19 am on Feb 27, 2006 (gmt 0)

Hi to all!

Is there any limit on the size of a robots.txt file? I am thinking about creating approximately 600 unique rules to stop spiders from viewing some of my pages. Would this be a problem? I know the best way is to use a wildcard, but for certain reasons it can't be used in this case. What problems could arise if I use a robots.txt file with 600 or more rules (I have no idea how many KB this would be)?
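For a rough sense of the size, here is a quick Python sketch using the standard urllib.robotparser module (the paths are made up; the real size depends on how long your URLs are):

    from urllib import robotparser

    # A 600-rule robots.txt is just one User-agent block with 600 Disallow lines.
    # The paths below are hypothetical placeholders.
    rules = ["User-agent: *"] + ["Disallow: /page%d.html" % i for i in range(600)]
    text = "\n".join(rules) + "\n"
    print("about %.1f KB" % (len(text) / 1024.0))

    # Sanity check that a parser accepts a file this size
    rp = robotparser.RobotFileParser()
    rp.parse(rules)
    print(rp.can_fetch("*", "http://example.com/page42.html"))  # False
    print(rp.can_fetch("*", "http://example.com/index.html"))   # True

With short paths like these it comes out around 13 KB, so file size itself doesn't look like the issue.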

Any ideas will be appreciated, thank you in advance

dmorison

8:50 am on Feb 27, 2006 (gmt 0)

As far as I know there is no recommended upper limit, but I would never use that many rules in robots.txt. If you want to get crawled, don't go there. Remember that a well-behaved crawler has to test every URL it is considering fetching against your robots.txt; if it has to do 600 tests per URL, that is a lot of processing time, and it is quite likely that a crawler will simply give up and not crawl anything.
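To make that cost concrete, here is a rough sketch using Python's standard urllib.robotparser as a stand-in for a crawler's rule matcher (the 600 paths are invented for illustration; real crawlers may match rules differently):

    import time
    from urllib import robotparser

    rules = ["User-agent: *"] + ["Disallow: /page%d.html" % i for i in range(600)]
    rp = robotparser.RobotFileParser()
    rp.parse(rules)

    # A URL that matches no rule is the worst case: every rule gets scanned.
    start = time.perf_counter()
    for _ in range(10000):
        rp.can_fetch("*", "http://example.com/allowed.html")
    elapsed = time.perf_counter() - start
    print("%.1f microseconds per can_fetch call" % (elapsed / 10000 * 1e6))

Multiply that per-URL cost by every URL on your site and every crawler visit, and you can see why a crawler might prefer to simplify matters or skip the site.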

Instead, you should look at changing the URLs of everything you don't want crawled so that it all sits in one sub-directory, and just block that directory with robots.txt. It sounds like that might be a lot of work on your end; but if you want the rest of your site crawled, that's what I think you've got to do.
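In robots.txt terms that collapses the whole list into a single rule. A sketch, with /private/ as a stand-in for whatever directory you choose:

    from urllib import robotparser

    # One Disallow covers everything moved under the (hypothetical) /private/ directory
    rp = robotparser.RobotFileParser()
    rp.parse(["User-agent: *", "Disallow: /private/"])

    print(rp.can_fetch("*", "http://example.com/private/page42.html"))  # False
    print(rp.can_fetch("*", "http://example.com/index.html"))           # True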

I could be wrong of course; but if I were Googlebot, there would without doubt be an upper limit to how much robots.txt processing I am prepared to do in order to crawl your domain.

omoutop

9:46 am on Feb 27, 2006 (gmt 0)

Thank you very much dmorison,

So my idea of using so many rules falls apart, because at the end of the day I don't want to prevent my site from getting crawled. I think you are right.

Thanks again