

robots.txt like .htaccess?

         

scorpion

11:37 pm on Feb 20, 2004 (gmt 0)

10+ Year Member



Does anybody know if robots.txt works per directory, like .htaccess? I have multiple domains that I rewrite to various subdirectories, but I want to limit spidering of only one subdirectory. However, I'm not sure whether placing a robots.txt in that subdirectory would affect the entire IP...

Krapulator

12:44 am on Feb 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



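No, robots.txt must be placed in the root directory of your site. Crawlers only ever request /robots.txt at the root of each hostname, so a copy in a subdirectory is simply ignored. There's a good resource here: [searchengineworld.com...]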
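To illustrate why: a compliant crawler derives the robots.txt URL from the scheme and hostname of the page it wants and throws the path away. Below is a minimal Python sketch of that derivation; robots_url is a hypothetical helper name and the example.com/example.org URLs are placeholders.

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Return the one robots.txt URL a compliant crawler checks for page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (plus any port); the page's path is discarded.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("http://www.example.com/sub/dir/page.html"))  # http://www.example.com/robots.txt
print(robots_url("http://www.example.org/index.html"))         # http://www.example.org/robots.txt
```

Since each domain is a separate hostname, each one gets its own robots.txt request even if all of them share one IP and are rewritten into subdirectories on the server, so the server is free to hand each domain a different file.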

jdMorgan

1:13 am on Feb 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can, however, use mod_rewrite to redirect requests for robots.txt to different robots.txt files depending on the requested hostname, URL, filepath, user-agent, or any other variable(s) available to mod_rewrite.

I use just such a technique to internally redirect poorly-implemented robots to an alternate robots.txt file on my sites.


# Redirect some robots to a simplified robots.txt file
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^(NationalSpider|Scatter/|TeraBot/|LinkTrekker$)
# .htaccess (per-directory) syntax: the pattern is matched without a leading slash
RewriteRule ^robots\.txt$ /robots_alt.txt [L]

All you need to do is redirect based on %{HTTP_HOST} (i.e. the domain name), or on %{REQUEST_URI} or %{REQUEST_FILENAME}, depending on how you want to do it.

Jim
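To make the selection logic concrete, here is a minimal Python sketch (not Apache code) of the decision such rules make for a request to /robots.txt: the user-agent alternation from the RewriteCond above, written with plain | pipes, is checked first, then the Host header is looked up. The hostnames and the robots_example.txt / robots_org.txt file names are hypothetical, and checking the user-agent rule before the host rule is an assumption about how you would order the RewriteRules.

```python
import re

# The user-agent alternation from the RewriteCond, with plain "|" pipes.
# re.search(), like mod_rewrite, is unanchored, so the leading ^ matters.
BAD_BOT_UA = re.compile(r"^(NationalSpider|Scatter/|TeraBot/|LinkTrekker$)")

# Hypothetical hostname -> robots file mapping, for illustration only.
ROBOTS_BY_HOST = {
    "www.example.com": "/robots_example.txt",
    "www.example.org": "/robots_org.txt",
}


def pick_robots_file(host: str, user_agent: str) -> str:
    """Return the file a request for /robots.txt would be served from."""
    # Assumed rule order: the user-agent rule wins over the host rule.
    if BAD_BOT_UA.search(user_agent):
        return "/robots_alt.txt"
    # Host headers are case-insensitive and may include a port number.
    hostname = host.lower().split(":", 1)[0]
    # Unknown hosts fall through to the ordinary root robots.txt.
    return ROBOTS_BY_HOST.get(hostname, "/robots.txt")


print(pick_robots_file("www.example.org", "Mozilla/5.0"))      # /robots_org.txt
print(pick_robots_file("www.example.com", "TeraBot/1.0"))      # /robots_alt.txt
print(pick_robots_file("other.example.net", "LinkTrekker/2"))  # /robots.txt
```

On the Apache side, the host lookup would be a RewriteCond on %{HTTP_HOST} in front of a RewriteRule for robots.txt, one pair per domain. Note that the LinkTrekker$ branch only matches a user-agent that is exactly "LinkTrekker", as the last call shows.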

scorpion

7:23 pm on Feb 21, 2004 (gmt 0)

10+ Year Member



Suppose you just wanted to block ia_spider or something like that; you wouldn't have to rewrite to another robots.txt file, right? Could you just use a .htaccess Forbid for that spider? Would that work?

Uzil

6:50 pm on Feb 22, 2004 (gmt 0)

10+ Year Member



Hi scorpion,
Like jdMorgan said, whether you redirect or not, the rule to block a specific spider in whichever robots.txt is served is:

User-agent: ia_spider
Disallow: /

This tells the "ia_spider" robot to leave the site alone; all other robots are welcome.
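The effect of those two lines can be checked with Python's standard-library urllib.robotparser, which parses robots.txt rules and answers can_fetch() queries. A minimal sketch; the page URL and the "Googlebot" user-agent are just placeholders for "any other robot".

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post.
rules = """\
User-agent: ia_spider
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

page = "http://www.example.com/page.html"
print(parser.can_fetch("ia_spider", page))  # False
print(parser.can_fetch("Googlebot", page))  # True
```

Python's parser compares only the part of the User-Agent before the first "/", case-insensitively, so "ia_spider/2.0" is blocked too. Keep in mind that this only works for robots that honor robots.txt; a server-side Forbid like the one asked about above is enforced regardless.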

scorpion

8:43 pm on Feb 22, 2004 (gmt 0)

10+ Year Member



How about:

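# Both conditions must match (RewriteConds without [OR] are ANDed)
RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$
# "-" means no substitution; [F] answers 403 Forbidden.
# Server-config syntax: in .htaccess the leading slash is stripped before matching.
RewriteRule ^/~quux/foo/arc/.+ - [F]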

This is from the rewrite guide, although I'm not sure what the REMOTE_ADDR condition is for...
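On the REMOTE_ADDR question: the recipe only forbids the named robot when it also comes from 123.45.67.8 or 123.45.67.9, because consecutive RewriteCond lines are ANDed unless flagged [OR], and they are only evaluated after the RewriteRule pattern has matched the requested path. The Python sketch below mimics that logic with the same three regular expressions; is_forbidden is a hypothetical helper, and this is an illustration of the matching, not of how mod_rewrite works internally.

```python
import re

# The three regular expressions from the rewrite-guide recipe, unchanged.
UA_PATTERN = re.compile(r"^NameOfBadRobot.*")
ADDR_PATTERN = re.compile(r"^123\.45\.67\.[8-9]$")
PATH_PATTERN = re.compile(r"^/~quux/foo/arc/.+")


def is_forbidden(user_agent: str, remote_addr: str, path: str) -> bool:
    """Hypothetical helper: True when the [F] rule would answer 403 Forbidden."""
    # The RewriteRule pattern is tested first; the conditions only run if it matches.
    if not PATH_PATTERN.search(path):
        return False
    # Consecutive RewriteConds without [OR] must ALL match.
    return bool(UA_PATTERN.search(user_agent) and ADDR_PATTERN.search(remote_addr))


print(is_forbidden("NameOfBadRobot/1.0", "123.45.67.8", "/~quux/foo/arc/a.html"))   # True
print(is_forbidden("NameOfBadRobot/1.0", "10.0.0.1", "/~quux/foo/arc/a.html"))      # False
print(is_forbidden("NameOfBadRobot/1.0", "123.45.67.89", "/~quux/foo/arc/a.html"))  # False
```

So to block that robot from every address, you would leave out the REMOTE_ADDR condition. In a .htaccess file the RewriteRule pattern would also need adjusting, since the per-directory prefix (including the leading slash) is stripped before matching.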