

awoyo - 7:35 pm on Jul 13, 2001 (gmt 0)


mod_rewrite and mod_access are modules that are, or can be, compiled into the Apache web server and configured through .htaccess. Between them they let you act on the User Agent. For mod_rewrite it looks like this:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro
RewriteRule ^.*$ x.html [L]

Here EmailSiphon is the User Agent being matched (the ^ anchors the match to the start of the string) and x.html is the file that User Agent gets served instead of whatever it asked for.
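
To see which User Agent strings those three conditions would catch, here is a rough Python sketch of the matching logic only. It is not how Apache does it internally; the function name is_blocked and the sample strings are made up for illustration.

```python
import re

# Same patterns as the RewriteCond lines; "^" anchors each one at the start
# of the User-Agent string, and matching is case-sensitive (no [NC] flag).
BLOCKED_PATTERNS = [r"^EmailSiphon", r"^EmailWolf", r"^ExtractorPro"]


def is_blocked(user_agent: str) -> bool:
    """Return True if any pattern matches, mimicking the [OR] chain."""
    return any(re.search(p, user_agent) for p in BLOCKED_PATTERNS)


if __name__ == "__main__":
    # Hypothetical User-Agent strings, for illustration only.
    samples = ["EmailSiphon", "EmailWolf 1.00", "Mozilla/4.0 (compatible; EmailWolf)"]
    for ua in samples:
        target = "x.html" if is_blocked(ua) else "the requested page"
        print(f"{ua!r} gets {target}")
```

Note that the last sample slips through because EmailWolf is not at the start of the string. Dropping the ^ from a pattern would match it anywhere in the User Agent.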

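Or mod_access will simply deny the request based on User Agent or IP address. (The SetEnvIf lines themselves are handled by mod_setenvif, so that module needs to be available too.)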

SetEnvIf User-Agent EmailWolf GoAway
SetEnvIf User-Agent ExtractorPro GoAway
SetEnvIf User-Agent Wget GoAway
Order Allow,Deny
Allow from all
Deny from env=GoAway
Deny from 202.
Deny from 203.

Here any request whose User Agent contains EmailWolf gets the environment variable GoAway set, and any request with GoAway set is denied access.
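
The way those directives combine can be sketched in Python as below. This is only a model of the decision under my reading of Order Allow,Deny (a matching Deny overrides the Allow), not Apache's actual code; is_allowed and the sample values are invented for the example.

```python
import re

# Patterns from the SetEnvIf lines; unanchored, so they match anywhere
# in the User-Agent header (case-sensitive, as with plain SetEnvIf).
GOAWAY_PATTERNS = ["EmailWolf", "ExtractorPro", "Wget"]

# Partial IP addresses from the "Deny from" lines.
DENIED_PREFIXES = ["202.", "203."]


def is_allowed(user_agent: str, ip: str) -> bool:
    """Model the Allow/Deny outcome for one request."""
    # SetEnvIf: flag the request if any pattern matches the User-Agent.
    go_away = any(re.search(p, user_agent) for p in GOAWAY_PATTERNS)
    # "Allow from all" matches every request.
    allowed = True
    # "Deny from env=GoAway" plus the two partial-IP Deny lines.
    denied = go_away or any(ip.startswith(p) for p in DENIED_PREFIXES)
    # Order Allow,Deny: access needs an Allow match and no Deny match.
    return allowed and not denied


if __name__ == "__main__":
    # Hypothetical requests, for illustration only.
    print(is_allowed("Mozilla/4.0", "198.51.100.7"))
    print(is_allowed("Wget/1.7", "198.51.100.7"))
    print(is_allowed("Mozilla/4.0", "203.0.113.9"))
```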

Also, as you can see at the bottom, we're denying access to two entire IP blocks (everything starting with 202. or 203.). This type of access control will let you deny access to just one IP address, as in 202.21.45.169, or work your way down the octets, as in 202.21.45., which would deny access to all 256 IP addresses (202.21.45.0 through 202.21.45.255) in that block.
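
If you want to check how many addresses a partial IP like that covers, this small Python sketch converts it to a network and counts. The helper name partial_to_network is my own, and it assumes each octet you leave off widens the block by 8 bits.

```python
import ipaddress


def partial_to_network(partial: str) -> ipaddress.IPv4Network:
    """Turn a partial address such as '202.21.45.' into the block it covers."""
    octets = [o for o in partial.split(".") if o]
    if not 1 <= len(octets) <= 4:
        raise ValueError(f"expected 1 to 4 octets, got {partial!r}")
    # Pad the missing octets with zeros; the prefix length is 8 bits per
    # octet that was actually given.
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(f"{'.'.join(padded)}/{8 * len(octets)}")


if __name__ == "__main__":
    for partial in ("202.21.45.169", "202.21.45.", "202."):
        net = partial_to_network(partial)
        print(f"{partial:<15} covers {net} ({net.num_addresses} addresses)")
```

So 202.21.45. covers 256 addresses (0 through 255 in the last octet), while a bare 202. covers 16,777,216 of them, which is worth keeping in mind before denying a whole first-octet block.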

If you aren't sure what's compiled into your server software, you can run httpd -l from a Telnet connection to list the compiled-in modules. This should work even if you don't have root. If not, just ask your admin.
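
If the list is long, a few lines of Python can check it for you. This sketch assumes the output has one module source file per line (for example mod_access.c); the listing below is an illustrative sample, not captured from a real server, and has_module is a made-up helper name.

```python
def has_module(listing: str, module: str) -> bool:
    """Return True if `module` (e.g. 'mod_rewrite') appears in the listing."""
    wanted = module + ".c"
    return any(line.strip() == wanted for line in listing.splitlines())


# Illustrative listing only; paste your own httpd -l output here.
SAMPLE_LISTING = """Compiled-in modules:
  http_core.c
  mod_access.c
  mod_setenvif.c
"""

if __name__ == "__main__":
    for name in ("mod_access", "mod_setenvif", "mod_rewrite"):
        status = "compiled in" if has_module(SAMPLE_LISTING, name) else "not listed"
        print(f"{name}: {status}")
```

A module that is "not listed" here may still be loaded dynamically from the server config, so treat a miss as a reason to ask your admin rather than as a final answer.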

If you're not running Apache but, say, IIS, then I'm sorry for the long huff-n-puff. I know absolutely nothing about IIS. :)

Jim


Thread source: http://www.webmasterworld.com/robots_txt/107.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com