Will this fix my htaccess problem?

my mod_rewrite fix was blocking robots.txt

         

berli

7:12 pm on Aug 16, 2003 (gmt 0)

10+ Year Member



I had the following lines in my .htaccess file:

# block no referrer, no u_a 
RewriteCond %{HTTP_REFERER} ^-?$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC]
RewriteRule .* - [F]

This works great, except when bots ask for my robots.txt file without giving a user agent or referrer. They then proceed to pull files from my site while giving a user agent and referrer, as if my robots.txt didn't exist. This is not good, so I decided to waive the block if the bot is asking for robots.txt.

So, before I mess up my .htaccess, will the following code work?

# block no referrer, no u_a, unless robots.txt 
RewriteCond %{HTTP_REFERER} ^-?$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC]
RewriteCond %{HTTP_HOST} !^mywebsite.TLD/robots.txt$ [NC]
RewriteRule .* - [F]

Thanks in advance.

wkitty42

10:17 pm on Aug 16, 2003 (gmt 0)

10+ Year Member



close...

here is the block that i use for this...

# this ruleset is to stop blank user agents with blank referrers
# they are, however, allowed to access /robots.txt
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteCond %{REQUEST_URI} !^/robots\.txt$ [NC]
# let grub.org have access for local crawling setup
RewriteCond %{REQUEST_URI} !^/grub\.txt$ [NC]
RewriteCond %{REMOTE_ADDR} !^64\.241\.243\.83$
# let webmasterworld.com thru by name and/or ip
# (matching by name needs REMOTE_HOST, which relies on reverse DNS lookups)
RewriteCond %{REMOTE_HOST} !^westhost32\.westhost\.net$ [NC]
RewriteCond %{REMOTE_ADDR} !^216\.71\.84\.181$
RewriteRule .* /cgi-bin/noagent.cmd [L,T=application/x-httpd-cgi]

jdMorgan

10:25 pm on Aug 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



berli,

Using your code as an example:

# block no referrer, no u_a, unless robots.txt
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule !^robots\.txt$ - [F]

Jim
<edited> You don't need the [NC] (NoCase) flag for the RewriteConds, either. </edit>
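
For comparison, here is a sketch of the same exception written as a third RewriteCond instead of in the RewriteRule pattern. It assumes the rules live in the .htaccess file at the site root. Note that %{REQUEST_URI} includes the leading slash, whereas a RewriteRule pattern in .htaccess is matched against the path with that slash already removed, which is why the pattern above is ^robots\.txt$ and the condition below is ^/robots\.txt$.

```apache
# Assumes this .htaccess sits in the site root and mod_rewrite is enabled
RewriteEngine On
# Referrer is blank or a lone "-"
RewriteCond %{HTTP_REFERER} ^-?$
# User agent is blank or a lone "-"
RewriteCond %{HTTP_USER_AGENT} ^-?$
# REQUEST_URI keeps its leading slash, so the pattern starts with ^/
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# All three conditions hold: answer with 403 Forbidden
RewriteRule .* - [F]
```

Both forms should behave the same for a root-level .htaccess; the single-rule version just has one less condition to evaluate.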

berli

11:43 pm on Aug 16, 2003 (gmt 0)

10+ Year Member



Neat! That looks pretty slick. Thanks.

berli

11:50 pm on Aug 16, 2003 (gmt 0)

10+ Year Member



I noticed how you used the RewriteRule to introduce the exception (robots.txt).

I hope this isn't going too far off topic, but I have a folder containing images that I store for a site on another server (I gave that server an exception in my hotlinking block code). However, I want anyone who tries to go straight to the folder, let's call it images/, to be redirected to the other server's home page. I tried this:

RewriteRule images/?$ [someotherserver...] [R]

but it completely failed to work. Any suggestions?

jdMorgan

12:56 am on Aug 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Berli,

The definition of "anyone who tries to go straight to the folder" is critical.

This may be a sort of "reverse image block": you may want to block the images from being accessed unless the HTTP_REFERER is that remote site. In that case, the code would be identical to your existing image-blocking (anti-hotlinking) code, except that you would allow the remote server by putting its domain name, instead of the local server's name, in the RewriteCond that tests %{HTTP_REFERER}.

Jim

berli

4:41 pm on Aug 17, 2003 (gmt 0)

10+ Year Member



Yes, I think I explained myself poorly. I've already done what you described; looks a bit like this:

# prevents hotlinking 
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?thatothersite [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite [NC]
RewriteRule \.(jpe?g|gif|png|bmp|mov|midi?)$ - [NC,F]

What I'm trying to do is, suppose somebody realizes the image files are in "images" on my website and decides to type in "mywebsite.TLD/images/". By default, that person would get a directory listing. Can I instead redirect that specific request to "someotherserver/index.html"?

jdMorgan

7:08 pm on Aug 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



berli,

The problem is that there is no way for a server to distinguish between a "direct" request and one referred from one of your acceptable referrers. If you allow blank referrers, then anyone can type in the URI, and gain access.

You could move the images you're hosting for the remote site to a separate subdirectory, and then allow ONLY the remote server to refer to them, but you'd still have a hole if you allowed blank referrers. And disallowing blank referrers will lead to many problems with users behind firewalls and corporate proxies.

Short of using an interface where the remote server actually requests the images itself, and then serves them to the original requestor, I can't think of any bulletproof way of doing what you want to do.
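
As a rough sketch of the separate-subdirectory idea, something like the following could go in an .htaccess file inside that subdirectory. The directory name hosted-images/ is made up for illustration, and "thatothersite" is the placeholder from your own hotlink code. Because there is no exception for blank referrers here, those requests are refused too, with the firewall and proxy side effects mentioned above.

```apache
# Hypothetical location: /hosted-images/.htaccess
RewriteEngine On
# True for any referrer that is not the remote site, including a blank one
RewriteCond %{HTTP_REFERER} !^http://(www\.)?thatothersite [NC]
# Refuse image requests that did not come from the remote site's pages
RewriteRule \.(jpe?g|gif|png)$ - [NC,F]
```

Anyone who forges the Referer header still gets through, so this narrows the hole rather than closing it.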

Jim

berli

3:41 pm on Aug 18, 2003 (gmt 0)

10+ Year Member



Well, I could put a file called "index.html" in that directory and then put in a line 301'ing images/index.html to someotherserver, or even put a meta refresh on images/index.html doing the same thing, but I was hoping there was a more elegant solution.

Isn't there a crude method for blocking directory listings out there? That's all I'm looking for. (Well, I'd rather send them to someotherserver than send out a 403.) Unfortunately, the only documentation I could find online told you how to bounce ALL the contents of a directory somewhere else, which would result in a recursive loop (bad...).

jdMorgan

9:11 pm on Aug 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



berli,

If all you want to do is prevent someone from seeing a directory index listing, add


Options -Indexes

to your .htaccess file.

And if you want to redirect attempts to get that directory listing, then you'd use


RewriteRule ^images/?$ http://www.someplace-else.com/ [R=301,L]

Sorry, I thought you were trying to protect the images themselves by referrer, and was trying to point out the difficulty of doing it if the user already knows the URL *and* you allow blank referrers -- I guess I just didn't understand the question. This isn't the first time, and won't be the last! :)
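
Putting the two pieces together, a minimal sketch might look like this. It assumes the .htaccess file is in the site root and that the server's AllowOverride setting permits both Options and mod_rewrite directives there; "old-images/" in the comment is just a made-up example of a path you would not want to match.

```apache
# Turn off automatic directory listings for this directory and below
Options -Indexes

RewriteEngine On
# ^ anchors the match at the start of the path, so only "images" or
# "images/" matches, not something like "old-images/".
# R=301 sends a permanent redirect (plain R defaults to 302).
# L stops processing further rules for this request.
RewriteRule ^images/?$ http://www.someplace-else.com/ [R=301,L]
```

Requests for files inside the directory, such as images/photo.jpg, do not match the pattern because of the trailing $, so they are served as usual and there is no redirect loop.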

Jim

berli

4:19 am on Aug 21, 2003 (gmt 0)

10+ Year Member



Worked like a charm!

I have no idea why adding the caret before "images" and changing the R to R=301,L worked, but it did. Cool!