Proper RegEx for URL Subdomains?

RegEx

         

edelen

2:43 am on Apr 20, 2007 (gmt 0)

10+ Year Member



I've been trying to set up hotlink protection for my images, but I've run into a series of issues with what constitutes the proper regular expression for matching a portion of a URL.

I wish to allow access to a few sites and their subdomains, say "http://bw.example.com" or "http://color.example.com" and a few other domains that also use subdomains.

My .htaccess file has contained the following "RewriteCond %{HTTP_REFERER}" options for blocking hotlinks:

!^http://(.+\.)?example\.com/ [NC] -- This failed
!^http://([-a-z0-9]+.)?example\.com [NC] -- Failed, also

Here's what the rest of the code looks like:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://([-a-z0-9]+.)?example\.com [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g¦gif¦bmp¦png)$ - [F]

I copied those two failed examples from respected sites, so I don't know what the deal is.

What regular expression would do the job?

Thanks in advance!

[edited by: jdMorgan at 3:22 am (utc) on April 20, 2007]
[edit reason] Example.com. Please see TOS. [/edit]

jdMorgan

3:35 am on Apr 20, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you don't say exactly what you mean by "doesn't work," but I'd suggest:

RewriteEngine on
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)*example\.com
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)*your_own_domain\.com
RewriteRule \.(jpe?g¦gif¦bmp¦png)$ - [F]

and be sure to flush your browser cache after any change to this access-control code and/or any successful load of any given image -- If the image is in your browser cache, the browser will show it from there instead of fetching it from your server, so no amount of server-side code will stop that.

The principal difference between the code above and what you had is that this accounts for sub-sub-domains as well, such as www.color.example.com or color.www.example.com, or even www.not.color.example.com. It also uses more compact notation for several things, eliminating wasted CPU cycles.
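To make the sub-sub-domain point concrete, here is a minimal sketch that runs the three patterns from this thread through Python's re module, used only as a stand-in for Apache's regex engine. The sample referers (including bwxexample.com) are hypothetical and chosen purely for illustration; the [NC] flag is approximated with re.IGNORECASE.

```python
import re

# Patterns copied from the thread. The flag mirrors [NC] where the
# original RewriteCond had it; the suggested version had no [NC].
PATTERNS = {
    "first attempt": (r"^http://(.+\.)?example\.com/", re.IGNORECASE),
    "second attempt": (r"^http://([-a-z0-9]+.)?example\.com", re.IGNORECASE),
    "suggested": (r"^http://([^.]+\.)*example\.com", 0),
}

# Hypothetical referers, not taken from any real log.
REFERERS = [
    "http://example.com/page.html",    # bare domain
    "http://bw.example.com/gallery/",  # one subdomain level
    "http://www.color.example.com/",   # sub-sub-domain
    "http://bwxexample.com/",          # unrelated host with a similar name
]


def accepts(pattern: str, flags: int, referer: str) -> bool:
    """Return True if the referer matches the 'allowed site' pattern."""
    return re.search(pattern, referer, flags) is not None


def compare() -> dict:
    """Map each pattern name to its results, in REFERERS order."""
    return {
        name: [accepts(pattern, flags, referer) for referer in REFERERS]
        for name, (pattern, flags) in PATTERNS.items()
    }


if __name__ == "__main__":
    for name, row in compare().items():
        print(f"{name:15}", row)
```

Under these assumptions, the second attempt rejects the sub-sub-domain referer, and its unescaped dot lets the unrelated bwxexample.com host through; the first attempt and the suggested pattern both accept all three example.com referers and reject the unrelated host. Apache's engine may differ from Python's in edge cases, so treat this as a sanity check rather than proof.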

Replace the broken pipe "¦" characters above with solid pipes "|" before use; posting on this forum modifies the pipe characters.

Jim

edelen

4:20 am on Apr 20, 2007 (gmt 0)

10+ Year Member



Jim,

First of all, thank you for your commitment to this forum. I've seen how much time you put into helping other people here and I deeply appreciate that kind of effort.

As for the failures...

I have two sites on the same server. One is an older iteration of my current site. I'm routing all the traffic to the new site. The old site's .htaccess handles hotlinking like a pro. The new site doesn't. I can't tell you why. I am using the identical rewrites.

The new site's hotlink code simply isn't stopping hotlinkers. I don't know why. It's an exact copy of the code at the other site that seems to work fine. Not a space or period out of place.

The new site is a WordPress blog with an images folder at the same root level as the main WordPress files. The .htaccess file at that level simply doesn't work, no matter where I put the hotlink-blocking code in relation to other code in the file. So I went to the images directory and altered its .htaccess file instead. That works, but not always. That's what I don't get. It lets some hotlinks through that should be caught by the blocker. But not all hotlinks.

That made me wonder if the regex wasn't up to snuff for the traffic I'm getting at the new site. Perhaps the old site leaked, too, but its traffic is so low now that I never get examples of a leak. The new site is very leaky. Can't tell you why. I'm limiting the access to my current site, the old site, a couple of feed sites, and the three major search engines, yet I'm seeing my images on sites totally unrelated to those, sites that should have been blocked.

That's why I wondered about my regex setup.
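One way to separate a regex problem from other causes is to replay referers offline against the rule set. The sketch below is a rough Python model of the suggested rules, not Apache itself: the is_forbidden helper, the sample referers, and the image paths are all hypothetical, and the pipes are written as solid "|" characters. It assumes the rule pattern is tested first and the conditions are then ANDed, which is how mod_rewrite evaluates a rule with plain RewriteCond lines.

```python
import re

# Allowed-referer patterns from the suggested rule set (no [NC] flag there,
# so matching is case-sensitive here as well).
ALLOWED_REFERERS = [
    r"^http://([^.]+\.)*example\.com",
    r"^http://([^.]+\.)*your_own_domain\.com",
]

# RewriteRule pattern, with solid pipes restored.
IMAGE_PATTERN = r"\.(jpe?g|gif|bmp|png)$"


def is_forbidden(referer: str, path: str) -> bool:
    """Approximate whether the rule set would answer 403 for this request."""
    # The RewriteRule pattern must match the requested path.
    if re.search(IMAGE_PATTERN, path) is None:
        return False
    # RewriteCond %{HTTP_REFERER} .  (a blank referer is never blocked)
    if re.search(r".", referer) is None:
        return False
    # Each remaining condition is a negated match; all must hold.
    return all(re.search(p, referer) is None for p in ALLOWED_REFERERS)


if __name__ == "__main__":
    samples = [
        ("", "images/photo.jpg"),
        ("http://www.example.com/post", "images/photo.jpg"),
        ("http://other-site.test/page", "images/photo.jpg"),
        ("http://other-site.test/page", "index.php"),
    ]
    for referer, path in samples:
        print(repr(referer), path, is_forbidden(referer, path))
```

Note that a request with no referer at all is allowed by design, so images fetched by clients or proxies that strip the Referer header would still load on an unrelated page. Whether that accounts for the leaks described here is only a guess; comparing the referers in the server's access log against this model would show it.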

edelen

4:35 am on Apr 20, 2007 (gmt 0)

10+ Year Member



Oh, forgot to mention...

Yes, I'm flushing my cache. I'm also checking images that aren't cached anywhere. Each check is with a new link, just to reduce the possibility of checking a cached file.