Using Several Rules in one htaccess file

redirecting and stopping hotlinking

         

grandma genie

6:52 pm on Aug 11, 2010 (gmt 0)

10+ Year Member



Hello,

If your htaccess file has a rule in place, such as blocking hotlinking:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?websiteA\.com(/)?.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://(www\.)?websiteA\.com(/)?.*$ [NC]
RewriteRule \.(jpe?g|gif|bmp|png|ico)$ images/hotlinked.gif [L,NC]

And you want to add another one for redirecting external traffic:

RewriteEngine on
#
# Externally redirect certain non-canonical hostnames to canonical hostname, preserving http/https protocol
# non-www xyz.plus.com hostname
RewriteCond %{HTTP_HOST} ^xyz\.plus\.com [NC,OR]
# FQDN-format www.websiteA.com or with appended port number
RewriteCond %{HTTP_HOST} ^www\.websiteA\.com(\.|\.?:[0-9]+)$ [NC,OR]
# www- or non-www websiteB.com
RewriteCond %{HTTP_HOST} ^(www\.)?websiteB\.com [NC]
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^(.*)$ http%2://www.websiteA.com/$1 [R=301,L]

What is the proper way to set this up so Apache handles both requests? Should every rule have the Last flag [L] at the end? Also, in the redirect example, how do you indicate that xyz.plus.com is on a secure server (https)?

Thank you.

Jeannie

jdMorgan

7:37 pm on Aug 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



General guidelines:

Access-control rules first -- There is no use wasting server resources redirecting unwelcome requests. These rules should end with an [F] flag (to generate a 403-Forbidden response).

External URL-to-URL redirects next, in order from most-specific (fewest URL requests affected) to least-specific (more URL requests affected). These redirect rules specify a protocol and hostname in the destination and/or specify an [R=301] or [R=302] and an [L] flag as well.

Internal URL-to-filepath rewrites last, again in order from most- to least-specific. End these rules with an [L] flag unless you know a very good reason not to.

Delete your browser cache after making any changes to server-side code.

RewriteEngine on
#
# This rule needed if you use a custom 403 error page to prevent looping on Forbidden responses
RewriteRule ^path-to-my-custom-error-page\.html$ - [L]
#
# Return 403-Forbidden to unwelcome/malicious user-agents
RewriteCond %{HTTP_USER_AGENT} ^Morfeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Toata\ Dragostea [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ZmEu [NC]
RewriteRule ^ - [F]
#
# Return 403-Forbidden response for hotlinked image requests
# (See alternate internal-rewrite rule at the bottom)
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?websiteA\.com [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?websiteB\.com [NC]
RewriteRule \.(jpe?g|gif|bmp|png|ico)$ - [F]
#
# (Example only) Redirect specific old page URL to new page URL
RewriteRule ^old\.html$ http://www.websiteA.com/new.html [R=301,L]
#
# Externally redirect old mod_userdir-format requests to canonical hostname
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /~adminxx/[^\ ]*\ HTTP/
RewriteCond %{HTTP_HOST} ^(www\.)?our-big-hosting-company\.com [NC]
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^(.*)$ http%2://www.websiteA.com/$1 [R=301,L]
#
# Externally redirect certain non-canonical hostnames to canonical hostname, preserving http/https protocol
# FQDN-format www.websiteA.com or with appended port number
RewriteCond %{HTTP_HOST} ^www\.websiteA\.com(\.|\.?:[0-9]+)$ [NC,OR]
# www- or non-www websiteB.com
RewriteCond %{HTTP_HOST} ^(www\.)?websiteB\.com [NC,OR]
# www- or non-www our-big-hosting-company.com hostname
RewriteCond %{HTTP_HOST} ^(www\.)?our-big-hosting-company\.com [NC]
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^(.*)$ http%2://www.websiteA.com/$1 [R=301,L]
#
# If you like informing people who are doing things that they shouldn't be
# doing about your Web site's protections (not really a good idea), then
# use this rule instead of the Forbid-hotlinking rule above:
# Return alternate image for hotlinked image requests
RewriteCond %{REQUEST_URI} !^/images/hotlinked\.gif$
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?websiteA\.com [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?websiteB\.com [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?our-big-hosting-company\.com [NC]
RewriteRule \.(jpe?g|gif|bmp|png|ico)$ images/hotlinked.gif [NC,L]
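
On the https part of the original question: the redirect rules above carry the protocol through with the SERVER_PORT condition rather than naming https anywhere. A commented sketch of how that pair of lines works, assuming the secure site answers on the standard port 443 (the websiteB host condition is only there to make the fragment complete):

```apache
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www\.)?websiteB\.com [NC]
# The test string is the port number with a literal "s" appended.
#   Port 443  -> "443s" matches 443(s),  so %2 holds "s"
#   Any other -> "80s"  matches [0-9]+s, so %2 is empty
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
# http%2 therefore expands to "https" on port 443 and to "http" otherwise
RewriteRule ^(.*)$ http%2://www.websiteA.com/$1 [R=301,L]
```

Because %2 is taken from the last RewriteCond that matched, the SERVER_PORT line has to stay last in the condition list.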

Jim

[edited by: jdMorgan at 6:03 pm (utc) on Aug 12, 2010]

grandma genie

8:38 pm on Aug 11, 2010 (gmt 0)

10+ Year Member



Yeah! Hallelujah! It's working. Now when I click on those old links in Google searches, I get the correct URL. Awesome! I did use the other forbid hotlinking rule above, as suggested. I do not have a custom 403 error page. I do have a custom 404 error page. Do I need a custom 403 forbidden page? I tried to find some examples online, but most of the examples were for 404 pages. Thank you for your great patience with me, Jim. You are the man!
Jeannie

Hedgehog_UK

11:13 pm on Aug 11, 2010 (gmt 0)

10+ Year Member



Slightly off topic, so I hope I'll be forgiven.

403s are responses to visitors attempting to do what they're not supposed to. In my case, some of the people using the site may only just have started school, so it seemed sensible to offer a helping hand with a custom 403 page, just in case they figured out how to get where they're not supposed to be.

In practice, 99 percent of the time it's only the unwelcome visitors (bad bots etc) who receive it.
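
For the custom 403 page question, a minimal sketch, assuming the page is a static file in the site root (the filename is the placeholder from the rule set above, not a real path) and that the host allows ErrorDocument in .htaccess:

```apache
# Serve a friendly page with every 403-Forbidden response
ErrorDocument 403 /path-to-my-custom-error-page.html

RewriteEngine on
# Let the error page itself through before any [F] rule runs, so a
# blocked visitor gets the page instead of a second 403
RewriteRule ^path-to-my-custom-error-page\.html$ - [L]
```

Without an ErrorDocument line, Apache simply sends its built-in 403 message, so a custom page is optional.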

grandma genie

3:57 am on Aug 12, 2010 (gmt 0)

10+ Year Member


One more question. I've found another URL for my website that I was not aware of. I just found it in my server logs. This one is using my IP address, plus my website name like this: http://44.44.44.44/mywebsite.com. I'd like to add this to the rewrite rules shown above. Is it normal for one website to have so many different URLs? How could this have happened? It only happened after I moved my site to a new host.
--Jeannie

jdMorgan

4:02 pm on Aug 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does this "http://44.44.44.44/mywebsite.com" URL-set include the "adminxx" path as well?

Some hosts add these "back-door" hostnames so that new users on name-based shared servers can access their files on the server before the DNS for their new domain propagates. It is intended for the convenience of new account holders.

They're only a problem if a search engine somehow 'discovers' them. Unfortunately, search engines are getting better at this discovery.

Jim

grandma genie

5:43 pm on Aug 12, 2010 (gmt 0)

10+ Year Member



No, and I was surprised to find it on the server logs. Here is the first log entry for this visitor:

98.16.222.130 - - [11/Aug/2010:20:42:23 -0400] "GET /osc/images/spacer.gif HTTP/1.1" 403 312 "http://44.444.44.44/mywebsite.com/osc/index.php?cPath=238" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)".

I can't tell how this visitor found the link. But I know my host did make a link available for me to find my site in the beginning using the IP. I see the link has a 403 Forbidden response. The server logs showed a long series of entries and it also showed that the visitor was redirected to the correct URL, without the IP in front, by clicking on any link in the initial entry page. -- Jeannie

jdMorgan

6:05 pm on Aug 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Put this new rule after the "mod_userdir" rule above:

# Externally redirect old IP-address/mywebsite.com/ -format requests to canonical hostname
RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3}
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^mywebsite\.com/(.*)$ http%2://www.websiteA.com/$1 [R=301,L]

Also note the correction of "%{HTTPS_HOST}" in that rule to "%{HTTP_HOST}" -- This was a typo.

Jim

grandma genie

3:19 am on Aug 14, 2010 (gmt 0)

10+ Year Member



Hi Jim,
I tried this, but it did not work. I have decided not to bother with it, because there are so few log entries using the IP in the URL that it is not worth the trouble. Also, my host assigned a different IP to each different URL. Thank you for the suggestion. I am trying the amazonaws block you suggested. I'll let you know what happens. Thanks again.
Jeannie

jdMorgan

11:30 am on Aug 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Also, my host assigned a different IP to each different URL

That should not matter, as the "IP address" pattern is only looking for numbers -- any numbers.

Try removing the "mywebsite.com/" from the RewriteRule pattern, and see if that helps -- I have no idea what your host does to the URL-path before passing the request to this code, and that prefix may not be necessary.
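
With the prefix removed as suggested, the rule might look like the sketch below. It assumes the host strips the "mywebsite.com/" directory before the request reaches this .htaccess file, which is unconfirmed:

```apache
# Externally redirect IP-address-format requests to canonical hostname
RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3}
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^(.*)$ http%2://www.websiteA.com/$1 [R=301,L]
```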

Raise your "stick-to-itiveness factor" by just a tad, as code rarely works the first time out when the requirements are obscure -- as they are in cases like this where the host does something "strange." And frankly, having invested time in a question, I'd like to see it resolved... :)

Jim

grandma genie

5:07 pm on Aug 14, 2010 (gmt 0)

10+ Year Member


This is what I did:

# Externally redirect old mod_userdir-format requests to canonical hostname
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /~adminxx/[^\ ]*\ HTTP/
RewriteCond %{HTTP_HOST} ^(www\.)?our-big-hosting-company\.com [NC]
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^(.*)$ http%2://www.websiteA.com/$1 [R=301,L]
#
# Externally redirect old IP-address/websiteA.com/ -format requests to canonical hostname
RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3}
RewriteCond %{SERVER_PORT}s ^(443(s)|[0-9]+s)$
RewriteRule ^websiteA\.com/(.*)$ http%2://www.websiteA.com/$1 [R=301,L]

mywebsite is websiteA, so I replaced mywebsite or websiteA with the actual name in all instances.

I also tried:

# Externally redirect old IP-address/ -format requests to canonical hostname

and

# Externally redirect old IP-address format requests to canonical hostname

Then I put the URL in my browser and clicked on it, and it does not redirect. It just stays the same. I also cleared the browser cache in all instances. Here is the URL:

http://44.444.44.44/websiteA.com/osc/index.php?cPath=238

This is the original server log entry:

98.16.222.130 - - [11/Aug/2010:20:42:23 -0400] "GET /osc/images/spacer.gif HTTP/1.1" 403 312 "http://44.444.44.44/websiteA.com/osc/index.php?cPath=238" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)"

Thank you, Jim. Other ideas?

Jeannie

jdMorgan

1:56 am on Aug 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Always delete your browser cache before testing new code. (Note that the request was for an image -- one that your browser should not have yet known that it needed to fetch without first fetching the page. That is, unless it was loading from a previously-cached copy of the page...)

Jim

Pfui

1:54 am on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't mess around with anyone messing around with site name- or address-included referers because most are log-spamming data miners. Thus regardless of whether similar 'visitors' want an image or a page:

RewriteCond %{HTTP_REFERER} 444\.44
RewriteRule .* - [F]

Or maybe:

RewriteRule ^(.*)444\.44(.*) - [F]

(And Jim, if/when you correct one or both of those, TIA:)
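
A hedged sketch of one possible correction, assuming the aim is to refuse any request whose referer begins with a bare IP address rather than a hostname. A RewriteRule pattern is only compared against the URL-path, so the referer has to be tested in a RewriteCond:

```apache
RewriteEngine on
# Referer starts with scheme + dotted-quad IP, e.g. http://44.444.44.44/...
RewriteCond %{HTTP_REFERER} ^https?://[0-9]+(\.[0-9]+){3}([:/]|$)
# "-" means no substitution; [F] returns 403-Forbidden
RewriteRule ^ - [F]
```

Note that this also turns away real visitors who arrive through the host's back-door IP URL, so it is an alternative to the IP-address redirect rule earlier in the thread, not a companion to it.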