Forum Moderators: phranque

rewrite not working

rewrite domain landing-page

         

revrob

12:03 pm on Nov 29, 2011 (gmt 0)

10+ Year Member



I'm trying to redirect requests with a referrer from a certain domain to a landing page.

I have no problem with rewrites to a landing page based on IP address, thanks to help from here.

But I'm finding rewrites based on referrers are troublesome, whether to a 403 [F] or to a landing page.

I have successfully rewritten "blank user agent string" to [F]:
RewriteCond %{HTTP_USER_AGENT} ^-$ [OR]
RewriteCond %{HTTP_REFERER} ^-$
RewriteRule ^.* - [F]

I have also succeeded with redirects using a regex for the IP range, including to a landing page. That works fine: a 302 code is returned and the redirection proceeds to the named resource.

But this type of referring-domain code fails every time; I just get the normal 200 response in my logs. I've been testing it on one of my sites, putting a block in its .htaccess for referrers from my other site. So far, the link just goes through to the site and isn't blocked or redirected.

I've tried these variations, and tweaked as many of the optional alternatives as I can think of.

Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} ^one word string from referrer domain name$
RewriteRule ^.* full url of landing page [L]

and

Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} (one word string from domain name)
RewriteRule ^.* full url of landing page [L]

and

Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} ^domain\.org\.uk$
RewriteRule ^.* full url of landing page [L]

and

Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} ^domain\.org\.uk$
RewriteRule .* full url of landing page [L]

and

Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} ^http://www\.domain\.org\.uk$
RewriteRule .* full url of landing page [L]

and

Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} ^http://domain\.org\.uk$
RewriteRule .* full url of landing page [L]

and

Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} ^domain\.org\.uk [NC,OR]
short list of similar ... each with [NC,OR]
RewriteRule ^.* full url of landing page [L]

I've tried all sorts of variations on the above, but none work, even though the other types of rewrite work fine.


The referring domain I actually want to block appears in my logs as referrer with a url in the format
"http:/ /www . domain.info/lots/of/sub/directories/strings_separated_by_underscores"
(/ / and www . broken by me)

- but the tests above are simply based on my own "domain.org.uk" as referrer

All suggestions welcome - I've hit a dead end with my own research. It's frustrating because I've managed to get so many of the other types of rewrite to work.

Thanks in advance.

wilderness

3:51 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How are these any different than your recent inquiry and answer [webmasterworld.com]?

1) If Options +FollowSymLinks is required at ALL, then it is needed only once per htaccess file.

2) You seem determined to use both the begins-with and ends-with anchors simultaneously on the same line.
a) Very rarely will a referer begin and end with the domain name. It normally starts with the scheme (http://) and includes a trailing sub-folder and/or file path. As a result, your lines that combine the begins-with and ends-with anchors fail.

Please use example.com for the syntax per the forum charter, and you will not be required to use spaces to break links

The referring domain I actually want to block appears in my logs as referrer with a url in the format
"http://www.example.com/lots/of/sub/directories/strings_separated_by_underscores"
(/ / and www . broken by me)


In a referer-based regex, you're NOT required to use the entire referring page; rather, you may use any keyword portion of the domain name, sub-folder, or page name. ONCE AGAIN: a portion.

HOWEVER, when using a keyword portion of the referring page, you are doing a "contains" match, so do NOT use either the begins-with or ends-with anchors in your Cond.

wilderness

4:13 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mod_Rewrite & Regular Expressions [webmasterworld.com]

^ defines the beginning of a 'line' (starting anchor). Remember, ^ also designates 'not' when it is the first character inside a bracketed character class, as in [^/], so please don't get confused.

$ defines the ending of a 'line' (ending anchor), and when followed by a number from 1 to 9, also references a variable defined in the RewriteRule pattern (used for variables on the right side of the equation or to match a variable from the rule in a condition, see example below).

contains is the absence of any anchors
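
To make the three cases concrete, here is a minimal sketch, assuming the referer looks like the example.com URL quoted earlier in this thread. Only the unanchored condition is active; the anchored variants are shown as comments to explain why they behave differently. Treat it as an illustration rather than a drop-in rule set.

```apache
RewriteEngine On

# "Contains": no anchors, so the keyword may appear anywhere in the Referer.
# Matches http://www.example.com/lots/of/sub/directories/...
RewriteCond %{HTTP_REFERER} example\.com [NC]
RewriteRule ^ - [F]

# Begins-with AND ends-with: only matches a Referer that is exactly
# "example.com", which a real Referer (scheme + host + path) never is.
#RewriteCond %{HTTP_REFERER} ^example\.com$

# Begins-with only: must account for the scheme and optional www.
#RewriteCond %{HTTP_REFERER} ^https?://(www\.)?example\.com(/|$) [NC]
```

The begins-with form is stricter than the contains form, because it will not fire when example.com merely appears somewhere in another site's path or query string.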

revrob

8:02 pm on Nov 29, 2011 (gmt 0)

10+ Year Member



I do take the point, and I have tried to get this to work on basis of what I have learned here already.

Here's the referrer in my logs that I want to deal with.
"http://www.example.com/lots/of/sub/directories/strings_separated_by_underscores"

You ask:
How are these any different than your recent inquiry and answer
[webmasterworld.com...] ?

Absolutely - good point - I wish I knew.
That is exactly where I started, trying adaptations of that and I can't get it to work.
Because it didn't work - I'm back!

That thread offered this solution

#Refer contains either, than deny access
RewriteCond %{HTTP_REFERER} (sitereview|bluecoat)
RewriteRule .*$ - [F]

which is where I started with this current project

and I just can't seem to get it to work when I test it on my own sites. I put in a string from the domain name of one of my sites and try to block referrals from that domain to my other site.

If the referring site is example.org.uk and the .htaccess is for example2.org.uk, and I write code thus in the .htaccess file for example2.org.uk:

#Refer contains either, than deny access
RewriteCond %{HTTP_REFERER} (example)
RewriteRule .*$ - [F]

then I can still get to second site by clicking a link in the first site, despite the rewrite.

I really want to get the thing working to redirect to a landing page, based on the referring domain, but I've never managed to crack that.
If it is based on IP ranges (regex)- fine - I can do whatever I like.
Based on useragent - fine - ditto
Based on referrer - nope.

Thanks for the help so far.

lucy24

8:44 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Overlapping...

Aha! That explains the seemingly random post about anchors that I just saw next door ;)

In addition to everything else, note that on rare occasions the referer and/or UA will be genuinely blank, so you should cover yourself with ^-?$ I'd assumed my server was supplying the - so the entry wouldn't be empty, but then one time I met a bona fide "".

Are you sure you want to exclude all blank referers? Yes, they are mostly robots, but some humans also come through this way. And sometimes it is out of the human's control.
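
As a sketch of that idea (an assumption about how one might split the two tests, not a rule taken from anyone's live file): the pattern ^-?$ covers both a literal hyphen and a truly empty header, and the referer test is simply left out so that humans arriving without a referer still get in.

```apache
RewriteEngine On

# Block requests whose User-Agent is empty or a lone hyphen.
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]

# Deliberately no HTTP_REFERER condition here: a blank referer is
# common for typed-in URLs, bookmarks, and privacy tools.
```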

then I can still get to second site by clicking a link in the first site, despite the rewrite

But once you're on your first site-- even if it's only the custom 403 page-- if you follow a link from there, the referer will no longer be Evil Outside Site will it?

wilderness

9:12 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"http://www.example.com/lots/of/sub/directories/strings_separated_by_underscores"

Any of these terms, or any combination of them, will and should function when used in a contains Cond.

Perhaps your issue is that you're attempting to use some type of invalid expression to name the "lots of sub directories", without actually naming the sub-directories?

g1smd

9:40 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Redirect responses are cached by the browser.

Be sure to clear the cache before each test.

revrob

10:01 pm on Nov 29, 2011 (gmt 0)

10+ Year Member



No, it's nothing that complicated, because I am testing on my own 2 sites first to get the syntax right, before even thinking of trying something on an external site.

So this is my TEST scenario, using my own sites for the moment...

example1.org.uk - my first site with a link to site 2
example2.org.uk - my second site.

I want to ban anyone arriving at second site, if they have used a link on the first site.

I want to have two alternatives available (for different situations)
1 - a customised landing page
or
2 - returning a 403 access denied

site 1 is example1.org.uk
site 2 is example2.org.uk

Yet, if I put this in the htaccess file for site 2, it doesn't stop me using a link in site 1 which takes me to site 2, and I was hoping it would.

this doesn't work

#Refer contains either, than deny access
RewriteCond %{HTTP_REFERER} (example1)
RewriteRule .*$ - [F]

and I am struggling to find out what the problem is.

It also doesn't work if I try and create a landing page version which directs to a landing page on site 2

#Refer contains either, than deny access
RewriteCond %{HTTP_REFERER} (example1)
RewriteCond %{REQUEST_URI} !^/landingpage\.php$
RewriteRule .* [example2.org.uk...] [L]

I've remembered to include the intended landing page in my list of "not to be redirected" pages on site 2 using the ! expression, nearer the beginning of .htaccess than the Rewrite argument. That list works fine in conjunction with the other working Rewrite commands I use, so I'm assuming the problem isn't there.

I've uploaded the landing page to example2.org.uk/landingpage.php
I've put the landing page in robots.txt as Disallow

The RewriteRule line works in other sections of the .htaccess
The RewriteCond rule works with source IP regex ranges, and with UserAgent terms

But this particular REFERER one won't work and it's got me beat.

The log entry for example2.org.uk site logs which results from the failed command is this - perfectly normal

Source.I.P.address - - [29/Nov/2011:22:47:19 +0100] "GET /index.html HTTP/1.1" 200 12373 www.example1.org.uk "-" "user agent string" "-"


One other thing that did puzzle me - why is REFERER spelt with only one R in the middle? (I tried spelling it with two Rs but that didn't work either.)

Picking up one other point mentioned above - yes - I do have a ban on blank user agents and it works fine for me giving a lot of unwanted stuff a 403

lucy24

11:01 pm on Nov 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



why is REFERER spelt with only one R in the middle?

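Because long ago, someone writing the original HTTP specification couldn't spell, and the misspelled header name stuck; Apache simply uses the header's name. Sorry. That's all the explanation you will ever get, unless someone wants to step forward and come clean.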

You do realize, I hope, that rewrites will not show up in your logs. The only way you can identify them is by the filesize: when a robot asks for a certain file that is really 600-plus K, and the log says "200 578", then I know that they have been successfully rewritten. Another sign is that the request won't be immediately followed by requests for CSS and images, which are called from the real page.

But 12-plus K sure sounds like the intended destination, unless you've got a whacking enormous "I don't like your face" page.

Oh, and whether it's a rewrite or a redirect, it actually makes no difference whether the requested file really exists or not. Unless, ahem, you've got one of those boilerplate htaccess files that throws in a !-f every single time ;)

revrob

10:19 am on Nov 30, 2011 (gmt 0)

10+ Year Member



Thanks for the answer about the spelling!

Re Rewrites showing up in my logs - they seem to show up in mine - at least the ones that work do - a 302 response to the original request for the page and then the record of the visit to the redirected location on my webserver.
Or maybe I'm misunderstanding you?

For example:
IP Address - - [**/Nov/2011:**:50:** +0*00] "GET /example.html HTTP/1.1" 302 230 www.example.com "http://www.examplesource.com/example/example/example_example" "useragent" "-"
IP Address - - [**/Nov/2011:**:50:** +0*00] "GET /landingpage.html HTTP/1.1" 200 5732148 www.example3.com

And of course when I'm testing on my own 2 sites I have the browser experience to tell me whether the redirect is working - and at the moment - I can't get the thing to work on the basis of a REFERER string. I wish I knew why.

wilderness

12:41 pm on Nov 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can't get the thing to work on the basis of a REFERER string.


To determine whether this failure is a server problem rather than a syntax error:

1) make a backup copy of your website's current htaccess.
2) create the simple htaccess listed below
3) rename your current website's htaccess
4) upload the below htaccess

#refer contains any character; deny access
RewriteEngine on
RewriteCond %{HTTP_REFERER} .
RewriteRule .* - [F]

The result should be that any request that arrives with a referer will be denied access.

5) view your raw logs and all the current 403's, as a result of this new file.
6) Restore your old htaccess
7) locate the syntax error
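
If the catch-all test does produce 403s, one possible intermediate step before restoring the full file is to narrow the same minimal htaccess to the test referrer used in this thread. This is only a sketch: example1.org.uk is the placeholder name from the earlier posts, and the [NC] flag is an assumption added to rule out case differences.

```apache
#refer contains the test domain; deny access
RewriteEngine on
RewriteCond %{HTTP_REFERER} example1\.org\.uk [NC]
RewriteRule .* - [F]
```

If this narrowed file blocks the click-through but the full htaccess does not, the problem is most likely an earlier rule in the full file that ends processing before this one is reached.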

lucy24

9:51 pm on Nov 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Re Rewrites showing up in my logs - they seem to show up in mine - at least the ones that work do - a 302 response to the original request for the page and then the record of the visit to the redirected location on my webserver.
Or maybe I'm misunderstanding you?

Oh, dear, you're not getting the difference between Rewrite and Redirect, and it really is important. 301 and 302 are Redirects. They will show up in your logs. And if it's a human visitor, they will promptly show up again at the redirected address. (Robots may choose not to. "Oh, www.example.com. I was there five minutes ago.")

Rewrites-- the ones that don't carry a [R=something] flag-- will not show up in your logs. Only the original request will be there. A simple example is the 404 page that humans see. The logs will show 404, but they will not show a separate request for the Error Document. (The Error Logs might, if there is a problem with the 404 page itself.) In fact that's exactly what %{THE_REQUEST} means in a RewriteCond.
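
To show the difference side by side, here is a rough sketch using the placeholder names from earlier in the thread (example1.org.uk as the referrer, /landingpage.php on example2.org.uk as the landing page). The choice of 302 and the exact host are assumptions; use one variant or the other, not both.

```apache
RewriteEngine On

# Variant A: external redirect. The browser gets a 302, then asks for
# the landing page itself, so the logs show two requests.
RewriteCond %{HTTP_REFERER} example1\.org\.uk [NC]
RewriteCond %{REQUEST_URI} !^/landingpage\.php$
RewriteRule ^ http://example2.org.uk/landingpage.php [R=302,L]

# Variant B: internal rewrite. The server quietly serves the landing
# page; the logs show only the original request, with status 200 and
# the landing page's size.
#RewriteCond %{HTTP_REFERER} example1\.org\.uk [NC]
#RewriteCond %{REQUEST_URI} !^/landingpage\.php$
#RewriteRule ^ /landingpage.php [L]
```

The REQUEST_URI exclusion matters in both variants: without it the landing page itself would match the rule again, giving a redirect loop in Variant A and a rewrite loop in Variant B.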

Another experiment you can do: Open your www site and ask for assorted gibberish names. You'll be shown the Error Document over and over. Now ask for your Error Document by name. When you look at your logs, you will see only one request for the error page. All the others will just be listed as 404 with the original request name.