redirect based on referer

lucy24

1:26 am on Oct 22, 2011 (gmt 0)

I gotta put some stuff in htaccess and I don't have time to read the 87 identical posts from the past week, so I urgently need someone to write my code for me right away.
.
.
.
JUST KIDDING ;)

I've noticed in logs that some robots put the page they're asking for in the referer slot. Or, for variety's sake, the top-level index (which links only to the next level's index files). Leading to a couple of questions:

#1 Is there any situation where a bona fide human using a real browser would send a request with the requested page itself as the referer? (I mean other than, ahem, execrable linking on my part.) Links to local # fragments are handled within the browser, right? So I don't need to think about those.

#2 Assuming for the sake of discussion that I'm not concerned about hypothetical variants like terribleexample.com or example.org ... which format makes less work for the server?

!^(www\.)?example\.com$ {anchored at both ends}
or simply
!example {no anchors}
?

btw, I said "redirect" but really it will be either a flat-out [F] or a rewrite to the "I don't like your face" page. I just need to get the Condition(s) sorted.
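
For what it's worth, the condition being described (requested page equals referer) can be sketched outside Apache to make the logic concrete. This is a rough Python illustration, not mod_rewrite syntax. The function name `is_self_referral` and its parameters are made up for the example, and it assumes that www and non-www count as the same site and that query strings are ignored.

```python
from urllib.parse import urlsplit


def _strip_www(host):
    """Drop a leading 'www.' so both host forms compare equal (an assumption)."""
    return host[4:] if host.startswith("www.") else host


def is_self_referral(host, path, referer):
    """Return True when the Referer names the very page being requested.

    host    -- Host header of the request, e.g. "www.example.com"
    path    -- requested path, e.g. "/dir/page.html"
    referer -- raw Referer header value, possibly empty
    """
    if not referer:
        return False  # a missing referer is a separate case
    ref = urlsplit(referer)
    ref_host = ref.hostname or ""  # urlsplit lowercases the host and drops the port
    same_host = _strip_www(ref_host) == _strip_www(host.lower())
    same_path = (ref.path or "/") == (path or "/")
    return same_host and same_path
```

A request for /dir/page.html whose referer is http://example.com/dir/page.html would be flagged, while the same request arriving from the top-level index would not, so the "referer is the site index" robots would need a separate test.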

coopster

11:23 pm on Oct 25, 2011 (gmt 0)

For #1 I would say no. And you are correct with regard to the fragment identifier: the browser resolves it locally and never sends it to the server.

#2, I always anchor my regular expressions when I can. I haven't tested performance but I would like to believe that if the engine is being told it starts here and ends here, it won't continue any further fruitless processing.
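
Speed aside, the two forms also accept different things, which is easy to check. This is a small Python sketch, assuming the pattern is tested against a bare host name (the Referer header itself normally carries a full URL, so a real rule would have to allow for the scheme and path). The host list and the `accepted` helper are invented for illustration, and the dot in example.com is escaped here.

```python
import re

# The two forms from the question, written as Python regular expressions.
ANCHORED = re.compile(r"^(www\.)?example\.com$")
UNANCHORED = re.compile(r"example")

# Hypothetical host names, including the look-alikes mentioned in the question.
HOSTS = [
    "example.com",
    "www.example.com",
    "terribleexample.com",
    "example.org",
    "example.com.evil.test",
]


def accepted(pattern, hosts=HOSTS):
    """Return the hosts the pattern matches anywhere in the string."""
    # search() scans the whole string, like an unanchored match in Perl or PCRE.
    return [h for h in hosts if pattern.search(h)]


print("anchored:  ", accepted(ANCHORED))
print("unanchored:", accepted(UNANCHORED))
```

The anchored form accepts only example.com and www.example.com, while the unanchored one accepts every host in the list. If the look-alikes really are of no concern, that difference doesn't matter, but it is worth knowing before choosing on speed alone.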

btherl

2:30 am on Oct 28, 2011 (gmt 0)

For #1, I often click on the site logo on my favorite news site to refresh the page. Or I click on a breadcrumb link to get back to the same page. Would that get caught by that rule?

For #2, logic says that anchoring it is better but benchmarking doesn't always agree. I benchmarked some expressions in perl and found an anchor at the start only was the worst, and anchoring at the start AND end was best. No anchors came in the middle. I don't know why.

lucy24

3:54 am on Oct 28, 2011 (gmt 0)

"Would that get caught by that rule?"

Maybe, if the page contains links to itself. Seems a kinda silly thing for a page to do, though, when the browser's Refresh button is only about an inch away. Refreshing a page by itself doesn't change the referer. (I once experimented with this in a different context.)

"I benchmarked some expressions in perl and found an anchor at the start only was the worst."

Wow. That really is counter-intuitive.

btherl

9:30 pm on Oct 28, 2011 (gmt 0)

Here's the benchmark I used:

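#!/usr/bin/perl

use strict;
use warnings;
use Benchmark;

# Subject string that every pattern is matched against.
my $str2 = "example.com";

# Run each variant 10 million times and report the timings.
timethese(10_000_000, {
    # Anchored at the start only.
    'startanchor' => sub {
        1 unless $str2 =~ m/^example\.com/;
    },
    # Anchored at both ends.
    'anchor' => sub {
        1 unless $str2 =~ m/^example\.com$/;
    },
    # Anchored at the end only.
    'endanchor' => sub {
        1 unless $str2 =~ m/example\.com$/;
    },
    # No anchors.
    'float' => sub {
        1 unless $str2 =~ m/example\.com/;
    },
});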


And the results from that:

Benchmark: timing 10000000 iterations of anchor, endanchor, float, startanchor...
anchor: 5 wallclock secs ( 3.81 usr + 0.00 sys = 3.81 CPU) @ 2624671.92/s (n=10000000)
endanchor: 3 wallclock secs ( 2.08 usr + 0.00 sys = 2.08 CPU) @ 4807692.31/s (n=10000000)
float: 1 wallclock secs ( 2.22 usr + 0.00 sys = 2.22 CPU) @ 4504504.50/s (n=10000000)
startanchor: 5 wallclock secs ( 3.43 usr + 0.00 sys = 3.43 CPU) @ 2915451.90/s (n=10000000)


And with an optional expression at the start:

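#!/usr/bin/perl

use strict;
use warnings;
use Benchmark;

# Subject string that every pattern is matched against.
my $str2 = "example.com";

# Same four variants, each with an optional "www." prefix in the pattern.
timethese(10_000_000, {
    # Anchored at the start only.
    'startanchor' => sub {
        1 unless $str2 =~ m/^(?:www\.)?example\.com/;
    },
    # Anchored at both ends.
    'anchor' => sub {
        1 unless $str2 =~ m/^(?:www\.)?example\.com$/;
    },
    # Anchored at the end only.
    'endanchor' => sub {
        1 unless $str2 =~ m/(?:www\.)?example\.com$/;
    },
    # No anchors.
    'float' => sub {
        1 unless $str2 =~ m/(?:www\.)?example\.com/;
    },
});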

And the results from that:

Benchmark: timing 10000000 iterations of anchor, endanchor, float, startanchor...
anchor: 4 wallclock secs ( 3.61 usr + 0.00 sys = 3.61 CPU) @ 2770083.10/s (n=10000000)
endanchor: 3 wallclock secs ( 3.76 usr + 0.00 sys = 3.76 CPU) @ 2659574.47/s (n=10000000)
float: 3 wallclock secs ( 3.33 usr + 0.00 sys = 3.33 CPU) @ 3003003.00/s (n=10000000)
startanchor: 4 wallclock secs ( 3.47 usr + 0.00 sys = 3.47 CPU) @ 2881844.38/s (n=10000000)


Numbers varied on each run, but an anchor at the start consistently made things slower. With an optional match at the start of the expression, though, the difference was very small.
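
For anyone who wants to repeat the comparison in another engine, here is a rough Python equivalent of the first benchmark using the standard `timeit` module. Treat it as a sketch: Python's `re` is a different engine from Perl's (and from the PCRE library Apache uses), so the ranking may well not carry over. The iteration counts are arbitrary, and the `lambda` wrapper adds the same small overhead to every variant.

```python
import re
import timeit

# Subject string, same as in the Perl script above.
SUBJECT = "example.com"

# The four variants from the first benchmark.
PATTERNS = {
    "startanchor": re.compile(r"^example\.com"),
    "anchor": re.compile(r"^example\.com$"),
    "endanchor": re.compile(r"example\.com$"),
    "float": re.compile(r"example\.com"),
}


def time_patterns(iterations=100_000, repeat=3):
    """Return {name: best time in seconds for `iterations` searches}."""
    results = {}
    for name, pattern in PATTERNS.items():
        runs = timeit.repeat(
            lambda: pattern.search(SUBJECT),
            number=iterations,
            repeat=repeat,
        )
        # The minimum is the run least disturbed by other system activity.
        results[name] = min(runs)
    return results


if __name__ == "__main__":
    for name, seconds in sorted(time_patterns().items()):
        print(f"{name:12s} {seconds:.4f} s")
```

As with the Perl numbers, expect the figures to move around between runs. Only differences that hold up across several runs are worth reading anything into.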