mod rewrite is a tricky thing, isn't it?

I could sure use a fresh set of eyes here...

         

theapeman

3:45 am on Jun 7, 2007 (gmt 0)

10+ Year Member



Hi all,

Please believe me when I say that I wouldn't post something as simple as this unless it was a last resort, but as it is I really am about to go out of my mind! It's no exaggeration to say that I've spent about 10 hours a day for the past three days on nothing but this:

Options +FollowSymLinks
RewriteEngine On
RewriteRule ^index\.html$ newpage.html

I kid you not, it's as simple as that. That is the entirety of an .htaccess file that is located in the same directory as the two html files in question. On my test server at home it works perfectly. But when I try it on the VDS it does not.

Here's the deal: There is a website out there that links to my page, let's say "startpage.html." Upon loading the .htaccess file above into the directory where both html files are located, and clicking on the link on startpage, instead of opening newpage.html, I find myself at index.html. However, from index.html if I click the "home" link (which links to itself), then newpage.html opens correctly.

I'm telling you, I'm about to lose it. If anyone can help point me in the right direction you'd be making me (and everyone who's had to be around me these last three days) very happy indeed. Thanks for reading.

jdMorgan

5:13 am on Jun 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are you completely flushing your browser cache between tests?

If so, have you tried changing the rewrite into a redirect, pointed at an external site, just to see if that works? (This would expose problems such as other rules, Alias directives, or possibly even scripts interfering.)

Jim

theapeman

5:35 am on Jun 7, 2007 (gmt 0)

10+ Year Member



Hi Jim,

Thanks for the reply. I discovered sometime around the middle of Day 2 that clearing the browser cache made a difference. I thought for sure that was what my problem was, because I could think of no other reason for it not to work.

And if by pointing it at an external site, you mean something like


RewriteRule ^index\.html$ http://mysite/newpage.html

then yes, I have tried that and many variations on it as well; even tried IP address e.g. [123.456.7.8...] but to no avail.

At the moment I have resorted to the ugliest hack you can imagine:

DirectoryIndex newpage.html

But after 3 days of this nonsense I don't care how ugly it is. Is there a way to use Conditionals elsewhere besides RewriteRule? Because it would be nice if employing this awful hack didn't have the unfortunate effect of sending users who are already on my site and click the "home" button to newpage! For instance, if the .htaccess file were a perl script, I'd say:

$DirectoryIndex = 'index.html';
if ($referer =~ /startpage/){ $DirectoryIndex = 'newpage.html'; }

but maybe I can only declare DirectoryIndex once in .htaccess and not change it?

And I understand that that is basically what RewriteCond and RewriteRule are supposed to do, but I've tried all the stuff that is supposed to work and now I'll try anything at all. Heck, I'll go to each user's computer and physically type in the URL for them if that's what it takes!

jdMorgan

6:01 am on Jun 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, I mean pointing it at google.com -- as a test only.

You can change the 'old' index.html page URL as well, again as a test:

RewriteRule ^foo\.html$ http://www.google.com [R=301,L]

and then if that works, try:

RewriteRule ^index\.html$ http://www.google.com [R=301,L]

On the other hand, if index.html is already declared as your DirectoryIndex, then either deleting the old index.html file and renaming newpage.html to index.html, or changing DirectoryIndex (as you have done) is the correct thing to do, and the latter is not really an ugly hack.

Compare the DirectoryIndex directives in the httpd.conf files of your test and hosted servers, and you'll likely find the reason that it works on your test server, but not on the hosted server.
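
As a minimal sketch of the DirectoryIndex route: the directive accepts a list of resources that are tried in order. This assumes the host allows it to be overridden from .htaccess (AllowOverride Indexes or All), which is an assumption about your server and not something confirmed in this thread.

```apache
# .htaccess (sketch): for a request for "/", try newpage.html first,
# then fall back to index.html if newpage.html does not exist.
DirectoryIndex newpage.html index.html
```

If the two servers' main configs list a different first entry, that alone could account for the differing behaviour.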

It's not entirely clear why you haven't simply renamed the file. If this is part of some larger problem, it might do to back up a step and ask about that instead.

You can get around some problems with internal rewrites such as those done by mod_rewrite, mod_alias, and DirectoryIndex, by using RewriteCond to examine %{THE_REQUEST} to check the URL originally requested by the client when it started the current HTTP transaction. This can be used to avoid rewriting or redirecting a previously-rewritten URL (again, with 'previous' meaning 'within this current HTTP request').

An example is avoiding an 'infinite loop' when trying to redirect requests for "index.html" to "/" -- a problem when "/" is being internally rewritten by DirectoryIndex to "index.html":


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

Here, the RewriteRule is only invoked if index.html is directly requested by the client, and it is not invoked if index.html is requested as a result of DirectoryIndex rewriting a client request for "/" to "index.html." It works because THE_REQUEST is the complete first line of the client's HTTP request (the request line), and is not updated as the result of any internal server activity. Example:
GET /index.html HTTP/1.1
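
One caveat with the pattern above: a request line that carries a query string, such as "GET /index.html?x=1 HTTP/1.1", would not match it. A possible variation, offered as a sketch only (the optional query-string group is my addition, and www.example.com is a placeholder as before):

```apache
RewriteEngine On
# Match a direct client request for /index.html, with or without a query string.
# The (\?[^\ ]*)? group is an illustrative addition to the original pattern.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html(\?[^\ ]*)?\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```

Since the substitution has no query string of its own, mod_rewrite should pass the original query string through to the redirect target unchanged.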

Jim

theapeman

6:35 am on Jun 7, 2007 (gmt 0)

10+ Year Member



Jim, that is all incredibly helpful and totally on point as far as addressing the immediate gaps in my knowledge about Apache as it relates to this issue.

And of course, you're right. There's more to it than the simple scenario I sketched out. The full scenario is that I want visitors referred from this one specific domain to be taken to a special page on my site, while everyone else continues to see the normal home page. And this, too, I have achieved on the test server (though not from the actual external domain, of course).

It's interesting, though, because just writing that line of perl in my previous post made me realize that maybe I'd caught a bit of tunnel vision these past 3 days, so I dashed off the following:


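#!/usr/bin/perl -w
# index.cgi: redirect visitors according to the HTTP referer
use strict;
use CGI;
my $site    = 'http://mysite.com/';
my $newpage = 'newpage.html';
my $home    = 'home.html';
my $q       = CGI->new;
# referer() returns undef when no Referer header was sent; treat that as a non-match
my $referer = $q->referer || '';
if ($referer =~ /thatdomain/) {
    print $q->redirect($site . $newpage);
}
else {
    print $q->redirect($site . $home);
}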

OK, and then in my .htaccess I say

DirectoryIndex index.cgi

then rename my old index.html to 'home.html' and I'm done!

Now if I've learned anything these past 3 days (and I have), it's that I shouldn't get too proud of myself before it actually works... but at the least, it's good to know I've got more options than I thought.

And you make a great point about examining the different config files, as that should definitely narrow down what's going on.

By the way, I love this site. I had another question that I asked the one and only other time I posted here, which is: is it possible to view the whole thread while composing a reply? As it is, I can only see my original post but not either of your subsequent replies -- in fact, I'm not even 100% sure if you've replied once or twice, but I think it was twice :) -- which makes it kind of hard to properly respond to what you wrote. Am I missing a button somewhere?

Thank you, Jim. I appreciate you lending a hand.

jdMorgan

7:01 am on Jun 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On HTTP Referrers: Be careful; many HTTP/1.0 requests will not have a Referer header (the value mod_rewrite sees as HTTP_REFERER), and neither will many requests made through caching proxies, corporate or ISP... such as *all* of AOL.

At best, referrer-based solutions will work 50% of the time, and should only be used for applications (such as hotlink protection) where that is "good enough" to stop the undesired behaviour. For example, the reason that it's 'good enough' to stop image hotlinking is that if a Webmaster codes his page to load an image from your server, and then finds that half of *his* visitors report a broken image on his site because you're blocking referrals from his site, then he'll probably look elsewhere for an image to steal. And even if he doesn't, you've at least cut your bandwidth loss to less than it would have been.

Anyway, decide for yourself how to handle the case of a blank referrer, because it will be a common case, and you'll need your design to 'degrade gracefully' in the case where the referrer is not present in the request.

As to viewing the whole thread when replying, I'm afraid the best way is to pop another window or browser tab. The forum's behaviour of showing only the first post is intentional, and is meant to keep threads here on-topic. We treat the original poster as "the owner" of a thread, and replies should be made to his/her question, and not to subsequent posts that wander off-topic... It does have its limitations, especially in a technical discussion, but since the work-around is simple, the benefits outweigh the negatives.

Jim

jdMorgan

7:14 am on Jun 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BTW, the behaviour you've implemented with Perl can also be coded in mod_rewrite in .htaccess as:

# If referrer is my site
RewriteCond %{HTTP_REFERER} ^http://www\.my-site\.com
# serve /my-site-home-page for "/" or "/index.html" requests
RewriteRule ^(index\.html)?$ /my-site-home-page.html [L]
# else serve /not-my-site-home-page.html for "/" or "/index.html" requests
RewriteRule ^(index\.html)?$ /not-my-site-home-page.html [L]

Opposite handling of blank referrer case:

# If referrer is my site
RewriteCond %{HTTP_REFERER} ^http://www\.my-site\.com [OR]
# or if it is blank
RewriteCond %{HTTP_REFERER} ^$
# serve /my-site-home-page for "/" or "/index.html" requests
RewriteRule ^(index\.html)?$ /my-site-home-page.html [L]
# else serve /not-my-site-home-page.html for "/" or "/index.html" requests
RewriteRule ^(index\.html)?$ /not-my-site-home-page.html [L]

Or alternately, using negative logic:


# If referrer is NOT my site (or if referer is blank)
RewriteCond %{HTTP_REFERER} !^http://www\.my-site\.com
# serve /not-my-site-home-page for "/" or "/index.html" requests
RewriteRule ^(index\.html)?$ /not-my-site-home-page.html [L]
# else serve /my-site-home-page.html for "/" or "/index.html" requests
RewriteRule ^(index\.html)?$ /my-site-home-page.html [L]

Opposite handling of blank referrer case:

# If referrer is NOT my site
RewriteCond %{HTTP_REFERER} !^http://www\.my-site\.com
# and if referrer is non-blank
RewriteCond %{HTTP_REFERER} .
# serve /not-my-site-home-page for "/" or "/index.html" requests
RewriteRule ^(index\.html)?$ /not-my-site-home-page.html [L]
# else serve /my-site-home-page.html for "/" or "/index.html" requests
RewriteRule ^(index\.html)?$ /my-site-home-page.html [L]

The basic trick is to not use "index.html" as the name of either real file, since if you do, then DirectoryIndex will unconditionally rewrite "/" to that file before mod_rewrite can run.

Jim

theapeman

7:22 am on Jun 7, 2007 (gmt 0)

10+ Year Member



Ah! Good point. Without specifying what to do if there is no referer string, my little script would, what -- send those users to a 404? Thanks for the tip. Are there other possibilities besides

1) there is a referer string and it matches
2) there is a referer string and it doesn't match
3) there is no referer string

?

As for the posting thing: I very much applaud that philosophy, and the fact that that amount of thought went into the site's design. Well done.

theapeman

7:31 am on Jun 7, 2007 (gmt 0)

10+ Year Member



Upon reflection, I may have been mistaken: that script should direct everyone, other than those whose referer string matches the pattern, to the regular home page, even those with no referer string at all. Only those with the matching referer receive special handling.

I should also mention that nothing bad would happen if visitors from that domain aren't properly redirected. They would just see the regular home page like everyone else. I just want to redirect as many of them as possible, and the referer is the only way I know of to do that.

jdMorgan

5:54 pm on Jun 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your script will serve either the regular or the 'special' page for blank referers, based on whether you code it to treat a blank referrer as a match or a non-match.

So, you decide what to do with blank referers and implement the code to do what you want. I just wanted to make you aware of the issue... ;)
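
For the case you describe, where a blank referrer is treated as a non-match, a mod_rewrite sketch might look like the following. It borrows "thatdomain", newpage.html and home.html from your script, and assumes the .htaccess file sits in the document root and that neither real file is named index.html:

```apache
RewriteEngine On
# Referer present and containing "thatdomain": serve the special page
RewriteCond %{HTTP_REFERER} thatdomain
RewriteRule ^(index\.html)?$ /newpage.html [L]
# Everyone else, including requests with no referer: serve the regular home page
RewriteRule ^(index\.html)?$ /home.html [L]
```

The RewriteCond applies only to the rule directly beneath it, so the second rule acts as the "else" branch.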

Jim