redirect directory except some files

.htaccess, redirect, directory, except

         

monie

2:08 am on Jan 7, 2010 (gmt 0)

10+ Year Member


Hi,
so I've just written my very first server-side code! It still looks like a bunch of gobbledy-gook to me (it's like all punctuation marks to boot... weeeird) so making it do something makes me feel like an evil genius in a secret lair.
I suppose in time I'll learn that I'm writing the apache equivalent of <font color="red">, then leaning back in my chair and going, "BUAH HA HA." Dear future self: remember this moment and be humble.
In summary: When you explain things, please remember I'm completely new. :)

I want to redirect most of the files in http://www.oldsite.com/html/ to http://www.newsite.com/tutorials/html/. However, I want maybe five files in the old directory to redirect to http://www.newsite.com/tutorials/css/.

Right now I put this in oldsite's .htaccess file:
redirect /html http://www.newsite.com/tutorials/html

and this in newsite's .htaccess file:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www.newsite.com$
RewriteRule ^tutorials\/html\/1filename\.html$ "http\:\/\/www\.newsite\.com\/tutorials\/css\/1filename\.html" [R=301,L]
...etc...
RewriteRule ^tutorials\/html\/5filename\.html$ "http\:\/\/www\.newsite\.com\/tutorials\/css\/5filename\.html" [R=301,L]

Basically: the exceptions get redirected twice, and the second redirect is written out for each filename. This works, but I'm interested in learning the smart way to do it.

The reason I'm asking here (going as far as to create this account) is that I'm hoping you can not only write the code to do what I want, but explain the syntax afterwards: a little paragraph saying what each punctuation mark does in this case.

monie

7:11 am on Jan 7, 2010 (gmt 0)

10+ Year Member



Helping you help me:

1. I now have one .htaccess file with

RewriteCond %{REQUEST_FILENAME} ^(file1.html or file2.html or file3.html)$

What symbol should I put in place of "or"?

2. How do I access that string variable (which may be either file1.html or file2.html...) in RewriteRule?

for ex, RewriteRule oldfolder/(FILENAME) [newsite...]

jdMorgan

7:12 am on Jan 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To avoid the double redirect, put the 'smart code' on the old server. Search engines in general will pass PageRank and link-popularity through one redirect, but not two.

We don't provide a free code-writing service here. There are only a handful of contributors here, but hundreds of question-askers. Therefore, we will be happy to answer very-specific questions and to help you get your code working.

"Punctuation marks" -- See the regular-expressions tutorial and the mod_rewrite documentation cited in our Forum Charter [webmasterworld.com] for a good start.

Having done that, here are some additional hints:
Change the redirect on your old-domain server from a mod_alias Redirect directive to a mod_rewrite RewriteRule directive, and get that working (note that one or two additional directives may be needed to 'set up' mod_rewrite on that old-domain server). Then add RewriteConds that test either %{REQUEST_URI} or a back-reference to the URL-path captured in the RewriteRule pattern, comparing it against a *negative* pattern to create exceptions to your rule. That is, "Rewrite /html to newsite if NOT this URL-path and NOT that URL-path" etc.
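For readers following along, the hint above might look roughly like this in practice. This is only a sketch under assumptions: the .htaccess is taken to sit in the old domain's document root, and 1filename.html and 5filename.html are the placeholder names from the first post, not real files.

```apache
# Old-domain .htaccess in the document root (assumed location).
# Options +FollowSymLinks may also be needed if the host has not enabled it.
RewriteEngine On

# General rule: send /html/... to the new site unless the captured path ($1)
# is one of the exceptions. RewriteConds are ANDed by default.
RewriteCond $1 !^1filename\.html$
RewriteCond $1 !^5filename\.html$
RewriteRule ^html/(.*)$ http://www.newsite.com/tutorials/html/$1 [R=301,L]

# Exceptions: these fall through the rule above and go to the css directory.
RewriteRule ^html/(1filename\.html|5filename\.html)$ http://www.newsite.com/tutorials/css/$1 [R=301,L]
```

With something like this in place, the original mod_alias Redirect line would presumably need to be removed so that the two modules do not both act on the same request, and the second set of rules on the new domain would no longer be needed.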

For your reference, here is a cleaned-up version of one of your rules, which you can analyze using the resources cited above:


RewriteCond %{HTTP_HOST} ^www\.newsite\.com
RewriteRule ^tutorials/html/1filename\.html$ http://www.newsite.com/tutorials/css/1filename.html [R=301,L]

Note that the major change here is that only characters which otherwise would have meaning as regular-expressions tokens or operators need to be escaped. The URL on the 'right side' of the RewriteRule does not need to be escaped at all (except in very rare circumstances) because it is a literal string, and not a regular-expressions pattern.

I also removed the end-anchor from your hostname pattern in the RewriteCond, so that hostnames requested in FQDN format and/or with port numbers appended will also match and get redirected. For example, www.example.com.:80 is a perfectly-valid value for %{HTTP_HOST} (the variable holds the Host header, so there is no "http://" in it).
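To make the effect of the missing end-anchor concrete, here is a small sketch. The commented-out stricter pattern is an assumption added for comparison and is not something suggested in this thread.

```apache
RewriteEngine On

# No end-anchor: matches "www.newsite.com", "www.newsite.com." and
# "www.newsite.com:80", but also any longer hostname that starts this way.
RewriteCond %{HTTP_HOST} ^www\.newsite\.com
RewriteRule ^tutorials/html/1filename\.html$ http://www.newsite.com/tutorials/css/1filename.html [R=301,L]

# Stricter alternative (assumption): allow only an optional trailing dot
# and an optional port number after the hostname.
# RewriteCond %{HTTP_HOST} ^www\.newsite\.com\.?(:[0-9]+)?$
```

Which form is preferable depends on how many hostnames actually resolve to the server; on a single-site account the looser pattern is usually harmless.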

The end goal here is that for each unique URL requested from any of your sites, the result should be either the requested content with a 200-OK or 304-Not Modified response code, or a single 301-Moved Permanently redirect to a canonical URL that will return the originally-requested content with a 200-OK or 304-Not Modified response code. For each unique piece of 'content' -- whether it be a 'page' or an image or a media file, only one URL should be usable to access that content: Any change whatsoever in any of the characters seen in your browser's address bar constitutes an entirely-different URL, and any such URL variations should always result in a 301 redirect to the single correct/canonical URL for a given resource.
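One common way to get the "single canonical URL" behaviour for the hostname part of the URL is a canonical-host redirect. The following is a minimal sketch, assuming www.newsite.com is the canonical hostname and that the rules live in that site's document-root .htaccess.

```apache
RewriteEngine On

# Skip requests that carry no Host header at all, to avoid a redirect loop.
RewriteCond %{HTTP_HOST} !^$
# Any hostname other than exactly the canonical one...
RewriteCond %{HTTP_HOST} !^www\.newsite\.com$ [NC]
# ...gets a single 301 to the same path on the canonical hostname.
RewriteRule ^(.*)$ http://www.newsite.com/$1 [R=301,L]
```

Here the end-anchor is kept on purpose: variants such as a trailing dot or an appended port should not match, so that they are redirected to the canonical form.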

Please do have a look at our Apache Forum Charter, our Apache Forum Library, and the site-search feature. These links are all at the top of this page. Oh, and...

Welcome to the lair.

Jim

monie

7:52 am on Jan 7, 2010 (gmt 0)

10+ Year Member



For anyone out there searching this,
1. the format is %VARIABLE [option1, option2, ..., optionN]
note: make sure the string openers and closers are inside the brackets, [^around$, ^each$, ^option$]

2. %1 didn't seem to work for accessing one of the options, but you can achieve the same thing with
RewriteRule ^oldpath/?(.*)$ "http://www.newsite.com/tutorials/css/$1" [R=301,L]
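For anyone reading the two points above later, here is one possible way to write the "or" and to reuse the matched name. It is a sketch under assumptions: file1.html to file3.html are the placeholder names from the earlier question, and the .htaccess is assumed to be in the old site's document root.

```apache
RewriteEngine On

# "|" is the regex alternation ("or") operator; the parentheses group the
# alternatives and capture whichever one matched.
RewriteCond %{REQUEST_URI} ^/html/(file1\.html|file2\.html|file3\.html)$
# %1 is the first parenthesised group of the last matched RewriteCond.
RewriteRule ^html/ http://www.newsite.com/tutorials/css/%1 [R=301,L]
```

A possible reason %1 appeared not to work is that %{REQUEST_FILENAME} in .htaccess holds a full filesystem path, so a pattern anchored with ^ directly before the bare filename would never match and nothing would be captured.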

[edited by: monie at 8:09 am (utc) on Jan. 7, 2010]

monie

8:05 am on Jan 7, 2010 (gmt 0)

10+ Year Member



Jim- thanks for responding to my "write my code" question (not meant sarcastically). I saw a post where you asked someone else not to do that a while after writing mine and felt bad, but it was too late to edit mine. Obviously you spent a lot of your time anyway.

The punctuation marks link is very helpful, thanks for pointing me that way!

g1smd

8:25 am on Jan 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The URL on the right does not need to be escaped.

jdMorgan

8:26 am on Jan 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No need to 'feel bad' -- We just do things a bit differently here.

Jim

monie

5:44 am on Jan 9, 2010 (gmt 0)

10+ Year Member



:) Thanks Jim; I actually see the benefit of the "different way" of doing things. The archives are really helpful, which can't be said for most forums.

And thanks g1smd!

Actually, your comment reminded me: for anyone searching this on Google or something who is similarly new, my general lesson-learned advice is: think differently. Do you need RewriteCond? As someone with a Java background, (1) I was imagining the structure of my code as if it were object-oriented, and (2) I was trying to use RewriteCond like the Java if-else statement.

I got it to work as such:
RewriteCond checks if filename = x [or]
RewriteCond checks if filename = y [or]
etc...
RewriteRule rewrites html/(.*) to http://www.example.com/directory/css/$1

RewriteCond checks if filename != x
RewriteCond checks if filename != y
etc...
RewriteRule rewrites (.*) to http://www.example.com/directory/$1
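Written out as actual directives, the outline above could look something like the following. This is a sketch only: x.html and y.html stand in for the unnamed files, and the /html/ prefix in the conditions is an assumption based on the first post.

```apache
RewriteEngine On

# Exceptions: either condition may match ([OR]); the last one carries no flag.
RewriteCond %{REQUEST_URI} ^/html/x\.html$ [OR]
RewriteCond %{REQUEST_URI} ^/html/y\.html$
RewriteRule ^html/(.*)$ http://www.example.com/directory/css/$1 [R=301,L]

# Everything else: conditions are ANDed by default, so both must hold.
RewriteCond %{REQUEST_URI} !^/html/x\.html$
RewriteCond %{REQUEST_URI} !^/html/y\.html$
RewriteRule ^(.*)$ http://www.example.com/directory/$1 [R=301,L]
```

Because the first rule redirects and stops with [R=301,L], requests for the exception files never reach the second rule, so the negative conditions there are arguably redundant in this arrangement.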

However, as I read the tutorial JD pointed me to, I used my .htaccess redirect to practice the regular-expressions syntax I was learning. My files happen to have numbers at the beginning according to what lesson they are: 3link.html, 5web.html, 16padmar.html..., so later files, which covered CSS and which were moved to a different directory, had larger numbers at the beginning of the filename.

Without using RewriteCond (except to check HTTP_HOST), I could say
RewriteRule ^html/(1[2-7].*)$ http://www.example.com/css/$1 [R=301,L]

I hadn't tried something sneaky like this before because, thinking in Java-mode, my gut-instinct was, "Any character any number of times? That must take FOREVER!"

But when I actually ran the second code, using RewriteCond less made pages redirect, and non-redirected pages load, MUCH faster. (As in, before the load-time was noticeable, about 2 seconds; now pages redirect immediately.) In retrospect, when I think about the hardware, that makes sense.

Reading through the archives, I've noticed that almost every time someone asks a question about .htaccess and their code includes RewriteCond, someone (usually JD or g1smd, ironically) swoops in and points out a way that they could have achieved the same thing using only RewriteRule.

So remember: any time you use a new programming language, you have to think according to and understand the STRUCTURE of that language.

jdMorgan

4:58 pm on Jan 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you read the "processing" section of the mod_rewrite docs, you'll find that RewriteConds are not processed at all unless the RewriteRule pattern matches. Further, any URL-path sub-string(s) matched by the RewriteRule pattern are available as back-references to the RewriteConds (as $1 - $9), and test-strings matched by RewriteConds become available both to the subsequent RewriteConds in this rule-set and for use in the RewriteRule substitution URL/filepath (as %1 - %9).

So in essence, the order of processing is:
1) RewriteRule pattern
2) RewriteCond pattern(s) and/or condition(s)
3) RewriteRule substitution (if steps 1 & 2 match successfully)
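The three steps above can be seen in a small annotated rule-set. This is a sketch only: the "lesson" query parameter is purely hypothetical and is there just to show %1 being used in the substitution.

```apache
RewriteEngine On

# Step 2: evaluated only after the RewriteRule pattern below has matched.
# The condition can see $1 from that pattern, and its own group becomes %1.
RewriteCond $1 ^(1[2-7])
# Step 1: this pattern is tested first and captures $1.
# Step 3: the substitution may use both $1 (rule) and %1 (condition).
RewriteRule ^html/(.*)$ http://www.example.com/css/$1?lesson=%1 [R=301,L]
```

With these directives in a document-root .htaccess, a request for /html/16padmar.html would be redirected to http://www.example.com/css/16padmar.html?lesson=16, while /html/3link.html would match the rule pattern, fail the condition, and be left alone.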

It's unlikely that getting rid of one (or even ten) RewriteConds would speed up your code noticeably, so there may have been some other underlying problem there. But it never hurts to optimize the code from the start, so you don't have to do it "under duress" at a later time...

Be aware that you've used the "if filename =" terminology rather loosely above. To minimize confusion and errors, always keep in mind that RewriteRule examines requested URL-paths, and not the filepaths that those URLs may (later) resolve to. mod_rewrite effectively works at the time when requested URL-paths are being translated into server filepaths, so we don't yet really have 'filenames' to look at in this stage of the processing. RewriteConds, when configured to check %{REQUEST_FILENAME} or %{SCRIPT_FILENAME} (which are synonyms), are actually looking at the filepath/directory-path to which the requested URL-path would resolve by default without benefit of any rewriting that *may* occur as a result of running this rule.
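The URL-path versus filepath distinction can be illustrated with a rule that uses both kinds of test. This is a sketch under an assumed policy (redirect only when the old file no longer exists on disk), which is not something the original poster asked for.

```apache
RewriteEngine On

# Filepath test: -f asks whether the filesystem path that the request would
# map to is an existing regular file; "!" negates the test.
RewriteCond %{REQUEST_FILENAME} !-f
# URL-path test: the pattern sees the requested path (for example
# "html/3link.html" in a document-root .htaccess), never a path on disk.
RewriteRule ^html/(.*)$ http://www.newsite.com/tutorials/html/$1 [R=301,L]
```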

Anyway, it's critical to maintaining sanity that URLs and filepaths be understood to be two completely-different and distinct things: URLs are used "out there on the Web" and filepaths are only used "inside this server." And mod_rewrite's job (*part of it) is to assist in the mapping of requested URLs to server filepaths.

* mod_rewrite can also do URL-to-URL redirects, invoke reverse-proxy through-puts, set server variables or client-side cookies, etc.

Jim

monie

11:51 am on Jan 10, 2010 (gmt 0)

10+ Year Member



"If you read the "processing" section of the mod_rewrite docs, you'll find that RewriteConds are not processed at all unless the RewriteRule pattern matches."
Ah, thanks. I do think in my case the RewriteCond's were always being called--if the pattern was .*, it would always match, correct?

"any URL-path sub-string(s) matched by the RewriteRule pattern are available as a back-references to the RewriteConds (as $1 - $9)"
ooooh, that's helpful.

"It's unlikely that getting rid of one (or even ten) RewriteConds would speed up your code noticeably, so there may have been some other underlying problem there"
I wouldn't be surprised :) Perhaps the problem was in that the RewriteRule pattern always matched (even if it didn't substitute b/c of the RewriteConds)? Or would that not slow it down noticeably on this scale either?

"Anyway, it's critical to maintaining sanity that URLs and filepaths be understood to be two completely-different and distinct things: URLs are used "out there on the Web" and filepaths are only used "inside this server." And mod_rewrite's job (*part of it) is to assist in the mapping of requested URLs to server filepaths."
I actually didn't know that before; as I said, mostly a browser-side person. Thanks for clearing it up.

g1smd

6:00 pm on Jan 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I do think in my case the RewriteCond's were always being called--if the pattern was .*, it would always match, correct?"

Yes, so one thought process you add to the 'requirements' phase is to ask whether you really need to match *all* URL requests, or whether a more restrictive pattern (such as only match a particular extension, or only match a certain folder, or only match requests without an extension) might be in order.
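As a rough illustration of the three kinds of restrictive pattern mentioned above, here are some alternatives to a catch-all (.*). They are meant to be read one at a time rather than used together, and the target URL http://www.example.com/new/ is hypothetical.

```apache
RewriteEngine On

# Only a particular extension: paths ending in ".html".
RewriteRule ^(.+\.html)$ http://www.example.com/new/$1 [R=301,L]

# Only a certain folder: anything under "html/".
RewriteRule ^html/(.*)$ http://www.example.com/new/$1 [R=301,L]

# Only requests without an extension: the final path segment has no dot.
RewriteRule ^((?:[^/]+/)*[^./]+)$ http://www.example.com/new/$1 [R=301,L]
```

If all three were left in one file, the first matching rule would win because of the [L] flag, so in practice only the pattern that fits the requirement would be kept.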