Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Need some help with 301 Redirect Simplification
redirect 301 redirectmatch simplify
hottrout




msg:4091572
 5:49 pm on Mar 4, 2010 (gmt 0)
Firstly, can I just say a big thank you to everyone on this forum. I started learning Apache for my dedicated server only a month ago, and this forum has been a light of truth in the mire of the web.

I have read many, many topics here about building a good .htaccess file, and as a result I have been able to control my site better than ever before. I have also been trying to simplify my code, and it is at this stage that I have come unstuck, as it were.

The following code does not perform as I expect it to. Even after checking and reading more posts I am convinced that there is a better way. Could anyone please advise on what is wrong and also how I could improve this code.


## Correct old misspelled folder to existing folder
Redirect 301 "/Libararys/" http://www.mydomain.com/Libraries/
Redirect 301 "/Libarary's/" http://www.mydomain.com/Libraries/
Redirect 301 "/Libarary27s/" http://www.mydomain.com/Libraries/
Redirect 301 "/Libarary%27s/" http://www.mydomain.com/Libraries/

## Redirect old static pages to new php page
RedirectMatch 301 ^/Libraries/stuff/a/ http://www.mydomain.com/libs/index.php?folder=stuff/a
RedirectMatch 301 ^/Libraries/stuff/b/ http://www.mydomain.com/libs/index.php?folder=stuff/b
RedirectMatch 301 ^/Libraries/stuff/c/ http://www.mydomain.com/libs/index.php?folder=stuff/c

## Force use of php for downloads
RewriteCond %{REQUEST_FILENAME} .*zip$|.*int$|.*bin$|.*7z$|.*fds$|.*tzx$|.*sv$|.*gz$|.*rar$|.*dsk$|.*exe$|.*mov$|.*rm$ [NC]
RewriteCond %{HTTP_REFERER} !mydomain\.com/getfile.php [NC]
RewriteCond %{HTTP_REFERER} !google\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
RewriteRule (.*) http://www.mydomain.com/getfile.php?file=$1 [R,NC]

Basically the first part corrects an old misspelled folder to its new folder name.

The second part should then point requests for anything after Libraries/stuff/a/ to a php script in /libs.

The third part forces outside leechers to use the getfile.php to download the content. This means they get the content but only after I have displayed my page and adverts.

If I type in the incorrect spelling /Libarary's/, it does correct it to /Libraries/ for whatever follows it, except when what follows ends in one of the file extensions from part 3.

For example

www.mydomain.com/Libarary's/stuff/a/index.php resolves to www.mydomain.com/Libraries/stuff/a/index.php

however

www.mydomain.com/Libarary's/stuff/a/file.zip does not run the /libs/index.php page; instead it falls through and runs mydomain.com/getfile.php.

I hope I have explained myself properly. I would love someone to explain why and also help me tidy the code.

Thank you in advance.

 

hottrout




msg:4091584
 6:00 pm on Mar 4, 2010 (gmt 0)

I should also add the following.

If from a browser I type

www.mydomain.com/Libarary's/(anything other than the RedirectMatch paths in part 2), it does simply change /Libarary's/ to /Libraries/ correctly.

If from a browser I type

www.mydomain.com/Libraries/stuff/a/whatever, it also goes to www.mydomain.com/libs/index.php?folder=stuff/a correctly.

The issue only occurs when the request is for *.zip, *.int or any of the other file types shown in part 3.

Thanks again.

jdMorgan




msg:4091664
 7:56 pm on Mar 4, 2010 (gmt 0)

First, don't mix mod_alias Redirect and RedirectMatch directives with mod_rewrite RewriteRules. This causes problems because the server, and not you, decides whether the mod_rewrite directives or the mod_alias directives are processed first. Yes, you read that right: the directives in your .htaccess code *are not* executed sequentially.

Rather, each module takes its turn scanning your config or .htaccess file and executes only the directives that it recognizes. The order of module execution is set by Apache, and on Apache 2.x is beyond your control (on Apache 1.x, modules are executed in the reverse order that they were loaded in). But either way, your "code" cannot be regarded as a linear, sequential program.
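
As a minimal sketch of what a mod_rewrite-only replacement for the four mod_alias Redirect lines in part 1 might look like, assuming the .htaccess file sits in the document root (so the pattern sees the path without its leading slash) and that every misspelling begins with "Libarary":

```apache
RewriteEngine On
# Part 1 rewritten for mod_rewrite; replaces the four "Redirect 301" lines.
# In .htaccess the path is matched without its leading slash and after %-decoding,
# so a request for /Libarary%27s/ reaches this rule as "Libarary's/".
# [^/]* covers Libararys, Libarary's and Libarary27s with one pattern.
RewriteRule ^Libarary[^/]*/(.*)$ http://www.mydomain.com/Libraries/$1 [R=301,NC,L]
```

Once every redirect is a RewriteRule in the same file, the rules are tried in the order they appear, so the ordering of the file becomes something you can rely on.
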

Next, you have a whole bunch of errors in that first RewriteCond, because "*" doesn't mean what you apparently think it means, and even if it did, it would be unnecessary in your pattern anyway. Spend some time with the concise regular-expressions tutorial cited in our Apache Forum Charter.

I also suggest that you actually spend at least half a day (but maybe not all at once) studying the mod_rewrite documentation at Apache.org. This is *not* a typical smug "RTFM" response, but rather one based on reality: If you don't study and understand what's in that module's documentation, then your chances of getting correct, efficient, and robust code are essentially zero. And having one little typo or error in your code can do some very serious damage to your site's search ranking... We've fixed a few problems here that almost put companies out of business due to their negative effects on search ranking and therefore revenue...

Another observation I'll put forward is that the chances of copying-and-pasting code and having it work are also essentially zero, unless that code is completely trivial and does in fact exactly match your needs and is compatible with your current server configuration.

Lastly, for any of the requested-URL errors you describe above, at any level, you want to "fix" the error with one and only one client redirect. Stacked or chained redirects are slow for the visitor, and search engines typically don't pass ranking factors through more than one redirect.
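
Applied to the original problem, one way to get a single redirect for a URL such as /Libarary's/stuff/a/file.zip is a rule that fixes both errors at once. This is a sketch only: it assumes, as in the original post, that the old static folders are exactly stuff/a, stuff/b and stuff/c, and it has to sit above any rule that corrects the folder name alone and above the download rule so that it is tried first.

```apache
RewriteEngine On
# One hop to the final URL, whether the folder name is spelled correctly or not.
# $1 = "Libraries" or the misspelling (unused), $2 = a, b or c.
# Replaces the three RedirectMatch lines in part 2.
RewriteRule ^(Libraries|Libarary[^/]*)/stuff/([abc])/ http://www.mydomain.com/libs/index.php?folder=stuff/$2 [R=301,NC,L]
```

Because [L] ends processing and the client's next request is for /libs/index.php?folder=stuff/a, the download rule never sees the misspelled .zip URL.
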

Taking all of the above into consideration, and after spending a while with the docs, I hope you'll re-code your directives, test them, and post back here either for help with a specific problem or for review.

Jim

hottrout




msg:4091768
 10:49 pm on Mar 4, 2010 (gmt 0)

Thanks for your pointers. I am just beginning to learn this but unfortunately need to keep a new server up and running, so I have had to get my hands dirty quickly. That said, I really enjoy this and want to understand it fully. Having spent the last 3 hours reading, would this version of the command be more accurate?

RewriteCond %{REQUEST_FILENAME} .*zip$|int$|bin$|7z$|fds$|tzx$|sv$|gz$|rar$|dsk$|exe$|mov$|rm$ [NC]

Could you also suggest which directive is best to use if I stick to only one type within the .htaccess?

g1smd




msg:4091775
 11:11 pm on Mar 4, 2010 (gmt 0)

Use RewriteRule for all of them.

A better pattern might be:

\.(zip|int|bin|...|xyz)$

with the most likely matches listed first.
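
Putting that advice together with the conditions from part 3, the download rule might look like the sketch below. The extension list is copied from the original post in its original order; it is an assumption that these are all the types that need protecting, and they should be reordered so the most requested types come first. Matching the extension in the RewriteRule pattern itself means the referrer conditions are only evaluated for those files.

```apache
RewriteEngine On
# Part 3: send off-site requests for download files through getfile.php.
# "\." is a literal dot, the group lists whole extensions and "$" anchors at the end,
# so a path such as /blueprint or /report.csv no longer matches the way int$ or sv$ did.
RewriteCond %{HTTP_REFERER} !mydomain\.com/getfile\.php [NC]
RewriteCond %{HTTP_REFERER} !google\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
# R=302 (temporary) matches the original bare [R] flag; $1 is the full requested path.
RewriteRule ^(.+\.(zip|int|bin|7z|fds|tzx|sv|gz|rar|dsk|exe|mov|rm))$ http://www.mydomain.com/getfile.php?file=$1 [R=302,NC,L]
```
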

jdMorgan




msg:4092237
 6:05 pm on Mar 5, 2010 (gmt 0)

"... get my hands dirty quickly."

I understand being in a hurry, but if you've got six hours to chop down a tree, it's wise to spend three of them sharpening the axe. Tiny mistakes in server config can lead to disastrous results, some of which can put a company out of business. So again, I suggest research before coding.

I'll let this aspect of your project rest now.

Jim

hottrout




msg:4093862
 10:01 am on Mar 9, 2010 (gmt 0)

Jim,

I understand exactly what you mean and I agree. However, the fact remains that I was put into a situation where I did not have time to study the art of good .htaccess design. I had to very quickly correct some major site flaws and leeching, as well as try to recover the Google ranking lost to misdirected pages. In an ideal world I would have planned this .htaccess, but time was not on my side.

I realise that the above code is clunky and by no means a good example, but it has been working (albeit badly). I posted the code here for that very reason: to get advice and pointers from people like you who fully understand this facet of Apache.

I have been involved with computers since 1979 and have owned my own ICT company since 1991. Between family life, my business and other social commitments, I unfortunately do not have as much time as I would like to hack away at Apache.

I was hoping to learn quickly from experienced direction, examples and a guiding hand.

I continue to read and will ultimately produce an .htaccess file that will suffice for now.

hottrout




msg:4093894
 10:29 am on Mar 9, 2010 (gmt 0)

Does this work better for redirecting an old index.htm and an even older (but heavily linked) MainMenu.htm to the new index.php?

RewriteCond %{REQUEST_FILENAME} ^/MainMenu.htm$ [NC]
RewriteCond %{REQUEST_FILENAME} ^/index.htm$ [NC]
RewriteRule . /index.php [L,R=301]

Is this the best way? Smallest code? Most accurate delivery of the required 301 for google?

BTW : This is not cut and paste so it is probably pants.

g1smd




msg:4093911
 11:09 am on Mar 9, 2010 (gmt 0)

I never link to index files by name.

The URL ends at the last trailing slash.

The standard 'redirect index files to trailing slash for any folder depth' code can be modified for your situation:


# Index Redirect for /folder/index.html and /folder/index.php to /folder/ (or to / for root)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(html?|php)(\?[^\ ]*)?\ HTTP/ [NC]
RewriteRule ^(([^/]*/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,NC,L]
#
# Redirect non-canonical non-blank hostname to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


becomes:

# Index Redirect for /folder/index.html and /folder/index.php
# and for /folder/MainMenu.html (and /folder/MainMenu.htm and /folder/MainMenu.php too)
# to /folder/ (or to / for root)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*(index|mainmenu)\.(html?|php)(\?[^\ ]*)?\ HTTP/ [NC]
RewriteRule ^(([^/]*/)*)(index|mainmenu)\.(html?|php)$ http://www.example.com/$1 [R=301,NC,L]
#
# Redirect non-canonical non-blank hostname to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

[edited by: g1smd at 11:25 am (utc) on Mar 9, 2010]

hottrout




msg:4093917
 11:22 am on Mar 9, 2010 (gmt 0)

Will this expression work for correcting the misspelt Libraries folder in the example above?

RewriteCond %{REQUEST_FILENAME} [^/]/Libarary[^/]/ [NC]
RewriteRule ^([^/]+)/Libarary[^/]+/(.*) $1/Libraries/$2

I am unsure of the use of [] in the RewriteCond, and also unsure how to write /Libarary[^/] correctly in the RewriteCond.

Or would I be better off using:

RewriteCond %{REQUEST_FILENAME} ^/Libarary(s|\'s|\%27s|27s) [NC]
RewriteRule ^([^/]+)/Libarary(s|\'s|\%27s|27s)/(.*) $1/Libraries/$3

Thanks for any advice.

hottrout




msg:4093918
 11:29 am on Mar 9, 2010 (gmt 0)

Many thanks for this; it is all very valuable.

In your example, would the RewriteRule work on its own without the RewriteCond before it?

Can you expand ^(([^/]*/)*) for me, I think I follow it but want to be sure.

g1smd




msg:4093919
 11:29 am on Mar 9, 2010 (gmt 0)

[^/] means the single character 'not a slash'.

You'll need [^/]+ for multiple characters.

Add [L] to every rule.

If the substitution pattern starts with $1 you MUST precede it with a slash.

If the pattern is for a redirect you must include the protocol and domain name and the [R=301,L] flags.


^(([^/]*/)*) matches the folders to any folder depth: (('[not a slash] zero or more times', followed by slash) 'the whole, repeated zero or more times'.

The RewriteCond is vital. It ensures the URL pattern for redirects only matches for direct client requests, not prior rewrites.

hottrout




msg:4093930
 11:49 am on Mar 9, 2010 (gmt 0)

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*(index|mainmenu)\.(html?|php)(\?[^\ ]*)?\ HTTP/

Trying to fully understand this I have a few questions.

What is {3,9} for? I can't find any info on curly brackets.

If I know that no parameters are ever passed to index.php in the root folder, can I remove the (\?[^\ ]*)?\ part?

Thanks.

hottrout




msg:4093937
 11:58 am on Mar 9, 2010 (gmt 0)

RewriteCond %{REQUEST_FILENAME} [^/]+/Libarary[^/]+/ HTTP/ [NC]
RewriteRule ^[^/]+/Libarary[^/]+/(.*) http://www.example.com/Libraries/$1 [R=301,L]

You wrote: "If the substitution pattern starts with $1 you MUST precede it with a slash." Do you mean / or \?

Is this then a better version and would it work?

hottrout




msg:4093944
 12:19 pm on Mar 9, 2010 (gmt 0)

If I just want to redirect index.htm and MainMenu.htm in the root to index.php in the root only, would this suffice?

RewriteCond %{THE_REQUEST} ^/(index|mainmenu)\.html? HTTP/ [NC]
RewriteRule ^/(index|mainmenu)\.html?$ http://www.example.com/index.php [R=301,NC,L]

g1smd




msg:4094169
 6:32 pm on Mar 9, 2010 (gmt 0)

The {3,9} part means "between 3 and 9 of the preceding item", so [A-Z]{3,9} matches 3 to 9 capital letters: the request method, such as GET, at the start of THE_REQUEST.

Do include the (\?[^\ ]*)?\ part, because you want to redirect all requests for the filename. It would be an error if a stray parameter on the end of the URL meant the request failed to be redirected. The redirect should strip the parameter; add a question mark to the end of the target URL to strip any parameters.

The $1/Libraries/$2 should be /$1/Libraries/$2 to prevent path injection from a malicious request.

As for the last question, be aware that in .htaccess a RewriteRule pattern cannot see the leading slash of the request. Also, do not redirect to a named index file. Redirect to a URL that ends with a trailing slash. This will serve the content of the index file (as long as you have DirectoryIndex index.php set), but the name of the file should not appear in the URL.
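
Following that advice for the root-only case asked about earlier, a minimal sketch might look like this. The pattern has no leading slash, the condition checks the client's original request line so the internal request that DirectoryIndex makes for index.php cannot trigger the redirect, and the trailing "?" on the target drops any query string. It assumes www.example.com is the canonical host and that AllowOverride permits DirectoryIndex (the Indexes class) in .htaccess.

```apache
RewriteEngine On
# Serve /index.php when "/" is requested, without its name appearing in the URL.
DirectoryIndex index.php
# Only when the client itself asked for /index.htm(l), /index.php or /MainMenu.* in the
# root, with or without a query string. [A-Z]{3,9} is the method (GET, HEAD, POST...).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index|mainmenu)\.(html?|php)(\?[^\ ]*)?\ HTTP/ [NC]
# No leading slash in a .htaccess pattern; the trailing "?" strips the query string.
RewriteRule ^(index|mainmenu)\.(html?|php)$ http://www.example.com/? [R=301,NC,L]
```

Without the RewriteCond, a request for "/" would be mapped internally to index.php, match the rule, and be redirected back to "/" in an endless loop.
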
hottrout




msg:4094216
 7:14 pm on Mar 9, 2010 (gmt 0)

Thank you for a great explanation. I take it you do not name the file to increase security. What are the possible risks of doing so?

In my htaccess I used this

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

to prevent anyone from reading the contents, does this help?

Once again, many thanks for explaining.

hottrout




msg:4094231
 7:31 pm on Mar 9, 2010 (gmt 0)

Using your code

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*(index|mainmenu)\.(html?|php)(\?[^\ ]*)?\ HTTP/ [NC]
RewriteRule ^(([^/]*/)*)(index|mainmenu)\.(html?|php)$ http://www.example.com/$1 [R=301,NC,L]

I have noticed that in another part of the website, if I pass a folder name that has a space in it to a PHP script, it fails. Can you explain why, and what can be done to fix that?

g1smd




msg:4094258
 8:03 pm on Mar 9, 2010 (gmt 0)

That's simple. Never use a space in a URL.

Use hyphens as word separators in URLs.

You can try various workarounds, redirects, and fixes, including encoding the space as %20 but you'll end up chasing your tail in ever decreasing circles, especially if the %20 gets double-encoded as %2520 or worse.

hottrout




msg:4094741
 3:50 pm on Mar 10, 2010 (gmt 0)

I have started the huge task of renaming all folders to not have any spaces or indeed odd characters.

Thanks for the help. I will return once finished.

g1smd




msg:4094893
 6:58 pm on Mar 10, 2010 (gmt 0)

Although that's a lot of work now, it is the best long term solution.

jdMorgan




msg:4095101
 1:48 am on Mar 11, 2010 (gmt 0)

As for all the regex questions, see the concise regular-expressions documentation linked-to in our Forum Charter.

Jim



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved