
Apache Web Server Forum

    
infinite redirect loop caused by .htaccess mods
I made some changes to my .htaccess and created an infinite redirect loop
tmacadam




msg:4609398
 9:07 pm on Sep 12, 2013 (gmt 0)

I added some code to my htaccess file in an attempt to:
1) redirect from www to non-www
2) remove the trailing slash from all URLs
3) block some bots

Unfortunately, the code I used for the first 2 objectives created a redirect loop. I commented out the code for the trailing slash, so the redirect loop is taken care of, but I'd like to figure out how to remove the trailing slash w/o causing a redirect loop.

Here's the code for removing the trailing slash:

RewriteRule ^(.*)/$ /$1 [L,R=301]

Here's the code for redirecting from the www domain to the non-www domain:

RewriteCond %{HTTP_HOST} ^www\.example\.tld$ [NC]
RewriteRule ^(.*)$ http://example.tld/$1 [R=301,L]


Any suggestions on how to properly accomplish both objectives?

 

lucy24




msg:4609401
 9:40 pm on Sep 12, 2013 (gmt 0)

2) remove the trailing slash from all URLs

Huh? Trailing slash = directory. No trailing slash (with or without extension) = page. Do your URLs represent real, physical files, or are they the result of rewrites, maybe involving a CMS?

If a request for
/blahblah/
is redirected to
/blahblah
and /blahblah/ is a real, physical directory, then mod_dir will step in and issue a "trailing slash redirect" (that is its official name). You do not want to override this behavior.

Is the redirect loop internal or external? I'm guessing external, meaning that your browser puts up a message saying "this request will redirect forever". An internal redirect loop leads to a 500-class error from the server.

RewriteRule ^(.*)/$ /$1 [L,R=301]

Urk. In htaccess, ^/$ would only occur if the request is for
example.com//
with double slash. That's assuming the browser passed along the duplicate instead of quietly eating it. What you need is something like
^(([^/]+/)*[^/]+)/$
except that as already noted, I'm not sure you really want this at all.
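A quick way to see the difference between the two patterns is to run them against a few paths. This is a minimal sketch in Python, using the `re` module as a stand-in for Apache's PCRE engine (they agree on these simple constructs). It assumes a root-level .htaccess, where the path a rule sees has no leading slash. The sample paths are made up.

```python
import re

# Pattern from the original post, and the tighter pattern suggested above
ORIGINAL = re.compile(r'^(.*)/$')
TIGHTER = re.compile(r'^(([^/]+/)*[^/]+)/$')

def redirect_target(pattern, path):
    """Return the /path a redirect would point to, or None if the rule does not fire.

    `path` is what a RewriteRule sees in a root-level .htaccess: the URL
    path without its leading slash and without any query string.
    """
    m = pattern.match(path)
    return '/' + m.group(1) if m else None

# '' is a request for the site root; '/' is what a request for example.com// leaves
for path in ['', '/', 'blahblah/', 'dir/sub/', 'dir//', 'page']:
    print(repr(path), redirect_target(ORIGINAL, path), redirect_target(TIGHTER, path))
```

In this sketch the original pattern turns a bare `/` (the leftover from a request for example.com//) into a redirect to `/`. It also turns `dir//` into `/dir/`, which would then be redirected a second time. The tighter pattern leaves both alone and still strips the slash from ordinary paths.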

Always give the full protocol-plus-domain in the target of a redirect. Otherwise some requests will get redirected twice if they came in asking for the wrong form of the domain name.

Speaking of which...
1) redirect from www to non-www
2) remove the trailing slash from all URLs
3) block some bots

I hope that isn't the actual order of your rules, because it's precisely backward. If they're all happening within mod_rewrite, the order is:
#1 access control (blocking bots)
#2 redirects, arranged from most specific to most general
#3 with/without www redirect, always the very last redirect (second-to-last is generally the "index.xtn" redirect)
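A toy redirect follower can check two pieces of advice: the rule ordering, and using full protocol-plus-domain targets. This is a Python sketch, not mod_rewrite. Each hypothetical rule function returns a new (host, path) when it fires, and `follow()` keeps applying the first rule that fires, the way a browser follows 301s. It assumes `example.com` without www is the canonical host.

```python
import re

# Assumption for this sketch: the non-www host is the canonical one
CANONICAL_HOST = 'example.com'
SLASH = re.compile(r'^(([^/]+/)*[^/]+)/$')

def slash_rule(host, path):
    """Trailing-slash redirect whose target names protocol and canonical host."""
    m = SLASH.match(path)
    return (CANONICAL_HOST, m.group(1)) if m else None

def slash_rule_relative(host, path):
    """Same redirect with a bare /path target: the browser keeps the host it asked for."""
    m = SLASH.match(path)
    return (host, m.group(1)) if m else None

def www_rule(host, path):
    """With/without-www redirect: fires whenever the host is not the canonical one."""
    return (CANONICAL_HOST, path) if host != CANONICAL_HOST else None

def follow(rules, host, path, limit=10):
    """Apply the first rule that fires, again and again, like a browser following 301s.

    Returns every URL requested, ending with the one that is finally served.
    """
    visited = [f'http://{host}/{path}']
    for _ in range(limit):
        for rule in rules:
            target = rule(host, path)
            if target:
                host, path = target
                visited.append(f'http://{host}/{path}')
                break
        else:
            return visited  # no rule fired, so this URL is served
    raise RuntimeError('redirect loop')

for name, rules in [('www first', [www_rule, slash_rule]),
                    ('www last, relative target', [slash_rule_relative, www_rule]),
                    ('www last, full target', [slash_rule, www_rule])]:
    hops = follow(rules, 'www.example.com', 'page/')
    print(f'{name}: {len(hops) - 1} redirect(s) -> {hops[-1]}')
```

Here a request for www.example.com/page/ needs two redirects when the www rule comes first. It also needs two when the slash redirect's target leaves out the host. It needs only one when the www redirect is last and the slash redirect names the canonical host.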

JD_Toims




msg:4609405
 10:33 pm on Sep 12, 2013 (gmt 0)

I personally prefer no trailing slash [or extension] on any URL -- Visitors and search engines don't really care whether a page is technically the default page shown for a directory request or a physical page, so in-my-opinion there's no need to mix and match or have visitors wondering why they can't find http://www.example.com/widget on a type-in, because it's "technically" http://www.example.com/widget/ they wanted.

Sometimes, "technically correct" and "makes sense to normal people" collide and in those cases [this is one of them] I err on the side of "makes sense to normal people".

That said, to do this without invoking a bunch of extra -d and/or -f checks in .htaccess, I prefer to use PHP.

Here's code that, although untested, should be very close to working for removing the trailing slash fairly efficiently:

Note: I did exclude the send-it-to.php from any other rewrites / redirects, because I'm not sure if the extensions are being stripped to make the site truly extensionless or not, so I went with "works in that situation also" over "best for removing a / only".

### .htaccess
#
DirectorySlash Off
#
RewriteEngine on
#
RewriteRule ^send-it-to\.php$ - [L]
#
RewriteRule ^(([^/]+/)*[^/]+)/$ http://example.tld/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} !^(example\.tld)?$
RewriteRule .? http://example.tld%{REQUEST_URI} [R=301,L]
#
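# only extensionless requests are handed to the script; a negated (!) pattern
# cannot capture anything, so $1 would always be empty
RewriteRule ^([^.]+)$ /send-it-to.php?url=$1 [L]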
#
###

### send-it-to.php
<?php
error_reporting(0);

$base = '/the-server-path/to-your/host-directory/';
$url  = isset($_GET['url']) ? $_GET['url'] : '';

// 404 for an empty value, for a value that would not form a valid URL,
// and for any attempt to climb out of the web root with '..'
if($url === ''
|| strpos($url, '..') !== FALSE
|| filter_var('http://example.com/'.$url.'.ext', FILTER_VALIDATE_URL)===FALSE
) {
  header('HTTP/1.1 404 Not Found');
  include_once $base.'404-error-page.ext';
  exit;
}

// Try /path.ext first, then /path/index.ext (the directory's index page).
// file_get_contents() sends the file as-is, so this suits static pages;
// a .php file would have its source code printed rather than executed.
$its_a_file=file_get_contents($base.$url.'.ext');
if($its_a_file!==FALSE) {
  echo $its_a_file;
  exit;
}

$its_a_directory=file_get_contents($base.$url.'/index.ext');
if($its_a_directory!==FALSE) {
  echo $its_a_directory;
  exit;
}

header('HTTP/1.1 404 Not Found');
include_once $base.'404-error-page.ext';
exit;

Edit: Made some adjustments to the original where I noticed issues when reading back through -- Still might not be perfect, but it's fairly close to working.

Also: Welcome to WebmasterWorld!

tmacadam




msg:4609440
 4:19 am on Sep 13, 2013 (gmt 0)

Heh...thanks for the replies. I'm not a programmer, so unfortunately I was unable to successfully implement the suggestions.

I did previously read the same comments about using a trailing slash for directories, but in all honesty I think the only people that care are webmasters with lots of coding knowledge (which is a very small percentage of the world). Although I agree with JD_Toims' comments about making sense to normal people, my reasons for always removing the trailing slash are much more simplistic - I figured it'd be easier to code if I ALWAYS removed them ;)

JD, I assume that I'm to put 'send-it-to.php' into my root directory? (have I mentioned yet that I'm a programming n00b?). I'm also not entirely sure what I'm to replace this with:

/the-server-path/to-your/host-directory/

I assume it's:

/home/name/public_html/

but I'm not 100% sure. I also don't have a 404-error-page.ext file that I can find. I do have an error page, but I handle it through Joomla Admin, so I have no clue where it's stored.

Your solution does seem a lot more complicated than my original 1 line of code in .htaccess. Of course, my 1-line of code doesn't work, so complicated may be what is required ;)

Since I'm not a programmer, I've just copied and pasted code from people who I assume know what they're doing. I'm sure the issue I ran into stems from the fact that I copied & pasted 2 (or more) sets of instructions that were not meant to work together (and yes, I had them listed in my .htaccess in the order I listed, but I've fixed that).

I should also mention that I have the following line of code below "RewriteEngine on":

# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
RewriteBase /
##

I have no clue what this actually does, but I thought it might be another source of conflict with my effort to remove the trailing slash.

It's probably too much to ask to have someone troubleshoot my .htaccess file for me, but I'm going to list it in a follow-up post anyway since it's probably easier to just tell me what to change than try to explain it. I know the car goes faster when I step on the gas...I really don't need to know why ;)

Most of the code I added to my .htaccess file comes from:

[perishablepress.com...]

and

[stackoverflow.com...]

Thanks again for any help you can provide!

tmacadam




msg:4609441
 4:22 am on Sep 13, 2013 (gmt 0)

Here's what I've got in my .htaccess file. I apologize if it looks like a dog's breakfast to those of you who actually understand this stuff:

##
# @package Joomla
# @copyright Copyright (C) 2005 - 2012 Open Source Matters. All rights reserved.
# @license GNU General Public License version 2 or later; see LICENSE.txt
##

##
# READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE!
#
# The line just below this section: 'Options +FollowSymLinks' may cause problems
# with some server configurations. It is required for use of mod_rewrite, but may already
# be set by your server administrator in a way that disallows changing it in
# your .htaccess file. If using it causes your server to error out, comment it out (add # to
# beginning of line), reload your site in your browser and test your sef url's. If they work,
# it has been set by your server administrator and you do not need it set here.
##

## Can be commented out if causes errors, see notes above.
Options +FollowSymLinks

## Mod_rewrite in use.
RewriteEngine on
##
# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
RewriteBase /
##

## Begin - Rewrite rules to block out some common exploits.
# If you experience problems on your site block out the operations listed below
# This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to base64_encode data within the URL.
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
# Block out any script that includes a <script> tag in URL.
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL.
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL.
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
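# Return 403 Forbidden header and show the content of the root homepage
# (keep this rule active: if only the rule is commented out, the four
# conditions above attach themselves to the next RewriteRule in the file)
RewriteRule .* index.php [F]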
#
## End - Rewrite rules to block out some common exploits.

# deny access to evil robots site rippers offline browsers and other nasty scum
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
# send em to a virtual blackhole of fake email addresses
RewriteRule ^.*$ http://s2.spampoison.com [R,L]

# blocking a scummy domain
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.example\.com.* - [F,L]

# BLOCK BOTS/SPAMMERS BASED ON THEIR USERAGENT
RewriteCond %{HTTP_USER_AGENT} (sogou|baiduspider|sosospider|larbin) [NC]
RewriteRule .* - [F,L]

## Begin - Custom redirects
#
# If you need to redirect some pages, or set a canonical non-www to
# www redirect (or vice versa), place that code here. Ensure those
# redirects use the correct RewriteRule syntax and the [R=301,L] flags.
#
# remove trailing slash
#RewriteRule ^(.*)/$ /$1 [L,R=301]
#
# permanently redirect from www domain to non-www domain
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
#
## End - Custom redirects

## Improving Performance
# pass the default character set
AddDefaultCharset utf-8
# preserve bandwidth for PHP enabled servers
<ifmodule mod_php4.c>
php_value zlib.output_compression 16386
</ifmodule>

# minimize image flicker in IE6
ExpiresActive On
ExpiresByType image/gif A2592000
ExpiresByType image/jpeg A2592000
ExpiresByType image/png A2592000

## Begin - Joomla! core SEF Section.
#
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# If the requested path and file is not /index.php and the request
# has not already been internally rewritten to the index.php script
RewriteCond %{REQUEST_URI} !^/index\.php
# and the request is for something within the component folder,
# or for the site root, or for an extensionless URL, or the
# requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
# and the requested path and file doesn't directly match a physical file
RewriteCond %{REQUEST_FILENAME} !-f
# and the requested path and file doesn't directly match a physical folder
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite the request to the index.php script
RewriteRule .* index.php [L]
#php_value memory_limit 256M;

## End - Joomla! core SEF Section.


# secure htaccess file
<Files .htaccess>
order allow,deny
deny from all
</Files>

<IfModule mod_suphp.c>
suPHP_ConfigPath /home/example/public_html
<Files php.ini>
order allow,deny
deny from all
</Files>
</IfModule>

[edited by: phranque at 10:28 pm (utc) on Sep 14, 2013]
[edit reason] unlinked url [/edit]

tmacadam




msg:4609445
 4:25 am on Sep 13, 2013 (gmt 0)

Note that everything seems to work fine as long as I've commented out this line of code:

#RewriteRule ^(.*)/$ /$1 [L,R=301]

However, example.com/directory/ does not redirect to example.com/directory, which is what this code was supposed to do for me.

JD_Toims




msg:4609446
 4:47 am on Sep 13, 2013 (gmt 0)

Ah, didn't realize you were running Joomla -- It'll take me a bit to read through the code and make suggestions, but I'm about done for today -- I'll try to remember this thread and drop back by tomorrow, but no promises [sometimes I have things come up and others I just plain forget].

Fortunately, there are quite a few people here well qualified and willing to help you out even if I don't remember or have time to, so you should be able to get things working.

tmacadam




msg:4609450
 6:18 am on Sep 13, 2013 (gmt 0)

Thanks tons JD! Hopefully you'll get a chance to take a look tomorrow, but I totally understand if you've got other things on your plate!

lucy24




msg:4609458
 7:00 am on Sep 13, 2013 (gmt 0)

Unanswerable question: If the URLs are created by the CMS in the first place, how does the trailing slash even get there, and why can't requests with superfluous slash simply get the 404 they deserve?

RewriteRule !^http://[^/.]\.example\.com.* - [F,L]

HUH?
"If the request is not for
www.example.com/http://something.example.com
then throw it out"
That can't be right. What's missing?

Is the whole htaccess file joomla's boilerplate

:: looking vaguely around for g1smd, who knows this stuff ::

or are there spliced-in pieces from other sources? Flags like [F,L] make me uneasy. They're not wrong, just unnecessary: [F] in all environments implies [L].

CMS htaccess rules in general have a weird fixation with the -d and -f tests. You can get rid of a lot of issues simply by constraining rules:
RewriteRule ^([^.]+)$ et cetera
means the rule will ignore any incoming request with an extension. That covers any non-page file, and potentially also any stray .html files that aren't part of the CMS. It's got enough work to do without also handling misspelled requests for images.
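Here is a small Python sketch of what constraining the pattern buys. It uses `re` as a stand-in for Apache's regex engine and runs the pattern against some hypothetical paths, as a root-level .htaccess would see them.

```python
import re

# Constrained pattern: the rule only looks at paths that contain no dot at all
PAGE_PATTERN = re.compile(r'^([^.]+)$')

# Paths as a RewriteRule sees them in a root-level .htaccess (no leading slash)
sample_paths = ['about-us', 'blog/2013/some-post', 'images/logo.png',
                'robots.txt', 'old-page.html', 'favicon.ico', '']

for path in sample_paths:
    verdict = 'rule applies' if PAGE_PATTERN.match(path) else 'rule skipped'
    print(f'{path!r:24} {verdict}')
```

Anything with a dot, such as images, robots.txt or a stray .html page, never reaches the rule in this sketch. The empty path, a request for the site root, is skipped as well, so the root would be left to the server's DirectoryIndex.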

# or the requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]

See? This doesn't belong in the middle of the list of conditions. Put it in the body of the rule-- and then only if your site actually uses every last one of those extensions in its URLs. (.vcf? .raw? Oh, come on. And a new site hasn't had time to pick up a random mix of htm and html.)

Note that everything seems to work fine as long as I've commented out this line of code:

#RewriteRule ^(.*)/$ /$1 [L,R=301]

Do all requests for names ending in / lead to infinite redirects? Or only the ones that happen to be real, physical directories?

g1smd




msg:4609462
 7:20 am on Sep 13, 2013 (gmt 0)

I can't see the code in the "# blocking a scummy domain" section ever working.

Lucy, the .vcf etc exclusions are in the default Joomla htaccess file.

tmacadam




msg:4609586
 4:57 pm on Sep 13, 2013 (gmt 0)

Keep in mind that I'm using Joomla precisely because I'm not a programmer. The majority of the code in my .htaccess file is a copy & paste from other sources (mostly [perishablepress.com...]). This is why the order is likely out of sync and why there may be conflicting messages.

What I am really looking for with regards to the trailing slash is a redirect. If someone links to example.com/page1/, I want that authority passed along to example.com/page1. If I manually type either URL, I get the same page, but each page has a different Page Authority according to MozRank. I want to combine that authority into a single page (preferably the one without the trailing slash). None of my internal site navigation creates a trailing slash. I got the code for removing the trailing slash here: [stackoverflow.com...]

tmacadam




msg:4609591
 5:04 pm on Sep 13, 2013 (gmt 0)

Do all requests for names ending in / lead to infinite redirects? Or only the ones that happen to be real, physical directories?


Actually, I didn't realize this until just now, but the only page that suffers a redirect loop is example.com/administrator/. All of the other pages redirect properly. Is there a way I can modify this code to redirect all pages EXCEPT /administrator/? Or is there a way to make /administrator a legitimate page?

lucy24




msg:4609623
 7:06 pm on Sep 13, 2013 (gmt 0)

Thought so. See above about mod_dir. Add a Condition to the directory-slash redirect

RewriteCond %{REQUEST_URI} !^/administrator
RewriteRule ^([^.]+)/$ http://example.com/$1 [R=301,L]

That's assuming you don't have literal . periods in the path part of any URL. (It's perfectly legal, but doesn't look nice and can lead to RegEx problems, so don't do it unless your name is apache dot org.) If necessary, make it a pipe-delimited group:

!^/(administrator|realdirectory|other-real-directory)
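The Condition-plus-Rule pair can be mimicked in a few lines of Python to see which requests get redirected. This is a sketch under assumptions: `example.com` (no www) is the canonical host, the directory names are the placeholders from the group above, and `trailing_slash_redirect()` is a hypothetical helper, not part of Apache.

```python
import re

# Assumptions: example.com (no www) is the canonical host, and the names in the
# pipe-delimited group are the site's real directories.
CANONICAL = 'http://example.com/'
REAL_DIRS = re.compile(r'^/(administrator|realdirectory|other-real-directory)')
SLASH_RULE = re.compile(r'^([^.]+)/$')

def trailing_slash_redirect(request_uri):
    """Mimic the RewriteCond + RewriteRule pair for a root-level .htaccess.

    The condition looks at %{REQUEST_URI}, which keeps its leading slash;
    the rule pattern sees the path with that slash removed. Returns the
    redirect target, or None when the request passes through untouched.
    """
    if REAL_DIRS.match(request_uri):        # the ! in the condition: skip real directories
        return None
    m = SLASH_RULE.match(request_uri[1:])   # drop the leading slash, as .htaccess does
    return CANONICAL + m.group(1) if m else None

for uri in ['/page1/', '/blog/some-post/', '/administrator/', '/page1', '/files/report.pdf/']:
    print(uri, '->', trailing_slash_redirect(uri))
```

The condition has no closing anchor, so it also exempts anything that merely begins with one of the names, such as /administrator-notes/.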

You cannot make
/administrator
a legitimate page, because, well, it isn't a page. The designation
/administrator/
means "The index page of the /administrator/ directory". The name implies that it's the directory where you do your own stuff. So the last thing you want to do is make it easier for others to find it.

A RewriteCond goes before the RewriteRule it belongs to, and it only applies to the immediately following rule.
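The one-condition-one-rule binding is easy to get wrong when rules are commented out. The following Python sketch groups lines the way mod_rewrite does. It is a hypothetical helper, not an Apache tool: comments are skipped, and every RewriteCond waits for the next RewriteRule. It only groups lines; it does not evaluate [OR] flags.

```python
def group_rules(lines):
    """Attach each RewriteCond to the next RewriteRule, the way mod_rewrite reads them.

    Blank lines and # comments are ignored, so if a rule is commented out
    while its conditions stay active, those conditions join the following rule.
    """
    rules, pending = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('RewriteCond'):
            pending.append(line)
        elif line.startswith('RewriteRule'):
            rules.append((pending, line))
            pending = []
    return rules

snippet = [
    'RewriteCond %{QUERY_STRING} GLOBALS',
    '#RewriteRule .* - [F]',
    'RewriteCond %{HTTP_USER_AGENT} ^Zeus',
    'RewriteRule .* - [F]',
    'RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]',
]
for conds, rule in group_rules(snippet):
    print(rule, '<-', conds)
```

In the sample, commenting out the first rule leaves its GLOBALS condition in place, so that condition joins the user-agent condition on the next rule. With no [OR] between them, both must now be true before that rule fires.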

tmacadam




msg:4609648
 8:44 pm on Sep 13, 2013 (gmt 0)

Thanks lucy!

Sorry to be a total n00b, but where exactly do I place those 2 lines of code? I've placed them before and after the line for removing the trailing slash (as well as several other locations), but no luck. If you could give me the line of code that they should immediately follow, that would be a HUGE help.

And no, I don't have periods in my URLs (other than .com)

lucy24




msg:4609653
 9:18 pm on Sep 13, 2013 (gmt 0)

Your original trailing-slash redirect is in the right place; I've just tweaked the wording. The Condition goes immediately before the rule.

other than .com

A RewriteRule sees only the "path" of an URL, meaning the part after the hostname
www.example.com/
and before the query string, if any
?blahblah=some-value
If you need to look at either of those elements-- for example in a with/without www. redirect-- it has to go in a RewriteCond.

If you are copying and pasting rules from other sources, note that the pattern of a RewriteRule in htaccess always begins
^directoryname
without leading slash. (The ^ anchor may or may not be present.) If you find a rule expressed as
^/directoryname
that's the form for rules lying loose in the config file. Remove the leading slash or the rule will never work. This does not apply to conditions involving %{REQUEST_URI}; those always have a leading slash, no matter where the rule is located.
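Here is a Python sketch of which piece of a request each test sees, using `urllib.parse.urlsplit` to break up a sample URL. It assumes the .htaccess file sits in the document root; in a subdirectory, that directory's path is stripped as well. `rewrite_inputs()` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

def rewrite_inputs(url):
    """Split a requested URL into the values mod_rewrite works with.

    Assumes a root-level .htaccess, so the path a RewriteRule pattern sees
    is the URL path minus its leading slash.
    """
    parts = urlsplit(url)
    return {
        'HTTP_HOST': parts.netloc,      # only reachable through a RewriteCond
        'REQUEST_URI': parts.path,      # keeps its leading slash
        'QUERY_STRING': parts.query,    # only reachable through a RewriteCond
        'rule_path': parts.path[1:],    # what the RewriteRule pattern is matched against
    }

inputs = rewrite_inputs('http://www.example.com/directoryname/page?blahblah=some-value')
print(inputs)
print(bool(re.match(r'^directoryname', inputs['rule_path'])))     # htaccess form
print(bool(re.match(r'^/directoryname', inputs['rule_path'])))    # config-file form, never matches here
print(bool(re.match(r'^/directoryname', inputs['REQUEST_URI'])))  # conditions keep the slash
```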

Pay close attention to anything said by g1smd, as he knows more about joomla's htaccess than anyone else on this forum. This is objectively true.

tmacadam




msg:4609657
 9:41 pm on Sep 13, 2013 (gmt 0)

Perfect!

Thanks again lucy (and all others who contributed). I was adding your 2 lines in addition to the rule I already had. Didn't realize it was a replacement. I've tested it, and everything works as intended!

Concerning your other points about

# or the requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]


That's from the default .htaccess from Joomla 3. I can't imagine that I have any .htm, .vcf or .raw files. Would it be worth removing them from this line of code? Would it improve performance?

By the way, here's the default Joomla 3 .htaccess file (thought it might come in handy to help you answer other questions from other Joomla users). If there are lines of code that I should change in order to improve performance, I'm all ears!

##
# @package Joomla
# @copyright Copyright (C) 2005 - 2012 Open Source Matters. All rights reserved.
# @license GNU General Public License version 2 or later; see LICENSE.txt
##

##
# READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE!
#
# The line just below this section: 'Options +FollowSymLinks' may cause problems
# with some server configurations. It is required for use of mod_rewrite, but may already
# be set by your server administrator in a way that disallows changing it in
# your .htaccess file. If using it causes your server to error out, comment it out (add # to
# beginning of line), reload your site in your browser and test your sef url's. If they work,
# it has been set by your server administrator and you do not need it set here.
##

## Can be commented out if causes errors, see notes above.
#Options +FollowSymLinks

## Mod_rewrite in use.

RewriteEngine On

## Begin - Rewrite rules to block out some common exploits.
# If you experience problems on your site block out the operations listed below
# This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to base64_encode data within the URL.
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
# Block out any script that includes a <script> tag in URL.
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL.
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL.
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Return 403 Forbidden header and show the content of the root homepage
RewriteRule .* index.php [F]
#
## End - Rewrite rules to block out some common exploits.

## Begin - Custom redirects
#
# If you need to redirect some pages, or set a canonical non-www to
# www redirect (or vice versa), place that code here. Ensure those
# redirects use the correct RewriteRule syntax and the [R=301,L] flags.
#
## End - Custom redirects

##
# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
##

# RewriteBase /

## Begin - Joomla! core SEF Section.
#
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# If the requested path and file is not /index.php and the request
# has not already been internally rewritten to the index.php script
RewriteCond %{REQUEST_URI} !^/index\.php
# and the request is for something within the component folder,
# or for the site root, or for an extensionless URL, or the
# requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
# and the requested path and file doesn't directly match a physical file
RewriteCond %{REQUEST_FILENAME} !-f
# and the requested path and file doesn't directly match a physical folder
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite the request to the index.php script
RewriteRule .* index.php [L]
#
## End - Joomla! core SEF Section.

lucy24




msg:4609689
 1:48 am on Sep 14, 2013 (gmt 0)

RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]

Sure, get rid of any extension that you absolutely don't use. Most people will consistently use either htm or html but not both. Also: unless you've got a weird mishmosh of imported files, drop the [NC] flag. Now the server only has to do half the work: Instead of checking for both C and c it only checks for c; instead of checking for both O and o it only ... et cetera. There's no flag meaning "This element might be either completely capitalized or completely lower-case", so when the server checks for .htm and .HTM it is also checking for .Htm, .hTm and so on. You can do the math :)
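The "do the math" can be checked quickly. This Python sketch is purely illustrative: it enumerates every mixed-case spelling that an [NC] match on an extension accepts. A regex engine does not literally try each spelling in turn. The count therefore shows how much [NC] widens the match, not how much CPU work it costs.

```python
from itertools import product

def case_variants(word):
    """Every upper/lower-case spelling of `word`, i.e. what an [NC] match accepts."""
    options = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in word]
    return {''.join(chars) for chars in product(*options)}

for ext in ['htm', 'html', 'feed']:
    print(ext, len(case_variants(ext)))
```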

This bit
/component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$
is worth a closer look, because it's a lot of pipes-- and pipes have lower* priority than anything else. So the line means

/component/
OR
(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$

which in turn means
/[^.]*$
OR
\.(php|html?|feed|pdf|vcf|raw)$

Frankly this makes no sense to me, so I hope g1 can explain it. It means that the rule applies to
#1 any request for anything in the /component/ directory (which might be located anywhere in the path)
or
#2 anything anywhere that has no extension, or that is itself a directory name, including the root
or
#3 anything anywhere that has any of the listed extensions
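The three-way split can be confirmed by testing the stock condition against some sample URIs. This is a Python sketch, with `re.search` standing in for mod_rewrite's unanchored matching. The URIs are invented for illustration.

```python
import re

# The stock Joomla condition, minus the [NC] flag
COND = re.compile(r'/component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$')

def condition_matches(request_uri):
    """True if the pattern matches anywhere in %{REQUEST_URI}.

    The /component/ alternative has no anchor, so it can match anywhere;
    re.search mirrors mod_rewrite, which is unanchored unless ^ or $ says otherwise.
    """
    return COND.search(request_uri) is not None

for uri in ['/component/users/photo.jpg', '/deep/path/component/x.png',
            '/about-us', '/', '/docs/manual.pdf', '/images/logo.jpg', '/robots.txt']:
    print(uri, condition_matches(uri))
```

Note how /component/ matches even in the middle of a path, and even when the file has an extension the list doesn't mention.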

What's in the /component/ directory? Seems like the rule would run more smoothly if any directory that is already known to exist is pulled out before you even start evaluating conditions. When a rule's pattern is
.*
the conditions have to be evaluated on every single request. If you could put something-- anything! --into the rule itself, then some requests can zip right past.
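Here is a Python sketch that puts a rough number on "zip right past" by counting condition checks for a hypothetical batch of requests. It relies on mod_rewrite matching a RewriteRule's pattern before it looks at that rule's conditions. It assumes the worst case, in which all four conditions of the stock Joomla SEF rule run whenever the pattern matches.

```python
import re

def condition_checks(rule_pattern, paths, conditions=4):
    """Upper bound on RewriteCond evaluations for a batch of requests.

    mod_rewrite matches the RewriteRule pattern first and looks at the
    conditions only when it matches. This sketch assumes all `conditions`
    run whenever the pattern matches; in practice a chain of ANDed
    conditions stops at the first one that fails.
    """
    pattern = re.compile(rule_pattern)
    return sum(conditions for path in paths if pattern.search(path))

# Hypothetical mix of requests, as a root-level .htaccess sees them
paths = ['about-us', 'blog/some-post', 'images/logo.png',
         'templates/site/css/template.css', 'media/js/site.js', 'favicon.ico']

print('.*        ', condition_checks(r'.*', paths))
print('^([^.]+)$ ', condition_checks(r'^([^.]+)$', paths))
```

For these six requests, the `.*` pattern means up to 24 condition checks, against 8 for `^([^.]+)$`. The difference comes from the four requests for files with extensions, which never get as far as the conditions.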

But then, the whole point of any CMS is that you don't have to know anything to use it. You must have done a bit of editing already, since the <IfModule...> envelope is missing :) That's probably a good sign. (If you don't have mod_rewrite, you can't use Joomla. Or Drupal, or WordPress, or ...)

Oh yes and: any time you have a target in the form
blahblah.php [L]
(no protocol-plus-domain) give it a leading slash as
/blahblah.php
This is a security measure which g1 will explain. Although possibly not right away, as we live in widely different time zones.


* This terminology is completely counter-intuitive to me but I'm quoting from [regular-expressions.info...] . It means "pipes are evaluated BEFORE everything else".

[edited by: incrediBILL at 1:53 am (utc) on Sep 14, 2013]
[edit reason] fixed link [/edit]

JD_Toims




msg:4609712
 5:01 am on Sep 14, 2013 (gmt 0)

Glad it looks like things are working -- Now just for fun let's see how "too much" some of the stock-standard stuff is and maybe we can make it even a little bit more efficient.

RewriteEngine on

## Begin - Rewrite rules to block out some common exploits.
# If you experience problems on your site block out the operations listed below
# This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to base64_encode data within the URL.
#
# If we don't have a valid query_string that contains base64 we can
# change the below -- No need to use () and store anything or even check
# for anything other than base64 in the query_string.

#
#RewriteCond %{QUERY_STRING} base64 [OR]
#
# Block out any script that includes a <script> tag in URL.
#
# If we don't have a valid query_string that contains < [I don't] we can
# change the below -- No need to store anything or even check
# for anything other than < within the query_string.

#
#RewriteCond %{QUERY_STRING} (?:<|%3C) [OR]
#
# Block out any script trying to set a PHP GLOBALS variable via URL.
#
# If we don't have a valid query_string that contains exactly GLOBALS we can
# change the below -- No need to use () and store anything or even check for
# anything else.

#
#RewriteCond %{QUERY_STRING} GLOBALS [OR]
#
# Block out any script trying to modify a _REQUEST variable via URL.
#
# If we don't have a valid query_string that contains _REQUEST we can
# change the below -- No need to use () and store anything or even check
# for anything other than _REQUEST in the query_string -- We can probably
# even just check for exactly _REQ and be done.

#
#RewriteCond %{QUERY_STRING} _REQ
#
# Return 403 Forbidden header and show the content of the root homepage
#
# If we're going to check every single request we don't need to match
# anything start to finish, so we can change the following to 0 or 1
# characters and then get on with things.

#
#RewriteRule .? - [F]
#
# Of course, if we got really crazy, we could delete all the above
# and do the same thing in two lines.

#
RewriteCond %{QUERY_STRING} base64|_REQ|(?:<|%3C)|GLOBALS
RewriteRule .? - [F]
## End - Rewrite rules to block out some common exploits.

## Begin - Custom redirects
#
# If you need to redirect some pages, or set a canonical non-www to
# www redirect (or vice versa), place that code here. Ensure those
# redirects use the correct RewriteRule syntax and the [R=301,L] flags.
#
## End - Custom redirects

##
# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
##

# RewriteBase /

## Begin - Joomla! core SEF Section.
#
# We aren't storing anything below for back-reference by using ()
# so there's once again no reason to match anything more than 0 or 1.

#
RewriteRule .? - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# If the requested path and file is not /index.php and the request
# has not already been internally rewritten to the index.php script
#
RewriteCond %{REQUEST_URI} !^/index\.php
#
# and the request is for something within the component folder,
# or for the site root, or for an extensionless URL, or the
# requested URL ends with one of the listed extensions
#
# We can go a bit farther than eliminating non-existent file types here by
# simply being a bit more efficient with the groupings and not "capturing".

#
RewriteCond %{REQUEST_URI} /component/|(?:/[^.]*|\.(?:p(?:hp|df)|html|feed))$
#
#If you're only running Joomla you should be fine deleting the following two
# conditions and adding robots.txt to the index.php exclusion above.
#
# The !-f and !-d checks are *extremely* inefficient, but they're in the
# generic file so that requests for "static" files or other non-Joomla files
# on the site are not sent to Joomla for processing. If you don't have
# anything except Joomla, then removing the following conditions and editing the
# RewriteCond %{REQUEST_URI} !^/index\.php
# line above to be
# RewriteCond %{REQUEST_URI} !^/(?:index\.php|robots\.txt|administrator(?:/.*)?)$
# should be fine and a good efficiency improvement. Joomla's own
# /administrator/ directory has to stay in that exclusion, otherwise requests
# for /administrator/ and /administrator/index.php get sent to the front-end
# /index.php.

#
# and the requested path and file doesn't directly match a physical file
RewriteCond %{REQUEST_FILENAME} !-f
# and the requested path and file doesn't directly match a physical folder
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite the request to the index.php script
#
# Once again, we aren't storing anything below for back-reference by using ()
# so there's no reason to match anything more than 0 or 1.

#
RewriteRule .? /index.php [L]
#
## End - Joomla! core SEF Section.

JD_Toims




msg:4609714
 5:49 am on Sep 14, 2013 (gmt 0)

Short version of the preceding:

RewriteEngine on
RewriteCond %{QUERY_STRING} base64|_REQ|(?:<|%3C)|GLOBALS
RewriteRule .? - [F]

RewriteRule .? - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

RewriteCond %{REQUEST_URI} !^/(?:index\.php|robots\.txt|administrator(?:/.*)?)$
RewriteCond %{REQUEST_URI} /component/|(?:/[^.]*|\.(?:p(?:hp|df)|html|feed))$
RewriteRule .? /index.php [L]
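The combined query-string test in the short version can be spot-checked in Python. This sketch uses `re` as a stand-in for Apache's regex engine, and the sample query strings are made up.

```python
import re

# Query-string test from the short version (no [NC], so matching is case-sensitive)
EXPLOIT = re.compile(r'base64|_REQ|(?:<|%3C)|GLOBALS')

def blocked(query_string):
    """True if the RewriteCond matches, so the request would get a 403 Forbidden."""
    return EXPLOIT.search(query_string) is not None

for qs in ['option=com_content&view=article&id=3', 'q=%3Cscript%3E',
           'x=base64_encode(foo)', 'GLOBALS[x]=1', '_REQUEST[a]=b', 'q=%3cscript%3e']:
    print(f'{qs:40} {"403" if blocked(qs) else "passes"}')
```

The last sample shows a side effect of dropping [NC]: a lower-case %3c slips past. Whether that matters depends on what the rest of the site does with such input.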

lucy24




msg:4609718
 7:17 am on Sep 14, 2013 (gmt 0)

# If you're only running Joomla you should be fine deleting the following two
# conditions and adding robots.txt to the index.php exclusion above.

That's another benefit to constraining RewriteRules to specific extensions (potentially including no extension, and/or directory alone) whenever possible. Requests for robots.txt never even touch the RewriteRules.

Which reminds me:
If you have any access-control rules that don't involve mod_rewrite-- most likely in the form "Deny from..." --make sure you have a <Files> envelope for robots.txt that says "Allow from all". Don't give them any excuse to say they couldn't obey robots.txt because they couldn't reach it.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved