Forum Moderators: phranque

Correct order of .htaccess ?

         

leko

7:37 am on Nov 1, 2014 (gmt 0)

10+ Year Member



Bear with me, I am a beginner with .htaccess and WordPress. I found each .htaccess rule on Google, but I don't know how to combine them into one .htaccess file.

What is the correct order for the following .htaccess ? Can you reply with a demonstration so I can visually see the correct order? It's a lot to read through I know, but I'd really appreciate any help.


 Options +FollowSymLinks

#Specify IP Allowed to Login
<Files wp-login.php>
order deny,allow
deny from all
allow from 12.345.67.891
</Files>

#STRONG HTACCESS PROTECTION
<Files ~ "^.*\.([Hh][Tt][Aa])">
order allow,deny
deny from all
satisfy all
</Files>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

<files wp-config.php>
Order allow,deny
Deny from all
</files>

#Prevent directory browsing
Options All -Indexes

#Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

#Redirect index.html
RewriteCond %{THE_REQUEST} ^.*/index.html [NC]
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

#Redirect IP Address
RewriteCond %{HTTP_HOST} ^123\.456\.789\.999$
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

#Force Trailing Slash
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.example.com/$1/ [L,R=301]

#Stop Hotlinking
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)example.com/.*$ [NC]
RewriteRule \.(gif|jpg|jpeg|bmp|zip|rar|mp3|flv|swf|xml|php|png|css|pdf)$ - [F]

[edited by: phranque at 8:26 am (utc) on Nov 1, 2014]
[edit reason] exemplified domain [/edit]

phranque

8:39 am on Nov 1, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, leko!


in general, you should order your directives in this order:
1 - blocking rules
2 - external redirects, from most specific to most general
3 - internal rewrites, from most specific to most general

your order is more like:
- blocking
- internal rewrite
- blocking
- external redirect
- blocking


#Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

...

#Redirect IP Address
RewriteCond %{HTTP_HOST} ^123\.456\.789\.999$
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

i would combine these rulesets and make this the final redirect ruleset:
#Redirect non-canonical hostname requests
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

leko

5:53 pm on Nov 1, 2014 (gmt 0)

10+ Year Member



Thanks for breaking it down for me, it helps me understand. Just curious, what is the consequence of using an "unorganized" .htaccess file - is it just less efficient for website? Will the stuff inside .htaccess still work?

Also, will I need to use Options +FollowSymLinks ? If yes, where would I put this?

lucy24

8:16 pm on Nov 1, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



#Prevent directory browsing
Options All -Indexes

Ouch. Is someone in a high-profile position really publishing this? Just yesterday I looked this up
:: insert boilerplate about coincidences ::
and verified that it leads to either "unexpected results" or server not starting, depending on Apache version. Just say
Options -Indexes

This turns off the Indexes option if it was previously on (or leaves it off if it was already off).
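
As a minimal sketch of the difference (what exactly goes wrong with the first form depends on the Apache version, as noted above):

```apache
# Problematic: mixes a bare keyword (All) with a signed one (-Indexes).
# Options All -Indexes

# Safer: a signed keyword is merged into whatever options are already inherited.
Options -Indexes

# If FollowSymLinks is also stated explicitly, keep every keyword signed.
# Options -Indexes +FollowSymLinks
```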

Are you on shared hosting? The rule beginning
#STRONG HTACCESS PROTECTION

will do no harm-- but unless you have the world's worst host, it is already in place in the config file so it doesn't need to be repeated. In fact the host would have to be actively malign, because the envelope
<FilesMatch "^\.ht">

is part of the default config file. The case-insensitive parts aren't needed and will slow down the server (by nanoseconds) because you don't actually have a file called .hTAccess or similar.

About order
In Apache, each module is an island. That means that when mod_rewrite reads your htaccess, it looks only at lines beginning "RewriteSomething" and ignores all others. When mod_setenvif comes through, it looks only at SetEnvIf and BrowserMatch lines. And so on. Each module reads its own lines in order from top to bottom.
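
A small sketch of what "each module is an island" means in practice. The bot name and the variable name are made up for illustration; the point is only that the two versions should behave the same:

```apache
# Version A: grouped by module
SetEnvIf User-Agent "BadBot" bad_bot
RewriteEngine On
RewriteRule ^index\.php$ - [L]

# Version B: interleaved (same result, harder for humans to read)
# RewriteEngine On
# SetEnvIf User-Agent "BadBot" bad_bot
# RewriteRule ^index\.php$ - [L]
```

mod_setenvif only ever sees the SetEnvIf line and mod_rewrite only sees the Rewrite lines, each in top-to-bottom order.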

will I need to use Options +FollowSymLinks

Technically yes, but see above about hosts. If their config file did not already set this option, they would be inundated with support requests from people asking why their htaccess isn't working. The option is inherited. Repeating it will do no harm, but it isn't needed.

Aw, heck, let me just haul out the htaccess boilerplate. I need to fine-tune it anyway and it's been a few months.
Cleaning up an htaccess file

Step 1: Organize. Collect all the directives for each module in one place. The server doesn't care, but you-- and anyone who comes along after you-- will appreciate it.

Tip: Use a text editor with a "Find All" window to pull up all lines beginning with the element "Rewrite..." That takes care of mod_rewrite; dump them all at the end for now.

Step 2: Get rid of all <IfModule> envelopes. Not their contents, just the envelopes themselves. These envelopes are hallmarks of mass-produced htaccess files that have to work anywhere, on any server. You are now on your own site. Any given mod is either available to you or it isn't.

Step 3: Sort by module. The server doesn't care what order the directives are listed in, or even if rules from different modules are all garbled together. Each module works separately, seeing only its own directives. But humans need to be able to find things.

For most people it will be most practical to group one-liners at the beginning:

Options -Indexes


is a good start. If your htaccess file contains only one line, that's probably it. Other quick directives are ones starting with words like AddCharset or Expires. Then list your error documents.

If you have any very short Files or FilesMatch envelopes, put them near the top too. For example:
<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

<FilesMatch "\.(css|js)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>


Be sure to have an "Allow from all" envelope for your custom 403 page. If you are on shared hosting and they provide default error-document names such as "forbidden.html", this has probably already been done in the config file. But it does no harm to repeat it.

Step 4: Consolidate redirects.

Step 4a: Get rid of mod_alias. If your htaccess file contains any mod_rewrite directives, it can't use mod_alias (Redirect... by that name), or things may happen in the wrong order. For large-scale updating, use these Regular Expressions, changing \1 to $1 if that's what your text editor uses. Each of these can safely be run as an unsupervised global replace.

# change . to \. in pattern
^(Redirect \d\d\d \S+?[^\\])\.
TO
\1\\.

# now change Redirect to Rewrite
^Redirect(?:Match)? 301 /(.+)
TO
RewriteRule \1 [R=301,L]

# and if needed
^Redirect(?:Match)? 410 /(.+)
TO
RewriteRule \1 - [G]

^Redirect(?:Match)? 403 /(.+)
TO
RewriteRule \1 - [F]
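
Here is how the first two replacements would transform a single line, as a worked sketch. The old and new page names are placeholders, not anything from the original file:

```apache
# Before (mod_alias):
# Redirect 301 /oldpage.html http://www.example.com/newpage.html

# After the first replace (literal period escaped in the pattern):
# Redirect 301 /oldpage\.html http://www.example.com/newpage.html

# After the second replace (now a mod_rewrite rule):
RewriteRule oldpage\.html http://www.example.com/newpage.html [R=301,L]
```

Note that the generated pattern is unanchored, so it matches "oldpage.html" anywhere in the path. Adding ^ and $ by hand afterwards makes it tighter.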


Step 4b: Sort your RewriteRules. At the beginning is the single line

RewriteEngine on


A RewriteBase is almost never needed; get rid of any lines that mention it. Instead, make sure every target begins with either protocol-plus-domain or a slash / for the root.

Sort RewriteRules twice.

First group them by severity. Access-control rules (flag [F]) go first. Then any 410s (flag [G]). Not all sites will have these. Then external redirects (flag [R=301,L] unless there is a specific reason to say something different). Then simple rewrites (flag [L] alone). Finally, there may be a few rules without an [L] flag, such as ones that set cookies or environment variables.

Function overrides flag. If your redirects are so complicated that they've been exiled to a separate .php file, the RewriteRule will have only an [L] flag. But group it with the external redirects. If certain users are forcibly redirected to an "I don't like your face" page, the RewriteRule will have an R flag. But group it with the access-control [F] rules.

Then, within each functional group, list rules from most specific to most general. In most htaccess files, the second-to-last external redirect will take care of "index.html" requests. The very last one will fix the domain name, such as with/without www.

Leave a blank line after each RewriteRule, and put a
# comment

before each ruleset (Rule plus any preceding Conditions). A group of closely related rulesets can share an explanation.
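
Put together, the sorted mod_rewrite section might look roughly like the skeleton below. It is a sketch assembled from rules quoted elsewhere in this thread; the IP address and domain are the usual placeholders, and the [G] rule is a made-up path shown only to mark where such rules would go:

```apache
RewriteEngine on

# 1. Access control ([F])
RewriteCond %{REMOTE_ADDR} !^88\.888\.88\.888$
RewriteRule ^(wp-login\.php|wp-admin/) - [F]

# 2. Gone ([G]), if the site has any (hypothetical path)
# RewriteRule ^old-section/ - [G]

# 3. External redirects, most specific first
# 3a. index.html / index.php
RewriteCond %{THE_REQUEST} [A-Z]{3,9}\ /(([^/]+/)*)index\.(html|php)
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]

# 3b. Canonical hostname (always the last redirect)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 4. Internal rewrites ([L] alone): the WordPress block goes here
```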

Step 5: Notes on error documents.

Reminder: ErrorDocument directives must not include a domain name, or else everything will turn into a 302 redirect. Start each one with a / representing the root.

Caution: Since each module is an island, any module that can issue a 403 must have its own error-document override. "Allow from all" covers mod_authzzzz. If you have RewriteRules that end in [F], make sure your 403 documents can bypass these rules.
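
A minimal sketch of the error-document setup described in this step, assuming the custom 403 page is called forbidden.html and lives in the root (substitute your own name), and using the same Apache 2.2 style access syntax as the rest of this thread:

```apache
# Local path with no domain name, so the 403 status is preserved
ErrorDocument 403 /forbidden.html

# Let everyone fetch the error page itself, even visitors denied elsewhere
<Files "forbidden.html">
Order Allow,Deny
Allow from all
</Files>
```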


Final note: If you're using WordPress-- or any other major CMS-- there will be an htaccess section that begins and ends with comment lines saying
# blahblah WordPress blahblah
Leave this section unchanged. If you later fiddle with your WP installation, it will look for this piece. If it's found, WP will leave the rest of your htaccess alone. Otherwise the whole htaccess might be overwritten.

leko

11:41 pm on Nov 1, 2014 (gmt 0)

10+ Year Member



Lucy thanks for your suggestions! Yay for finding mistakes! I am a newbie so did not understand all of your tips, but the tips I understood I have applied to an updated .htaccess file (see below). Note: I received help in combining the RewriteRules into one section so it looks different from my .htaccess above.

I have a few questions.

(1) I found this in a tutorial to protect php files, is this correct? I saw another version so I am not sure.

Create a separate .htaccess file and put it in your /wp-content directory.
This code will allow access to images, CSS, java-script and XML files, but deny it for any other type.

# Protect wp-content directory
order deny,allow
deny from all
<files ~ ".(xml|css|jpe?g|png|gif|js)$">
allow from all
</files>


(2) It seems that two of your tips are conflicting. You suggest to NOT change existing WordPress .htaccess section,
but you also suggest removing
<IfModule mod_rewrite.c>
</IfModule>
RewriteBase /

But these are within the original WordPress .htaccess section, so shall I remove or keep these?
Here is the original WordPress .htaccess file. See below for an updated .htaccess

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress


(3) Can you check for unnecessary code or mistakes in .htaccess below?
I do not fully understand .htaccess despite reading/watching tutorials and want to get this right.

Options -Indexes

<files wp-config.php>
Order allow,deny
Deny from all
</files>

<files error_log>
Order allow,deny
Deny from all
</files>

<files readme.html>
Order allow,deny
Deny from all
</files>

<files license.txt>
Order allow,deny
Deny from all
</files>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

#Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ [%{HTTP_HOST}...] [R=301,L]

#Redirect index.html
RewriteCond %{THE_REQUEST} ^.*/index.html [NC]
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

#Redirect IP Address
RewriteCond %{HTTP_HOST} ^999\.999\.999\.999$
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

#Force Trailing Slash
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.example.com/$1/ [L,R=301]

#Stop Hotlinking
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)example.com/.*$ [NC]
RewriteRule \.(gif|jpg|jpeg|bmp|zip|rar|mp3|flv|swf|xml|php|png|css|pdf)$ - [F]

# only allow 88.888.88.888 IP to access /wp-login.php or /wp-admin/
RewriteCond %{THE_REQUEST} /(wp-login\.php|wp-admin/) [NC]
RewriteCond %{REMOTE_ADDR} !=88.888.88.888
RewriteRule ^ - [F]

RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

lucy24

12:39 am on Nov 2, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Create a separate .htaccess file and put it in your /wp-content directory.
This code will allow access to images, CSS, java-script and XML files, but deny it for any other type.

# Protect wp-content directory
order deny,allow
deny from all
<files ~ ".(xml|css|jpe?g|png|gif|js)$">
allow from all
</files>

Sure, you can do that if you like. Just remember that this extra htaccess file is there.

One quibble: Your source seems to be morbidly afraid of the "FilesMatch" locution. I'd replace the quoted rule with
<FilesMatch "\.(xml|css|jpe?g|png|gif|js)$">
Allow from all
</FilesMatch>

The ~ in the original rule is an alternative way to go into Regular Expressions mode. The Apache docs say that "FilesMatch is preferred". (Weird use of passive voice, there. But it's definitely preferred by me.) Either way, note that . should become \. (leading backslash) so it means "a literal period" rather than "any one character".
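
For reference, the whole /wp-content/.htaccess with that change would look something like this. The commented-out lower half is a rough Apache 2.4 equivalent, on the assumption that the server uses mod_authz_core without mod_access_compat; check with your host which version applies:

```apache
# Apache 2.2 syntax, as used in this thread
Order deny,allow
Deny from all
<FilesMatch "\.(xml|css|jpe?g|png|gif|js)$">
Allow from all
</FilesMatch>

# Rough Apache 2.4 equivalent (assumption, see above)
# Require all denied
# <FilesMatch "\.(xml|css|jpe?g|png|gif|js)$">
# Require all granted
# </FilesMatch>
```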

It seems that two of your tips are conflicting.

Well, that's why my boilerplate is dynamic ;) I've now added an "Exception" line in the original version. Seriously, though, it won't make any difference unless and until you upgrade your WP version, and possibly not even then. It's a question of whether WP then looks for an exact package containing this exact text from beginning to end-- or just looks for the opening and closing # comment lines. The presence or absence of the envelope has absolutely no effect on rule execution; the only question is what happens the next time WP needs to edit your htaccess.

I do not fully understand .htaccess

Nobody does at first. Use your own learning curve. It's always safest if you put nothing in a configuration file-- even htaccess-- that you don't really understand. That is: at least understand what the line does, even if you don't understand every syllable of each directive.

<files wp-config.php>

... et cetera. Who uses these files? Got a nasty feeling someone explained this to me within the past week or two, but it didn't stick. The four <Files> envelopes could also be combined into a single <FilesMatch> but I don't suppose it would save any time, and it's probably easier for you to read and understand if you leave them alone. Each of them means: "If there's a request for a file with this name, anywhere on my site, take such-and-such action".
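
If you did want to combine them, a single envelope along these lines should be equivalent to the four separate ones (a sketch using the same four filenames):

```apache
<FilesMatch "^(wp-config\.php|error_log|readme\.html|license\.txt)$">
Order allow,deny
Deny from all
</FilesMatch>
```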

#Redirect non-www to www

Ugh ... Are you on the most recent WP install? Do they really give this set of rules, in this order? Or did you absent-mindedly paste in rules that were originally located outside the WP section? Otherwise everything is backward, as discussed by various people earlier in this thread.

# only allow 88.888.88.888 IP to access /wp-login.php or /wp-admin/
RewriteCond %{THE_REQUEST} /(wp-login\.php|wp-admin/) [NC]
RewriteCond %{REMOTE_ADDR} !=88.888.88.888
RewriteRule ^ - [F]

I wouldn't do it like that, unless there's some arcane WP-specific reason why you have to. First, since this is an [F] rule, put it before all other RewriteRules (but after the line RewriteEngine on !). Then change it to say

RewriteCond %{THE_REQUEST} /(wp-login\.php|wp-admin/) [NC]
RewriteCond %{REMOTE_ADDR} !^88\.888\.88\.888$
RewriteRule ^(wp-login\.php|wp-admin/) - [F]

I assume the 88etcetera is a stand-in for your own IP address? These two forms are identical:
!^88\.888\.88\.888$
!=88.888.88.888
I just don't normally use = ("lexicographically equal") so I find it confusing to switch back and forth.

Never put something in a RewriteCond that can go in the body of the rule. This applies particularly to names of requested files. Note further that the line about THE_REQUEST is only necessary if the login/admin files can also be accessed in other ways, for example by an internal rewrite or something the WP software does. If nobody but you uses them, you can omit this line because it's then redundant.

mod_rewrite operates on a "two steps forward, one step back" system. Although RewriteConds come before RewriteRules, the server doesn't read them beforehand. It only looks at Conditions if the rule itself could potentially apply-- meaning, when the request is for a filename that matches what's given in the "pattern" of the rule (the part on the left).

Compare:
RewriteCond %{REQUEST_URI} hogwash
RewriteRule . - [F]

and
RewriteRule hogwash - [F]

Both rules do exactly the same thing: If someone comes along asking for a file whose name includes "hogwash", slam the door in their face. But in the first version, the server has to go back and evaluate the Condition on every single request, for any file, ever. In the second version, the filename is included in the rule. When the request is for something else, the server can proceed on its merry way.

leko

2:17 am on Nov 2, 2014 (gmt 0)

10+ Year Member



Regarding your second post on Nov 2:

Are you on the most recent WP install? Do they really give this set of rules, in this order? Or did you absent-mindedly paste in rules that were originally located outside the WP section?

Yes I'm using the latest WordPress 4.0 but most of those rules were pasted inside the WP section and were NOT originally there. Didn't mean to confuse you. Actually someone put together that .htaccess file for me. I reread your first post and I see what you mean by "rules are backwards".

Note further that the line about THE_REQUEST is only necessary if the login/admin files can also be accessed in other ways,
for example by an internal rewrite or something the WP software does. If nobody but you uses them, you can omit this line because it's then redundant.

#Redirect index.html
RewriteCond %{THE_REQUEST} ^.*/index.html [NC]
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

Do you mean to completely delete line 2 (RewriteCond)? Is the following correct:

#Redirect index.html
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

What is the difference between the below? In terms of ordering rules, which one should be near the top?
[R=301,L]
[L,R=301]


Regarding your first post on Nov 1:

Then, within each functional group, list rules from most specific to most general.

Is this order correct?

#only allow 88.888.88.888 IP to access /wp-login.php or /wp-admin/
#Stop Hotlinking
#Redirect IP Address
#Redirect non-www to www
#Force Trailing Slash
#Redirect index.php
#Original Wordpress section

In most htaccess files, the second-to-last external redirect will take care of "index.html" requests. The very last one will fix the domain name, such as with/without www.

#Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ [%{HTTP_HOST}...] [R=301,L]

#Redirect index.html
RewriteCond %{THE_REQUEST} ^.*/index.html [NC]
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

#Redirect IP Address
RewriteCond %{HTTP_HOST} ^123\.456\.789\.999$
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

#Force Trailing Slash
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.example.com/$1/ [L,R=301]

This was suggested from your first post. Can you show me which ones to keep by copy/pasting so I can be sure to not misinterpret?

lucy24

5:44 am on Nov 2, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you mean to completely delete line 2 (RewriteCond)?

You can omit the line about %{THE_REQUEST} if and only if the file is not used in any other way than by directly accessing it. This question is better answered by someone who knows WordPress. (not2easy? You out there?)

Is the following correct:

#Redirect index.html
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

Not yet, because it will lead to an infinite loop. I'll come back to this.

What is the difference between the below? In terms of ordering rules, which one should be near the top?
[R=301,L]
[L,R=301]

There is absolutely no difference whatsoever. All flags are the same; ordering of flags has no effect. For your own sanity, keep them in the same order throughout. I use [R=301,L]. Note that many flags, such as [F], carry an implied [L]. But [R] (with or without number) does not, so you also have to say [L]. Unless you have some specific and particular reason for omitting the [L]. In practice this only happens if your name is JDMorgan.

Is this order correct?

#only allow 88.888.88.888 IP to access /wp-login.php or /wp-admin/
#Stop Hotlinking

The 88.888 etcetera rule ends in [F], so it can go anywhere among the [F] rules, which should be grouped at the beginning of your RewriteRules. Same applies to the no-hotlinking rule (which should list jpg|png etc in the body of the rule, not in a Condition). There is almost always an optimal order-- but when you have a bunch of separate and unrelated rules all ending in the same flag, it isn't worth stressing over. Save that for when your site gets millions of daily visitors and every nanosecond of server time matters ;)

#Redirect IP Address
#Redirect non-www to www

These two are the same rule, unless I misunderstood one of your earlier posts. It is called the domain-name canonicalization redirect and looks like this:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

This is always the very last of your redirects, because it only covers requests that weren't already wrong in some other way. Ideally it will also be the only rule that doesn't say anything about the URL; it's a universal (.*) meaning "anything or nothing" where "nothing" is a request for the root. The rule means "If there is a request for anything whose hostname is not exactly 'www.example.com' or exactly nothing, redirect to the form I want". For "www.example.com" substitute the preferred form of your own domain name, with or without www. Note that it's expressed as a negative because, as Tolstoy tells us, correct requests are all alike, while every incorrect request is incorrect in its own way.
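
To make the negative condition concrete, here is how the rule would treat a few Host headers. The hostnames and IP are the placeholders already used in this thread:

```apache
# Host header sent       Condition true?   Result
# www.example.com        no                left alone
# (empty)                no                left alone
# example.com            yes               301 to http://www.example.com/...
# 999.999.999.999        yes               301 to http://www.example.com/...
# WWW.EXAMPLE.COM        yes (no [NC])     301 to the lowercase form
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```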

#Force Trailing Slash
#Redirect index.php
#Original Wordpress section

The trailing-slash redirect should come immediately before the WordPress section. There is undoubtedly a WordPress plugin that does the same thing-- but it will be less efficient than rolling your own. Do any of your URLs contain literal periods? (Domain name and extensions don't count.) If no, the rule is extremely easy:

RewriteRule ^([^.]+[^./])$ http://www.example.com/$1/ [R=301,L]


It means: "If there's a request for anything containing no periods at all, and its last character is not a slash, then add one."
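
A few sample paths, as htaccess sees them (without the leading slash), to show what the pattern does and does not catch. The page names are invented for illustration:

```apache
# Requested path     ^([^.]+[^./])$             Result
# about              matches                    301 to /about/
# blog/my-post       matches                    301 to /blog/my-post/
# about/             no match (ends in /)       left alone
# style.css          no match (has a period)    left alone
# (empty = root)     no match                   left alone
RewriteRule ^([^.]+[^./])$ http://www.example.com/$1/ [R=301,L]
```

One side effect of this wording: the pattern needs at least two characters, so a one-letter path such as "a" would not get a slash added.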

Is this a brand-new site? Have you done your research and determined that a final slash-- for files that are, in fact, not directories but pages-- is the best way to go? It's not what I would do, but admittedly I don't do WordPress. You might ask in the WP forum for advice on the most convenient URL format. It is much easier to make these decisions before the site goes live.


This brings us to the vexed issue of index.whatever. In all rules, say
index\.(php|html)

where .php is the form that actually occurs, while .html is what search engines will ask for as soon as they see those final slashes. (They really do this; it's not hypothetical.)

The pattern should look like
^(([^/]+/)*)index\.(html|php)
or
^([^.]+/)?index\.(html|php)
depending on whether your URLs can ever contain literal periods. There's no absolutely perfect wording; the object is to keep the server from capturing all the way to the end and then have to backtrack. Option 3 involves a RewriteCond. This may be the best in your case, where there's a CMS involved.

In a hand-rolled HTML site, the index redirect takes a third flag:
[R=301,L,NS]
The [NS] flag means "don't invoke this rule if the request for 'index.html' came from mod_dir". That's how you avoid an infinite loop. In your case you will instead need a RewriteCond looking at %{THE_REQUEST}, because WordPress itself will be doing the asking, and that doesn't count as a subrequest.

Full rule will look something like
RewriteCond %{THE_REQUEST} [A-Z]{3,9}\ /(([^/]+/)*)index\.(html|php) 
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]

Note that I've shifted the capture from Rule to Condition. This saves the server from having to capture anything on that vast majority of requests that don't end up involving index.something. This rule goes right next to the redirect that adds a directory slash. Before or after doesn't much matter, as they're mutually exclusive.
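
Worked through for a few hypothetical request lines, the capture in the Condition comes out like this:

```apache
# Request line                      %1        Redirect target
# GET /index.php HTTP/1.1           (empty)   http://www.example.com/
# GET /blog/index.html HTTP/1.1     blog/     http://www.example.com/blog/
# GET /a/b/index.php HTTP/1.1       a/b/      http://www.example.com/a/b/
RewriteCond %{THE_REQUEST} [A-Z]{3,9}\ /(([^/]+/)*)index\.(html|php)
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]
```

When WordPress internally rewrites a request such as /about/ to /index.php, THE_REQUEST still holds the original request line, so the Condition fails and there is no loop.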


The WordPress boilerplate has two rules:
RewriteRule ^index\.php$ - [L]

meaning "If there's a request for 'index.php', stop here and don't continue to the next rule." This applies only to internal requests, because external requests have already been forcibly redirected. So here you don't need a condition.

The second WordPress rule-- the part that makes me foam at the mouth-- says
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

meaning "If there's a request for ANY file or directory that doesn't physically exist, dump it on WordPress (under the name 'index.php')".

Now, those -f and -d tests are server-intensive, because the server has to physically look for the file on every single request. (This information can't be cached, because you might have added or deleted a file at any time.) If you can do so, constrain the rule to requests for pages. This is admittedly easier if you go with extensionless URLs with no final slash.
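
One way to constrain it, as a sketch only. It assumes page URLs never contain a period, so requests for images, stylesheets and scripts (which do) never reach the -f and -d checks:

```apache
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^[^.]+$ /index.php [L]
```

The trade-offs: a request for a missing file with an extension now gets the server's own 404 instead of WordPress's, and a change made inside the WordPress block may be overwritten the next time WP edits the htaccess.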


Almost forgot one thing. The moment you create RewriteRules in [F] you need one more thing:
RewriteRule ^forbidden\.html - [L]

substituting the exact name and URLpath of your custom 403 page, assuming you've got one. (If the page is created by WP, things get more complicated.) Otherwise there will be an infinite loop. The unwanted visitor does get locked out, but only after the server has made 30 attempts to send out the 403 page.
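
To do its job the exemption has to sit ahead of the [F] rules, so placement would look roughly like this. The blocked user-agent is a made-up example and forbidden.html stands in for your own 403 page:

```apache
RewriteEngine on

# Let the 403 page through before any [F] rule can see it
RewriteRule ^forbidden\.html - [L]

# Access-control rules follow
RewriteCond %{HTTP_USER_AGENT} BadBot
RewriteRule ^ - [F]
```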


Whoops! Gotta go. YouTube loop has reached my favorite comedy number [youtube.com].

[edited by: phranque at 12:00 am (utc) on Nov 4, 2014]
[edit reason] typofix [/edit]

leko

7:58 am on Nov 2, 2014 (gmt 0)

10+ Year Member



Yoohoo Lucy are you back from youtube? I have to pick your brain again!

Have you done your research and determined that a final slash-- for files that are, in fact, not directories but pages-- is the best way to go?

I was debating which was best. Personally I dislike the slash but so many SEO tutorials suggested the slash, because:

"Usually, domain.com/link/ indicates a directory than a file. Whereas domain.com/link indicates a file. But which file? The links without file extension make browsers think about what the server returns. The server will first check if any such link exists. Then it will return the respective output. It can be HTML, text, image, a file or even a redirect. Hence browsers also have to check what is returned. This all increases the processing time and overall HTTP requests. Hence there a lag caused."

Is this true?

The moment you create RewriteRules in [F] you need one more thing:
RewriteRule ^forbidden\.html - [L]
substituting the exact name and URLpath of your custom 403 page, assuming you've got one. (If the page is created by WP, things get more complicated.) Otherwise there will be an infinite loop. The unwanted visitor does get locked out, but only after the server has made 30 attempts to send out the 403 page.

Is this code mandatory? I didn't make a 403 page. Is there a problem with letting WordPress use one of its templates to display a 403 error? Yesterday I forced a 403 error and landed on a page displaying the text "403 error" using my archive.php template within my WordPress theme.

Does this mean I do not need this RewriteRule?

which should list jpg|png etc in the body of the rule, not in a Condition
# Stop Hotlinking
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)example.com/.*$ [NC]
RewriteRule \.(gif|jpg|jpeg|bmp|zip|rar|mp3|flv|swf|xml|php|png|css|pdf)$ - [F]

Wait, are you saying there is a mistake in my #Stop Hotlinking? Or are you simply pointing out a general tip?

#Redirect index.php
RewriteCond %{THE_REQUEST} [A-Z]{3,9}\ /(([^/]+/)*)index\.(html|php)
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]

Originally I had:

#Redirect index.php
RewriteCond %{THE_REQUEST} ^.*/index.php [NC]
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

In your URL you have /%1 instead of /$1, is the "percent sign" a typo?
And can you explain the difference in the two versions? Is your version more efficient, therefore the website will load faster?
In your version, should I put [NC] at the end of RewriteCond, like this?:

RewriteCond %{THE_REQUEST} [A-Z]{3,9}\ /(([^/]+/)*)index\.(html|php) [NC]

#Domain-name canonicalization redirect
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Notice in the version below, the RewriteRule is slightly different. Are (.*) and ^(.*)$ interchangeable?

#Domain-name canonicalization redirect by
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

The trailing-slash redirect should come immediately before the WordPress section.
domain-name canonicalization redirect is always the very last of your redirects

Wait, is #Original Wordpress section considered a "redirect"? Therefore anything with "RewriteRule" are considered "redirect", correct?

Is it true that #Original Wordpress section takes Spot 4 because it's an internal rewrite? But according to your suggestion above, are you suggesting #Domain-name canonicalization to take Spot 4?

Also where would I put
RewriteRule ^forbidden\.html - [L]

And is this order correct?

#Redirect index.php
#Domain-name canonicalization redirect
#Force Trailing Slash
#Original Wordpress section

lucy24

8:37 pm on Nov 2, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Personally I dislike the slash but so many SEO tutorials suggested the slash

Heck, then don't use it. Most "SEO tutorials" aren't worth the paper they're printed on. And, since most of them don't exist in print, that means...

:: memo to self: ask appropriate WebmasterWorld member if the foregoing is a reasonable thing to say::
The links without file extension make browsers think about what the server returns. The server will first check if any such link exists. Then it will return the respective output. It can be HTML, text, image, a file or even a redirect. Hence browsers also have to check what is returned. This all increases the processing time and overall HTTP requests. Hence there a lag caused."

Is this true?

No, it is nonsense. Scratch that site off your Trusted Information list. The server does not have to check anything, beyond the single "Does this file exist?" that is an integral part of every non-403 request ever. Your own htaccess/config will rewrite to supply the appropriate extension. What the browser does is its own lookout, but it has absolutely nothing to do with server resources. The browser's own processing time is so short compared to everything else that the user will not even notice. Possibly it was an issue for people using MSIE 3.

Is this code mandatory? I didn't make a 403 page.

If WP itself handles 403s appropriately, then you don't need a rule. But you should ask next door in the WP forum; I don't really know how WP-internal stuff works. Make sure WP's 403 page displays as intended in both of these situations:
-- a 403 created by something other than mod_rewrite, such as saying "Deny from {your own IP}"
-- a 403 created by mod_rewrite, such as denying requests for admin or login pages

are you saying there is a mistake in my #Stop Hotlinking? Or are you simply pointing out a general tip?

General tip. WebmasterWorld's software makes it a little hard to see what you're replying to. Especially when you're composing offline.

RewriteCond %{THE_REQUEST} [A-Z]{3,9} /(([^/]+/)*)index\.(html|php)
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]

Originally I had:

#Redirect index.php
RewriteCond %{THE_REQUEST} ^.*/index.php [NC]
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

In your URL you have /%1 instead of /$1, is the "percent sign" a typo?
And can you explain the difference in the two versions? Is your version more efficient, therefore the website will load faster?

No, it's not a typo. A % percent sign means the capture came from the final Condition, while a $ means it came from the body of the rule. I figured that as long as you have to have a condition, you might as well do the capturing there instead. The act of capturing uses up server resources, so why bother to do it when the request won't end up using the capture?

A server essentially operates in one dimension. It can't look ahead to the end of the rule and say "Oh, this one's about 'index.html' requests so it doesn't apply here." Think of it like moving along a piece of varicolored string. "Take such-and-such action if the next color after blue is red" but you can't unroll the ball and see what's coming up; you don't know until you get there.
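
To make the % versus $ distinction concrete, here is a minimal sketch. The /archive/ and /library/ paths in the second rule are invented for illustration, and the space in the Condition is escaped with a backslash because Apache otherwise treats a space as an argument separator.

# %1 = first capture from the RewriteCond (here: the directory path, if any)
RewriteCond %{THE_REQUEST} [A-Z]{3,9}\ /(([^/]+/)*)index\.(html|php)
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]

# $1 = first capture from the RewriteRule pattern itself (hypothetical paths)
RewriteRule ^archive/(.+)$ http://www.example.com/library/$1 [R=301,L]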

In your version, should I put [NC] at the end of RewriteCond, like this?:
RewriteCond %{THE_REQUEST} [A-Z]{3,9} /(([^/]+/)*)index\.(html|php) [NC]

No. If someone asks for InDex.HtMl they deserve a 404. The initial [A-Z] part is because that's how a request is structured. Most of the time it will say GET, but it could conceivably say HEAD. (Also a bunch of other things like POST or PUT that are dealt with in other places.)

Are (.*) and ^(.*)$ interchangeable?

By default, a Regular Expression will start as soon as it can, and go on for as long as it can. So anchors are only necessary if you're naming some specific content. (.*) and ^(.*)$ both mean "everything you see".

is #Original Wordpress section considered a "redirect"? Therefore anything with "RewriteRule" are considered "redirect", correct?

No, let's spell this out. Long version, using what linguists call double markedness:
Redirect = external redirect = send a message back to the browser, telling them to make a fresh request
Rewrite = internal rewrite = stuff happens inside the server that the browser doesn't know about
In mod_rewrite, a Redirect is anything that has the [R] flag (with or without 3xx number) and/or target starting in full protocol-plus-domain. Anything with the [L] flag alone is a rewrite. Technically anything with no flag at all is also a rewrite, but normally you won't see those.
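
A minimal side-by-side sketch of the two kinds of rule. The file names are made up for illustration.

# External redirect: [R] flag plus full protocol-and-domain target.
# The browser receives a 301 and makes a fresh request for the new URL.
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]

# Internal rewrite: [L] flag alone, target is a local path.
# The browser's address bar keeps the URL it originally asked for.
RewriteRule ^pretty-name$ /real-script.php [L]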

Is it true that #Original Wordpress section takes Spot 4 because it's an internal rewrite? But according to your suggestion above, are you suggesting #Domain-name canonicalization to take Spot 4?

Well, you can't expect me to keep track of numbers if they're not included in the htaccess file ;) The overall ordering goes:
-- anything with [F] flag
-- anything with [R] flag
-- anything with [L] flag alone
The WP package involves [L] only, so it goes after all rules with [R] flag.
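
As a rough skeleton, the F-R-L ordering looks something like this. The /private/ and old.html rules are placeholders invented to fill the slots; substitute your own rules.

RewriteEngine On

# 1. Anything with [F] flag: lockouts come first (hypothetical rule)
RewriteRule ^private/ - [F]

# 2. Anything with [R] flag: external redirects (hypothetical rule),
#    with domain-name canonicalization as the last of them
RewriteRule ^old\.html$ http://www.example.com/new.html [R=301,L]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 3. Anything with [L] flag alone: internal rewrites (the WP package goes here)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]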

Also where would I put
RewriteRule ^forbidden\.html - [L]

This rule is an exception to the normal F-R-L ordering. It has to go before any other rules, because it means "Go ahead and serve up the 403 page even if all other requests from this client are blocked". Note that "forbidden.html" here is a stand-in for whatever your actual 403 page is called. If you don't have a physical page and it's all handled in WP, you do not need this rule. Ask about this in the WP forum. In fact, if I remember I will go ask someone myself.

is this order correct?

#Redirect index.php
#Domain-name canonicalization redirect
#Force Trailing Slash
#Original Wordpress section

The trailing-slash redirect should go before the domain name. It can be either before or after index.(php|html). Domain-name-canonicalization is always the very last external redirect. Note again that the index redirect should say (php|html) even if "index.html" never actually occurs because search engines will ask for this page. It's better to grab these requests in htaccess rather than put WP to the extra work of dealing with them.

:: now off to ask about WP handling of 403 ::

leko

9:48 pm on Nov 2, 2014 (gmt 0)




No, it is nonsense. Scratch that site off your Trusted Information list. The server does not have to check anything, beyond the single "Does this file exist?"

Thanks for asking on my behalf! Yay, now I can get rid of this ugly trailing slash!

Make sure WP's 403 page displays as intended in both of these situations:
-- a 403 created by something other than mod_rewrite, such as saying "Deny from {your own IP}"
-- a 403 created by mod_rewrite, such as denying requests for admin or login pages

Forget what I said about my WordPress 403 page - I was wrong. What I should have said is: Because I do NOT have a 403 page, WordPress actually delivers a 404 page. Every WordPress theme usually comes with a 404 page called 404.php
Now that I have cleared that up, I have to ask again, is it mandatory to have a 403 page? In other words, is it okay to be a lazy bum and use a 404 page? If no, why is it mandatory?

If 403 pages are mandatory... Are you saying that I need a 403 page for stuff regarding <file> ? And another 403 page for everything after RewriteEngine On ?

In order to accomplish two different 403 pages dependent on the error, would I create two 403 pages?

403-deny-IP.php
403-deny-login.php

and would I create two RewriteRule? And would I put these two RewriteRule immediately BELOW RewriteBase / and also BEFORE any other rules?

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# 403 Pages
RewriteRule ^403-deny-IP\.php - [L]
RewriteRule ^403-deny-login\.php - [L]

# Only allow 88.888.88.888 IP to access /wp-login.php or /wp-admin/
RewriteCond %{THE_REQUEST} /(wp-login\.php|wp-admin/) [NC]
RewriteCond %{REMOTE_ADDR} !^88\.888\.88\.888$
RewriteRule ^(wp-login\.php|wp-admin/) - [F]

But how does the website/htaccess/whatever know which 403 page to deliver? It seems like the 403 RewriteRule is sitting in their own little island (in other words, there is space above and below to indicate separate rules so therefore they are on an "island"). For example, does it work like php conditional statements:

If (hungry) {
eat sandwich
}

It seems like the 403 RewriteRule are missing the If (hungry) part, or is this just how htaccess works?

In your version, should I put [NC] at the end of RewriteCond, like this?:
RewriteCond %{THE_REQUEST} [A-Z]{3,9} /(([^/]+/)*)index\.(html|php) [NC]

No. If someone asks for InDex.HtMl they deserve a 404.

Hmm what if someone enjoys using capslock? Because of capslock fanatics, would you use [NC] ?

lucy24

10:19 pm on Nov 2, 2014 (gmt 0)




Capslock fanatics deserve everything they get ;) Can't do it myself, because I use a bilingual keyboard that switches to a different script when you put down Caps Lock. So the only way I can SCREAM IN ALL CAPS is by physically holding down the Shift key. But honestly, why would you ever want to encourage them? If you wanted to be really evil, you could make a rule that says

RewriteRule ^[A-Z./-]+$ http://www.example.com/capslock.html [R=301,L]


and then the "capslock.html" page would be written in ALL CAPS and say something like "TURN OFF YOUR ### CAPS LOCK BECAUSE I CAN'T HEAR MYSELF THINK!"

Don't mind me, I'm just blathering.

Note that this digression offered a redirect, not a bona fide error document. If you're dealing in physical pages, there can only be one error document per error type. One 403 page, one 404 page and so on.

Exception: you can set different ErrorDocument statements in different directories. In htaccess this is done by making an extra htaccess file, containing only the ErrorDocument directive, and putting it in the directory that you want to behave differently. I've never done this with 403, though I do have one directory with its own 404/410 page.

It's often appropriate to use the same physical page for both 404 and 410 errors, since your average human doesn't much care whether a given page used to exist or not. They only care that it ain't there now.
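
A minimal sketch of both ideas, assuming a physical page called /missing.html and a subdirectory called /gallery/ (both names are hypothetical). Note that these are two separate files.

# In the root .htaccess: one physical page serves both 404 and 410
ErrorDocument 404 /missing.html
ErrorDocument 410 /missing.html

# In /gallery/.htaccess, containing nothing else: this directory gets its own page
ErrorDocument 404 /gallery/missing.html
ErrorDocument 410 /gallery/missing.html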

403 pages are a whole nother issue-- one that comes up occasionally in unrelated threads and garners a range of opinions. What follows is mine. (Duh. Who else's would it be?)

-- 403 pages are for humans. The server sends the page out-- if it can-- on every 403 response. But it's the rare robot that bothers to read past the "403" response header. (To be safe: If you do have custom error documents, include a "noindex" meta in the <head> section of each.) Some people do avoid putting any links in a 403 page so as not to provide the robot with more information. My opinion is that the robot isn't getting in anywhere anyway, while a human might appreciate the pointer.

-- a human getting a 403 response does not know they've done something wrong. Their only offense might be living in China, or sharing an IP address with an infected browser. So pages that say "Get out of my sight, you horrible Ukrainian robot" aren't appropriate.

-- a human may not even know that they've been categorically blocked. The most likely reason for a human to meet a 403 is if they're backtracking along an URLpath and inadvertently request a /directory/ that happens not to have an index file. In fact, before I had a www site of my own, I thought of 404 and 403 as "no file" and "no directory" respectively.


It may help to understand that error documents of any kind are actually a special kind of rewrite, except that they don't use mod_rewrite. If you've ever got a "page not found" message, you know that your browser's address bar still shows the name of the URL you requested. (If it shows the actual URL of the 404 page, like example.com/missing.html, the site has made a technical error.) The server notes the error, checks whether it's got an ErrorDocument directive matching that number, and sends back the appropriate page. This request for an error document is processed just like any other request: the original human's IP is still attached, the referer is still attached, the User-Agent is still attached, and so on. So if the original request came from, say, a blocked IP, the follow-up internal request for the 403 page will also be blocked-- unless you've got a rule that exempts requests for the 403 page. And, since more than one mod can issue a 403, you need a separate exemption in each module's own language.

leko

12:39 am on Nov 3, 2014 (gmt 0)




RewriteRule ^forbidden\.html - [L]

Tutorials use "ErrorDocument" instead of "RewriteRule" - are they interchangeable? For example:

ErrorDocument 403 http://www.example.com/error-403/

It's often appropriate to use the same physical page for both 404 and 410 errors, since your average human doesn't much care whether

403 pages are a whole nother issue

403 pages are for humans.

I don't understand.
Line 1 you say most humans don't know the difference between 404 and 410, so you can use same error page. Understood.

But, Line 2 you say 403 is not the same logic as Line 1, but is contradicted by Line 3. How can 403 be both an error page for humans and a "whole nother issue"...403 page is simply to notify the human that there is an error, that's all right? Am I missing something?

But, Line 3 you say 403 pages are for humans. Okay, but if it is for humans, why can't we use the same logic as Line 1?

But it's the rare robot that bothers to read past the "403" response header.

Hmm so 403 pages are for both humans and robots? By robots, do you mean googlebot and bingbot etc?
Serving a 404 page for a 403 error would confuse a robot, is this what you are saying ?


How My Website Displays 403 Error currently

I force a 403 error by going to www.example.com/wp-includes/ (I replaced "example" with my real domain).
I am redirected to www.example.com/403.shtml
Notice the 403 in the URL. It displays a 404 page because my theme has a 404.php file

According to a tutorial:

"Earlier I had written an article on custom error pages for Apache but doing the same with WordPress is not that straightforward. WordPress has the ability to handle 404s internally but doing the same for other 4xx errors requires modifying the code."

What is he saying? But my website is displaying 403 except by using 404.php

I'm not sure what my end goal is. Am I creating a custom 403 page?


My Interpretation of Lucy's Instructions for 403 Error

1. Create a 403.php and paste in WordPress theme folder next to 404.php
2. Paste #403 Page RewriteRule before any rules such as

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# 403 Page
RewriteRule ^403.php - [L]

But I still see 404 page when I access www.example.com/wp-includes/


I googled for "custom 403 error page" tutorials

One tutorial suggested:
1. Create a page in WordPress
2. Name the page whatever such as "You are not allowed here"
3. Name the URL slug whatever such as www.example.com/access-denied
4. Go to .htaccess file and paste "ErrorDocument" line under WordPress section (see below)

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

ErrorDocument 403 http://example.com/access-denied

5. Go to www.example.com/wp-includes/ to test the 403 page
6. Finished

But I still see a 404 page when I access www.example.com/wp-includes/

I understand you may not be familiar with WordPress, but am I making a mistake here?

I'm still not clear, what are the consequences of simply using 404.php to serve a 403 error ? Am I going to confuse The Humans or The Robots (google, yahoo, bing robots) ?

[edited by: phranque at 1:19 am (utc) on Nov 3, 2014]
[edit reason] exemplified domain [/edit]

lucy24

3:10 am on Nov 3, 2014 (gmt 0)




Got a nasty feeling phranque will have answered the original question by the time I finish typing, but oh well.

RewriteRule ^forbidden\.html - [L]

Tutorials use "ErrorDocument" instead of "RewriteRule" - are they interchangeable? For example:
ErrorDocument 403 http://www.example.com/error-403/ 

No, you've misunderstood, these are entirely different rules for different actions.

The ErrorDocument directive tells the server what document to send out along with the 403 response. If there is no such line, the server generates its own message-- not very helpful to humans.

The RewriteRule is an instruction about what to do when there is a request for the document called, for example, "forbidden.html". It needs a special rule because otherwise requests for the 403 page will be blocked right along with all other requests, and then the server will go into an infinite loop. It goes like this:

-- unwanted visitor asks for some page
-- server finds rule saying not to let them in
-- server looks for the custom 403 page to send out instead
-- server internally requests 403 page
-- this request passes through all the same rules as the original page request
-- server finds rule saying "don't admit this person"
-- server internally requests 403 page to go with 403 response
-- this request passes through et cetera, typically for 30 iterations before server gives up

You can see this in logs if you have a correctly worded ErrorDocument directive without poking a hole for the named document. There will be 30 iterations of the identical line.

If you have an incorrectly worded directive, you will instead end up getting an error message from the browser saying "This request is going nowhere fast". And that is the case with your quoted ErrorDocument line. It contains a hideous error, turning all 403 responses into 302 redirects. I hope this wasn't posted in some public place where other people will copy it.

The ErrorDocument line should look like this:
ErrorDocument 403 /forbidden.html

or, if all your error documents are collected in a subdirectory,
ErrorDocument 403 /boilerplate/forbidden.html

Note that each one begins with a leading / representing your root. No protocol, no domain name! This is an instruction to the server; no human will ever see this URL, and it will not be sent out to a browser. (The page goes out. The URL doesn't.)

Put the line near the top of your htaccess so it's easy to find. This is purely for your own sanity; there is no effect on execution.


I asked moderator not2easy about the WordPress 403 issue, because she knows more about WP than I do. Admittedly it would be hard to know less, but never mind that.

Apparently WordPress doesn't have a solid, built-in means of handling 403 responses. (Because of the nature of a CMS, it is essential that they have a way to handle 404s.) This means that you are probably better off building your own 403 page, giving it a name and parking it on the server. Once this page physically exists, it will automatically be exempt from all WordPress activity because of WP's inherent !-f rule. But you still need a RewriteRule with [L] flag so the page can be sent out at all.

Each module that issues 403s needs a separate exemption to cover requests for the error document. In the present discussion we've been talking about 403s that were originally issued by mod_rewrite, such as requests for the admin page* from anyone other than you. 403s issued by other mods need a different type of exception. For example a lockout from mod_authzzzz ("Deny from...") gets a piece like this:
<Files "forbidden.html">
Order Deny,Allow
Allow from all
</Files>

See above about each module issuing its own 403s. Or was that in some other thread? I forget...
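
Putting the pieces together, a sketch of the whole 403 setup might look like the following. It assumes the physical page is called forbidden.html and sits in the root, and it uses the same Apache 2.2-style Order/Allow syntax as the rest of this thread.

# Tell the server which page to send with a 403 response (local path, no domain)
ErrorDocument 403 /forbidden.html

# Exemption for 403s issued by "Deny from..." lockouts
<Files "forbidden.html">
Order Deny,Allow
Allow from all
</Files>

RewriteEngine On

# Exemption for 403s issued by mod_rewrite: goes before all other rules
RewriteRule ^forbidden\.html - [L]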


* Edit: This specific situation would not trigger an infinite loop, because the requested page is what leads to the 403. A request for any other page is fine. But you can't rely on this all the time; most lockouts involve other elements such as user-agent or referer, and those don't change.

leko

4:31 am on Nov 3, 2014 (gmt 0)




Lucy I appreciate your help. Unfortunately, I'm still confused. But I think multiple choice answers to these questions will help me better understand.

Just like how you can jay walk if no one is looking, you are allowed to display a 403 error using 404.php . It's frowned upon but whatever.
(a) True
(b) False

It is mandatory to set up a 403 error page because
(a) htaccess/server will be angry
(b) human will be angry
(c) robot will be angry (googlebot, bingbot, etc)
(d) something about triggering an infinite loop

Breaking Down Lucy's Instructions:

Build your own 403 page, give it a name and park it on the server.

Let's see if I understand...

(1) Open text editor
(2) Type inside the text editor: Error 403 - Access Denied
(3) Save as: forbidden.html
(4) What does it mean to "park it on the server"? Save it in my theme's folder (the same location as my theme's php files and 404.php) ? Or save it in public_html (the same location as .htaccess and wp-config.php) ?

Once this page physically exists, it will automatically be exempt from all WordPress activity because of WP's inherent !-f rule.
But you still need a RewriteRule with [L] flag so the page can be sent out at all.

(5) Okay, so here you are saying to use this: RewriteRule ^forbidden\.html - [L]
and paste before any other rules like this:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# 403 Page
RewriteRule ^forbidden\.html - [L]

Each module that issues 403s needs a separate exemption to cover requests for the error document.
<Files "forbidden.html">
Order Deny,Allow
Allow from all
</Files>

(6) Okay so is this the correct order of how I would combine the code so that it would cover the two 403 scenarios?

<Files "forbidden.html">
Order Deny,Allow
Allow from all
</Files>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# 403 Page
RewriteRule ^forbidden\.html - [L]

lucy24

4:46 pm on Nov 3, 2014 (gmt 0)




I'm not ignoring you and neither is anyone else. We've just hit a bad time period (first Monday of the month).

leko

5:55 pm on Nov 3, 2014 (gmt 0)




Everybody put your pencils down! Multiple choice Quiz is cancelled!

All along I was testing www.example.com/wp-includes/ which was showing 404 instead of 403 page
But this time I tested all the other stuff in my .htaccess
such as www.example.com/readme.html and www.example.com/wp-config.php
and there it was - a default 403 page!

I am on shared hosting and I later noticed on cPanel there is an area to customize error pages. I created custom 403 page and now www.example.com/wp-includes/ will show a 403 page instead of 404

Phew. Glad that's over with.

Lucy thanks for your help ...and your humor. Phranque thanks for your help too!

leko

7:36 pm on Nov 3, 2014 (gmt 0)




One question.

Is this correct? It is the original
#Redirect index
except I replaced html with (html|php)

#Redirect index
RewriteCond %{THE_REQUEST} ^.*/index\.(html|php) [NC]
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]



I have to use the above because the below triggered a 500 Internal Server Error for my Shared hosting account. On an Enterprise hosting account (basically a more expensive shared plan), the below code worked fine, so I have to use the above. The above does not trigger error, but is it correct?

#Redirect index by Lucy
RewriteCond %{THE_REQUEST} [A-Z]{3,9} /(([^/]+/)*)index\.(html|php)
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]

leko

7:38 pm on Nov 3, 2014 (gmt 0)




oops I forgot to replace the second html. Is this correct:

#Redirect index
RewriteCond %{THE_REQUEST} ^.*/index\.(html|php) [NC]
RewriteRule ^(.*)index.(html|php)$ http://www.example.com/$1 [R=301,L]

not2easy

8:48 pm on Nov 3, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



All along I was testing www.example.com/wp-includes/ which was showing 404 instead of 403 page
But this time I tested all the other stuff in my .htaccess
such as www.example.com/readme.html and www.example.com/wp-config.php
It looks like your WordPress install is in the root directory.

Some easy Yes/No questions to try to simplify:
Your site uses Wordpress for everything (home page, posts, articles, categories, etc)?
WP is installed in your root directory?
Does your site use any .html pages?
(If not, there's no reason to tell the server to look for them.)

#Redirect index
RewriteCond %{THE_REQUEST} ^.*/index\.(html|php) [NC]
RewriteRule ^(.*)index.(html|php)$ http://www.example.com/$1 [R=301,L]

If there is no index.php or index.html in your root directory, you don't need this rule, it is handled inside the WP envelope.

Don't enclose everything in your htaccess file inside the WP envelope, that is the part that *might* be overwritten with a major update. Do not edit the section that begins with:
# BEGIN WordPress
<IfModule mod_rewrite.c>

and ends with
</IfModule>
# END WordPress

That part is generated by WordPress and it is based on the settings you entered in config and during install and in the Settings file. Whatever else you want in your htaccess file goes before this WP section.
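
In outline, the layout being described is something like this sketch. The comments mark where your own rules would go; the WP section is the stock block quoted at the top of this thread, left exactly as WordPress wrote it.

# Your own directives and rules: everything up here is yours to edit
Options -Indexes
RewriteEngine On
# (lockouts first, then external redirects, go here)

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

Having RewriteEngine On both above the envelope and inside it is harmless, so there is no need to remove the line WordPress generated.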

Use the www rewrite to keep WP from needing to process every request.

and there it was - a default 403 page!
Yes, the server has a default 403 - almost every host gives you some basic error pages like 404, 403, 500. The whole point of creating your own to replace those (and telling the server where they are) is to create a better, more useful user experience. A standard server 404 page does not help anyone find what they came to your site for. A 404 page with a smile helps them with the most likely links to get back to what they wanted to do, instead of closing the browser and going back to search somewhere else. The same goes for a custom 403 that lets people know that sometimes mistakes are made, apologizes for the inconvenience, and maybe offers a way to contact you about the problem.

lucy24

11:46 pm on Nov 3, 2014 (gmt 0)




the below triggered a 500 Internal Server Error for my Shared hosting account.
<snip>
RewriteCond %{THE_REQUEST} [A-Z]{3,9} /(([^/]+/)*)index\.(html|php)
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]

Oh, ###, I'm so sorry. I've just spent five minutes staring at this before the error finally jumped up and bit me. Literal spaces have to be escaped. To make it work, the Condition has to say:
[A-Z]{3,9}\ /(([^/]+/)*)index\.(html|php)

See the "\ " (backslash followed by space)? This is a special rule for Apache because spaces have syntactic meaning. In some situations you can instead enclose the material in quotation marks. But here you need to escape the space.
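
For comparison, a sketch of the two spellings, shown on a deliberately simplified pattern that only illustrates the syntax. Use one or the other, followed by the RewriteRule it belongs to.

# Literal space escaped with a backslash
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php

# Same pattern with the whole argument enclosed in quotation marks instead
RewriteCond %{THE_REQUEST} "^[A-Z]{3,9} /index\.php"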

RewriteCond %{THE_REQUEST} ^.*/index\.(html|php) [NC]
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

It is never, ever necessary or appropriate to say ^.* when you're not capturing. It means "there might be some stuff after the beginning and before the specific text I'm naming". Just leave off the anchor and the .* so you can proceed directly to /index\.


I'm going to ask phranque to edit my earlier post. This may cause confusion, but it's better than people coming along later and copying-and-pasting code with an error.

leko

11:52 pm on Nov 3, 2014 (gmt 0)




Thanks not2easy. I have additional questions.

Your site uses Wordpress for everything (home page, posts, articles, categories, etc)?

Yes.

WP is installed in your root directory?

I think so. Here is the path to my theme:
Go to cPanel > Files Manager > public_html > wp-content folder > themes folder > TwentyFourteen Theme folder

Does your site use any .html pages?

No. All php files and one css file. index.php single.php archive.php search.php style.css etc.

#Redirect index
RewriteCond %{THE_REQUEST} ^.*/index\.(html|php) [NC]
RewriteRule ^(.*)index.(html|php)$ http://www.example.com/$1 [R=301,L]

If there is no index.php or index.html in your root directory, you don't need this rule, it is handled inside the WP envelope.


There is an index.php within public_html (in other words index.php lives within the same folder as .htaccess readme.html wp-content folder for example). This is what's inside of this index.php

<?php
/**
* Front to the WordPress application. This file doesn't do anything, but loads
* wp-blog-header.php which does and tells WordPress to load the theme.
*
* @package WordPress
*/

/**
* Tells WordPress to load the WordPress theme and output it.
*
* @var bool
*/
define('WP_USE_THEMES', true);

/** Loads the WordPress Environment and Template */
require( dirname( __FILE__ ) . '/wp-blog-header.php' );


But are you referring to index.php files that every WordPress theme has? I am confused because according to SEO tutorials, this rule is to redirect the index.php that lives within the theme folder such as within the TwentyFourteen theme.
SEO tutorials says googlebot considers the following URLs as different URLs, therefore can result in duplicate content
and therefore you should use .htaccess to redirect to your desired URL structure

Googlebot considers these as different URLs:
www.example.com
www.example.com/
www.example.com/index.php
example.com
example.com/
example.com/index.php

So is www.example.com/index.php "handled inside the WP envelope" ? Or will I need to keep the RewriteRule above (regarding #Redirect index)

Use the www rewrite to keep WP from needing to process every request.

What do you mean? Are you simply saying to keep the following in my .htaccess

# Domain-name canonicalization redirect
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Don't enclose everything in your htaccess file inside the WP envelope, that is the part that *might* be overwritten with a major update.

Is the following correct? Notice that RewriteEngine On used to belong within the WordPress envelope but I removed it and pasted it before all the rules.

Options -Indexes

<files wp-config.php>
Order allow,deny
Deny from all
</files>

<files error_log>
Order allow,deny
Deny from all
</files>

<files readme.html>
Order allow,deny
Deny from all
</files>

<files license.txt>
Order allow,deny
Deny from all
</files>

RewriteEngine On
RewriteBase /

# Limit Login to my IP
RewriteCond %{THE_REQUEST} /(wp-login\.php|wp-admin/) [NC]
RewriteCond %{REMOTE_ADDR} !^88\.888\.88\.888$
RewriteRule ^(wp-login\.php|wp-admin/) - [F]

# Stop Hotlinking
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)example.com/.*$ [NC]
RewriteRule \.(gif|jpg|jpeg|bmp|zip|rar|mp3|flv|swf|xml|php|png|css|pdf)$ - [F]

# Redirect index.html and index.php
RewriteCond %{THE_REQUEST} ^.*/index\.(html|php) [NC]
RewriteRule ^(.*)index.(html|php)$ http://www.example.com/$1 [R=301,L]

# Domain-name canonicalization redirect
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

leko

11:58 pm on Nov 3, 2014 (gmt 0)




Thanks Lucy. Your updated version works now:

# Redirect index.php and index.html
RewriteCond %{THE_REQUEST} [A-Z]{3,9}\ /(([^/]+/)*)index\.(html|php)
RewriteRule index\.(html|php) http://www.example.com/%1 [R=301,L]

lucy24

3:51 am on Nov 4, 2014 (gmt 0)




Googlebot considers these as different URLs:
www.example.com
www.example.com/

Yikes. Where do they get this stuff? It's true that www.example.com and example.com are different URLs, though it's easy to canonicalize. You can even set a preference in WMT if for some reason it's out of your power to redirect. But the trailing slash right after the domain name is supplied by the browser, so there is no possibility of the two forms providing different content.

These three URLs are all different though:
example.com/directory/index.php
example.com/directory/
example.com/directory
In a hand-rolled HTML site this is not a problem, because an index redirect takes care of the first while mod_dir automatically deals with the third. In WordPress (or any CMS) you have to make sure that only #2 OR only #3 is valid, while the other redirects.
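
If you settle on the no-slash form (#3), a sketch of the redirect might be the following. It rests on an assumption: the WordPress permalink structure must also be set without a trailing slash, otherwise WP's own canonical redirect and this rule could bounce requests back and forth.

# Strip a trailing slash unless the request is for a real directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]

Like the other external redirects, this would sit before the domain-name canonicalization redirect.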