Forum Moderators: phranque
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} !^/\.well-known/pki-validation/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteRule ^oldFolder\/stupid-long-file-name\.html$ "http\:\/\/example\.com\/new\-folder1\/stupid-long-file-name\.html" [R=301,L]
# Google Analytics Integration - Added by cPanel.
<IfModule mod_substitute.c>
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|(<script type='text/javascript' src='/google_analytics_auto.js'></script>)?</head>|<script src='/google_analytics_auto.js'></script></head>|i"
</IfModule>
# END Google Analytics Integration
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com$
Is there an htaccess tutorial somewhere out there that puts this into every rule's boilerplate? Unless you have multiple sites sharing the same htaccess, you never need conditions of this type. And even if you do have a shared-htaccess setup, you wouldn't use this wording.
RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} !^/\.well-known/pki-validation/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
What are these Conditions for? The body of the rule has already named the specific URL that the rule applies to.
RewriteRule ^oldFolder\/stupid-long-file-name\.html$ "http\:\/\/example\.com\/new\-folder1\/stupid-long-file-name\.html" [R=301,L]
Yup, there's a tutorial out there somewhere. Can we all go around saying nasty things about it so it goes out of business?
# Google Analytics Integration - Added by cPanel.
Do you in fact use GA? <IfModule> envelopes are my particular bugaboo, because right away you know it's a generic rule put in by someone else. Either you've got the mod or you haven't. (My favorite is the WordPress boilerplate that looks for mod_rewrite. If you didn't have mod_rewrite, you wouldn't be able to run WP on an Apache system in the first place.)
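For what it's worth, here is roughly what that redirect could shrink to once the host and DCV conditions are gone and the needless backslashes are dropped. This is only a sketch, assuming this htaccess serves the one site and that the target really is http://example.com as in the original rule:

RewriteRule ^oldFolder/stupid-long-file-name\.html$ http://example.com/new-folder1/stupid-long-file-name.html [R=301,L]

Slashes, colons and hyphens don't need escaping in the pattern, and the target is not a regular expression at all, so the only backslash left is the one in front of the literal dot.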
Yup, there's a tutorial out there somewhere. Can we all go around saying nasty things about it so it goes out of business?
RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} !^/\.well-known/pki-validation/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
What are these Conditions for?
[edited by: whitespace at 10:28 pm (utc) on Sep 20, 2017]
...there are some things in the public HTML, and I'm not sure whether I put them there, or a search engine did when trying to index my site, or whether I placed them there so Google, Bing, etc. could verify that I own the site. It's messy.
[edited by: ztaco at 1:16 am (utc) on Sep 21, 2017]
Cleaning up an htaccess file
Step 1: Organize. Collect all the directives for each module in one place. The server doesn't care, but you-- and anyone who comes along after you-- will appreciate it.
Tip: Use a text editor with a "Find All" window to pull up all lines beginning with the element "Rewrite..." That takes care of mod_rewrite; dump them all at the end for now.
Step 2: Get rid of all <IfModule> envelopes. Not their contents, just the envelopes themselves. These envelopes are hallmarks of mass-produced htaccess files that have to work anywhere, on any server. You are now on your own site. Any given mod is either available to you or it isn't.
Exception: If you use a standard CMS such as WordPress, your htaccess file will contain a group of lines beginning and ending with #comments saying something like "begin WordPress" and "end WordPress". Leave everything in this package unchanged unless you know what you are doing.
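As an illustration, take the cPanel Google Analytics block from the top of this thread. Assuming mod_substitute really is loaded on your server (I can't know that from here), removing the envelope leaves just the contents:

# Google Analytics Integration - Added by cPanel.
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|(<script type='text/javascript' src='/google_analytics_auto.js'></script>)?</head>|<script src='/google_analytics_auto.js'></script></head>|i"
# END Google Analytics Integration

If a module turns out not to be available, the unknown directive will normally produce a 500 error as soon as you save the file, so you find out right away instead of having the rule silently ignored. A control panel may also regenerate its own blocks, so treat this one purely as an example.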
Step 3: Sort by module. The server doesn't care what order the directives are listed in, or even if rules from different modules are all garbled together. Each module works separately, seeing only its own directives. But humans need to be able to find things.
For most people it will be most practical to group one-liners at the beginning:
Options -Indexes
is a good start. If your htaccess file contains only one line, that's probably it. Other quick directives are ones starting with words like AddCharset or Expires. Then list your error documents.
If you have any very short Files or FilesMatch envelopes, put them near the top too. For example:
<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>
<FilesMatch "\.(css|js)">
Header set X-Robots-Tag "noindex"
</FilesMatch>
Be sure to have an "Allow from all" envelope for your custom 403 page. If you are on shared hosting and they provide default error-document names such as "forbidden.html", this has probably already been done in the config file. But it does no harm to repeat it.
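A rough sketch of how the top of the file might look after this step. The filename forbidden.html and the charset line are placeholders of mine, not anything taken from your site:

Options -Indexes
AddCharset UTF-8 .html

ErrorDocument 403 /forbidden.html

# make sure the 403 page itself can always be served
<Files "forbidden.html">
Order Allow,Deny
Allow from all
</Files>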
Step 4: Consolidate redirects.
Step 4a: Get rid of mod_alias. If your htaccess file contains any mod_rewrite directives, it can't use mod_alias (Redirect... by that name), or things may happen in the wrong order. For large-scale updating, use these Regular Expressions, changing \1 to $1 if that's what your text editor uses. Each of these can safely be run as an unsupervised global replace.
# change . to \. in pattern
^(Redirect \d\d\d \S+?[^\\])\.
TO
\1\\.
# now change Redirect to Rewrite
^Redirect(?:Match)? 301 /(.+)
TO
RewriteRule \1 [R=301,L]
# and if needed
^Redirect(?:Match)? 410 /(.+)
TO
RewriteRule \1 - [G]
^Redirect(?:Match)? 403 /(.+)
TO
RewriteRule \1 - [F]
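To see what the replaces do, here is one line traced through by hand. The paths are made up for illustration:

# original
Redirect 301 /old-page.html http://www.example.com/new-page.html
# after the first replace (dot escaped in the pattern)
Redirect 301 /old-page\.html http://www.example.com/new-page.html
# after the second replace
RewriteRule old-page\.html http://www.example.com/new-page.html [R=301,L]

The result is unanchored, and Redirect matched by prefix where a RewriteRule does not, so for a single page you would probably tighten it by hand afterward:

RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]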
Step 4b: Sort your RewriteRules. At the beginning is the single line
RewriteEngine on
A RewriteBase is almost never needed; get rid of any lines that mention it. Instead, make sure every target begins with either protocol-plus-domain or a slash / for the root.
Sort RewriteRules twice.
First group them by severity. Access-control rules (flag [F]) go first. Then any 410s (flag [G]). Not all sites will have these. Then external redirects (flag [R=301,L] unless there is a specific reason to say something different). Then simple rewrites (flag [L] alone). Finally, there may be a few rules without an [L] flag, such as ones that set cookies or environment variables.
Function overrides flag. If your redirects are so complicated that they've been exiled to a separate .php file, the RewriteRule will have only an [L] flag. But group it with the external redirects. If certain users are forcibly redirected to an "I don't like your face" page, the RewriteRule will have an R flag. But group it with the access-control [F] rules.
Then, within each functional group, list rules from most specific to most general. In most htaccess files, the second-to-last external redirect will take care of "index.html" requests. The very last one will fix the domain name, such as with/without www.
Leave a blank line after each RewriteRule, and put a
# comment
before each ruleset (Rule plus any preceding Conditions). A group of closely related rulesets can share an explanation.
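Put together, a sorted mod_rewrite section might be laid out something like this. It is a skeleton only: the user-agent string BadBot, the old-section and shop paths, shop.php, and the choice of http://example.com as the canonical name are all invented for the example.

RewriteEngine on

# let the 403 page through before any [F] rules
RewriteRule ^forbidden\.html - [L]

# access control
RewriteCond %{HTTP_USER_AGENT} BadBot
RewriteRule ^ - [F]

# pages that are gone for good
RewriteRule ^old-section/ - [G]

# specific redirects
RewriteRule ^oldFolder/stupid-long-file-name\.html$ http://example.com/new-folder1/stupid-long-file-name.html [R=301,L]

# index.html redirect
RewriteCond %{THE_REQUEST} index\.html
RewriteRule ^(.*/)?index\.html$ http://example.com/$1 [R=301,L]

# domain-name canonicalization, last among the redirects
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]

# internal rewrites
RewriteRule ^shop/([0-9]+)$ /shop.php?id=$1 [L]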
Step 5: Notes on error documents.
Reminder: ErrorDocument directives must not include a domain name, or else everything will turn into a 302 redirect. Start each one with a / representing the root.
Caution: Since each module is an island, any module that can issue a 403 must have its own error-document override. "Allow from all" in a <Files> envelope covers mod_authzzzz. If you have RewriteRules that end in [F], make sure your 403 documents can bypass these rules:
RewriteRule ^forbidden\.html - [L]
This rule must go before any rules with the [F] flag.
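To make the reminder about domain names concrete, here are the two forms side by side. The filename missing.html is a placeholder:

# wrong: a full URL makes the server send a redirect instead of the error status
# ErrorDocument 404 http://example.com/missing.html

# right: a local path starting at the root
ErrorDocument 404 /missing.html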
I have 3 domains pointing to the same DNS.
Not sure if I can post the site if you wanted to look at it and point out any horribleness,
RewriteCond %{HTTP_HOST} (parked1|parked2)
RewriteRule ^ - [L]
which essentially means "if the request is for either of my parked domains, don't apply any RewriteRules and just move on to the next mod". This would go at the beginning of the mod_rewrite section of your htaccess.
I changed the parked domains to be redirected to the original site
This seems contradictory. Is it parked, meaning you're not doing anything with it right now, or is it more like a typo domain where all requests for "exmaple.com" redirect to "example.com"? Typo domains are the easiest to code for, because sooner or later everything redirects to the right spelling. You don't even need any extra rules.
Boss=wife,
WifeHappy=happyLife
All right then, Boss says use suchandsuch name, that's the name you'll be using ;)
I haven't changed host names unless I'm confused, same host since beginning, same domain since beginning.
OK, add “host” to the words that have more than one meaning. (Another one I sometimes have problems with is “directory”.)
I do have a robots.text telling them not to index these folders, or I believe that's what I did.
You may have a robots.txt telling them not to crawl those folders. In the case of stylesheets and scripts, material that isn't crawled won't be indexed, simply because search engines don't know what's there. (Nobody is going to link to someone else's css, which is the way un-crawled content gets indexed.) BUT if you’ve been paying attention to the major search engines, you know that they really want to be able to see your scripts and stylesheets, since that's the main way they assess mobile-friendliness. So a better approach is to allow them to crawl, and then block indexing. I think I even have this in my htaccess-fixing boilerplate:
<FilesMatch "\.(css|js)">
Header set X-Robots-Tag "noindex"
</FilesMatch>
Put this in the main htaccess and it will trickle down to all files, no matter where they live. Headers beginning in X- are officially non-standard, additional or proprietary, but in the specific case of X-Robots-Tag, the major search engines know what it means.
CONTACT-US
contact.htML
ABOUT-US
about.html
OUR-SERVICES
services.html
Do you mean that “contact-us” and so on is the name of a directory that contains only the contact.html file? Or do you mean that “contact-us” is the linking text?
And how to stop the other domains from pointing to the site and still use them for email.
I think I said earlier--if not in the present thread, then in an adjoining one--that you do not need to worry about email. Your htaccess file is only concerned with HTTP requests; mail is completely unaffected.
RewriteRule ^contact/(index\.html)?$ https://www.example.com/contact/Contact.html [R=301,L]
and so on, one for each directory that behaves this way.
I really want to stop the other domains from going anywhere, and use them purely for email.