Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Need help with htaccess file
301 redirect htaccess file and trailing slash
LedZep



 
Msg#: 4574525 posted 9:50 pm on May 15, 2013 (gmt 0)

Hi,
I am new here and I really need help. I am creating a new site with OpenCart version 1.5.5.1, and I have tried creating my own htaccess file, but it is not doing what I want it to do. I have tried a lot of the code from this site, and my host said OpenCart is not built for doing this, which I don't believe. But they could be right, because it's not working for me.
I have a good SEO module, SEO Pack Pro, installed, but I don't think it is blocking the changes in the htaccess file.

The main problem is to eliminate duplicate URLs with 301 redirects and get a header code of 200 for the www URLs and 301 for the non-www ones. I am pretty sure that's how it is supposed to be.

So what I need is to start from scratch and make a new htaccess file:

So again: redirect all non-www URLs to www URLs with 301 redirects, not just the home page, and make sure there is a trailing slash after all the www URLs. I don't need a slash after the non-www URLs, because they are being redirected, right?
I also need the home page redirected from index.php, index.htm and index.html to http://www.example.com/

I also want a trailing slash after each directory and subdirectory; I think my site goes down 3 levels of subdirectories. Right now I don't care whether there is a trailing slash after the file name (with the file extension removed) or no slash after it. I have read the arguments on both sides, and I don't think it matters whether the file name has a trailing slash. Sorry for writing a book here; can anyone help me out with this?
Thanks,
Nicole

 

Dideved



 
Msg#: 4574525 posted 10:19 pm on May 15, 2013 (gmt 0)

I believe the Apache docs have examples for some of these exact scenarios.

https://httpd.apache.org/docs/2.4/rewrite/remapping.html

LedZep



 
Msg#: 4574525 posted 10:28 pm on May 15, 2013 (gmt 0)

Hi,
I have been on that site and I can't find what I need, unless I missed it, and I don't know whether I am grabbing the right code or not.
Thanks,
Nicole

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4574525 posted 11:25 pm on May 15, 2013 (gmt 0)

welcome to WebmasterWorld, Nicole!


why don't you start with the code you tried and describe what response you get vs what you were expecting for each test request you have tried.

make sure you read this before posting your mod_rewrite directives.
IMPORTANT: Please Use Example.com For Domain Names in Posts (Apache Web Server forum at WebmasterWorld):
http://www.webmasterworld.com/apache/4452736.htm

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4574525 posted 12:57 am on May 16, 2013 (gmt 0)

If you're using a CMS that comes with its own htaccess, you have to be Very Careful when making your own additions:

--rules you add have to be in the right place ("Gone" here, 301 redirects over there, and so on) and in the right order
--new rules can't conflict with existing rules, either by preventing a rule from working or by doing extra stuff to a request that was supposed to be finished

You can start with some basic assumptions.

One is that your CMS relies heavily on rewrites (flag [L] by itself). So any redirects you add have to come before these. But in the case of a with/without www redirect, it should go after any existing redirects. You have to use mod_rewrite, because it's already in use; don't mix in mod_alias ("Redirect" by that name).
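A minimal sketch of that ordering, assuming the www form is the preferred one and Apache 2.2 (as the host reports later in this thread). The index.php redirect and the OpenCart rewrites are copied from the htaccess posted further down; the empty-Host allowance in the canonicalization condition is an addition of this sketch, not part of the original file.

```apache
RewriteEngine On
RewriteBase /

# 0. Any access rules ending in [F] or [G] would go here, before all redirects

# 1. Specific redirects first (this one is from the posted htaccess)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

# 2. Domain-name canonicalization after the other redirects.
#    The "(...)?" also lets requests with no Host header through,
#    so HTTP/1.0 clients are not redirected in a loop.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 3. The CMS's internal rewrites (no R flag) come last
RewriteRule ^sitemap.xml$ index.php?route=feed/google_sitemap [L]
RewriteRule ^googlebase.xml$ index.php?route=feed/google_base [L]
RewriteRule ^download/(.*) /index.php?route=error/not_found [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !.*\.(ico|gif|jpg|jpeg|png|js|css)
RewriteRule ^([^?]*) index.php?_route_=$1 [L,QSA]
```

Because the index.php redirect names the www host explicitly, a non-www request for /index.php is fixed in a single 301 instead of passing through two.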

Second basic assumption is that your pre-installed htaccess is cluttered with unnecessary <If Mod... envelopes and clunkily worded rules that put the server to a lot of extra work. So if you're starting to edit it, this is a good chance to clean up the existing parts and end up with a lean mean htaccess that will work perfectly for your own site, rather than working borderline-adequately for all sites.

For the specific case of with/without www, there is also a copout solution. Your host probably has an option for preferred domain name format. This will result in some unneeded redirects, as it all happens before the request reaches your own htaccess. But in the short term it's one less thing to worry about.

Oh yes and... Unless you know for a fact that you're on Apache 2.4-- some hosts won't say-- use the information for 2.2. The update added lots of new features, but there were no significant deletions. So you can be tolerably sure any rule you make will work either way.

LedZep



 
Msg#: 4574525 posted 5:00 am on May 16, 2013 (gmt 0)

Hi phranque,
Here is my htaccess code below. This site is actually using OpenCart 1.5.1.1; I am making a new site using 1.5.5.1 and want to use the same code for that one also.

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} [Bb]aiduspider
RewriteRule .* - [R=403,L]
</IfModule>

Options +FollowSymlinks

# Prevent Directory listing
Options -Indexes

# Prevent Direct Access to files
<FilesMatch "(?i)((\.tpl|\.ini|\.log|(?<!robots)\.txt))">
Order deny,allow
# Deny from all
</FilesMatch>

# SEO URL Settings
RewriteEngine On
# If your opencart installation does not run in the main web folder, set RewriteBase to the folder it does run in, i.e. / becomes /shop/

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [L,R=301]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]


RewriteBase /
RewriteRule ^sitemap.xml$ index.php?route=feed/google_sitemap [L]
RewriteRule ^googlebase.xml$ index.php?route=feed/google_base [L]
RewriteRule ^download/(.*) /index.php?route=error/not_found [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !.*\.(ico|gif|jpg|jpeg|png|js|css)
RewriteRule ^([^?]*) index.php?_route_=$1 [L,QSA]


# Respond to /include/ with 404 instead of 403
RedirectMatch 404 ^/include(/?|/.*)$

# 1. If your cart only allows you to add one item at a time, it is possible register_globals is on. This may work to disable it:
# php_flag register_globals off

# 2. If your cart has magic quotes enabled, This may work to disable it:
# php_flag magic_quotes_gpc Off

# 3. Set max upload file size. Most hosts will limit this and not allow it to be overridden but you can try
# php_value upload_max_filesize 999M

# 4. set max post size. uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value post_max_size 999M

# 5. set max time script can take. uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value max_execution_time 200

# 6. set max time for input to be received. Uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value max_input_time 200

# 7. disable open_basedir limitations
# php_admin_value open_basedir none

===========================================

So on my computer, for my site using the code above, I do not see any trailing slash on http://www.example.com; only on my phone running Safari can I see the trailing slash.

I also want the trailing slash after the directory, and I don't see that at all, not even on my phone.

So the directory or category URL should look like this: http://example.com/directory/
but if you type in or click on a category, you get this, without the trailing slash: http://example.com/directory

When I put the above URL with the directory into my online duplication tool, I get this error message:

WWW/NonWWW Header Check: FAILED
Your site is not returning a 301 redirect from www to non-www or vice versa. This means that Google may cache both versions of your site, causing sitewide duplicate content penalties

I also use a header-check tool for the header status, and if I put this URL in:

This is weird: when I type http://www.example.com/directory into the online tool, I get a 404 error code, and now I am getting that on my site also. But if you click on that same directory on the home page, it brings you to the correct page with no errors.

For http://www.example.com/directory I get a 200 response code, which is good:
HTTP/1.1 301 Moved Permanently

So using this header response tool, if I type http://example.com/directory I get a 301 header code, which is good, but I need a 200 header code for http://www.example.com/directory

http://www.example.com/index.php redirects correctly to http://www.example.com, but with no slash,
and there is no redirection for index.htm and index.html on either the www or the non-www URL.

So I am using this online duplication test tool, and when I run:
http://www.example.com/index.php I get this error:
Your site is not returning a 301 redirect from www to non-www or vice versa. This means that Google may cache both versions of your site, causing sitewide duplicate content penalties

Next, when I click on a product on my home page, I think this is good: it goes to the product page, but it does not put the directory in the URL, so it looks like this:
http://www.example.com/product1 Is that the way it should be?

Using my duplication tool, I get this error message for http://www.example.com/product1:

Potential Duplicate Content Issues
WWW/NonWWW Header Check: FAILED
Your site is not returning a 301 redirect from www to non-www or vice versa. This means that Google may cache both versions of your site, causing sitewide duplicate content penalties

But when I put http://www.example.com/product1 into the header checker, I get this: Status: HTTP/1.1 200 OK

and if I use http://example.com/product1, I get this message: Status: HTTP/1.1 301 Moved Permanently

So that is good, getting the 200 and 301 header codes for the www and non-www URLs, but again, as above, the duplication tool gives me this for the product URL:
Potential Duplicate Content Issues
WWW/NonWWW Header Check: FAILED
Your site is not returning a 301 redirect from www to non-www or vice versa. This means that Google may cache both versions of your site, causing sitewide duplicate content penalties

Sorry if I am writing a book here; this is getting me more confused. Now I want to find this product by drilling down through the categories on the home page, so here is the URL to get to that same product:
http://www.example.com/directory1/directory2/product1

So according to what I read, Google treats this as duplicate content, because there are different URLs for the same product.

Using duplication tool:
WWW/NonWWW Header Check: FAILED
Your site is not returning a 301 redirect from www to non-www or vice versa. This means that Google may cache both versions of your site, causing site wide duplicate content penalties

Next, if you type http://example.com/product1 into the duplication tool, I get:
WWW/NonWWW Header Check: FAILED
Your site is not returning a 301 redirect from www to non-www or vice versa. This means that Google may cache both versions of your site, causing site wide duplicate content penalties

And another error: even though it says SUCCESS on the page below, there is a huge red exclamation mark, meaning failed.
http://example.com/product1
PageRank Dispersion Check: SUCCESS
You have different PageRanks for the non-www and www versions of your domain.

But I don't get an error for PageRank Dispersion if I type http://www.example.com/product1

So, not to go through everything again: I get the same errors when going to the same product through brands or any other URL that leads to it, so I think Google is seeing all of these as duplicates. There is no error message in Google Webmaster Tools, but one of our sites, which I am restarting all over again, is 14 years old. It was always on the first page of Google results until recently, and then it was nowhere to be found. I am assuming it was killed by Pandas and Penguins!

We do not want to make the same seo mistakes this time, so we really want to get this right so we can keep our jobs!

So, a new problem, maybe for another post: we deleted that site because the software was old, and created a new site. The old site had about 16,000 products, and a lot were discontinued or out of stock, so we decided to start over. We only have a few hundred products in now, but Google still has the 16,000 old products indexed, and they are gone. How do we let Google know they are gone? I am sure that is hurting us on the new site too. We can't do a disallow in robots.txt because we don't have the URLs anymore; any ideas what to do about that?

So if anybody can help me get this htaccess code right the way I need it, it would be very much appreciated.
Thanks,
Nicole

LedZep



 
Msg#: 4574525 posted 5:09 am on May 16, 2013 (gmt 0)

Hi lucy24, thanks for your info. I wasn't really familiar with htaccess code until the last few days, so I am hoping that someone can help me figure out the code I need to make it work. I just posted my current htaccess code. I checked my host's site, and they say they are using Apache 2.2.22.
Thanks,
Nicole

LedZep



 
Msg#: 4574525 posted 6:16 am on May 16, 2013 (gmt 0)

I just did another SEO test online, and it says:
Trailing slashes check failed
Extra URL trailing slashes "/" are not removed

I don't see any extra slashes, though?

lucy24




 
Msg#: 4574525 posted 8:17 am on May 16, 2013 (gmt 0)

Oh, lordy. I think you're trying to do too much at once and you will only make yourself miserable. Start by studying your existing htaccess. The one you had before you started tinkering with it. Be sure you understand everything that's already there. And then you can start fine-tuning it. Make sure the existing rules are exactly right before you start adding more.



RedirectMatch

This has to go away. Any rule that begins Redirect or RedirectMatch MUST be recast as a RewriteRule or things will happen in the wrong order.
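For example, the RedirectMatch 404 line from the posted htaccess could be recast along these lines. This is a sketch, assuming your mod_rewrite honors a non-3xx status in the R flag, as the mod_rewrite flag documentation describes (the substitution is dropped and rewriting stops as if [L] were set); if it does not, [F] (403) is the fallback. It belongs with the other access rules, ahead of the CMS rewrites.

```apache
RewriteEngine On

# Recast of: RedirectMatch 404 ^/include(/?|/.*)$
# In htaccess the pattern is matched without the leading slash, so this
# covers /include, /include/ and anything below it.
RewriteRule ^include(/.*)?$ - [R=404,L]
```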


<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} [Bb]aiduspider
RewriteRule .* - [R=403,L]
</IfModule>

<snip>

RewriteEngine On

<snip>

RewriteEngine On

:: wanders off sobbing brokenly ::

Yes, I think the RewriteEngine is now On. All those RewriteRules need to be gathered into one place and arranged in the right order. Then you can hammer them all into the right form. First step is to ditch the "If Module..." envelope. Not its contents, just the envelope itself. You either have mod_rewrite or you don't-- and if you don't, you need to change hosts yesterday. Besides, your CMS wouldn't work without mod_rewrite.


Options +FollowSymlinks
<snip>
Options -Indexes

Collect all your Options into a single line. Put them near the top of your htaccess where you can keep an eye on them. Other things that go near the top are one-liners such as ErrorDocument statements and generic AddType or Expires lines. This has nothing to do with Apache execution; it's for your own sanity.
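As a sketch: the two Options lines from the posted htaccess merged into one, with a placeholder ErrorDocument line to show where other one-liners would sit. The /404.html path is hypothetical; OpenCart normally produces its own not-found page through its error/not_found route, so you may not need an ErrorDocument at all.

```apache
# All Options in one place, near the top of the file
Options +FollowSymLinks -Indexes

# Other one-liners go here too (hypothetical error page path)
ErrorDocument 404 /404.html
```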


Extra URL trailing slashes "/"

Slow down. You can't go wantonly chopping off slashes-- or adding them on-- and not expect consequences.

If the URL represents a real, physical directory, it will pick up the slash without you having to do anything about it. You just need to make sure it doesn't also display "index.php" or whatever the index filename is.
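One way to handle the index files mentioned at the start of the thread is a sketch like the following, assuming www.example.com is the preferred host. It generalizes the index.php rule from the posted htaccess to index.htm and index.html in any directory. The check on THE_REQUEST looks at what the client actually asked for, so the server's own internal fetch of index.php (through DirectoryIndex or the CMS rewrite) does not trigger the redirect and loop. Requiring a space right after the filename also leaves OpenCart's index.php?route=... URLs alone.

```apache
RewriteEngine On

# Only when the client itself asked for .../index.php, .../index.htm or
# .../index.html with no query string
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?\ ]*/)?index\.(php|html?)\ HTTP/
# Redirect to the directory itself; $1 is empty for the site root
RewriteRule ^(.*/)?index\.(php|html?)$ http://www.example.com/$1 [R=301,L]
```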

If your URL does not represent a real, physical directory, then you need two separate rules. First a redirect to grab your users by the scruff of the neck and force them to use the URL format you want. And then a rewrite to fetch the content from wherever it really lives. But not yet.
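A sketch of that two-step pattern for the extensionless category and product URLs in this thread, under loud assumptions: that any path with no dot in it is a CMS URL, that www.example.com is the preferred host, and that OpenCart's _route_ parameter should receive the path without the slash, just as it does from the existing catch-all rule. Both rules would sit after the other redirects and before that catch-all. OpenCart's own links would still be generated without the slash, so unless the SEO module can add it, every internal click would cost a redirect; that is one more reason this waits until the rest of the file is in order.

```apache
RewriteEngine On
RewriteBase /

# Step 1, redirect: a path with no dot that is not a real file or directory
# and does not already end in a slash gets one added (301).
# Real directories are left to mod_dir, which adds their slash itself.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]*[^./])$ http://www.example.com/$1/ [R=301,L]

# Step 2, rewrite: serve the slashed URL from index.php, passing the path
# without its trailing slash, the same way the existing catch-all does
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ index.php?_route_=$1 [L,QSA]
```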


<FilesMatch "(?i)((\.tpl|\.ini|\.log|(?<!robots)\.txt))">
Order deny,allow
# Deny from all
</FilesMatch>

This is way too complicated and probably not even necessary. I seriously doubt your logs are slopping around loose in the same directory as your site; far more likely they're aliased to a completely different part of the server where robots could never get to them. What you do need is a simple
<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

to override any Deny directives you've got lying around.


RewriteCond %{HTTP_USER_AGENT} [Bb]aiduspider
RewriteRule .* - [R=403,L]

An admirable sentiment :) But you don't need mod_rewrite for this. The easiest way to do simple User-Agent blocks is to run up a list in mod_setenvif like this:
BrowserMatchNoCase BaiduSpider get_lost
BrowserMatch Slurp get_lost
BrowserMatch AppEngine get_lost

et cetera. And then at the top of your Deny directives, before you start listing unwanted IPs, say
Deny from env=get_lost
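Put together in Apache 2.2 syntax (the version the host reports), the pieces would look roughly like this; the robot names are the ones from the list above, and the Order/Allow/Deny lines are the 2.2 access-control directives.

```apache
# Flag unwanted robots by User-Agent (mod_setenvif)
BrowserMatchNoCase BaiduSpider get_lost
BrowserMatch Slurp get_lost
BrowserMatch AppEngine get_lost

# Order Allow,Deny: Deny is evaluated last, so a request that matches
# both "Allow from all" and a Deny line is refused
Order Allow,Deny
Allow from all
Deny from env=get_lost
```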

If you do need mod_rewrite to lock someone out, the rule ends in a simple [F] flag.
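As a sketch, the Baiduspider rule from the posted htaccess rewritten that way: [F] returns 403 Forbidden and implies [L], and [NC] takes the place of the [Bb] character class.

```apache
RewriteEngine On

# Refuse Baiduspider outright: 403 Forbidden, no substitution needed
RewriteCond %{HTTP_USER_AGENT} baiduspider [NC]
RewriteRule .* - [F]
```

Like any access rule, it goes ahead of the redirects, so a blocked robot gets its 403 without first being sent through a 301.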

LedZep



 
Msg#: 4574525 posted 3:07 am on May 17, 2013 (gmt 0)

Hi lucy24, thanks for that info. I will make those changes to clean up the code, then I will test it.
Thanks
Nicole

LedZep



 
Msg#: 4574525 posted 3:38 am on May 17, 2013 (gmt 0)

Hi,
I did what you said, but I still get the same things as above. Do you happen to know the code I need to add or tweak to get what I need?
Thanks
Nicole

phranque




 
Msg#: 4574525 posted 5:03 am on May 17, 2013 (gmt 0)

post the cleaned up code.

LedZep



 
Msg#: 4574525 posted 4:27 pm on May 17, 2013 (gmt 0)

Hi phranque,

Here is the current code. I added a few things myself, and I still have to add more so that the whole site is compressed and all the pages are cached in people's browsers. Let me know if I missed anything from lucy24's suggestions, and anything new I need to get my original problems solved. Thank you so much for looking at this code to help me.
Nicole

New htaccess code:

I have to redo the code; I just saw a mistake.

LedZep



 
Msg#: 4574525 posted 4:31 pm on May 17, 2013 (gmt 0)

Hi,
Here is the new htaccess code

Options +FollowSymlinks

# Prevent Directory listing
Options -Indexes

<IfModule mod_rewrite.c>
RewriteEngine On
# Add IP Canonicalization
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_USER_AGENT} [Bb]aiduspider
RewriteRule .* - [R=403,L]

</IfModule>

# Gzip To compress much of what leaves the server
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript text/x-javascript text/php

# Set Expires Headers
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
Header set Expires "Wed, 15 Jan 2059 20:00:00 GMT"
</FilesMatch>

# 480 weeks
<filesMatch ".(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
Header set Cache-Control "max-age=290304000, public"
</FilesMatch>

# 2 DAYS
<filesMatch ".(xml|txt)$">
Header set Cache-Control "max-age=172800, public, must-revalidate"
</FilesMatch>

# 2 HOURS
<filesMatch ".(html|htm)$">
Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>

# 1.To use URL Alias you need to be running apache with mod_rewrite enabled.

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [L,R=301]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]


RewriteBase /
RewriteRule ^sitemap.xml$ index.php?route=feed/google_sitemap [L]
RewriteRule ^googlebase.xml$ index.php?route=feed/google_base [L]
RewriteRule ^download/(.*) /index.php?route=error/not_found [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !.*\.(ico|gif|jpg|jpeg|png|js|css)
RewriteRule ^([^?]*) index.php?_route_=$1 [L,QSA]


# Respond to /include/ with 404 instead of 403
# RedirectMatch 404 ^/include(/?|/.*)$


Thanks
Nicole

lucy24




 
Msg#: 4574525 posted 8:14 pm on May 17, 2013 (gmt 0)

Uh, hate to say this but I don't think you've changed anything :( Did you accidentally paste in the old file instead of the new one? Been there. Done that.

There are still multiple occurrences of "RewriteEngine On". There is still a rule using RedirectMatch. (Yes, it's commented-out, but it shouldn't exist at all.) There is still an <IfModule... envelope. There are still at least two separate "Options" lines. The rule in [F] (currently expressed as "R=403,L") still comes after the domain-name canonicalization redirect.

Et cetera :(


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved