conflicting rewrites

conflicting url rewrites from trying to remove index.php from homepage

         

kmhelms

10:53 pm on Dec 11, 2013 (gmt 0)

10+ Year Member



Hello. Disclaimer: I'm a complete noob. We launched a new website design back in January, and I set up several URL rewrites to send our old links to our new pages. Our old URLs were query-based, with "index.php?page=somepage" in all of them.

Here's an example of one:
RewriteEngine On
RewriteCond %{QUERY_STRING} battingCageNets$
RewriteRule ^/index.php /batting-cages/batting-cage-nets.html? [R=301,L]


So these all work just fine, but recently I discovered what seems to be a common problem: our home page can be reached at example.com, example.com/index.php, and so on. Our old home page URL included index.php, but I'd like our new one to show just the root domain. I set up another rewrite that seemed to do the trick, but it broke all the other rewrites (like the one above). I'm not certain, but it seemed to be removing "index.php" from every URL before any of the other rewrites could trigger, so all the ones I had set up for old links stopped working. How do I get them both to work?

phranque

12:13 am on Dec 12, 2013 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, kmhelms!


you'll have to show the other directives in effect since the order is important.

typically you go from most specific to most generic redirects, usually ending with the default directory index document redirect then the hostname canonicalization redirect, and then any internal rewrites in order of decreasing specificity.

any external redirects should include the fully qualified url in the target including protocol and hostname.

IMPORTANT: Please Use example.com For Domain Names in Posts:
http://www.webmasterworld.com/apache/4452736.htm

please exemplify your keywords as well - abstracting your terms like "blueWidgetStuff" and "/blue-widgets/blue-widget-stuff.html" usually makes your problem and solution accessible to all readers.

kmhelms

12:53 am on Dec 12, 2013 (gmt 0)

10+ Year Member



Thanks for the reply! Apologies for not abstracting my terms. I tried to edit, but it says the time has passed. Will do from here on out though.

I must be losing my mind. I tried it again and it suddenly works.... *facepalm* time to go home.

Just in case, here's what I had:


RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/(.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^.*/index\.php
RewriteRule ^(.*)index.php$ $1 [R=301,L]

RewriteCond %{QUERY_STRING} page1$
RewriteRule ^/index.php /page1.html? [R=301,L]


This fixed the home page issue, but then broke the page-specific one. I changed it to this:

RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/(.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{QUERY_STRING} page1$
RewriteRule ^/index.php /page1.html? [R=301,L]

RewriteCond %{THE_REQUEST} ^.*/index.php
RewriteRule ^(.*)index.php$ http://www.example.com$1

And it seems to work now. I didn't know order mattered. I do see slight differences between the two that I used. To be honest, I just found them online. Would you mind explaining the difference in the following?


RewriteCond %{THE_REQUEST} ^.*/index\.php
RewriteRule ^(.*)index.php$ $1 [R=301,L]

RewriteCond %{THE_REQUEST} ^.*/index.php
RewriteRule ^(.*)index.php$ http://www.example.com$1 [R=301,L]

[edited by: kmhelms at 1:08 am (utc) on Dec 12, 2013]

phranque

12:58 am on Dec 12, 2013 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



always clear cache and/or refresh your browser when testing changes in redirects.

phranque

2:39 am on Dec 12, 2013 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



(didn't see your edit earlier)

RewriteCond %{THE_REQUEST} ^.*/index\.php

the backslash escapes the period, so it means a literal period, instead of "any character", which is the special meaning of an unescaped period in a regular expression.
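
As a small illustration of the difference (the request lines in the comments are made-up examples, only one of the two conditions would be used in practice, and the rule is written for server config context):

```apache
# Unescaped period: "." stands for any single character, so this condition
# is true for "GET /index.php HTTP/1.1" but also for "GET /index-php HTTP/1.1".
# RewriteCond %{THE_REQUEST} /index.php

# Escaped period: "\." is a literal period, so the condition is true for
# "GET /index.php HTTP/1.1" and false for "GET /index-php HTTP/1.1".
RewriteCond %{THE_REQUEST} /index\.php
RewriteRule ^(.*)index\.php$ http://www.example.com$1 [R=301,L]
```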

RewriteRule ^(.*)index.php$ http://www.example.com$1 [R=301,L]

by specifying the canonical protocol and hostname in the target you can prevent non-canonical redirects and possible multiple hops to the canonical url.
for example, if your request is for http://example.com/foo/index.php you want to get redirected to http://www.example.com/foo/ in one hop instead of http://example.com/foo/ in the first redirect and then http://www.example.com/foo/ on a subsequent request.
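
A rough side-by-side of the two targets, assuming server config context (where $1 keeps the leading slash) and assuming the server builds relative redirects from the requested hostname, which is the usual behavior but depends on settings such as UseCanonicalName:

```apache
# Relative target: the server fills in the scheme and host itself.
# A request for http://example.com/foo/index.php is typically sent to
# http://example.com/foo/ and needs a second redirect to reach www.
# RewriteRule ^(.*)index\.php$ $1 [R=301,L]

# Fully qualified target: the same request goes straight to
# http://www.example.com/foo/ in one hop.
RewriteRule ^(.*)index\.php$ http://www.example.com$1 [R=301,L]
```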

you probably need another slash in that Rewrite Rule unless you are using this in server config context. (as opposed to directory or .htaccess context where the leading slash and perhaps more of the path prefix is removed)

http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
In Directory and htaccess context, the Pattern will initially be matched against the filesystem path, after removing the prefix that led the server to the current RewriteRule...


RewriteCond %{THE_REQUEST} ^.*/index\.php

there's no use in beginning a regular expression with ^.* unless you are capturing the begin-anchored string.
also note that THE_REQUEST is the HTTP Request sent from the browser/user-agent so it will typically begin with "GET " and end with " HTTP/1.1" or something similar.
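
To make that concrete, here is one possible way to write the condition so it only looks at the path part of the request line. This is a sketch, not the only correct pattern; the sample request line is invented, and the rule again assumes server config context:

```apache
# For a browser request to http://www.example.com/foo/index.php?page=page1
# THE_REQUEST holds the raw request line, for example:
#   GET /foo/index.php?page=page1 HTTP/1.1
#
# ^[A-Z]+\s        the method ("GET", "POST", ...) and the space after it
# /([^?\s]*/)?     the leading slash plus any directory part of the path
# index\.php       a literal "index.php"
# [?\s]            followed by the query string or the space before "HTTP/1.1"
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*/)?index\.php[?\s]
RewriteRule ^(.*)index\.php$ http://www.example.com$1 [R=301,L]
```

Because the pattern stops at the first "?", an "index.php" that only appears inside a query string does not satisfy the condition.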


RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/(.*) http://www.example.com/$1 [R=301,L]

this ruleset should go last.
also, see what i wrote above about path prefix removal.
i would change the conditional and, assuming these directives are in .htaccess, correct the RewriteRule to this:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

lucy24

7:09 am on Dec 12, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some general stuff about ordering of RewriteRules:

First arrange rules from most severe to least severe. If you've got any RewriteRules involving access control, such as "If it comes from a bing IP but isn't the bingbot, slam the door in its face", put all of those first. No sense in redirecting someone if you're never going to let them in. Then any rules involving 410s. You may not have any at all, but if you've deleted pages and want to say so, put them after lockouts, before redirects. Then any external redirects-- the ones with the [R=301,L] flag. Finally any internal rewrites-- the ones with the [L] flag alone. Rarely you may have some superfinal rules involving things like cookies or environment variables with no L-or-equivalent flag at all; those would come at the very end.

Then, within each broad category, arrange rules from most specific to most general. Anything that involves an individual named page goes first. Then any captures fitting some pattern. On most sites, the two last rules in the 301 category are:

second-to-last redirect: remove "index.html" or similar. These conventionally come with a condition looking at %{THE_REQUEST} but in most situations all you need is an [NS] flag to screen out mod_dir activity. The pattern is:
^(([^/.]+/)*)index\.html
replacing "html" with any extension(s) you actually use.

last redirect: domain-name canonicalization. Here the condition should be expressed as a negative:
!^(www\.example\.com)?$
meaning "if the request is anything other than my preferred form or nothing (for http 1.0), redirect to the correct form".
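
Put together, the ordering described above might look roughly like this. It is only a sketch: it assumes the rules live in .htaccess (so no leading slash in the patterns), it uses the conventional THE_REQUEST condition rather than the [NS] flag, and "BadBot", "discontinued-widget.html" and the "page=page1" parameter are hypothetical placeholders:

```apache
RewriteEngine On

# 1. Access control first (hypothetical user-agent name)
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule ^ - [F]

# 2. Pages that are gone for good (hypothetical path)
RewriteRule ^discontinued-widget\.html$ - [G]

# 3a. External redirects, most specific first: one named old URL
RewriteCond %{QUERY_STRING} (^|&)page=page1$
RewriteRule ^index\.php$ http://www.example.com/page1.html? [R=301,L]

# 3b. Second-to-last redirect: remove the directory index name
RewriteCond %{THE_REQUEST} \s/([^/?\s]+/)*index\.php[?\s]
RewriteRule ^(([^/.]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]

# 3c. Last redirect: domain-name canonicalization
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 4. Internal rewrites, if any, would follow here with [L] alone
```

With this order, a request for /index.php?page=page1 is caught by 3a before the more general index rule in 3b ever sees it.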

From first post:
RewriteRule ^/index.php /batting-cages/batting-cage-nets.html? [R=301,L]

So these all work just fine

If a rule with leading / slash in the pattern works as intended, it means the rule is lying loose in the config file. If the rule is in htaccess, or in a <Directory> envelope in the config file, leave off the leading / slash.
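
Applied to a rule like the one from the first post, the two forms would look something like this (a sketch with abstracted page names; use whichever form matches where the rule actually lives):

```apache
# Loose in the config file or a <VirtualHost> section:
# the pattern sees the URL-path including its leading slash.
RewriteCond %{QUERY_STRING} page1$
RewriteRule ^/index\.php$ http://www.example.com/page1.html? [R=301,L]

# In .htaccess or inside a <Directory> section:
# the leading slash has already been stripped, so leave it off.
# RewriteCond %{QUERY_STRING} page1$
# RewriteRule ^index\.php$ http://www.example.com/page1.html? [R=301,L]
```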

If you have a test site, set it to no caching at all for html files. (Obviously not practicable on a live site!) This saves having to keep emptying your browser cache every time you change one thing.
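
One way to do that, assuming mod_headers is available on the test server (this is a sketch for a test site only, not something to copy to a live one):

```apache
<IfModule mod_headers.c>
    # Ask browsers not to store any .html response on the test site
    <FilesMatch "\.html$">
        Header set Cache-Control "no-store"
    </FilesMatch>
</IfModule>
```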

kmhelms

6:34 pm on Dec 12, 2013 (gmt 0)

10+ Year Member



Not going to lie, this is over my head. It will take me a while to understand what you guys are talking about, but I appreciate it nonetheless.

Phranque said:
you probably need another slash in that Rewrite Rule unless you are using this in server config context. (as opposed to directory or .htaccess context where the leading slash and perhaps more of the path prefix is removed)

I had another slash at the end before the $1 but then I ended up with www.example.com// so I took it off.

phranque

7:31 pm on Dec 12, 2013 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



you still haven't mentioned if your directives are in the server config file or in .htaccess.

kmhelms

7:47 pm on Dec 12, 2013 (gmt 0)

10+ Year Member



erm. I think in a config file? We use virtual hosts so I click the one I want and select "edit directives". It's not specifically in anything .htaccess

phranque

8:11 pm on Dec 12, 2013 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



do you have a .htaccess file in your document root directory?

kmhelms

8:21 pm on Dec 12, 2013 (gmt 0)

10+ Year Member



Yes. Should they go in there? We have several websites running on the same server, each with its own set of redirects and rewrites. Currently, I click each separate virtual host to see the directives for one website. I'd imagine if they go in htaccess, I'd have all the redirects for all websites in the same file?

lucy24

10:11 pm on Dec 12, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the long term it will be a very, very good idea to stop using an intermediary (are we talking about some type of control panel provided by the host?) and instead edit the config file or htaccess directly. You can see htaccess files exactly the same way you see everything else in whatever you use for ftp. Because of the leading . (period) they may alphabetize before or after everything else.

Exception: If you have never, ever touched an htaccess file, and you can't see it, you may need to change your ftp program's preferences to "show hidden files" (exact wording will depend on the program).

You will also need to give .htaccess a different name when you save copies to your personal hard drive, because the alternative is to tell your computer to show all invisible files all the time. For most people this is more of a mess than you want to deal with. (I am assuming for the sake of discussion that you're not on linux. Linux users are different from you and me.)

In the case of rule patterns

^blahblah
vs.
^/blahblah

the difference is actually not config file vs. htaccess. It's "directory context" vs. general:
In VirtualHost context, The Pattern will initially be matched against the part of the URL after the hostname and port, and before the query string (e.g. "/app1/index.html").

In Directory and htaccess context, the Pattern will initially be matched against the filesystem path, after removing the prefix that led the server to the current RewriteRule (e.g. "app1/index.html" or "index.html" depending on where the directives are defined).

So if the rules are inside a <Directory...> envelope, they will follow the same syntax as if they were in htaccess. (Because htaccess is in many ways a special kind of <Directory...> section.) You can have <Directory...> sections in a virtual host, but not in htaccess files.
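
A bare-bones layout showing both places a rule can sit inside one virtual host. The document root path and the page names are hypothetical, and a real virtual host would carry more directives than this:

```apache
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example

    # Virtual host context: the pattern starts with "/"
    RewriteEngine On
    RewriteRule ^/old-page-a\.html$ http://www.example.com/new-page-a.html [R=301,L]

    <Directory /var/www/example>
        # Directory context: same syntax as .htaccess, no leading slash
        RewriteEngine On
        RewriteRule ^old-page-b\.html$ http://www.example.com/new-page-b.html [R=301,L]
    </Directory>
</VirtualHost>
```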

Do you control the entire (physical) server or is it a VPS? (It probably doesn't make a difference, but it's good to know.)