Forum Moderators: phranque
Now if I check my backlinks in Yahoo Site Explorer, I get some with the .html extension and some without!
If you click on the links or enter the address without the .html extension, it works fine and goes to the same page, but I'd like to get the backlinks combined.
So how do I do that with .htaccess? I've been trying for the last few hours and have been pulling my hair out!
Here is an example to help:
www.domain.com/dir/page1 and
www.domain.com/dir/page1.html
What would the code be to convert www.domain.com/dir/page1 to www.domain.com/dir/page1.html?
Thanks
Shaun
RewriteEngine On
RewriteRule photography/wedding-photo1$ [domain.com...]
I also suggest that you start-anchor your RewriteRule pattern.
If you end up with more than a few dozen of these redirects, there are 'generic' solutions available to do what you need to do for any extensionless URL using a few additional RewriteConds.
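As a rough sketch of that 'generic' approach (untested here, and "www.example.com" stands in for your own hostname, so adapt before use):

```apache
RewriteEngine On
# If the captured URL-path contains no dot (i.e. no file extension)...
RewriteCond $1 !\.
# ...and the same path with ".html" appended exists as a real file...
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
# ...then redirect the extensionless URL to its ".html" equivalent.
RewriteRule ^(.*[^/])$ http://www.example.com/$1.html [R=301,L]
```

The file-exists check keeps the rule from redirecting URLs that have no corresponding .html page.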
Jim
See the regular-expressions tutorial cited in our Forum Charter for a lot of useful information. Please don't use mod_rewrite and regular expressions without understanding them; as you can see, very small omissions or errors can have serious consequences for your site's ranking and function. Server config code like this is most definitely *not* a copy-and-paste proposition...
Jim
You didn't mention other rules. If you're having problems with other rules, then be aware that rule order is important: place all external redirects (those using the [R=30x] flags and/or specifying a full URL starting with "http" or "https") first, ordered from most-specific patterns and conditions (fewest URLs affected) to least-specific, followed by all of your internal rewrites, again in order from most- to least-specific.
Doing this will prevent two problems: It will prevent multiple/chained/stacked redirects resulting from a single client request, and it will prevent having an external redirect 'expose' an internally-rewritten filepath as a URL.
Remember to always use an [L] flag on every rule unless you know why you don't want to, and to always completely-flush (delete) your browser cache before testing any new server-side code.
You want to verify that *any* 'incorrect' URL is redirected straight to the correct URL in one single step: One redirect no matter how many 'problems' the requested URL has... Use a server headers checker to verify this.
Jim
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
RewriteRule ^photography/wedding-photographer-surrey$ http://www.example.com/photography/wedding-photographer-surrey.html [R=301,L]
It's just not picking it up, even if I change the redirect URL to a completely different page!
[edited by: jdMorgan at 11:28 pm (utc) on Nov. 2, 2009]
[edit reason] example.com [/edit]
Comment-out all the other RewriteCond and RewriteRule lines, and try something trivial like:
RewriteEngine on
RewriteRule ^foo$ http://www.google.com/ [R=301,L]
If that works, we can address the rules you posted (which are in the wrong order, BTW).
Jim
Did what you suggested, i.e.
RewriteEngine on
RewriteRule ^foo$ [google.com...] [R=301,L]
and that worked! But I can't for the life of me get it to do what I want!
RewriteRule ^photography/wedding-photographer-surrey$ [google.com...] [R=301,L]
RewriteRule ^foo$ [google.com...] [R=301,L]
But if I enter http://www.example.com/photography/wedding-photographer-surrey into browser it doesn't go to google, but www.example.com/foo does...
[edited by: jdMorgan at 11:46 pm (utc) on Nov. 2, 2009]
[edit reason] example.com [/edit]
1.) Exclusions from redirects & rewrites come first.
EG RewriteRule \.(txt|js|css|gif|jpg)$ - [L]
2.) 'Page Specific' External Redirects go second, because they can contain your canonicalization.
3.) Canonicalization comes third.
4.) Internal Rewrites are fourth.
RewriteRule ^photography/wedding-photographer-surrey$ http://www.example.com/photography/wedding-photographer-surrey.html [R=301,L]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
The way your rules were ordered if a person requested:
example.com/photography/wedding-photographer-surrey, they would initially be sent to www.example.com/photography/wedding-photographer-surrey, then to www.example.com/photography/wedding-photographer-surrey.html. That's two redirects in a row, or 'stacked' redirects. A single redirect will pass link weight, where a 'chain' or 'stack' of more than one will not. So, by reversing the order and taking care of the www vs. non-www issue at the same time, you can continue passing link weight and save some processing time, because there is only one redirect.
I also edited your canonicalization ruleset a bit to redirect anything NOT www.example.com (or empty) to www.example.com... You may need to leave it the way you had it, but this version (if you have wildcard domains enabled) will redirect ww.example.com and wwww.example.com to the correct location, so it works a bit better as a 'courtesy' to visitors if you can use it... As long as you don't have other sub-domains set up, it should be fine. If you do have other subdomains, it can still be adjusted to exclude those.
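Putting the four sections together, a skeleton .htaccess would look something like this (the exclusion list, the internal-rewrite line, and "real-file.php" are hypothetical placeholders; adapt everything to your own site):

```apache
RewriteEngine On

# 1.) Exclusions: let static assets pass straight through
RewriteRule \.(txt|js|css|gif|jpg)$ - [L]

# 2.) Page-specific external redirects (each target includes the canonical host)
RewriteRule ^photography/wedding-photographer-surrey$ http://www.example.com/photography/wedding-photographer-surrey.html [R=301,L]

# 3.) Canonicalization: redirect any host that is not www.example.com (or empty)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 4.) Internal rewrites
RewriteRule ^some-pretty-url$ /real-file.php [L]
```

Because the page-specific redirect in section 2 already points at the canonical www hostname, a request with the wrong host and the wrong extension is still fixed in a single hop.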
I use a .htaccess in a lot of my directories for internal and external redirecting. Externals first and internals second and it works just fine.
I also handle canonicalization for all urls by placing the rules you have used in your post in my server root directory.
However as it stands this can create a chain of redirects.
eg. mydomain.com/oldpage/
to www.mydomain.com/oldpage/
to www.mydomain.com/newpage/
I can see if I move the rules to handle canonicalization from the root to each individual folder this chain will be eliminated.
But this will then cause any URLs in the root not to be redirected.
I can think of two possible solutions.
One, which I am sure will work, would be to code the rules in the root to be more specific and only redirect the root itself and URLs in the root directory.
Alternatively, and this is what I am unsure about, could I leave everything as is and simply remove the [L] flag from the redirect in the root directory?
If you move your 'directory specific' rules to the root, you can 'skip' through them fairly efficiently and not have to worry about the issue. What I mean is:
# S=NUM is the number of rules
# (not counting conditions) in
# the ruleset for each Directory
# I'll pretend you have 5 rules
# in each directory...
RewriteRule !^Directory1 - [S=5]
# Directory 1 Rules Here
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule !^Directory2 - [S=11]
RewriteRule !^[^/]{10}/SubDir - [S=5]
# Directory2/SubDir Rules Here
RewriteRule ^[^/]{10}/[^/]{6}/
RewriteRule ^[^/]{10}/[^/]{6}/
RewriteRule ^[^/]{10}/[^/]{6}/
RewriteRule ^[^/]{10}/[^/]{6}/
RewriteRule ^[^/]{10}/[^/]{6}/
# Directory2 Rules Here
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
Basically, if you set your file up right (and can count) you can have all your specific rulesets in the root file and if you order them correctly and put some thought into 'finding matches and skipping to rulesets' you can still be very efficient. Here's another example:
# If it's not Directory 1 or 2 Skip 'em all
RewriteRule !^Directory(1|2) - [S=17]
# If it's not Directory 1 we know it's 2
RewriteRule !^[^.]{9}1 - [S=5]
# Directory 1 Rules Here
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
# We know it's Directory 2, so if it's not the specific sub, skip 5
RewriteRule !^[^/]{10}/SubDir - [S=5]
# Directory2/SubDir Rules Here
RewriteRule ^[^/]{10}/[^/]{6}/
RewriteRule ^[^/]{10}/[^/]{6}/
RewriteRule ^[^/]{10}/[^/]{6}/
RewriteRule ^[^/]{10}/[^/]{6}/
RewriteRule ^[^/]{10}/[^/]{6}/
# We already know it's 2, so just run these
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
RewriteRule ^[^/]{10}/
Take advantage of the fact that no RewriteConds are processed unless the RewriteRule pattern matches; by making the pattern very specific, a lot of wasted effort can be avoided. Also, attention to the order of the RewriteConds can help: put the RewriteCond most likely to cause the rule to be skipped first. The only exception to this rule of thumb is for RewriteConds that do file-exists checks or reverse-DNS lookups; because they are horribly CPU-intensive, they should always be last.
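As a hypothetical illustration of that ordering (the "downloads" paths and the hotlink scenario are made up for the example):

```apache
# The specific RewriteRule pattern means the conditions below are never
# evaluated at all unless the request starts with "downloads/" and ends
# in ".zip". Among the conditions, the cheap string comparison on the
# referer goes first; the expensive file-exists check goes last.
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/ [NC]
RewriteCond %{DOCUMENT_ROOT}/downloads/$1 -f
RewriteRule ^downloads/(.+\.zip)$ /hotlink-denied.html [L]
```

For most requests the engine rejects the rule on the pattern alone and never touches the filesystem.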
I'm not sure what the difficulty being addressed with this solution is. A simpler solution is to put all redirects in the root .htaccess file, leaving only internal rewrites in subdirectory .htaccess files if needed.
Jim
jdMorgan suggests putting the exclusions in the condition:
RewriteCond %{REQUEST_URI} ^[^/]{10}/stuff-to-match
RewriteRule ^Directory1
RewriteCond %{REQUEST_URI} ^[^/]{10}/[^/]{6}/stuff-to-match
RewriteRule ^Directory2/SubDir/
RewriteCond %{REQUEST_URI} ^[^/]{10}/stuff-to-match
RewriteRule ^Directory2
Here's the main difference I see and where your specific setting comes into play... (If I'm understanding his post correctly.)
If your directory names are 'essentially the same', or if you are matching multiple sub-directories past one or two levels and use 'very specific' (as specific as possible) rules to eliminate errors, e.g.
RewriteCond %{REQUEST_URI} ^[^.]{36}/page-to-match1.html
RewriteCond %{REQUEST_URI} ^[^.]{36}/page-to-match2.html
RewriteCond %{REQUEST_URI} ^[^.]{36}/page-to-match3.html
RewriteRule ^directory/subdir/subsubdir/one-more/ http://www.example.com/some-redirect [R=301,L]
RewriteCond %{REQUEST_URI} ^[^.]{36}/another-to-match1.html
RewriteCond %{REQUEST_URI} ^[^.]{36}/another-to-match2.html
RewriteCond %{REQUEST_URI} ^[^.]{36}/another-to-match3.html
RewriteRule ^directory/subdir/subsubdir/another/ http://www.example.com/some-other-redirect [R=301,L]
RewriteCond %{REQUEST_URI} ^[^.]{35}/third-to-match1.html
RewriteCond %{REQUEST_URI} ^[^.]{35}/third-to-match2.html
RewriteCond %{REQUEST_URI} ^[^.]{35}/third-to-match3.html
RewriteRule ^directory/subdir/subsubdir/a-third/ http://www.example.com/some-third-redirect [R=301,L]
# Keep in mind the preceding is more of an example of the rule matching necessary; the conditions could all be a single condition, but I wanted to keep the code 'more readable'. The main point I'm making is WRT the matching necessary for the rules.
Above, you match the first 27 characters before the pattern is broken and move to the second possible match, then match 29 characters to get to the 3rd rule, where with mine, you match the minimal amount of characters and I usually try to only match each section of a URL 'specifically' once...
I have .htaccess files I manage that skip 50+ rules, which would have to match at least a portion of the rules skipped if I did not find a way to skip past them with a single match / no match, and even if I switched from rules to conditions I would have a good number of rules to partially match.
I can see jdMorgan's point, where his way is more 'error proof' and not too much less efficient, but personally, I try to use the most efficient way I can, which means on sites I manage (and usually own) I have to pay more attention to what I'm doing than most people.
Like I said, the difference in efficiency is highly dependent on your URL structure. If the main directories all start with a different character, then the pattern is broken within a single character, and there's not really a reason to use the less 'management friendly' version I posted. But if you have to match the first 27+ characters before you break the pattern, have to redirect 20+ sub-directories for some reason (rather than the 3 in my example), and have traffic, you might consider something you have to pay more attention to manage. By combining all the redirects in one file, rather than in the sub-directory .htaccess files, you could be adding significantly to the processing, depending on your exact situation.
IOW: Here's what I see as the differences in our posts, and you have to evaluate your situation yourself... If it's not a high-traffic site, or you can break the matching patterns easily, or 'a number of other good reasons to do it go here'... use jdMorgan's way. If you're a bit more risk-tolerant, speed and efficiency are absolutely essential, you have super-long URLs you will have to match repeatedly in a number of rules, etc... you might consider mine.
<aside>
One of the cool things about coding and scripting is where there are two people writing code there are usually 4 opinions on how to do it. :)
</aside>
I think in my specific case the safer method suggested by jdMorgan is the way forward.
I have about 10 directories and for most the first character differs with the worst 2 directories having 5 characters that match.
Since the purpose of the majority of my external redirects is to simply remove the file extension I suspect I could add one RewriteCond RewriteRule pair to the .htaccess in the root to effect this change.
I would precede this rule with the more specific re-writes and follow it with rules to handle canonicalization.
I can then leave the slightly more complicated internal re-writes in their own directory specific .htaccess files.
Would I be right in assuming that the internal rewrites would always be processed last, and that I would never be at risk of exposing any of my internal filepaths?
I should also add that I only use .htaccess files whilst I am developing as I find it easier to make changes. In my production environment I put all these directives in the httpd.conf file. Does this negate any of the performance overhead of having directory specific rules ?
In the examples above, where we're using "directory1" and "directory2", etc., there's a possible red herring, in that it appears that the matching engine would have to match all of "directory" in each case before getting to "1" and deciding that no match was present and that further processing was unnecessary. But real directory names may be quite different, and if ordered by "shortest match first," a large gain in performance is possible if the real directory names are more like "able" and "carla" and "charlie" -- Here, a decision can be taken after only one or two characters.
There's also another way to do it, sort of halfway between the methods already discussed: You can use a RewriteCond "lookup table" approach:
RewriteCond $1>replacement-path1 ^old-URL-path1-pattern>(.+)$ [OR]
RewriteCond $1>replacement-path2 ^old-URL-path2-pattern>(.+)$ [OR]
RewriteCond $1>replacement-path3 ^old-URL-path3-pattern>(.+)$
RewriteRule ^directory1/(.+)$ /%1 [L]
RewriteCond $1>Replacement-path1 ^old-URL-path1-pattern-prefix[^>]+>(.+)$ [OR]

And of course, if you've got server config-level access, a RewriteMap would be even better.
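To make the lookup-table idea concrete, here's a hypothetical instance (the "widgets"/"gadgets" paths are invented for illustration):

```apache
# Each condition pairs an old sub-path with its replacement, joined by ">".
# When the sub-path captured by the rule ($1) matches the left-hand side,
# the right-hand side is captured into %1 and used as the rewrite target.
# The [OR] chain stops at the first condition that matches.
RewriteCond $1>new-widgets/crimson ^widgets/red>(.+)$ [OR]
RewriteCond $1>new-widgets/navy ^widgets/blue>(.+)$ [OR]
RewriteCond $1>new-gadgets/large ^gadgets/big>(.+)$
RewriteRule ^directory1/(.+)$ /%1 [L]

# With server-config access, a RewriteMap text file does the same job
# (RewriteMap cannot be declared in .htaccess):
# RewriteMap oldnew txt:/path/to/oldnew.map
# RewriteRule ^directory1/(.+)$ /${oldnew:$1|$1} [L]
```

So a request for /directory1/widgets/red would be rewritten to /new-widgets/crimson, while a sub-path with no entry in the 'table' fails all three conditions and the rule does nothing.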
However, I agree that each Webmaster must evaluate all possible techniques and decide for him/herself which is best in any given circumstance. Mod_rewrite code isn't something to be written slap-dash, installed, and forgotten; a lot of thought and analysis should go into it first, and maximizing efficiency is a major consideration. I just intended to point out that maintainability is another such consideration.
Jim
@ mark_roach
Safer is *usually* better, so go with what works best for you and your situation, erring on the side of caution, unless you really know what you are doing and have reason to do otherwise.
In the examples above, where we're using "directory1" and "directory2", etc., there's a possible red herring, in that it appears that the matching engine would have to match all of "directory" in each case before getting to "1" and deciding that no match was present and that further processing was unnecessary. But real directory names may be quite different, and if ordered by "shortest match first," a large gain in performance is possible if the real directory names are more like "able" and "carla" and "charlie" -- Here, a decision can be taken after only one or two characters.
Definitely, and where performance is only minimally impacted, then I definitely think easier to manage is better, because the performance impact will probably not be noticeable to the end user and should not impact the server to a great extent either.
One of the cool things about coding and scripting is where there are two people writing code there are usually 4 opinions on how to do it. :)
See what I mean... There are many different ways to arrive at the same result, and which you choose is really up to you and your situation. I could probably come up with a couple more if necessary, and I'm sure jdMorgan could.
The server starts at the root .htaccess file, processes the RewriteRules in order, then goes to the next-lower subdirectory in the path to the requested file and processes those rules in order, and continues this until it runs out of subdirectories "above" the requested file to look at.
So make sure that when seen from that viewpoint, there are no internal rewrites preceding any external redirects. As long as that is the case, then internal filepath exposure is not a concern.
Jim