Strip print part of URL

Use .htaccess to strip the old print part of a URL and 301 redirect to the same filename

     
9:43 pm on Sep 16, 2018 (gmt 0)

Junior Member from CA 

10+ Year Member Top Contributors Of The Month

joined:Oct 1, 2002
posts:153
votes: 15


I have been searching for hours to try and find a way to perform the following:

I am trying to use .htaccess to strip the "-print" part from the URL and 301 redirect to the same URL without "-print". The directory levels containing the filenames range from 1 to 5 levels deep, so any code would need to cover any directory level.

E.G.

https://www.example.com/aaaa/blahblah-print.php
to
https://www.example.com/aaaa/blahblah.php

and

https://www.example.com/aaaa/bbbb/blahblah-print.php
to
https://www.example.com/aaaa/bbbb/blahblah.php

I also need the same sort of 301 redirect code (or can it be combined into one?) to strip "_printer" from the URL. This also needs to cover any directory depth:

https://www.example.com/aaaa/blahblah_printer.php
to
https://www.example.com/aaaa/blahblah.php
10:06 pm on Sept 16, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


What have you tried so far?

I realize this will sound tactless, but the desired redirect seems so simple and straightforward (a single conditionless RewriteRule, or a single RedirectMatch if that's what you are using) that I have to think there is something you're not telling us.
10:51 pm on Sept 16, 2018 (gmt 0)

Junior Member from CA 

10+ Year Member Top Contributors Of The Month

joined:Oct 1, 2002
posts:153
votes: 15


"I have to think there is something you're not telling us" - Not at all. I'm just an old man who hasn't a clue about rewrites etc., but I still hand-code my own site. I guess we all have areas we excel in, and other areas we know nothing of. Maybe my searches didn't use the right wording, as I found no examples of the same... I'm even willing to pay for a solution.
11:10 pm on Sept 16, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


Untested, but something along these lines should work:

RedirectMatch 301 (.*)-print\.php$ https://www.example.com$1.php

See the Apache documentation for an authorised example.

Put it near the top of your .htaccess (before RewriteEngine and the rules).
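If the file contains no mod_rewrite directives at all, a mod_alias-only version that also covers the "_printer" pages might look like this. It is an untested sketch, and www.example.com is just the thread's placeholder host. RedirectMatch matches against the URL-path, which begins with a slash, so the captured $1 already carries it and no slash is added after the hostname.

```apache
# mod_alias only: intended for an .htaccess that has no RewriteRules.
# The URL-path starts with "/", so for /aaaa/blahblah-print.php
# $1 = "/aaaa/blahblah" and the target is https://www.example.com/aaaa/blahblah.php
# [-_]print(er)? matches "-print" and "_printer" (and also "-printer" / "_print").
RedirectMatch 301 ^(.+)[-_]print(er)?\.php$ https://www.example.com$1.php
```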

...
12:03 am on Sept 17, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11875
votes: 246


if you are using mod_rewrite anywhere, you should avoid using Redirect(Match) everywhere, so you'll want to use the mod_rewrite equivalent:
RewriteRule ^(.*)[-_]print(er)?\.php$ https://www.example.com/$1.php [L,R=301]
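For context, a minimal sketch of how that mod_rewrite rule might sit in an .htaccess file, with (er)? added so it also catches the "_printer" pages (untested; www.example.com is the placeholder host used throughout the thread):

```apache
RewriteEngine On

# In .htaccess the path RewriteRule tests has no leading slash,
# so $1 is e.g. "aaaa/bbbb/blahblah" and the target supplies the "/".
# Works at any directory depth because (.+) may contain slashes.
RewriteRule ^(.+)[-_]print(er)?\.php$ https://www.example.com/$1.php [L,R=301]
```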
12:34 am on Sept 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


if you are using mod_rewrite anywhere, you should avoid using Redirect(Match) everywhere

Can you explain why phranque?

I have used RedirectMatch and Redirect (as the first items in .htaccess) for many years without any issues.

And I have a *lot* of mod_rewrite rules below them.

Jim Morgan was the reason I joined WebmasterWorld.

...
1:48 am on Sept 17, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


^(.*)
Bite your tongue ;) (Yes, I do realize that Apache docs use this form constantly. That doesn't mean it's a good idea.)

If the characters - and _ occur nowhere else in the affected URLpaths, you can express the pattern very tidily as
^([^_\-]+)[_-]print(er)?\.php$
or
^/([^_\-]+)[_-]print(er)?\.php$
meaning “start capturing from the beginning, but stop as soon as you hit a - or _”
redirecting to
https://www.example.com/$1.php
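As a complete mod_rewrite line, the tidy form might look like the sketch below. It assumes, as stated above, that neither - nor _ appears anywhere else in the affected URL-paths; otherwise the capture stops too early and the redirect goes to the wrong place.

```apache
# Assumption: "-" and "_" occur only in the -print / _printer suffix.
# [^_\-]+ captures from the start (slashes included) and stops at the first - or _,
# e.g. aaaa/bbbb/blahblah-print.php -> $1 = "aaaa/bbbb/blahblah"
RewriteRule ^([^_\-]+)[_-]print(er)?\.php$ https://www.example.com/$1.php [L,R=301]
```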


If the characters - and/or _ do (or might) occur elsewhere in the URL-path, the pattern has to be
^(.+)[_\-]print(er)?\.php$
or
^/(.+)[_\-]print(er)?\.php$
which is fractionally less efficient (the .+ runs to the end of the path and then backs up to find the suffix), though not in ways you are ever likely to notice.

As noted above, don't mix mod_alias (Redirect or RedirectMatch) with mod_rewrite (anything beginning in Rewrite) because you can't control the execution order, leading to the possibility of chained redirects. Combining both mods will not make your server explode; it just leads to suboptimal behavior.

In the present case, whether you use RedirectMatch or RewriteRule, the pattern will be identical, except that with mod_alias the pattern has to begin with / (directory slash), while in mod_rewrite in .htaccess it must not begin with a slash. (You did say htaccess, right?) Either way, make sure the pattern starts with ^ (opening anchor) so the whole request is captured.
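Side by side, the two forms of the general pattern differ only in the leading slash. In this untested sketch the mod_alias line is commented out, since the point is to pick one module for the job, not to run both:

```apache
# mod_rewrite in .htaccess: the pattern must not start with "/",
# so the target adds the slash after the hostname.
RewriteRule ^(.+)[_\-]print(er)?\.php$ https://www.example.com/$1.php [L,R=301]

# mod_alias equivalent: the "/" is matched outside the group, so $1 has no
# leading slash and the same target works. Use one form or the other.
# RedirectMatch 301 ^/(.+)[_\-]print(er)?\.php$ https://www.example.com/$1.php
```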

Arcane technical point: inside square brackets (a character class) the - hyphen can denote a range, so it is safer to escape it as \-. This applies only to the [^_\-] construction.

Finally: you don't say how many different URLs are involved. If there are only a handful of them, it is definitely cleaner to name them explicitly in the rule, like
^(onepath|otherpath|thirdpath)[-_] et cetera.
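Spelled out as a full rule, the explicit-list approach might look like this. The names onepath, otherpath and thirdpath are placeholders; each alternative stands for the whole path in front of the suffix (for example aaaa/blahblah or aaaa/bbbb/blahblah), so nothing outside the list is redirected.

```apache
# Hypothetical path list: replace each alternative with a real path prefix,
# e.g. aaaa/blahblah for /aaaa/blahblah-print.php
RewriteRule ^(onepath|otherpath|thirdpath)[-_]print(er)?\.php$ https://www.example.com/$1.php [L,R=301]
```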

Edit: Since each module is an island, it makes no difference whatsoever whether your RewriteRules come before, after, or randomly mixed in with Redirect(Match). Each module processes all of its own directives in one pass.
1:50 am on Sept 17, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11875
votes: 246


When not to use mod_rewrite [httpd.apache.org]:
The use of RewriteRule to perform this task may be appropriate if there are other RewriteRule directives in the same scope. This is because, when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file.


the precise reason has changed since then but jim would have made the same recommendation 10 years ago:
[webmasterworld.com...]
1:59 am on Sept 17, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11875
votes: 246


in your case, assuming you had a hostname canonicalization redirect in place (using mod_rewrite) and https://example.com/aaaa/blahblah-print.php is requested, you will get a chained redirect first to https://www.example.com/aaaa/blahblah-print.php and subsequently to https://www.example.com/aaaa/blahblah.php
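Staying entirely within mod_rewrite avoids that chain, because the order of RewriteRules in the file is then the order of execution. A minimal sketch, assuming a typical host/HTTPS canonicalization rule (the conditions shown are illustrative, not anyone's actual file): the specific redirect goes first and already points at the final https://www. URL, so a request for https://example.com/aaaa/blahblah-print.php should get a single 301.

```apache
RewriteEngine On

# 1. Specific redirect first; its target is already the canonical host and scheme.
RewriteRule ^(.+)[-_]print(er)?\.php$ https://www.example.com/$1.php [L,R=301]

# 2. General hostname/HTTPS canonicalization afterwards.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC,OR]
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
```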
3:27 am on Sept 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


Thank you phranque.

What Jim actually says in that post is:

do not mix directives from different modules

He was referring to some poor code where Redirects were interspersed with Rewrites in a doomed attempt to "enforce order of execution" of the different modules. That is not the case here.

In the same post he advises putting Redirects before Rewrites in the same file - it is just a more practical way of managing the contents of the file (a few redirects followed by a lot of rewrites, kept separate).

And as the Apache documentation you quoted says:

when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file.

So Redirects and Rewrites in the same file is not a problem if you are not attempting to "enforce order of execution".

As for When not to use mod_rewrite [httpd.apache.org]:

mod_alias provides the Redirect and RedirectMatch directives, which provide a means to redirect one URL to another. This kind of simple redirection of one URL, or a class of URLs, to somewhere else, should be accomplished using these directives rather than RewriteRule.

So my advice to the OP still seems good (and has worked for me on different hosts for over ten years).

Unless:

the precise reason has changed since

Everything about .htaccess needs to be precise, as you know.

So I really need to hear about this new reason, if you can spare the time.

...
4:00 am on Sept 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


you will get a chained redirect

I tested your example and got a single 301 in the logs.

The Redirect includes the www and would deal with the canonical issue.

I do have a standard canonical Rewrite as well; if it executed first, there is no sign of it.

...
9:19 am on Sept 17, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11875
votes: 246


So I really need to hear about this new reason, if you can spare the time.

this is now:
when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first

this was then (per jim's description which was accurate at the time):
It is the server configuration which decides whether mod_alias or mod_rewrite directives are processed first, and you cannot control that by ordering the directives in your .htaccess file. In other words, the server will process all mod_alias directives first, followed by all mod_rewrite directives, or vice-versa, depending on how it's configured. So in order to enforce order of execution while using advanced features like "check requested host name" and "check for file exists" as you have done, you'll have to use mod_rewrite only, because mod_alias does not support those features.
9:20 am on Sept 17, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11875
votes: 246


this should actually be addressed in its own thread but...

I tested your example and got a single 301 in the logs.

The Redirect includes the www and would deal with the canonical issue.

I do have a standard canonical Rewrite as well; if it executed first, there is no sign of it.

what version of apache are you using?
2:04 pm on Sept 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


Thank you for your patience phranque.

I have not seen anything to justify this statement:

if you are using mod_rewrite anywhere, you should avoid using Redirect(Match) everywhere

Jim never said so (quite the opposite) and the Apache documentation clearly envisages using both modules in the same file:

when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first

I have had no issues using both mod_rewrite and mod_alias in over ten years on various hosting accounts.

what version of apache are you using?

Originally 1.3, currently 2.4.34 - as it is shared hosting I do not have control over the configuration other than by .htaccess.

...
6:19 pm on Sept 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


Having considered it further (and at the risk of being proved wrong) I believe it works like this:

The request in phranque's example is dealt with first by the mod_rewrite canonical directive:

# Canonical & Encryption
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC,OR]
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://www.example.com/$1 [R=301,L]

But before the 301 is implemented the mod_alias directive is executed:

# Redirect specific files
RedirectMatch 301 (.*)-print\.php$ https://www.example.com$1.php

So the end result is a single Redirect with no chain, as shown in my logs.

This is compliant with the Apache documentation stating that:

mod_alias provides the Redirect and RedirectMatch directives, which provide a means to redirect one URL to another. This kind of simple redirection of one URL, or a class of URLs, to somewhere else, should be accomplished using these directives rather than RewriteRule.

and

when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first

while this:

if you are using mod_rewrite anywhere, you should avoid using Redirect(Match) everywhere

seems to be the online equivalent of an urban myth.

Jim Morgan's advice was to keep directives for the two modules separate (and not try to control execution order).

@ No5needinput

Mr Johnny Five, we need input from you about what worked.

...
6:42 pm on Sept 17, 2018 (gmt 0)

Junior Member from CA 

10+ Year Member Top Contributors Of The Month

joined:Oct 1, 2002
posts: 153
votes: 15


I'm still here and reading with interest! Thanks for the examples and explanations.

This seems to work for the article print pages (one was .html instead of .php in my original post, sorry):

RewriteRule ^(.*)-print\.html$ https://www.example.com/$1.php [L,R=301]
RewriteRule ^(.*)_printer\.php$ https://www.example.com/$1.php [L,R=301]
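If a single line is preferred, the two rules above could be folded together with a non-capturing group so that $1 is still just the path. This is an untested sketch; keeping two separate rules is equally valid and arguably easier to read later.

```apache
# (?:...) groups the two suffixes without creating a second capture,
# e.g. aaaa/blahblah-print.html and aaaa/blahblah_printer.php both give $1 = "aaaa/blahblah"
RewriteRule ^(.+)(?:-print\.html|_printer\.php)$ https://www.example.com/$1.php [L,R=301]
```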

I am just trying to incorporate ANOTHER redirect I forgot about, one that would go to the categories. All those print files were in the "categoryprint" directory and went several directory levels deep.

https://www.example.com/categoryprint/aa/bb/xx-print.html to https://www.example.com/aa/bb/

I think the .html rule above may not work the same for categories. I'll try some things myself and get back to you all if I get lost :-)
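One possible shape for the category redirect, based only on the single example given (so treat it as an untested sketch): it assumes every category print file sits at least one directory below categoryprint/ and that the category pages exist as directory URLs such as /aa/bb/. It has to come before the generic -print.html rule, because that rule would otherwise match first and send /categoryprint/aa/bb/xx-print.html to /categoryprint/aa/bb/xx.php.

```apache
RewriteEngine On

# Category print pages: drop "categoryprint/" and the file name, keep the directories.
# categoryprint/aa/bb/xx-print.html -> $1 = "aa/bb/" -> https://www.example.com/aa/bb/
RewriteRule ^categoryprint/(.+/)[^/]+-print\.html$ https://www.example.com/$1 [L,R=301]

# Article print pages (the rules from the previous post), after the category rule.
RewriteRule ^(.*)-print\.html$ https://www.example.com/$1.php [L,R=301]
RewriteRule ^(.*)_printer\.php$ https://www.example.com/$1.php [L,R=301]
```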