Complicated 301 redirections involving URI substitution

Want to 301 print versions, and selectively convert underscores to hyphens

         

stevej444

2:40 pm on Mar 9, 2007 (gmt 0)

10+ Year Member



Hi,
Wonder if anyone can help. I've got two requirements:

1. Any locations (URIs) that end in _print.htm, e.g. http://www.example.com/my-fat-car_print.htm (print versions), to be 301'd back to http://www.example.com/my-fat-car.htm, i.e. the _print bit is removed. I've tried and tried and tried, but cannot get this to happen. The URIs are in some cases four levels deep.

2. Convert all underscores to hyphens in URIs *except* the underscore in 'special_words' if that appears in the URI, e.g.

http://www.example.com/green_sauce/special_words/my-fat-car_print.htm ends up taking the user to:
http://www.example.com/green-sauce/special_words/my-fat-car.htm - so _print is dropped, underscores are altered apart from in special_words.

Again - converting the underscores could happen at up to four levels deep.

Anyone any suggestions?

Thanks,
Ste

[edited by: jdMorgan at 3:37 pm (utc) on Mar. 9, 2007]
[edit reason] Example.com [/edit]

jdMorgan

3:36 pm on Mar 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Requirement one is simple, while requirement two may be difficult depending on how many "special_words" you have... 25 exceptions might be acceptable performance-wise, but 100 would be pushing it on a busy site.

Please post whatever code you tried, along with how you tested it, the results you got (more than just "didn't work," please) and how those results differed from your expectations. This will serve as a basis for discussion, and save us a lot of guesswork and writing about things you already know, etc.

Also, are you coding for .htaccess or for a server config-level file such as httpd.conf or conf.d?

Jim

stevej444

6:45 pm on Mar 9, 2007 (gmt 0)

10+ Year Member



Hi JD,
Thanks for the reply.

RE: some of your questions:
1. This is all .htaccess
2. RE: print - I couldn't actually work out the syntax to "select" part of the URL, i.e. the part excluding _print, and attach .htm on the end. So I'm glad to hear that that would be simple.
3. RE: converting underscores to hyphens. There is actually just one "special_phrase" in which the underscores should not be converted to hyphens. To work towards that, I have just been trying to get all underscores converted to hyphens, using the following code inspired by the posts at [webmasterworld.com...], i.e.

RewriteRule !\.(html|php)$ - [S=7]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7 [E=underscores:Yes]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6 [E=underscores:Yes]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5 [E=underscores:Yes]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4 [E=underscores:Yes]
RewriteRule ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3 [E=underscores:Yes]
RewriteRule ^([^_]*)_(.*)$ $1-$2 [E=underscores:Yes]
RewriteCond %{ENV:underscores} ^Yes$
RewriteRule ^(.*)$ [mydomain.com...] [R=301,L]

The above however, doesn't fully convert underscores to hyphens. For
a URL such as [mydomain.com...] it outputs www.mydomain.com/fred-book/book.htm/book.htm

For a URL such as www.mydomain.com/fred-book/book_toot.htm you get

"You don't have permission to access /fred-book/book-toot.htm/book-toot.htm/book-toot.htm/book-toot.htm/book-toot.htm/book-toot.htm/book-toot.htm/book-toot.htm/book-toot.htm/book-too etc ....."

So - that's where I'm at. Nowhere on the removal of "_print", some attempts at replacing all underscores with hyphens, and no attempt yet at the conditional aspect, i.e. converting underscores to hyphens everywhere apart from in the one phrase "special_phrase".

Many thanks for any help,
Ste

jdMorgan

10:01 pm on Mar 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The first problem is easily addressed:

> 1. Any locations (URI's) that end in _print.htm e.g. http://www.example.com/my-fat-car_print.htm (print versions) to be 301'd back to http://www.example.com/my-fat-car.htm


RewriteRule ^(.+)_print\.htm$ http://www.example.com/$1.htm [R=301,L]

The fact that the URLs may be several directory levels deep is irrelevant if this code is located above those directories (i.e. in your top-level .htaccess file).

The second problem I'm going to have to answer later, as I'm pressed for time. But this is caused by a bug in Apache mod_rewrite that was supposed to be fixed in Apache 2.x, but evidently, was not. See this thread [webmasterworld.com] in our forum library for more info on the bug, and *one* of the work-arounds. I don't know that I'd suggest the work-around in that thread for a point solution, though, which is why I need to go look up some of my other, possibly-more-efficient solutions for addressing this as a single problem.

Jim

stevej444

11:23 pm on Mar 9, 2007 (gmt 0)

10+ Year Member



Hi JdMorgan,

You the man. Many thanks - wonderful.
I have found a slightly inefficient way to replace the occurrence of underscores with hyphens:

# RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ /$1-$2-$3-$4-$5 [R=301,L]
# RewriteRule ^([^_]*)_([^_]*)_([^_]*)_(.*)$ /$1-$2-$3-$4 [R=301,L]
# RewriteRule ^([^_]*)_([^_]*)_(.*)$ /$1-$2-$3 [R=301,L]
# RewriteRule ^([^_]*)_(.*)$ /$1-$2 [R=301,L]

but NOT found something that can do it conditionally, i.e. replace all occurrences of underscores APART from when the underscore occurs in "special_phrase". Not found that.

If you have any thoughts on that Jd, that would be terrific.
Your help is already terrific.

Thanks, Ste

stevej444

5:52 pm on Mar 11, 2007 (gmt 0)

10+ Year Member



Hi Jim,

Update : using your uber-301-code at [webmasterworld.com...] I've pretty much managed to do all that I need - all ends in one 301.

When it's been put live and all signed off I'll come back here and post it up.

That post is tremendous by the way - extremely helpful.

Many thanks,
Ste

jdMorgan

7:14 pm on Mar 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem of excluding your "special" URL is only efficiently-solvable if the special keyword sequence occurs in a fixed position in the URL. If it does, you could for example, just use a new rule before the rule that would otherwise redirect it:

RewriteRule ^([^_]*)_([^_]*)_special_keyword$ /$1-$2-special_keyword [R=301,L]
# pre-existing rule
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_(.*)$ /$1-$2-$3-$4 [R=301,L]

and then exclude it from the rule that would otherwise redirect it again once the other underscores were replaced by the new rule above:

RewriteCond $1_$2 !^special_keyword$
RewriteRule ^([^_]*)_(.*)$ /$1-$2 [R=301,L]

So the whole mess would be:


RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ /$1-$2-$3-$4-$5 [R=301,L]
RewriteRule ^([^_]*)_([^_]*)_special_keyword$ /$1-$2-special_keyword [R=301,L]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_(.*)$ /$1-$2-$3-$4 [R=301,L]
RewriteRule ^([^_]*)_([^_]*)_(.*)$ /$1-$2-$3 [R=301,L]
RewriteCond $1_$2 !^special_keyword$
RewriteRule ^([^_]*)_(.*)$ /$1-$2 [R=301,L]

If the special keyword sequence can occur in any position, then you basically have to double the number of rules, and handle the "does have special sequence" and "doesn't have special sequence" cases separately and exclusively.
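A minimal sketch of that any-position case, assuming a single literal sequence "special_keyword" and accepting one redirect per underscore (so the 301s stack). It replaces one underscore at a time rather than using the multi-underscore patterns above, purely to keep the "has the sequence" and "doesn't have the sequence" cases easy to see; treat it as an untested illustration, not a drop-in solution:

```apache
RewriteEngine On

# Case 1: path contains special_keyword and has an underscore before it.
# Replace the first such underscore; the next request handles the next one.
RewriteRule ^([^_]*)_(.*special_keyword.*)$ /$1-$2 [R=301,L]

# Case 2: path contains special_keyword and has an underscore after it.
RewriteRule ^(.*special_keyword[^_]*)_(.*)$ /$1-$2 [R=301,L]

# Case 3: path does not contain special_keyword at all.
# $1_$2 reassembles the full matched path for the exclusion test.
RewriteCond $1_$2 !special_keyword
RewriteRule ^([^_]*)_(.*)$ /$1-$2 [R=301,L]
```

Once only the underscore inside special_keyword remains, none of the three rules matches, so the redirect chain stops there.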

Jim

stevej444

12:02 pm on Mar 12, 2007 (gmt 0)

10+ Year Member



Hi Jim,

The "special-phrase" does occur in one place, and there's only one instance of it, so it's not too hard.

The main thing - is that your framework - which you described at [webmasterworld.com...] - allowed me to do all that I needed in one 301.

So the example you give above would do the trick, but as you state result in stacked 301's (a whole sequence of 301's). However your (totally amazing) script at [webmasterworld.com...] enabled me to do it all in one 301.

Having said all this, it isn't live yet - so - if it does go live and get sign-off, I'll come back and post it here for all to see.

Jim - thanks again for all your excellent and persistent help.
Ste

jdMorgan

4:22 pm on Mar 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The main point of that "framework" was simply to demonstrate a couple of things: First, that the bug in Apache mod_rewrite [archive.apache.org] does have a work-around (albeit difficult and not very efficient) and second, that using that work-around allows many URL-corrections with a single redirect. This latter is important, because search engines appear to reliably pass ranking credit through only a single redirect -- at least over reasonably short time periods.

Be sure to remove or comment-out anything in there that you don't actually need, as otherwise you may see performance problems on a busy site.
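As a rough illustration of the single-redirect idea (this is not the code from the linked post, which is not reproduced here), the corrections can be accumulated in an environment variable and redirected once at the end. The variable names myURI and qRed and the hostname www.example.com are placeholders, and the special-phrase exception is deliberately left out to keep the sketch short:

```apache
RewriteEngine On

# Start with the requested URL-path in a working variable
RewriteRule ^ - [E=myURI:%{REQUEST_URI}]

# Correction 1: drop a trailing _print from .htm URLs
RewriteCond %{ENV:myURI} ^(.+)_print\.htm$
RewriteRule ^ - [E=myURI:%1.htm,E=qRed:yes]

# Correction 2: replace one underscore per rule pair;
# repeat the pair once for each underscore you expect to see
RewriteCond %{ENV:myURI} ^([^_]*)_(.*)$
RewriteRule ^ - [E=myURI:%1-%2,E=qRed:yes]
RewriteCond %{ENV:myURI} ^([^_]*)_(.*)$
RewriteRule ^ - [E=myURI:%1-%2,E=qRed:yes]

# Issue a single 301 only if at least one correction was made
RewriteCond %{ENV:qRed} ^yes$
RewriteRule ^ http://www.example.com%{ENV:myURI} [R=301,L]
```

Because no rule redirects until the final one, every correction ends up in the same 301, at the cost of evaluating each rule pair on every request.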

Jim

[edited by: jdMorgan at 4:23 pm (utc) on Mar. 12, 2007]

stevej444

9:09 pm on Mar 12, 2007 (gmt 0)

10+ Year Member



Hi Jim,

I got both those points from the framework - i.e. the avoid-recursion workaround, and the power of the all-in-one 301. It's great. And I've also cut out bits that aren't relevant or appropriate, e.g. the case conversion.

I've also cut out the following as again not relevant, however, I've got another requirement coming soon that may well need this. But - I don't understand the syntax.

------------------ SNIPPET FROM [webmasterworld.com...] STARTS ------------------------

RewriteCond %{ENV:myURI}<>/locales.html ^/location\.html<>(.+)$ [NC,OR]
RewriteCond %{ENV:myURI}<>/about/widgets-intl.html ^/about/local-widgets\.html<>(.+)$ [NC,OR]
RewriteCond %{ENV:myURI}<>/selector/widget-selector.html ^/selector/widgets[^.]+\.xls<>(.+)$ [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1]
#
# Redirect all pages in old directories to same-named pages in new directories
RewriteCond /new_dir1<>%{ENV:myURI} ^([^<]+)<>/old_dir1(.+)$ [NC,OR]
RewriteCond /new_dir2<>%{ENV:myURI} ^([^<]+)<>/old_dir2(.+)$ [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1%2]
#
# Redirect old filetype to new filetype
RewriteCond %{ENV:myURI}<>.jpg ^(/[^.]+)\.jpeg<>(.+)$ [NC,OR]
RewriteCond %{ENV:myURI}<>.php5 ^(/[^.]+)\.php4<>(.+)$ [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1%2]

------------------ SNIPPET FROM [webmasterworld.com...] ENDS ------------------------

I don't understand the syntax of the first part of the RewriteCond - e.g. /new_dir1<>%{ENV:myURI} . What does that mean? Likewise for "%{ENV:myURI}<>/selector/widget-selector.html "

What's the meaning of the <> - surely not unequals?

I appreciate this is all to redirect old pages from within the "framework" so that other changes can be applied as well as these in one 301.

Any chance you could expand on the syntax above?
Thanks, Ste

jdMorgan

9:26 pm on Mar 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please read the notes in that post. The "<>" are utterly meaningless, except to demarcate the end of the first variable-value from the beginning of the second, so that the regular-expression patterns on the right can unambiguously identify the boundary and assign the values to the correct back-reference variables. You could use any character(s) that has/have no special meaning to regex, for example "~", as long as they are unique with respect to values that might actually be found in the variables.

[added] Other than that, the syntax is straightforward; I'm just combining multiple-variable checks in one RewriteCond. As a simple example, you might "glue" the hostname, the URL-path, and the query string together to make a canonical URL by doing this:


RewriteCond %{HTTP_HOST}%{REQUEST_URI}?%{QUERY_STRING}&new_var=foo ^(.*)$
RewriteRule bar http://%1 [R=301,L]

which is a byzantine way of adding "&new_var=foo" to any URL which contains "bar" and redirecting, but demonstrates the basic concept. In this example, the three variables are all concatenated, and there is no way (and no need) to delimit them for further separate processing. [/added]
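To make the delimiter idea concrete, here is a small sketch using "~" in place of "<>". The directory names /old_dir1 and /new_dir1 and the hostname are placeholders; the point is only how the pattern on the right splits the glued test string back into two back-references:

```apache
RewriteEngine On

# For a request to /old_dir1/page.html the test string becomes
#   /new_dir1~/old_dir1/page.html
# %1 captures everything before the "~"   -> /new_dir1
# %2 captures what follows /old_dir1      -> /page.html
RewriteCond /new_dir1~%{REQUEST_URI} ^([^~]+)~/old_dir1(.+)$ [NC]
RewriteRule ^ http://www.example.com%1%2 [R=301,L]
```

The [^~]+ in the pattern is what relies on the delimiter being unique: if "~" could appear in the replacement directory name, the boundary would no longer be unambiguous.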

Jim

[edited by: jdMorgan at 9:34 pm (utc) on Mar. 12, 2007]

stevej444

9:23 am on Mar 13, 2007 (gmt 0)

10+ Year Member



Jim,

Got it, and tested it. Like it, very elegant and crafty.
Thanks again,

Ste