Forum Moderators: phranque

htaccess redirects Coppermine single urls

Migrate old Coppermine urls to new store on same domain

         

aeronautic

10:07 pm on Feb 7, 2015 (gmt 0)

10+ Year Member



Hi - I've read the source Apache docs, the tutorials here and old threads and I guess I'm just too dumb to get it. (That's a cut and paste from the first line of a thread I started in .htaccess in 2006 but it is very true this time too!) :/

I paid especially close attention to this old thread but to no avail:
[webmasterworld.com]

Situation: Migrating Coppermine images and videos, one at a time (groan), from Coppermine to a different ecommerce tool on the same domain. Only taking on the permanent Coppermine url, not all the variations of album locations, etc.

Goal: As I make progress, add a single, manual 301 redirect to avoid both a duplicate content penalty and, since I can't get this to work so far, a 404 if I simply delete the old items as I go.

Problem: Everything I've tried either fails to redirect at all or causes a server error.

This server is running Wordpress too so WP's boilerplate code is in the .htaccess file along with other historical redirects, that work, for non-Coppermine (static old html) files.

Perhaps adding to the challenge (for my limited brain) is that there are "hyphens" or dashes in the path both on the old uri and new, and I'm not sure if I'm supposed to escape those with a \ or not, and the issue around the .php extension (see below) and the ? that follows, along with the trailing code.

Without further ado...

Old uri:

http://example.com/my-path/displayimage.php?pos=-12345


Desired new uri:

http://example.com/path/morepath/seo-friendly-real-world-thing/


I've tried this:

RedirectMatch permanent ^/my-path/displayimage\.php?pos=-12345$ http://example.com/path/morepath/seo-friendly-real-world-thing/


But it does nothing.

And this with the ? escaped:

RedirectMatch permanent ^/my-path/displayimage\.php\?pos=-12345$ http://example.com/path/morepath/seo-friendly-real-world-thing/


It also does nothing.

Also tried this more complicated approach (and hope I don't have to use something like it on zillions of files):

RewriteCond %{REQUEST_URI} !^/my\-path/displayimage\.php\?pos=-12345$ 
RewriteRule ^/my\-path/displayimage\.php\?pos=-12345$ http://example.com/path/morepath/seo-friendly-real-world-thing/$1 [R=301,L]


Any patient and detailed help would be greatly appreciated.

Thank you!

lucy24

5:53 am on Feb 8, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Any patient and detailed help would be greatly appreciated.

"patient and detailed" sounds so much nicer than "Is she ever going to stop blathering and get to the point?" doesn't it :)

Now then...

add a single, manual 301 redirect to avoid both a duplicate content penalty and, since I can't get this to work so far, a 404 if I simply delete the old items as I go.

If you're talking about things you're doing on purpose, then return a 410. It won't affect indexing, but will make the googlebot stop requesting the URL faster. However, any URL that receives a 301 doesn't need a 404 or 410, because you've already set up a response. Save the 410 for URLs that are genuinely gone, with no new address.
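For an old Coppermine URL that has no new home, mod_rewrite can send the 410 itself with the [G] flag. A minimal sketch, assuming a made-up retired item pos=-99999 (not a value from this thread) and assuming the lines sit above the WordPress block:

```apache
# Hypothetical retired item: pos=-99999 is an invented value for illustration.
# (^|&) and (&|$) keep the condition from also matching pos=-999990 and the like.
RewriteCond %{QUERY_STRING} (^|&)pos=-99999(&|$)
# "-" means "no substitution"; [G] answers 410 Gone and stops processing.
RewriteRule ^my-path/displayimage\.php$ - [G]
```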

This server is running Wordpress too so WP's boilerplate code is in the .htaccess file along with other historical redirects, that work, for non-Coppermine (static old html) files.

Make sure that everything you add goes before the WordPress section of htaccess-- which should be left completely untouched, including the silly parts. (Notably "<IfModule>" when WP simply won't function without mod_rewrite.)

there are "hyphens" or dashes in the path both on the old uri and new, and I'm not sure if I'm supposed to escape those with a \ or not

Hyphens never need to be escaped, except inside grouping brackets, where they have syntactic meaning. An escaped hyphen isn't actively harmful; it's just needless clutter.
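To illustrate both cases, here is a sketch using an invented pattern for old static pages (the .html names and the target are assumptions, not URLs from this thread):

```apache
# Outside brackets the hyphen in "my-path" is literal and needs no backslash.
# Inside [a-z0-9-] the final hyphen is also literal because it comes last;
# anywhere else in the brackets it would define a range, as in a-z.
RewriteRule ^my-path/([a-z0-9-]+)\.html$ http://example.com/path/$1/ [R=301,L]
```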

and the issue around the .php extension (see below) and the ? that follows, along with the trailing code.

I see a
RewriteCond %{QUERY_STRING} blahblah

on the horizon. Let's see if I'm right.

Old uri:
http://example.com/my-path/displayimage.php?pos=-12345

Desired new uri:
http://example.com/path/morepath/seo-friendly-real-world-thing/

I've tried this:

RedirectMatch permanent ^/my-path/displayimage\.php?pos=-12345$ http://example.com/path/morepath/seo-friendly-real-world-thing/

But it does nothing.

That's just as well. Do not, repeat not, use mod_alias (Redirect by that name) for any redirect here. It won't make the server explode; it simply doesn't play nicely with mod_rewrite. Besides, it can't deal with query strings: Redirect and RedirectMatch only ever look at the path, so a pattern containing "?pos=" can never match.

Also tried this more complicated approach (and hope I don't have to use something like it on zillions of files):

RewriteCond %{REQUEST_URI} !^/my\-path/displayimage\.php\?pos=-12345$
RewriteRule ^/my\-path/displayimage\.php\?pos=-12345$ http://example.com/path/morepath/seo-friendly-real-world-thing/$1 [R=301,L]

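Well, now we're getting somewhere-- except that %{REQUEST_URI} only applies to the "path" part of a request, and so does the pattern in the RewriteRule itself. Anything after the ? has to be tested with %{QUERY_STRING}. (The ! in front of your condition also reverses it, so the rule would only run on requests that don't match.)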

### and ###. I just explained this a few days ago, haven't saved it in my personal boilerplate, and now I can't find the link. (phranque? You out there? Post where I explained that the URL has four pieces-- protocol, host, path, query-- and what the RewriteCond is for each one. Can't find the blasted thing.) I did rather like this recent thread [webmasterworld.com], because the OP is really heroically trying to learn and understand instead of screaming "I give up! Can't you just write my htaccess for me?"


To get from
http://example.com/my-path/displayimage.php?pos=-12345
to
http://example.com/path/morepath/seo-friendly-real-world-thing/
you'd have to say

RewriteCond %{QUERY_STRING} pos=-12345
RewriteRule ^my-path/displayimage\.php http://example.com/path/morepath/seo-friendly-real-world-thing/? [R=301,L]


Note especially that when rules are in an htaccess file (or in a <Directory> section in the config file) the URL doesn't start with a / slash.
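A slightly stricter variant of the same rule, as a sketch only: the anchoring is an optional tightening, not something the rule above requires, and it assumes that "RewriteEngine On" is already in effect in this htaccess file.

```apache
# (^|&) and (&|$) make pos=-12345 match as a whole parameter,
# so pos=-123456 or xpos=-12345 are not redirected by accident.
RewriteCond %{QUERY_STRING} (^|&)pos=-12345(&|$)
# The closing $ keeps the pattern from matching displayimage.php5 and similar.
# The trailing ? on the target discards the old query string.
RewriteRule ^my-path/displayimage\.php$ http://example.com/path/morepath/seo-friendly-real-world-thing/? [R=301,L]
```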

... but just how many URLs are involved? Is each of those "pos=blahblah" values going to turn into a completely different URL? Are there 99,999 of them, or possibly 2x99,999 if they might also have positive values? If so, you are probably better off creating a little php script that does all the lookups and issues the appropriate redirect. It would look something like

RewriteCond %{QUERY_STRING} \bpos=([^&]*)
RewriteRule ^my-path/displayimage\.php /fixup.php?oldpage=%1 [L]


with this line placed before the WP stuff.


Finally, I notice that your new URLs end in / slash. Those aren't real, physical directories, each with their own index page, are they? In addition to the ordinary URL rewriting-- which is slightly trickier when you've got the final / instead of going extensionless-- you will also need RewriteRules to cover requests for

http://example.com/path/morepath/seo-friendly-real-world-thing (without final slash)
and
http://example.com/path/morepath/seo-friendly-real-world-thing/index.html

You need these rules because search engines will request these two URLs. (I know this from direct personal observation. It is independent of whether the two other forms have ever existed or been used.) And it's more efficient to redirect them yourself instead of relying on the CMS to do it.
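For the one example URL in this thread, those two extra redirects might look like the sketch below. It assumes the rules go above the WordPress block and that WordPress is not already configured to redirect these forms itself.

```apache
# Slashless form -> canonical form with the trailing slash
RewriteRule ^path/morepath/seo-friendly-real-world-thing$ http://example.com/path/morepath/seo-friendly-real-world-thing/ [R=301,L]
# Explicit index.html -> the same canonical form
RewriteRule ^path/morepath/seo-friendly-real-world-thing/index\.html$ http://example.com/path/morepath/seo-friendly-real-world-thing/ [R=301,L]
```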

[edited by: bill at 1:49 am (utc) on Feb 10, 2015]
[edit reason] added ? [R=301,L] to RewriteRule [/edit]

aeronautic

10:16 pm on Feb 8, 2015 (gmt 0)

10+ Year Member



Lucy: A million thanks! You are a life saver.

I gave your code a try:

RewriteCond %{QUERY_STRING} pos=-12345
RewriteRule ^my-path/displayimage\.php http://example.com/path/morepath/seo-friendly-real-world-thing/


but... it appended the "pos=-12345" to the end of the new url. Maybe I was unclear but that was undesired. So I dug around a bit more and found that if I changed it to:

RewriteCond %{QUERY_STRING} pos=-12345
RewriteRule ^my-path/displayimage\.php http://example.com/path/morepath/seo-friendly-real-world-thing/? [R=301,L]


it removed the unwanted string.

Note the addition of the trailing "?" and [R=301,L] in my change for those needing a similar solution.

Did I do anything dumb here Lucy? It seems to work perfectly.

You are a most excellent teacher and your answer was very clear and helpful. I was trying to teach myself and learn how this all works as it is not my area of professional concentration. I hope my post was not a shrill "help me" yell. :)

As to your other questions/points...

I am migrating over a thousand images from Coppermine to a Wordpress based ecommerce solution and while I've explored importing the database, for various technical and practical reasons that does not seem to make sense. As a result, I'm moving them item by item and want to add the redirects for the image file, permanent item ID (the one in the sample we've been working on) and albums as each move is complete. Therefore a global solution really is not in the cards until everything is moved, and perhaps not even then. The new urls are very different than the old ones.

You did mention, at the end of your very kind post, the issue with the trailing / of the new urls. I did not see a suggested solution but perhaps I misunderstood you. They are, as you guessed, virtual directories in the Wordpress SEO friendly url sense of the term. They are not actual directories with an index.anything in them.

I'd welcome any suggestions you have in that regard or any other.

Again, my sincere thanks!

lucy24

12:42 am on Feb 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



but... it appended the "pos=-12345" to the end of the new url.

Whoops, that was careless of me. Yes, if you put a ? at the end of the target, it wipes any existing query string. That's what you want to do. And, er, I seem to have simply forgotten the [R=301,L] flag. Can't think why; I normally type that part on autopilot.
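On Apache 2.4 or later the [QSD] ("query string discard") flag does the same job as the trailing ?. The thread doesn't say which Apache version this server runs, so treat this as a sketch under that assumption:

```apache
# Assumption: Apache 2.4+. Apache 2.2 does not recognise QSD,
# so on an older server keep the trailing "?" on the target instead.
RewriteCond %{QUERY_STRING} pos=-12345
RewriteRule ^my-path/displayimage\.php http://example.com/path/morepath/seo-friendly-real-world-thing/ [R=301,L,QSD]
```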

There exist WP options for dealing with URLs with/without trailing slash. If you don't have the energy to add a few more lines to htaccess, make sure the appropriate clickboxes are clicked. Checkboxes are checked. Whatever. There's a WordPress forum next door, but the people who know WP also check Apache periodically, so you may get an answer right here.

As a result, I'm moving them item by item

Well, that sounds restful :) By the usual yawn-provoking coincidence, it is not 24 hours since someone posted a similar question involving a redirect of one-and-only-one value of one specific parameter. So you are not alone.

aeronautic

2:00 am on Feb 9, 2015 (gmt 0)

10+ Year Member



Well I thought all my troubles seemed so far away... ;)

I used and adjusted the code I posted, which built on your code, to redirect the album of the first batch of moved images... which worked.

This:
RewriteCond %{QUERY_STRING} pos=-1234


Became this:
RewriteCond %{QUERY_STRING} album=123


I also changed the path in the rule slightly to reflect a change from displayimage.php to thumbnails.php

All else the same and as I mentioned, seems to be working.

I also tried both a "%{QUERY_STRING}" and "%{REQUEST_FILENAME}" approach to the actual jpg files. I took care to be sure the path was changed and tried it with and without the trailing "?"

Here is what is really strange. It is working for one file but not for others. If my memory is correct, when I tried it for the very first file after getting its PID ("pos=-XXXX") version taken care of, it worked. So I did the next PID, jpg, all good. Then finally, the album. Again, all five seemed to be working. Then I moved on to a new image in a different album. PID and jpg, check. Testing originals, PIDS worked, album worked, but jpgs no longer worked. Fail was to redirect to home page a la 404.

So I removed the new one and tried again. Nope. JPGs no longer worked. Tried switching JPGS to a QUERY_STRING. Still no joy.

I put them all back in just now and same situation. PIDS and album working but only one jpg working.

And it is working with this code:

RewriteCond %{REQUEST_FILENAME} photo_name_012345.JPG
RewriteRule ^my-path/somemore-path/yetmore-path/seo-album-name-yet-more-path/ http://example.com/path/path/path/path/path/file-name-that-seems-too-long-to-even-me_012345.jpg [R=301,L]


Everything is still "above" all the Wordpress code which remains untouched.

Also tried this variant without the trailing / on the path:

RewriteCond %{REQUEST_FILENAME} photo_name_012345.JPG
RewriteRule ^my-path/somemore-path/yetmore-path/seo-album-name-yet-more-path http://example.com/path/path/path/path/path/file-name-that-seems-too-long-to-even-me_012345.jpg [R=301,L]


One other difference: The extension JPG is all capped as shown on the one that is working and it is all lower "jpg" on the two that are not. Coincidence?

just in case...

Ugh.

lucy24

8:02 am on Feb 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't understand what the %{REQUEST_FILENAME} part is for. Aren't these redirects supposed to grab requests before the CMS gets a chance to tamper with things? You certainly don't want to insert an external redirect after things have started happening behind the scenes.

RewriteCond %{REQUEST_FILENAME} photo_name_012345.JPG
RewriteRule ^my-path/somemore-path/yetmore-path/seo-album-name-yet-more-path/ http://example.com/path/path/path/path/path/file-name-that-seems-too-long-to-even-me_012345.jpg [R=301,L]

But, but, but-- if the actual name of the file is photo_name_012345.JPG then why isn't that included in the body of the rule?

RewriteCond %{REQUEST_FILENAME} photo_name_012345.JPG
RewriteRule ^my-path/somemore-path/yetmore-path/seo-album-name-yet-more-path http://example.com/path/path/path/path/path/file-name-that-seems-too-long-to-even-me_012345.jpg [R=301,L]

The final slash doesn't make any difference, since there's no closing anchor. But, again, where's the filename? Do you have a bunch of jpgs, all with the identical name, located in assorted sub-sub-subdirectories deeper down in the original path?

One other difference: The extension JPG is all capped as shown on the one that is working and it is all lower "jpg" on the two that are not. Coincidence?

By default, Apache is case sensitive. There exists an [NC] flag for both Rules and Conditions, but you should think of it as the absolute last resort. Better to learn the correct casing of the URL in question, and use it in the rule.
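If the same file really was linked with both casings, one middle road is to list the casings explicitly. A sketch, assuming the file sits directly inside the album directory quoted above (the thread doesn't confirm the full path):

```apache
# (jpg|JPG) accepts only the two casings actually in use,
# where [NC] would also accept JpG, jPg and so on.
# The dots are escaped so they match a literal "." only.
RewriteRule ^my-path/somemore-path/yetmore-path/seo-album-name-yet-more-path/photo_name_012345\.(jpg|JPG)$ http://example.com/path/path/path/path/path/file-name-that-seems-too-long-to-even-me_012345.jpg [R=301,L]
```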

aeronautic

8:03 am on Feb 9, 2015 (gmt 0)

10+ Year Member



I tried to edit my last post but the time had elapsed...

Hi Lucy - thanks for coming back around to this... I posted within a minute of your reply and never saw it so I'm editing this one to add answers.

I don't understand what the %{REQUEST_FILENAME} part is for.


That makes two of us. ;)

Seriously, I thought using that request followed by a specific file name would result in redirecting that file. I was wrong it seems.

Aren't these redirects supposed to grab requests before the CMS gets a chance to tamper with things?


Yes, for sure!

Do you have a bunch of jpgs, all with the identical name, located in assorted sub-sub-subdirectories deeper down in the original path?


Many jpgs but all with unique names but in many sub-directories deep down. I'm kinda careful about organizing things into groups and sub-groups of subjects, and file names are definitely unique and descriptive.

I'm aware of the [NC] flag which is why I asked. These images were uploaded over many years and the case of the .jpg part varied over time so it is not perfectly consistent I'm afraid.

But, again, where's the filename?


I believe I answer this with the code below that is working.

Original post below:

Okay, this seems to be working for the images:

RewriteCond %{HTTP_HOST} ^.*$
RewriteRule ^my-path/my-path/my-path/final-path/picture_name_012345.JPG$ http://example.com/path/my-path/my-path/my-path/final-path/picture_name_012345.jpg [R=301,L]


Here is where my ignorance will be painfully clear. And sadly, I just searched for this string "RewriteCond %{HTTP_HOST} ^.*$" to see if I could understand it better but could not find results that made any sense to me.

My belief is that condition is allowing all requests to the www or non-www host to redirect that file located on that path to the new url.

Am I close? Doing something stupid or dangerous?

Thanks! I am truly grateful for your help.

[edited by: aeronautic at 8:16 am (utc) on Feb 9, 2015]

lucy24

8:13 am on Feb 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_HOST} ^.*$

That makes no sense. It's just saying "there may or may not be a host" -- and surely the server already knows that much!

Have you got more than one site living on the same server, with all requests passing through the same htaccess file? If so, you'd need a condition something like
RewriteCond %{HTTP_HOST} example\.com

(no anchors)
to exclude requests for unrelated sites like example.org and example.net. BUT if the specified filenames only occur on one host-- which I would hope is the case when your paths are several miles long-- then you don't need the condition anyway.
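Put together with the image rule, a host-restricted version would look roughly like this. It is a sketch for the multi-site case only; with a single site the RewriteCond line can simply be dropped.

```apache
# Only useful when several hostnames share this htaccess file.
# Unanchored, so it matches both example.com and www.example.com.
RewriteCond %{HTTP_HOST} example\.com
RewriteRule ^my-path/my-path/my-path/final-path/picture_name_012345\.JPG$ http://example.com/path/my-path/my-path/my-path/final-path/picture_name_012345.jpg [R=301,L]
```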

aeronautic

8:18 am on Feb 9, 2015 (gmt 0)

10+ Year Member



Have you got more than one site living on the same server, with all requests passing through the same htaccess file?


I do feel dumb.

Just one domain/site on the server.

aeronautic

6:20 pm on Feb 9, 2015 (gmt 0)

10+ Year Member



If I understand both your comment/question and yet more reading of the docs and web on this subject, this condition:

RewriteCond %{HTTP_HOST} ^.*$


is completely unnecessary. Is that correct?

I just tested the redirect of one of the jpgs - one that was not working as reported earlier (before this fix seemed to resolve that issue), and without that condition above, it does indeed seem to still work as a rule without condition.

Essentially, just this:

RewriteRule ^my-path/my-path/my-path/final-path/picture_name_012345.JPG$ http://example.com/path/my-path/my-path/my-path/final-path/picture_name_012345.jpg [R=301,L]
on its own without a condition above.

I thought there always had to be a condition to set a "if this exists" type of conditional pretext, for the rule's "then do this" stage.

lucy24

10:51 pm on Feb 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



completely unnecessary. Is that correct?

Yup. There exist situations when you do need to look at the hostname. But the exact wording of the line with ^.*$ could never in any way be necessary, since it doesn't say anything.

I thought there always had to be a condition

Not at all. In fact mod_rewrite works on a "two steps forward, one step back" system: first it looks at the rule and sees if it could potentially apply-- for example, if the rule specifies some exact filename. Then and only then does it evaluate conditions, if any, to make sure they also apply. Conditionless rules are the fastest and easiest.

A conditionless RewriteRule is often interchangeable with a mod_alias redirect (Redirect or RedirectMatch by those names). But the two mods tend not to play nice together, so you should use only one or the other. For redirects, that is. mod_alias also does other stuff which doesn't create conflicts.
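To show what "interchangeable" means here, the conditionless image rule is sketched next to its rough mod_alias counterpart. The second form is left commented out, in line with the advice not to mix the two modules.

```apache
# mod_rewrite form: in htaccess the pattern has no leading slash.
RewriteRule ^my-path/my-path/my-path/final-path/picture_name_012345\.JPG$ http://example.com/path/my-path/my-path/my-path/final-path/picture_name_012345.jpg [R=301,L]

# Roughly equivalent mod_alias form; note the leading slash in the pattern.
# RedirectMatch 301 ^/my-path/my-path/my-path/final-path/picture_name_012345\.JPG$ http://example.com/path/my-path/my-path/my-path/final-path/picture_name_012345.jpg
```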

aeronautic

11:14 pm on Feb 9, 2015 (gmt 0)

10+ Year Member



Aha! Enlightenment! Lucy, thank you so much for your patient help.

I think I've got my problem solved and I've certainly learned a lot.

Again, huge thanks!

lucy24

12:42 am on Feb 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For future reference: I've asked an administrator to fix the typo in my first reply, so there will be no risk of future copy-and-paste artists being led astray.