Apache Web Server Forum

    
301 for ridiculous URLs
Redirecting very long and poorly formatted URLs
timstaines

5+ Year Member



 
Msg#: 4573515 posted 1:29 pm on May 13, 2013 (gmt 0)

I'm trying to redirect URLs generated by a terrible CMS to fresh and clean WordPress URLs.

Here's an example of a URL I want to redirect to the home page:
http://www.example.com/index.cfm/fuseaction/site.content/mode/dtl/print
/1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/31355.cfm


I'm thinking there are some RegEx rules I'll need to use, but I haven't had any luck yet.

I have successfully redirected shorter URLs with this directive format:
Redirect 301 /index.cfm/fuseaction/site.content/mode/dtl/type/36260/post/34645.cfm http://www.example.com/locations/place-name/


And here's the basic WP mod_rewrite code I have in place below those successful directives:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress


Any advice?

P.S. If there's any way to wrap that code into multiple lines without adding spaces, please let me know, I'll happily modify it.

[edited by: Ocean10000 at 4:38 pm (utc) on May 13, 2013]
[edit reason] Updated so it will Wrap [/edit]

 

timstaines

5+ Year Member



 
Msg#: 4573515 posted 3:38 pm on May 13, 2013 (gmt 0)

So after a few more hours of testing different options I think I found a variation that will work for me.

I've added this to the bottom of my mod_rewrite block:

RedirectMatch 301 /index.cfm/fuseaction/site.content/(.*) http://www.example.com/page-name/


So it now looks like this:


# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
RedirectMatch 301 /index.cfm/fuseaction/site.content/(.*) http://www.example.com/page-name/
</IfModule>

# END WordPress


It's not ideal, but it looks like it's going to work in bulk.

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4573515 posted 7:10 pm on May 13, 2013 (gmt 0)

RedirectMatch is a mod_alias directive, and it shouldn't be mixed with mod_rewrite directives. The two modules process the request independently, so where a line sits in the file doesn't decide which module acts on the request first.
I would suggest using a RewriteRule directive with the [R=301,L] flags, placed before the WP internal rewrite ruleset.

Also, you probably don't need the IfModule wrapper.
Either you have the module or WP's pretty permalinks won't work.

You can also drop the RewriteBase directive.
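
A minimal sketch of that suggestion, assuming the .htaccess file sits in the site root and reusing the /page-name/ target from the earlier post. The WordPress block is left exactly as posted, since WordPress may regenerate whatever sits between its BEGIN and END markers; the extra RewriteEngine On above it is harmless.

```apache
RewriteEngine On

# External redirect for the old CMS URLs. It sits above the WordPress block
# so it is evaluated before the catch-all rewrite to index.php.
# In .htaccess the pattern is matched against the path without its leading slash.
RewriteRule ^index\.cfm/fuseaction/site\.content/ http://www.example.com/page-name/ [R=301,L]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress
```

Because the pattern starts with ^ and has no trailing $, it matches any path that begins with that prefix, however long the rest of the URL is.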

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4573515 posted 9:02 pm on May 13, 2013 (gmt 0)

Whew. Had to wait for a moderator with scissors before I could even read the post :) (Do some browsers auto-wrap? None of mine do, though the text editor does by default.)

That pattern of
http://www.example.com/index.cfm/fuseaction/site.content/mode/dtl/print
/1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/

looks like an object lesson in what can happen if you leave off the [L] flag and/or don't anchor properly.

What's the source of the crazy URL? If it came from anything other than an error on your own site, you don't have to redirect it. You can always meet it with an unequivocal 404: "You're delusional, there's no such page and you can't possibly have imagined that there IS such a page!"

timstaines

5+ Year Member



 
Msg#: 4573515 posted 11:59 pm on May 13, 2013 (gmt 0)

Thanks phranque, I was messing around with RewriteRule at one point, maybe without the right chunk of URL. IfModule was in there when I pulled it off the server. I'll try to figure out what put it there, possibly theme or plugin related.

lucy24 - The site was on a MedFusion (now Intuit Health) CMS before we replaced it with WP. I had no access to the CMS or Server before we re-pointed the DNS, so who knows where it came from. I only noticed it sitting in the Webmaster Central error report . . . along with >600 other URLs. So evidently someone or something had linked to it at some point.

I'll post more on this tomorrow. There's something else weird going on with which portion of the repeating URL chunk is being interpreted by the directive.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4573515 posted 2:37 am on May 14, 2013 (gmt 0)

"Someone or something"? Could be anything including a badly programmed robot or the googlebot's own fevered imagination. You are not obliged to adopt it, and you are not obliged to "fix" 404 errors belonging to pages that genuinely don't exist. Do you have pages containing the element "fuseaction"? If not, feed visitors a 410 and think no more about it.

RewriteRule fuseaction - [G]

A 410 is not 100% honest if the page never existed, but it will make the googlebot go away sooner. (Doesn't appear to work on bingbot-- but I don't think a 410 makes it more persistent than a 404.)
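
For placement, a rough sketch assuming the rule goes in the same root .htaccess, above the WordPress block. The "existing block" comment is a placeholder for the ruleset posted earlier, not new directives.

```apache
RewriteEngine On

# Answer any request whose path contains "fuseaction" with 410 Gone.
# The pattern is deliberately unanchored so it matches anywhere in the path.
# [G] implies [L], so no later rules run for these requests.
RewriteRule fuseaction - [G]

# BEGIN WordPress
# ... existing WordPress block, unchanged ...
# END WordPress
```

If some fuseaction URLs should still get a 301 to a real page, those more specific rules would need to be listed above this catch-all, otherwise the 410 wins.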

timstaines

5+ Year Member



 
Msg#: 4573515 posted 8:30 pm on May 14, 2013 (gmt 0)

So I've got everything 301ing functionally at the moment, but I'm going to go back and replace the RedirectMatch with RewriteRules as long as that doesn't mess things up.

Here's the weird thing. This rule:
RedirectMatch 301 /index.cfm/fuseaction/site.physicians/(.*) http://www.example.com/page-name-1/

...appears to be redirecting this page:
http://www.example.com/index.cfm/fuseaction/site.locations/action/dtl/loc/ index.cfm/fuseaction/site.physicians/action/dtl/phys/99802154.cfm

...to http://www.example.com/page-name-1/

I was expecting this rule:
RedirectMatch 301 /index.cfm/fuseaction/site.locations/(.*) http://www.example.com/page-name-2/

...to apply to that URL.

Is the RedirectMatch directive just matching whichever segment of the URL fits the pattern? I was expecting it to trigger only when the pattern matched the first segment of the URL.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4573515 posted 9:15 pm on May 14, 2013 (gmt 0)

With a start anchor (^) on the pattern it would match only from the start of the path, rather than matching a chunk in the middle.

Use RewriteRule in place of RedirectMatch.
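
As a sketch of what that might look like for the two rules in question, assuming a root-level .htaccess and the same placeholder targets used above:

```apache
RewriteEngine On

# ^ ties each pattern to the start of the path (no leading slash in .htaccess),
# so a "site.physicians" chunk buried later in the URL no longer matches.
RewriteRule ^index\.cfm/fuseaction/site\.locations/  http://www.example.com/page-name-2/ [R=301,L]
RewriteRule ^index\.cfm/fuseaction/site\.physicians/ http://www.example.com/page-name-1/ [R=301,L]
```

With the unanchored RedirectMatch versions, both patterns fit the doubled-up URL (site.locations at the front, site.physicians in the middle), so presumably whichever rule was listed first won. With the anchors, only the site.locations rule can match it.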

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4573515 posted 9:59 pm on May 14, 2013 (gmt 0)

g1!

:: jumping up and down screaming with excitement ::

Sheesh, thought you'd never come back.

Tim, if you've got a lot of rules to change, find a text editor that does Regular Expressions and feed it your htaccess file with the following:

# change . to \. (repeat until nothing more is found; each pass escapes one dot per line)
FIND
^(Redirect \d\d\d \S+?[^\\])\.
REPLACE WITH
$1\\.
# now change Redirect to Rewrite, adding the start anchor
FIND
^Redirect(?:Match)? 301 /(.+)
REPLACE WITH
RewriteRule ^$1 [R=301,L]
# and
FIND
^Redirect(?:Match)? 410 /(.+)
REPLACE WITH
RewriteRule ^$1 - [G]

These can safely be done as unsupervised global replaces-- but of course you'll make a copy of the htaccess first!
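
To give an idea of the end result, here is roughly what the Redirect line from the first post should look like after conversion. This is a hand-worked illustration, not output from any particular editor; it includes the ^ start anchor that g1smd recommended.

```apache
# Before (mod_alias):
# Redirect 301 /index.cfm/fuseaction/site.content/mode/dtl/type/36260/post/34645.cfm http://www.example.com/locations/place-name/

# After (mod_rewrite): dots in the path escaped, leading slash dropped, pattern anchored
RewriteRule ^index\.cfm/fuseaction/site\.content/mode/dtl/type/36260/post/34645\.cfm http://www.example.com/locations/place-name/ [R=301,L]
```

Only the dots in the pattern are escaped; the target URL is plain text, not a regular expression, so it stays as it was.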

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved