Forum Moderators: Ocean10000 & incrediBILL & phranque

301 for ridiculous URLs

Redirecting very long and poorly formatted URLs

   
1:29 pm on May 13, 2013 (gmt 0)

5+ Year Member



I'm trying to redirect URLs generated by a terrible CMS to fresh and clean WordPress URLs.

Here's an example of a URL I want to redirect to the home page:
http://www.example.com/index.cfm/fuseaction/site.content/mode/dtl/print
/1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/31355.cfm


I'm thinking there are some RegEx rules I'll need to use, but I haven't had any luck yet.

I have successfully redirected shorter URLs with this directive format:
Redirect 301 /index.cfm/fuseaction/site.content/mode/dtl/type/36260/post/34645.cfm http://www.example.com/locations/place-name/


And here's the basic WP mod_rewrite code I have in place below those successful directives:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress


Any advice?

P.S. If there's any way to wrap that code into multiple lines without adding spaces, please let me know, I'll happily modify it.

[edited by: Ocean10000 at 4:38 pm (utc) on May 13, 2013]
[edit reason] Updated so it will Wrap [/edit]

3:38 pm on May 13, 2013 (gmt 0)

5+ Year Member



So after a few more hours of testing different options I think I found a variation that will work for me.

I've added this to the bottom of my mod_rewrite block:

RedirectMatch 301 /index.cfm/fuseaction/site.content/(.*) http://www.example.com/page-name/


So it now looks like this:


# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
RedirectMatch 301 /index.cfm/fuseaction/site.content/(.*) http://www.example.com/page-name/
</IfModule>

# END WordPress


It's not ideal, but it looks like it's going to work in bulk.
7:10 pm on May 13, 2013 (gmt 0)

phranque (WebmasterWorld Administrator, 10+ Year Member)



RedirectMatch is a mod_alias directive, which shouldn't be mixed with mod_rewrite directives.
I would suggest using a RewriteRule directive with the [R=301,L] flags and putting it before the WP internal rewrite ruleset.

Also, you probably don't need the IfModule.
You either have the module or WP won't work.

You can also drop the RewriteBase directive.
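
As a rough sketch of that layout (not a tested rule for this site): the redirect pattern below is an assumption built from the RedirectMatch line earlier in the thread, with the same example.com hostname and /page-name/ target, and the WordPress lines are shown without IfModule or RewriteBase as suggested above.

```apache
RewriteEngine On

# External redirect first. In .htaccess the pattern is matched against
# the requested path without its leading slash; ^ ties it to the start
# of that path and the dots are escaped so they match literal dots.
RewriteRule ^index\.cfm/fuseaction/site\.content/ http://www.example.com/page-name/ [R=301,L]

# WordPress internal rewrite: hand everything that is not an existing
# file or directory to index.php.
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

With the redirect listed first and flagged [L], a matching request is answered with the 301 before the WordPress rules get a chance to rewrite it to index.php.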
9:02 pm on May 13, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



Whew. Had to wait for a moderator with scissors before I could even read the post :) (Do some browsers auto-wrap? None of mine do, though the text editor does by default.)

That pattern of
http://www.example.com/index.cfm/fuseaction/site.content/mode/dtl/print
/1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/
1/post/index.cfm/fuseaction/site.content/mode/dtl/print/

looks like an object lesson in what can happen if you leave off the [L] flag and/or don't anchor properly.

What's the source of the crazy URL? If it came from anything other than an error on your own site, you don't have to redirect it. You can always meet it with an unequivocal 404: "You're delusional, there's no such page and you can't possibly have imagined that there IS such a page!"
11:59 pm on May 13, 2013 (gmt 0)

5+ Year Member



Thanks phranque, I was messing around with RewriteRule at one point, maybe without the right chunk of URL. IfModule was in there when I pulled it off the server. I'll try to figure out what put it there, possibly theme or plugin related.

lucy24 - The site was on a MedFusion (now Intuit Health) CMS before we replaced it with WP. I had no access to the CMS or Server before we re-pointed the DNS, so who knows where it came from. I only noticed it sitting in the Webmaster Central error report . . . along with >600 other URLs. So evidently someone or something had linked to it at some point.

I'll post more on this tomorrow. There's something else weird going on with which portion of the repeating URL chunk is being interpreted by the directive.
2:37 am on May 14, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



"Someone or something"? Could be anything including a badly programmed robot or the googlebot's own fevered imagination. You are not obliged to adopt it, and you are not obliged to "fix" 404 errors belonging to pages that genuinely don't exist. Do you have pages containing the element "fuseaction"? If not, feed visitors a 410 and think no more about it.

RewriteRule fuseaction - [G]

A 410 is not 100% honest if the page never existed, but it will make the googlebot go away sooner. (Doesn't appear to work on bingbot-- but I don't think a 410 makes it more persistent than a 404.)
8:30 pm on May 14, 2013 (gmt 0)

5+ Year Member



So I've got everything 301ing functionally at the moment, but I'm going to go back and replace the RedirectMatch with RewriteRules as long as that doesn't mess things up.

Here's the weird thing. This rule:
RedirectMatch 301 /index.cfm/fuseaction/site.physicians/(.*) http://www.example.com/page-name-1/

...appears to be redirecting this page:
http://www.example.com/index.cfm/fuseaction/site.locations/action/dtl/loc/ index.cfm/fuseaction/site.physicians/action/dtl/phys/99802154.cfm

...to http://www.example.com/page-name-1/

I was expecting this rule:
RedirectMatch 301 /index.cfm/fuseaction/site.locations/(.*) http://www.example.com/page-name-2/

...to apply to that URL.

Is the RedirectMatch directive just matching whichever segment of the URL fits the pattern? I was expecting it to trigger only when the pattern matched the first segment of the URL.
9:15 pm on May 14, 2013 (gmt 0)

g1smd (WebmasterWorld Senior Member, 10+ Year Member)



With a start anchor (^) on the pattern, it would match only from the start of the URL path rather than matching a chunk in the middle.

Use RewriteRule in place of RedirectMatch.
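
A minimal sketch of the anchored version, assuming the two targets from the post above (/page-name-1/ and /page-name-2/) and that the rules sit after RewriteEngine On and before the WordPress block. The escaped dots and the exact patterns are additions here, untested against the real site.

```apache
# Each pattern starts with ^, so it can only match at the beginning of
# the requested path (which has no leading slash in .htaccess), not a
# repeated chunk further along the URL.
RewriteRule ^index\.cfm/fuseaction/site\.physicians/ http://www.example.com/page-name-1/ [R=301,L]
RewriteRule ^index\.cfm/fuseaction/site\.locations/ http://www.example.com/page-name-2/ [R=301,L]
```

With these, a path that begins index.cfm/fuseaction/site.locations/ and only later contains site.physicians fails the first rule at the anchor and is picked up by the second, so it goes to /page-name-2/.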
9:59 pm on May 14, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



g1!

:: jumping up and down screaming with excitement ::

Sheesh, thought you'd never come back.

Tim, if you've got a lot of rules to change, find a text editor that does Regular Expressions and feed it your htaccess file with the following:

# change . to \.
FIND
^(Redirect \d\d\d \S+?[^\\])\.
REPLACE WITH
$1\\.
# now change Redirect to Rewrite
FIND
^Redirect(?:Match)? 301 /(.+)
REPLACE WITH
RewriteRule $1 [R=301,L]
# and
FIND
^Redirect(?:Match)? 410 /(.+)
REPLACE WITH
RewriteRule $1 - [G]

These can safely be done as unsupervised global replaces-- but of course you'll make a copy of the htaccess first!
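
To make the effect concrete, here is what those replaces should do to the one working Redirect line quoted in the first post. This is worked by hand rather than run through an editor, so treat it as a sketch. The dot-escaping FIND is anchored to the start of the line, so it catches one dot per line per pass and has to be repeated until nothing more is found (three passes for this line). The commented-out variant at the end, with a leading ^, is an optional hand edit in line with the start-anchor advice earlier in the thread; the replaces themselves do not produce it.

```apache
# Original mod_alias line, from the first post:
# Redirect 301 /index.cfm/fuseaction/site.content/mode/dtl/type/36260/post/34645.cfm http://www.example.com/locations/place-name/

# After the dot-escaping passes and the Redirect-to-RewriteRule replace:
RewriteRule index\.cfm/fuseaction/site\.content/mode/dtl/type/36260/post/34645\.cfm http://www.example.com/locations/place-name/ [R=301,L]

# Optional hand edit (an assumption, not output of the replaces):
# anchor the pattern so it cannot match in the middle of a longer URL.
# RewriteRule ^index\.cfm/fuseaction/site\.content/mode/dtl/type/36260/post/34645\.cfm http://www.example.com/locations/place-name/ [R=301,L]
```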