Forum Moderators: phranque

Extra long URL rewrite not working

         

batface

8:11 pm on Nov 17, 2011 (gmt 0)

10+ Year Member



I have zillions of URLs in the following style, and the rewrites I'm trying just don't work. Is there something in the structure I am missing?

A sample URL is:
http://www.example.com/component/option,com_mailto/link,aHR0cDovL3d3dy5neWFuY2VudHJhbC5jb20vYXJ0aWN
sZXMvdXBkYXRlcy9hZG1pc3Npb24tbmV3cy9jYWxjdXR0YS1
1bml2ZXJzaXR5LWludml0ZXMtYXBwbGljYXRpb25zLWZvci1
tYXN0ZXItaW4tYnVzaW5lc3MtbWFuYWdlbWVudC1tYm0/ZGF0ZT0yMDEyLTA3LTAx/tmpl,component/

and I just want them to rewrite to http://www.example.com/

The rule I thought would have worked is:
RewriteRule ^component/([^\\]+)$ / [R=301,L]

my assumption being any character other than \ is iterated through to the end of the URL with the $ being the end of the URL.

What am I doing wrong?

[edited by: bill at 2:29 am (utc) on Nov 18, 2011]
[edit reason] added some line breaks to prevent sidescroll [/edit]

g1smd

10:29 pm on Nov 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's usually bad form to mass redirect multiple URLs to the root, especially so when the content is a poor match.

If you want to match everything "beginning /component/" then make a pattern that does exactly that. It will run a lot faster too.
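
A minimal sketch of what such a prefix pattern might look like, assuming the rule lives in a per-directory .htaccess file at the site root (where mod_rewrite strips the leading slash before matching) and that mod_rewrite is enabled. The target is the root only because that is what the original question asked for; the commented-out server-config variant is shown for contrast.

```apache
# .htaccess at the document root (assumed location).
RewriteEngine On

# Anchor only the start: anything beginning "component/" matches,
# so the pattern never has to walk to the end of the long URL.
RewriteRule ^component/ http://www.example.com/ [R=301,L]

# In server or virtual-host config the path keeps its leading slash,
# so the equivalent pattern there would be:
# RewriteRule ^/component/ http://www.example.com/ [R=301,L]
```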

lucy24

11:51 pm on Nov 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to collar nearest passing moderator to make the question readable (although in this case it is pretty funny) ::

"my assumption being any character other than \ is iterated through to the end of the URL with the $ being the end of the URL"


The \ character would never appear in an URL at all, so there's no reason even to mention it. And since you're not capturing, it wouldn't matter if the character did occur.

Let's backtrack a bit. Where are those monster URLs coming from? Are they being actively generated or are they left over from some earlier site design?

When you say "doesn't work" do you mean that nothing happens at all, or that something happens but it isn't what you intended?

When you use mod_rewrite to redirect, include the full protocol and domain name:

RewriteRule {blahblah} http://www.example.com/ [R=301,L]

This will simultaneously take care of any with-or-without www. issues. The Redirect will usually work in the simpler leading-slash form (the same applies to Redirects using mod_alias), but may lead to the dreaded Duplicate Content because it reuses whichever domain name it was given.
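
To make the difference concrete, here is a sketch of the two target forms, assuming a per-directory .htaccess context and using a `^component/` prefix pattern as a stand-in for {blahblah}. Only one of the two rules would be used in practice.

```apache
RewriteEngine On

# Full protocol and hostname: every matching request ends up on the
# www host, whichever hostname the visitor originally used.
RewriteRule ^component/ http://www.example.com/ [R=301,L]

# Leading-slash form: mod_rewrite normally builds the Location header
# from the hostname of the current request, so a request that came in
# on example.com stays on example.com.
# RewriteRule ^component/ / [R=301,L]
```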

:: wandering off to give further thought to "doesn't work" boilerplate ::

batface

5:06 am on Nov 18, 2011 (gmt 0)

10+ Year Member



"The \ character would never appear in an URL at all, so there's no reason even to mention it. And since you're not capturing, it wouldn't matter if the character did occur."


I mentioned it because it is not in the URL. So I am looking for any character that is not \. Do I have to stop at every directory / ?

I'm so used to using RegexBuddy that this is doing my head in.
To me
[^/]+/[^/]+/[^/]+/[^/]+/

should also work (picks up the whole URL), but doesn't.

Sorry, I was frustrated even before my insomnia. I mean these URLs are not found (404) and I want to clean them up.

phranque

9:40 am on Nov 18, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



you should follow protocol and use the appropriate response for the requested resource.

if the requested url never existed and you don't understand what is being requested that means "Not Found" which is a 404 status code.
if the requested url once existed but you now don't have any equivalent content to which the visitor should be redirected, that means it is now "Gone" which is a 410 status code.
if the equivalent content for the requested url has been moved to a new url or the requested url is in error but you understand the request, then you should inform the visitor that the content has "Moved Permanently" with a 301 status code and a Location: header specifying the new url.
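
The three responses above could be expressed in mod_rewrite roughly as follows. This is a sketch under assumptions: a per-directory .htaccess context, the `^component/` prefix as the pattern, and a hypothetical destination /articles/ that does not come from this thread. The rules are alternatives for a given class of URLs, not a sequence, so only one is left active.

```apache
RewriteEngine On

# 404 Not Found: with a status outside the 3xx range the substitution
# is dropped, so "-" serves as a placeholder.
# RewriteRule ^component/ - [R=404,L]

# 410 Gone: the G flag returns 410 and stops rule processing.
RewriteRule ^component/ - [G]

# 301 Moved Permanently to equivalent content.
# "/articles/" is a hypothetical destination, for illustration only.
# RewriteRule ^component/ http://www.example.com/articles/ [R=301,L]
```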

as g1smd suggested, a massive redirect to the home page is rarely a good thing and will usually be seen as a signal of low quality, if not an outright attempt to manipulate page rank.
before you decide on the best response for this class of urls, you should examine who is requesting these urls and if possible where they were discovered.

batface

11:07 am on Nov 18, 2011 (gmt 0)

10+ Year Member



Thanks all for the sound advice. I have an underlying bug that will keep churning out these URLs, so I'll get that fixed.

phranque

2:45 pm on Nov 18, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



in that case use a 301 if you can determine the proper destination, or a 404 status code with sufficiently helpful navigation and search on the error page
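
A sketch of the 404 option, assuming a hypothetical error page at /errors/not-found.html that carries the site navigation and a search box. The path is not from this thread and would need to match wherever the real page lives.

```apache
# Serve a custom page for 404 responses. The path is hypothetical.
# Keep it a local URL-path: pointing ErrorDocument at a full
# http:// URL makes Apache send a redirect, and the client no
# longer sees the 404 status.
ErrorDocument 404 /errors/not-found.html
```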

lucy24

2:39 am on Nov 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Backtracking again: the format

RewriteRule ^component http://www.example.com/ [R=301,L]

ought to pick up everything. Does it?
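
For checking whether that rule fires, here is a minimal sketch of where it might sit, assuming a per-directory .htaccess file and a site that also routes requests through a front controller. The index.php rule is hypothetical and only stands in for whatever internal rewrites the site already has. External redirects generally need to come before internal rewrites; if the order is reversed, the request may already have been rewritten to index.php and the redirect pattern no longer matches.

```apache
RewriteEngine On

# External redirect first, before any internal rewrite.
RewriteRule ^component http://www.example.com/ [R=301,L]

# Hypothetical front-controller rule standing in for the site's
# existing internal rewrites: send anything that is not a real file
# or directory to index.php.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```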