Forum Moderators: phranque

.htaccess only partially working

         

batface

6:09 am on Jun 24, 2011 (gmt 0)

10+ Year Member



Hi,

I really hope someone can help me with a .htaccess problem, or point me in the right direction.

I'm trying to set 301 redirects on a URL similar to:

[somesite.com...]

I need to 301 redirect .../updates/latest-news/... to /news/article/... and drop the -me suffix

So if I create the rule:

RewriteRule ^(.*)(-me.*)$ $1 [R=301,L]

the URL stays the same except that the trailing -me is removed. I'm happy that the rewrite and the regex work, so I comment this out.

However, neither of the following will work and I am confused:
RewriteRule ^(.*)/updates/latest-news/(.*)(-me.*)$ $1/news/article/$2 [R=301,L]
or
RewriteRule ^http://www.somesite.com/updates/latest-news/48927-doc-resource-admission-notice-me10$ [somesite.com...] [R=301,L]

This is on a Joomla site.

Could some kind soul please help and point me the right way?

thanks
Mick

batface

10:26 am on Jun 24, 2011 (gmt 0)

10+ Year Member



Not sure if this is relevant, but the site was using aceSEF; this was removed and the site now uses JoomlaSEF. What I can't figure out is why RewriteRule ^(.*)(-me.*)$ $1 [R=301,L] works and the other rules don't.

Please help. :-(

g1smd

10:43 pm on Jun 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Never use a (.*) pattern unless it is the LAST element before the $ end anchor.

(.*) says "read the entire request into the $1 backreference".

(.*)-me says "read the entire request into the $1 backreference, then follow it with -me". This means the parser has to back off and retry hundreds of matches to see what you actually meant: when you said "all" you didn't actually mean "all".
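Here is a minimal sketch of that back-off, using Python's re module. For patterns this simple it agrees with the PCRE engine used by Apache 2.x. The slug is hypothetical and was chosen so that "-me" appears twice. The greedy (.*) grabs everything and then gives characters back until (-me.*)$ can match, so it splits at the LAST "-me". A lazy (.*?) would split at the first one instead.

```python
import re

# Hypothetical slug in which "-me" appears twice.
slug = "12345-meet-the-team-me10"

# Greedy: (.*) first swallows the whole string, then backs off one
# character at a time until (-me.*)$ can match, i.e. at the LAST "-me".
greedy = re.match(r"^(.*)(-me.*)$", slug).groups()

# Lazy: (.*?) starts empty and grows, so it stops at the FIRST "-me".
lazy = re.match(r"^(.*?)(-me.*)$", slug).groups()

print(greedy)  # ('12345-meet-the-team', '-me10')
print(lazy)    # ('12345', '-meet-the-team-me10')
```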

You might need
([^/]+)/updates/latest-news/([^-]+)(-me.*)
which means "read until the next slash", then read "/updates/latest-news/", then "read until the next hyphen", followed by "-me and the rest of the string".

If you need multiple folder levels before "updates/" then you will need a slightly different pattern.

Use example.com to stop forum auto-linking.
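Here is one likely reason both the first post's rules and this pattern fail, assuming the .htaccess file sits in the document root. In per-directory (.htaccess) context, mod_rewrite strips the directory prefix, including the leading slash, before matching. The pattern therefore sees updates/latest-news/... with nothing in front of it, so any pattern that demands something before /updates/ can never match.

The Python sketch below emulates that with a hypothetical helper, htaccess_path(). The "anchored" pattern is also a hypothetical alternative. This is an approximation, not Apache itself.

```python
import re

def htaccess_path(url_path: str) -> str:
    """Hypothetical helper: roughly what a RewriteRule in a document-root
    .htaccess is matched against (the leading slash is stripped)."""
    return url_path.lstrip("/")

path = htaccess_path("/updates/latest-news/48927-doc-resource-admission-notice-me10")

patterns = {
    # From the first post: needs a "/" before "updates", which is never there.
    "original": r"^(.*)/updates/latest-news/(.*)(-me.*)$",
    # Needs at least one character plus "/" before "updates": also impossible.
    "g1smd": r"^([^/]+)/updates/latest-news/([^-]+)(-me.*)",
    # Hypothetical alternative: anchored at the start, no leading slash.
    "anchored": r"^updates/latest-news/(.+)(-me[^-]*)$",
}

results = {}
for name, pattern in patterns.items():
    m = re.match(pattern, path)
    results[name] = m.groups() if m else None
    print(name, results[name])
```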

batface

6:14 am on Jun 25, 2011 (gmt 0)

10+ Year Member



Thanks for the feedback. Hmm, this still doesn't work using your rule.

To check that I understand correctly, this is what I see happening:

([^/]+) the / is negated, so it repeats until it reaches a /

([^-]+) the - is negated, so it repeats until it reaches a -

So I'm thinking: if my URL ends like .../12345-blah-blah-blah-me10,
the second group stops at 5,
so I amend it to

([^-me]+) so it retries until it reaches the final 'h', then follows with (-me.*)

but that too does not redirect to the new URL or change anything in the browser address bar.

Am I starting to get there? :-)

g1smd

6:35 am on Jun 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



([^-me]+) matches anything that is not a "-" or "m" or "e".

It stops at the first "-" or "m" or "e".
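A quick check in Python's re shows where [^-me]+ stops; character classes work the same way in PCRE. The second string is hypothetical and shows that a lone "m" is enough to end the match.

```python
import re

# [^-me] is a character class: ONE character that is not "-", "m" or "e".
# (A "-" placed right after "^" is a literal hyphen, not a range.)
first = re.match(r"[^-me]+", "12345-blah-blah-blah-me10").group()
second = re.match(r"[^-me]+", "blame-me10").group()  # hypothetical slug

print(first)   # '12345'  stops at the first "-"
print(second)  # 'bla'    stops at the "m", long before any "-me"
```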

batface

6:53 am on Jun 25, 2011 (gmt 0)

10+ Year Member



I think that I am now matching anything that is not '-','m' or 'e' in my character class. :-(

batface

7:01 am on Jun 25, 2011 (gmt 0)

10+ Year Member



Sorry, I did not spot your update.

What about a lookahead like (-(?=me))? But then how do I cope if the URL contains other 'me's earlier on?

lucy24

7:06 am on Jun 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops. Typed slower again, but I'll go ahead and post. No serious contradictions anyway ;)
so I amend to

([^-me]+) so it retries until it reaches the final 'h' then follow with (-me.*)

[^-me]+ does not mean "anything except the string '-me'". It means "any and all characters except hyphen, m or e". The only way to negate a group is through a lookahead or lookbehind, and I don't think htaccess does those.
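For comparison, Python's re does support lookahead, and so does PCRE. As far as I know, Apache 2.x builds mod_rewrite on PCRE, so these constructs may work in .htaccess too, but test them on your own server before relying on them.

The sketch shows two ideas:
- A negative lookahead that says "any characters, as long as the string -me doesn't start here".
- A lazy match plus a lookahead that copes with earlier "me"s, because it only accepts a "-me" followed by optional digits and the end of the string.

The second slug is hypothetical.

```python
import re

# Negative lookahead: consume one character at a time, but only where
# the string "-me" does not start. This negates a sequence, not a set
# of single characters.
no_me = re.match(r"^(?:(?!-me).)+", "12345-blah-blah-blah-me10").group()

# Lazy match plus positive lookahead: stop at the first point where
# "-me", optional digits and the end of the string follow. Earlier
# "-me" strings (as in this hypothetical slug) are skipped.
slug = "12345-meet-me-here-me10"
trimmed = re.match(r"^(.+?)(?=-me\d*$)", slug).group(1)

print(no_me)    # '12345-blah-blah-blah'
print(trimmed)  # '12345-meet-me-here'
```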

[blank space left here for g1 to fill in missing information]

What you've got is, first, a bunch of directories that need to be handled with
([^/]+/)+
which you've got. The second plus, the one outside the parentheses, means "pick up as many sets like this as you can get". (If you need to capture this part, don't cut and paste, because you'll need a bit more stuff.)

The RegEx will keep scooping them up until it gets to the last batch of characters, which aren't followed by a slash. At that point it moves on to the next piece,
([^-]+-)+
again repeating until it runs out of letter(s)-plus-hyphen packages.

Are there ever any hyphens after the "-me" part? If so, you have got a big mess. If not, you are home free.

Except, urk, the capturing.
([^/]+/)+([^-]+-)+
would only give you the last directory and the last letters-plus-hyphen bit, because a repeated group keeps only its final pass. So you have to wrap the whole thing in a set of uber-parentheses:
(([^/]+/)+([^-]+-)+)
The outermost set is $1.

[Further blank left here for explication of how .htaccess handles nested parentheses, because that may be dialect-specific.]
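In Python's re, which agrees with PCRE here, groups are numbered by counting opening parentheses from the left. A group that repeats (because of a + outside it) keeps only its last pass. The path below is hypothetical.

```python
import re

path = "dir1/dir2/some-page-name"  # hypothetical request path

m = re.match(r"^(([^/]+/)+)(([^-]+-)+)", path)

print(m.group(1))  # 'dir1/dir2/'  outer group: every directory
print(m.group(2))  # 'dir2/'       repeated inner group: only the last directory
print(m.group(3))  # 'some-page-'  outer group: every letters-plus-hyphen package
print(m.group(4))  # 'page-'       repeated inner group: only the last package
```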

batface

8:54 am on Jun 25, 2011 (gmt 0)

10+ Year Member



Thanks for the further information.

I thought I would be further forward, but unless I misunderstand you there is no change in page or address in the browser.

This is the exact rule I'm trying to use:

RewriteRule ^([^/]+)+/updates/admission-news/([^-]+-)+$ $1/article-search/item/$2 [R=301,L]

so
www.example.com/updates/admission-news/9542-iit-jam-2011-will-be-commenced-on-june-8-2011-dp17
should become
www.example.com/article-search/item/9542-iit-jam-2011-will-be-commenced-on-june-8-2011

So even if the above worked I would have a trailing - wouldn't I?

I'm confused? :-(

lucy24

9:54 am on Jun 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ouch! What happened to "me"? Has it morphed into dp?

To turn

www.example.com/updates/admission-news/9542-iit-jam-2011-will-be-commenced-on-june-8-2011-dp17

into

www.example.com/article-search/item/9542-iit-jam-2011-will-be-commenced-on-june-8-2011

you need to break it into four pieces:

www.example.com/updates/admission-news/9542-iit-jam-2011-will-be-commenced-on-june-8-2011-dp17

1. Ignore the first part (example.com). Nothing is happening to the domain name.
2. If you're looking for the exact text "updates/admission-news/" then keep it as-is. Otherwise, do the (([^/]+/)+) business. Note the double nest of parentheses, because you may be picking up more than one directory.
3. Now comes the (([^-]+-)+). From inside to outside, as written:

[^-]+ means one or more of anything except a hyphen.
[^-]+- means when you do get to a hyphen, pick it up.
([^-]+-) means put the whole thing into a package.
([^-]+-)+ means collect as many of these packages as you can.
(([^-]+-)+) means capture the whole package of packages: letter(s), hyphen, letter(s), hyphen, until you run out of hyphens.

Here's the catch. You have captured the last hyphen before dp17, or me10 or whatever it was. And you don't want it. To capture less, you can either make it a whole lot more complicated, ending up with
([^-]+(-[^-]+)+)-[^-]+
for a mega-package that begins and ends with non-hyphens, excluding the last set...

... or you can ask for special dispensation to use .+ so you can replace the whole horrid mess with

(.+)-[^-]+

or at least

(.+)-[A-Za-z0-9]+

which your server may or may not let you express as simply [\w]+

If the last bit is always exactly four characters, you might shave a nanosecond by replacing the + with {4}. It depends on whether the computer measures its string first, or just picks up pieces until it gets to the end.
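The sketch below runs the candidate patterns against the slug from the example, using Python's re; these constructs behave the same in PCRE. It shows the catch (a trailing hyphen) and the alternatives that avoid it. The {4} version assumes the suffix is always exactly four alphanumerics.

```python
import re

slug = "9542-iit-jam-2011-will-be-commenced-on-june-8-2011-dp17"
wanted = "9542-iit-jam-2011-will-be-commenced-on-june-8-2011"

# Letters-plus-hyphen packages: also swallows the hyphen before "dp17".
packages = re.match(r"^(([^-]+-)+)", slug).group(1)

# Begins and ends with non-hyphens; the final "-xxxx" piece stays outside.
strict = re.match(r"^([^-]+(-[^-]+)+)-[^-]+$", slug).group(1)

# The shorter form: greedy (.+) backs off to the LAST hyphen.
short = re.match(r"^(.+)-[^-]+$", slug).group(1)

# Same idea, assuming the suffix is always exactly four alphanumerics.
fixed = re.match(r"^(.+)-[A-Za-z0-9]{4}$", slug).group(1)

print(packages)                            # ends in '2011-' (unwanted hyphen)
print(strict == short == fixed == wanted)  # True
```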


"We carried away all that we did not catch, and all that we caught we left behind." That is now stuck in my head.

batface

1:32 pm on Jun 25, 2011 (gmt 0)

10+ Year Member



This is fantastic, it works a treat using
RewriteEngine ON
RewriteBase /
RewriteRule ^updates/admission-news/(([^-]+(-[^-]+)+)-[^-]+)$ article-search/item/$2 [R=301,L]

I have many directories to 301 to but this gives me a great template to work with.

Thank you so much for your help, it's really appreciated.
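Counting opening parentheses from the left shows why $2, not $1, is the right backreference in that rule: $1 is the whole slug, suffix included. Python's re numbers groups the same way as PCRE. The path is the example from earlier in the thread, written without the leading slash, the way a root .htaccess sees it.

```python
import re

rule_pattern = r"^updates/admission-news/(([^-]+(-[^-]+)+)-[^-]+)$"
path = "updates/admission-news/9542-iit-jam-2011-will-be-commenced-on-june-8-2011-dp17"

m = re.match(rule_pattern, path)

print("$1 =", m.group(1))  # whole slug, still ending in '-dp17'
print("$2 =", m.group(2))  # slug without the '-dp17' suffix
print("$3 =", m.group(3))  # '-2011', the last pass of the inner repeated group
```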

g1smd

6:05 pm on Jun 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You MUST also specify the protocol and domain name in the target URL.

RewriteBase / is the default and can be omitted.

batface

6:24 pm on Jun 25, 2011 (gmt 0)

10+ Year Member



Without RewriteBase /, I'm getting the server directories inserted into the redirect, in front of the directories I want.

http://example.com/...something/something/public-html/...article-search/item... blah

There is trouble in store, as a batch of the filenames in the correct redirected directory have a number at the front, e.g. /1234-this-is-a-pain, so my redirect goes to another 404 because that filename doesn't exist. Life is never easy, and I think this is impossible, as the numbers are random and you don't know which URL is affected. I just hope it's not too many! :-)

batface

8:28 pm on Jun 30, 2011 (gmt 0)

10+ Year Member



me again! ;-(

I just can't figure out the right regex to add to
RewriteRule ^updates/admission-news/(([^-]+(-[^-]+)+)-[^-]+)$ article-search/item/$1 [R=301,L]

to catch a URL that has something like .../updates/admission-news/filename?tabslider=tab180_212&tstype=tabslider_ajax&tstype=tabslider_ajax&tabslider=tab178_212

Before I go and boil my head can someone please point me in the right direction?

lucy24

11:02 pm on Jun 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Without RewriteBase / i'm getting the server directories on redirect at the front of the directories I want.

http://example.com/...something/something/public-html/...article-search/item... blah

By absolutely phenomenal coincidence I have just run into the identical issue in the course of my own htaccess-wrangling. The secret is that when you're doing a redirect-- regardless of whether you use Redirect or Rewrite with R=301-- you have to put a leading slash / in the target. (That's what g1 meant about "protocol and domain name". Apparently my server isn't quite as demanding, but it does insist on the slash.)

RewriteRule ^updates/admission-news/(([^-]+(-[^-]+)+)-[^-]+)$ article-search/item/$1 [R=301,L]

to catch a URL that has something like .../updates/admission-news/filename

Oh, lord, I'm getting a headache. You mean after all that business with hyphens, you now need to capture addresses that don't have hyphens at all? This isn't the thread where they were trying to capture the query string was it? So far, nothing you're doing has had any effect on the query. The rewrite stashes the query-- if any-- in a back room, does all of its stuff, and then quietly reappends it unchanged.

You could run a whole bunch of stuff with question marks ? and asterisks * to shovel them into the same rule, but it would be easier to just make a separate rule saying

^updates/admission-news/(\w+)$

(Apache says you can use \w. If your server gets snarky, revert to the [A-Za-z0-9]+ form.)

Or are you saying that the [^-] expression is being interpreted to include question marks, i.e. query strings? Would [^-?] make any difference? Note that inside of brackets, ? just means a question mark.
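The difference between (\w) and (\w+) is easy to check in Python's re; \w means the same thing in PCRE for plain ASCII. Without the +, the group matches exactly one character. Note also that \w does not include the hyphen. The paths are hypothetical.

```python
import re

one_char = re.match(r"^updates/admission-news/(\w)$", "updates/admission-news/filename")
many = re.match(r"^updates/admission-news/(\w+)$", "updates/admission-news/filename")
hyphen = re.match(r"^updates/admission-news/(\w+)$", "updates/admission-news/file-name")

print(one_char)       # None: (\w) is a single character
print(many.group(1))  # 'filename'
print(hyphen)         # None: \w covers letters, digits and "_" only
```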

g1smd

11:23 pm on Jun 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you want to do anything at all with query strings you need a preceding RewriteCond to check the value of either %{QUERY_STRING} or %{THE_REQUEST}.

The RewriteRule RegEx pattern can match ONLY the path part of a URL request.
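Here is a rough Python analogy of how a request splits into the part a RewriteRule pattern can see and the query string it cannot. It uses urllib.parse.urlsplit, not Apache itself. The URL is the one that comes up later in this thread.

```python
from urllib.parse import urlsplit

url = ("http://www.example.com/articles/editorial/an-interesting-page"
       "?tabslider=tab180_212&tstype=tabslider_ajax"
       "&tstype=tabslider_ajax&tabslider=tab178_212")

parts = urlsplit(url)

# The path: what a RewriteRule pattern is matched against
# (in a root .htaccess, without the leading slash).
print(parts.path)   # '/articles/editorial/an-interesting-page'

# The query string: visible only to a RewriteCond testing
# %{QUERY_STRING} or %{THE_REQUEST}.
print(parts.query)
```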

batface

6:15 am on Jul 2, 2011 (gmt 0)

10+ Year Member



Oh, lord, I'm getting a headache. You mean after all that business with hyphens, you now need to capture addresses that don't have hyphens at all? This isn't the thread where they were trying to capture the query string was it? So far, nothing you're doing has had any effect on the query. The rewrite stashes the query-- if any-- in a back room, does all of its stuff, and then quietly reappends it unchanged.


Hehe, not as bad as my head! :-)

I do have URLs with hyphens, and with the help so far I've cleaned them up. However, I also have some URLs with hyphens that end with something like this, appearing at random regardless of the directory structure/path:

.../path/path?tabslider=tab180_212&tstype=tabslider_ajax&tstype=tabslider_ajax&tabslider=tab178_212


If you want to do anything at all with query strings you need a preceding RewriteCond to check the value of either %{QUERY_STRING} or %{THE_REQUEST}


To be honest, I don't understand this. Could you give me an example to put me on the right path? I've looked at other threads, but what I find there varies, so I'm confused.

I've got bogged down with this and now can't see why I can't even do another simple redirect with:

RewriteRule ^forum/forumdisplay\.php$ forum/forum\.php [R=301,L]


aarhh!

lucy24

6:37 am on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've got bogged down with this and now can't see why I can't even do another simple redirect with:

^forum/forumdisplay\.php$ forum/forum\.php [R=301,L]

Get rid of that second \.php escape. The target part of a rewrite isn't a RegEx, so it's trying to go to a page whose actual url has a backslash in it.

I do however have some URLs that have hyphens and at the end there is something like this but is random with regard to the directory structure/path:

.../path/path?tabslider=tab180_212&tstype=tabslider_ajax&tstype=tabslider_ajax&tabslider=tab178_212

Is this a whole new question? The original problem was how to chop off the -me or -dp or whatever at the very end of the URL. What are you trying to do with the ones that don't require chopping?

batface

7:03 am on Jul 2, 2011 (gmt 0)

10+ Year Member



Is this a whole new question? The original problem was how to chop off the -me or -dp or whatever at the very end of the URL. What are you trying to do with the ones that don't require chopping?


This is a mess caused by a SEF extension in Joomla: most URLs had -dp at the end, but there are some duplicate pages, and the duplicates have that query stuff at the end and no -dp.

I'm spotting these now that I've cleaned up the bulk of the mess.

Get rid of that second \.php escape. The target part of a rewrite isn't a RegEx, so it's trying to go to a page whose actual url has a backslash in it.


Actually, I had tried that, then thought I needed the escape, so neither version works.

lucy24

8:10 am on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, wait, you're still saying
forum/forum.php
Do you still have the RewriteBase, or did it get lost? If it got lost, you have to put in the full address for anything you're redirecting to.

Did you mean that you've got some spurious urls where the formula is
{blahblah}/{somestuff}/{the exact same stuff again}
? And you need to get rid of the duplicated part?

In a "vanilla" regex you'd be looking for
{blahblah}/(\w+/)\1
where \1 is an internal back-reference. But in Apache I suspect this will just make your server explode.
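Here is the idea in Python's re. PCRE also supports backreferences inside a pattern, but whether a particular mod_rewrite setup accepts \1 in a RewriteRule pattern is something to test rather than assume. The path is hypothetical.

```python
import re

path = "/blahblah/somestuff/somestuff/"  # hypothetical duplicated segment

# (\w+/) captures one segment; \1 requires the very same text again.
m = re.search(r"/(\w+/)\1", path)
deduplicated = re.sub(r"/(\w+/)\1", r"/\1", path)

print(m.group(1))    # 'somestuff/'
print(deduplicated)  # '/blahblah/somestuff/'
```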

:: wandering off to investigate ::

batface

8:19 am on Jul 2, 2011 (gmt 0)

10+ Year Member



Actually there is something happening with a redirect I have just discovered from

forum/forumdisplay to forum/forum.php


This is not in the .htaccess and I need to investigate further as I guess it may be messing up

forum/forumdisplay.php to forum/forum.php

batface

8:26 am on Jul 2, 2011 (gmt 0)

10+ Year Member



Did you mean that you've got some spurious urls where the formula is
{blahblah}/{somestuff}/{the exact same stuff again}
? And you need to get rid of the duplicated part?


I have 2 URLs

http://www.example.com/articles/editorial/an-interesting-page


http://www.example.com/articles/editorial/an-interesting-page?tabslider=tab180_212&tstype=tabslider_ajax&tstype=tabslider_ajax&tabslider=tab178_212


so I wanted to get rid of the second one by 301 redirecting to the first.

I thought I could use a (.*) at the end of my original regex but that does not work.

(([^-]+(-[^-]+)+)-[^-]+(.*))

g1smd

9:33 am on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Get rid of that second \.php escape. The target part of a rewrite isn't a RegEx, so it's trying to go to a page whose actual URL has a backslash in it.

URLs are used only "out there", on the web. Inside the server there are only paths (folders) and files.

Get rid of that second \.php escape. The target part of an internal rewrite isn't a Regular Expression, it's a real filename on the server hard drive. It's trying to retrieve a file whose actual filename has a backslash in it.

g1smd

9:36 am on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To deal with URL requests with parameters you need a preceding RewriteCond looking at THE_REQUEST. The RewriteRule RegEx cannot see query string data. It can only see the path part of a request (the bit after the domain name and before the query string).

lucy24

3:43 pm on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have 2 URLs

http://www.example.com/articles/editorial/an-interesting-page

http://www.example.com/articles/editorial/an-interesting-page?tabslider=tab180_212&tstype=tabslider_ajax&tstype=tabslider_ajax&tabslider=tab178_212

so I wanted to get rid of the second one by 301 redirecting to the first.

Is it happening to one specific address-- or to some finite group of addresses-- or to all addresses? You have to set up a condition to say which addresses should get their queries chopped off. Or, conversely, which query patterns should be dumped.

The Horse's Mouth [httpd.apache.org] says:
"When you want to erase an existing query string, end the substitution string with just a question mark."


I don't think you need a redirect here. A rewrite will do. (Unless, ahem, g1 says otherwise ;)) But eventually you'll want to find out where that spurious query string is coming from. Goodness, what a lot of ajaxes. The last time I met unintended queries was when I was wrestling with javascript.

batface

8:36 pm on Jul 2, 2011 (gmt 0)

10+ Year Member



aha!

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteRule (.*) /$1? [R=301,L]

works for me!
I'm a happy bunny! :-)
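To see what that condition captures, the sketch below runs it in Python's re against illustrative request lines of the kind %{THE_REQUEST} holds.

Note that it matches any request carrying a query string. As written, the pair of rules therefore strips every query on the site. If other pages rely on query strings, a narrower condition may be safer, for example one that also requires tabslider= after the ?.

```python
import re

# The RewriteCond pattern; in Python "\ " is simply a literal space.
cond = re.compile(r"^[A-Z]{3,9}\ /([^?]*)\?")

with_query = ("GET /articles/editorial/an-interesting-page"
              "?tabslider=tab180_212&tstype=tabslider_ajax HTTP/1.1")
without_query = "GET /articles/editorial/an-interesting-page HTTP/1.1"

m = cond.match(with_query)
print(m.group(1))                 # 'articles/editorial/an-interesting-page'
print(cond.match(without_query))  # None: no "?", so the rule is skipped
```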

g1smd

9:15 pm on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Add the protocol and domain name to the RewriteRule target and you're there.

lucy24

9:47 pm on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Add the protocol and domain name to the RewriteRule target and you're there.

I think this may be another of those shared-hosting, über-.htaccess issues. My own redirects all start with just a slash, except for the one that's being sent Far Away,* and they go where they're supposed to.


* To contemplate its navel at 127.0.0.1, to be exact. If I left out the http business there, it would probably end up on my own computer :)

g1smd

10:14 pm on Jul 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Without the domain name in place in the redirect target,
www.example.com/old-page
redirects to
www.example.com/new-page

and
example.com/old-page
redirects to
example.com/new-page

thereby promoting Duplicate Content.

If you have a separate non-www to www redirect in place, you will also have created an unwanted two-step redirection chain for all non-www requests.
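A rough way to model this in Python is to treat the redirect target as a link resolved against the requested URL, using urllib.parse.urljoin. When the target starts with a slash, Apache fills in the host from the request itself rather than leaving it to the browser, but the outcome is roughly the same. A host-less target keeps whichever hostname was requested, while a full URL always lands on one canonical host.

```python
from urllib.parse import urljoin

relative_target = "/new-page"
absolute_target = "http://www.example.com/new-page"
requests = ["http://www.example.com/old-page", "http://example.com/old-page"]

# Host-less target: each hostname redirects to its own copy of the page.
relative_results = [urljoin(r, relative_target) for r in requests]
# Full URL target: both hostnames end up on the same canonical URL.
absolute_results = [urljoin(r, absolute_target) for r in requests]

print(relative_results)
print(absolute_results)
```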

lucy24

1:09 am on Jul 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Without the domain name in place in the redirect target,
www.example.com/old-page redirects to www.example.com/new-page
and example.com/old-page redirects to example.com/new-page
thereby promoting Duplicate Content.

If you have a separate non-www to www redirect in place, you will also have created an unwanted two-step redirection chain for all non-www requests.

I don't. My host lets you choose "with, without or either way" so I went with either way. (I don't like the look myself-- I always feel like telling the www-less site to go put some clothes on-- but generally No Skin Off My Nose.) And in the course of double-checking this, I discovered a slight booboo in my no-hotlinking code, so all is good.

Google Webmaster Tools also has a "preferred domain" setting, so I think they all cancel each other out.
This 31-message thread spans 2 pages.