
Forum Moderators: Ocean10000 & phranque


Batch redirect question

     
4:03 pm on Apr 21, 2015 (gmt 0)

New User

5+ Year Member

joined:July 31, 2013
posts: 14
votes: 0


Hello!

I want to redirect quite a large number of URLs with this structure:

www.example.com/type/location/category-name.php


to

www.example.com/directory/category-name


As you can see, the only part the two URLs share is the category-name element. I've already got a couple of lines in my .htaccess to strip the .php, but I was wondering whether there is any way to create a redirect based on the pattern above?

The /directory/ part will never change, so essentially I only need to match against the last part of the URL. There are hundreds of these, and ideally I'd like to deal with them all with one rule. Regardless of what the category-name is, I'd like the rule to strip out everything before it and replace it with /directory/.

Thanks - please feel free to say if I haven't been clear enough!

Ria
8:13 pm on Apr 21, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15936
votes: 889


What have you tried so far? I'm sorry to say that this looks like a question that has been asked approximately eighty thousand times in the history of WebmasterWorld. So let's see your current rule, and what the problems are.

If you already use any RewriteRules (it sounds as if you do), your new redirect will have to be worded the same way. If not, you could use a RedirectMatch within mod_alias.

Never mind about category-name; that's the easy part. The question is whether everything in the form
/directory/subdir/
is to be redirected, or only certain selected names for each level.
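
For reference, a minimal sketch of the mod_alias form, assuming no RewriteRules are in play and assuming every two-level .php URL is meant to redirect (which is exactly the open question above). The /directory/ target and the example.com hostname are the placeholders from the original post, not real paths. Note that RedirectMatch sees the leading slash of the URL-path, unlike a RewriteRule in .htaccess.

```apache
# Sketch only: redirects /anything/anything/name.php to /directory/name
# The character class [^./] keeps the capture free of slashes and periods
RedirectMatch 301 ^/[^/]+/[^/]+/([^./]+)\.php$ http://www.example.com/directory/$1
```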
12:17 am on Apr 22, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


The /directory/ part will never change

will the /type/location/ part change?
8:03 am on Apr 22, 2015 (gmt 0)

New User

5+ Year Member

joined:July 31, 2013
posts: 14
votes: 0


Thanks for your replies. This is what I've got so far:

RewriteRule ^type/(.*)$/(.*)$ /directory/$1 [R=301,NC,L]


To give an idea of what I'm trying to achieve, here are some examples:

www.example.com/men/spain/spanish-hats.php needs to become www.example.com/shop/spanish-hats

www.example.com/men/france/french-shoes.php needs to become www.example.com/shop/french-shoes

www.example.com/women/italy/italian-scarves.php needs to become www.example.com/shop/italian-scarves

So the /directory/ (/shop/) part will never change, but the /type/location/ part will.
9:00 am on Apr 22, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


RewriteRule ^type/(.*)$/(.*)$ /directory/$1 [R=301,NC,L]


when configuring an external redirect it's best to specify the full canonical protocol and hostname in the Substitution string of the RewriteRule.

you shouldn't use the [NC] flag here unless you can explain specifically why you have supplied it.

'(.*)$' means "capture zero or more of any character up to the end of the string", which makes the following '(.*)$' an error (the pattern can never match), as nothing should follow an explicit end anchor in a regular expression.

$1 means the first capture group, which in your attempted example would redirect to /directory/spain instead of /directory/spanish-hats.
this also shows why you don't want to capture everything up to the end anchor.

i would try to be as specific as possible with the regular expression.
maybe something like:
^(men|women)/([^/]+)/([^/]+)\.php$
9:08 am on Apr 22, 2015 (gmt 0)

New User

5+ Year Member

joined:July 31, 2013
posts: 14
votes: 0


Thanks, phranque. So if I used

RewriteRule ^(men|women)/([^/]+)/([^/]+)\.php$ http://www.example.com/directory/$1 [R=301,L]


Would using the $1 mean the last part of the URL (such as spanish-hats) would be populated after /directory/?

Sorry if that's a stupid question! I struggle to get my head around regex at the best of times!
9:14 am on Apr 22, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


$1 means the first capture group (capture groups are the parts enclosed in parentheses in a regular expression).
this results in "men" or "women", whichever matched.
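
A sketch of the same rule with the filename capture, assuming the .htaccess file sits in the document root and that /directory/ is still the placeholder from earlier in the thread. Counting the opening parentheses from the left, the filename is the third group, so it is $3 that carries it:

```apache
# For a request to /men/spain/spanish-hats.php:
#   $1 = men            (first group)
#   $2 = spain          (second group)
#   $3 = spanish-hats   (third group; the \.php sits outside the parentheses)
RewriteRule ^(men|women)/([^/]+)/([^/]+)\.php$ http://www.example.com/directory/$3 [R=301,L]
```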
9:20 am on Apr 22, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15936
votes: 889


Oops, massive overlap, looks like a whole conversation went on while I was typing.
(.*)$/(.*)$

#1 Oh, dear. No 500-class error? No, I guess it would just be a silent failure: RegEx error, not Apache error. You can only have one $ because it means "the end of the entire string". Here you don't even need or want one $ sign, because you'll be capturing the last part of the string. By default, a regular expression goes on for as long as it can; you don't have to tell it to continue until the end.

#2 You certainly don't want .* because that would permit a null string. And if you're really getting requests for
/directory//morestuff
-- which I sure hope you're not -- you don't want to redirect them. (Matter of fact your server may not even perceive // as two directory slashes, but that is neither here nor there.)

#3 You don't want .+ either, because the whole point is to stop at each directory boundary. The usual pattern is
^[^/]+/[^/]+/
But if all your directory names just use plain text, you can say \w or [a-z] instead. I find it easier to read; you may not care. I'll say \w from here on, assuming none of the directory names contain hyphens. (The shorthand \w covers alphanumerics and lowlines but not hyphens.)

#4 You don't need to capture the two directory names, because you won't be reusing them. You do need to capture the filename. But not all of the filename, since you're concurrently going extensionless. So now we're as far as
^\w+/\w+/([^./]+)\.php
Can't use \w here since your examples show that you do use hyphens. But can I please assume that your filenames don't contain literal periods other than the extension delimiter? (Periods are bad enough in directory names. In filenames they're hell.) You can leave off the closing anchor, because if there is extraneous garbage after the ".php" you may as well get rid of it in the same redirect.

#5 Always include the full protocol-plus-domain in a redirect target, in case someone typed in the wrong form of your domain name, or a search engine intentionally asked for the wrong name to see what happens.

#6 [NC] is very seldom appropriate. Easy to type, but more work for the server. Here there's no reason at all for it.

So, putting it all together:
RewriteRule ^\w+/\w+/([^./]+)\.php http://www.example.com/shop/$1 [R=301,L]

But wait. Are you really redirecting absolutely every single php URL on your entire site, in all directories everywhere, so long as the directories are nested two deep? I can't help but feel that what you're really looking at is something more like
^(?:men|women|children|pets)/\w+/([^./]+)\.php
where the ?: means "we're not capturing this bit, so we don't have to keep count of $1 and $2 later".
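
Written out as a full rule, that narrower version might look like the sketch below. The names children and pets are only illustrative stand-ins for whatever top-level directories actually exist, and the placement comment assumes the existing extension-stripping lines are internal rewrites, which the thread does not show.

```apache
RewriteEngine On

# External redirects generally belong above any internal rewrites,
# such as the existing lines that handle the extensionless URLs.
# Only $1 exists here, because (?:...) groups are not counted.
RewriteRule ^(?:men|women|children|pets)/\w+/([^./]+)\.php http://www.example.com/shop/$1 [R=301,L]
```

With this in place, /women/italy/italian-scarves.php should redirect to http://www.example.com/shop/italian-scarves, while a .php file under any other top-level directory is left alone.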
10:06 am on Apr 22, 2015 (gmt 0)

New User

5+ Year Member

joined:July 31, 2013
posts: 14
votes: 0


Thanks so much, Lucy! I'll give it a go :)