Apache Web Server Forum

    
redirectMatch 301
Need to pattern match source and target pages
nubbin




 
Msg#: 4561526 posted 9:46 pm on Apr 4, 2013 (gmt 0)

Hi,
Please can you help me write a regular expression to redirect some pages.
I have many pages named
widget_NNNN.htm , where NNNN is a specific 4 digit number

I want to redirect each of those pages to
newwidget.html?stock=NNNN where NNNN is the same

I'd like to use a single redirectMatch statement to do this, but cannot figure it out..grrrr.
What should the redirectMatch statement look like?

Thanks very much.

 

phranque




 
Msg#: 4561526 posted 10:57 pm on Apr 4, 2013 (gmt 0)

i think you really want an internal rewrite rather than a redirect.
this means you will need mod_rewrite rather than mod_alias and you'll want to use the RewriteRule Directive:
http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule

check out the RegEx documentation in the Apache mod_rewrite Introduction:
http://httpd.apache.org/docs/current/rewrite/intro.html

it should be very informative for your specific requirements.

then give it a try and report back.

lucy24




 
Msg#: 4561526 posted 11:04 pm on Apr 4, 2013 (gmt 0)

Further:

Is this happening in htaccess or in your own server's config file?

If it's htaccess: The moment you add a rule using mod_rewrite, you will need to convert any existing rules using mod_alias (Redirect or RedirectMatch). This is because external redirects must happen before internal rewrites. On shared hosting you have no control over what order the mods load in. So you may be fine mixing mod_alias with mod_rewrite-- or you may have a big disaster. Don't take chances.

If it's your own server: You can continue using mod_alias if it loads after mod_rewrite, meaning it will execute before. Execution order = opposite of load order.

nubbin




 
Msg#: 4561526 posted 11:41 pm on Apr 4, 2013 (gmt 0)

This is happening in my htaccess.

Basically I am replacing some old pages with some new versions of them. Therefore I thought I would use redirectMatch 301 to redirect any requests for the old pages from external sources to my site.

Do you still think I should be using internal rewrite and if so why?
Thanks

lucy24




 
Msg#: 4561526 posted 1:01 am on Apr 5, 2013 (gmt 0)

You cannot use mod_alias (Redirect by that name) because it works only on paths, not on queries. So you must use mod_rewrite.

Once you are in mod_rewrite you have two choices. You can redirect, meaning the browser's address bar changes, or you can rewrite, meaning it doesn't.
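
As a rough sketch of the two choices, assuming the rules live in the site's root .htaccess and that the stock number is always four digits (both taken from the question, not confirmed), the same pattern can be used either way; only the flags differ:

```apache
RewriteEngine On

# Choice 1: external redirect. The browser is sent to the new URL
# and the address bar changes.
RewriteRule ^widget_([0-9]{4})\.htm$ /newwidget.html?stock=$1 [R=301,L]

# Choice 2: internal rewrite. The address bar keeps widget_NNNN.htm
# while the content comes from the new location. Use this INSTEAD of
# the rule above, not together with it.
# RewriteRule ^widget_([0-9]{4})\.htm$ /newwidget.html?stock=$1 [L]
```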

You currently have nice URLs that look like
widget_NNNN.htm
If the page content lives at
newwidget.html?stock=NNNN

... now wait a minute. What do you mean, .html?blahblah ? HTML pages don't have queries. Do you have a secret php file in the background? If so, you already have rewrites going on and you need to dump mod_alias yesterday. Or are you simply parsing html as php? Messy...

There are two questions: What to do and how to do it. You have to answer the "what" before the "how".

"What to do?" = Do you want the user's browser address bar to change? AND Where does the page content "really" live?

Let me assume for the sake of discussion that your content really lives at
newwidget.php?stock=NNNN

I do not think anyone hereabouts will advise you to REDIRECT users from the old pretty-good URL to the new not-so-good one. In fact, crystal ball says that within 48 hours there will be a post from g1smd extolling extensionless URLs. Personally I don't care for them. "Go back in the server and put some clothes on!" is my gut reaction.

If the users don't see the "real" filename, there is no need to parse html as php; your files can use honest extensions. All it takes is a single conditionless rule that looks something like this

RewriteRule widget_(\d\d\d\d)\.htm /newwidget.php?stock=$1 [L]

You may want a redirect going in the other direction-- but if nobody has ever seen the form with the query string, you probably don't need one.

g1smd




 
Msg#: 4561526 posted 1:07 am on Apr 5, 2013 (gmt 0)

By using a rewrite, the URL as used on the web stays exactly the same.

Old site:

User requests example.com/widget_NNNN.htm and server internally uses file /widget_NNNN.htm to serve content (or maybe already uses rewrites to get the content from an index.php or other such file).

New site:

User requests example.com/widget_NNNN.htm and server internally uses /newwidget.html?stock=NNNN to serve content, without revealing what the internal location is. You achieve this by adding a RewriteRule configured as an internal rewrite (not as an external redirect).

RewriteRule ^widget_([0-9]{4})\.htm$ /newwidget.html?stock=$1 [L]
This rule must be the last rule in your mod_rewrite code.

You MUST convert all Redirect and RedirectMatch directives on your site to use RewriteRule otherwise you will run into problems.

nubbin




 
Msg#: 4561526 posted 1:38 am on Apr 8, 2013 (gmt 0)

Thanks very much for your replies lucy24 and g1smd.
Plenty of food for thought.
Lucy24, why is parsing html for php "messy" and what does that mean? Why is there a problem having a URL like newwidget.html?stock=NNNN?
Is there authoritative evidence that either practice is problematic? I ask because for years my site has been parsing html pages for php and using a database driven page to generate a web page for a product based on its stock number. In other words I call page newwidget.html passing the stock number=NNNN. PHP code in newwidget.html uses the stock number to look up the product's details and creates the HTML to show a page about the product.
These pages rank very well in search engines. So, I am unclear why you disapprove.
I have another site that uses URL rewriting to create keyword type URLs. Those pages do no better in search engine results or product sales. That is a pain in the neck as it means every product needs a unique product name otherwise duplicate urls occur. Hence I conclude having keywords in the URL is not that important.

So as I do not see a problem with my approach, I do not want to retain my old style page names. It makes the site messy having lots of different URL naming formats. So I want to do 301 to the new page URL.

I did figure out how to use RedirectMatch to do what I originally wanted. This seems to work fine:

RedirectMatch 301 /widget_(.*).htm http://example.com/new_widget?stock_no=$1

Please let me know if I am committing any hideous errors or have overlooked any major problems with what I am doing.
Thanks

[edited by: tedster at 5:27 am (utc) on Apr 8, 2013]
[edit reason] switch to example.com [/edit]

lucy24




 
Msg#: 4561526 posted 4:14 am on Apr 8, 2013 (gmt 0)

I thought your redirect involved the query string somewhere? If not, you can stay in mod_alias: Redirect(Match) by that name.

Forms like /widget_(.*).htm will work, but viewed purely as Regular Expressions they're pretty ghastly :)

#1 .* gives the possibility of zero content. Do you really have URLs in the form /widget_.htm and that's all?

#2 . is completely unconstrained. Since Regular Expressions are greedy by nature (do not ask me why RegEx terminology all has to do with food), that means your server first captures the request all the way to the end, and then has to backtrack until it's got a ".htm" left over.

#3 that same . shows up once more. It means "any character" so ".htm" doesn't only mean ".htm" but also "_htm" and "2htm" and "xhtm" and... well, you get the idea. In combination with the .* it adds an extra delay, because the server is now looking for "any old character followed by h" -- whoops! no, it's "any old character followed by ht" and so on. Express it as \. and you can cut to the chase.

And, since your URLs presumably don't contain any extraneous periods-- unlike, say, ahem, apache dot org which often has a literal . in mid-URL --you can replace the package with ([^.]+)\.htm

Finally, is that / the beginning of your path or does it come along later? Use an opening anchor ^ and your server can be out of there all the sooner "oops! doesn't begin with 'widget' so I'll stop looking right now".
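
Putting those points together, a tighter version of the RedirectMatch line might look like the sketch below. It assumes the old pages sit in the site root and that the target stays as posted; neither has been confirmed.

```apache
# ^ and $ anchor the pattern to the whole URL-path,
# [^.]+ captures one or more characters that are not a period,
# \. matches a literal period only.
RedirectMatch 301 ^/widget_([^.]+)\.htm$ http://example.com/new_widget?stock_no=$1

# Stricter alternative if the number is always exactly four digits:
# RedirectMatch 301 ^/widget_([0-9]{4})\.htm$ http://example.com/new_widget?stock_no=$1
```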

Parsing html as php kinda defeats the point of having different extensions. On the actual files, not the URLs. Note that parsing X as Y is an entirely different process from rewriting an URL in X to a file in Y.

Someone else will give the lecture about pretty URLs and not-so-pretty URLs.

g1smd




 
Msg#: 4561526 posted 7:19 am on Apr 8, 2013 (gmt 0)

I can highly recommend converting your RedirectMatch rule into the equivalent RewriteRule syntax.
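
A minimal sketch of that conversion, assuming the rule sits in the root .htaccess, the stock number is four digits, and the target is the one posted earlier in the thread:

```apache
RewriteEngine On

# External 301 redirect done in mod_rewrite instead of mod_alias.
# In .htaccess the pattern is matched without the leading slash.
RewriteRule ^widget_([0-9]{4})\.htm$ http://example.com/new_widget?stock_no=$1 [R=301,L]
```

Redirect rules like this one would go before any internal rewrites in the same file.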

"That is a pain in the neck as it means every product needs a unique product name otherwise duplicate urls occur."

If you include a unique ID at the beginning of the URL, you avoid this problem.

Now would be the time to think about extensionless URLs. They are neat and are a great way of avoiding duplicate content issues. The needed code is simple, so there's no excuse to use parameter-based URLs these days.

Faced with a choice of
example.com/newwidget.html?stock=NNNN or
example.com/NNNN-doodad
I know which I prefer.
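
For illustration only, an extensionless URL of that shape could be mapped onto the existing script with one internal rewrite. The four-digit ID, the lowercase hyphenated name, and the newwidget.html?stock= target are assumptions carried over from earlier posts:

```apache
RewriteEngine On

# /1234-doodad is served by /newwidget.html?stock=1234.
# Only the leading ID is used for the lookup; the name part after
# the hyphen is matched but not captured.
RewriteRule ^([0-9]{4})-[a-z0-9-]+$ /newwidget.html?stock=$1 [L]
```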

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved