Forum Moderators: phranque

Mod Rewrite help

concepts99

2:42 am on Jul 19, 2011 (gmt 0)

10+ Year Member



Hello,

I need some help.

I am trying to convert

http://www.website.com/catalog/hONDA_ACCORD_p_164.html


to

http://www.website.com/product.php?productid=164


I am trying to get the number after _p_ (which in this case is 164)
and then rewrite it to product.php?productid=(string here).

I believe what I have so far will take the string after _p_,
but I am not sure how to end it after the number. I want it to stop at .html and capture only the number, not the .html.

Can anyone help? Thank you.

What I have so far:

RewriteRule ^_p_(.*)$ http://www.website.com/product.php?productid=$1 [NE,R=301,L]

lucy24

3:47 am on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I were doing it from scratch: to turn

http://www.example.com/catalog/hONDA_ACCORD_p_164.html

into

http://www.example.com/product.php?productid=164

It would be

RewriteRule catalog/[a-z_]+_p_(\d+)\.html product.php?productid=$1 [NC,L]

or

RewriteRule catalog/[a-z_]+_p_(\d+)\.html http://www.example.com/product.php?productid=$1 [NC,R=301,L]

That's assuming the original form either has no query string or you're throwing it away. The [a-z_]+_p_ part is a little scruffy, but I can't make it more precise without knowing the exact format of your old addresses. For example, does it always end in a capital letter? Entirely capitalized? It doesn't really say "hONDA" does it? If the original is always capitalized, change [a-z_] to [A-Z_] and leave off the [NC] flag.
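One way to check this pattern before touching the server is to run the same regex through Python's re module. This is a sketch that assumes Python's engine behaves like Apache's PCRE for a pattern this simple, and that RewriteRule sees the path without its leading slash (the case for an .htaccess in the document root); re.IGNORECASE stands in for the [NC] flag.

```python
import re

# lucy24's pattern; re.IGNORECASE plays the part of the [NC] flag.
RULE = re.compile(r"catalog/[a-z_]+_p_(\d+)\.html", re.IGNORECASE)


def rewrite(path):
    """Return the rewrite target for path, or None if the rule does not match."""
    m = RULE.search(path)  # RewriteRule patterns are unanchored, like re.search
    if m is None:
        return None
    return "product.php?productid=" + m.group(1)


print(rewrite("catalog/hONDA_ACCORD_p_164.html"))  # product.php?productid=164
print(rewrite("catalog/hONDA_ACCORD_p_164.htm"))   # None: ".html" is required

# Without [NC], the lowercase class [a-z_] stops at the "O" of "hONDA".
CASE_SENSITIVE = re.compile(r"catalog/[a-z_]+_p_(\d+)\.html")
print(CASE_SENSITIVE.search("catalog/hONDA_ACCORD_p_164.html"))  # None
```

The (\d+) group captures digits only and \.html is matched outside the group, which is what keeps ".html" out of $1.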

g1smd

8:08 am on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



^_p_(.*)$


The fatal flaw is that ^_ says "begins with underscore" when there is clearly other stuff before the underscore.

If _p_ is always the first underscore,
^([^_]+)_p_([0-9]+)$
might be useful.

If there are multiple previous underscores,
^([^_]+_)+p_([0-9]+)$
might work.
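The three patterns above can be compared on the file name alone, since they are written without "catalog/" and without ".html" (the $ sits right after the digits). A Python sketch, assuming re matches these patterns the same way Apache does:

```python
import re

name = "hONDA_ACCORD_p_164"  # file name without "catalog/" and ".html"

# The original attempt: "^_" demands an underscore as the very first character.
ORIGINAL = re.compile(r"^_p_(.*)$")

# Fine when _p_ is the first underscore; [^_]+ cannot step over "_ACCORD".
FIRST_UNDERSCORE = re.compile(r"^([^_]+)_p_([0-9]+)$")

# The repeated group ([^_]+_)+ absorbs any number of earlier underscores.
ANY_UNDERSCORES = re.compile(r"^([^_]+_)+p_([0-9]+)$")

print(ORIGINAL.search(name))                           # None
print(FIRST_UNDERSCORE.search("HONDA_p_164").group(2))  # 164
print(FIRST_UNDERSCORE.search(name))                   # None
print(ANY_UNDERSCORES.search(name).group(2))           # 164
```

In both working patterns the product ID is the second group, so the rewrite target would use $2.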

You should also capture the text slug and validate that it is correct. If it is incorrect, you should redirect the user to the correct URL.

That is, if I link to you with
example.com/this-product-is-junk_p_456
your site should NOT serve content with "200 OK" status, it should instead redirect to
example.com/acme-widget_p_456
and only then should it serve the content.
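The slug check described above boils down to a three-way decision. Here is a minimal sketch of that logic, assuming it lives in the product script (for example product.php) rather than in .htaccess, since the rewrite rule alone does not know which slug belongs to which ID. CANONICAL_SLUGS and resolve are hypothetical names, with the dictionary standing in for a database lookup.

```python
# Hypothetical data: product ID -> canonical slug (stands in for a database lookup).
CANONICAL_SLUGS = {456: "acme-widget"}


def resolve(slug, product_id):
    """Decide how to answer a request for /<slug>_p_<id>.

    Returns (status, location); location is only set for redirects.
    """
    canonical = CANONICAL_SLUGS.get(product_id)
    if canonical is None:
        return 404, None  # unknown product: plain 404, no redirect
    if slug != canonical:
        # Wrong or outdated slug: 301 to the one correct URL.
        return 301, f"/{canonical}_p_{product_id}"
    return 200, None  # correct URL: serve the content


print(resolve("this-product-is-junk", 456))  # (301, '/acme-widget_p_456')
print(resolve("acme-widget", 456))           # (200, None)
```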

You might also find that placing the product ID first, simply as
/p414-slug-text
makes the pattern matching a lot easier.
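With the ID at the front, the pattern no longer has to reason about hyphens or underscores in the slug at all. A sketch of the idea in Python, assuming the hypothetical "p" + digits + hyphen + slug scheme and a path with the leading slash already stripped:

```python
import re

# Hypothetical ID-first scheme: "p", the digits, a hyphen, then the slug.
ID_FIRST = re.compile(r"^p([0-9]+)-(.+)$")

for path in ["p414-slug-text", "p6348-PYRAMID-PB617X-ARTIC-4-CHANNEL-AMPLIFIER"]:
    m = ID_FIRST.search(path)
    print(m.group(1), m.group(2))
# 414 slug-text
# 6348 PYRAMID-PB617X-ARTIC-4-CHANNEL-AMPLIFIER
```

The ID is always $1 and the slug is $2, however many hyphens the slug contains.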

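You also need to sort out in your head whether you need a "redirect" or a "rewrite". Both use
RewriteRule
syntax but have completely different end results: a redirect (with the R flag) sends the browser to the new URL, which then appears in the address bar, while a rewrite silently serves the new target at the URL that was requested.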

concepts99

5:11 pm on Jul 19, 2011 (gmt 0)

10+ Year Member



Hello,

I made a mistake: it is not actually an underscore (_p_) but a hyphen (-p-).

I have tried
^([^-]+-)-p-([0-9]+)$
and
^([^-]+)-p-([0-9]+)$

and also

RewriteRule catalog/[a-z_]+-p-(\d+)\.html product.php?productid=$1 [NC,L]

but none of these are working

g1smd

7:49 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



^([^-]+-)-p-([0-9]+)$
would only match
<something>--p-<digits>
with a double hyphen (and, because of the $, nothing after the digits).


In what way do the others not work? For the second pattern, you would need $2 not $1 for the rewrite target.
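Both of concepts99's anchored patterns can be tried on a few sample names in Python to see the double-hyphen problem and the $2 point directly. A sketch, assuming re matches these patterns the way Apache does:

```python
import re

# Attempt 1: the group already ends in "-", so "-p-" after it needs "--p-".
attempt1 = re.compile(r"^([^-]+-)-p-([0-9]+)$")
print(attempt1.search("HONDA-p-164"))            # None
print(attempt1.search("HONDA--p-164").group(2))  # 164

# Attempt 2: group 1 is the slug, so the product ID is group 2 ($2).
attempt2 = re.compile(r"^([^-]+)-p-([0-9]+)$")
m = attempt2.search("HONDA-p-164")
print(m.group(1), m.group(2))                    # HONDA 164

# [^-]+ cannot step over a hyphen, so a slug containing hyphens never matches,
# and the $ right after the digits rejects anything ending in ".html".
print(attempt2.search("HONDA-ACCORD-p-164"))     # None
print(attempt2.search("HONDA-p-164.html"))       # None
```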

lucy24

7:56 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have tried
^([^-]+-)-p-([0-9]+)$
and
^([^-]+)-p-([0-9]+)$

Well, the first version doesn't work because you end up with two hyphens in a row, so scratch that :) Is this the only place in your address where there are hyphens? Or might they also occur earlier in the url? (Please say no.)
RewriteRule catalog/[a-z_]+-p-(\d+)\.html product.php?productid=$1 [NC,L]

Hm. What do you get when you try this version? Does your site die or do you get a 404 page? The 404 page is useful because your browser's address bar will show the address it thought you were going to, but only if the rule is set up as a redirect. So for testing purposes make it a redirect even if you won't need one in real life. That is:

RewriteRule catalog/[a-z_]+-p-(\d+)\.html http://www.example.com/product.php?productid=$1 [R=301,NC,L]

If your server doesn't like the \d form, use [0-9] instead. It only costs you three more bytes ;) Now try feeding in some random urls (later on you can worry about weeding out the ones that don't really exist) and see what you get.
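The "feed it some urls" step can also be rehearsed offline. This is a small Python harness for the test rule above, assuming re behaves like Apache's regex engine here and using re.IGNORECASE in place of [NC]; the sample paths are made up for illustration.

```python
import re

# lucy24's test rule; re.IGNORECASE stands in for [NC].
TEST_RULE = re.compile(r"catalog/[a-z_]+-p-(\d+)\.html", re.IGNORECASE)
TARGET = "http://www.example.com/product.php?productid={}"


def try_url(path):
    """Return the redirect target for path, or "no match"."""
    m = TEST_RULE.search(path)
    return TARGET.format(m.group(1)) if m else "no match"


for path in ["catalog/honda_accord-p-164.html",  # underscores only: matches
             "catalog/HONDA-ACCORD-p-164.html",  # hyphen before -p-: [a-z_] can't cross it
             "catalog/honda-p-99.htm"]:          # ".htm" is not ".html"
    print(path, "->", try_url(path))
```

The second sample is the interesting one: [a-z_] has no hyphen in it, so any slug with hyphens before -p- falls through.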

concepts99

8:36 pm on Jul 19, 2011 (gmt 0)

10+ Year Member



"might they [hyphens] also occur earlier in the url?"

Yes, they do. Do you think this may be a problem? Is there a way to look for the string with the exact 3 characters "-p-"? How does Apache parse the URL? Does it do it one letter at a time, and does it remember what it reads?

My current .htaccess (below) does not work. I tried replacing the \d with [0-9] and it still does not work.

I have a custom 404 redirect right now; when I type in the old page, it gets redirected to the custom 404. I removed the custom 404 for testing; when I then type in the URL with the -p- code, it dies.


An Example of what I wish to convert
www.example.com/x/catalog/-SWITCHES-EVERYTHING-COMPONENT- VGA-HDMI-W-REPEATER-SELECT-p-13299.html

and

www.example.com/x/catalog/PYRAMID-PB617X-ARTIC-4-CHANNEL- AMPLIFIER-p-6348.htm

where I want to extract the # after -p- (6348 above) to

http://www.example.com/x/product.php?productid=6348

My current .htaccess:


DirectoryIndex index.html index.php


ErrorDocument 404 http://www.example.com/x/pages.php?pageid=35

Options -Indexes

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^cust/(.*)$ http://www.example.com/x/$1 [NE,R=301,L]

Options +FollowSymLinks
RewriteEngine on
RewriteRule catalog/[a-z_]+-p-([0-9]+)\.html http://www.example.com/x/product.php?productid=$1 [NE,R=301,L]


Options +FollowSymLinks
RewriteCond %{QUERY_STRING} DECLARE|EXEC|UNION|WHERE|ASCII|SCHEMA|SUBSTRING [NC]
RewriteRule . - [F]
RewriteCond %{QUERY_STRING} ^productid=([^&]*)&

It doesn't seem to be working.

g1smd

9:35 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Options +FollowSymLinks
should appear only once.

RewriteEngine on
should appear only once.

ErrorDocument 404 http://www.example.com/x/pages.php?pageid=35
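is SEO SUICIDE. When ErrorDocument points at a full URL, Apache answers with a 302 redirect to that different URL instead of a 404. Remove the domain name so the target is a local path such as /x/pages.php?pageid=35. You need to serve 404 status at the originally requested URL.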

Your rule with [F] flag should be first. Deny bad requests up front.

lucy24

11:51 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a custom 404 redirect right now; when I type in the old page, it gets redirected to the custom 404.

Short version: a 404 (or other error such as 403 or 50-anything) should never be redirected. Error pages are a special kind of rewrite: the human user sees a different physical page than the one they asked for, but the computer simply records the error code.

I removed the custom 404 for testing; when I then type in the URL with the -p- code, it dies.

Heh. I think we've got different interpretations of "die". I meant that the whole site crashes and you get hit with some type of 500 error. (This is the usual response when there is an error in .htaccess. Unlike HTML and CSS, it doesn't just ignore the part it doesn't understand and proceed to the next line, it shuts down cold.)

An Example of what I wish to convert
www.example.com/x/catalog/-SWITCHES-EVERYTHING-COMPONENT- VGA-HDMI-W-REPEATER-SELECT-p-13299.html
and

www.example.com/x/catalog/PYRAMID-PB617X-ARTIC-4-CHANNEL- AMPLIFIER-p-6348.htm

where I want to extract the # after -p- (6348 above) to

http://www.example.com/x/product.php?productid=6348

!
Is everything before the -p- always capitalized? That will help a lot. The Regular Expression won't look any different, but it will reduce the computer load because it will only have to stop and backtrack once: when it meets the last hyphen before the p. Are there really spaces in your original url or are they artifacts of cut-and-paste? As long as they're inside class brackets, it's probably not a big issue. If capitalization really is as in your examples, do not include [NC] in the flags at the end of the line.

RewriteRule catalog/[A-Z0-9\ \-]+[A-Z0-9]-p-([0-9]+)\.html http://www.example.com/x/product.php?productid=$1 [R=301,L]


The hyphen only needs escaping (\-) inside brackets, where an unescaped hyphen between two characters would denote a range. This applies to RegEx in general, not just Apache. Inside class brackets the space is just a literal space as far as the RegEx is concerned, but Apache splits .htaccess directives on whitespace, so the space itself has to be escaped (\ ) or the line will break. The final [A-Z0-9] is to stop the computer from counting the hyphen right before the p. You can't say [^-] because that would simply pick up the p, and then the computer would have to backtrack even more.
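The regex from this final rule can be run against concepts99's two example paths in Python. This is a sketch assuming Python's re treats the character class the same way Apache's PCRE does (a plain space is fine for Python; only the .htaccess parser cares about it), and that the paths are seen without a leading slash; the rule is unanchored, so the "x/" prefix does not matter.

```python
import re

# Regex from lucy24's final rule; case-sensitive, i.e. no [NC].
FINAL = re.compile(r"catalog/[A-Z0-9 \-]+[A-Z0-9]-p-([0-9]+)\.html")

examples = [
    "x/catalog/-SWITCHES-EVERYTHING-COMPONENT- VGA-HDMI-W-REPEATER-SELECT-p-13299.html",
    "x/catalog/PYRAMID-PB617X-ARTIC-4-CHANNEL- AMPLIFIER-p-6348.htm",
]
for path in examples:
    m = FINAL.search(path)
    print(m.group(1) if m else "no match")
# 13299
# no match   (the second example, as posted, ends in ".htm")

# Hypothetical variant: "l?" makes the final "l" optional, accepting .htm and .html.
LOOSER = re.compile(r"catalog/[A-Z0-9 \-]+[A-Z0-9]-p-([0-9]+)\.html?")
print(LOOSER.search(examples[1]).group(1))  # 6348
```

If .htm really is the extension on some old pages rather than a typo in the post, a pattern ending in \.html? is one way to cover both; that variant is a suggestion, not something tested in the thread.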

This is all assuming your current code either doesn't have a query (the part after ?) or that you're throwing it away.
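On concepts99's question of whether a rule can simply look for the exact characters "-p-": yes, a regex can describe only the distinctive tail of the URL, because the engine tries each starting position in turn and backtracks when a partial match fails, so earlier hyphens do no harm. This Python sketch shows that behavior; the TAIL pattern is a simplified variant that is not from the thread, and it does not check the catalog/ directory or the slug at all.

```python
import re

# Simplified variant (not from the thread): match only "-p-", digits, ".html" at the end.
TAIL = re.compile(r"-p-([0-9]+)\.html$")

path = "x/catalog/-SWITCHES-EVERYTHING-COMPONENT- VGA-HDMI-W-REPEATER-SELECT-p-13299.html"
print(TAIL.search(path).group(1))  # 13299

# The first "-p-" is followed by "D", so that attempt fails and the
# engine moves on to the next position until the second "-p-" succeeds.
print(TAIL.search("catalog/ABC-p-DEF-p-42.html").group(1))  # 42
```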