Rewrite rule with url decoding

         

drewjuk

9:52 am on Jan 26, 2012 (gmt 0)

10+ Year Member



Hi,

I have been trying to learn how to set up rewrite rules in .htaccess, but I can't seem to get them to work when part of the string is URL encoded. Can anybody help, please?


Options -Indexes
Options +FollowSymLinks

#Rewrite engine
RewriteEngine on
RewriteRule ^product/(\w+)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1


In product=$1, the $1 string could contain any letters or numbers, and it is html encoded. Any ideas how to get the rule to read this?

Thanks for your help

drewjuk

10:25 am on Jan 26, 2012 (gmt 0)

10+ Year Member



I almost got it to work with:

RewriteRule ^product/(.*)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1 


But then all the internal files like .css files are not found and it messes up the page..

lucy24

11:15 am on Jan 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you are rewriting from one directory to another-- even if the first directory doesn't really exist-- you MUST use absolute links for all associated files such as css. That is, address beginning in / and working downward from the root.

Never use .* (or .+) anywhere but at the end of a pattern. The RegEx then has to grind to a screeching halt when it discovers that it was supposed to leave room for a slash. And then more backtracking to ensure that the slash is followed by numerals.

So you need to figure out what characters may actually occur in that middle part of your pattern. One directory or more? Anything other than alphanumerics? Not sure what you mean by "html encoded" since an url can't contain anything that requires encoding. Just [a-zA-Z0-9_-] or, compactly, [\w-] (the lowline counts as a "word character" though the hyphen doesn't).

The simplest form is often
[^/]+
to capture exactly one directory name. It's probably what you want here.

Don't forget to put [L] at the end of your RewriteRule, unless you have a clear and particular reason for leaving it open to further Rewrites.
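
As a rough sketch of the advice above, the original rule could be rewritten with a single-segment capture and an [L] flag. The substitution is copied from the rule posted earlier in the thread; whether [^/]+ is too permissive for your product names is an assumption you would need to check.

```apache
Options -Indexes
Options +FollowSymLinks

RewriteEngine on

# [^/]+ matches one or more characters that are not a slash, so it captures
# exactly one path segment and never has to backtrack across the "/".
# [0-9]+ then captures the numeric product id.
# [L] stops this pass through the rule set once the rule matches.
RewriteRule ^product/([^/]+)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1 [L]
```

Because the pattern ends in /digits, a request such as /product/something/Inc/CSS/Styles.css does not match this rule at all. It simply points at a file that does not exist, which is why the links themselves have to change.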

penders

11:22 am on Jan 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...you MUST use absolute links for all associated files such as css. That is, address beginning in / and working downward from the root.


Do you really mean 'absolute' or root-relative?

drewjuk

11:30 am on Jan 26, 2012 (gmt 0)

10+ Year Member



Sorry, I mean urlencode. It's a PHP function which makes URLs correct; for example, it converts spaces into +, etc.

I am not entirely sure of all the characters it would use. I think it is just a-z, A-Z, 0-9, % and +.

So is there a new rule I could write to tell all files which do not have full directories to start from the base directory or something?

Thanks very much

lucy24

11:33 am on Jan 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Root-relative or site-absolute, depending on your preferred terminology. That's what I meant by "beginning in /". Not beginning in http:// :)

drewjuk

11:34 am on Jan 26, 2012 (gmt 0)

10+ Year Member



urlencode: it could also have ' or -. I am not sure any other characters would be used.

Would this be something which would work?

[a-zA-Z0-9_-%+']

Thanks for your help

lucy24

11:39 am on Jan 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Waitwaitwait. What are spaces doing in your url? They could easily be present in a query string, where they'd generally be changed into + in transit. But that's not part of the URL in your Rewrite.

If you have made the horrendous mistake of using literal spaces in your urls, they are far more likely to end up as %20. This happens in transit across the Internet before they ever reach your php file. And I would prefer not to contemplate an URL containing + signs. Anything besides alphanumerics, hyphens or lowlines and you're in trouble.

Remember, we're talking about your starting point. The "pattern" part of the rule is here the URL that humans see, and that travels around the Internet. You can rewrite it to anything you like, since that part won't be going anywhere but your own server.

Edit:
I think it would help at this point if you step back and explain in English what you're trying to do. That is: What do you want the user to see? What are they clicking on? What do you want to have going on in the background? Figuring out what you want to do is probably 90% of mod_rewrite. The rest is just mechanics.

[edited by: lucy24 at 11:42 am (utc) on Jan 26, 2012]
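
One detail that may explain the confusion about encoding: mod_rewrite should compare the RewriteRule pattern against the URL path after percent-decoding, so a %20 in the request reaches the pattern as a literal space. A minimal sketch, assuming an Apache version that supports the B flag (escape backreferences); the rule itself is adapted from the one posted at the top of the thread.

```apache
RewriteEngine on

# A request for /product/Nissan%20PHO2%20A20/95 is matched here as
#   product/Nissan PHO2 A20/95
# so the pattern has to allow a space, not the text "%20".
# The B flag re-escapes the captured text before it is placed in the
# substitution, so the query string passed to index.php is encoded again.
RewriteRule ^product/([^/]+)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1 [B,L]
```

This only shows why a decoded space can match. It does not make spaces in URLs a good idea.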

drewjuk

11:40 am on Jan 26, 2012 (gmt 0)

10+ Year Member



Would something like this work:

RewriteEngine on
RewriteRule ^product/([a-zA-Z0-9_-%+']+)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1

Then a new rule to say that any image files start from the base directory or root and follow on to the actual location, instead of the new location product/whatever here/23/Images/images.jpg

drewjuk

11:42 am on Jan 26, 2012 (gmt 0)

10+ Year Member



So are you saying I shouldn't urlencode the URLs, because I would not need to if using rewrite?

The only reason I put urlencode / decode on was so search engines could read the URLs properly.

If it's insecure then I would rather take it off and do it a safer way.

Thanks for your help

drewjuk

11:52 am on Jan 26, 2012 (gmt 0)

10+ Year Member



/product/Nissan PHO2 A20/95

This does not work with this as the .htaccess

RewriteEngine on
RewriteRule ^product/([a-zA-Z0-9_-]+)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1

any ideas?

drewjuk

12:10 pm on Jan 26, 2012 (gmt 0)

10+ Year Member



Am I looking at this the wrong way? Would it be easier to cut the product down to just the first word, possibly?

Thanks

drewjuk

12:22 pm on Jan 26, 2012 (gmt 0)

10+ Year Member



Ok so I have figured out how to get this to work with the help of the internet:

Options -Indexes
Options +FollowSymLinks

#Rewrite engine
RewriteEngine on
RewriteBase /
RewriteRule ^product/([-0-9a-zA-Z_\s]+)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1 [L]

But I still have the issue with images, or images referenced inside CSS files etc., not being found. Is there a way to sort this out with a RewriteRule or RewriteCond?

Thanks

g1smd

8:45 pm on Jan 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can highly recommend that you do NOT use spaces or underscores in URLs. Spaces in particular cause too much complication and may also cause some accesses to fail.

Use some other character in place of a space: a comma, hyphen or period would be OK, as would several others. Try very, very hard to never have to URL encode anything in a URL. It's extra complication that can easily break.


When you show a character group, list the hyphen last and away from the digit definitions.

Links to CSS and JS files should begin with a slash and show all folder levels and the filename. Never use the ../ notation here.
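
To illustrate the point about hyphens: in a class such as [a-zA-Z0-9_-%+'], the sequence _-% is read as a range from _ to %, which is out of order, so the expression should fail to compile (typically a 500 error when it sits in .htaccess). Below is a sketch of the same class with the hyphen moved to the end, where it is taken literally. The character list is drewjuk's guess from earlier in the thread, not a recommendation.

```apache
RewriteEngine on

# Hyphen listed last, so it is a literal "-" rather than a range operator.
RewriteRule ^product/([a-zA-Z0-9_%+'-]+)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1 [L]

# A stricter class along the lines suggested above: alphanumerics and hyphen only.
# Only one of these two rules would be kept in a real file.
# RewriteRule ^product/([a-zA-Z0-9-]+)/([0-9]+)$ /index.php?product=$1&pid=$2&sp=1 [L]
```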

drewjuk

9:03 pm on Jan 26, 2012 (gmt 0)

10+ Year Member



Hey, thanks for your reply.

Is there a rule which will automatically tell, say, all jpg files to use the root directory, or do you have to specify the full URL on everything? CSS, JPEGs, all other types of images?

Should I rewrite the code to turn all spaces into -? I wasn't sure which was best for SEO.

Thanks

g1smd

9:11 pm on Jan 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You should link to the real location of the file, beginning with a slash and including all folder levels.

lucy24

10:16 pm on Jan 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Did you think everyone had abandoned you? :) I went to bed and g1 operates in a different time zone from both of us.

To recapitulate: The easiest way to capture any-and-all directory names is

([^/]+)

meaning "anything except a slash". For a series of them,

([^/]+)/([^/]+)/ etc. if you're capturing some exact number of directories

or else

(([^/]+/)+)

to capture all of 'em. And then put the filename at the end if and only if it's something other than a directory's index file (which properly has no name).
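
A hedged sketch of those two capture styles as actual rules. The parameter names a, b and path are made up for illustration, and index.php is assumed to be the handler. Patterns this broad would also match real directories, so in practice they usually need a fixed prefix or a condition in front.

```apache
RewriteEngine on

# Exactly two directory levels, e.g. /shoes/red/
RewriteRule ^([^/]+)/([^/]+)/$ /index.php?a=$1&b=$2 [L]

# Any number of directory levels (one or more), e.g. /shoes/red/large/
# $1 holds the whole path, including the trailing slash.
RewriteRule ^(([^/]+/)+)$ /index.php?path=$1 [L]
```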

Any file in your own domain that is referenced in forms such as href= or src= has to be named like this:

/directory/otherdirectory/filename.xtn

if you want it to be accessible from everywhere on the site. The leading slash means you're starting at the top level and working downward.

Unfortunately this means you won't be able to view the pages locally unless you maintain two versions: one with relative links for offline viewing, and one with absolute (root-relative) links for the real thing. Or you can install a pseudo-server like WAMP or MAMP that sits on your own hard drive. By some weird oversight this is extremely easy, requiring no command-line stuff. And then you can make a copy and play with the config file at no risk to anybody. But it only works for one domain at a time.

g1smd

11:05 pm on Jan 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you have not got WAMP, MAMP or something similar installed on your PC or laptop, or else a dev. or test. subdomain on your server then you're missing out on a very simple way to develop without trashing your live site.

drewjuk

8:41 am on Jan 27, 2012 (gmt 0)

10+ Year Member



Hi,

Thanks for your help. I do use xampp, and I have defined the root URL in constants so I have a dev and a live version for local/live.

I started changing the file locations to use the main URL instead of the root directory, as I thought it would be better if hackers saw the URL instead of the root file path in view source. Should I just use the main file path? Is there a difference in load speed etc.?

It's a pain going through it all but I would rather it was all right!

Thanks for your help, much appreciated!

drewjuk

9:03 am on Jan 27, 2012 (gmt 0)

10+ Year Member



.htaccess doesn't work on xampp; it just redirects to the xampp home page.

- Never mind, fixed this: I took the base off, and the / before index.php.


I am still unsure whether I should use the full URL or the full directory path to link to files. I would have thought the URL would be safer?

lucy24

9:50 pm on Jan 27, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Links can begin with:

http:// (absolute link)
/ (root-relative: for example your robots.txt is /robots.txt)
word character (relative link to something in same directory)
../ one or more iterations of ../ (relative link starting from a higher-level directory*)

Here is the long version [w3.org]. There exists an even longer one, but I don't have it bookmarked.

I can't think of any circumstance where you would use a full-blown http:// absolute link within your own site.

If you rewrite among directories, you MUST use root-relative links, because that is the only way the user's browser knows where to look for your css, images and other linked files.

If the address bar says

www.example.com/directory/filename

and the user is "really" at

www.example.com/index.php?a=directory&b=filename

then any links in the form

picture.jpg

will be interpreted as

www.example.com/directory/picture.jpg

because the browser goes by where it "thinks" you are, not where you "really" are.

Even if you don't rewrite anything else, stylesheets shared by your error documents have to use root-relative links because you never know where the user's browser is coming from. But you can be pretty sure it's not the /error_docs/ directory.


* NEVER EVER use this-- not even if, like me, you keep your pages in "packages", so relative links within the package are more stable than root-relative links.

drewjuk

5:59 pm on Jan 28, 2012 (gmt 0)

10+ Year Member



Hi Lucy,

Sorry I have been getting very confused, I changed all the links to absolute links because when using rewrite the links to images etc.. would not work.

So before I had link structure:

"Images/bleh.jpg"
"Images/bleh2.jpg"
"../Something/Images/Bleh.jpg"
"Inc/CSS/Styles.css"

But this was wrong because if we were at theurl.com/product/something/32

The browser would look for Images/bleh2.jpg under theurl.com/product/something/ which is wrong because the actual directory it is in is theurl.com/

So now I have changed these to

"http://FULLURL/Images/bleh.jpg"
"http://FULLURL/Images/bleh2.jpg"
"http://FULLURL/Something/Images/Bleh.jpg"
"http://FULLURL/Inc/CSS/Styles.css"

But this is obviously wrong also what should it be?

"/Images/bleh.jpg"
"/Images/bleh2.jpg"
"/Something/Images/Bleh.jpg"
"/Inc/CSS/Styles.css"
Is this what you mean?

Sorry for being annoying I am just trying to understand it properly and not always the best at reading things properly, I learn from self teach or experience "rubbish at school".

Thanks for being patient and helping!

lucy24

12:02 am on Jan 29, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But this is obviously wrong also what should it be?

"/Images/bleh.jpg"
"/Images/bleh2.jpg"
"/Something/Images/Bleh.jpg"
"/Inc/CSS/Styles.css"
Is this what you mean?

Yes, that's right. Your two last groups mean the same thing, but the version in http:// is overkill. And forms with leading / will work nicely in XAMPP or whatever you've got.

drewjuk

11:51 am on Jan 30, 2012 (gmt 0)

10+ Year Member



Right, thanks lucy, I understand that now and have got it all sorted how it should be. I have been trying to develop my .htaccess further into blocking php files and forwarding any htm files to php. The code works to forward htm files to php, but the code to block php files does not work so great.

Any ideas what I am doing wrong? It seems that "block any php requests" cancels out "redirect any php request to htm".


Options -Indexes
Options +FollowSymLinks

#Rewrite engine
RewriteEngine on

#BLOCK ANY PHP REQUESTS
#RewriteRule ^(.*)\.php errors/404
#End

#REDIRECT ANY PHP REQUEST TO HTM
RewriteRule ^(.*)\.htm$ $1.php [nc]
#End

#Main Pages
RewriteRule ^page/([-0-9a-zA-Z_]+)/([0-9]+)$ Pages/$1.php?pid=$2
RewriteRule ^content/([-0-9a-zA-Z_]+)$ index.php?pg=$1
RewriteRule ^content/([-0-9a-zA-Z_]+)/([0-9]+)$ index.php?pg=$1&toe=$2
RewriteRule ^content/contact_gbforklifts/([0-9]+)/([0-9]+)/([0-9]+)$ index.php?pg=contact_gbforklifts&toe=$2&pid=$1&buy=$3

#News Articles
RewriteRule ^news/([-0-9a-zA-Z_]+)/([0-9]+)$ index.php?pg=news&article=$1&nid=$2

#Shop
RewriteRule ^categories/([-0-9a-zA-Z_]+)/([0-9]+)$ index.php?cat=$1&cid=$2
RewriteRule ^categories/([-0-9a-zA-Z_]+)/([0-9]+)/([0-9]+)$ index.php?subcat=$1&scl=$2&cid=$3&child=2
RewriteRule ^categories/subcat/([-0-9a-zA-Z_]+)/([0-9]+)$ index.php?subcat=$1&cid=$2&spl=1
RewriteRule ^products/([-0-9a-zA-Z_]+)/([0-9]+)$ index.php?product=$1&pid=$2&sp=1
#End

#Create 404 Error Page
RewriteRule ^errors/([0-9]+)$ index.php?pg=error&type=$1
#End

#AddType application/x-httpd-php .htm .html
#ErrorDocument 404 /404.php
#End

#Forward sitemap
RewriteRule sitemap\.xml site-map.php [L]
#End


Once again thanks for all your help

g1smd

12:08 pm on Jan 30, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You must add the [L] flag to EVERY rule.

This stops further processing of .htaccess when a rule matches.

Never use (.*) at the beginning or in the middle of a pattern. Use a less ambiguous and less greedy pattern, one that will parse left-to-right in one pass.

drewjuk

2:30 pm on Jan 30, 2012 (gmt 0)

10+ Year Member



Hi g1smd,

Thanks for your reply. I misunderstood the meaning of [L], so thanks for pointing that out for me. I will add that to the end of each rule and retest. I will also change .* to a-z0-9.

Thanks for your help!

g1smd

2:41 pm on Jan 30, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



#REDIRECT ANY PHP REQUEST TO HTM
RewriteRule ^([a-z0-9]+)\.htm$ /$1.php [NC,L]
#End


The above code performs an internal rewrite not an external redirect.

It accepts an incoming URL request ending in .htm and rewrites the request to instead fetch content from a file with name ending in .php (so that's backwards to what you wrote in the code comment).

What had you actually intended to happen here?

Additionally, when the target begins with a backreference, e.g. $1, you MUST precede the target with a leading slash; otherwise you allow hackers to use path injection methods on your site.
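
To make the rewrite/redirect distinction concrete, here is a small sketch. The first rule is the internal rewrite described above, written with a + after the character class so that names longer than one character match. The commented-out rule shows what an external redirect would look like and is there for contrast only.

```apache
RewriteEngine on

# Internal rewrite: the address bar keeps showing /page.htm while the
# server fetches /page.php. The leading slash on the target anchors it
# at the site root.
RewriteRule ^([a-z0-9]+)\.htm$ /$1.php [NC,L]

# External redirect (contrast only, not for use alongside the rule above):
# the R flag makes the server answer with a 301, and the browser then
# requests the new URL itself, so the address bar changes.
# RewriteRule ^([a-z0-9]+)\.htm$ /$1.php [NC,R=301,L]
```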

g1smd

4:19 pm on Jan 30, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^categories/([-0-9a-zA-Z_]+)/([0-9]+)$ ...
RewriteRule ^categories/([-0-9a-zA-Z_]+)/([0-9]+)/([0-9]+)$ ...
RewriteRule ^categories/subcat/([-0-9a-zA-Z_]+)/([0-9]+)$ ...


A request for
example.com/categories/subcat/<something>
will have to be trial-matched against the first and second rule before the real match is found in rule 3.

When rules are listed in htaccess, you should list the "more specific" rules first, and the least specific rules last. So, rule 3 should be moved up to be listed before the other two.

The same applies to the content and content/contact... rules.
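
The reordering could look something like this, using the substitutions from drewjuk's file and adding [L] to each rule. It is a sketch, not a tested configuration. Ordering is not only about speed here: a request such as categories/subcat/12/34 fits both the three-part rule and the subcat rule, so whichever is listed first wins.

```apache
RewriteEngine on

# Shop: most specific pattern first
RewriteRule ^categories/subcat/([-0-9a-zA-Z_]+)/([0-9]+)$ index.php?subcat=$1&cid=$2&spl=1 [L]
RewriteRule ^categories/([-0-9a-zA-Z_]+)/([0-9]+)/([0-9]+)$ index.php?subcat=$1&scl=$2&cid=$3&child=2 [L]
RewriteRule ^categories/([-0-9a-zA-Z_]+)/([0-9]+)$ index.php?cat=$1&cid=$2 [L]

# Main pages: the fixed contact_gbforklifts rule before the general content rules
RewriteRule ^content/contact_gbforklifts/([0-9]+)/([0-9]+)/([0-9]+)$ index.php?pg=contact_gbforklifts&toe=$2&pid=$1&buy=$3 [L]
RewriteRule ^content/([-0-9a-zA-Z_]+)/([0-9]+)$ index.php?pg=$1&toe=$2 [L]
RewriteRule ^content/([-0-9a-zA-Z_]+)$ index.php?pg=$1 [L]
```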

lucy24

10:50 pm on Jan 30, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have been trying to develop my .htaccess further into blocking php files and forwarding any htm files to php

In English: Your files are "really" php, but you only want users to ask for htm? Might be simpler all around to go extensionless, but that's your choice.

At some point you may need a Condition that looks at {THE_REQUEST} so that you are only intercepting php requests coming in from "outside", not the ones that result from internal activity. Note that certain functions such as auto-indexing and some analytics programs also use php, so make sure you're not interfering with those.
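
A sketch of what such a condition might look like, assuming the goal is to answer direct requests for .php with a 404 while still letting the internal .htm-to-.php rewrite through. In .htaccess the rewritten request is run through the rules again, which is the likely reason the plain block rule cancelled out the rewrite. %{THE_REQUEST} holds the original request line (for example GET /page.php HTTP/1.1) and does not change after an internal rewrite. The R=404 form and the character classes are assumptions for illustration, not something taken from the thread.

```apache
RewriteEngine on

# Block only when the client itself asked for a .php path.
# [^?\s]* keeps the match inside the path, before any query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*\.php[?\s] [NC]
RewriteRule \.php$ - [R=404,L]

# Internal rewrite from the public .htm name to the real .php file.
# The resulting .php request is not blocked, because THE_REQUEST
# still shows the .htm URL the client originally asked for.
RewriteRule ^([a-z0-9]+)\.htm$ /$1.php [NC,L]
```

Requests that reach index.php through a bare /?pg=4 style URL never mention .php in the request line, so they should pass the condition untouched.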

drewjuk

9:58 am on Jan 31, 2012 (gmt 0)

10+ Year Member



Hi

The website is an already established website. There are no real .php requests; everything is done by ?pg=4 or ?cat=1.

But yes, in English the files are all php. I just want to trick the user into thinking they are htm files instead of php files.

I could go extensionless, but then it will mess up analytics, or in this case the Google SEO stuff like you said!

Thanks