Assistance with HTACCESS and rewriterule

         

BlackRook

12:03 am on Jun 17, 2010 (gmt 0)

10+ Year Member



Hello,

I am currently building a web site for my company, and want it to be as search-engine friendly as possible. I currently have a dynamic URL operating like this: index.php?page=foo&subpage=bar

My current HTACCESS file looks like this:

Options -Indexes
ErrorDocument 403 /KC4/403.php
RewriteEngine On
RewriteRule ^([^/]*)/$([^/]*)$ /KC4/index.php?page=$1&subpage=$2 [NC, L]

I am pretty new to this, so I'm not sure if I'm doing something wrong on this end, or if it's my PHP. The subpage variable works, but the page one does not.

Any help would be great.

Thanks.

jdMorgan

2:09 pm on Jun 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should refer to the resources cited in our Apache Forum Charter before continuing. mod_rewrite and regular-expressions syntax is complex and very restrictive, and not subject to being 'guessed at.'

The first "$" in your rule and the space in the [flags] field will both cause your rule to fail.

RewriteRule ^([^/]+)/([^/]+)$ /KC4/index.php?page=$1&subpage=$2 [L]

would likely work better.

Note that [NC] is not needed, since no specific alphabetic characters appear in the pattern.

This rule will work only for requested URLs containing both "variables" as "virtual subdirectories."

If the script will tolerate "subpage=<blank>", then the single rule could be modified to:

RewriteRule ^([^/]+)(/([^/]+)?)$ /KC4/index.php?page=$1&subpage=$3 [L]

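which makes the subpage variable optional. As written, though, the slash after the page name is still required: "foo/" and "foo/bar" match, but "foo" does not. To make the slash optional as well, the "?" must follow the outer group instead, as in ^([^/]+)(/([^/]+))?$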

Otherwise you will need two rules, one for both variables, and one for only the "page" variable.
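
A minimal sketch of that two-rule approach, untested on the poster's server. It assumes the same /KC4/index.php script as above, and it assumes page and subpage names never contain a period; excluding "." from the character classes is an added guard, not part of the rules above, so that requests for real files such as index.php or logo.gif are not rewritten.

```apache
# Sketch only. Assumes page and subpage names contain no period,
# so requests for files with an extension fall through untouched.
#
# Both variables present, e.g. "foo/bar"
RewriteRule ^([^/.]+)/([^/.]+)$ /KC4/index.php?page=$1&subpage=$2 [L]
#
# Only the "page" variable present, e.g. "foo"
RewriteRule ^([^/.]+)$ /KC4/index.php?page=$1 [L]
```

With these rules the script never receives an empty "subpage" parameter; for one-level URLs the parameter is simply absent.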

Note that all links to resources included on your pages must be in server-relative or canonical form, because it is the browser that resolves relative links based on the 'directory level' that it 'sees' in its address bar. As a result, page-relative links will not work, because the browser will be expecting the linked resource to exist in a path relative to /page/.

Therefore, links to images, css, and JavaScript files on your pages should be in the form <img src="/images/logo.gif"> or <img src="http://www.example.com/images/logo.gif"> and not <img src="images/logo.gif"> or <img src="../images/logo.gif">.

Jim

BlackRook

2:52 pm on Jun 17, 2010 (gmt 0)

10+ Year Member



Thank you for your quick reply. So you could tell I was guessing? Haha.

I implemented what you said, and it seems to be working better, but I'm still having problems with the "page" variable. Whenever I try to access just that page, it results in a 404. With that said, if I visit a subpage "foo/bar" and then try to visit "foo" by deleting "bar" from the URL and pressing return, the "foo" page appears. I can only guess this has something to do with how things are referring?

If you wouldn't mind explaining what could cause that problem, I'd be grateful. I'm also reading up on the resources as advised.

Thank you.
Black.

jdMorgan

3:07 pm on Jun 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Be sure to delete your browser cache before testing any new server-side code. Otherwise, your browser will show you stale, previously-cached pages and server responses, making your test results invalid. Forcing a page reload doesn't always work; delete and/or disable the cache for testing (e.g. set the browser's caching time to zero days). But don't forget to re-enable it when done testing!

If the problem is related to the previously-mentioned relative linking, then you will be able to see incorrect URLs when hovering over links on your pages, and you should be seeing 404 errors related to bad included-object URLs in your server error log file.

Jim

BlackRook

3:29 pm on Jun 17, 2010 (gmt 0)

10+ Year Member



Hey Jim,

This is really strange. I've cleared the cache and even disabled it. The URLs appear as they should: www.example.com/site/foo/.

However, when I click it, it just returns as a dud. The problem is still the same. I can call a page and subpage and it returns as it should. I can then delete the subpage variable from the URL and see just the page. However, accessing the page directly, or through a link on the web site, does not deliver it. I'm going to try a different environment to see if it may be some setting or random fluke.

BlackRook

3:56 pm on Jun 17, 2010 (gmt 0)

10+ Year Member



Never mind, I'm an idiot. Missed a forward slash. -Whistles- Well thanks for the help Jim!

jdMorgan

5:25 pm on Jun 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This may also cause problems for user type-ins and link citations on other sites and in social media.

You should pick one format -- trailing-slash required or no-trailing-slash required, and then accept only that format for rewriting to your script. Requests in the other format should be detected and externally redirected (301-Moved Permanently) to the correct-format URL.

Jim

BlackRook

1:57 am on Jul 13, 2010 (gmt 0)

10+ Year Member



Hey Jim,

I was actually researching that today: ensuring that everything has a trailing slash, or a redirect to make this happen.

This is the code that is currently in my HTACCESS file. I've tried and tried but I think I'm bound to be illiterate here.


Options -Indexes
ErrorDocument 403 /KC4/403.php
ErrorDocument 404 /KC4/404.php
RewriteEngine On
RewriteRule ^([^/]+)(/([^/]+)?)$ /KC4/index.php?page=$1&subpage=$3 [L]

What do I need to do to make sure that the rewrite rule inserts a trailing slash after the second argument? And ensure that people who don't add a trailing slash get redirected?

Thanks,
BR

BlackRook

2:02 am on Jul 13, 2010 (gmt 0)

10+ Year Member



Okay - I've managed to figure out how to get the trailing slash on the second argument:

RewriteRule ^([^/]+)(/([^/]+)(/)?)$ /KC4/index.php?page=$1&subpage=$3 [L]

I'm not sure if that's the best way or not. In either case, I'm still stuck on how to redirect people if they don't enter the slash.

On another note, is there a way to exclude certain folders from this rewrite rule? I have an admin panel that I use to manage my site, and of course when I enter its URL, the rule treats it as a page that belongs to the site. Is there a way to rule that out, along with folders containing images and so forth?

Thanks again,
BR

g1smd

7:20 am on Jul 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The redirect code to add, or to remove, a slash has been posted hundreds of times before.

I would suggest that you remove the slash for your page URLs; go extensionless.

URLs with a trailing slash imply the index page within a folder, and that there may be internal pages within that folder.

jdMorgan

3:22 pm on Jul 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Something like this...

Requiring and adding a trailing slash:

# Externally redirect to add missing trailing slash
RewriteRule ^([^/]+(/[^/]+)?)$ http://www.example.com/$1/ [R=301,L]
#
# Internally rewrite friendly URLs to script filepath
RewriteRule ^([^/]+)(/([^/]+))?/$ /KC4/index.php?page=$1&subpage=$3 [L]

Requiring no trailing slash, and removing it if present:

# Externally redirect to remove trailing slash
RewriteRule ^([^/]+)(/[^/]+)?/$ http://www.example.com/$1$2 [R=301,L]
#
# Internally rewrite friendly URLs to script filepath
RewriteRule ^([^/]+)(/([^/]+))?$ /KC4/index.php?page=$1&subpage=$3 [L]

Do be aware that this code -like your original- rewrites *almost all* requests to your script. So take care that your script properly handles requests for images, CSS files, JavaScript files, robots.txt, sitemap.xml, and all other files. It must either "include" the files to serve requests for these objects, or it must generate these objects itself. If this is not your intent, then requests for these object types should be excluded from the rules shown here, either by making the rule patterns more specific, or by adding RewriteConds.
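
One possible way to do the exclusion, as a sketch rather than tested code: pass-through rules placed above the redirect and rewrite rules. The folder names "admin" and "images" are placeholders, not taken from the actual site, and the paths are assumed to be relative to the directory holding the .htaccess file.

```apache
# Sketch only; place above the redirect and rewrite rules.
# "admin" and "images" are placeholder folder names, relative to the
# directory that holds this .htaccess file.
#
# Leave requests for the named folders untouched
RewriteRule ^(admin|images)(/|$) - [L]
#
# Leave requests for any existing file or directory untouched
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
```

The "-" substitution means "no change", and [L] stops rule processing for that request, so none of the later rules see it. The file and directory checks cost a filesystem lookup on every request, so the pattern-based exclusion is the cheaper of the two where the folder names are known in advance.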

Jim

BlackRook

2:19 am on Jul 14, 2010 (gmt 0)

10+ Year Member



Cool.

Thanks, this should be able to get me started on more advanced HTACCESS down the line.

Thanks,
BR