

Excluding specific dir from being rewritten + trailing slash addition

Unusual one this, old server worked fine but new one doesn't.

         

christopherwoods

3:03 am on Apr 2, 2008 (gmt 0)

10+ Year Member



Ok, so I'm freshly confused - especially as this site's previous hosting handled this quite gracefully without any extra code.

index.php in the root handles all incoming requests, and mod_rewrite is used to pass variables to the script (which then sees what page is being requested and includes the content on a per-request basis).

I have a very simple htaccess:

RewriteEngine On
RewriteRule ^([a-zA-Z0-9]*)$ index.php?section=$1 [L]

If you request example.com/link1, index.php will see the "link1" variable and include code from ./content/link1.php - if an invalid variable is specified, index.php just includes the default text.

However, there's a physical directory called 'catalogue' in the root which contains a Quick.Cart installation (flat-file product catalogue, since you asked). On the old server, with the aforementioned rewrite rules, if I requested example.com/catalogue (no trailing slash), it would load the catalogue dir first time. However, on the new server, it doesn't.

I've struggled with my basic knowledge to try and construct a rule to redirect requests to add a trailing slash, but the closest I got to success was with

RewriteRule ^catalogue$ catalogue/$1 [L]

inserted BEFORE the do-it-all rewrite rule (which passes the variables to the site's main index.php file). It would redirect the browser and it would display the page content, but all images and CSS references are broken so it's next to useless. Any other permutation or alternative either takes me back to the front page again (almost like it's ignoring the rule and request altogether) or it can't figure out what to do.

What's the best practice here for excluding a directory from being rewritten, THEN apply a trailing slash rewrite so that the catalogue loads? If I manually request /catalogue/ with a trailing slash, it works fine - but I'd like to keep the site's URI appearance the same across the board, particularly as people will most likely not type a trailing slash if they go straight to that URL. Would a RewriteCond be the key to solving this problem, or do I need a more competently written RewriteRule?

This forum has turned up some golden answers before, so fingers crossed you guys will help me out again. Many thanks in advance. :)

jdMorgan

3:59 am on Apr 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try:

RewriteEngine on
#
RewriteRule ^catalogue$ http://www.example.com/catalogue/ [R=301,L]
#
RewriteCond %{REQUEST_URI} !^/catalogue/
RewriteRule ^([a-z0-9]*)$ /index.php?section=$1 [NC,L]

The [NC] flag makes the pattern-matching case-insensitive, and is more efficient than using "a-zA-Z".
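A more general variant, sketched here as an assumption rather than as part of the reply above, excludes every real file and directory instead of naming /catalogue explicitly, and adds the trailing slash to any existing directory requested without one. It assumes the rules live in the root .htaccess, that www.example.com is the canonical hostname, and that index.php is the DirectoryIndex: with the !-d check, a request for the home page is served by mod_dir without a section parameter instead of being rewritten.

RewriteEngine on
#
# Redirect requests for an existing directory that lack the trailing slash
# (e.g. /catalogue -> /catalogue/) so relative image and CSS links resolve.
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^([a-z0-9]+)$ http://www.example.com/$1/ [NC,R=301,L]
#
# Pass everything that is not a real file or directory to index.php.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9]*)$ /index.php?section=$1 [NC,L]

The trade-off is an extra filesystem check on each matching request; the explicit /catalogue condition is cheaper if that is the only directory that needs excluding.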

> If you request example.com/link1, index.php will see the "link1" variable and include code from ./content/link1.php. If an invalid variable is specified, index.php just includes the default text.

Now that is a serious problem, if you care a whit about search engines... If an invalid path is requested, the script should generate a 404-Not Found response, and a custom 404 ErrorDocument should be used to explain the situation to your visitor (human) and provide links to your home page, site map, search page, categories directory, and product selector pages, as applicable.
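A minimal sketch of the ErrorDocument part, assuming a custom error page at the hypothetical path /errors/404.php. It only covers 404s that Apache generates itself; for requests the catch-all rule hands to index.php, the script has to send the 404 status itself and print its own explanatory page.

# Serve a custom page for 404 errors that Apache itself generates.
# /errors/404.php is a hypothetical path; keep it a local URL path.
ErrorDocument 404 /errors/404.php

Use a local path rather than a full http:// URL here: with a full URL, Apache sends a redirect to the error page instead, and the client never sees the 404 status.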

Returning a 200-OK response to any and all arbitrary URL requests will cause search engines to limit the depth to which they're willing to crawl your site, and risks potentially massive ranking problems caused by duplicate content. It also leaves your ranking open to malicious tampering, since anyone can link to made-up URLs on your site and every one of them will return your default text with a 200-OK status.

Jim

christopherwoods

10:42 am on Apr 2, 2008 (gmt 0)

10+ Year Member



Blimey, didn't think of that - somehow, when I was devving the site on my own server, it came to rank higher on Google for certain keyword combinations than their existing web site! That amused me greatly (I had people submitting booking enquiries through my dev platform!).

I actually designed that feature in from an aesthetic point of view, but I never considered it'd affect the SEO like that, so I'll recode that sharpish.

It won't take much to code in this 404 handling, as the if...else block already only includes content if the request matches an existing file. Thank you very much for pointing it out, I feel suitably chastised :D

I was toying with the idea of using NC but decided to leave it out for the sake of simplicity; I might as well stick it in if you think it's more useful in than out.

Does returning a 301 code for the redirect work well with search engines? I want to effect a redirect (to add the trailing slash so the scripts inside the /catalogue/ directory function correctly instead of the request being captured by the catch-all rewrite rule) but is the 301 appropriate for this particular usage?

The negated RewriteCond was what I couldn't wrap my head around; that's one of the problems with coding at 3am :( I'll give your code a go now, many thanks for the reply. :) (I love this place!)

jdMorgan

2:06 pm on Apr 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> is the 301 appropriate for this particular usage?
Absolutely. It is also required to fix your "included object links are broken" problem.

Best practices also require that your dev platform(s) include a robots.txt file that Disallows all robots -- and access-control code in httpd.conf or .htaccess to back that up.

Many developers deny access to all "outside" IP addresses or require a password/log-in from all outside IP addresses to access the dev server(s). Again, you risk duplicate-content problems if this is not done and, as you've discovered, run the risk of your dev server(s) out-ranking your (or your clients') Web sites.
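A sketch of that access control for an Apache 2.2 server (the current version at the time of this thread), letting an office network in directly and asking everyone else for a password. The IP range 192.0.2.0/24 and the password file path are placeholders; the file would be created with the htpasswd utility. The matching robots.txt on the dev host would contain just the two lines "User-agent: *" and "Disallow: /".

# Dev server only (Apache 2.2 syntax).
# Visitors from the office network get in without a password;
# everyone else must log in. 192.0.2.0/24 and the .htpasswd
# path are placeholders.
AuthType Basic
AuthName "Development server"
AuthUserFile /home/dev/.htpasswd
Require valid-user
Order Deny,Allow
Deny from all
Allow from 192.0.2.0/24
Satisfy Any

"Satisfy Any" is what makes the IP rule and the password alternatives, rather than requiring both. On Apache 2.4 the equivalent is a RequireAny block containing "Require ip 192.0.2.0/24" and "Require valid-user".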

Jim

christopherwoods

2:34 pm on Apr 2, 2008 (gmt 0)

10+ Year Member



Well, I've set up an htaccess redirect to silently forward all requests to the new (permanent) domain with a 301 (which I was going to do anyway), so as Google has the old site ranked quite highly, that ranking should carry over to the new domain.

For the benefit of people finding this thread via Google, here's the code I used:

RewriteEngine On
RewriteRule ^(.*)$ http://example.co.uk/$1 [r=301,nc]

And it works perfectly. :)

It was an odd situation because it was the dev server, but for a while it was also going to be the production server where the site was hosted. My colleague and I invested in a new reseller hosting platform a week ago, so things were a little ad-hoc for a while. :)

Quite an amateur approach as far as dev servers go, but it worked for our rather odd requirements... I'll have to take your feedback on board though and make future web-facing dev sites far more restricted.

I'm about to go try out your code now, will post back if there are any odd problems. Thanks again :)

[edited by: jdMorgan at 8:32 pm (utc) on April 2, 2008]
[edit reason] example.com.uk [/edit]

jdMorgan

8:35 pm on Apr 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That rule should read:

RewriteRule (.*) http://example.co.uk/$1 [R=301,L]

Unnecessary (in this case) start and end anchors removed from pattern, superfluous [NC] flag removed, and [L] flag added for efficiency.

If combined with the code above, this should be the *second* rule in the group, directly after the /catalogue redirect. Put external redirects first, in most-specific to least-specific pattern order, then internal rewrites, again most-specific to least-specific.
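Putting the rules from this thread in that order gives something like the sketch below. It assumes the domain-wide redirect should only fire for hostnames other than example.co.uk; the HTTP_HOST conditions are an addition, not part of the rule above. Without such a condition the (.*) rule would catch every request and the internal rewrite after it would never run. The sketch also points the catalogue redirect at example.co.uk, so that /catalogue on the old host reaches its final URL in one hop.

RewriteEngine on
#
# External redirects first, most specific pattern first.
# Add the trailing slash to the catalogue directory.
RewriteRule ^catalogue$ http://example.co.uk/catalogue/ [R=301,L]
#
# Redirect any other hostname to the new domain. The first condition skips
# requests with no Host header (old HTTP/1.0 clients) to avoid a loop.
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^example\.co\.uk$ [NC]
RewriteRule (.*) http://example.co.uk/$1 [R=301,L]
#
# Then internal rewrites: everything outside /catalogue/ goes to index.php.
RewriteCond %{REQUEST_URI} !^/catalogue/
RewriteRule ^([a-z0-9]*)$ /index.php?section=$1 [NC,L]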

Jim

christopherwoods

9:58 am on Apr 3, 2008 (gmt 0)

10+ Year Member



I'll update my code with your (always correct!) versions later - I just dropped by quickly to let you know that I applied your update to my first problem's code late last night, and it works absolutely perfectly. Thank you for saving some of my follicles. ;)