Forum Moderators: phranque
index.php in the root handles all incoming requests, and mod_rewrite is used to pass variables to the script (which then sees what page is being requested and includes the content on a per-request basis).
I have a very simple htaccess:
RewriteEngine On
RewriteRule ^([a-zA-Z0-9]*)$ index.php?section=$1 [L]
If you request example.com/link1 , index.php will see the "link1" variable and include code from ./content/link1.php. If an invalid variable is specified, index.php just includes the default text.
However, there's a physical directory called 'catalogue' in the root which contains a Quick.Cart installation (flat-file product catalogue, since you asked). On the old server, with the aforementioned rewrite rules, if I requested example.com/catalogue (no trailing slash), it would load the catalogue dir first time. However, on the new server, it doesn't.
With my basic knowledge I've struggled to construct a rule that redirects requests to add a trailing slash, but the closest I got to success was with
RewriteRule ^catalogue$ catalogue/$1 [L]
inserted BEFORE the do-it-all rewrite rule (which passes the variables to the site's main index.php file). It would redirect the browser and display the page content, but all the image and CSS references are broken, so it's next to useless. Any other permutation or alternative either takes me back to the front page again (almost as if it's ignoring the rule and the request altogether) or it can't figure out what to do.
What's the best practice here for excluding a directory from being rewritten, THEN applying a trailing-slash redirect so that the catalogue loads? If I manually request /catalogue/ with a trailing slash, it works fine, but I'd like to keep the site's URI appearance consistent across the board, particularly as people will most likely not type a trailing slash if they go straight to that URL. Would a RewriteCond be the key to solving this problem, or do I need a more competently written RewriteRule?
This forum has turned up some golden answers before, so fingers crossed you guys will help me out again. Many thanks in advance. :)
RewriteEngine on
#
RewriteRule ^catalogue$ http://www.example.com/catalogue/ [R=301,L]
#
RewriteCond %{REQUEST_URI} !^/catalogue/
RewriteRule ^([a-z0-9]*)$ /index.php?section=$1 [NC,L]
If you request example.com/link1 , index.php will see the "link1" variable and include code from ./content/link1.php. If an invalid variable is specified, index.php just includes the default text.
Now that is a serious problem, if you care a whit about search engines... If an invalid path is requested, the script should generate a 404-Not Found response, and a custom 404 ErrorDocument should be used to explain the situation to your visitor (human) and provide links to your home page, site map, search page, categories directory, and product selector pages, as applicable.
Returning a 200-OK response to any and all arbitrary URL requests will cause search engines to limit the depth to which they're willing to crawl your site, and risks potentially-massive duplicate-content-induced ranking problems. This leaves your ranking subject to malicious tampering.
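A minimal sketch of how that advice could be wired up, assuming a custom error page at /404.php (the filename and the PHP lines in the comments are illustrative, not from the thread):

```apache
# Tell Apache which page to serve for 404s it generates itself:
ErrorDocument 404 /404.php

# Inside index.php, the script must also send a real 404 status
# for unknown sections before emitting the error page, e.g.:
#   header("HTTP/1.1 404 Not Found");
#   include "404.php";
#   exit;
```

The key point is the status line: serving the error page's HTML while still returning 200-OK would leave the duplicate-content problem in place.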
Jim
I actually designed that feature in from an aesthetic point of view, but I never considered it would affect SEO like that, so I'll recode it sharpish.
It won't take much to add this 404 handling, as the if...else block is already coded to include content only if the request matches an existing file, so recoding for this contingency won't take long. Thank you very much for pointing it out - I feel suitably chastised :D
I was toying with the idea of using NC but decided to leave it out for the sake of simplicity; I might as well stick it in if you think it's more useful in than out.
Does returning a 301 code for the redirect work well with search engines? I want to effect a redirect (adding the trailing slash so the scripts inside the /catalogue/ directory function correctly, instead of the request being captured by the catch-all rewrite rule), but is a 301 appropriate for this particular usage?
The RewriteCond with the negator was what I couldn't wrap my head around - one of the hazards of coding at 3am :( I'll give your code a go now; many thanks for the reply. :) (I love this place!)
Best practices also require that your dev platform(s) include a robots.txt file that Disallows all robots -- and access-control code in httpd.conf or .htaccess to back that up.
Many developers deny access to all "outside" IP addresses or require a password/log-in from all outside IP addresses to access the dev server(s). Again, you risk duplicate-content problems if this is not done and, as you've discovered, run the risk of your dev server(s) out-ranking your (or your clients') Web sites.
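For anyone wanting a concrete starting point, a dev-server lockdown along those lines might be sketched like this (Apache 2.2-era syntax to match the thread's vintage; the IP address is a placeholder for your own):

```apache
# robots.txt -- keep all well-behaved crawlers out of the dev site:
#   User-agent: *
#   Disallow: /

# .htaccess -- the access-control backup: deny everyone except
# your own address (203.0.113.42 is a documentation placeholder):
Order Deny,Allow
Deny from all
Allow from 203.0.113.42
```

The robots.txt Disallow keeps honest crawlers out; the Deny rules enforce it for everything else, which is what prevents the dev copy from being indexed and outranking the live site.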
Jim
For the benefit of people finding this thread via Google, here's the code I used:
RewriteEngine On
RewriteRule ^(.*)$ http://example.co.uk/$1 [R=301,NC]
And it works perfectly. :)
It was an odd situation: it was the dev server, but for a while it was also going to be the production server where the site was hosted. My colleague and I invested in a new reseller hosting platform a week ago, though, so things were a little ad-hoc for a while. :)
Quite an amateur approach as far as dev servers go, but it worked for our rather odd requirements... I'll have to take your feedback on board though and make future web-facing dev sites far more restricted.
I'm about to go try out your code now, will post back if there's any odd problems. Thanks again :)
[edited by: jdMorgan at 8:32 pm (utc) on April 2, 2008]
[edit reason] example.com.uk [/edit]
RewriteRule (.*) http://example.co.uk/$1 [R=301,L]
If combined with the code above, this should be the *second* rule in the group. Put external redirects first, in most-specific to least-specific pattern order, then internal rewrites, again most-specific to least-specific.
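Putting that ordering advice together with the rules earlier in the thread, the combined .htaccess might look like this (a sketch only; the two domains are the moderator-substituted placeholders, and the host-level redirect belongs on the old server as described above):

```apache
RewriteEngine On

# External redirects first, most-specific pattern to least-specific:
RewriteRule ^catalogue$ http://www.example.com/catalogue/ [R=301,L]
RewriteRule (.*) http://example.co.uk/$1 [R=301,L]

# Then internal rewrites, again most-specific to least-specific:
RewriteCond %{REQUEST_URI} !^/catalogue/
RewriteRule ^([a-z0-9]*)$ /index.php?section=$1 [NC,L]
```

Because each external redirect carries [L], a request that matches it is answered immediately and never reaches the internal rewrite below it.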
Jim