RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* - [R=404]
I redirect non-www requests to their www version towards the end of my htaccess file.
RewriteRule ^page-(.*)$ /?code=$1
RewriteRule ^page-(.*)$ http://www.example.com/?code=$1 [R=301,L]
Positioning the non-www to www redirect before other rewrites can cause requests to be processed more than once and, in some cases, can cause a 500 error.
External redirects (301/302) will execute before any internal rewrites, unless that has been changed at the server level.
RewriteRule ^foo$ bar
RewriteRule ^foo$ http://example.com/ [L,R]
The target should be formatted with the protocol and path:
towards the end of my htaccess file
If so, then you could perhaps do an early check for non-existent files/directories (before your canonical www redirect):
RewriteCond %{THE_REQUEST} same-as-below
RewriteRule ^(admin|wp|blahblahetcetera) - [R=404]
That way you're not letting anyone in-- but you're also not giving away any information about what's really on your site. By returning the 404 manually, you save the server the work of going to look for the file. The flag R=anything-outside-the-300-range carries an implied [L], so the request will never reach the www redirect and will get an immediate 404. Yes, it is perfectly all right to lie to malign robots ;)
The original directive is an internal rewrite. By including the protocol (and especially the R flag) it becomes an external redirect.
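To make the ordering of the early 404 and the canonical redirect concrete, here is a minimal sketch. It assumes the file sits in the document root and that the canonical host is www.example.com; both are placeholders, and the probe list is cut down to two illustrative names. Treat it as one possible arrangement, not a drop-in config.

```apache
RewriteEngine On

# 1. Answer probe paths with a 404 straight away. A status outside 300-399
#    stops rewriting as if [L] were set, so these requests never reach
#    the redirect below.
RewriteRule ^(admin|wp) - [R=404]

# 2. Canonical host redirect for everything else.
#    example.com / www.example.com are placeholder hostnames.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With this order, a request for the bare domain plus a probe path gets the 404 in a single response instead of a 301 followed by a 404.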
Make sure you've got an [L] exemption for your 403 page right at the beginning of your mod_rewrite section, or you'll get robots asking for it by name when they used the wrong www. Ask how I know this.
How? :) You made me think, should I be giving bots that are filtered by user agent a 404 or is just F ideal? A 404 tells them very little while a 403 suggests I'm onto them.
RewriteCond %{HTTP_USER_AGENT} ^(example|random|badguy|bot|list) [NC]
RewriteRule .* - [F]
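For illustration, the [L] exemption and the two possible responses could be laid out roughly like this. The error page name /forbidden.html is an assumption made up for the example; substitute whatever your ErrorDocument actually points to.

```apache
# Hypothetical custom error page; the filename is an assumption.
ErrorDocument 403 /forbidden.html

RewriteEngine On

# Exempt the error page first, so no later rule (block or www redirect)
# ever touches a request for it.
RewriteRule ^forbidden\.html$ - [L]

# Block by user agent; [F] answers 403 Forbidden.
RewriteCond %{HTTP_USER_AGENT} ^(example|random|badguy|bot|list) [NC]
RewriteRule .* - [F]

# Alternative to the rule above: answer 404, which gives less away.
# RewriteCond %{HTTP_USER_AGENT} ^(example|random|badguy|bot|list) [NC]
# RewriteRule .* - [R=404]
```

Only one of the two user-agent rules should be active at a time. Which status to send is a judgment call, as discussed above.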