It works fine for most domains but not for domain.co.uk. Does anyone know why it's not working and what modification I need to make?
RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[^.]+)\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
RewriteRule \.(jpe?g|gif)$ - [F]
ErrorDocument 404 http://example.com
DirectoryIndex index.html
[edited by: jdMorgan at 1:06 pm (utc) on Oct. 2, 2009]
[edit reason] example.com [/edit]
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[^.]+)\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]
The problem here is in establishing some "fixed point" in the requested hostname at which you can decide "everything before this is the subdomain, and everything after is the domain." That's a bit tough when dealing with .com/.net/.org versus .co.uk/.co.cc and "any number of subdomains" at the same time. Your pattern always takes the last two labels as the domain, so for www.example.co.uk (and even for plain example.co.uk) %2 ends up as "co.uk" and the request gets redirected there. I'd suggest looking specifically for either a single TLD of three letters or more, or a pair of two-letter labels such as .co.uk, as in:
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[a-z]{3,})\.?(:[0-9]+)?$ [OR]
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[a-z]{2}\.[a-z]{2})\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]
Or, equivalently, as a single condition:
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.([a-z]{3,}|[a-z]{2}\.[a-z]{2}))\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]
If any broken pipe "¦" characters appear in the code in this thread, replace them with solid pipes "|" before use; posting on this forum modifies the pipe characters.
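As a rough illustration of how the single-condition pattern behaves, here is the same rule with a hand-traced comment for a few sample hostnames. The hostnames are made-up examples rather than anything from the original post, and the trace assumes lowercase Host headers, since the condition carries no [NC] flag.

```apache
# Sketch only: single-condition form of the subdomain-to-bare-domain redirect.
# Back-references set by the RewriteCond:
#   %1 = last subdomain label matched (with its trailing dot)
#   %2 = the domain to redirect to
#   %3 = the TLD part ("com", "co.uk", ...)
#   %4 = optional ":port"
RewriteEngine On
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.([a-z]{3,}|[a-z]{2}\.[a-z]{2}))\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]

# Hand-traced examples (hypothetical hostnames):
#   www.example.com      matches,  %2 = example.com    (redirected)
#   a.b.example.co.uk    matches,  %2 = example.co.uk  (redirected)
#   example.com          no match                      (served as-is)
#   example.co.uk        no match                      (served as-is)
#   www.example.de       no match: a lone two-letter TLD is not covered
```

The last line of the trace shows a limitation of this approach: a domain directly under a two-letter TLD would need its own alternative in the pattern if you host one.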
---
Also note that you've got an 'SEO-fatal' error in your ErrorDocument directive -- One that ensures that your server will never return a 404 response code, and that will therefore cause your site to appear to have 'infinite duplicate content.' With the code you've got in place now, all non-existent URL requests will be redirected to your domain root with 302-Found status.
I'll make two suggestions, the first critical, and the second very important:
First, use only a local URL-path to specify error document locations:
ErrorDocument 404 /<something-anything>
Second, use a 'real' error page. The proper approach from both a usability and SEO standpoint is a page with a short note explaining that the requested URL cannot be found, followed by helpful text links to your home page, category pages, HTML site map, and site search facility -- as applicable. This informs the visitor of the error, and helps them to find what they were looking for. Combined with the previously-discussed correction it also prevents search engines from seeing the home page and error page as duplicate content.
ErrorDocument 404 /my-concise-but-very-friendly-and-helpful-404-error-page.xyz
---
If you are not doing so already, I strongly suggest testing your code by using a server headers checker, testing both URLs that should exist and those that should not. Then carefully examine your server's response and make sure that it is correct. If you had not posted here, I'd imagine that within a few months you'd have been posting in the search forums asking why your site won't rank for anything -- The "ErrorDocument" problem, as I said, is often 'fatal' to search engine ranking...
Always keep in mind that .htaccess is a server configuration file, and that a single typo, a tiny logic error, or a slight misunderstanding can destroy your business by ruining your search rankings. Therefore, it is wise to research the documentation and test very thoroughly.
Jim
As regards other feedback, I previously tried your suggestion of using a local path to the 404 page, but this did cause a duplication problem. It shows the URL of the wrong page and switches to the index. This resulted in a duplication problem because it was showing different page names in Google and caching the index.
Whereas if I send it to the full URL (as now), it simply switches to that page without showing the original URL, and it also doesn't cause me any duplication problem.
So if I want to use your critical issue solution, I absolutely have to go with your important issue solution of having a 404 error page - which isn't suitable for my site at the moment.
Also bear in mind that my code was designed to send www. and all wildcard subdomains to non-www.
I've not had any problems thus far (in well over a year of use) but I'm not sure what to do because you may well be right.
This could very well happen if you used "ErrorDocument 404 http://example.co.uk/" but should never happen if you used "ErrorDocument 404 /" -- The search engines will get a 404 response code, and know not to cache anything.
If you saw something different, then you either experienced a Google-glitch, or you have a third problem elsewhere that caused it. Specifying a full URL always results in a 302-Found server response, and never a 404-Not Found, as described in the Apache core ErrorDocument documentation.
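To make the contrast concrete, here is a minimal sketch of the two forms side by side. The filename /404.html is a hypothetical placeholder, not something from this thread; use whatever local path your error page actually lives at.

```apache
# Full URL: the server answers with a redirect to that URL,
# so the client never sees a 404 status for the missing page.
# (Commented out; shown only for comparison.)
#ErrorDocument 404 http://example.co.uk/

# Local URL-path: the server delivers this document itself
# and the response keeps its 404 Not Found status.
# /404.html is a hypothetical filename.
ErrorDocument 404 /404.html
```

With the local form, the browser's address bar keeps showing the URL that was requested, which is expected; what matters to search engines is the 404 status code on the response, which a server headers checker will show.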
I hope you'll re-consider the custom 404 error page idea, because you are playing quite close to the fire here as it is. If it might help tip the scales, you could always put a five-to-ten-second meta-refresh on the custom error page, so that the visitor gets 'redirected' automatically if he doesn't click a link.
Jim