
Apache Web Server Forum

    
Htaccess Query For .co.uk
murmy

5+ Year Member



 
Msg#: 3999669 posted 7:11 pm on Oct 1, 2009 (gmt 0)

I've been using the following code both to block hotlinking and to redirect www.domain.net to just domain.net.

It works fine for most domains but not for domain.co.uk. Does anyone know why it's not working and what modification I need to make?

RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[^.]+)\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
RewriteRule \.(jpe?g¦gif)$ - [F]

ErrorDocument 404 http://example.com
DirectoryIndex index.html

[edited by: jdMorgan at 1:06 pm (utc) on Oct. 2, 2009]
[edit reason] example.com [/edit]

 

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3999669 posted 1:05 pm on Oct 2, 2009 (gmt 0)

With this rule:

RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[^.]+)\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]

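you'd get www.example.co.uk/file redirected to http://co.uk/file, because the greedy subdomain group ([^.]+\.)+ consumes "www.example." and leaves only "co.uk" to be captured as %2.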

The problem here is in establishing some "fixed point" in the requested hostname at which you can decide "everything before this is the subdomain, and everything after is the domain." That's a bit tough when dealing with .com/.net/.org versus .co.cc and "any number of subdomains" at the same time, so I'd suggest looking specifically for those two two-letter sequences and single occurrences of TLDs of 3 letters or more, as in:

RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[a-z]{3,})\.?(:[0-9]+)?$ [OR]
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[a-z]{2}\.[a-z]{2})\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]

or compacting that into a single RewriteCond (which I suggest only after debugging the above):

RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.([a-z]{3,}¦[a-z]{2}\.[a-z]{2}))\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]

Hopefully, that'll work, but you're fighting sub-pattern greediness and a very-ambiguous pattern-match at the same time here. Anything you can do to make the pattern more specific will be helpful to efficiency, such as limiting the accepted TLDs to exactly .com/.org/.net and/or .co.uk (or at least .co.cc), or limiting the number (depth) of accepted subdomains to that which your DNS settings actually support.
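
As one possible way to follow the advice above about limiting the accepted TLDs, here is a minimal sketch. It assumes the server only answers for .com, .net, .org and .co.uk hostnames; that suffix list is an assumption for illustration and should be adjusted to the domains actually hosted. The capture-group numbering is kept the same as in the rules above, so %2 is still the bare domain.

```apache
RewriteEngine On

# Assumed suffix list: com, net, org, co.uk. Any other hostname is left alone.
# ([^.]+\.)+  = one or more subdomain labels (www., img.www., ...)
# %2          = the bare domain, e.g. example.co.uk or example.com
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.(com|net|org|co\.uk))\.?(:[0-9]+)?$ [NC]
RewriteRule (.*) http://%2/$1 [R=301,L]

# To also cap the subdomain depth, ([^.]+\.)+ could become ([^.]+\.){1,2}
```

With this in place, a request for www.example.co.uk/file should redirect to http://example.co.uk/file, while a request for the bare example.co.uk does not match the condition and is served without a redirect.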

Replace all broken pipe "¦" characters above with solid pipes ("|") before use; posting on this forum modifies the pipe characters.

---

Also note that you've got an 'SEO-fatal' error in your ErrorDocument directive -- One that ensures that your server will never return a 404 response code, and that will therefore cause your site to appear to have 'infinite duplicate content.' With the code you've got in place now, all non-existent URL requests will be redirected to your domain root with 302-Found status.

I'll make two suggestions, the first critical, and the second very important:

First, use only a local URL-path to specify error document locations:

ErrorDocument 404 /<something-anything>

Second, do NOT use your home page as an error document. By directly sending bad URL requests there, you will very likely confuse and disorient your visitors who click a bad link or make an error typing in a URL that leads to your site but doesn't resolve to a file that exists.

The proper approach from both a usability and SEO standpoint is to use a 'real' error page with a short note explaining that the requested URL cannot be found, and then offering helpful text links to your home page, category pages, HTML site map, and site search facility -- as applicable. This informs the visitor of the error, and helps them to find what they were looking for. Combined with the previously-discussed correction it also prevents search engines from seeing the home page and error page as duplicate content.

ErrorDocument 404 /my-concise-but-very-friendly-and-helpful-404-error-page.xyz

As with all error pages, the 404 error page should have absolutely minimal external dependencies. Consider that the more images, scripts, stylesheets, etc. that it includes, the more likely you'll get an 'infinite' 404 loop if one of those required resources goes missing. To prevent such problems, I suggest using *very* simple error documents, and if the 'look and feel' and 'visual consistency with the rest of the site' have to suffer, so be it.

---

If you are not doing so already, I strongly suggest testing your code by using a server headers checker, testing both URLs that should exist and those that should not. Then carefully examine your server's response and make sure that it is correct. If you had not posted here, I'd imagine that within a few months you'd have been posting in the search forums asking why your site won't rank for anything -- The "ErrorDocument" problem, as I said, is often 'fatal' to search engine ranking...

Always keep in mind that .htaccess is a server configuration file, and that a single typo, a tiny logic error, or a slight misunderstanding can destroy your business by ruining your search rankings. Therefore, it is wise to research the documentation and test very thoroughly.

Jim

murmy

5+ Year Member



 
Msg#: 3999669 posted 6:15 pm on Oct 2, 2009 (gmt 0)

Thanks for your feedback. The .co.uk code is achieving the desired result.

As regards the other feedback, I previously tried your suggestion of using a local path to the 404 page, but this did cause a duplication problem. It shows the URL of the wrong page and switches to the index. This resulted in a duplication problem because it was showing different page names in Google and caching the index.

Whereas if I send it to the full URL (as now), it simply switches to that page without showing the original URL, and it also doesn't cause me any duplication problem.

So if I want to use your critical issue solution, I absolutely have to go with your important issue solution of having a 404 error page, which isn't suitable for my site at the moment.

Also bear in mind that my code was designed to send www. and all wildcard subdomains to non-www.

I've not had any problems thus far (in well over a year of use), but I'm not sure what to do because you may well be right.

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3999669 posted 10:29 pm on Oct 2, 2009 (gmt 0)

> This resulted in a duplication problem because it was showing different page names in Google and caching the index.

This could very well happen if you used "ErrorDocument 404 http://example.co.uk/" but should never happen if you used "ErrorDocument 404 /" -- The search engines will get a 404 response code, and know not to cache anything.

If you saw something different, then you either experienced a Google-glitch, or you have a third problem elsewhere that caused it. Specifying a full URL always results in a 302-Found server response, and never a 404-Not Found, as described in the Apache core ErrorDocument documentation.
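
To make the difference concrete, here is a small side-by-side sketch of the two forms. The path /errors/not-found.html is a hypothetical placeholder, not a file from the original poster's site; only one of the two directives should be active at a time.

```apache
# Full URL: Apache sends the client a redirect to that URL,
# so the response is a 302 and the 404 status is never returned.
# ErrorDocument 404 http://example.com

# Local URL-path (hypothetical file): Apache serves this document
# internally, and the response keeps its 404 Not Found status.
ErrorDocument 404 /errors/not-found.html
```

Checking a non-existent URL with a server headers checker after the change should show whether the status line is now 404 rather than 302.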

I hope you'll reconsider the custom 404 error page idea, because you are playing quite close to the fire here as it is. If it might help tip the scales, you could always put a five-to-ten-second meta-refresh on the custom error page, so that the visitor gets 'redirected' automatically if they don't click a link.

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved