Redirect all unauthorized domain aliases to actual domain?

         

georgec

8:22 pm on Apr 3, 2009 (gmt 0)

10+ Year Member



Hi guys:
I have a problem where hundreds of unauthorized domains on the web are set up to act as aliases to my actual domain, so domains like:

bad.com
bad2.com
bad3.com

all load my site with NO redirection (so bad.com loads my site while remaining bad.com in the user's address bar). I've read this is for the purpose of stealing page rank.

With that said, I need to basically redirect all domain aliases to my actual domain. So far I have:


RewriteCond %{HTTP_HOST} !^(www\.mysite\.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]

This seems to work for URLs with no query strings, but for those with, such as:

http://www.badsite.com/forums/forumdisplay.php?s=&daysprune=&f=29

I get redirected to the 404 error page on mysite.com. This even happens if I replace badsite.com with mysite.com above and leave out the "www" part. Is there something I'm missing in my mod_rewrite code above to take into account basically all types of URLs?

Thanks. I could really use some help getting this working, since right now there are literally hundreds of domain aliases leeching off my actual domain.

[edited by: jdMorgan at 4:43 am (utc) on April 4, 2009]
[edit reason] de-linked [/edit]

g1smd

8:50 pm on Apr 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's nothing in the code you supplied to cause that effect.

There's some other issue making that happen.

Err, hang on. What happens if you remove the question mark from the Condition?

Surely, none of these domains are listed in your httpd.conf configuration file?

georgec

9:34 pm on Apr 3, 2009 (gmt 0)

10+ Year Member



Hi:
Thanks, and nope, those bad domains are definitely not in my httpd.conf. Here's what happens with the code I posted above. If I enter:

http://www.mysite.com/adirectory/ -> no redirection, fine
http://mysite.com/adirectory/ -> redirects to mysite.com's 404 page
http://www.badsite.com/adirectory/ -> redirects to http://www.mysite.com

In other words, cases #2 and #3 both fail, though differently depending on whether the domain is mine or the unauthorized one. Removing the question mark from the rewrite code doesn't make a difference.

[edited by: jdMorgan at 4:44 am (utc) on April 4, 2009]
[edit reason] de-linked [/edit]

jdMorgan

4:33 am on Apr 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

A few points about this code:

First, it is obviously intended for use in .htaccess as opposed to a server config file.

Second, it is "query string neutral" -- Query strings will be passed through this rule unchanged.

Third, and a bit more detailed: the hostname pattern in the RewriteCond is enclosed in parentheses with a "?" at the end to mean "if NOT (www.example.com OR blank)". This allows the code to function properly on IP-address-based servers, which can accept HTTP/1.0 requests. HTTP Host: headers are not sent with true HTTP/1.0 requests, so the %{HTTP_HOST} variable will be blank. If no provision is made to disable the rule when the Host header is absent from the request, the result is an "infinite" redirection loop. Again, this is only needed on IP-based servers; name-based virtual servers don't need this provision because they cannot be reached by true HTTP/1.0 requests.
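To make the three points above concrete, here is the same rule annotated with how a few Host values would be treated. This is only a sketch: example.com and bad.com are placeholder hostnames, and the commented-out server-config variant at the end is an assumption about how the rule would be adapted outside .htaccess, not something tested on the poster's server.

```apache
# Host header sent         matches ^(www\.example\.com)?$   condition ("!")   result
# www.example.com          yes                              false             no redirect
# (none, true HTTP/1.0)    yes (empty string)               false             no redirect
# example.com              no                               true              301 to www.example.com
# bad.com                  no                               true              301 to www.example.com
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# In .htaccess the pattern sees the path without its leading slash.
# The original query string is re-appended automatically because the
# substitution contains no "?" of its own.
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# In a server config file the path starts with "/", so the pattern would
# need to account for it (assumed variant):
# RewriteRule ^/(.*) http://www.example.com/$1 [R=301,L]
```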

Because there is nothing wrong with this code, it is very likely that either your server is configured in such a way that it does not direct non-www/subdirectory requests to the same filepath as www/subdirectory requests, or that some other module, setting, or script is interfering with the action of your code. Some things to look into are mod_negotiation, mod_speling, mod_dir's DirectorySlash, and AcceptPathInfo. If you don't need these, disable them. It is also possible that a script is responsible for sending the request awry after the non-www request has been redirected correctly.
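As a rough sketch of what disabling those features might look like in .htaccess: this assumes the host's AllowOverride settings permit these directives, which may not be the case on shared hosting. Treat it as a diagnostic step rather than a recommended permanent setup; turning DirectorySlash off in particular changes how directory URLs without a trailing slash are handled.

```apache
# Turn off content negotiation via MultiViews (mod_negotiation)
Options -MultiViews

# Turn off automatic URL spelling correction, if mod_speling is loaded
<IfModule mod_speling.c>
    CheckSpelling Off
</IfModule>

# Stop mod_dir from redirecting /adirectory to /adirectory/
<IfModule mod_dir.c>
    DirectorySlash Off
</IfModule>

# Reject requests that carry trailing path info after a real file name
AcceptPathInfo Off
```

Re-enable each one in turn once the redirect behaves, so the feature that was interfering can be identified.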

If none of this helps, please post your Apache server version and tell us where you are putting this code.

Jim

georgec

7:18 am on Apr 4, 2009 (gmt 0)

10+ Year Member



Hi Morgan:
I appreciate your further insight. Running some more tests, I may be on to something. I was placing the rewrite code in a sub directory from /www to isolate its effects, and it seems the code matches everything including the sub directory itself and replaces that with www.mysite.com. So placing the code in say [mysite.com...] if I go to:

[mysite.com...] ->redirects to [mysite.com...]

I assume from this that the rewrite code must be placed inside the .htaccess file of the root /www directory, right? One issue with that is that each of my sub directories has its own .htaccess file with rewrite code to stop hotlinking:


RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite\.com/.*$ [NC]
RewriteRule \.(gif|jpg|js|mid|css)$ - [F]

When I tried placing the domain alias rewrite code in the root /www directory, it didn't seem to affect any of the sub directories. It seems the mere presence of an .htaccess file in the sub directory with the line:


RewriteEngine on

causes that directory to not inherit the effects of the mod rewrite code defined in the root /www directory. Is that normal, and if so, how would I get around that limitation?

georgec

3:25 am on Apr 5, 2009 (gmt 0)

10+ Year Member



Looking at RewriteOptions [httpd.apache.org], it seems "inherit" should theoretically address the sub directory inheritance issue. Editing the server's httpd.conf file is unfortunately not an option for me right now. Is there a way to just get the .htaccess file inside a sub directory to also inherit the mod_rewrite rules of its parent directories? I tried using:


RewriteOptions inherit

inside the sub dirs, but it doesn't have any effect.
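One possible workaround that does not rely on inheritance, sketched here under assumptions: repeat the host check in the sub directory's own .htaccess, ahead of the hotlinking rules, and build the redirect target from %{REQUEST_URI} instead of $1. In per-directory context the RewriteRule pattern only sees the path relative to that directory, which would explain why the sub directory was dropped from the redirect target in the earlier test; %{REQUEST_URI} holds the full path including the sub directory. The directory name in the comment is hypothetical, and this has not been tested on the server in question.

```apache
# .htaccess inside a sub directory (hypothetical location: /www/adirectory/)
RewriteEngine on

# Canonical-host redirect, repeated here so it does not depend on the
# parent directory's rules being inherited.
# %{REQUEST_URI} is the full URL-path with its leading slash, so the
# sub directory is preserved; the query string is re-appended automatically.
RewriteCond %{HTTP_HOST} !^(www\.mysite\.com)?$
RewriteRule ^ http://www.mysite.com%{REQUEST_URI} [R=301,L]

# Existing hotlink protection
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite\.com/ [NC]
RewriteRule \.(gif|jpg|js|mid|css)$ - [F]
```

The redirect goes first so that requests arriving on an alias hostname are sent to www.mysite.com before the referer check is evaluated.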

jdMorgan

4:07 pm on Apr 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When dealing with referer-based access control code, you must completely flush your browser cache between tests. Otherwise, a previously cached server response will be retrieved from the browser cache instead of from your server, hopelessly confusing your test results.

Jim