Afaik, there are two threads on this already, one started by me in November 2003 [webmasterworld.com] when i first saw this problem, and another from December 2003 [webmasterworld.com].
Here we go:
The hypothetical question is:
When somebody asks for "www.domain.com/?some-string", how do i make the server return an error code (say, 404)?
Case 1: 410 Gone
One answer is this:
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* - [G]
----------------------
This rule checks whether a query string is present; if so, a 410 Gone status is sent to the user-agent. The "-" in the rule means "don't rewrite the URL".
But ...this file isn't gone, it has never been there - i just won't allow these requests. Okay, next:
Case 2: 403 Forbidden
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* - [F]
----------------------
Here, the [F] tells the user-agent that this action is forbidden.
But... i had such a file once, it's just unavailable right now. Okay then:
Case 3: 404 Not found
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* /this-is-a-filename-we-dont-have-here.htm [L]
----------------------
This is not as straightforward as the other two. What happens is that when a query string is seen, an internal rewrite is made to a filename that doesn't exist. This results in a 404 Not Found error. The [L] tells the server that this is the last rule to be processed for this request.
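If your Apache version lets the R flag carry a non-redirect status code (the mod_rewrite documentation describes this for 2.x, but whether your host's build supports it is an assumption to verify), the 404 can be requested directly instead of relying on a missing file. A minimal sketch:
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* - [R=404,L]
----------------------
With a status outside the 3xx range the substitution is ignored, which is why "-" is used here.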
Uhm... all those errors... i don't like them... can't i just serve up my homepage instead, or a sitemap? Of course, here we go:
Case 4: 301 Moved Permanently
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* [example.com...] [R=301,L]
----------------------
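This one will serve up the domain "www.example.com" instead of "whateveritis-with-a-query-string". It will be a permanent redirect. Note that by default mod_rewrite appends the original query string to the new URL, so the target should end with a "?" (i.e. "http://www.example.com/?") to drop it; otherwise the redirected request still carries a query string and gets redirected again.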
Case 5: 302 Temporarily moved
If you only want a redirect for a limited amount of time, do this:
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* [example.com...] [L]
----------------------
I wouldn't recommend doing it this way, but for completeness' sake it's there. There are a few other options that i'm not even going to mention here; the above cases should be enough for most purposes.
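A more explicit variant of the temporary redirect, as a sketch only: it names the status code with R=302 rather than relying on how mod_rewrite treats absolute URLs by default, and it ends the target with "?" so the original query string is not appended again. The target "http://www.example.com/" is just the example domain from Case 4; substitute your own.
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* http://www.example.com/? [R=302,L]
----------------------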
Wow, that's nice... but: i do use querystrings in my php scripts and now these don't work at all. Okay:
Conditional rules
Let's assume you want to serve a "410 Gone" - it will be easy for you to modify this to another type given the examples above.
Further, you have a folder called "my-scripts" in which you use, well, scripts that accept querystrings.... duh. Here goes:
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !/my-scripts/
RewriteRule .* - [G]
----------------------
The only new thing here is line two; the rest is exactly as in the examples above. It tells your server to apply the rule only if the URL does not contain your folder "my-scripts". This means that requests for URLs (1) and (2) below will get a 410 error while URL (3) will not:
(1) www.example.com/filename.htm?blabla
(2) www.example.com/folder/filename.htm?blabla
(3) www.example.com/my-scripts/filename.htm?blabla
A note on the "Conditional rules" part: the status in question is "410 Gone". There is no "401 Gone" (401 is "Unauthorized"). The code itself uses [G], which is 410.
Additions:
The conditional rule example above has this rule:
----------------------
RewriteCond %{REQUEST_URI} !/my-scripts/
----------------------
This means that the folder "my-scripts" can be located anywhere in the tree, i.e. it is valid for both (a) and (b) below:
(a) www.example.com/my-scripts/filename.htm?blabla
(b) www.example.com/1/2/3/my-scripts/filename.htm?blabla
If you want the folder to be the first one in the path, so that only (a) is valid, do it like this:
----------------------
RewriteCond %{REQUEST_URI} !^/my-scripts/
----------------------
Note the little "^" thingy, meaning "start of string".
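The same anchoring works for more than one folder by using alternation. A sketch, assuming a second script folder called "other-scripts" (a made-up name, used only for illustration):
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !^/(my-scripts|other-scripts)/
RewriteRule .* - [G]
----------------------
Requests with a query string get the 410 unless the path starts with one of the two folders.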
Specific filetypes, folders, etc...
You can easily modify it, eg. for a specific filetype (php, asp, ..) like this:
----------------------
(1) PHP
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !\.php
RewriteRule .* - [G]
----------------------
----------------------
(2) ASP
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !\.asp
RewriteRule .* - [G]
----------------------
----------------------
(3) PHP or ASP
----------------------
RewriteCond %{QUERY_STRING} !^$
# no [OR] here: the rule must fire only when the URL is neither .php nor .asp
RewriteCond %{REQUEST_URI} !\.php
RewriteCond %{REQUEST_URI} !\.asp
RewriteRule .* - [G]
----------------------
----------------------
(4) PHP or ASP in "my-folder" (anywhere in tree)
----------------------
RewriteCond %{QUERY_STRING} !^$
# exempt only .php/.asp files below "my-folder"; any other URL with a query string gets the 410
RewriteCond %{REQUEST_URI} !/my-folder/.*\.(php|asp)
RewriteRule .* - [G]
----------------------
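The file-type exemptions can also be written as one negated pattern, which avoids having to combine several conditions. A sketch, not tested against any particular setup: the "$" ties the extension to the end of the path and [NC] makes the match case-insensitive; both are extras that the examples above do not use.
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !\.(php|asp)$ [NC]
RewriteRule .* - [G]
----------------------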
I'm trying to prevent traffic coming in on links such as www.example.org/?someone-else.com. They wind up at my index page, and I've noticed at least Slurp getting a 200 on URLs such as that, having gotten my index page as the result. It concerns me that these will be seen as duplicate content.
I found this thread, and it seems to address my problem, but I couldn't make things work in the .htaccess. What I tried was this method:
Case 4: 301 Moved Permanently
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* [example.com...] [R=301,L]
----------------------
This one will serve up the domain "www.example.com" instead of "whateveritis-with-a-query-string". It will be a permanent redirect.
In the first place, am I confused about what is being accomplished? Does it address my situation? Secondly, the {QUERY_STRING}... does this get the www.example.org/?someone-else.com as shown in my example above, or part of it, and what part?
Any help would be appreciated.
Using the Ethereal network analyzer, I am seeing a lot of HTTP "CONNECT" requests hitting my server, mostly originating from Taiwan. As I understand it (from the HTTP/1.1 RFC) this method has never been fully implemented, but has something to do with proxying? Anyhow, my server responds to these requests with a "200 OK" and serves up the index file. I can't tell if it's actually doing anything else or not.
I would feel more at ease if I could make the server reject any "CONNECT" requests, responding with a "403 Forbidden" or something else that's applicable, but I can't figure out how to do this. Is rewriting what I need? I thought that was just for protecting files, but this post talks about returning different error codes, which is what I am looking for.
Does this code go into httpd.conf or the .htaccess file? (Forgive the newb question.)
Many thanks.
(1) PHP
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !\.php
RewriteRule .* - [G]
----------------------
Wouldn't the following RewriteCond be a little better?
RewriteCond %{REQUEST_URI} !\.php$
Wouldn't want to inadvertently match on:
[example.com...]
Yeah, it's a bit of an edge case. =)
This is what I ended up with to fix my problem:
--------------------
RewriteEngine on
RewriteCond %{REQUEST_METHOD} ^CONNECT$ [OR]
RewriteCond %{REQUEST_METHOD} ^SEARCH$
RewriteRule ^.*$ - [F]
--------------------
The [F] flag forces a response code of "403 Forbidden" for matching requests.
---------------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule .* http://www.example.com/? [R=301,L]
---------------------------
It says: "if there is a non-empty query string on a request, serve up the main domain with a 301 status code, whatever the request is". The {QUERY_STRING} matches everything after the question mark, not the question mark itself. The "?" at the end of the target drops the original query string; without it, mod_rewrite appends the query string to the new URL and the redirect repeats.
Beware that this one will also match: "example.com/folder/file.htm?query-string" and send that particular request to the root domain.
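If you would rather keep the requested path and only drop the query string, capture the path and end the target with a bare "?". A sketch, assuming the rules live in the .htaccess file in the document root of www.example.com:
----------------------
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
----------------------
With this, a request for "example.com/folder/file.htm?query-string" is redirected to "http://www.example.com/folder/file.htm" rather than to the root.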
Note that there has to be a space in front of the exclamation mark ("!") - this board tends to eat those spaces. What exactly was it that turned out wrong for you?
Greenbirdweb: You can disallow the "CONNECT" method specifically, as you found out. AFAIK, you're right that it has to do with proxying, which you don't want. This can be done either in the config file or in .htaccess.
All the tips above are about the .htaccess file.
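Putting the pieces together, a complete .htaccess might look like the sketch below. It assumes mod_rewrite is loaded, that the host allows rewrite directives in .htaccess (AllowOverride FileInfo or All), and that FollowSymLinks or SymLinksIfOwnerMatch is enabled for the directory; "my-scripts" is the example folder from earlier in the thread.
----------------------
RewriteEngine on
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !^/my-scripts/
RewriteRule .* - [G]
----------------------
"RewriteEngine on" is needed once per file; the engine is off by default, and without it the conditions and rules are ignored.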