
Subfolder file redirect to external site


peace

1:17 am on Sep 2, 2010 (gmt 0)

10+ Year Member



Hello there,
I want to redirect anyone looking for the file main.php, located in any subfolder, to an external site. I'm trying to use the following, but it doesn't work.
What am I missing?

RewriteRule ^([^/]+)/[^.]+\main.php [secondsite.com...] [R=301,L]

Thank you.

jdMorgan

1:30 am on Sep 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remove the spurious "[^.]+\" subpattern, escape all literal periods, and end-anchor the pattern:

RewriteRule ^([^/]+)/main\.php$ http://www.example.com [R=301,L]
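As an aside, the captured "([^/]+)" group isn't used in the substitution above. If the destination site happens to mirror your directory layout, it could be reused -- a sketch, with www.example.com standing in for the real target:

RewriteRule ^([^/]+)/main\.php$ http://www.example.com/$1/main.php [R=301,L]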

If you're copying and pasting or guessing at regex patterns, don't -- it can be quite dangerous to the health of your site. There's a nice regular expressions reference in our Forum Charter.

If the purpose of this code is to protect against some exploit, be aware that typical bad-bots do not follow redirects. A 403-Forbidden response or even a 200-OK with a blank file might be more effective.

Jim

peace

2:34 am on Sep 2, 2010 (gmt 0)

10+ Year Member



Morgan, thank you for your comments. I'm not really trying to block bad bots, but rather some suspicious remote requests I'm getting, like the following:

[mysite.com...]

I thought that, since I don't have a file named main.php on my site, it could be a good idea to redirect anyone looking for this file in any subdirectory to another site.
What do you think?

jdMorgan

12:57 pm on Sep 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you don't have a file by that name, then there is no need to redirect the request. Your primary concern is to conserve bandwidth and server resources.

So there are three choices:

1) Let the request generate a 404-Not Found
2) Generate a 403-Forbidden response
3) Rewrite the request to a small or zero-byte file, and return 200-OK

For each kind of bad request, any one of these three might be the best solution. Some of these user-agents will "go away" if they see a 404-Not Found because you don't have the file(s) they are looking for, some may go away if they see a 403-Forbidden because they know they have been detected, and some may go away if you always return a 200-OK for every file they request, because then they cannot tell whether the file is really there or not.

In the 403 and 200-OK cases, however, you must be careful not to return those responses to any legitimate search engine robot, because this also affects how they view your site; returning a 200-OK for a file that doesn't really exist will likely affect how they spider and rank it.

For case one, no code is needed.

For cases two and three:

# Return a 403-Forbidden response to requests for main.php, except from search engine robots
# ([NC] makes the match case-insensitive; Googlebot, for example, sends a capital "G")
RewriteCond %{HTTP_USER_AGENT} !googlebot|bingbot|msnbot|slurp|teoma [NC]
RewriteRule ^([^/]+)/main\.php$ - [F]
#
# Return a tiny file and a 200-OK response to requests for main.php, except from search engine robots
RewriteCond %{HTTP_USER_AGENT} !googlebot|bingbot|msnbot|slurp|teoma [NC]
RewriteRule ^([^/]+)/main\.php$ /zero-byte-file.txt [L]

For case three, you must create and upload a zero-byte (or small) file with the name used in the RewriteRule substitution path -- shown here as "/zero-byte-file.txt".

The legitimate search engine robot user-agent strings shown are extremely simplified; you may wish to collect and then detect the actual (and much longer) user-agent strings, qualify them with known robot IP addresses, and/or check the other HTTP request headers, in order to avoid having either rule fooled by spoofed user-agent strings.
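For example, here is a sketch of qualifying the user-agent test with an IP check. The 66.249 prefix is one address range Googlebot is widely reported to crawl from, but treat it as an illustrative assumption and verify the current ranges yourself; a full forward-confirmed reverse-DNS check is beyond what mod_rewrite alone can do:

# Deny requests for main.php from clients claiming to be Googlebot
# but not coming from a Googlebot address range (66.249.x.x is illustrative)
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule ^[^/]+/main\.php$ - [F]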

You may also want to add more robot user-agent strings to that list, depending on what search engines send valuable visitor traffic to your site; my list is intended only as a simple example.

There is really no easy way to tell which method you should use for each "bad-bot" except to test them. I have noticed that if you feed the 200-OK response to robots that first request URL-paths like "/thisfiledoesnotexisthaha", they go away immediately. Others, like Toata Dragostea and ZmEu, are fairly stupid, and will continue requesting a list of files no matter what response you give them. So in their case, returning a short or zero-byte file may be the best approach.
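For instance, here is a sketch that serves the tiny file to those two scanners, matched by user-agent fragments -- the fragments shown are assumptions based on commonly-reported strings, so check your own logs for the exact values:

# Serve the small file with a 200-OK to two well-known vulnerability scanners
RewriteCond %{HTTP_USER_AGENT} ZmEu|Toata\ dragostea [NC]
# Don't rewrite the target file itself, to avoid any chance of a loop
RewriteCond %{REQUEST_URI} !^/zero-byte-file\.txt$
RewriteRule .* /zero-byte-file.txt [L]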

Anyway, there is no simple answer and no "one size fits all" solution. Sometimes the best answer is to simply let the requests go 404, and sometimes more complicated solutions are needed. You have to think about what each of these requests *means* and how it can affect your site, and decide accordingly. Different Webmasters with different sites may reasonably choose different solutions.

Jim