Forum Moderators: phranque
RewriteEngine On
# send the naked URL to the forums index
RewriteCond %{HTTP_HOST} ^forums\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.forums\.example\.com$
RewriteRule ^/?$ "https\:\/\/forums\.example.com\/forum" [R=301,L]
RewriteRule ^\.well-known\/acme-challenge\/ - [L]
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
order allow,deny
deny from xxx.xxx.xxx.xxx
deny from xxx.xxx.xxx
allow from all
"https\:\/\/forums\.example.com\/forum"
The quotation marks aren’t necessary, and you don’t need to escape anything in the target of a RewriteRule. Obviously it does no harm, since the rule has been working for a while, but it’s needless clutter.
<RequireAll>
Require all granted
<RequireNone>
Require env unwanted
</RequireNone>
</RequireAll>
In your text editor, globally change all occurrences of “Deny from” to “Require ip” and move them into the RequireNone envelope. Delete the “Allow from” and “Order” lines; they’re no longer needed.
BrowserMatch facebookexternalhit unwanted
BrowserMatch facebookexternalhit unwanted=facebook
BrowserMatch ^meta-externalagent unwanted=meta
and so on. This is the simplest way to deny by user-agent.
BrowserMatch facebookexternalhit unwanted=facebook
BrowserMatch ^meta-externalagent unwanted=meta
<RequireAll>
Require all granted
<RequireNone>
Require env unwanted
Require ip xxx.xxx.xxx.xxx
Require ip xxx.xxx.xxx
Require ip xxx.xxx
</RequireNone>
</RequireAll>
RewriteEngine On
# send the naked URL to the forums index
RewriteCond %{HTTP_HOST} ^forums.example.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.forums.example.com$
RewriteRule ^/?$ https://forums.example.com/forum [R=301,L]
RewriteRule ^.well-known/acme-challenge/ - [L]
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
# =====================================================================
BrowserMatch facebookexternalhit unwanted=facebook
BrowserMatch ^meta-externalagent unwanted=meta
<RequireAll>
Require all granted
<RequireNone>
Require env unwanted
</RequireNone>
</RequireAll>
# =====================================================================
I'm assuming the Require ip directive works for ranges, i.e. xxx.xxx.xxx or xxx.xxx?
Yes, it’s exactly the same as the old Allow/Deny rule, including partials and CIDR ranges like 45.72.0.0/17 (meaning 45.72.0.0 through 45.72.127.255).
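For anyone who wants to double-check what a range covers before pasting it into Require ip, here is a minimal sketch using Python's standard ipaddress module. The helper names cidr_bounds and partial_to_cidr are made up for this illustration, and the partial-address handling assumes Apache's documented behavior of treating the first one to three octets as a subnet.

```python
import ipaddress


def cidr_bounds(cidr):
    """Return (first, last) address of a CIDR block as strings."""
    net = ipaddress.ip_network(cidr, strict=True)
    return str(net.network_address), str(net.broadcast_address)


def partial_to_cidr(partial):
    """Translate a partial address like '45.72' into CIDR notation.

    Hypothetical helper: assumes 1 to 3 leading octets, as in 'Require ip 45.72'.
    """
    octets = partial.strip(".").split(".")
    if not 1 <= len(octets) <= 3:
        raise ValueError("expected 1 to 3 octets")
    padded = octets + ["0"] * (4 - len(octets))
    return "{}/{}".format(".".join(padded), 8 * len(octets))


if __name__ == "__main__":
    print(cidr_bounds("45.72.0.0/17"))   # ('45.72.0.0', '45.72.127.255')
    print(partial_to_cidr("45.72"))      # 45.72.0.0/16
```

A /17 is half of a /16, so it stops at 45.72.127.255; the next block starts at 45.72.128.0.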
every single facebook and meta crawler has disappeared!
Tralala! Next time you check your access logs, you should see a lovely line of requests, each with a 403 response.
Should I have both of these? Or everything in one .htaccess file in the root directory?
In your case, it should be possible to combine everything into a single htaccess. Sometimes it's necessary to have more than one if it's a hosting setup with a “primary”/“addon” structure, where most sites’ directories are inside the “primary” site directory. (Happily, mine has a “userspace” setup, where all sites are parallel. That lets me have a single htaccess covering access controls for all sites, and then site-specific htaccess for any individual sites.)
That means the physical directories on the server, which may or may not align with what the user sees in the URL.
For example, you might set
Options -Indexes
for the whole site, but then for some directories you do want to allow auto-indexing. Since you can’t use a <Directory> section in htaccess, you’d need to make a supplemental htaccess with just one rule in it. Same goes if you want part of the site to use a different ErrorDocument. And so on. But these are specific situations that can be dealt with as they arise.
.well-known/acme-challenge/
looks like something the host added when the site went https. It’s used by Let’s Encrypt; leave it as-is. The same applies to most things with a leading dot; if something shows up and you know you didn’t put it there, check with the host.
Redirect permanent /releasenotes/ko https://forums.example.com/forum/forumdisplay.php?f=229
They may even be spoofing IPs for all I know about how this works.
Probably not. It isn't like faking a caller ID, or putting a bogus return address on snail mail. If a request comes in with a fake IP, they won't receive the requested file, because it will be sent to the fake address. So IP spoofing is not likely unless someone's trying a DDoS attack, aimed at either your site or the real owner of the faked IP.
How could I block EVERYTHING that comes from "hwclouds-dns.com"?
You should be able to do it with mod_authz_host, but this is not something I'm awfully familiar with; I stick with the numerical IP. The syntax is
Require host badname.com
Note that this requires your server to do a reverse-DNS lookup on every request, so you have to judge whether this extra work is worth it. Most offending colos or server farms have a finite number of IP ranges, which you could block in a few lines.
^(Redirect \w+ \S+?[^\\])\.
TO
\1\\.
and repeat until it rinses clean. (Because there might be more than one.) Similarly, get rid of all quotation marks as needless clutter:
^(Redirect.+)"
TO
\1
and repeat as needed.
^Redirect(?:Match)? (?:301|permanent) (\^)?/(.+)
TO
RewriteRule \1\2 [R=301,L]
The meat of these rules is the part expressed as (.+) which includes both pattern and target. (This is the part that caused me to run off in a panic, having forgotten mod_alias syntax.) If you have any temporary redirects (302 instead of 301) we’ll deal with them later.
# 410 if needed
^Redirect(?:Match)? 410 (\^)?/(.+)
TO
RewriteRule \1\2 - [G]
# 403 if needed
^Redirect(?:Match)? 403 (\^)?/(.+)
TO
RewriteRule \1\2 - [F] # comment
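If you'd rather script those find/replace passes than run them by hand in a text editor, something along these lines should work. It's a rough Python sketch under a few assumptions: the function names are invented for illustration, the second and third sample lines are hypothetical (only the first comes from this thread), and only the 301/permanent step is covered.

```python
import re

# Each pattern is anchored with ^, so one pass changes at most one spot per line.
ESCAPE_DOT = re.compile(r"^(Redirect \w+ \S+?[^\\])\.", re.MULTILINE)
STRIP_QUOTE = re.compile(r'^(Redirect.+)"', re.MULTILINE)
TO_REWRITE_301 = re.compile(
    r"^Redirect(?:Match)? (?:301|permanent) (\^)?/(.+)", re.MULTILINE
)

# Line 1 is from the thread; lines 2 and 3 are made-up examples.
SAMPLE = r'''Redirect permanent /forum/oasys/tut https://forums.example.com/forum/forumdisplay.php?forumid=154
Redirect 301 "/old.page.html" "https://forums.example.com/forum/"
RedirectMatch 301 ^/archive/(.*) https://forums.example.com/forum/$1'''


def repeat_sub(pattern, repl, text):
    """Apply one find/replace over and over until nothing changes."""
    while True:
        text, count = pattern.subn(repl, text)
        if count == 0:
            return text


def convert(htaccess_text):
    """Escape periods in Redirect patterns, drop quotes, then emit RewriteRules."""
    text = repeat_sub(ESCAPE_DOT, r"\1\\.", htaccess_text)
    text = repeat_sub(STRIP_QUOTE, r"\1", text)
    return TO_REWRITE_301.sub(r"RewriteRule \1\2 [R=301,L]", text)


if __name__ == "__main__":
    print(convert(SAMPLE))
```

The loop in repeat_sub is the "repeat until it rinses clean" part: because every pattern starts at the beginning of the line, a single pass can only fix one period or one quotation mark per line.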
Redirect permanent /forum/oasys/ann https://forums.example.com/forum/forumdisplay.php?forumid=155
Redirect permanent /forum/oasys/gen https://forums.example.com/forum/forumdisplay.php?forumid=140
Redirect permanent /forum/oasys/seq https://forums.example.com/forum/forumdisplay.php?forumid=156
Redirect permanent /forum/oasys/kar https://forums.example.com/forum/forumdisplay.php?forumid=142
Redirect permanent /forum/oasys/tut https://forums.example.com/forum/forumdisplay.php?forumid=154
Change
Redirect permanent /
(include the leading slash / because mod_rewrite doesn't use it in this environment) to
RewriteRule
(with trailing space). And then add
[R=301,L]
(with leading space) to the end of each line. The R=301 part is the equivalent of “Redirect permanent”, and L is a necessary flag for most RewriteRules.
Optionally, use
RewriteRule ^
replacing the former / with ^. This saves your server a few nanoseconds, because the element /forum/ happens to come at the very beginning of the URL. Consider it your very first step into the world of Regular Expressions.
Not to get in the way of lucy24's excellent explanations, but if you are blocking via IP CIDR, the one you mentioned (HUAWEI CLOUDS) is at 159.138.0.0/16
thought it would be easier to just block
It might be easier for you in the short term, but not necessarily easier on the server, because of the extra DNS-lookup step. Are you sure about the huawei? It's noteworthy that the IPs seem to come from all around the globe, not just Asia. But I think we talked about this somewhere upthread, arriving at
Require host blahblah
SetEnvIf Accept-Language ^zh badlang
SetEnvIf Accept-Language ^zh-(tw|TW) !badlang
coupled with
Require env badlang
Translated from Apache to English, that means “Don’t admit Chinese-speaking visitors unless they are from Taiwan.” I don't know why robots even bother to send an Accept-Language header, but I just checked and found tens of thousands of them over the past year on a not-very-big* site.
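To see which Accept-Language headers that pair of SetEnvIf lines would flag, here's a rough Python imitation. It assumes SetEnvIf's default case-sensitive matching and that the two lines are processed in the order written; is_badlang is a made-up name, not anything Apache provides.

```python
import re


def is_badlang(accept_language):
    """Mimic the two SetEnvIf lines: set on ^zh, then unset on ^zh-(tw|TW)."""
    badlang = False
    if re.search(r"^zh", accept_language):
        badlang = True            # SetEnvIf Accept-Language ^zh badlang
    if re.search(r"^zh-(tw|TW)", accept_language):
        badlang = False           # SetEnvIf Accept-Language ^zh-(tw|TW) !badlang
    return badlang


if __name__ == "__main__":
    for header in ("zh-CN,zh;q=0.9", "zh-TW,zh;q=0.8", "en-US,en;q=0.5", "zh-Tw"):
        print(header, is_badlang(header))
```

Because the patterns are anchored and case-sensitive, only a header that starts with zh is looked at, and a mixed-case zh-Tw would still be flagged since only tw and TW are listed.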
Are you sure about the huawei? It's noteworthy that the IPs seem to come from all around the globe, not just Asia.
IP Address Geolocation
159.138.110.12 or ecs-159-138-110-12.compute.hwclouds-dns.com is an IPv4 address owned by Huawei International Pte. LTD and located in Singapore, Singapore
I do not have this many legitimate viewers
Besides, you’ll know when you look at the day’s access logs, because it’s rare for robots to request supporting files. Just pages and-that’s-all.
So, for example,
Redirect permanent /forum/oasys/tut https://forums.example.com/forum/forumdisplay.php?forumid=154
becomes
RewriteRule forum/oasys/tut https://forums.example.com/forum/forumdisplay.php?forumid=154 [R=301,L]
Redirect permanent /forum/m3 https://forums.example.com/forum/forumdisplay.php?forumid=191
RewriteEngine On
# send the naked URL to the forums index
RewriteCond %{HTTP_HOST} ^forums.example.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.forums.example.com$
RewriteRule ^/?$ https://forums.example.com/forum [R=301,L]
RewriteRule ^.well-known/acme-challenge/ - [L]
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
# experiment
RewriteRule forum/m3 https://forums.example.com/forum/forumdisplay.php?forumid=191 [R=301,L]
RewriteEngine On
RewriteRule ^.well-known/acme-challenge/ - [L]
# experiment
RewriteRule forum/m3 https://forums.example.com/forum/forumdisplay.php?forumid=191 [R=301,L]
# send the naked URL to the forums index
#RewriteCond %{HTTP_HOST} ^forums.example.com$ [OR]
#RewriteCond %{HTTP_HOST} ^www.forums.example.com$
RewriteRule ^/?$ https://forums.example.com/forum/ [R=301,L]
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
# all links to the vp folder:
RewriteRule ^vp(.*)$ https://example.com/vp/$1 [L,R=302]
# all links to the oasys vgui folder:
RewriteRule ^oasys/gui(.*)$ https://example.com/oasys/gui/$1 [L,R=302]
# all links to the m3 vgui folder:
RewriteRule ^m3/gui(.*)$ https://example.com/m3/gui/$1 [L,R=302]
# all links to the youtube shortcuts:
RewriteRule ^youtube(.*)$ https://example.com/youtube/$1 [L,R=302]
# the following 2 lines sends everything through https
# even though this is already in the root .htaccess file, it seems it needs to be here
# or locating directly to the forum doesn't have https, i.e. example.com/forum
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R,L]
# experiment to keep out MSN Bots
RewriteCond %{HTTP_REFERER} ^msnbot/2\.0b [NC]
RewriteRule .* - [F,L]
RewriteCond %{HTTP_REFERER} ^msnbot-media/1\.1 [NC]
RewriteRule .* - [F,L]
RewriteRule ^vp(.*)$ https://example.com/vp/$1 [L,R=302]
Doesn't this potentially create an endless loop?
If you request
example.com/blahblah
(here I mean literally “blahblah”, i.e. any URL that doesn’t actually exist on the site), where do you end up? Try it in your browser and see what the address bar says.