Forum Moderators: phranque


Correctly Force https and no www in .htaccess File

         

Tater

8:45 pm on Oct 7, 2018 (gmt 0)

5+ Year Member



Hi all,

By way of introduction, the only reason I am a webmaster is that I wanted a domain so I could have permanent email addresses and put some pictures up to be shared with friends and family. I slapped a simple webpage together back in 2007 with Komposer. I just switched hosts, which included a straight copy/paste of my files from old to new. So, all my existing issues have been perfectly replicated in my new home.

Of course, since I am now actually paying attention, I have decided to undertake some maintenance and fixing of longstanding issues. And new ones...

I have just joined this forum after hunting around on it, and the web generally, for answers to the issues I am having with my domain. I haven't found the right information to fix my problems. Some clues, yes, but then I have new questions that aren't answered...

Among my issues is that I found that 40-50% of my hits are from an IP in Moldova trying to hit a nonexistent page on my site or showing as referred by that nonexistent page. I attempted to block them through the cPanel and, when that didn't seem to work, I asked my host to do something. They said they updated my .htaccess file. The hit numbers have lowered, but are now coming from other IPs for, or referred by, the same nonexistent page.

I also wanted to force https and tried to do that through a redirect in cpanel. Which isn't working...

Now my email clients are suddenly having varying success pulling emails. I suspect that I have caused these issues by trying to change http to https and www.example.com to example.com. Before I blindly try to revert this, I decided to get actual counsel.

Firstly, I suspect that my .htaccess file is screwed up, or at least that it is not correct to achieve my intended outcomes. I am not very familiar with the language used here, but I think the order of commands may be wrong. And I suspect that the actual commands are wrong for the http-to-https and www.example.com-to-example.com redirects.

Also, I am not sure why FrontPage is referenced unless it is a universal fix for that product's "IE is the only browser that exists" issues. I have never had anything associated with it on my domain.

Here is what my .htaccess for public_html looks like: (Both example and elpmaxe are used for obfuscation.)


RewriteEngine on

# -FrontPage-
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
<Limit GET POST>
#The next line modified by DenyIP
order allow,deny
#The next line modified by DenyIP
#deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName example.com
AuthUserFile /home/elpmaxe/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/elpmaxe/public_html/_vti_pvt/service.grp

<Files 403.shtml>
order allow,deny
allow from all
</Files>

order allow,deny
deny from 188.138.188.34
deny from 178.159.37.61
deny from 93.188.34.197
deny from 89.109.2.77
deny from 31.130.2.79
deny from 2.224.128.114
deny from 51.15.88.249
deny from 93.100.128.3
deny from 91.215.106.53
deny from 104.131.214.218
deny from 91.197.174.108

# php -- BEGIN cPanel-generated handler, do not edit
# Set the “ea-php70” package as the default “PHP” programming language.
<IfModule mime_module>
AddType application/x-httpd-ea-php70___lsphp .php .php7 .phtml
</IfModule>
# php -- END cPanel-generated handler, do not edit
RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-SSL} off
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$
RewriteCond %{REQUEST_URI} !^/\.well-known/cpanel-dcv/[0-9a-zA-Z_-]+$
RewriteCond %{REQUEST_URI} !^/\.well-known/pki-validation/(?:\ Ballot169)?
RewriteCond %{REQUEST_URI} !^/\.well-known/pki-validation/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteRule ^/?$ "https\:\/\/example\.com\/" [R=301,L]


Any suggestions or guidance is gratefully received. Thanks for taking the time to read this whole thing.

TorontoBoy

9:37 pm on Oct 7, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I'll address the errant Moldovians. To find out who is visiting your site you should download your raw access log, available on cPanel. This will tell you exactly who is visiting you and which bots (software) are attacking you. If you have a flat file html web site, they cannot damage your site but are annoying. You will need to decide how much effort you want to put into killing them.
deny from a.b.c.d

This statement will deny a single IP address, which on its own is almost never enough. I usually deny all 256 addresses of the last octet.
deny from a.b.c.0/24
will deny a.b.c.0 to a.b.c.255, or 256 IP addresses. The /24 is called CIDR [en.wikipedia.org...] format, very handy to learn.

From your raw access log you can look up their IP addresses and see where they are from. You can then assess how big a ban range you wish to use. There is nothing wrong with your current IP bans, just that they are too narrow to be useful.

lucy24

10:03 pm on Oct 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Of course, since I am now actually paying attention, I have decided to undertake some maintenance
Wheee! That's always fun, and you gotta start somewhere.

But if you personally fix your htaccess, can you be certain it will remain the way you fixed it? Or will your host barge in and replace the whole thing at any random time without telling you?

The quoted list of RewriteConds doesn't seem to belong with the single RewriteRule given, since the body of the rule explicitly refers only to requests for the root. Which brings us to ...
RewriteRule ^/?$ "https\:\/\/example\.com\/" [R=301,L]
If this is, litteratim, what your hosts wrote, do not allow them to make any more RewriteRules for you. (I don't much care for the form of the domain-name-canonicalization conditions either, but one thing at a time.)

Tater

10:28 pm on Oct 7, 2018 (gmt 0)

5+ Year Member



I'll address the errant Moldovians. To find out who is visiting your site you should download your raw access log, available on cPanel. This will tell you exactly who is visiting you and which bots (software) are attacking you. If you have a flat file html web site, they cannot damage your site but are annoying. You will need to decide how much effort you want to put into killing them.


I read a response that you had made in another thread along these lines and I grabbed a log to look at. Here is what led me to blocking good ol' '188.34:

188.138.188.34 - - [03/Oct/2018:05:02:14 -0700] "GET /delineament-door-foo/ HTTP/1.0" 404 - "http://example.com/delineament-door-foo/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36 Kinza/4.7.2"

188.138.188.34 - - [03/Oct/2018:05:02:15 -0700] "GET / HTTP/1.0" 404 - "http://example.com/delineament-door-foo/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36 Kinza/4.7.2"

188.138.188.34 - - [03/Oct/2018:05:02:27 -0700] "GET /delineament-door-foo/ HTTP/1.0" 404 - "http://example.com/delineament-door-foo/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"

188.138.188.34 - - [03/Oct/2018:05:02:27 -0700] "GET / HTTP/1.0" 404 - "http://example.com/delineament-door-foo/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"

188.138.188.34 - - [03/Oct/2018:05:02:31 -0700] "GET /delineament-door-foo/ HTTP/1.0" 404 - "http://example.com/delineament-door-foo/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.51"

188.138.188.34 - - [03/Oct/2018:05:02:31 -0700] "GET / HTTP/1.0" 404 - "http://example.com/delineament-door-foo/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.51"


He likes me! He really likes me!

Anyway.

I have information, I just don't know what to do with it beyond blocking the IP address. All the IPs shown in .htaccess are guilty of the same thing. Same web page, etc. It really is not a big issue, as you noted, my page is flat html. It just annoys and intrigues me at the same time.

More pressing is the issue with my email. Hopefully, I can find a resolution more nuanced than simply reverting back to http and both www and not www.

[edited by: phranque at 1:37 am (utc) on Oct 9, 2018]
[edit reason] some acronyms are worse than others [/edit]

Tater

11:31 pm on Oct 7, 2018 (gmt 0)

5+ Year Member



But if you personally fix your htaccess, can you be certain it will remain the way you fixed it? Or will your host barge in and replace the whole thing at any random time without telling you?


They don't seem to be proactively interested in my site. They even copied over my dormant second domain as an add-on domain (which isn't included in my hosting package...) So, I am not worried about them doing anything extra.

The quoted list of RewriteConds don't seem to belong with the single RewriteRule given, since the body of the rule explicitly refers only to requests for the root. Which brings us to ...
RewriteRule ^/?$ "https\:\/\/example\.com\/" [R=301,L]

If this is, litteratim, what your hosts wrote, do not allow them to make any more RewriteRules for you. (I don't much care for the form of the domain-name-canonicalization conditions either, but one thing at a time.)

The only part that the new hosting company put in, I think, is the IP blocks. The rest is either from cPanel, because I used the GUI (that redirect you point out was definitely cPanel), or the prior hosting company.

That is part of my concern. What little I understand leads me to believe the redirects should be first in the file. But they aren't and they aren't working. And I just don't know if that's because they were created wrong (As you imply), are placed wrong (could be both, both is good) or if it's because the universe is still annoyed with me about that thing that I didn't do when I was supposed to do that thing when I was at that place where that thing was supposed to be done.

In any event, is a redirect the correct way to force https, or isn't it? If it is, what should the code actually look like? And should it be the first set of commands in the .htaccess file? If it isn't the right way, what is the correct way to do it?

I have read dozens of threads, webpages and guides and I still have no idea how to do this. I have seen code snippets that don't look like the ones I have, but I don't understand them so I don't know if they are the correct ones. The guides I have read say that I can do it through cPanel, but that is obviously incorrect. I did do it through cPanel and it does not work. It's rather frustrating.

TorontoBoy

1:51 am on Oct 8, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



With your raw access log you now have forensic information on your attacker. The issue is what to do with it. Firstly, they are asking for something you do not have, and asking a lot; they are probing your site. This is called reconnaissance, the preparatory step to a site hack.

Go to whois.com and put in the IP address. You will find:
inetnum: 188.138.188.0 - 188.138.188.255
netname: STARNETMD
descr: SC STARNET SRL
descr: Chisinau, Moldova
descr: Region: Chisinau
country: MD

At the bottom of the page you will see: 188.138.128.0/17, which means their range is 188.138.128.0 - 188.138.255.255.

To start I would block 188.138.188.0/24, and then monitor the rest of the range. If, for example, tomorrow they use 188.138.189.*, you can then expand your block range for this company. They will likely use more of their range in the future, but you now know where they live. If you truly dislike them and do not care whether anyone in that range visits your site, you could use 188.138.128.0/17 and ban the whole shebang.
deny from 188.138.188.0/24
or
deny from 188.138.128.0/17
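If you want to sanity-check those ranges before committing them to htaccess, Python's standard ipaddress module does the CIDR arithmetic for you. A quick sketch, using the numbers quoted above:

```python
import ipaddress

# /24 fixes the first three octets: 2**(32-24) = 256 addresses
net24 = ipaddress.ip_network("188.138.188.0/24")
assert net24.num_addresses == 256
assert ipaddress.ip_address("188.138.188.34") in net24  # the original offender

# /17 covers the whole allocation: 2**(32-17) = 32768 addresses
net17 = ipaddress.ip_network("188.138.128.0/17")
assert net17.num_addresses == 32768
assert str(net17[0]) == "188.138.128.0"    # first address in the range
assert str(net17[-1]) == "188.138.255.255" # last address in the range
```

The same slash notation drops straight into a "deny from" line, so once the range looks right here it is ready to paste.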

You could look for something unique in the user agent (UA), or the referrer (in this case, blank) in order to ban them. Banning by UA or referrer is much more efficient.

What are they looking for and why, I do not know. They have programmed their bot to probe your site until they find a vulnerability.

lucy24

2:26 am on Oct 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In any event, is a redirect the correct way to force https, or isn't it? If it is, what should the code actually look like? And should it be the first set of commands in the .htaccess file? If it isn't the right way, what is the correct way to do it?
A redirect is the only way to force https. How else would you tell the visitor “This request is unacceptable and you need to make a fresh request using the https protocol”? You can’t quietly rewrite to HTTPS the way you can prettify an URL.

The HTTPS redirect is typically consolidated with the older domain-name-canonicalization redirect--the one that forces either with or without www, according to preference. (Hmm, something tells me you haven't got one, but you will.) It should be the LAST of all your redirects, because the overall ordering of redirects is from most specific to most general.

Start with any specific redirects--the ones that go from example.com/old-silly-url to example.com/new-sensible-url. Most sites accumulate these over the years, though it is theoretically possible not to have any.

Then, second-to-last, is your index redirect, the one that goes from example.com/directory/index.html (or whatever extension you use) to example.com/directory/ alone. Exact form of this redirect will depend on your site structure, but it's generally the second-most-generic of your redirects. (Something tells me you haven't got one of these either, but we can talk more about them.)

All of these redirects will include a complete protocol-plus-hostname in the target, like
https://example.com/directory/pagename.html
That's why they come before the HTTPS redirect; no need to redirect people twice when you can do it in one fell swoop.

This leaves only the requests that were perfectly correct except that they said http when you wanted https, and/or www.example.com when you wanted example.com:
RewriteCond %{REQUEST_URI} !^/robots\.txt
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) https://example.com/$1 [R=301,L]
The business about robots.txt is something I discovered the first time I moved a real site (as opposed to my test site that tries everything first) to https. Some legitimate robots seem to get confused when they meet a redirect after requesting robots.txt, so I decided it's safer to serve robots.txt only at the originally requested hostname and protocol, even if it’s otherwise wrong for the site.

Then there are two conditions separated by [OR], meaning that only one of them has to match. (Pro tip: List them in order of most-likely-to-succeed, and the server can get out of there a nanosecond faster. Order in [OR] otherwise makes no difference.)

By convention we say ^(example\.com)?$ instead of the simpler ^example\.com$ with nothing optional: the optional group also matches an empty Host header, which an old-style HTTP/1.0 request may legitimately send, so such a request is left alone rather than bounced into a redirect. On shared hosting, or anything involving a <VirtualHost> envelope, it won't make any difference.

Unlike most redirects, this one will apply to all requests, not just to specific filetypes, directories or names. (Optionally exempting robots.txt as explained above.)
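If the Apache syntax is hard to follow, the decision those conditions make can be mirrored in a few lines of Python. This is a sketch only; needs_redirect is a made-up name for illustration, not anything Apache provides:

```python
import re

def needs_redirect(https_on: bool, host: str, uri: str) -> bool:
    """Mirror the three RewriteConds: exempt robots.txt, then redirect
    when HTTPS is off OR the host is not exactly example.com (an empty
    Host header is also left alone, thanks to the optional group)."""
    if re.match(r"^/robots\.txt", uri):
        return False
    return (not https_on) or not re.match(r"^(example\.com)?$", host)

assert needs_redirect(False, "example.com", "/page.html")      # http -> redirect
assert needs_redirect(True, "www.example.com", "/page.html")   # wrong host -> redirect
assert not needs_redirect(True, "example.com", "/page.html")   # already canonical
assert not needs_redirect(True, "", "/page.html")              # empty Host tolerated
assert not needs_redirect(False, "example.com", "/robots.txt") # robots.txt exempt
```

In words: robots.txt is never redirected, and everything else is redirected unless it already arrived over HTTPS at the bare hostname (or with no hostname at all).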

The guides I have read say that I can do it through cPanel,
Funny how everyone assumes that all hosts use cPanel. Mine doesn’t; it’s one of the bigger hosts, so they roll their own. I think you can do a few access-control things in the Control Panel, but in general you proceed directly to making your own htaccess.

keyplyr

4:23 am on Oct 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But if you personally fix your htaccess, can you be certain it will remain the way you fixed it? Or will your host barge in and replace the whole thing at any random time without telling you?
Having a host that changed my htaccess would be a deal breaker and cause me to change hosts very quickly... and I *hate* changing hosts.

But all hosts have things that are irritating; just depends on if you can live with them.

Tater

4:32 pm on Oct 8, 2018 (gmt 0)

5+ Year Member



To start I would block 188.138.188.0/24, and then monitor the rest of the range. If, for example tomorrow they use 188.138.189.* you will then expand your block range for this company.


While I was fixing the redirect from lucy24's guidance (see post below), I changed that IP to ...0/24. And when I checked IP Blocker in cPanel it surprised me by showing the change. I was wondering if I was going to have to remove or modify what IP Blocker had. The answer is obviously that IP Blocker shows what is there as well as allowing you to add or remove IPs. Cool.

So, that led to the obvious question: What about those redirects I manually put into the .htaccess file? I had used the Redirects interface to remove the non-working ones before I added the new ones manually.

I opened Redirects in cPanel and there the new one was. Nice.

Now, it is time to sit back and monitor my traffic stats and log. Thank you for the advice and kicking the dust off my long neglected memories of IP addressing.

And, to think: Once upon a time money was actually spent to teach me some of this stuff...

Tater

5:16 pm on Oct 8, 2018 (gmt 0)

5+ Year Member



lucy24,

Thank you very much. The explanation and redirect code did the trick. I really appreciate both.

I went into cPanel and used Redirect to remove the redirects that were there. My concern was that it would think it had those in place and some years from now I would try to use Redirects and break something. As it turns out, Redirect shows what is there, whether it put it there or not. So after I pasted in the redirect code that you provided, it showed in Redirects.

The code itself works perfectly. I went to my site by all the variations and ended up at https://example.com just like I wanted.

Now, to see if that also sorts out the weirdness with my mail clients.

As to the guides and cPanel, that is the funny thing: I actually have cPanel, but the utilities don't appear to always do what they are meant to. Using Redirect created the non-working redirects that you gave me code to replace. Both appear exactly the same in the Redirect dialog in cPanel. So it knows what the clean code does, it just doesn't create it. LoL.

Deny IP seems to do a better job. I think I was too impatient when I asked my hosting company to help with the IP blocks. Looking at it now, with a little more thought, I think they looked at the .htaccess and went "The dummy already did what he's asking us to do. Just tell the moron that we took care of it."

I think what I am going to do now, while I wait and see how my mail works out, is look into the rest of the commands used in the .htaccess file and see if I can wrap my head around what it is calling for with some of the commands.

My file is very basic because I haven't much going on. But I would like to tidy it up.

Tater

5:27 pm on Oct 8, 2018 (gmt 0)

5+ Year Member



Having a host that changed my htaccess would be a deal breaker and cause me to change hosts very quickly... and I *hate* changing hosts.

But all hosts have things that are irritating; just depends on if you can live with them.

I would think that the less you pay for hosting, the less likely they would be to be "helpful".

I've paid about $30/yr for the next three years of hosting with these folks. I'm not expecting a lot of proactive individual attention. They have been very responsive to direct requests, and helpful, but they can't be nursemaiding me for that pittance.

Tater

5:41 pm on Oct 8, 2018 (gmt 0)

5+ Year Member



As I was rummaging around I looked inside .htaccess~, which I assume is a backup of .htaccess, dated from April 2011. This is a taste:

17 RewriteEngine on
18 RewriteRule ^check_work/$ ./barb/pygmalion.php?checkwork
19 RewriteRule ^barb-(.*)/$ ./barb/pygmalion.php?$1
20 RewriteRule ^pygmalion-(.*)/$ ./barb/pygmalion.php?$1
21 RewriteRule ^cetera-(.*)/$ ./barb/pygmalion.php?$1
22 RewriteRule ^auntie-(.*)/$ ./barb/pygmalion.php?$1
...

416 RewriteRule ^valhalla-(.*)/$ ./barb/pygmalion.php?$1
417 RewriteRule ^amplified-(.*)/$ ./barb/pygmalion.php?$1
418 RewriteRule ^notecards-(.*)/$ ./barb/pygmalion.php?$1


Was pygmalion a virus or something circa 2011? Nothing is coming up in a web search.

lucy24

8:11 pm on Oct 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What the heck are/were those rules even supposed to do? They're all rewriting to something else--but to what? What on earth does a leading “./” in a target even mean? And where are the [L] flags? Did they simply get lost in posting (the same post that inserted the line numbers, which are obviously not present in the real thing)? Since all the patterns are mutually exclusive, I can't see any reason to omit the flag.
Was pygmalion a virus or something circa 2011?
If so it was an exceedingly rare one, since site search turns up nothing. I think you said at the outset that you've had this site for ages, so it isn't just someone else's htaccess from the same domain name in 2011.

:: shrug ::

Tater

11:01 pm on Oct 8, 2018 (gmt 0)

5+ Year Member



I assume it was created by something or someone at my old host, because I had the domain since 2007 at that host.

I added the line numbers manually to show how many lines the commands went on for.

Odd.

TorontoBoy

12:19 am on Oct 9, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Always have backed up versions of your htaccess. If you make a spelling mistake your site might go down. I always upload a new version of my htaccess and save the old htaccess as a different name, so if it does not work I can quickly reverse the changes. I never delete my old htaccess files. This saves me from having a panic attack if something goes wrong. After I upload a new htaccess I always check my site with a different browser, one where the cache is always dumped just before I reload my site.

You could check your htaccess at [htaccesscheck.com...]; it is not bullet-proof, but it will check basic syntax.

You can also add comments to your htaccess by putting a "#" as the first character. Comments are always good because if you come back to your htaccess after a couple of months you may forget why you did something.

phranque

1:25 am on Oct 9, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Now my email clients are suddenly having varying success pulling emails. I suspect that I have caused these issues by trying to change http to https and www.example.com to example.com. Before I blindly try to revert this, I decided to get actual counsel.

the email problems should have nothing to do with apache unless you are also hosting your webmail server (i.e. accessing mail over the HTTP protocol vs SMTP & IMAP/POP3).

when you switched web hosts did that affect your email hosting in any way?

assuming your email clients are trying to access SMTP and IMAP/POP3 servers, the problem is probably a matter of who and where your (new?) email service is hosted and whether or not your DNS is properly configured for that.

lucy24

1:43 am on Oct 9, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I always upload a new version of my htaccess and save the old htaccess as a different name, so if it does not work I can quickly reverse the changes.
How admirably thorough. I only keep copies if I've made massive, wholesale changes. Now, what I do consistently do is this: When I've changed htaccess, I leave the text file open while uploading and confirming that the site doesn't crash. If I do hit a 500 error, meaning some kind of syntax goof, I switch on the text editor's Show Changes highlighter, making it easy to home in on possible problems.

Tater

8:59 pm on Oct 9, 2018 (gmt 0)

5+ Year Member



the email problems should have nothing to do with apache unless you are also hosting your webmail (i.e. using HTTP protocol vs SMTP & IMAP/POP3) server.

when you switched web hosts did that affect your email hosting in any way?

assuming your email clients are trying to access SMTP and IMAP/POP3 servers, the problem is probably a matter of who and where your (new?) email service is hosted and whether or not your DNS is properly configured for that.

I spent time this afternoon on the email. I had been using a direct address for my email servers (server#.hostco.com:2095) for years because mail.example.com just wouldn't work and it was too much of a bother at the time to figure out. A couple months ago I finally switched to mail.example.com and that was working.

I got the new host a couple weeks ago and they use different ports, so I had to change those on my mail clients. It seems that when I went into Opera and changed the port numbers for the new host I used the IMAP ports instead of the POP3 ports. The accounts are set up in Opera as POP3 so that is what caused that weirdness. Now I have set the correct ports and created an IMAP for one of the accounts and thousands of emails from the server have populated. I have some work to do.

Now the weirdness with my webmail is a bit harder. If I go to webmail.example.com I get the login screen which has a warning
The security token is missing from your request
but I can log in and then I can choose horde, roundcube or SquirrelMail.

But horde gives me this error:
A fatal error has occurred
Session cookies will not work without a FQDN and with a non-empty cookie domain. Either use a fully qualified domain name like "http://www.example.com" instead of "http://example" only, or set the cookie domain in the Horde configuration to an empty value, or enable non-cookie (url-based) sessions in the Horde configuration.
Details have been logged for the administrator.

roundcube seems to be working ok and SquirrelMail was refusing to let me past some setup last night, but is now showing me mail.

I looked for guidance on the horde message but all I could find was a tangentially related message on the cPanel Forum. I don't use the webmail myself, usually, but my daughter does. The instructions seem fairly forthright in the error message, but I have no idea how to do those things. I don't see anything in cPanel related to horde and the only thing that stands out to me in the file directory is .cphorde which doesn't have much in it except logs, from what I see. The configuration files are somewhere else.

If I go to example.com:2096 I get the login without error message and all three webmail applications work. I suppose the doctor's infamous response of "Well then, don't do that." comes to mind, but I am trying to get things working correctly after a decade of neglect.

phranque

9:14 pm on Oct 9, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I spent time this afternoon on the email.

since your email- and webmail-related issues are irrelevant to this (Apache) forum, i would suggest that you start a new thread in a relevant forum to discuss those problems - perhaps the Website Technology Issues [webmasterworld.com] forum.

Tater

9:15 pm on Oct 9, 2018 (gmt 0)

5+ Year Member



Always have backed up versions of your htaccess. If you make a spelling mistake your site might go down
I copied the original in a txt file on my computer when I created the obfuscated one for here. And I made a copy of the corrected version too. I don't have anything major riding on the webpages working, but I do value my time. And, fixing it blind would be a timesink.

You could check your htacess at [htaccesscheck.com...] but it is not bullet-proof, but will check basic syntax.
Syntax checks out ok!

You can also add comments to your htaccess by putting a "#" as the first character. Comments are always good because if you come back to your htaccess after a couple of months you may forget why you did something.
That is wonderful advice. I had thought to do that a little while after I made the changes. Right after I had gone back in and annotated what the block of commands did, I came here and read this comment. Spot on.

Tater

9:32 pm on Oct 9, 2018 (gmt 0)

5+ Year Member



since your email- and webmail-related issues are irrelevant to this (Apache) forum, i would suggest that you start a new thread in a relevant forum to discuss those problems - perhaps the Website Technology Issues [webmasterworld.com] forum.


Thanks, I have started a thread on the mail issue over there: [webmasterworld.com...]

Tater

11:59 am on Oct 10, 2018 (gmt 0)

5+ Year Member



The thought occurred to me this morning, as I reviewed my visitor logs, that I have an inordinate number of hits looking for my aforementioned nonexistent page allegedly coming from a referrer that claims to be that very nonexistent page. So, I thought: "If I block the nonexistent referrer, then I won't have to block the individual random IPs that keep looking for the nonexistent page."

So I looked up the code for blocking access by referrer and I could find no examples of the code showing a block of only a specific page on a domain. They all showed blocking the whole domain. The targets were always represented as "www.example.com", never "example.com/that-page/". Since my target is a spoofed reference to a nonexistent page on my own domain, I need a more granular block. Wouldn't blocking all referrals from my own domain break all my internal page links?

I found the tool here on WebmasterWorld [freetools.webmasterworld.com ] and it didn't throw a flag at "mydomain.com/that-page/". So I figured I'd give it a go.

I added the following code to my .htaccess file. I added some referrers that were also looking for the nonexistent page, too. My testing, so far, shows no problem with links from my domain working properly. It's too early to know if it is blocking the referrals yet. Do any of you see any issues with this code that I should be aware of?
# Block spammy referrer
RewriteCond %{HTTP_REFERER} ^http://.*mydomain\.com/that-page/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*example\.foo\.foofoo\.pw [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*in\.foofroo\.foo\.foo\.in [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*foo\.confoos\.foo\.in [NC]
RewriteRule .* - [F]


I appreciate all of your assistance with this project. I am learning quite a bit. Thank you.

lucy24

6:05 pm on Oct 10, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You don't actually need the ^http business--especially in conjunction with .* which makes it all wildly inefficient. And the rule itself is wrong: never put something in a Condition that can go in the pattern of the rule itself.

RewriteCond %{HTTP_REFERER} \b(example|otherexample|thirdexample)\b
RewriteRule ^that-page - [F]
If the request isn't for that-page, the server does not need to backtrack and evaluate conditions at all.

The \b anchor means “word boundary”. This saves you worrying that, along with blocking example.com, you're also blocking goodexample.com or examplefriend.com which are perfectly benign.

Don't use [NC] unless you have genuinely seen more than one casing of the offending domain. It's just extra work for the server, as it first has to flatten the casing of the request before running the comparison. All those picoseconds can really add up.
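mod_rewrite's patterns are PCRE, and Python's re module follows the same \b and casing rules closely enough to experiment with. A sketch with placeholder domain names:

```python
import re

pat = re.compile(r"\b(example|otherexample)\b")

assert pat.search("http://example.com/page")        # the target domain matches
assert not pat.search("http://goodexample.com/")    # \b protects goodexample.com
assert not pat.search("http://examplefriend.com/")  # ...and examplefriend.com

# [NC] is the Apache counterpart of re.IGNORECASE: only add it if the
# offending referrer really shows up in more than one casing
assert not pat.search("http://EXAMPLE.com/")
nc = re.compile(r"\b(example|otherexample)\b", re.IGNORECASE)
assert nc.search("http://EXAMPLE.com/")
```

Testing candidate patterns this way, against real lines from your raw access log, is a lot safer than editing the live htaccess and waiting to see what breaks.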

Tater

12:40 am on Oct 11, 2018 (gmt 0)

5+ Year Member



lucy24, you need to remember that I am basically a moron. I don't understand what ^ and .* actually mean or what they do. I just know that they are in the code and must be commands or switches. They might as well be runes to me. I just copied code that was presented as being the solution on [whoishostingthis.com...] and plugged in the addresses.

I don't think I understand, but at least I have questions. This is what I think you are saying the command should look like:
RewriteCond %{HTTP_REFERER} \b(mydomain\.com/that-page/|example\.foo\.foofoo\.pw|in\.foofroo\.foo\.foo\.in|foo\.confoos\.foo\.in)\b
RewriteRule ^that-page - [F]

Each of the referrers should be listed without preamble (no http or .*?) and separated with a |? If I get more referrers do I just keep adding them to the chain of |? Then the RewriteRule has a ^, but what is "that-page" supposed to be? Is that the nonexistent webpage that is being called for by the referrers? Is it "/that-page" assuming the directory on the server or does it have to be a complete web address (mydomain.com/that-page)? And, since I am basically looking to block them from the domain entirely if they come in on the spoofed that-page referrer (or are calling for that-page), should "that-page" be in the last line or the whole domain?

Don't use [NC] unless you have genuinely seen more than one casing of the offending domain.
I am not sure what you are saying. 40% of the hits to my domain were direct calls to and referred calls from the nonexistent that-page. And I don't actually know what [NC] does. And I'm unclear on the contextual meaning of the term casing in this instance, as well.

I have added 70-some IPs I harvested from my logs to the IP deny list in my .htaccess file today. With the current RewriteCond in place, I am seeing 0-byte hits (I assume the Forbidden is working) from new IPs that come in on the spoofed referrer, get nothing three times, and then make a direct request to my domain and get the index.html.

I am pretty sure that moron may be too charitable a description of myself now that I have reviewed this post. I appreciate your efforts to bring light to my dimness.

keyplyr

1:00 am on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The thought occurred to me this morning, as I reviewed my visitor logs, that I have an inordinate number of hits looking for my aforementioned nonexistent page allegedly coming from a referrer that claims to be that very nonexistent page. So, I thought: "If I block the nonexistent referrer, then I won't have to block the individual random IPs that keep looking for the nonexistent page."
Why bother to block at all? So what if these hits come from some page that has an invalid link to your site? They get 404s.

What I've done is write a "custom 404" page that tells these wayward visitors they've followed a bad link, then offers them a search utility (or site navigation) to find what they're looking for. This way all these lost visitors can turn into real visitors.

To use your own custom 404 page (instead of the server's default error page) just create the page and add this to your htaccess file:

ErrorDocument 404 /custom-error-page.html

Note: you can name the custom error page anything you like.

Tater

2:51 am on Oct 11, 2018 (gmt 0)

5+ Year Member



Why bother to block at all? So what if these hits come from some page that has a non-valid link to your site. They get 404s.


These are bots trying to do something. Most of the IPs show up in databases of known abusive IPs. Forum spam, DDoS, etc. Each hit is a few bits of data and the volume is annoying. With the blocking, they aren't using any data, at least until they fall back to the plain domain address and get the index.html.

keyplyr

3:03 am on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, with all the topic drift in this thread, it wasn't clear if you've determined the hits are automated. In that case you have some blocking choices.

Read here: Blocking Methods [webmasterworld.com]

lucy24

4:17 am on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



what is "that-page" supposed to be? Is that the nonexistent webpage that is being called for by the referrers? Is it "/that-page" assuming the directory on the server or does it have to be a complete web address (mydomain.com/that-page)?
I now realize I read your post too fast, and my brain inserted %{REQUEST_URI} where you actually said Referer ... but I think we're still on the right track. I assumed that the “aforementioned nonexistent page” was some specific page. And if so, you should absolutely put it in the body of the rule, since the rule only ever applies to requests for that page. In fact, since the page doesn't exist in the first place, why bother with conditions at all? Block the requests regardless of referrer.

It may help to understand that RewriteRules work on a system of “two steps forward, one back”. The server looks at the rule itself before it evaluates that rule's conditions. If the pattern given in the body of the rule already doesn't match--for example if the rule says “that-page” and the current request happens to be for “some-other-page” or something that isn't a page at all--then the server won't bother to look at the conditions.

The ^ is an anchor meaning “beginning of whatever you’re currently looking at”. So for example if a RewriteRule (in htaccess) says ^somepage then the rule will only apply to requests for example.com/somepage and not to example.com/directory/somepage or even example.com/osomepage.

The body of the rule only looks at the URL path. Not protocol, not hostname, not query string. All of those need to go in Conditions--like the www and https rules discussed upthread.
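For comparison, the www/https redirects mentioned upthread are exactly the case where Conditions are needed, since protocol and hostname aren't available in the rule's own pattern. A generic sketch, with example.com standing in for the real domain:

```apache
# Canonical-host redirect: anything that isn't already
# https://example.com/... gets a 301 to that form
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) https://example.com/$1 [R=301,L]
```

The [OR] between the two conditions means either a non-https request or a wrong hostname triggers the redirect, and $1 preserves the originally requested path.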

Now, here's another clever trick. If you don't want those robots snuffling around and using up server resources, but you also don't want them to know that you're onto them, you could do it like this:
RewriteRule ^nonexistent-page - [R=404]
This means: return a 404 response, but do it manually, without putting the server to the work of looking for the page. To the visitor, it will look identical to the ordinary 404 they would have ended up with anyway. Normally we think of the [R] flag as various kinds of redirects, like 301 or 302, but you can also give it a non-redirect status code such as 404; in that case the substitution is ignored (hence the "-") and the status is returned directly.

Ordinarily, RewriteRules are only concerned with the request. mod_rewrite doesn't know and doesn't care whether the requested file physically exists on your server--unless you have specifically told it to look. (I don't recommend this except in special circumstances.)

Oh yes and ... It isn't what you already know, whatever that may happen to be. It's what you are capable of learning. Trust me, it does not take long to figure out where a given asker belongs ;)

wilderness

9:59 am on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Tater,
These types of threads have a history of getting out of hand very fast. What seems to get lost in the shuffle is 'KISS for beginners'.
I'm going to address some of your questions, and some of the responses you've received that you didn't grasp.

lucy24, you need to remember that I am basically a moron.

All beginners with htaccess need to understand some basic syntax rules that are rarely spelled out these days:
^ begins with
$ ends with
^$ (both together) matches exactly the given string and/or URL, nothing more
no punctuation at all means contains

After eighteen years of using htaccess, I find these basic syntax elements useful in most Rewrites.
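For instance, here is the same pattern "foo" in each of those forms (illustrative only; you'd use one of these, not all four):

```apache
RewriteRule ^foo  - [F]   # path begins with foo
RewriteRule foo$  - [F]   # path ends with foo
RewriteRule ^foo$ - [F]   # path is exactly foo
RewriteRule foo   - [F]   # path contains foo anywhere
```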

In one of your examples you used foo multiple times. In lucy's reply she just replaced foo with 'that-page'.
IMO, and considering the next quote, you simply need to test:
# If Referer contains foo then deny
RewriteCond %{HTTP_REFERER} foo
RewriteRule .* - [F,L]
(Note: should your htaccess contain multiple referrer lines for different keywords, you'll be required to use [OR] at the end of each line except the last.)
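In other words, a multi-referrer version would be a sketch along these lines (foo, bar, and baz are placeholders for the offending keywords):

```apache
# Deny if the Referer contains any of these keywords
# ([OR] on every condition except the last)
RewriteCond %{HTTP_REFERER} foo [OR]
RewriteCond %{HTTP_REFERER} bar [OR]
RewriteCond %{HTTP_REFERER} baz
RewriteRule .* - [F,L]
```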

[NC] means "no case": the comparison ignores upper versus lower case.

By way of introduction, the only reason I am a webmaster is that I wanted a domain so I could have permanent email addresses and put some pictures up to be shared with friends and family.

What seems to be lost here is that you're not trying to kill giants; your pages and site are simple.

Tater

5:23 pm on Oct 12, 2018 (gmt 0)

5+ Year Member



I'm sorry for the lost day, I have been... well. Life.

Sorry, with all the topic drift in this thread, it wasn't clear if you've determined the hits are automated. In that case you have some blocking choices.

Read here: Blocking Methods [webmasterworld.com]


I assumed that was implied. Of course, only I could see the numbers that made that obvious. I'm sorry. About 40% of my hits were coming from a single IP address looking for something that isn't, and never has been, there. That would be unlikely to be a single person hitting F5 over and over again. That IP showed up more than 2000 times in ten days.

I also went through the last ten days of raw logs and identified the IPs calling for, or claiming referral by, the nonexistent page. I now have over 300 different IPs blocked. Those are IPs that I assume are part of a single bot army, since they are all calling for or "referred" by the same page.

I followed the link you supplied and have set up a spider trap and block of http 1.0. That was interesting and I caught two spiders already. One benign and one not. The http 1.0 block filtered out a bunch of hits, too.

I assume the spider trap will also catch the IPs that show up and try to hit every conceivable .php file on my domain (I assume looking for an exploitable file). That should keep traffic to a dull roar.
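For anyone finding this later: the HTTP/1.0 block I set up is along these lines (a sketch of the approach from the linked thread; test it against your own logs before relying on it, since a few legitimate proxies still send HTTP/1.0):

```apache
# Refuse requests made with protocol HTTP/1.0
# (modern browsers use HTTP/1.1 or newer)
RewriteCond %{THE_REQUEST} HTTP/1\.0$
RewriteRule .* - [F]
```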
This 31 message thread spans 2 pages.