Blocking Google's Proxies

dstiles

8:41 am on May 4, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I get a relatively high number of hits from G's proxy range, all of them (as far as I can tell) to /.well-known/traffic-advice. This is no more than annoying, but it IS annoying. I've tried robots.txt, but G's bot does not obey it.

Does anyone (apart from G!) have a good reason why I should not block the range 66.249.93.0/24?
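For reference, the block I have in mind would be something like this - a minimal sketch, assuming Apache 2.4 with mod_authz_core (adjust for your own server):

# Sketch only: refuse the whole proxy range
# (assumes Apache 2.4; 2.2 would use Order/Deny instead)
<RequireAll>
    Require all granted
    Require not ip 66.249.93.0/24
</RequireAll>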

tangor

2:58 am on May 7, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Unless you have set up a /.well-known/ folder (it's a metadata kind of thing), just ignore it and serve the 404 (that's what I do). However, if your 403 response is smaller in bytes, you can deal with the request in .htaccess and nuke it that way, as sketched below.
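Something along these lines would do it - an untested sketch, assuming mod_rewrite is enabled:

# Sketch only: 403 the traffic-advice request outright (assumes mod_rewrite)
RewriteEngine On
RewriteRule ^\.well-known/traffic-advice$ - [F]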

Either way, g ignores refusals and just keeps trying.

The prefetch side of g, on the other hand, I dismiss with prejudice!

dstiles

7:47 am on May 7, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, tangor. I was really wondering whether that /24 was used for any purpose other than the well-known bot.

lucy24

3:25 pm on May 7, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you're getting bombarded with requests for nonexistent files (the OP doesn't say whether you actually have a /traffic-advice), you might choose to return a 404 manually. It looks exactly the same to the visitor, but it saves your server the trouble of looking for the file.
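For example - a sketch, assuming mod_rewrite on Apache 2.4 (the R=404 flag isn't available in older 2.2 releases):

# Sketch only: answer with a 404 without ever touching the filesystem
RewriteEngine On
RewriteRule ^\.well-known/traffic-advice$ - [R=404]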

dstiles

8:12 am on May 8, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, I should have been more explicit in the OP. :(

The only well-known request I SHOULD get is from letsencrypt, which seems to reference it, but since the directory does not exist I have no idea what for. Other than that I have none. The bot is getting 403's, but that makes no difference to G - or any other bot, come to that; they treat them all (403, 404, 410) as if they were 200. I'm thinking of sending 418, but they'd only think it was a 200 again!

lucy24

3:53 pm on May 8, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



which seems to reference it but since it does not exist I have no idea what for
I think someone hereabouts managed to explain this in a way that fit into my brain (after multiple tries, probably). At the moment of the request, the letsencrypt file does exist, because the server administrator has created it. Once it has been successfully retrieved, it has done its job and is deleted again.

A quick detour to a randomly chosen site confirms that the directory /.well-known/acme-challenge/ physically exists, but is normally empty. In logs, I find clusters of eight requests every two or three months (why not exactly three? who knows), spaced a few seconds apart. And a final detour to one site's htaccess confirms that this is one situation where I do return a manual 404.

dstiles

8:18 am on May 10, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmm. I do not have .well-known on any site, but letsencrypt still updates OK - as long as I have not accidentally blocked port 80 for various pseudo-random servers such as digitalocean; I can block as many servers on 443 as I like. Perhaps it also creates the directory and then removes it if it was originally empty, although I would have thought that was a bit risky if others could use it.

lucy24

4:49 pm on May 10, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think letsencrypt has two or even three ways of validation. The /.well-known/ route is one of them, but when I first had letsencrypt they used some other method. Then, over the years, /.well-known/ started showing up on sites. My original thought was that the directory is aliased to some other physical location on the server, so I wouldn't see it myself, but that doesn't seem to have been the case. (The leading dot is obviously not the explanation, or I wouldn't be able to see my .htaccess files. That one's a preference setting in Fetch.) Oh, and the first /.well-known/ came with its own little htaccess with a couple of RewriteRules, but the later ones seem to have dispensed with this.

If you genuinely don't have a /.well-known/ you could try returning a manual 404 to any and all requests. And then see if letsencrypt reports a problem the next time they need to update the certificate.

tangor

6:48 pm on May 11, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



After a year of running LetsEncrypt with no /.well-known/ on my site, I can say there have been no problems.

dstiles

8:12 am on May 12, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Lucy, Tangor.

So the only thing left is: is it ok to block G's proxy IPs?

tangor

9:23 am on May 12, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are the numbers EXCESSIVE? 50+/day? Maybe, but I don't see numbers in that range. What are you experiencing?

As I noted above, g will continue EVEN IF YOU NUKE 'EM as far as /.well-known/ goes ... as I said, it is metadata now in common use, much like robots.txt is.
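(If you'd rather answer it than block it, my understanding - an assumption to verify against Google's private prefetch proxy notes, not something confirmed in this thread - is that the file is plain JSON telling the proxy not to prefetch, roughly:)

[{
    "user_agent": "prefetch-proxy",
    "disallow": true
}]

(Supposedly served with a Content-Type of application/trafficadvice+json, but again, verify that before relying on it.)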

Since I don't have one it gets a 404. I do allow all "robots.txt" access just to see who's out there.

In this case, in my opinion, dealing with the request/UA is preferable to blocking an IP range. Less work for me ... then again, I am a bit on the lazy side. :)

dstiles

7:35 am on May 13, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Overall, not excessive, I admit. It's just annoying that the G bot does not follow robots.txt and is persistent. About two dozen hits/day spread over about two dozen sites - not every site every day, but I had 18 on one site yesterday. Now I consider THAT a bit excessive.

I will probably leave it alone and keep feeding it 403's; as noted above, G seems not to differentiate between 4xx responses, and 403 is what it already gets.

SumGuy

12:06 am on May 15, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



Every 3 months I run a letsencrypt script from the command line on a windoze PC - it places a file or something in my webserver's acme-whatever directory (which it creates), and generates some cert files that I take and use to create new certs in my web server (Abyss). Just before I run the script, I log into my router and disable a HUGE IP blocking list (blocking for ports 80 and 443). Among what I'm blocking are ALL AWS and google user IP ranges. I could check from the last time I did this, but I'm pretty sure that as part of these cert updates letsencrypt hits me / you from either an AWS or google user IP. And it's never the same IP or even CIDR, so it's not as if you can whitelist those.

While on the topic of letsencrypt, for those who update manually: did you notice that no cert expiry emails have been sent out for the past few weeks? I discovered by chance that my certs had been expired for a few hours; that was about a week ago.

Another reason to block AWS and GoogleUser IP's is the many, many other hits on other ports, port scanning, etc., and also spam on port 25 (if you're running your own mail server). I see nothing but garbage traffic from those IP's.

dstiles

8:15 am on May 15, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



SumGuy - I need LE autoupdate on several web sites plus mail. I ran into the blocking problem some time ago and asked on their forum. Yes, their hits come from a large number of different IPs, and they won't tell you which ones, for security reasons. I now block AWS, DigitalOcean (another LE scanner) and most of G, among many others, but only on port 443, as LE uses port 80. The hits on other ports are relatively light.

For some reason my mail server failed to update its cert last month. The first I knew of it was when Thunderbird complained about a cert expiry. Fixed within 20 minutes, but annoying; I don't know why it happened (I check the logs each day and found no obvious renewal error) and, yes, there was no email.

Apropos mail server hits, I assume you know you can block most broadband ranges, since mail should not originate from them - but remember to keep your customers' IPs open. I also block as much of China and Russia as I can, plus large swathes of South America; my customers receive no genuine mail from them.

SumGuy

3:38 am on May 16, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



I don't use certs for email (my server is open to any incoming connections from external servers on port 25 for SMTP, and that doesn't use certs or TLS, etc). Local users grab POP mail on port 110, no TLS/SSL login, on the local LAN, and some external access is allowed but is IP-whitelisted in the router. (Wouldn't a self-signed cert be sufficient for user mail retrieval?)

I have separate blocking lists for SMTP (19k CIDR's) and WWW (61k CIDR's), both managed by the router. There are fewer SMTP entries, but they are larger CIDR's in general (some /8's, even). Yes, residential IP's should not send mail (but many in the third world do).

LE was having problems with their mail system for sending expiry warnings; I don't know if it's fixed or not. If you have a unix/linux-based system, I thought there were automated ways of handling cert renewals that made e-mail notifications unnecessary?

dstiles

8:23 am on May 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm never comfortable with self-signed certs. I have a dummy web site on the mail server to simplify letsencrypt auto-renewal, open only to port 80. I haven't allowed port 110 for years. Mail is retrieved on ports 993 (IMAP) and 995 (POP) using SSL/TLS and Normal Password. Hence the need (in my opinion) for an LE cert. (This is using Postfix.)

I have separate servers for mail and web, so the iptables are unique to each. Mostly they comprise CIDR blocks, but I have a separate section on the mail server for single-IP probes and port scanners, of which there is suddenly, over the past year or so, a plethora. The third world is not, for me, a problem. I and my customers receive almost no mail thence, and what there is can easily be whitelisted.

All of my web servers' LE certs were auto-renewed with no problem. It was only the mail server that failed (and with no log entry to that effect that I could see). I sometimes get a warning from LE about an imminent expiry on web certs but that's because I've either discontinued the site or moved it to another server with a new cert.

lucy24

6:23 pm on May 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't remember if you said exactly how you’re blocking unwanted IPs. But if you're concerned about inadvertently blocking legitimate LE activity, you can probably find a way to poke holes: Deny from suchandsuch UNLESS the user-agent is
:: detour to logs ::
Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)

So far, no robot seems to be using this UA for any nefarious purpose.
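In Apache 2.4 terms, that might look something like this - a sketch only, untested, assuming mod_setenvif and mod_authz_core, and reusing the 66.249.93.0/24 range from earlier in the thread:

# Sketch only: let the LE validation UA through, refuse the blocked range to everyone else
# (assumes Apache 2.4 with mod_setenvif + mod_authz_core)
SetEnvIfNoCase User-Agent "Let's Encrypt validation server" is_letsencrypt
<RequireAny>
    Require env is_letsencrypt
    <RequireAll>
        Require all granted
        Require not ip 66.249.93.0/24
    </RequireAll>
</RequireAny>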

Looking it up, I see they use a 4:4 pattern: four requests for some particular filename in the /acme-challenge/ directory, all from different IPs; wait a few seconds; four requests for a new filename, again from random IPs. I also note that if the site in question has an IPv6 address, LE will use IPv6 for all those requests.

dstiles

7:59 am on May 17, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, but allowing by UA, though it might work, seems a bit dodgy to my mind. I'll stick with what I know for now. :)

Brett_Tabke

12:01 pm on May 21, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I was getting hundreds of requests through the proxy here. Sure seemed like googlebot to me. I banned the whole C block. So far, no issues.

dstiles

8:05 am on May 22, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, Brett. I'll keep that in mind. I'm still wavering...