
Safe to Block HeadlessChrome?

glakes

7:07 pm on Jun 16, 2019 (gmt 0)



We're getting attacked from hundreds of Amazon cloud IP addresses that hit our homepage first and then hit our site search with clothing, shoe, and similar terms unrelated to what we sell. My guess is that it's a negative SEO attack, since it has persisted for weeks and the search terms are far different from what we sell. The user agent strings all have "HeadlessChrome" in common. Is it safe to block them in .htaccess with the following?

BrowserMatchNoCase HeadlessChrome bad_bot
Order Deny,Allow
Deny from env=bad_bot


We've tried blocking by IP addresses and ranges, but they just hop onto one of many other unblocked Amazon IP addresses. I'm tired of the whack-a-mole game...

not2easy

8:30 pm on Jun 16, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Some of these things depend on your Apache version, but from what I've seen it is often backward compatible, so I use some old standards.

I use several lines to block UAs, so I would just add it in there. If you don't already have a set of UA blocks, here is a standalone one in RewriteRule format:
RewriteCond %{HTTP_USER_AGENT} (HeadlessChrome) [NC]
RewriteRule .* - [F]


This belongs in your rewrite rules, before any canonical rewrites.
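For anyone starting from an .htaccess file that has no rewrite rules yet, here is a minimal standalone sketch, assuming mod_rewrite is enabled and there is no existing RewriteEngine line. The second condition (PhantomJS) is only a hypothetical placeholder showing how further UA patterns chain together with [OR]; substitute whatever actually shows up in your own logs.

```apache
# Turn on the rewrite engine (needed once per .htaccess file)
RewriteEngine On

# Refuse (403 Forbidden) any request whose User-Agent contains
# "HeadlessChrome", case-insensitively. Every condition except the
# last one ends in [NC,OR]; the last one has no OR.
RewriteCond %{HTTP_USER_AGENT} HeadlessChrome [NC,OR]
# Hypothetical placeholder for a second UA pattern
RewriteCond %{HTTP_USER_AGENT} PhantomJS [NC]
RewriteRule .* - [F]

# Canonical rewrites (www/https, trailing slash, etc.) go below this block
```

The [F] flag returns a 403 and implies [L], so nothing after it runs for a blocked request. Putting the block ahead of the canonical redirects means a bot is refused outright instead of first being handed a 301 and then refused on the follow-up request.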

glakes

9:43 pm on Jun 16, 2019 (gmt 0)



@not2easy

Thanks for your help. I've added your code to my .htaccess file as per your instructions (before the rewrites). I hope this stops them in their tracks...

tangor

9:57 pm on Jun 16, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a similar problem with bad requests for things not on the site, but many come with apparently valid UAs, or Bing or Google UAs. Some of these get a 301 for some reason, while most get a 404. The 301s all show the same request pattern, such as

/realfolder/bogusstuff/here or /realfolder/bogusstuff/here/

There are no file extensions. I get between 1,500 and 6,000 per day! (The site is only 700 valid HTML pages, with about twice that many image files.)

Easy enough to filter out of the logs ... but I'd like to make it stop altogether!

lucy24

11:10 pm on Jun 16, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is it safe to block in htaccess with the following?
You’re really asking two different questions: Is this an appropriate access-control method, and can this UA string be blocked without ill consequences?

I can't remember reading about headless browsers being used for law-abiding human purposes, so that's one half of the question.

The other half depends on what you're currently using for access control. The quoted lines are correct for Apache 2.2. If you're on 2.4, the SetEnvIf part (BrowserMatchNoCase) stays the same, but the Order/Deny part would be replaced by a Require directive inside a <RequireAll> container. Using mod_rewrite for access control is also technically correct, but it can be troublesome if you want to apply rules globally to multiple sites. (To a certain extent this issue becomes smaller in 2.4 because there are more inheritance options.) In general, if you've found a method that works, stick with it so you can keep all your rules in the same place. You can even use a SetEnvIf + mod_rewrite combination, though you don't see it very often.
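A minimal sketch of the Apache 2.4 equivalent of the 2.2 lines in the original question, assuming mod_setenvif and mod_authz_core are loaded and that nothing else in the same context still uses the old Order/Deny directives (mixing the two styles tends to give surprising results). The variable name bad_bot is simply carried over from the question.

```apache
# Flag the request when the User-Agent contains "HeadlessChrome" (any case).
# With no value given, bad_bot is set to "1".
BrowserMatchNoCase HeadlessChrome bad_bot

# 2.4 access control: allow everyone except requests flagged above.
# "Require not" is only valid inside a <RequireAll> container.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

The SetEnvIf + rewrite combination might look roughly like this. It is a sketch that relies on the variable set by BrowserMatchNoCase already being visible to mod_rewrite when the .htaccess rules run:

```apache
# Set bad_bot=1 for headless Chrome user agents
BrowserMatchNoCase HeadlessChrome bad_bot

# Refuse (403) any request that carries the flag
RewriteEngine On
RewriteCond %{ENV:bad_bot} =1
RewriteRule .* - [F]
```

One advantage of splitting it this way is that the list of BrowserMatchNoCase lines can be maintained on its own, and the same flag can feed either a Require rule or a RewriteRule.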

wilderness

9:20 pm on Jun 17, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



glakes
FWIW, you'd likely catch them all with a shorter version of the rewrite:

RewriteCond %{HTTP_USER_AGENT} Head
RewriteRule .* - [F]

lucy24

10:50 pm on Jun 17, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



shorter version of the rewrite
Hee. I don't find anything in logs that fits Head but not Headless,* so yeah, get that server outta there four bytes sooner ;)


* Barring an image file called Headache.jpg which I'd forgotten all about.

glakes

12:54 am on Jun 25, 2019 (gmt 0)



Thanks for the tip, wilderness. I'll have to take a look at my logs and see if there is anything else using "Head". But as it stands, the code from not2easy stopped them dead in their tracks. Everyone's help is much appreciated.

aristotle

1:16 am on Jul 26, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wonder why someone would go to all the trouble and expense of setting up this attack, while doing it in such a way that it can be blocked so easily.