Forum Moderators: phranque


Bots Compose 42% of Overall Web Traffic; Nearly Two-Thirds Malicious


tangor

3:08 am on Jul 3, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Akamai Technologies, Inc. (NASDAQ: AKAM), the cloud company that powers and protects life online, today released a new State of the Internet (SOTI) report that details the security and business threats that organizations face with the proliferation of web scraping bots. Scraping Away Your Bottom Line: How Web Scrapers Impact Ecommerce finds that bots compose 42% of overall web traffic, and 65% of these bots are malicious.

With its reliance on revenue-generating web applications, the ecommerce sector has been most affected by high-risk bot traffic. Although some bots are beneficial to business, web scraper bots are being used for competitive intelligence and espionage, inventory hoarding, imposter site creation, and other schemes that have a negative impact on both the bottom line and the customer experience. There are no existing laws that prohibit the use of scraper bots, and they are hard to detect due to the rise of artificial intelligence (AI) botnets, but there are some things companies can do to mitigate them.


What kind of steps are you taking with the above?

Kendo

5:07 am on Jul 3, 2024 (gmt 0)




I see it here too... 50% of hits are malicious scripts probing for weaknesses to exploit databases. Whether they are looking for user details, trying to inject SEO spam, or trying to take the site offline is not clear. Most of it is looking for WordPress, which I don't use. Databases are secured by localhost limitations and all requests are sanitised.

My sites are safe but I am interested to hear if such exploits have caused a problem for others.

lucy24

5:49 am on Jul 3, 2024 (gmt 0)




42% ? ! ? !

I would have guessed more like 94.2%.

Mark_A

9:44 am on Jul 3, 2024 (gmt 0)




Just reeling from an attack of email link-checking bots. They came from all over.

Kendo

10:18 pm on Jul 3, 2024 (gmt 0)




How they find us will be interesting. One new site developed earlier this year was set upon within 24 hours of registering the domain.

blend27

4:46 pm on Jul 24, 2024 (gmt 0)




@Kendo,

--How they find us will be interesting.--

Just for kickers: create a random-word-subdomain.yourdomain.com. Do not share the name of this subdomain with anyone.

Watch traffic trickle in within a half hour.... then LOG/block all requests across all of your domains by IP range, with fury....

@tangor,

really nothing new that we have not spoken about in [webmasterworld.com...] threads over the years...

Kendo

3:33 am on Jul 26, 2024 (gmt 0)




Just for kickers: create a random-word-subdomain.yourdomain.com

Done.

Stay tuned :-)

Kendo

11:37 pm on Jul 27, 2024 (gmt 0)




Creating a site for alias.example.com went unnoticed: no bots after 3 days. Creating a new domain might do the trick, so I will try that next.

brotherhood of LAN

7:25 am on Jul 28, 2024 (gmt 0)




> Just for kickers: create a random-word-subdomain.yourdomain.com

It might be worth trying one with a TLS cert and one without. I think one of the easiest ways to discover new domains is certificate transparency lists:
[certificate.transparency.dev...]
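That discovery path is easy to reproduce. A minimal Python sketch, assuming crt.sh's public JSON interface (the `?q=%25.domain&output=json` query form); the helper names here are my own, not any official API:

```python
# Sketch: how newly issued certs (and therefore new hostnames) can be pulled
# from Certificate Transparency data via crt.sh. Assumes crt.sh's JSON
# interface; function names are illustrative.
import json
import urllib.request


def ct_query_url(domain: str) -> str:
    """Build a crt.sh JSON query matching any cert issued under the domain."""
    return f"https://crt.sh/?q=%25.{domain}&output=json"


def hostnames_from_ct(entries: list) -> set:
    """Extract unique hostnames from crt.sh JSON entries; each entry's
    'name_value' field may hold several newline-separated names."""
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    return names


def fetch_ct_hostnames(domain: str) -> set:
    """Fetch and parse CT entries for a domain (live network call)."""
    with urllib.request.urlopen(ct_query_url(domain)) as resp:
        return hostnames_from_ct(json.load(resp))
```

Anyone polling a feed like this sees a new subdomain the moment its first cert hits the log.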

Kendo

12:11 am on Jul 29, 2024 (gmt 0)




> new domains transparency lists

This may be the case. For now I have registered a new domain name to see how that goes. The hard part was coming up with a name that was worth paying for, so I am creating a site for my partner's home gallery. Then later we can try a real SSL cert.

If it is the certificate transparency registry that miscreants are tapping into, one could be asking what use is that service?

Kendo

3:29 am on Jul 29, 2024 (gmt 0)




Without an SSL cert I already got 11 different IPs hitting the home page within 1 hour. No malicious requests yet. Will let this run as is for a while.

brotherhood of LAN

7:15 pm on Jul 29, 2024 (gmt 0)




> Without an SSL cert I already got 11 different IPs hitting the home page within 1 hour. No malicious requests yet. Will let this run as is for a while.

Interesting, did they have a host name in their requests, vs random IP traffic?

Can't think of another obvious way that people discover subdomains.

blend27

8:25 pm on Jul 29, 2024 (gmt 0)




-- Creating a site for alias.example.com went unnoticed. No bots after 3 days --

Less than an hour for me, with a free LetsEncrypt SSL cert. We haven't done non-SSL for a while now; the script installs an LE SSL cert when the subdomain is created.

As a matter of fact, we only use subdomains for internal apps (access allowed by IP only) and for a CDN for public assets.

Kendo

5:20 am on Jul 30, 2024 (gmt 0)




> did they have a host name in their requests
In this case several sites share the same IP address, so they are hitting the home page of that domain.

I redirect 404s to that logging page to trap the URL they requested. The first day it was still hits on the home page, but by the next day I was logging hits on all sorts of wild guesses like /admin, /app, /site.js, /wordpress/wp-includes, /xmlrpc.php
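Tallying those wild guesses from a plain access log is only a few lines of scripting. A Python sketch, assuming Combined Log Format lines (the regex and function name are mine, not from any particular tool):

```python
# Sketch: count which paths probing bots guess at, from ordinary access-log
# lines. Paths like /xmlrpc.php requested on a site that never had them are
# a reliable bot fingerprint. Assumes Combined/Common Log Format.
import re
from collections import Counter

LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def probe_paths(log_lines, statuses=("404",)):
    """Count requested paths whose response status marks them as misses."""
    hits = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and m.group("status") in statuses:
            hits[m.group("path")] += 1
    return hits
```

Feeding a day's log through `probe_paths` gives a ranked list of the guesses, which is essentially what the 404-trap page collects.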

I added some content with Under Construction - Opening Soon and a contact form to let me know how they found the site. Don't expect much from that :-)

Kendo

5:27 am on Jul 30, 2024 (gmt 0)




> I would have guessed more like 94.2%.
Mine would be a lot higher also, especially on well-established sites. But most of those are on a Windows server, which kills malformed requests before they can be logged. So my 50% of bad requests is 50% of what gets past that and is logged by AWStats.

Such a waste of everything, catering for this, and all the while we see lots of campaigning for privacy and VPNs so that users cannot be logged/tracked.

blend27

4:34 pm on Jul 31, 2024 (gmt 0)




I redirect 404s to that logging page to trap the URL they requested. The first day it was still hits on the home page, but by the next day I was logging hits on all sorts of wild guesses like /admin, /app, /site.js, /wordpress/wp-includes, /xmlrpc.php

I know what the code/URI paths are on the sites I work on; none of them include any of the paths mentioned above, or so many others being requested by bots...

So,

I personally ABORT all those "crap" requests at the IIS/firewall level, and I don't even see them in the logs. It is a big waste of time, unless one wants to learn for a minute about what happens... In reality, serving 404s, 404 pages, and 403s is a waste of CPU and electricity (climate change impacts): first processing those requests, then burning electricity (oil, or midnight oil if you will) to power up a screen and a computer to view the logs.

Waste time on it? Each one of us only has xx number of years to be on this planet, and time is better spent Gardening or taking a Walk in a park and having a Cold 1 after that...

Kendo

1:13 am on Aug 1, 2024 (gmt 0)




ABORT all requests on IIS/Firewall level to those "crap" requests

I am interested to know how you did this on IIS.

blend27

6:46 pm on Aug 2, 2024 (gmt 0)




I am interested to know how you did this on IIS.
The core IIS functionality is used here.

In the root, IIS has a file called web.config. This file can reference two other files (these are treated like includes, where one can do magic).

Here is an example of a web.config using references to rules and maps:
web.config
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rewriteMaps configSource="rewritemaps.config" />
      <rules configSource="rewriterules.config" />
    </rewrite>
    <security>
      <ipSecurity allowUnlisted="true" denyAction="AbortRequest">
        <!-- START Digital Ocean -->
        <add ipAddress="138.197.0.0" subnetMask="255.255.0.0" /> <!-- 138.197.0.0/16 -->
        <!-- END Digital Ocean -->
      </ipSecurity>
      <requestFiltering>
        <fileExtensions>
          <add fileExtension=".pl" allowed="false" />
          <add fileExtension=".php" allowed="false" />
          <add fileExtension=".asp" allowed="false" />
          <add fileExtension=".aspx" allowed="false" />
          <add fileExtension=".env" allowed="false" />
          <add fileExtension=".vscode" allowed="false" />
          <!-- ...etc... -->
        </fileExtensions>
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>


Notice the 2 sections here:
1. ipSecurity drops the entire load of requests from the 138.197.0.0/16 range; once implemented, you will not see anything at all in your logs coming from 138.197.0.0/16.
2. requestFiltering will prevent any files with extensions of .php, .env, etc. from being processed by your IIS.

Now let's look at some code from above:

<rewrite>
  <rewriteMaps configSource="rewritemaps.config" />
  <rules configSource="rewriterules.config" />
</rewrite>


You can also use rules to control/interrogate all conditions that are supplied in HTTP Headers and use <action type="AbortRequest" />.

For that, research the URL Rewrite 2 module for IIS.

It has many rules and works wonders: [learn.microsoft.com...] And there are many examples on the web of how to do what you want to accomplish...

<rules>
  <rule name="BlockUnwantedExtensions" stopProcessing="true">
    <match url=".*" />
    <conditions>
      <!-- default (regex) syntax; '|' alternation is not valid in Wildcard mode -->
      <add input="{URL}" pattern="\.(php|git)" />
    </conditions>
    <action type="AbortRequest" />
  </rule>
</rules>


Many, many rules can be configured to control the flow of the request, using either XML-based config or the IIS Manager MMC.

Kendo

12:25 am on Aug 11, 2024 (gmt 0)




The latest on my logging of hits to a new, unlinked domain is that there are not as many different users as first appeared. I am still getting more unique visitors, but what I now see is mostly revisits from users using TOR or a VPN to rotate their IP address... the same list of requests and the same user-agent, over and over again.
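That same-UA, same-request-list pattern can be pulled out of the logs mechanically. A Python sketch of the grouping, under the assumption that an identical user-agent plus an identical path sequence means one client behind rotating IPs (the names are mine):

```python
# Sketch: spotting one client rotating IPs via TOR/VPN. Heuristic assumption:
# requests sharing the same user-agent AND the same ordered list of paths are
# the same visitor, whatever the IP says.
from collections import defaultdict


def fingerprint_visitors(requests):
    """requests: iterable of (ip, user_agent, path) tuples, in log order.
    Returns {(user_agent, path_sequence): set_of_ips}, so one fingerprint
    seen from many IPs stands out."""
    paths_by_ip = defaultdict(list)
    ua_by_ip = {}
    for ip, ua, path in requests:
        paths_by_ip[ip].append(path)
        ua_by_ip[ip] = ua
    groups = defaultdict(set)
    for ip, paths in paths_by_ip.items():
        groups[(ua_by_ip[ip], tuple(paths))].add(ip)
    return groups
```

A fingerprint whose IP set keeps growing while the request list never changes is exactly the "same thing over and over from new addresses" behaviour described above.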

But the question remains: if they are getting the URL from newly registered domains, are they getting them from the registrars? I do know that domain squatters sign up as domain resellers, but I suspected that getting a list of expired domains to jump on was more of an inside job, coming from people working for the registrar.

brotherhood of LAN

2:50 am on Aug 11, 2024 (gmt 0)




>getting the url from newly registered domains

Zone files are fairly accessible for all gTLDs and many ccTLDs, and lots of people also try to keep lists for the expired-domain thing. Subdomains, though: registrars may potentially leak them, but I'd have leaned towards the TLS cert thing. The chosen DNS authority would be another avenue.

If you wanted to dig deeper, have one of your own servers act as the DNS authority for a newly created subdomain and see where/when the lookups start. Maybe do it a bunch of times with the same domain, then with another domain at a different registrar.
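For that authority-server variant of the experiment, the query log already holds the answer. A Python sketch assuming BIND-style query-log lines (the regex would need adjusting for other DNS servers; the names are mine):

```python
# Sketch: if your own server is the DNS authority for a canary subdomain,
# its query log shows who resolved the name first. Line format assumed here
# is the BIND-style "queries" log; adjust QUERY_RE for other servers.
import re

QUERY_RE = re.compile(
    r"client (?:@\S+ )?(?P<ip>[\d.a-fA-F:]+)#\d+.*query: (?P<name>\S+) IN"
)


def first_lookups(log_lines, canary: str):
    """Return the client IPs that queried the canary name, in first-seen order."""
    seen = []
    for line in log_lines:
        m = QUERY_RE.search(line)
        if (
            m
            and m.group("name").rstrip(".").lower() == canary.lower()
            and m.group("ip") not in seen
        ):
            seen.append(m.group("ip"))
    return seen
```

The first few IPs (and how soon after creation they appear) would tell you which discovery channel fired first.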

There are a bunch of "open DNS" type services that may open the can of worms and make a new subdomain common knowledge; it would be interesting to see how it all starts, though.

lucy24

4:10 pm on Aug 11, 2024 (gmt 0)




Zone files are fairly accessible for all gTLDs and many ccTLDs.

Isn't this one of the canonical differences between RIPE and ARIN? If you register a dot com or similar, you can expect to find all manner of robots--whether legitimate or il--swarming over it within days. But if you're in Europe, you have to tell them you exist.

thecoalman

11:05 am on Aug 13, 2024 (gmt 0)




I use Cloudflare; 99% of legitimate traffic for my site should be coming from the US. All requests from outside the US and CA get a JS challenge, and something like .02% are successful. This easily cuts out half of overall traffic, if not more, and most of the bot activity. The JS challenge should be fairly seamless for legitimate users. Cloudflare also has tools for automatically detecting and blocking illegitimate bots/AI bots.

Ports 80 and 443 are firewalled on the origin server except for Cloudflare IPs, which prevents bots from reaching those ports by scanning IP ranges. This is also a critical step to mitigate DDoS.
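The same allow-list can be double-checked in software at the origin. A Python sketch using the stdlib `ipaddress` module; Cloudflare publishes its egress CIDRs as plain text at cloudflare.com/ips-v4 and /ips-v6, which is what you would feed in as `ranges` (the function names are mine):

```python
# Sketch: allow-listing Cloudflare at the origin. Cloudflare publishes its
# egress ranges (plain text, one CIDR per line) at
# https://www.cloudflare.com/ips-v4 and /ips-v6. Names are illustrative.
import ipaddress
import urllib.request


def fetch_cloudflare_ranges() -> list:
    """Download the published IPv4 and IPv6 CIDR lists (live network call)."""
    ranges = []
    for url in ("https://www.cloudflare.com/ips-v4",
                "https://www.cloudflare.com/ips-v6"):
        with urllib.request.urlopen(url) as resp:
            ranges += resp.read().decode().split()
    return ranges


def is_cloudflare(ip: str, ranges: list) -> bool:
    """True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)
```

Anything hitting 80/443 that fails `is_cloudflare` is by definition bypassing the proxy, so logging or dropping it is safe.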

I check my stats occasionally and will block abusive networks, e.g. OVH. You only need the ASN to block an entire network with Cloudflare, but you need to be careful with this: DuckDuckGo uses AWS, which is another network where I have seen a lot of bot activity. I'm undecided what to do with TOR and VPNs, because I see an increasing amount of bot traffic from them but don't necessarily want to block legitimate traffic from them.

I have not deployed any rate limiting rules but that is next on the list.

blend27

4:12 pm on Aug 20, 2024 (gmt 0)




Ports 80 and 443 are firewalled on the origin server except for Cloudflare IPs, which prevents bots from reaching those ports by scanning IP ranges. This is also a critical step to mitigate DDoS.

But what/how much fun is in that?

thecoalman

9:26 pm on Aug 20, 2024 (gmt 0)




It sucks not having to see all those requests for /phpmyadmin coming through the IP. ;P