Akamai Technologies, Inc. (NASDAQ: AKAM), the cloud company that powers and protects life online, today released a new State of the Internet (SOTI) report that details the security and business threats that organizations face with the proliferation of web scraping bots. Scraping Away Your Bottom Line: How Web Scrapers Impact Ecommerce finds that bots compose 42% of overall web traffic, and 65% of these bots are malicious.
With its reliance on revenue-generating web applications, the ecommerce sector has been most affected by high-risk bot traffic. Although some bots are beneficial to business, web scraper bots are being used for competitive intelligence and espionage, inventory hoarding, imposter site creation, and other schemes that have a negative impact on both the bottom line and the customer experience. There are no existing laws that prohibit the use of scraper bots, and they are hard to detect due to the rise of artificial intelligence (AI) botnets, but there are some things companies can do to mitigate them.
I redirect 404s to that logging page to trap the URL they requested. On the first day the hits were still on the home page, but by the next day I was logging hits on all sorts of wild guesses, like /admin, /app, /site.js, /wordpess/wp-includes, /xmlrpc.php.
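For anyone who wants to summarize what such a 404 trap collects, here is a minimal sketch in Python. It assumes the logging page stores the requested URLs as plain strings; the function name `tally_probe_paths` and the sample list are hypothetical and only illustrate the idea of counting guessed paths.

```python
from collections import Counter
from urllib.parse import urlsplit


def tally_probe_paths(requested_urls):
    """Count how often each path was requested, ignoring query strings."""
    counts = Counter()
    for url in requested_urls:
        # Keep only the path part; treat an empty path as the home page.
        path = urlsplit(url).path or "/"
        counts[path.lower()] += 1
    return counts


# Hypothetical sample of URLs captured by the 404 logging page.
sample = [
    "/admin",
    "/app",
    "/site.js",
    "/app",
    "/xmlrpc.php?rsd",
    "/xmlrpc.php",
]

for path, hits in tally_probe_paths(sample).most_common():
    print(f"{hits:3d}  {path}")
```

Sorting by hit count makes it easier to see which guesses are repeated often enough to be worth blocking outright.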
I am interested to know how you did this on IIS.
So the core functionality in IIS has been used:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rewriteMaps configSource="rewritemaps.config"></rewriteMaps>
      <rules configSource="rewriterules.config"></rules>
    </rewrite>
    <security>
      <ipSecurity allowUnlisted="true" denyAction="AbortRequest">
        <!-- START Digital Ocean -->
        <add ipAddress="138.197.0.0" subnetMask="255.255.0.0" /> <!-- 138.197.0.0/16 -->
        <!-- END Digital Ocean -->
      </ipSecurity>
      <requestFiltering>
        <fileExtensions>
          <add fileExtension=".pl" allowed="false" />
          <add fileExtension=".php" allowed="false" />
          <add fileExtension=".asp" allowed="false" />
          <add fileExtension=".aspx" allowed="false" />
          <add fileExtension=".env" allowed="false" />
          <add fileExtension=".vscode" allowed="false" />
          <!-- .....etc..... -->
        </fileExtensions>
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>

<rewrite>
  <rewriteMaps configSource="rewritemaps.config"></rewriteMaps>
  <rules configSource="rewriterules.config"></rules>
</rewrite>

<rules>
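  <rule name="BlockUnwantedExtentions" stopProcessing="true">
    <match url=".*" />
    <conditions>
      <!-- Default ECMAScript syntax: "|" is alternation here, but a literal character in Wildcard syntax -->
      <add input="{URL}" pattern="\.(php|git)" />
    </conditions>
    <!-- The action belongs to the rule, not inside <conditions> -->
    <action type="AbortRequest" />
  </rule>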
</rules>

Zone files are fairly accessible for all gTLDs and many ccTLDs.
Isn't this one of the canonical differences between RIPE and ARIN? If you register a dot com or similar, you can expect to find all manner of robots--whether legitimate or il--swarming over it within days. But if you’re in Europe, you have to tell them you exist.
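Going back to the ipSecurity entry in the web.config earlier in the thread: it pairs ipAddress 138.197.0.0 with subnetMask 255.255.0.0, and the comment gives that as 138.197.0.0/16. A quick way to sanity-check a mask before adding it to the block list is Python's standard `ipaddress` module. This is only a sketch; the helper name `is_blocked` and the test addresses are made up for illustration.

```python
import ipaddress

# Address and mask copied from the ipSecurity entry in the web.config.
BLOCKED = ipaddress.ip_network("138.197.0.0/255.255.0.0")


def is_blocked(client_ip):
    """Return True if client_ip falls inside the blocked range."""
    return ipaddress.ip_address(client_ip) in BLOCKED


print(BLOCKED)                      # network in CIDR notation
print(BLOCKED.num_addresses)        # how many addresses the entry covers
print(is_blocked("138.197.12.34"))  # made-up address inside the range
print(is_blocked("138.198.0.1"))    # made-up address outside the range
```

Checking the range this way helps avoid blocking a much larger block than intended when the mask is typed by hand.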