Forum Moderators: open

CloudFlare Releases AI Crawl Control


Brett_Tabke

3:17 pm on Aug 28, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Their block-the-AI-bots feature is out of beta, and all sites can now start charging AI companies for their content.

[searchengineworld.com...]


They also published an updated Cloudflare robots.txt [searchengineworld.com] standard.
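Cloudflare generates its managed robots.txt for you, but a hand-rolled equivalent that disallows the best-known self-identifying AI crawlers looks roughly like this (the user-agent tokens below are the ones these vendors publish; the list is illustrative, not complete):

```
# Disallow well-known AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everyone else crawls normally
User-agent: *
Disallow:
```

Of course, robots.txt only restrains bots that choose to honor it; the rest of this thread is about the ones that don't.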

londrum

4:25 pm on Aug 28, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can't see this working because no AI company is going to pay for content unless you're a huge site that has the clout to sue them.

What Cloudflare needs to do is lump tens of thousands or hundreds of thousands of websites into a 'collective' (or whatever they want to call it), and then negotiate a single price with the AI companies for access to the whole lot.

If enough websites sign up, the AI companies might actually pay. Cloudflare can then share the money out based on traffic.
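The revenue-sharing part of that idea is just pro-rata arithmetic. A minimal sketch, assuming the collective meters some traffic figure per site (the function name and the choice of metric are mine, not anything Cloudflare has announced):

```python
def share_payout(total_fee: float, traffic: dict[str, int]) -> dict[str, float]:
    """Split one negotiated AI-access fee across member sites,
    pro rata by whatever traffic metric the collective agrees to
    meter (pageviews, crawler requests, bytes served...)."""
    total = sum(traffic.values())
    return {site: total_fee * hits / total for site, hits in traffic.items()}

# Hypothetical: a $100,000 fee split across three member sites.
payout = share_payout(100_000.0, {"siteA": 700, "siteB": 200, "siteC": 100})
# siteA carries 70% of the metered traffic, so it gets 70% of the fee.
```

The hard part, as the post says, is not the arithmetic but getting enough sites enrolled that the single negotiated price is worth an AI company's while.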

Kendo

9:14 pm on Aug 30, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Now how original is that?

I knew they would be the next to try.

Kendo

9:36 pm on Aug 30, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What would be appreciated is someone from Cloudflare paying for the time that I spent researching and developing that solution.

tangor

4:37 am on Aug 31, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another protection racket that can fleece bucks down the road.

However, at this point, this is after the fact and merely protects the bones from being picked over.

Sad thing is that copyright laws exist almost everywhere, but it falls upon the INDIVIDUAL creator to enforce them, and nobody's pockets are that deep, nor do we have the TIME to protect what has already been stolen.

WORSE, nation/states are all in on AI and the small fry are the collateral damage in these grandiose plans.

What is needed is a plug-in widget that does what CF promises and is FREE (as if that will happen anytime soon).

Left with:

Grin and bear it
or
Grin and bare it

Either way, what's yours is theirs and there's not a lot that can be done except DENY DENY DENY if you can IDENTIFY.

Kendo

3:01 am on Sep 1, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If anyone has been following the progress of my research here, they will know it is not easy to filter bots. Sure, Cloudflare talks about bot filtering, but they cannot tell the difference between a bot and a web surfer when a popular web browser is being used for scraping. In fact, they can only distinguish a few popular web browsers because their fingerprinting is limited. Yes, there was mention that while Brave is not so well known, it does pass their criteria, but that was because Brave had to change many features to satisfy Cloudflare's demands.

As previously discussed, the only sure way of preventing bots while allowing innocent web surfers is web page encryption plus a special web browser that can decrypt and display those pages. That is the best solution, but it is also a limitation.

For more information on how bot protection can be implemented including paywall management, see:

[artistscope.com]
[safeguard.media]

DEMO: Both links provide the means to test-drive any website.

Note: CMS plugins are available so WordPress and Moodle websites can use that service.

Brett_Tabke

11:37 am on Sep 1, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Who remembers Incredibill [webmasterworld.com]? And his "CrawlWall". That was over a decade ago.

SumGuy

1:32 am on Sep 9, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



I don't get it. I see all the standard AI bots very easily; they all have well-defined UAs and seem to use them consistently.

I know there are bots or scrapers (whether they're "AI" or not, I can't say) that don't explicitly identify themselves by UA: they use incredibly static or stale UAs, or rotate through a bunch of easily fingerprintable ones. I have many UA rules, and I serve a static page when one is detected. A very small number of these are real people using VPNs; many are bots using the same VPNs, and many of those VPNs are residential.

And if you IP-block the usual suspects (the data centers), you've just eliminated a huge chunk of rogue bots, AI or not.
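The UA-rule approach described above can be sketched in a few lines. This is an assumption-laden illustration, not SumGuy's actual ruleset: the bot tokens are the vendors' published ones, but the "stale Chrome" cutoff and the function names are mine.

```python
import re

# UA fragments published by self-declared AI crawlers; illustrative, not complete.
AI_BOT_PATTERNS = [re.compile(p, re.I) for p in (
    r"GPTBot", r"ClaudeBot", r"CCBot", r"Bytespider", r"PerplexityBot",
)]

# Scrapers often freeze on an old browser build; flag long-outdated Chrome majors.
STALE_CHROME = re.compile(r"Chrome/(\d+)\.")

def classify(user_agent: str) -> str:
    """Return 'ai-bot', 'stale', or 'ok'. The caller can serve a static
    placeholder page to anything that isn't 'ok'."""
    if any(p.search(user_agent) for p in AI_BOT_PATTERNS):
        return "ai-bot"
    m = STALE_CHROME.search(user_agent)
    if m and int(m.group(1)) < 100:   # the cutoff version is a judgment call
        return "stale"
    return "ok"
```

In practice this sits in front of the page handler: 'ok' requests get the real page, everything else gets the static one, and the pattern list is edited as new crawlers show up in the logs.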

Kendo

11:16 pm on Sep 9, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Logging the referrer can tell you more. I have an include that appears on all pages of one site, and I can change the criteria by simply editing that include. The catch of the day is currently "referrer"; before that it might have been "known search engines".

It makes interesting reading, especially when seeing hits for a large collection of pages with referrals coming from Kentucky Fried Chicken, IBM, Zoom, TripAdvisor, etc.

Soon AI will be the new sauce to have with chicken wings.
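A site-wide include like the one described is usually PHP or SSI; the same idea in Python, as a helper every page handler calls. The watch-list domains and names here are placeholders standing in for whatever the "catch of the day" is:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("referrer")

# The "catch of the day": edit this one set to change what gets flagged,
# the same way a single shared include changes behavior on every page.
WATCHED = {"kfc.com", "ibm.com", "zoom.us", "tripadvisor.com"}

def log_referrer(environ: dict) -> bool:
    """Log the Referer header for a request (WSGI-style environ dict)
    and return True when the referring domain is on the watch list."""
    ref = environ.get("HTTP_REFERER", "-")
    flagged = any(domain in ref for domain in WATCHED)
    log.info("%s referer=%s%s", environ.get("PATH_INFO", "/"),
             ref, " [WATCH]" if flagged else "")
    return flagged
```

Swapping the criterion from "referrer" back to "known search engines" is then a one-line change to the shared watch list, with no per-page edits.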

Brett_Tabke

11:16 am on Sep 10, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> I don't get it. I see all the standard AI bots very easily

This is Deepseek and possibly kimi:

146.59.44.6 vps-dd89a20d.vps.ovh.net Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36

190-48-84-126.speedy.com.ar Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36

That is one of 1,000+ IPs from that group that have hit here already this morning.

The worst I have seen is a group from Brazil (.br) that is using a huge number (10,000+) of IPs to download stuff.
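Hostnames like vps-dd89a20d.vps.ovh.net and 190-48-84-126.speedy.com.ar in the log lines above come from reverse-DNS (PTR) lookups, which is how an anonymous Chrome UA gets attributed to a hosting provider versus a residential ISP. A minimal sketch; the suffix list is illustrative, and the function names are mine:

```python
import socket

# Hosting-provider PTR suffixes worth blocking on sight; extend from your logs.
DATACENTER_SUFFIXES = (".ovh.net", ".amazonaws.com", ".googleusercontent.com")

def classify_host(hostname: str) -> str:
    """Tag a PTR hostname as 'datacenter' or 'other' (residential/unknown)."""
    return "datacenter" if hostname.endswith(DATACENTER_SUFFIXES) else "other"

def classify_ip(ip: str) -> str:
    """Reverse-resolve an IP and classify it; no PTR record counts as 'other'."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return "other"
    return classify_host(hostname)
```

Note that 'other' is not an all-clear: as the speedy.com.ar hits show, plenty of bot traffic now arrives through residential IPs that reverse-resolve like ordinary home connections.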

I keep tagging links with specific UTMs and page watermarks, and wondering where they will show up.

[searchengineworld.com...]