

Chinese Scraper 3B143: How to ban?

scraper,china,3B143

     
1:31 pm on Mar 31, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 81
votes: 4


I'm getting a large, organized scraping attack from China. Can anyone suggest how to block it? Every request comes from a different IP: no repeats at the aa.bb.cc level and few at the aa.bb level, so I cannot block by IP. I've narrowed it down to East China, so only half of the 1.4B are suspects!

The UA is the same for all requests, and it seems very common. The "3B143" part looked unique, but a Google search for it returns lots of hits.

UA: Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1

Networks and hostnames seen:
CNCGROUP-HE
CHINANET-HE
CHINANET-TJ
dns184.online.tj.cn
China Unicom Jiangsu
CHINANET-ZJ Hangzhou
CHINANET TIANJIN

Thanks for any help.
7:59 pm on Mar 31, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13677
votes: 440


It's Not Nice, but there is always
SetEnvIf Accept-Language ^zh badlang
with optional extra
SetEnvIf Accept-Language ^zh-(tw|TW) !badlang
... and take it from there.
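
One way to "take it from there" is to refuse any request that still carries the `badlang` variable. The sketch below assumes Apache 2.4 with mod_setenvif and mod_authz_core, placed in a .htaccess file or a <Directory> section; the thread does not say which server version is in use, so the `<RequireAll>` part is an assumption rather than something the poster specified.

```apache
# Flag requests whose Accept-Language starts with "zh",
# then un-flag Traditional Chinese as used in Taiwan.
SetEnvIf Accept-Language ^zh badlang
SetEnvIf Accept-Language ^zh-(tw|TW) !badlang

# Apache 2.4 syntax (assumed): allow everyone except flagged requests.
<RequireAll>
    Require all granted
    Require not env badlang
</RequireAll>
```

Note that this turns away every visitor whose browser lists Chinese first, so it only suits a site with no Chinese-language audience.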
8:22 pm on Mar 31, 2017 (gmt 0)

Junior Member from CA 



Not workable, as I have both English and Chinese-language content and am trying, difficult as it is, to get my content indexed on Baidu, Sogou, and Yisou.
9:22 pm on Mar 31, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Oh dear. Scratch that then. (That's assuming they send an Accept-Language header at all. Do they?)
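
A standard access log does not record Accept-Language, so answering that question may need a custom log format. The following is a rough sketch, assuming Apache with mod_log_config and access to the server or virtual-host configuration (these directives are not allowed in .htaccess). The format nickname `combined_lang` and the log path are placeholders, not anything from the thread.

```apache
# The usual "combined" fields plus the Accept-Language request header at the end.
# "combined_lang" and the file path are hypothetical names chosen for illustration.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Accept-Language}i\"" combined_lang
CustomLog "logs/access_lang.log" combined_lang
```

A request that sends no Accept-Language header shows up with "-" in the last field.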

Do you get any human visitors with the identical UA? If not, you could ban it (using the whole string).

:: detour to raw logs ::

Looks like they peaked in 2015, but I find it from non-Chinese humans as recently as January of this year. (The significance of "non-Chinese" here is that you can't postulate an older or pirated OS being used in select regions long after its official sell-by date.) But then, beginning pretty abruptly last week, I do start seeing robots with that exact UA. Possibly also a few Chinese humans (with image requests it's hard to tell), but not in significant numbers.

So if you feel confident that your human users are

:: pause for wild excitement as I catch sight of red-clawed crab that I haven't set eyes on in a week so I was starting to think it had died ::

Ahem. If you feel confident that none of your human users are using a somewhat outdated UA--with iOS you've really got no excuse not to upgrade--then you could block it.

:: further check for "13B143" element alone ::

You're right, that's all you need, and it sure makes it easier to construct the rule. Very common in 2015, rare in 2016, rarer in January 2017, just robots since then.
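
A rule keyed on that one element could look something like this. It is only a sketch, assuming mod_rewrite is enabled and the lines go in .htaccess or a <Directory> section; the thread does not show the rule that was finally used.

```apache
# Assumes mod_rewrite is available; skip this line if the engine is already on.
RewriteEngine On
# Match any User-Agent that contains the build token "13B143".
RewriteCond %{HTTP_USER_AGENT} 13B143
# Leave the URL unchanged ("-") and answer 403 Forbidden.
RewriteRule ^ - [F]
```

Put it ahead of any other rewrite rules so the blocked requests never reach them.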

:: final check of headers ::

Last two months, nothing but robots--including an occasional botnet whose distinguishing features are an iOS UA, a request for the front page (which nobody ever does in real life), and a Piwik cookie sent even though they have never been here before.
4:28 pm on Apr 2, 2017 (gmt 0)

Junior Member from CA 



Thanks for checking, Lucy. I'll use the "13B143". Your database of log activity and experience is very helpful.

I, too, often wonder if a fish has died when I don't see it for a couple of days! Good for the red-clawed crab!
 
