
Forum Moderators: phranque

Chinese "testing" link hacking attempt?

11:21 pm on Dec 7, 2017 (gmt 0)

New User

joined:Dec 7, 2017
posts: 2
votes: 0


Hi there. I use Clicky for real-time tracking of visitors. I haven't been promoting my website recently, so there are very few visitors. But I'm getting repeat visits from an ISP in China, and they use this link:

&wd=test

So there is my domain name, a slash, and then that string appended, like this:

https://www.example.com/&wd=test

Can anyone understand what they are doing?

[edited by: phranque at 1:56 am (utc) on Dec 8, 2017]
[edit reason] exemplified domain [/edit]
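For context on the string itself: wd is the query parameter Baidu's search uses for the search term, much like Google's q. A quick Python sketch, using the exemplified domain from above, showing that in these hits the &wd=test actually lands in the URL path rather than the query string, because there is no ? in front of it:

```python
from urllib.parse import urlsplit, parse_qs

# On Baidu, "wd" carries the search term:
baidu = urlsplit("http://www.baidu.com/s?wd=test")
print(parse_qs(baidu.query))  # {'wd': ['test']}

# In the URL hitting the site there is no "?", so "&wd=test"
# is part of the path, not a query string at all:
hit = urlsplit("https://www.example.com/&wd=test")
print(hit.path)   # /&wd=test
print(hit.query)  # (empty string)
```

So whatever requests this URL is blindly pasting a Baidu-style parameter onto a path, which is why it looks so odd.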

2:59 am on Dec 8, 2017 (gmt 0)

Full Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 263
votes: 20


Interesting discovery. I track Chinese bots, or at least I try to, and I have not noticed this before. These entries are from my November 2017 log.

Conclusion: they go after JavaScript and CSS and ignore all content. They issue only GETs and are not trying to break into anything. The activity is suspicious, no doubt, but it does not look malicious. I have banned a number of these IPs before (they show up as 403s), which means they have in the past scraped me, tested my security or posted spam. The documents they are reading contain English only, no Chinese. In short, I have no idea what they are doing, but it does not appear malicious. They are certainly bots, not humans.

Maybe someone else can tease out a meaning or intent?

The Baidu referral carries ?wd=RAA or some other 3-4 letter combination, which I also can't explain. It might be some encoding of a Chinese query?

14.215.176.12 CHINANET Guangdong
14.215.176.4 CHINANET Guangdong
111.206.36.17 China Unicom Beijing
180.97.35.36 Chinanet Jiangsu
123.125.143.151 China Unicom Beijing
115.239.212.197 CHINANET-Zhejiang Hangzhou

All are referred from Baidu. I tried to replicate a referral from Baidu but could not get a ?wd= result. All are going after my WordPress site. Three GETs look for a tag "developer.android.com"; I have many documents findable under that term, but not as a tag. Three GETs look for the drug indometacin, which I do not have. The last GET is for "visaforchina.org", which I do have in a couple of documents, but my server returned a 404. All but one request were rejected by my server.

All user agent names are "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0", which was used 155 times in my logs. It is pretty rare for a Chinese bot to have a proper bot UA.

Thanks for bringing this forward. I have an interest in the murky shadows of roaming Chinese bots.

14.215.176.12 [22/Nov/2017:06:31:26] "GET /wp/tag/developer-android-com/&wd=test HTTP/1.1" 404 16905 "http://www.baidu.com/s?wd=RAA" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
14.215.176.4 [23/Nov/2017:09:21:29] "GET /wp/tag/developer-android-com/&wd=test HTTP/1.1" 404 16905 "http://www.baidu.com/s?wd=EEH" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
111.206.36.17 [23/Nov/2017:20:32:25] "GET /wp/tag/developer-android-com/&wd=test HTTP/1.1" 403 13 "http://www.baidu.com/s?wd=GIS" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
180.97.35.36 [28/Nov/2017:23:14:10] "GET /wp/?p=4054&indometacin-over-the-counter&wd=test HTTP/1.1" 403 13 "http://www.baidu.com/s?wd=6CJ" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
123.125.143.151 [30/Nov/2017:01:21:34] "GET /wp/?p=4054&indometacin-over-the-counter&wd=test HTTP/1.1" 301 - "http://www.baidu.com/s?wd=JVDW" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
123.125.143.151 [30/Nov/2017:01:21:35] "GET /wp/2011/12/01/my-wordpress-blog-hijacked-the-pharma-hack/?indometacin-over-the-counter&wd=test HTTP/1.1" 200 45657 "http://www.baidu.com/s?wd=JVDW" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
115.239.212.197 [30/Nov/2017:13:44:13] "GET /wp/tag/visaforchina-org/&wd=test HTTP/1.1" 404 16917 "http://www.baidu.com/s?wd=W9G" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
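For anyone picking these entries apart: the fields in each row are the standard combined-log set (IP, timestamp, request line, status, bytes, referrer, user agent). A rough Python sketch for splitting such a row back into its fields, assuming Apache's stock combined LogFormat (the regex and the sample line are illustrative):

```python
import re

# Rough parser for an Apache "combined" log line.
LOG_RE = re.compile(
    r'(?P<ip>\S+) .*?\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)

line = ('14.215.176.12 - - [22/Nov/2017:06:31:26 +0000] '
        '"GET /wp/tag/developer-android-com/&wd=test HTTP/1.1" 404 16905 '
        '"http://www.baidu.com/s?wd=RAA" '
        '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"')

m = LOG_RE.match(line)
print(m.group('ip'), m.group('status'), m.group('referrer'))
# 14.215.176.12 404 http://www.baidu.com/s?wd=RAA
```

Once split this way, the status code and the protocol string stop running together and the Baidu referrer codes are easy to tabulate.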


Here are the first IP's log entries. They are going after JavaScript and CSS but no content. It is odd that they are specifically targeting http/1.1404, but why?

GET /wp/tag/developer-android-com/&wd=test HTTP/1.1404
GET /wp/wp-includes/js/wp-emoji-release.min.js?ver=4.9 HTTP/1.1200
GET /wp/wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.9 HTTP/1.1200
GET /wp/wp-includes/js/jquery/jquery.js?ver=1.12.4 HTTP/1.1200
GET /wp/wp-includes/css/dashicons.min.css?ver=4.9 HTTP/1.1200
GET /wp/wp-content/themes/ribosome-child/style.css?ver=1.0.0 HTTP/1.1200
GET /wp/wp-content/themes/ribosome/css/font-awesome-4.7.0/css/font-awesome.min.css?ver=4.9 HTTP/1.1200
GET /wp/wp-content/themes/ribosome/style.css?ver=4.9 HTTP/1.1200
GET /wp/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.4.1 HTTP/1.1200


Similarly, only JS and CSS. Here they target HTTP/1.1200, except for the first GET, which goes after HTTP/1.1301.

GET /wp/?p=4054&indometacin-over-the-counter&wd=test HTTP/1.1301
GET /wp/2011/12/01/my-wordpress-blog-hijacked-the-pharma-hack/?indometacin-over-the-counter&wd=test HTTP/1.1200
GET /wp/wp-content/themes/ribosome/style.css?ver=4.9.1 HTTP/1.1200
GET /wp/wp-content/themes/ribosome-child/style.css?ver=1.0.0 HTTP/1.1200
GET /wp/wp-includes/css/dashicons.min.css?ver=4.9.1 HTTP/1.1200
GET /wp/wp-includes/js/wp-emoji-release.min.js?ver=4.9.1 HTTP/1.1200
GET /wp/wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.9.1 HTTP/1.1200
GET /wp/wp-includes/js/wp-embed.min.js?ver=4.9.1 HTTP/1.1200
GET /wp/wp-content/themes/ribosome/css/font-awesome-4.7.0/css/font-awesome.min.css?ver=4.9.1 HTTP/1.1200
GET /wp/wp-content/plugins/yet-another-related-posts-plugin/style/related.css?ver=4.9.1 HTTP/1.1200
GET /wp/wp-content/themes/ribosome/js/ribosome-scripts-functions.js?ver=1.0.0 HTTP/1.1200
GET /wp/?live-comment-preview.js HTTP/1.1200
GET /wp/wp-content/plugins/akismet/_inc/form.js?ver=4.0.1 HTTP/1.1200
GET /wp/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.4.1 HTTP/1.1200
GET /wp/wp-content/themes/ribosome/js/navigation.js?ver=20140711 HTTP/1.1200
GET /wp/wp-includes/js/jquery/jquery.js?ver=1.12.4 HTTP/1.1200
GET /wp/wp-content/themes/ribosome/css/font-awesome-4.7.0/fonts/fontawesome-webfont.woff?v=4.7.0 HTTP/1.1200
3:24 am on Dec 8, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14599
votes: 595


Here they target HTTP/1.1200, except for the first GET which goes after HTTP/1.1301
Seems like “HTTP/1\.\d\d” in and of itself would be grounds for denial. I’ve never seen multiple decimal places in my life. (I’m on shared hosting. Is it the kind of thing they would block at the gate?) Are those even legitimate numbers, or is it another bizarre kind of typo? Looking it up, I find it only as a response header. Which, in turn, explains “1.1404” and “1.1200”.

GET /wp/2011/12/01/my-wordpress-blog-hijacked-the-pharma-hack/?indometacin-over-the-counter&wd=test
Says it all, doesn’t it.

They are going after javascript and css but no content.
In the case of WP, aren't these highly specialized script or style names an indicator of exactly which plugins, add-ons, themes, skins and assorted software variations you're using? It gives them information, but is liable to attract less attention than an up-front file request.
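That fingerprinting point can be made concrete. A small Python sketch (paths lifted from the log excerpts above) pulling plugin and theme names, plus version hints, out of the asset requests:

```python
import re

# WP asset URLs leak which plugins/themes run on a site, and often
# their versions, via the /wp-content/ path and the ?ver= parameter.
requests = [
    "/wp/wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.9",
    "/wp/wp-content/themes/ribosome-child/style.css?ver=1.0.0",
    "/wp/wp-content/plugins/akismet/_inc/form.js?ver=4.0.1",
]

FINGERPRINT = re.compile(r"/wp-content/(plugins|themes)/([^/]+)/.*\bver=([\w.]+)")
for path in requests:
    m = FINGERPRINT.search(path)
    if m:
        kind, name, ver = m.groups()
        print(f"{kind[:-1]}: {name} (version hint {ver})")
```

A crawler that only fetches JS and CSS can therefore still build a fairly exact software inventory of the site.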
6:35 am on Dec 8, 2017 (gmt 0)

New User

joined:Dec 7, 2017
posts: 2
votes: 0


Seems like we might have a couple of IPs in common, not sure. For me, both use the &wd=test link:

ip: 111.206.36.9 - Beijing, China - Organization= China Unicom Beijing -- Platform: Firefox 43.0 / Windows 7 / 1024x768

ip: 14.215.176.16 - Guangzhou, China - Organization=China Telecom Guangdong -- Platform: Firefox 43.0 / Windows 7 / 1024x768
12:48 pm on Dec 8, 2017 (gmt 0)

Full Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 263
votes: 20


"Seems like “HTTP/1\.\d\d” in and of itself would be grounds for denial."

This looks like a hook we could use. Thanks for that. I'll need to do more research and go back a couple of months to review their behaviour. A regular human read of any WordPress post will generate all these GETs as well as the content request, so there's no way for me to hide these JS and CSS GETs. For now I will monitor.
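If the hook is ever needed, one option is to refuse the probe at the server. A mod_rewrite sketch for .htaccess (untested here; the patterns assume wd=test rides in the path or the query string, as in the logs above):

```apache
# Return 403 for any request carrying wd=test in the path or query string
RewriteEngine On
RewriteCond %{REQUEST_URI} &wd=test [OR]
RewriteCond %{QUERY_STRING} (^|&)wd=test [NC]
RewriteRule ^ - [F]
```

Legitimate visitors never request URLs containing this string, so the rule should be safe to leave in place while monitoring.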

I have been trying to attract human Chinese visitors to my site for a number of years, so I have encouraged the Chinese search engines such as Baidu and Yisou to index it, and have been adding more Chinese language to my posts. As long as visitors from China (or anywhere else) are not actively malicious, I tend to leave them alone.

risusSardonicus+, are you using your raw access log to track these Chinese bots? The raw access log will give you much more info. After the initial "&wd=test" hit, these IP addresses show some interesting activity, but without the "&wd=test" string. It is this follow-up activity that I'd like to track.
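For that kind of raw-log tracking, a throwaway Python sketch along these lines works; the watchlist is just the set of IPs from this thread, and the sample lines are illustrative:

```python
# Pull follow-up activity (requests WITHOUT &wd=test) by the flagged IPs
# out of raw access-log lines.
WATCHLIST = {
    "14.215.176.12", "14.215.176.4", "14.215.176.16",
    "111.206.36.17", "111.206.36.9",
    "180.97.35.36", "123.125.143.151", "115.239.212.197",
}

def follow_up_hits(lines):
    """Yield entries from watchlisted IPs that do not carry &wd=test."""
    for line in lines:
        ip = line.split(" ", 1)[0]
        if ip in WATCHLIST and "&wd=test" not in line:
            yield line.rstrip("\n")

# Against a real log: for hit in follow_up_hits(open("access.log")): ...
sample = [
    '14.215.176.12 - - [22/Nov/2017] "GET /wp/tag/x/&wd=test HTTP/1.1" 404 16905',
    '14.215.176.12 - - [22/Nov/2017] "GET /wp/wp-includes/js/jquery/jquery.js HTTP/1.1" 200 97184',
]
print(list(follow_up_hits(sample)))
```

Run periodically, this separates the initial probe from whatever those same IPs do next.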
1:22 pm on Dec 8, 2017 (gmt 0)

Full Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 263
votes: 20


I did an IBM X-force lookup [exchange.xforce.ibmcloud.com] and found:

14.215.176.16 some reported bot activity, infected devices
111.206.36.9 some reported bot activity, infected devices
14.215.176.12 some reported bot activity, infected devices
14.215.176.4 some reported bot activity, infected devices
111.206.36.17 risk 4.3 some reported bot activity, infected devices
180.97.35.36 risk 7.1 scanning IPs for vulnerabilities
123.125.143.151 risk 4.3 some reported bot activity, infected devices, spam
115.239.212.197 risk 4.3 some reported bot activity, infected devices

Only one IP, 180.97.35.36, scans for vulnerabilities. The rest appear to be infected devices used in botnet attacks.
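For anyone who would rather just ban the lot, an .htaccess sketch in Apache 2.4 Require syntax (verify the list yourself before deploying; infected-device IPs get reassigned over time):

```apache
# Block the IPs reported above (Apache 2.4+ mod_authz_core syntax)
<RequireAll>
    Require all granted
    Require not ip 14.215.176.12 14.215.176.4 14.215.176.16
    Require not ip 111.206.36.17 111.206.36.9
    Require not ip 180.97.35.36 123.125.143.151 115.239.212.197
</RequireAll>
```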
3:07 pm on Dec 16, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 17, 2015
posts:111
votes: 32


I'm about to ban China; is this serious? I keep banning Chinese IPs, but the referrer now continually appears to be www.baidu.com, so I'm guessing my site must be in their index. Perhaps a site removal request; is that possible? Then again, referrers can be faked, so it may not really be Baidu.
4:03 pm on Dec 16, 2017 (gmt 0)

Full Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 263
votes: 20


Go on baidu.com and search for your site. Since China has banned Google search, Baidu is the largest search engine there. Chinese spammers are difficult to eradicate, and state-sponsored entities hide in public IP ranges to do their work.
6:04 pm on Feb 28, 2018 (gmt 0)

New User from US 

joined:Feb 28, 2018
posts: 3
votes: 0


My sites don't show 404 errors. People get redirected to the home page. And I get an email telling me what they were looking for and who referred them. Then I try to fix it.

What I did for these idiots is create a directory called
/&wd=test

and in it placed a redirect that sends them to
www.someplaceelse.com/<Their.IP.ADDRESS>

In classic ASP the redirect looks like this:

<% Response.Redirect "http://www.someplaceelse.com/IPaddress:" & Request.ServerVariables("REMOTE_ADDR") %>


The someplace else could be anything. Like the FBI, CIA, whatever you think would be funny.

So I am sending them all off to get 404 errors on that website, with their own IP address shown.

Good idea? Funny?
9:10 pm on Feb 28, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14599
votes: 595


My sites don't show 404 errors. People get redirected to the home page.

:: sitting on hands ::

The someplace else could be anything. Like the FBI, CIA, whatever you think would be funny.

Do not do this.
9:26 pm on Feb 28, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11111
votes: 660


@LeisureMan - a 404 is informative. It tells humans & robots the page does not exist as requested. This is a helpful thing, for your visitors & for Search Engines and other resources that may benefit your interests.

Redirecting these requests to your homepage is not only unhelpful, it is dishonest. It may also get your homepage devalued: since the originally requested document may have a lesser rank, that rank will get passed on to the target page.

As lucy24 says... do not do this.
9:46 pm on Feb 28, 2018 (gmt 0)

New User from US 

joined:Feb 28, 2018
posts:3
votes: 0


More than once I have had a newspaper or magazine write about a site I run, and in the article they included a non-working URL. If the domain was wrong there is nothing you can do, but most times it is something like
visit www.website.com/information

when what they wanted to say was
visit www.website.com/information.asp

Instead of all the people reading the article landing on a 404 Error, I would create a directory named /information and in it place a redirect to the correct page.

Seeing 100 404 errors in this case is not informative. I want the 100 people who read the article to actually find the correct page, and as the newspaper cannot go back and fix the link, this is the only choice.

===

In the case above, sending Chinese robots off into oblivion is not hurting my site as the page does not exist and they are not real users.
11:15 pm on Feb 28, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14599
votes: 595


I would create a directory named /information and in it place a redirect to the correct page

What's the purpose of the directory? You can just redirect in the first place, from /information to /information.asp. A 404 doesn't enter into this scenario at all. When you know someone has posted a bad link, and it isn't in your power to change it, a 301 at your end is the appropriate solution.
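One way to do that 301 in Apache .htaccess is a single mod_alias line (a sketch; RedirectMatch with an anchored pattern rather than plain Redirect, whose prefix matching would also catch /information.asp and loop):

```apache
# 301 the printed-but-wrong URL to the real page; the anchored regex
# keeps /information.asp itself from matching and redirecting again
RedirectMatch 301 ^/information$ /information.asp
```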

If it makes you happy to redirect unwanted visitors--and, indeed, some robots do go away faster this way--do it without involving a third party. You could for example redirect to 127.0.0.1 or to their originating IP. In the unlikely event that they actually follow up on the redirect, this may even result in their getting kicked off a server that otherwise doesn't care what its users do. Very serious caution: Do not do this if you have reason to believe the requests are fake (that is, nobody at aa.bb.cc.dd has really asked for your page). This is fairly uncommon, though.
12:38 am on Mar 1, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11111
votes: 660


You could for example redirect to 127.0.0.1 or to their originating IP.
Bad webmastering & unethical IMO.

In the unlikely event that they actually follow up on the redirect, this may even result in their getting kicked off a server that otherwise doesn't care what its users do. Very serious caution: Do not do this
So why mention this in a public forum? You know someone will try this stupid trick.
4:51 am on Mar 1, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14599
votes: 595


Heck, I've done the 127.0.0.1 redirect myself. Never the originating IP version, granted.

I do also think highly of the manual 404, in situations like a RewriteRule where any response is equally easy to code. Most of the time it's technically true--as when they're asking for wp-admin files--but it conveys absolutely no information, not even the minimal “Hah, we’re onto you” of a 403.

What was this thread about again?