Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
New stupid referer spammer
Referer spam via agent string (mistyped)
yaimapitu



 
Msg#: 4626980 posted 10:40 pm on Aug 9, 2013 (gmt 0)

In my website logs there are a few entries of the following kind:

60.190.129.52 - - [08/Aug/2013:06:30:08 -0400] "GET /http:/example.com/" 404 1398 "http://example.com/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"
221.7.11.69 - - [08/Aug/2013:06:30:12 -0400] "GET http://example.com/http:/example.com/ HTTP/1.0" 410 - "http://example.com/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"
host-43-254.adc.net.ar - - [08/Aug/2013:06:30:23 -0400] "GET /http:/example.com/" 404 1398 "http://example.com/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"
kaproxy04.answers.sp1.yahoo.com - - [08/Aug/2013:06:30:34 -0400] "GET /http:/example.com/ HTTP/1.0" 410 - "http://example.com/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"
177.11.136.39 - - [08/Aug/2013:06:30:51 -0400] "GET /http:/example.com/" 404 1398 "http://example.com/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"
116.112.66.102 - - [08/Aug/2013:06:30:52 -0400] "GET /http:/example.com/" 404 1398 "http://example.com/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"
163.23.70.129 - - [08/Aug/2013:06:30:56 -0400] "GET /http:/example.com/ HTTP/1.0" 410 - "http://example.com/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"
211.142.236.135 - - [08/Aug/2013:06:31:18 -0400] "GET /http:/example.com/ HTTP/1.0" 410 - "http://example.com/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"


and


77.94.48.5.satgate.net - - [09/Aug/2013:08:14:52 +0800] "GET /how2get.html HTTP/1.0" 200 4567 "http://www.meshielving.biz/" "Opera/9.80 <a href=\"http://www.meshielving.biz/\">meshielving</a> (Windows NT 5.1; U; en) Presto/2.10.229 Version/11.60"
77.94.48.5.satgate.net - - [09/Aug/2013:08:14:54 +0800] "GET / HTTP/1.0" 200 4740 "http://www.meshielving.biz/" "Opera/9.80 <a href=\"http://www.meshielving.biz/\">meshielving</a> (Windows NT 5.1; U; en) Presto/2.10.229 Version/11.60"

173.232.7.104 - - [10/Aug/2013:01:41:56 +0800] "GET / HTTP/1.0" 410 - "http://markets.financialcontent.com/gatehouse.rrstar/news/read/24660120/godofseo.co_provides_new_tips_to_increase_conversion_rates" "Mozilla/5.0 search marketing (<a href=\"http://markets.financialcontent.com/gatehouse.rrstar/news/read/24660120/godofseo.co_provides_new_tips_to_increase_conversion_rates\">your input here</a>) (Windows NT 5.1; U; en) Presto/2.10.229 Version/11.60"
173.232.7.104 - - [10/Aug/2013:01:42:06 +0800] "GET / HTTP/1.0" 410 - "http://markets.financialcontent.com/gatehouse.rrstar/news/read/24660120/godofseo.co_provides_new_tips_to_increase_conversion_rates" "Mozilla/5.0 search marketing (<a href=\"http://markets.financialcontent.com/gatehouse.rrstar/news/read/24660120/godofseo.co_provides_new_tips_to_increase_conversion_rates\">your input here</a>) (Windows NT 5.1; U; en) Presto/2.10.229 Version/11.60"


Note the mistyped "/http:/" instead of "http://"

In the first block, half of the hosts are in China and the others are in Taiwan, Argentina and Brazil, so this is clearly a distributed approach...

EDIT: sorry, the "code" formatting doesn't seem to work
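
A minimal sketch of how entries like the first block could be pulled out of an access log, assuming the Apache "combined" log format shown above. The names `LOG_RE`, `is_mistyped_self_url` and `hosts_by_agent` are made up for illustration, and the protocol field is treated as optional because some of the quoted entries omit it.

```python
import re
from collections import defaultdict

# Apache "combined" format. The protocol is optional because several of the
# quoted requests have none. Escaped quotes (\") inside fields are tolerated.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+)(?: (?P<proto>HTTP/[\d.]+))?" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"$'
)

def is_mistyped_self_url(target: str) -> bool:
    """True when the target embeds a scheme followed by a single slash,
    e.g. /http:/example.com/ rather than http://example.com/."""
    return re.search(r"/https?:/(?!/)", target) is not None

def hosts_by_agent(lines):
    """Group the requesting hosts of mistyped requests by user-agent string."""
    hits = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and is_mistyped_self_url(m.group("target")):
            hits[m.group("agent")].add(m.group("host"))
    return hits
```

Many distinct hosts under one identical agent string is the pattern described above; the grouping only makes that visible and does not prove the hosts are coordinated.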

 

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4626980 posted 1:40 am on Aug 10, 2013 (gmt 0)

I'm a little worried about those 410 responses. They imply that www.example.com/http://example.com used to exist and you've now removed it.

Most of the posted examples are simply, well, stupid robots. Serving a 403 may be more emotionally satisfying than a 404, but the end result is the same. With some robots a 404 may even be better: "Nothing there" vs. "Hm, what don't they want me to see?"

I doubt you'll be excluding any honest humans if you blacklist the UA element "<a href".

The ones from China would never get in my front door, but those are individual preferences.
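
As a rough illustration of that blacklist idea, the check below flags any user-agent string that contains an HTML anchor tag. The function name and the pattern are assumptions made for this sketch, not a rule taken from anyone's actual configuration.

```python
import re

# Assumption: no legitimate browser places an HTML anchor tag in its UA string.
UA_MARKUP_RE = re.compile(r"<\s*a\s+href", re.IGNORECASE)

def ua_contains_link(user_agent: str) -> bool:
    """Return True when the user-agent string carries an '<a href' fragment."""
    return UA_MARKUP_RE.search(user_agent) is not None
```

The pattern matches whether or not the quotes after `href=` are backslash-escaped, so it can be run against raw log text as well as live request headers.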

yaimapitu



 
Msg#: 4626980 posted 12:16 am on Aug 11, 2013 (gmt 0)

Thanks for the comments!

The 410 responses in this case only mean that I keep the server load to a minimum by not serving any data (the 410 page is empty). I don't use the 410 page for anything else but blocking known unsavory visitors (no humans ever see a 410; they get 403 or 404, in each case with an explanation as to why they can't get the page they requested and what they can do to get at the info they may be looking for).
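
For illustration only, the same "empty 410 for known bad visitors" idea can be sketched as a tiny WSGI application. This is not the setup described above (which is presumably done in the web server configuration); `BLOCKED_UA_FRAGMENTS` and `app` are hypothetical names.

```python
# Hypothetical blocklist of lowercase substrings that mark an unwanted user agent.
BLOCKED_UA_FRAGMENTS = ("<a href",)

def app(environ, start_response):
    """Answer blocked agents with a bodiless 410; serve everyone else normally."""
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    if any(fragment in ua for fragment in BLOCKED_UA_FRAGMENTS):
        # No body at all, so the cost per blocked request stays minimal.
        start_response("410 Gone", [("Content-Length", "0")])
        return [b""]
    body = b"Hello, visitor.\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```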

"I doubt you'll be excluding any honest humans if you blacklist the UA element <a href"

Right... I have added it to my list of blocked UAs...

"The ones from China would never get in my front door, but those are individual preferences."

Understand. Where possible I keep CN out, but the domain in question carries contents in Chinese and aims at viewers in east Asia... :)

yaimapitu



 
Msg#: 4626980 posted 2:46 am on Aug 14, 2013 (gmt 0)

On another site where I had not yet added the block for a\ href in the UA string, the following spam caused a 500 error (note: to render the spam URL shown here - four instances of it - inoperable, I have replaced "/" with "|" in all instances). All sites will get the new spam block today. ;)

219.159.198.nn - - [14/Aug/2013:00:15:12 +0800] "GET / HTTP/1.0" 200 4695 "http://www.example.com" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"
219.159.198.nn - - [14/Aug/2013:00:15:14 +0800] "GET /how2get.html HTTP/1.0" 500 - "http:||www.bangmod-tutor.com|" "Mozilla/5.0 <a href=\"http:||www.bangmod-tutor.com|\">\xe0\xb8\xaa\xe0\xb8\xad\xe0\xb8\x99\xe0\xb8\x9e\xe0\xb8\xb4\xe0\xb9\x80\xe0\xb8\xa8\xe0\xb8\xa9\xe0\xb8\x97\xe0\xb8\xb5\xe0\xb9\x88\xe0\xb8\x9a\xe0\xb9\x89\xe0\xb8\xb2\xe0\xb8\x99</a> (Windows NT 5.1; U; en) Presto/2.10.229 Version/11.60"
219.159.198.nn - - [14/Aug/2013:00:15:16 +0800] "GET / HTTP/1.0" 500 - "http:||www.bangmod-tutor.com|" "Mozilla/5.0 <a href=\"http:||www.bangmod-tutor.com|\">\xe0\xb8\xaa\xe0\xb8\xad\xe0\xb8\x99\xe0\xb8\x9e\xe0\xb8\xb4\xe0\xb9\x80\xe0\xb8\xa8\xe0\xb8\xa9\xe0\xb8\x97\xe0\xb8\xb5\xe0\xb9\x88\xe0\xb8\x9a\xe0\xb9\x89\xe0\xb8\xb2\xe0\xb8\x99</a> (Windows NT 5.1; U; en) Presto/2.10.229 Version/11.60"


The text associated with the link is in Thai and means something like "teaching at home" (the name in the spam URL, "bangmod-tutor", suggests a connection to Bangkok as well). The IP address is in China (I look after some pages where access from China cannot be blocked wholesale).
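
The `\xe0\xb8...` runs in those log entries are UTF-8 bytes that the server wrote out as escapes. A small sketch of turning them back into readable text follows; `decode_log_escapes` is a made-up helper name, and it assumes the escapes really are UTF-8.

```python
import re

def decode_log_escapes(text: str) -> str:
    # Replace each literal backslash-x-HH escape with the byte it stands for,
    # then interpret the resulting byte string as UTF-8.
    raw = re.sub(
        rb"\\x([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("utf-8"),
    )
    return raw.decode("utf-8", errors="replace")

# Anchor text copied from the log entries above (raw strings keep the
# backslashes literal, exactly as they appear in the log file).
escaped = (
    r"\xe0\xb8\xaa\xe0\xb8\xad\xe0\xb8\x99\xe0\xb8\x9e\xe0\xb8\xb4"
    r"\xe0\xb9\x80\xe0\xb8\xa8\xe0\xb8\xa9\xe0\xb8\x97\xe0\xb8\xb5"
    r"\xe0\xb9\x88\xe0\xb8\x9a\xe0\xb9\x89\xe0\xb8\xb2\xe0\xb8\x99"
)
print(decode_log_escapes(escaped))
```

Each three-byte group decodes to one character in the Thai Unicode block, 15 code points in total.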

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4626980 posted 7:50 am on Aug 14, 2013 (gmt 0)

Where did the 500 error come from? If my own error logs are typical, there will probably be more information than with a 403. Where a 403 just says "client denied by server configuration"* the 500 will give a nice history of just what happened and who tried to do what. On shared hosting this obviously depends on settings you can't control. But they'll include things like mod_security activity that you would otherwise not even know about.


* That is literally all it EVER says. But at least all 403s and 404s are listed; apparently at lower logging levels they might not be.

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4626980 posted 12:20 am on Dec 1, 2013 (gmt 0)

:: bump ::

I ran smack into this thread while looking up a UA. So although it's no longer an analytics question-- in fact it probably belonged in SSID all along-- I'll continue here.

Nichrome/self/19

Does anyone know what this is in real life? Like the OP, I found it attached to an unambiguous robot. Matter of fact looks like the identical UA string, down to the last sub-digit. The quoted bit comes at the end, where you expect to find add-ons.

dstiles

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4626980 posted 7:50 pm on Dec 1, 2013 (gmt 0)

The protocol version was HTTP/1.0, which is often used by robots.

I have a block on all HTTP/1.0 EXCEPT for known (and acceptable) proxies, which often run in this antiquated mode.
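
A rule of that shape could be sketched as follows. The allowlist is entirely hypothetical (192.0.2.0/24 is a documentation-only range), and `should_block` is an illustrative name, not anyone's real configuration.

```python
import ipaddress

# Hypothetical allowlist of acceptable proxies that still speak HTTP/1.0.
# 192.0.2.0/24 is reserved for documentation; substitute real proxy ranges.
ALLOWED_HTTP10_PROXIES = [ipaddress.ip_network("192.0.2.0/24")]

def should_block(protocol: str, remote_addr: str,
                 allowed=ALLOWED_HTTP10_PROXIES) -> bool:
    """Block HTTP/1.0 requests unless they come from an allowlisted proxy."""
    if protocol.upper() != "HTTP/1.0":
        return False
    addr = ipaddress.ip_address(remote_addr)
    return not any(addr in net for net in allowed)
```

For example, `should_block("HTTP/1.0", "173.232.7.104")` is true with the placeholder allowlist, while the same address over HTTP/1.1 is let through.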

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4626980 posted 11:08 pm on Dec 1, 2013 (gmt 0)

:: detour to check ::

The one I posted about yesterday was /1.0. But I found a few from last spring that were /1.1. Those were part of what I call the index.php botnet-- one of those infuriating robots that you can only identify after-the-fact by behavior.

Those look like this:
212.107.116.232 - - [22/Mar/2013:05:16:20 -0700] "GET /fonts/index.php HTTP/1.1" 403 2333 "http://www.example.com/index.php" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5 Nichrome/self/19"

(The IP claims to belong to Saudi Arabia. Shrug.)
The unchanging UA is warning enough. What human would be on the identical version of Chrome for eight months, when they do a full-number upgrade every other week? I can't help suspecting a "Download this toolbar, get a trojan free" type of thing.
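
The "stuck on an old Chrome" observation can be turned into a rough heuristic. In this sketch the caller supplies the current major version, and the `max_lag` threshold of 5 is an arbitrary assumption; both function names are made up for illustration.

```python
import re

CHROME_RE = re.compile(r"\bChrome/(\d+)\.")

def chrome_major(user_agent: str):
    """Return the Chrome major version claimed in the UA, or None if absent."""
    m = CHROME_RE.search(user_agent)
    return int(m.group(1)) if m else None

def looks_stale(user_agent: str, current_major: int, max_lag: int = 5) -> bool:
    """True when the claimed Chrome version trails current_major by more than max_lag."""
    major = chrome_major(user_agent)
    return major is not None and current_major - major > max_lag
```

A stale version alone is only a hint, since some real users do run old browsers; it is more telling when combined with an otherwise identical UA string seen across many hosts.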

jmccormac

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



 
Msg#: 4626980 posted 2:58 pm on Dec 8, 2013 (gmt 0)

Looks like a badly programmed Chinese referrer spammer "SEO" thing. It was fairly active on one of my sites and was using Chinese/Taiwanese ISP ranges.

Regards...jmcc


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved