
Forum Moderators: Ocean10000 & incrediBILL


Semalt

Referrer spamming gone mad.

   
8:24 pm on Feb 3, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Seems like another one of those pesky SEO/SEM firms from the mother country.

User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36
REF: http:// semalt .com/crawler.php?u=domain.com

Several domains hit, mostly from LACNIC ranges. Greece and Italy IPs there too.

Methinks it's a bot; I've (almost) never had visitors or competitors from Colombia or Peru.

No, Alex, not good....
10:02 pm on Mar 2, 2014 (gmt 0)



Yes, when they find you they come in droves, and from places all over (such as Brazil) that never send any real traffic.
Referers are of the type
"http://semalt.com/crawler.php?u=http://www.example.com"
- no proof that they always identify themselves, but at least the many hosts that use this referer are easily blackholed.
1:12 am on Mar 3, 2014 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



"they come in droves, and from places all over"

They've been getting turned away in droves; however, I had one get through recently from a Comcast Philly IP, which I assumed was a compromised machine.
10:17 pm on Mar 13, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Huh. Came by to ask about them, since I've been much vexed in recent days by Brazilian humanoids giving semalt+mysite as referer.

Could it be some type of preview? All requests are complete, including the with-javascript version of piwik. The latest attempt was locked out because they had the wrong version of the site name in the referer slot. (I didn't think to include an opening anchor.) With a robot that would have been the last of it. But here it led to paired requests for

shared stylesheet
error stylesheet
piwik js (from the 403 page)
piwik php

Why paired? Because the original request was effectively an auto-referer and got blocked at its originally requested (wrong) hostname. So each request for a supporting file came in to the wrong hostname and was duly redirected.

So either a botnet or a preview.

(Aside: Since the vast majority of robots don't ask for non-page files, I find it more efficient on the whole not to block non-page requests other than general IP blocks. Botnets are a different story.)

Wonder if I've ever had a legitimate human visitor from Brazil? The rest of LACNIC occasionally takes a look at Perez the Mouse, but that wouldn't apply to Brazil. Hm. Maybe I should just start blocking them as the occasion warrants, same as I do with botnets that happen to come from eastern Europe.
10:21 pm on Mar 13, 2014 (gmt 0)

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



OK, double checked my feeble memory with someone else.

This sounds like a site that might be listed on the "Sites" list in your AdSense stats if you run AdSense.

I looked at a site for someone today and the .com version was listed there.

Could that be through the preview thing Lucy24 mentioned?

[added] OK, checked my own AdSense "Sites" list and sure enough, there it is.
11:05 pm on Mar 13, 2014 (gmt 0)



[blog.semalt.com...] :)
7:54 am on Mar 14, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




First I've seen them. Blocked by putting my domain name in their UA (I do allow a couple exceptions) and of course by the obvious buzz term.

112.202.157.196 - - [13/Mar/2014:23:14:22 -0700] "GET / HTTP/1.1" 403 879 "http://semalt.com/crawler.php?u=http://my-domain.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
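
For anyone who wants to pull these out of raw logs, here is a minimal Python sketch. It assumes the Apache combined log format shown in the line above. The regex and the names `LOG_LINE`, `parse_line` and `has_semalt_referer` are my own inventions for illustration, not part of any library.

```python
import re

# Assumed layout: Apache "combined" log format, as in the sample line above.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

def has_semalt_referer(line):
    """True if the referer field (not the UA or the request) mentions semalt."""
    fields = parse_line(line)
    return bool(fields) and "semalt" in fields["referer"].lower()

sample = ('112.202.157.196 - - [13/Mar/2014:23:14:22 -0700] "GET / HTTP/1.1" 403 879 '
          '"http://semalt.com/crawler.php?u=http://my-domain.com" '
          '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
          'Chrome/32.0.1700.107 Safari/537.36"')

print(has_semalt_referer(sample))
```

Checking the referer field specifically, rather than grepping the whole line, avoids counting requests where the word only turns up in the request string.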
9:09 am on Mar 14, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Blocked by putting my domain name in their UA

When your fingers typed "UA" did your brain mean "referer"? Every time I think I've found another useful auto-referer block, I remember that search engines also include my sitename in the referer string :(

Annoying that the Bad Word "crawler" is also in the referer, instead of in the UA where it would do some good.
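
The anchoring problem can be sketched in Python rather than htaccess, purely as an illustration. The hostnames in `MY_HOSTS` and the function names are hypothetical. The idea is to compare the referer's own hostname with the site's, instead of looking for the site name anywhere in the string.

```python
from urllib.parse import urlsplit

# Hypothetical hostnames for illustration; substitute your own.
MY_HOSTS = {"example.com", "www.example.com"}

def referer_host(referer):
    """Hostname part of a referer URL, lowercased, or '' if there is none."""
    return (urlsplit(referer).hostname or "").lower()

def is_auto_referer(referer):
    """True only when the referer's own host is one of ours (the anchored test)."""
    return referer_host(referer) in MY_HOSTS

def mentions_site(referer):
    """The unanchored test: our name appears anywhere in the referer string."""
    return any(host in referer.lower() for host in MY_HOSTS)

semalt = "http://semalt.com/crawler.php?u=http://example.com"
print(mentions_site(semalt), is_auto_referer(semalt))
```

With the semalt referer the unanchored test matches and the anchored one does not. The same would hold for any referer that merely carries the site name in its query string.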
9:22 am on Mar 14, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





When your fingers typed "UA" did your brain mean "referer"?



Why yes, yes it did :)
9:53 pm on Mar 14, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Awright, that does it. Three more requests-- on my personal site, whose front page rarely gets three humans in a single day. (I am not a front-driven site at the best of times. On this one, humans go straight for the /games/ directory.)

SetEnvIf Referer semalt keep_out


I don't normally use this form-- in fact it looks as if I've never used "SetEnvIf Referer" before and had to go check the wording-- but it's that or add three separate RewriteRules in three separate htaccess files.

Hmph.
1:38 am on Mar 15, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I already had the rewrite block for "example" accompanied by two different allow lists, one with ^ and one without.
8:42 pm on Mar 19, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Follow-up:

I think it's a botnet. I searched raw logs for the two affected sites. Sample:

189.47.122.156 - - [11/Mar/2014:12:11:29 -0700] "GET / HTTP/1.1" 200 2558 "http://semalt.com/crawler.php?u=http://example.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36" 
187.127.119.103 - - [11/Mar/2014:18:01:30 -0700] "GET / HTTP/1.1" 200 2558 "http://semalt.com/crawler.php?u=http://example.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
177.39.167.155 - - [12/Mar/2014:10:03:02 -0700] "GET / HTTP/1.1" 200 2558 "http://semalt.com/crawler.php?u=http://example.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
186.244.221.156 - - [12/Mar/2014:12:53:59 -0700] "GET / HTTP/1.1" 200 2558 "http://semalt.com/crawler.php?u=http://example.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
200.203.64.88 - - [12/Mar/2014:16:43:48 -0700] "GET / HTTP/1.1" 200 2558 "http://semalt.com/crawler.php?u=http://example.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"

et cetera. 29 front-page requests in all, beginning abruptly on 11 March on both sites. Notice the unifying theme? Every single request had the identical UA:

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36


The UA itself-- separate search-- first showed up in the first half of February. I guess that's when Chrome 32 was released; someone will know.

Why do I think a botnet? Because what I pasted above is only what I get in a referer search. If I do an IP search there are five times as many hits, because it looks like this:

177.158.151.67 - - [12/Mar/2014:16:03:28 -0700] "GET / HTTP/1.1" 403 1642 "http://semalt.com/crawler.php?u=http://example.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36" 
177.158.151.67 - - [12/Mar/2014:16:03:31 -0700] "GET /sharedstyles.css HTTP/1.1" 301 588 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
177.158.151.67 - - [12/Mar/2014:16:03:31 -0700] "GET /boilerplate/errorstyles.css HTTP/1.1" 301 608 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
177.158.151.67 - - [12/Mar/2014:16:03:32 -0700] "GET /sharedstyles.css HTTP/1.1" 200 4842 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
177.158.151.67 - - [12/Mar/2014:16:03:32 -0700] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 1790 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
177.158.151.67 - - [12/Mar/2014:16:03:32 -0700] "GET /piwik/piwik.js HTTP/1.1" 301 586 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
177.158.151.67 - - [12/Mar/2014:16:03:32 -0700] "GET /piwik/piwik.js HTTP/1.1" 200 22980 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
177.158.151.67 - - [12/Mar/2014:16:03:33 -0700] "GET /piwik/piwik.php?action_name=The 403 Page&idsite=1&rec=1&r=511966&h=20&m=3&s=12&url=http://example.com/&urlref=http://semalt.com/crawler.php?u=http://example.com&_id=a73e83474fb77af2&_idts=1394665392&_idvc=1&_idn=1&_refts=1394665392&_viewts=1394665392&_ref=http://semalt.com/crawler.php?u=http://example.com&cookie=1&res=1366x768 HTTP/1.1" 301 1182 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"
177.158.151.67 - - [12/Mar/2014:16:03:33 -0700] "GET /piwik/piwik.php?action_name=The 403 Page&idsite=1&rec=1&r=511966&h=20&m=3&s=12&url=http://example.com/&urlref=http://semalt.com/crawler.php?u=http://example.com&_id=a73e83474fb77af2&_idts=1394665392&_idvc=1&_idn=1&_refts=1394665392&_viewts=1394665392&_ref=http://semalt.com/crawler.php?u=http://example.com&cookie=1&res=1366x768 HTTP/1.1" 200 302 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"

(decoded for readability) Most robots ask only for html, don't always follow redirects, and almost never execute javascript. This is human behavior.

See the "res=" part? That's the only thing that changes. Assorted monitor sizes, all in the middling desktop range. Botnet with fake UA, or a vulnerability in one version of Chrome.

Oh, yes, the IPs. I found isolated specimens from the US, Malaysia and Indonesia, and a handful from other South American countries, but the overwhelming majority are Brazil as noted earlier.
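
The referer-search versus IP-search comparison can be roughed out in Python. This is only a sketch, assuming the combined log format of the samples above. The function name `summarize` and the regexes are mine, and the `res=` lookup assumes piwik reports screen size the way it does in the requests pasted earlier.

```python
import re
from collections import defaultdict

# Assumed: Apache combined log format, as in the samples above.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def summarize(lines):
    """Compare a referer search with an IP search over the same log lines."""
    by_ip = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            by_ip[m["ip"]].append(m.groupdict())
    # IPs that sent at least one request with semalt in the referer field
    flagged = {ip for ip, hits in by_ip.items()
               if any("semalt" in h["referer"].lower() for h in hits)}
    referer_hits = sum(1 for ip in flagged for h in by_ip[ip]
                       if "semalt" in h["referer"].lower())
    ip_hits = sum(len(by_ip[ip]) for ip in flagged)
    user_agents = {h["ua"] for ip in flagged for h in by_ip[ip]}
    # Screen sizes reported in piwik's res= parameter, where present
    resolutions = set()
    for ip in flagged:
        for h in by_ip[ip]:
            resolutions.update(re.findall(r'[?&]res=(\d+x\d+)', h["request"]))
    return {"referer_hits": referer_hits, "ip_hits": ip_hits,
            "user_agents": user_agents, "resolutions": resolutions}
```

Feed it the lines of a raw log, for example `summarize(open("access.log"))` with whatever your log file is actually called. A single entry in `user_agents` alongside several entries in `resolutions` would match the pattern described above.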
8:58 pm on Mar 19, 2014 (gmt 0)



Semalt stuffed up my nice stats as well...
Block them out by simply adding this to your .htaccess file in your root folder:
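RewriteEngine on
# Match any referer containing semalt.com, case-insensitively
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
# Answer matching requests with 403 Forbidden
RewriteRule .* - [F]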
11:29 pm on Mar 19, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



They're blocked. Unfortunately this leads to more requests, because...

A non-blocked request will be redirected from example.com to the preferred form www.example.com. All supporting files will then be requested from www.

A blocked request is blocked at its originally requested URL. All supporting files are therefore requested from without-www, leading to each separate one being redirected. These requests of course give the 403 page, not semalt, as referer, so they can't be blocked. Well, short of blocking the entire nation of Brazil, which seems overkill. There are humans in Brazil, aren't there?

It now occurs to me that the one thing they don't request is the favicon (which typically has no referer at all). Huh.
12:10 am on Mar 20, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





There are humans in Brazil, aren't there?

Not during Carnival
3:43 am on Mar 20, 2014 (gmt 0)



Lucy24, I am sure you only need to give your stats some time to flush Semalt out... Once the code above is added to the .htaccess file, it completely blocks Semalt.

keyplyr! And we were not invited! ha ha
9:14 am on Mar 20, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



it completely blocks Semalt.

It would block them if they were a robot. But they appear to be running on infected human machines. That means they request all supporting files, not just the html. My error document happens to call on two stylesheets (yes, this is excessive, but I hate having to say the same thing twice-- and I especially hate having to change it twice when I redesign) and also analytics. All of this is intended to detect humans who got locked out by mistake. Botnets are, I guess, collateral damage.

At least the 403 page doesn't have pictures ;)

Edit: I went the SetEnvIf route because this way I can put it in my shared htaccess, protecting all five sites. RewriteRules are site-specific-- so, again, I'd have to give the same rule at least two or three times, depending on how many sites I want to cover.
11:59 am on Mar 20, 2014 (gmt 0)



This is not a human visit... It's a spider bot of some sort.
6:40 pm on Mar 20, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



:: sigh ::

What we have here is a failure to communicate.
9:17 pm on Mar 20, 2014 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Lucy - not necessarily a botnet. Think panscient and other distributed bots: they all run from (mostly) uninfected machines. Well, uninfected apart from the idiot bots themselves.
3:41 am on Mar 21, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



uninfected apart from the idiot bots themselves

This may be an academic distinction :)

:: idly wondering how many people would unwittingly sign up for a Distributed Robots venture if it were presented in just the right way ::
9:26 pm on Mar 21, 2014 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Majestic MJ12? It used to be popular; maybe still is.
2:10 am on Mar 22, 2014 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



idly wondering how many people would unwittingly sign up for a Distributed Robots venture


FunWebProducts ;)