another one for the profilers

lucy24

12:16 am on Dec 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



File under: Once is happenstance, twice is coincidence, three times is a botnet.

Consider this log excerpt:

5.228.70.abc - - [04/Dec/2014:08:40:04 -0800] "GET /ebooks/aelfric/aelfric_full.html HTTP/1.1" 200 427034 "http://yandex.ru/yandsearch?text=searige&lr=213" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; MAARJS)" 
{supporting files snipped}
128.72.134.abc - - [04/Dec/2014:08:40:06 -0800] "GET /ebooks/horn/KingHorn_KH.html HTTP/1.1" 200 119187 "http://yandex.ru/yandsearch?text=toryues+boston&lr=213" "Mozilla/5.0 (Windows NT 5.1; rv:26.0) Gecko/20100101 Firefox/26.0"
{supporting files snipped}
95.220.135.abc - - [04/Dec/2014:08:40:06 -0800] "GET /ebooks/paston/paston5.html HTTP/1.1" 200 289460 "http://yandex.ru/yandsearch?text=maknon+judith&lr=213" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
{supporting files snipped}

Each individual request is utterly plausible: some human at a Russian IP searches Yandex for a string in Roman script (occasionally including thorn or even yogh, though not in today's example) and gets the page with all supporting files, including analytics.

But, but, but...
#1 Requests always come in sets of 2 or 3, within one or two seconds of each other, from the same search engine. ("lr=213" means Moscow area. Someone in these forums once pointed me to a page that lists all the "lr" values Yandex uses.) Requests are so close together that they're tangled up in logs. On my site, and particularly for these pages, that kind of clustering does not naturally occur. Trust me on this.
#2 Requests are always for ebooks in some form of early English (I've got a clutch of them, spanning the range from OE to barely-Early-Modern).
#3 Some requests are from currently or previously blocked IP ranges-- not server farms but assorted infection-prone machines. As far as I can tell they're all in Russia; don't know if they're really all in Moscow.

It's been going on sporadically for a couple of months. The pattern is so weird that I noticed it right away, but I remain stumped.

Thanks to the unusual content, I have no idea what the equivalent pattern would look like on anyone else's site. About all you can search for is multiple occurrences of /yandsearch with matching hour-and-minute timestamps.

Angonasec

5:20 pm on Dec 5, 2014 (gmt 0)



Glad I deny all 5. and all Yandex intrusions I come across.

Good sleep, nyet headaches.

aristotle

6:42 pm on Dec 5, 2014 (gmt 0)

Those are pretty big files that are being downloaded. If it were me, I would probably block this thing in case it builds up to something bigger. Well I already block .ru country code anyway (surprised that you don't, since it keeps out a lot of referral spam).

lucy24

8:46 pm on Dec 5, 2014 (gmt 0)

Those are pretty big files that are being downloaded.

Yes, that's a characteristic of all the ebooks in this group. On "normal" pages the total download may be close to the same aggregate size-- but not in the HTML alone. In fact I long ago started blocking auto-referers for 5-10 specific, named files for that very reason. (Within htaccess it can only be done on a page-for-page basis, not universally.)
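A page-for-page auto-referer block of the kind described might look like this -- a sketch only, with an illustrative filename rather than one of the actual pages:

```apache
# Refuse requests for one named large page whose claimed referer is the
# page itself (the "auto-referer" signature typical of download bots).
# The filename here is illustrative.
RewriteCond %{HTTP_REFERER} /ebooks/example/example_full\.html$ [NC]
RewriteRule ^ebooks/example/example_full\.html$ - [F]
```

Each protected file needs its own Cond/Rule pair, which is why this only scales to a handful of named pages.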

I already block .ru country code

If I did it by name I'd have to go into Lookups mode, ugh. I do block .ua/.ru referers with selected exemptions. But it seems unjust to lock out a law-abiding human just because some other idiot at the same ISP got download-happy with the toolbars (or whatever it is that you do to get infected).

I've occasionally wondered if there is something robots find attractive about extra-large html files. Is there something particular that they're more likely to find?

aristotle

10:59 pm on Dec 5, 2014 (gmt 0)

Well, I block country codes the same way I block certain referers:
# BLOCK COUNTRY DOMAINS
RewriteCond %{HTTP_REFERER} \.(ru|su|ua|cn|md|kz|pl|lv|ro)(/|$) [NC,OR]
RewriteCond %{HTTP_REFERER} \.(by|bg|hr|cz|al|rs|kp|hu|pw|uz|jp)(/|$) [NC]
RewriteRule (^|\.html|/)$ - [F]

# BLOCK REFERERS
RewriteCond %{HTTP_REFERER} (formatn|kochanelli|chimiver|poker|thepostemail) [NC,OR]
RewriteCond %{HTTP_REFERER} (sugarkun|trustcombat|escort|letseks|tipkiller) [NC,OR]
RewriteCond %{HTTP_REFERER} (semalt|buttons|prostitutki) [NC]
RewriteRule (^|\.html|/)$ - [F]

Seems to work

aristotle

12:47 am on Dec 6, 2014 (gmt 0)

Oops, I just noticed that my code wouldn't work for your cases, where the referer is yandex.ru/... instead of just yandex.ru

But it does work for most referer spam from those country domains.

lucy24

2:36 am on Dec 6, 2014 (gmt 0)

my code wouldn't work for your cases, where the referer is yandex.ru/... instead of just yandex.ru

Why wouldn't it? That's the whole point of the
(/|$)

locution. Matter of fact, I wouldn't be surprised if you got those lines from me ;) In my case I've explicitly poked a hole for yandex (also google.ru and a couple of others) in the area that blocks country-based referers. Yandex is a perfectly legitimate search engine. It just happens to be what robots cite in faked referers, in the same way that a US-based robot might claim to come from google.com.
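One way to poke such a hole is to put negated exemption Conditions ahead of the country pattern, so the exempted engines fall through to the rest of the ruleset -- a minimal sketch, with an abbreviated country list:

```apache
# Exempted referers bypass the block (all negated conditions must hold);
# anything else with a matching two-letter country referer is forbidden.
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yandex\.ru/ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?google\.ru/ [NC]
RewriteCond %{HTTP_REFERER} \.(ru|su|ua)(/|$) [NC]
RewriteRule (^|\.html|/)$ - [F]
```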

Well i block country codes the same way I block certain referers:

Hate to break it to you, but the quoted lines have no effect whatsoever on visitors who happen to come from .ua, .su or what-have-you. It only works if their referer is from one of the offending regions.
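The distinction in practice: a referer Condition tests a request header the client chooses to send (and a bot can fake), while an origin block tests the connecting IP itself. Side by side, with an illustrative address range:

```apache
# Referer test -- fires only when the request CLAIMS to come from a .ru page
RewriteCond %{HTTP_REFERER} \.ru(/|$) [NC]
RewriteRule ^ - [F]

# Origin test -- fires when the visitor actually connects from this range,
# whatever the referer says (range is illustrative, not a vetted list)
Deny from 5.228.70.0/24
```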

wilderness

6:57 am on Dec 6, 2014 (gmt 0)

It's easier to deny ALL the two-letter domain extensions, and just ALLOW the few you choose ;)

RewriteCond %{HTTP_REFERER} ^http://[a-z]{2}\. [OR]
RewriteCond %{HTTP_REFERER} ^http://.*\.[a-z]{2}/

Then make exceptions!
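Expressed in mod_rewrite terms, "deny all two-letter extensions, then make exceptions" might look like this sketch, where the exempted codes are placeholders to be replaced with your own choices:

```apache
# Exempted country codes pass through; any other two-letter
# (sub)domain or extension in the referer is forbidden.
RewriteCond %{HTTP_REFERER} !\.(uk|de|ca)(/|$) [NC]
RewriteCond %{HTTP_REFERER} ^http://[a-z]{2}\. [OR]
RewriteCond %{HTTP_REFERER} ^http://.*\.[a-z]{2}/
RewriteRule ^ - [F]
```

(The un-flagged first Condition is ANDed with the OR-group that follows it.)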

lucy24

8:16 am on Dec 6, 2014 (gmt 0)

RewriteCond %{HTTP_REFERER} ^http://[a-z]{2}\.

? This would seem to exclude only two-letter (sub)domains, and then only when using the http protocol. Why bother with the
http://
part at all? You could simply say
\.[a-z][a-z](/|$)

(I tend to think that when it's down to {2} it is less work for the server to simply repeat [a-z][a-z].) The chances of locking out a legitimate human referer on the pattern of
http://www.example.com/subdir.wx/pagename.html

are pretty remote.
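Assembled into a complete rule, that shorter pattern would read -- a sketch, protocol-agnostic by design:

```apache
# Forbid any referer whose hostname ends in a two-letter extension,
# matched at a dot boundary followed by a slash or end-of-string.
RewriteCond %{HTTP_REFERER} \.[a-z][a-z](/|$)
RewriteRule ^ - [F]
```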

aristotle

11:05 am on Dec 6, 2014 (gmt 0)

Lucy, I very well could have gotten that code, or an earlier version of it, from you. I know I've gotten some other valuable code from you, and I very much appreciate it.

Hate to break it to you, but the quoted lines have no effect whatsoever on visitors who happen to come from .ua, .su or what-have-you. It only works if their referer is from one of the offending regions.


Yes, I knew what it does. It blocks referral spam, not human visitors. That's what I meant to say.

aristotle

11:28 am on Dec 6, 2014 (gmt 0)

wilderness: It's easier to deny ALL the two-letter domain extensions, and just ALLOW the few you choose ;)


Yes, that's probably an easier and better way to do it. What I have now was built up over time as I gradually added new things to the lists, though I do try to keep them short by only blocking things that become a big nuisance.

wilderness

3:03 pm on Dec 6, 2014 (gmt 0)

This would seem to exclude only two-letter (sub)domains


lucy,
That syntax came from Jim more than a decade ago.
Back then, all websites used to get refers from both sub & domain extensions (free hosting was fairly common).

things that become a big nuisance


aristotle,
In 'the wilderness', 99.99% are a nuisance ;)

lucy24

8:52 pm on Dec 6, 2014 (gmt 0)

Back then, all websites used to get refers from both sub & domain extensions (free hosting was fairly common).

Huh. I think today I could count those on the fingers of one hand. And one of them is, I think, "i9" meaning that the pattern should really be \w\w rather than [a-z][a-z].

I know what you mean, though. My last free site was in the form "example.hostname.net", where "example" is the same name I later used for my first domain, because back then the dragons were not as active as today.

But none of this addresses the original question of what the ### is up with those Russian botnets? It's as if the virus periodically sends out a directive that says "At 12:23:34, pick one page-and-referer pair randomly from this list, and pay a visit." And then the occasional time variation is because the infected computer's own clock is off by a second or two. Sometimes the logs themselves are slightly out of whack, suggesting that the same botnet is hitting other sites on the same server at the same time. (But requesting what pages, with what putative query? Who knows...)

keyplyr

9:52 pm on Dec 6, 2014 (gmt 0)

RE: blocking .ua, .ru or Yandex

I get a pretty good stream of legitimate traffic from .ua and .ru, and triple-digit daily referrals from Yandex, all versions... but like everything else, it depends on the type of site you have.

I do get a little referrer spam from an SEO guy promoting Russian sites. The pattern is redundant and easily blocked, but I just ignore it.

wilderness

5:16 am on Dec 7, 2014 (gmt 0)

But none of this addresses the original question of what the ### is up with those Russian botnets?


lucy,
The only .ru visits that I'm getting are refer log spams, and they do not contain search strings.

They almost always request the same 'directory' and file.

I'm so concerned about that I'm not even willing to spend the time copying and pasting the Dec 2014 log lines and then sorting on the number of requests (all 403'd) and the number of different IPs.

lucy24

6:00 am on Dec 7, 2014 (gmt 0)

refer log spams

In another thread I learned that referer spam is often identifiable by a particular header configuration, which you can then block using either mod_rewrite or mod_setenvif. I use the latter because I've got five domains sharing an umbrella htaccess.
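With mod_setenvif, the referer (or any other header) test sets an environment flag that one shared access rule can act on -- convenient under an umbrella htaccess. A sketch in Apache 2.2-style directives, with patterns borrowed from the lists earlier in the thread:

```apache
# Flag requests whose Referer matches known spam sources
SetEnvIfNoCase Referer (semalt|buttons-for-website) spam_ref

# One access rule covers every domain sharing this htaccess
Order Allow,Deny
Allow from all
Deny from env=spam_ref
```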

Normally I don't even look at 403s, so I've no idea what I've been locking out over the last two years. But I think next month I'll do another round of At Home with the Robots (I skipped last year) and then I'll see what they've been up to.

trintragula

2:48 pm on Dec 8, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month




Glad I deny all 5. and all Yandex intrusions I come across.

Good sleep, nyet headaches.


Hang on a sec, that's dashed unsporting old boy! I've a squadron of chaps from blighty here who regularly fly 5./8s. Sky Broadband is their handle. About a dozen of them on my forum. Top chaps all of them. I think there are a few others here who fly 5./8s for Queen and Country.

wilderness

4:22 pm on Dec 8, 2014 (gmt 0)

Hang on a sec, that's dashed unsporting old boy!


I've had 5's & 8's denied for more than a decade and my widgets are 'sporting' ;)

trintragula

4:53 pm on Dec 8, 2014 (gmt 0)

Well, you're now denying access to one of the top ten broadband providers in the UK - Sky Broadband. Just pointing out that some of us will have customers there...

lucy24

4:57 pm on Dec 8, 2014 (gmt 0)

trintragula, wilderness's widgets don't travel well-- and they're not the kind of thing a norteamericano vacationing in blighty might spontaneously order for a relative back home-- so he can be as cold-blooded as he likes ;) But if, like me, you trade only in information and entertainment, you'll suffer agonies of remorse every time a human is locked out.

trintragula

5:09 pm on Dec 8, 2014 (gmt 0)

Well, I suspected as much, but reading this thread I might be led to believe that 5. and 8. could be treated as equally unlikely. I've never seen anything human from 4. or 8., but 5. is another story.

wilderness

7:31 pm on Dec 8, 2014 (gmt 0)

-- so he can be as cold-blooded as he likes


I resemble that remark.

There are both cold-blooded and warm-blooded widget categories ;)

lucy24

11:08 pm on Dec 8, 2014 (gmt 0)

I've never seen anything human from 4. or 8.

I block 8 too-- the quintessentially gratifying
Deny from 8

-- but I've met humans (rare, admittedly) from 4. And 5 isn't all server farms, though it may seem that way sometimes.
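For a range like 5/8 that carries some real human traffic, Apache's Order directive lets a broad Deny coexist with a narrower Allow -- the re-allowed range below is illustrative only:

```apache
# Deny the whole /8, then re-open a sub-range believed to be human;
# under Order Deny,Allow the Allow lines win where they overlap.
Order Deny,Allow
Deny from 5.0.0.0/8
Allow from 5.64.0.0/13
```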

Angonasec

6:42 am on Dec 9, 2014 (gmt 0)



@TT:
We block per...lenty of your darlings' UK CIDRs; Sky, Google, and of course the notorious nefarious TalkTalk.

Having "customers" imposes blinkers while you scan your logs, so your liberty is restrained a tad. :)

trintragula

9:01 am on Dec 9, 2014 (gmt 0)

We block per...lenty of your darlings' UK CIDRs; Sky, Google, and of course the notorious nefarious TalkTalk.

Oh you'll want to watch out for those talktalk guys - football watchers, the lot of them... ;)
I was thinking they're kind of like AOL, but on reading around I find they do actually sell under the AOL brand sometimes.

aristotle

8:04 pm on Dec 9, 2014 (gmt 0)

Lucy -- to return to your original question, I wonder if what you're seeing in your logs is associated with some kind of app that's popular in Russia among people looking for free ebooks, and the app uses yandex to find book recommendations.

dstiles

8:14 pm on Dec 9, 2014 (gmt 0)

> Sky, Google ... TalkTalk

Well, google does not run DSL directly in the UK. Sky and TalkTalk are amongst the top UK DSL providers, so being a UK service I suppose I should block them and not let my clients' sites be bothered by them (with deference to wilderness!). Personally, I find BT is one of the top offenders, apart obviously from US providers, and I cannot afford to block any of them EXCEPT if they misbehave, when the block is automatically applied before it can get a single page. CN and RU are fairly high in the offenders' list but being kind I again only block baddies, not everyone.

> talktalk guys - football watchers, the lot of them

Er... My brother uses TalkTalk. Certainly NOT a football watcher. :)

lucy24

10:31 pm on Dec 9, 2014 (gmt 0)

some kind of app that's popular in Russia among people looking for free ebooks, and the app uses yandex to find book recommendations

That would be reassuring ... except that the search strings are always individual words, not something like "free ebooks" or even "Paston letters online". How many people are looking for books containing the word "searige"?

:: detour to look up, which I hadn't previously bothered to do ::

"searige" occurs once in the critical apparatus to Aelfric as a variant reading of "forsêarige" (based on the Latin gloss I think it means "sow"). The most interesting one is "and mane[gh]a to[gh]ædere cômon and hê tô heom spræc"

:: pause to swear at Forums, followed by heavy editing ::

because this does not actually lead to anything of mine on Yandex. On Google, interestingly, it leads to nothing at all, while Bing heroically and hilariously tries guessing. That's, ahem, using the actual characters. There's a couple of yoghs and some long vowels. In fact it sounds vaguely Biblical (at a guess: "and many came together, and he spoke to him" or possibly "to them" because I don't actually know Old English).

It now occurs to me that this may be the same entity that formerly sent a fair number of people to an unrelated ebook (in modern English). That one was so common, I finally set up a redirect to a "convince me you're human" page, because the original involved some 40 separate requests. This time around the queries are too random to be handled that way, unless I redirect all yandex inquiries employing my full lexicon of four words of php.

aristotle

12:22 am on Dec 10, 2014 (gmt 0)

Well maybe the app already has entries for your ebooks in its database together with associated search terms that it can feed into yandex to guide the user to your site. Of course I'm just guessing, and the real process could very well be different in its details, but offhand I don't know of a better general explanation.

lucy24

10:03 pm on Dec 11, 2014 (gmt 0)

Follow-up: After the latest trio (in yesterday's logs), I added this redirect, expanding on an existing, more narrowly targeted version:

RewriteCond %{HTTP_REFERER} ^http://yandex\.ru/yandsearch\?text=[^&]+&lr=213(&|$)
RewriteCond %{REQUEST_URI} ^(/.*)
RewriteRule \.html$ http://example.com/boilerplate/redirect.php?oldpage=yandex&newpage=%1 [R=301,L]

(The second Condition is just for capturing purposes, to save the server work.)

The page "redirect.php" already exists. For a few common pages (whether request or referer) it's got a tiny little associative array; that's why one parameter is simply called "yandex". If there's no match for the "newpage" value it uses the full URL, ending up with a page that says-- after loud apologies in assorted languages--

You’ve accidentally replicated the behavior of an undesirable robot, so we have to take this brief detour.

And then there are two links: one for "back where you came from" (not in those words) and another for the requested page.

If any of my mystery visitors are human, I'd expect them to follow the link. If they're not, they will go away with only a small php page instead of a multi-hundred-K ebook with supporting files.

The rule should have been constrained to the /ebooks/ directory, but sometimes-- most recently yesterday-- they request a /hovercraft/ page instead. Fortunately so far not the one very large page in that directory, but I'm not taking chances.
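Had the requests stayed inside one directory, the rule could have been narrowed by tightening the capturing Condition -- a sketch of that variant, not the rule actually deployed:

```apache
RewriteCond %{HTTP_REFERER} ^http://yandex\.ru/yandsearch\?text=[^&]+&lr=213(&|$)
RewriteCond %{REQUEST_URI} ^(/ebooks/.*)
RewriteRule \.html$ http://example.com/boilerplate/redirect.php?oldpage=yandex&newpage=%1 [R=301,L]
```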