Forum Moderators: open


slifty.github.io referrals

Not typical referral spam


NickMNS

3:23 pm on Mar 31, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As of this morning I started getting a large number of referrals from slifty.github.io. This is not typical referral spam, these are actual server calls on my server and not simply GA pings.

The website explains that it is an application that uses the user's browser to randomly visit a very large number of websites, filling the user's browsing history with noise, with the goal of protecting their privacy from their ISP.

So as this is invalid/useless traffic, it should be blocked. What is the best way to block this traffic? I am assuming that the IP addresses for this traffic will be different in each case (I have not checked my logs yet), so the referrer will need to be blocked. I wonder if this will start a new whack-a-mole war?

Is anybody else seeing traffic from this referrer?

Note to mods: not sure if this is the correct forum, so feel free to move it.

[edited by: phranque at 3:26 pm (utc) on Apr 1, 2017]
[edit reason] email notifications [/edit]

not2easy

4:27 pm on Mar 31, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You can add the referrer to your deny list and serve a 403, but you can't stop their efforts. Eventually they may remove you from the list of places they pretend to visit.
in htaccess:
SetEnvIf Referer (gratis|semalt|slifty)$ goaway
then add
deny from env=goaway
to the list of IPs you block
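For anyone on Apache 2.4, where the old Order/Allow/Deny directives are deprecated, a rough equivalent of the above might look like this (the `goaway` variable name just mirrors the snippet above; the stray $ anchor is dropped here, since it would require the referer to *end* with one of those strings):

```apache
# Flag requests whose Referer header contains any of these strings
SetEnvIf Referer (gratis|semalt|slifty) goaway

# Apache 2.4 replacement for "deny from env=goaway"
<RequireAll>
    Require all granted
    Require not env goaway
</RequireAll>
```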

lucy24

8:03 pm on Mar 31, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is not typical referral spam, these are actual server calls on my server and not simply GA pings.
Actually that is typical referer spam, old-style :). It's the fake analytics requests that are new-style atypical referer spam. But this gives you a huge advantage: if they are "really" visiting, you can "really" block them.

not2easy, is that $ in your rule a typo? Seems like it would break the rule, since "slifty" is not in fact the very last thing in the referer string. Maybe \b would be better, if you're thinking of law-abiding referers that are called "usemaltedmilk" or, er, whatever.
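The word-boundary version lucy24 suggests would be a one-character change (sketch only; the pattern contents are the ones from the earlier snippet):

```apache
# \b keeps the match from firing inside longer, innocent referer
# strings (e.g. a legitimate site named "usemaltedmilk")
SetEnvIf Referer \b(gratis|semalt|slifty)\b goaway
Deny from env=goaway
```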

keyplyr

8:19 pm on Mar 31, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since there are so many free tools offered there, I just block any referrer containing "github"

There's no valid reason for any remote access using any tool at github IMO.

not2easy

8:22 pm on Mar 31, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yes, lucy24, I was cobbling things from a few places in a rush. Thanks, wouldn't want that in there.

keyplyr

8:37 pm on Mar 31, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Actually that is typical referer spam, old-style
In a sense.

There are lots and lots of tools at Github available free, or at least you can use them for a limited time & take them for a test drive. Doing so will include their referer in the UA string.

NickMNS

9:02 pm on Mar 31, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@keyplyr I think your idea to block all github referrals is a good one.

@not2easy, I have only ever encountered the "new style" up to this point.

NickMNS

12:49 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I finally got hold of my raw server logs.
Each referral is from a unique IP address; the addresses appear legitimate, most from the US, some from Europe. Each one goes to a unique page, and the page is served normally, with requests for all the images and other resources. But only a single page is served. They always seem to come from Apple devices, iPhones and Macs.

Here is an excerpt from my log:
....index.html HTTP/1.1" 200 47214 "https://slifty.github.io/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30" 2040

keyplyr

1:03 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes NickMNS, this is typical. I see the same on several sites I manage.

Note: I invite you to monitor your raw server logs on a daily basis (or even several times per day) to get an understanding of exactly what goes on. Do not rely on stats reports, not even from your host or from Google. You can only determine what goes on with your server by watching the logs, & you will learn a lot by doing so :)

NickMNS

3:28 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@keyplyr good point, I should get into the habit. But GA is so easy. I should really look deeper, because there is a lot of activity that is not seen by GA, and that is likely the most nefarious.

lucy24

4:24 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But GA is so easy.

Think of them as two different things.

Script-based analytics (GA, piwik, and so on) gives you information you can't get from access logs: Requests for in-page fragments, records of where people went after they left your site (as noted elsewhere, I love knowing that when people leave, they're going where I sent them), repeat visits to the same page (if it's five minutes later it probably won't be requested again from the server) ...

Server access logs give you information you can't get from analytics: unsuccessful or redirected requests, an exact record of supporting files (if they didn't get your images there's something hinky), favicon requests not connected with page visits ...

A third category of information is headers; some requests might as well be waving a big sign “Look at me! I’m a robot!”. But that involves some preliminary setup on your part.

keyplyr

5:53 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And here are examples of "github" being used in the UA string: [webmasterworld.com...]
[webmasterworld.com...]

lucy24

10:56 pm on Apr 2, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, ###. Here I am on the slifty fake-referer list. I thought I was too small to notice. (Query: If the purpose is to mess with the user's own browser logs, wouldn't it be more useful not to send a referer?)

As a nice added touch, the exact URL given in the latest referer line (slifty dot github dot io) ... doesn't exist, although github dot io does. (Idle query: Is the British Indian Ocean Territory yet another of those impoverished island nations that makes some pocket money by selling its TLD?)

I would shrug them off, but they happened to request a page with lots of large images, so ### that.
SetEnvIf Referer slifty bad_ref
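That SetEnvIf line only sets the flag; it still needs a matching access-control directive to actually return the 403. A minimal completion, using the 2.2-style syntax seen earlier in this thread (the `bad_ref` name is from lucy24's line):

```apache
SetEnvIf Referer slifty bad_ref

# Without the directives below, the env flag is set but nothing is blocked
Order Allow,Deny
Allow from all
Deny from env=bad_ref
```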

keyplyr

11:41 pm on Apr 2, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, if you choose to block, I would recommend:
Block github as referrer
Block github as UA

If you start blocking for every specific tool at Github, you'll need to buy more RAM.
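Keyplyr's two-pronged suggestion, sketched in htaccess (case-insensitive substring matches; `bad_bot` is an arbitrary variable name):

```apache
# Block "github" whether it appears as the referer or inside the UA string
SetEnvIfNoCase Referer github bad_bot
SetEnvIfNoCase User-Agent github bad_bot
Deny from env=bad_bot
```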

Note: Code examples/discussion should be done in the Code Forum [webmasterworld.com]

NickMNS

12:36 am on Apr 3, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@lucy
The site still exists; only and specifically slifty dot github dot io returns a 404.
The site's URL is https: slash slash slifty dot github dot io slash internet_noise slash index.html

As for GitHub, if you would like, you can report abuse here:
[github.com...]

.io is a TLD that is commonly used by tech companies such as GitHub.

lucy24

2:38 am on Apr 3, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



.io is a TLD that is commonly used by tech companies such as GitHub.

Yeah, that's why I figured the British India whateveritis is turning a profit, since it's their country code. (Similarly, one can only hope that Greenland and Belgium make G### pay through the nose for their oh-so-convenient short URLs.)

aristotle

4:50 pm on Apr 8, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is not typical referral spam, these are actual server calls on my server and not simply GA pings.

Hmm.. What about sites like mine that don't have any Analytics code on their pages? How does it work in that case?
I have only ever encountered the "new style" up to this point.

Well, you must have the only site on the web that doesn't get ordinary referral spam. Or maybe the reason you haven't encountered it before is that you look at Analytics instead of your server stats.

At any rate, you've got it backwards when you call Analytics pings "typical" referral spam. The "typical" referral spam has nothing to do with Analytics.

NickMNS

5:05 pm on Apr 8, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry Aristotle, you lost me.

This is probably due to semantics; my understanding of the definition of referral spam was that the goal of the spammer was to have a URL show up frequently in the affected site's referrals, such that the webmaster would then go to that site. In order to do this efficiently, the spammers would simply ping the analytics server with the fake referrer. So I fail to see the point of referral spam if the referrer information doesn't show up in analytics. What's the point?

I'm not suggesting that there aren't other forms of spam going on, such as bots trying to find and hack into admin pages and the like. All I'm saying is that the commonly held definition of referral spam is as I explained above. Further to this, the slifty crap was not that, although in GA it appears to be.

Regardless, github.io is blocked as a referer in my htaccess file and this has not been an issue since. Thanks to all for your help in solving this annoying issue.

lucy24

5:21 pm on Apr 8, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So I fail to see the point of referral spam if the referrer information doesn't show up in analytics. What's the point?

The point is that analytics is not everyone's be-all and end-all. Some site administrators prefer to look at raw access logs--either instead of, or in addition to, analytics.

github.io is blocked as a referer in my htaccess file

And, again, this approach only works if the offender is actually visiting your site. It won't stop them from sending in bogus requests to third-party analytics such as GA. This, in fact, is one of the strong arguments in favor of using an analytics program that lives on your own server (as the present site does): if they can't get in for real, they also can't claim to be getting in.

NickMNS

5:40 pm on Apr 8, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The point is that analytics is not everyone's be-all and end-all.

Yes, I get that, but spammers generally target the most common demographic, and I am willing to guess that there are more webmasters using Google Analytics than not for analyzing their traffic.

only works if the offender is actually visiting your site.

True, and for those that are pinging my GA, I set up a filter on the hostname to exclude all traffic that does not have my domain name.
Solved and solved... :0)

What is not solved is all the other spam and bots attempting malicious activity that does not appear in GA. So far this has not been a problem, since I am not using PHP, SQL or Wordpress, but that is not to say it won't be a problem at some point; the thing is that it is impossible to predict an unknown threat. That is, you do not know the source of the threat nor the specific target.

keyplyr

4:05 am on Apr 9, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...it is impossible to predict an unknown threat
Unknown, but predictable. That's why we block known server farms*, the source of 85% of all the malicious activity. The remaining threats usually come from compromised ISP accounts or just people using tools to get our files. These can be blocked by checking for malformed header fields, behavior patterns, or UA string attributes.

* Blocking Server Ranges [webmasterworld.com]

iamlost

2:21 pm on Apr 9, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A side note:
From a revenue perspective, a solid understanding and use of various visitor-analytics software and crawler-blocking methodologies is pretty much a necessity if one aims to get:
1. Third-party advertisers to whitelist your site, aka specifically target it.
2. Direct ad space sales.
3. High-performing special affiliate arrangements.
Bot defence is a continuing pita even if one likes a challenge; it is, however, a base requirement for the step up from "upload and wait for them to come".

aristotle

3:00 pm on Apr 9, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



iamlost -- most of them are just a nuisance.

If you've got several websites, it can be very time-consuming, as well as an unpleasant task, to try to keep up with all the new ones and block all of them. For the most efficient use of your time, take action against the worst offenders but don't bother with the others.

lucy24

5:52 pm on Apr 9, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's why we block known server farms

I moved to header-based access controls just over a year ago and I'm never ever going back ;)
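A taste of what "header-based access controls" can look like in htaccess (mod_rewrite; the specific header checks below are illustrative assumptions, not lucy24's actual ruleset -- real rules need tuning, since a few legitimate clients omit these headers too):

```apache
RewriteEngine On
# Many crude bots send no Accept header at all; real browsers always do.
RewriteCond %{HTTP_ACCEPT} ^$ [OR]
# Likewise, an empty Accept-Language is suspicious for page requests.
RewriteCond %{HTTP:Accept-Language} ^$
# Return 403 Forbidden to requests matching either condition
RewriteRule ^ - [F]
```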

aristotle

6:52 pm on Apr 9, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



moved to header-based access controls just over a year ago and I'm never ever going back wink

Well Lucy, most of us don't know how to do it that way. Or at least I don't.

lucy24

7:48 pm on Apr 9, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The first step is to--

Whoops! I see keyplyr advancing with the scissors, wearing a purposeful expression. Different venue.

keyplyr

2:52 am on Apr 10, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Everyone uses what's best for their needs. I see thousands of bots with (faked) normal headers. Some I allow, most I block.

IMO *only* checking headers will not be adequate in today's malevolent world. That's why, as I mentioned earlier, in addition to headers it is essential to cover known bad-neighborhood IP ranges (the largest category), behavior patterns, & UA string attributes.

NickMNS

3:06 am on Apr 10, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Like Aristotle said, blocking bots almost seems like a full time job. Where do you start? And what do you prioritize, given that one doesn't have time for yet another full time job?

keyplyr

3:19 am on Apr 10, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



NickMNS - Only you can decide whether protecting your online investments is worth the time & effort. Do you lock your car or home?

The downside of *not* protecting your property can be ambiguous. Nothing negative may happen, or something very significant may. I read a lot of stories about stolen images, plagiarized content, hacked sites, lost SE ranking, hijacked sites... the list goes on. Damage to branding & reputation is often irreparable.

As far as where do you start, this thread [webmasterworld.com...] has listed many server farm ranges, all worthy of blocking.

User Agents are also listed [webmasterworld.com...] and many should be blocked, but many may be beneficial depending on your interests. Those are the 2 categories to start with IMO. The other tricks are easier to understand once you have some experience with the first 2.

I spend less than 20 minutes a day on my personal site.

Discussion on what code to use for blocking should be done in the Code Forum [webmasterworld.com]

not2easy

4:25 am on Apr 10, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Getting started can seem overwhelming, but if the objective is to do something about an ongoing problem, rather than to identify and block every suspect right now, it becomes almost a pastime, hobby, or game that requires much less time than one might think. That old thing about the longest journey beginning with a single step comes to mind.

As far as IPs go, I started out trying to organize them into some kind of order, but with new culprits being added constantly I made it much simpler on myself. Once a month I start a new collection and paste all the information in the order I come across it. Some are my own lookups from what I find in the logs. Some are from the shared findings here.

At the end of the month they all go into a folder of simple plain-text files. When I spot something, I search that folder using the first two octets (the a and b positions) with a space in front, e.g. ' 11.22', in a multi-file search, and it gives me a short list to check out. That whole lookup takes under a minute, and nine times out of ten I find exactly the block I need. If not, I do a whois lookup, then search here and share or find new details. I don't think I spend 20 minutes a day, less than an hour a week, and I don't see those "clawbacks" at the end of the month.

Of course there was much more to it in the beginning. My methods work for me, you know what works for you. Hint: search here for Amazon ranges, those are a good place to start.