
No-Referer No-UA

How many stealth visitors are evil, versus merely lazy or bashful?

     
2:17 am on Mar 24, 2012 (gmt 0)

5+ Year Member



I started to route requests whose log entries show no referrer and no UA, i.e.
"-" "-"
and then looked up their domains.

Then I pulled that filter, at least for now, when I found an innocent-looking domain involved, e.g. an educational institution (a high school).

I was initially of the mind that "if they don't announce themselves when they knock on my door, I'm not going to open it for them. Go away." But that may be too harsh.

Does anyone have current experience of what portion of the "-" "-" requests are not necessarily evil? (I have read elsewhere that some people set their browsers to be stealthy out of privacy concerns, or whatever.)
2:22 am on Mar 24, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You forgot stupid.

Some aren't smart enough to know to set the UA or referer, or if they do, don't know how.

Blocking default and blank UAs is kind of bot blocker 101 ;)
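
For anyone who wants the .htaccess version, a minimal sketch (assuming Apache with mod_rewrite; a generic example rule, not anyone's exact ruleset):

RewriteEngine on
# forbid requests whose User-Agent is empty (or arrives as a literal "-")
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]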
2:50 am on Mar 24, 2012 (gmt 0)



@knonymouse,
"Innocent looking" is where many of them come from. The more Innocent (read: security unaware) you are, the more likely your server has been or will be infected by something bad.
6:58 am on Mar 24, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You forgot stupid.

Some aren't smart enough to know to set the UA or referer, or if they do, don't know how.

You mean, like, the way google's faviconbot hasn't got enough sense to put some clothes on before knocking at the door?

If someone in a high-school science class is building robots for educational purposes, let them start by learning how to do it right ;)

Incidentally, I don't think I've ever met a visitor who had a referer but no UA. But they're blocked by UA alone, so I wouldn't notice unless I tripped over them while searching for something else.
7:12 am on Mar 24, 2012 (gmt 0)

5+ Year Member



Assuming a conventional browser is being used in the conventional way, is it easy to set up any commonly available browser to send no referer?

In other words, how much effort must a conventional user put into hiding their tracks? Is that a preference of legitimate users in China, given the state of the internet there?

I'm banning the no-referrer, no-UA requests as I write, and there are quite a number of them.
7:53 am on Mar 24, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Assuming a conventional browser is being used in the conventional way, is it easy to set up any commonly available browser to send no referer?

Yes, it's trivial to do this in Firefox. I do it myself to prevent accidentally leaking URLs from admin areas. Some security systems will also block the referral data, but in that case it usually won't be blank.
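
If memory serves, in Firefox the switch lives in about:config: set network.http.sendRefererHeader to 0 and no referer is sent at all (2 is the default). The same thing from a user.js file:

// 0 = never send a referer header; the Firefox default is 2
user_pref("network.http.sendRefererHeader", 0);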
10:36 am on Mar 24, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I certainly would not block no referrer, unless it is combined in a rule with other specifics.

A growing number of legit users nowadays do not send a referrer. AFAIK the top 4 browsers (IE, Firefox, Chrome & Safari) all offer a setting to not send a referrer.
9:08 pm on Mar 24, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



And, of course, anyone with an URL previously bookmarked or memorized will come through as no referer. You certainly don't want to block those :)
9:25 pm on Mar 24, 2012 (gmt 0)

5+ Year Member



Agreed, no referer alone shouldn't be blocked.

I'm concerned with the combination of no-referer with no-UA.

For now, such requests are directed to a page that explains the robot problem, asks for feedback if a real person gets there, and suggests viewing the content via the Google cache on the search results page.
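
Roughly like this in .htaccess, assuming Apache (the page name is just a placeholder for wherever the explanation lives):

RewriteEngine on
# referer and User-Agent both empty (or logged as "-")
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
# don't rewrite the explanation page onto itself
RewriteCond %{REQUEST_URI} !^/robot-info\.html$
RewriteRule .* /robot-info.html [L]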
11:53 pm on Mar 24, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





I just block blank UA strings, but filter that through a whitelist of IP addresses, since there are a handful of agents that do not send a UA but are beneficial to me.
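
In .htaccess terms that works out to something like this (the addresses are placeholders for the whitelisted agents):

RewriteEngine on
# let the whitelisted addresses through even with a blank UA
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.11$
# forbid everything else that sends no User-Agent
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]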
12:42 am on Mar 25, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



And, of course, anyone with an URL previously bookmarked or memorized will come through as no referer. You certainly don't want to block those


When it comes to images, no referrer SHOULD be blocked to prevent hot-linking, because image requests should carry the referrer of the page that loads them.

If you're just filtering out all referrers and can't see my images, well tough!
2:27 am on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





If you're just filtering out all referrers and can't see my images, well tough!

Well that will teach 'em!
2:56 am on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



What about the ISPs that don't send a referer for associated files-- images, css etc? They're easy to spot with your eyeballs looking at logs, but that doesn't help the server.

And then there are search engines. It's headline news when they do send a referer. In fact that was my original reason for exempting blank referers in the hotlinking routine, though by this time I've got so many areas blocked from search engines that it may be time to fine-tune the rules.

Hotlinks otoh do send a referer-- it just isn't your page. (There should be a way to deal with this in mod_rewrite but so far I haven't got the hang of it. Same goes for robots sending auto-referers.)
6:36 am on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Hotlinks otoh do send a referer-- it just isn't your page. (There should be a way to deal with this in mod_rewrite but so far I haven't got the hang of it.)

I put something like this in an htaccess file in each image directory (not the base directory.)

RewriteEngine on
# allow blank referers (bookmarks, search engines, paranoid browsers)
RewriteCond %{HTTP_REFERER} !^$
# allow my own site, http or https, with or without www
RewriteCond %{HTTP_REFERER} !^http://my-syte\.com
RewriteCond %{HTTP_REFERER} !^http://www\.my-syte\.com
RewriteCond %{HTTP_REFERER} !^https://my-syte\.com
RewriteCond %{HTTP_REFERER} !^https://www\.my-syte\.com
# allow approved hotlinkers (one line per site)
RewriteCond %{HTTP_REFERER} !^http://some-site\.com
# allow approved IP addresses (one line per address)
RewriteCond %{REMOTE_ADDR} !^some\.ip\.address
# everything else gets the thief image instead
RewriteRule \.(jpg|gif)$ /image/thief.png [NC,L]


Image requests that do not carry my referrer are instead served thief.png, which says: "I am a low-life thief. I am trying to steal an image from www.my-site.com."

The blank referrers do see my image until the next day when (if warranted) I change the name/location.
7:38 am on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_REFERER} !^http://example\.com
RewriteCond %{HTTP_REFERER} !^http://www\.example\.com
RewriteCond %{HTTP_REFERER} !^https://example\.com
RewriteCond %{HTTP_REFERER} !^https://www\.example\.com


simplifies to

RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com


If you have several sites to list and they are all .com then change
?example\.
to
?(example1|example2|example3|example4)\.
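
Putting it together, the whole hotlinking block from earlier collapses to something like this (keeping the same placeholders):

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?(example1|example2|example3)\.com
RewriteCond %{REMOTE_ADDR} !^some\.ip\.address
RewriteRule \.(jpg|gif)$ /image/thief.png [NC,L]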
8:34 am on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Thanks g1smd, much simpler. If I were writing them all again (over 7 sites, each having several image directories) I'd abbreviate them, but I think I wrote these about 10 years ago when my RegEx skill was more limited.

I should add that I use the noarchive meta tag on all web pages, so those SEs that support the tag don't cache the pages. The others I list as allowed URLs. However, most have switched to preview snapshots anyway, so it's not much of an issue any longer.
9:04 am on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



RewriteCond %{HTTP_REFERER} !^$

You want
!^-?$
because blanks tend to come through as - rather than nothing at all. (I thought it was my server inserting them until I met one that was genuinely "" null.)

simplifies to

RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com

It only recently occurred to me that I don't need to say (www\.)? for my own site, because by the time they reach a page they've already been redirected to the correct form of the address, either with or without. And it's that correctly named page that will be the referer.

I also rearranged the sequence to list the conditions in order of probability since RewriteConds work on the "sudden death" principle. (I looked it up.) My own site comes first because it's the condition most likely to fail. Then comes "-" for the search engines. And then the short list of authorized hotlinkers.
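
So the reordered set looks something like this (hostnames are placeholders, and it assumes www is the canonical form; own site first, then blank referers, then approved hotlinkers):

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^https?://www\.example\.com
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !^https?://approved-hotlinker\.example
RewriteRule \.(jpg|gif|png)$ /image/thief.png [NC,L]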
9:14 am on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You want
!^-?$

Doesn't make any difference, try it.
9:54 pm on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I 403 accesses with no UA but allow valid UA with no referer PROVIDING the referer itself is valid. It should begin with http or https.

These past few days I've begun seeing the referer given as about:blank rather than an actual blank from (I assume) a clean browser access. No sign of http, of course. No idea how long this has been going on, but I suspect not long (only 3 cases this month).

Why it's sent from the browser I have no idea (I know WHY it's shown in the browser).

I expected this to result from Firefox but to my surprise it was MSIE 8 in various guises BUT always (at least in these three instances) with GTB7.3, so I'm assuming G has something wrong somewhere (surprise!).

To avoid losing real visitors I've now allowed about:blank as a genuine referer. The things we do to appease G users. :(
11:35 pm on Mar 25, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



allow valid UA with no referer PROVIDING the referer itself is valid

?
12:39 am on Mar 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I guess there could be no more valid referrer than Google, but what now, since its referrer string shows q=&esrc, hiding the search string?

It could be any bot parading as Google, since they don't need to know any relevant keyword from the site.
1:09 am on Mar 26, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Anything claiming to be a SE bot but coming from a non-SE IP range gets knocked back.
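
For the Googlebot case that can be as blunt as this in .htaccess (the 66.249. prefix is only indicative of Google's crawl range; the reliable test is a forward-confirmed reverse DNS lookup ending in googlebot.com):

RewriteEngine on
# anything claiming to be Googlebot from outside Google's crawl range gets a 403
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]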
1:23 am on Mar 26, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Anything claiming to be a SE bot but coming from a non-SE IP range gets knocked back.

That was my first response but then I realized we're talking about the referer. Search engines don't operate on the fly like a shopping service :)

It could be any bot parading as Google

You mean, it could be any bot parading as a human sent by google. But unless you've got a completely free-standing page with no peripheral files, you can at least see after the fact if it behaved like a human or a robot.
1:24 am on Mar 26, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Anything claiming to be a SE bot but coming from a non-SE IP range gets knocked back.


Aye.
Especially those numerous google fakers (many old threads), which appear frequently.
11:14 am on Mar 26, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First there were the fake UAs,
then came referrer = page accessed.
Are we now going to see referrer = Google with q=&esrc, but from a bot rather than a human?
Finding out after the fact is pretty pointless.
7:57 pm on Mar 26, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Lucy - Oops! Should have been...

"I 403 accesses with no UA but allow valid UA with no referer OR valid UA with valid referer. It should begin with http or https."

Sorry. Got carried away.
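
In Apache terms the corrected policy is roughly this (only an approximation, and not necessarily the server in use here):

RewriteEngine on
# 403 anything that sends no User-Agent at all
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
# 403 anything whose referer is present but doesn't begin with http:// or https://
# (add an exception first if you want to let about:blank through, as discussed above)
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !^https?://
RewriteRule .* - [F]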
8:54 pm on Mar 26, 2012 (gmt 0)

WebmasterWorld Administrator bakedjake is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Timely that this came up - I'm struggling with it myself right now. No UAs are a lot worse to me than no referrers.

I have no problems blocking No UAs.

I'm currently blocking no referrers but am reconsidering my position. That said, I'm not sure I'd agree that there are "lots" of legit no-referrers out there. There are certainly some legit users, but my no-referrer blocks also catch a lot of evil traffic.
3:12 am on Mar 27, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



I just took a detour to grab a chunk of raw logs and look at the no-referer visits. They ran about 10 times as many total daily hits as my human page visits, which was disheartening :( (Also inaccurate, since the initial count included all images, but still.)

What was more disheartening was that after filtering out the empty UAs (about 20% right there), and the ones that would have been blocked anyway, and the ones that would have been allowed anyway (that is, known robots or pre-approved files like robots.txt) ... I was still left with about 2% utterly unaccounted for.* And that included visitors from an area I really don't want to exclude, and one whom I know in person-- and, ahem, at least one WebmasterWorld member.

By the time I got all the named robots filtered out, recent Firefox versions were definitely over-represented, especially among the requests that don't send a referer for images and associated files. More I can't tell. But I blocked a few IPs just so I wouldn't feel my time was entirely wasted.


* Bing in its various authorized forms also totaled around 20%, thanks mainly to its insatiable appetite for robots.txt. Then Yandex and Google. I myself accounted for about 2% of the no-referers. A new one on me is msnbot-NewsBlogs, which never showed its face in January when I was fine-tooth-combing all robots.
 
