
Search Engine Spider and User Agent Identification Forum

    
No-Referer No-UA
How many stealth visitors are evil versus lazy or bashful?
knonymouse

5+ Year Member



 
Msg#: 4432838 posted 2:17 am on Mar 24, 2012 (gmt 0)

I started to route requests with no referrer and no UA, i.e. log entries including
"-" "-"
and then looked up their domains.

Then I pulled that filter, at least for now, when I found an innocent-looking domain involved, namely an educational institution (a high school).

I was initially of the mind that "if they don't announce themselves when they knock on my door, I'm not going to open it for them. Go away." But that may be too harsh.

Any current experience of what portion of the "-" "-" are not necessarily evil? (I have elsewhere read that some people set their browsers to be stealthy thinking about privacy, or whatever.)

 

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 2:22 am on Mar 24, 2012 (gmt 0)

You forgot stupid.

Some aren't smart enough to know to set the UA or referer, or if they do, don't know how.

Blocking default and blank UAs is kind of bot blocker 101 ;)
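
For anyone following along, here is a minimal sketch of that first step in .htaccess, assuming Apache with mod_rewrite enabled. The library default UAs listed are only illustrative examples of common out-of-the-box strings, not a list taken from this thread, so adjust them to what your own logs show.

```apache
RewriteEngine On

# Refuse requests whose User-Agent header is missing or empty.
# The optional hyphen also covers clients that literally send "-".
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
# Refuse a few well-known library defaults (illustrative choices only).
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|Java/|Python-urllib) [NC]
# "-" means no substitution; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

The [OR] flag is needed because consecutive RewriteCond lines are otherwise ANDed together.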

DeeCee



 
Msg#: 4432838 posted 2:50 am on Mar 24, 2012 (gmt 0)

@knonymouse,
"Innocent looking" is where many of them come from. The more Innocent (read: security unaware) you are, the more likely your server has been or will be infected by something bad.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4432838 posted 6:58 am on Mar 24, 2012 (gmt 0)

"You forgot stupid.

Some aren't smart enough to know to set the UA or referer, or if they do, don't know how."

You mean, like, the way google's faviconbot hasn't got enough sense to put some clothes on before knocking at the door?

If someone in a high-school science class is building robots for educational purposes, let them start by learning how to do it right ;)

Incidentally, I don't think I've ever met a visitor who had a referer but no UA. But they're blocked by UA alone, so I wouldn't notice unless I tripped over them while searching for something else.

knonymouse

5+ Year Member



 
Msg#: 4432838 posted 7:12 am on Mar 24, 2012 (gmt 0)

Assuming a conventional browser is being used in the conventional way, is it easy to set up any commonly available browser to send no referer?

In other words, how much effort must a conventional user put into hiding their tracks? Is that a preference of legitimate users in China, given the state of the internet there?

I'm banning the no referrer, no UA, as I write, and there's quite a number of them.

Rosalind

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4432838 posted 7:53 am on Mar 24, 2012 (gmt 0)

"Assuming a conventional browser is being used in the conventional way, is it easy to set up any commonly available browser to send no referer?"

Yes, it's trivial to do this in Firefox. I do this myself to prevent accidentally leaking urls from admin areas. Some security systems will also block the referral data, but in that case it usually won't be blank.

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 10:36 am on Mar 24, 2012 (gmt 0)

I certainly would not block no referrer, unless it is combined in a rule with other specifics.

A growing number of legit users nowadays do not send a referrer. AFAIK the top 4 browsers (IE, Firefox, Chrome & Safari) all offer a setting to not send a referrer.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4432838 posted 9:08 pm on Mar 24, 2012 (gmt 0)

And, of course, anyone with an URL previously bookmarked or memorized will come through as no referer. You certainly don't want to block those :)

knonymouse

5+ Year Member



 
Msg#: 4432838 posted 9:25 pm on Mar 24, 2012 (gmt 0)

Agreed, no referer alone shouldn't be blocked.

I'm concerned with the combination of no-referer with no-UA.

For now, such requests are directed to a page that explains the robot problem, asks for feedback if a real person gets there, and suggests viewing the page via the Google cache link on the search results page.
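
A rough sketch of how such a diversion could look in .htaccess, assuming Apache with mod_rewrite. The page name /robot-notice.html is a made-up placeholder, and the robots.txt exemption is an added assumption so that well-behaved crawlers can still read it.

```apache
RewriteEngine On

# Only act when BOTH headers are blank (or a literal "-").
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
# Leave the notice page itself and robots.txt alone; this also
# prevents the rewrite from looping on its own target.
RewriteCond %{REQUEST_URI} !^/(robot-notice\.html|robots\.txt)$
# Serve the explanatory page (hypothetical name) instead of the request.
RewriteRule ^ /robot-notice.html [L]
```

Because this is an internal rewrite rather than a block, a real person who lands there still gets a readable page and a way to report back.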

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 11:53 pm on Mar 24, 2012 (gmt 0)



I just block blank UA strings, but filter it through a whitelist of IP addresses since there are a handful of agents who do not send a UA, but are beneficial to me.
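
As a hedged illustration of that kind of rule (not the poster's actual file), assuming Apache with mod_rewrite: the addresses below are reserved documentation addresses standing in for whatever agents you decide to whitelist.

```apache
RewriteEngine On

# Blank (or literal "-") User-Agent ...
RewriteCond %{HTTP_USER_AGENT} ^-?$
# ... unless the request comes from a whitelisted address.
# 192.0.2.10 is a single placeholder host.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
# 198.51.100.x is a placeholder block, matched by prefix.
RewriteCond %{REMOTE_ADDR} !^198\.51\.100\.
RewriteRule ^ - [F]
```

Each whitelist entry is a negated condition, so a request is refused only if it has no UA and matches none of the listed addresses.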

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 12:42 am on Mar 25, 2012 (gmt 0)

"And, of course, anyone with an URL previously bookmarked or memorized will come through as no referer. You certainly don't want to block those"


When it comes to images no referrer SHOULD be blocked to avoid hot-linking because the images should be referred from the page loading them.

If you're just filtering out all referrers and can't see my images, well tough!
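
A minimal sketch of that stricter stance, assuming Apache with mod_rewrite and example.com standing in for your own domain. Unlike rule sets that exempt blank referers, this one has no exemption, so a missing referer is refused along with foreign ones.

```apache
RewriteEngine On

# Anything that does not name this site as the referring page,
# including a blank referer, fails to match and is therefore refused.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Apply to common image extensions only, with a 403 response.
RewriteRule \.(jpe?g|gif|png)$ - [F,NC]
```

The trade-off is the one raised elsewhere in the thread: visitors or proxies that strip the referer will not see images at all.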

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 2:27 am on Mar 25, 2012 (gmt 0)



"If you're just filtering out all referrers and can't see my images, well tough!"

Well that will teach 'em!

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4432838 posted 2:56 am on Mar 25, 2012 (gmt 0)

What about the ISPs that don't send a referer for associated files-- images, css etc? They're easy to spot with your eyeballs looking at logs, but that doesn't help the server.

And then there are search engines. It's headline news when they do send a referer. In fact that was my original reason for exempting blank referers in the hotlinking routine, though by this time I've got so many areas blocked from search engines that it may be time to fine-tune the rules.

Hotlinks otoh do send a referer-- it just isn't your page. (There should be a way to deal with this in mod_rewrite but so far I haven't got the hang of it. Same goes for robots sending auto-referers.)

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 6:36 am on Mar 25, 2012 (gmt 0)

"Hotlinks otoh do send a referer-- it just isn't your page. (There should be a way to deal with this in mod_rewrite but so far I haven't got the hang of it.)"

I put something like this in an htaccess file in each image directory (not the base directory.)

RewriteEngine on
# Let blank referers through
RewriteCond %{HTTP_REFERER} !^$
# Let my own site through, with or without www, over http or https
RewriteCond %{HTTP_REFERER} !^http://my-syte\.com
RewriteCond %{HTTP_REFERER} !^http://www\.my-syte\.com
RewriteCond %{HTTP_REFERER} !^https://my-syte\.com
RewriteCond %{HTTP_REFERER} !^https://www\.my-syte\.com
# Let approved outside sites through (one line per site)
RewriteCond %{HTTP_REFERER} !^http://some-site\.com
RewriteCond %{HTTP_REFERER} !^http://some-site\.com
# Let approved IP addresses through (one line per address)
RewriteCond %{REMOTE_ADDR} !^some\.ip\.address
RewriteCond %{REMOTE_ADDR} !^some\.ip\.address
# Everyone else gets the substitute image
RewriteRule \.(jpg|gif)$ /image/thief.png [NC,L]


Image requests that do not contain my referrer are instead served thief.png, which says "I am a low life thief. I am trying to steal an image from www.my-site.com".

The blank referrers do see my image until the next day when (if warranted) I change the name/location.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4432838 posted 7:38 am on Mar 25, 2012 (gmt 0)

RewriteCond %{HTTP_REFERER} !^http://example\.com
RewriteCond %{HTTP_REFERER} !^http://www\.example\.com
RewriteCond %{HTTP_REFERER} !^https://example\.com
RewriteCond %{HTTP_REFERER} !^https://www\.example\.com


simplifies to

RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com

If you have several sites to list and they are all .com then change
?example\. to ?(example1|example2|example3|example4)\.
keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 8:34 am on Mar 25, 2012 (gmt 0)

Thanks g1smd, much simpler. If I was writing them all again (over 7 sites each having several image directories) I'd abbreviate them now, but I think I wrote these about 10 years ago when my RegEx skill was more limited.

I should add that I use the noarchive meta tag on all web pages, so those SEs that support the tag don't cache the pages. The others I list as allowed URLs. However, most have switched to preview snapshots anyway, so it's not much of an issue any longer.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4432838 posted 9:04 am on Mar 25, 2012 (gmt 0)

RewriteCond %{HTTP_REFERER} !^$

You want
!^-?$
because blanks tend to come through as - rather than nothing at all. (I thought it was my server inserting them until I met one that was genuinely "" null.)

simplifies to

RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com

It only recently occurred to me that I don't need to say (www\.)? for my own site, because by the time they reach a page they've already been redirected to the correct form of the address, either with or without. And it's that correctly named page that will be the referer.

I also rearranged the sequence to list the conditions in order of probability since RewriteConds work on the "sudden death" principle. (I looked it up.) My own site comes first because it's the condition most likely to fail. Then comes "-" for the search engines. And then the short list of authorized hotlinkers.
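
Put together, that ordering might look something like the sketch below, assuming Apache with mod_rewrite. The names example.com, example.org and example.net are placeholders, and /image/thief.png is borrowed from the rule set posted earlier in the thread.

```apache
RewriteEngine On

# 1. Own site first: most image requests carry this referer, so the
#    negated condition fails here and the remaining lines are skipped.
RewriteCond %{HTTP_REFERER} !^https?://example\.com/
# 2. Blank referer next, for search engines and referer-less visitors.
RewriteCond %{HTTP_REFERER} !^-?$
# 3. The short list of authorized hotlinkers last.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.(org|net)/
# Only requests that survived every condition get the substitute image.
RewriteRule \.(jpe?g|gif)$ /image/thief.png [NC,L]
```

Since the conditions are ANDed and evaluation stops at the first one that fails, putting the most frequently failing condition first saves the server a few regex tests per request.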

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 9:14 am on Mar 25, 2012 (gmt 0)

You want
!^-?$

Doesn't make any difference, try it.

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4432838 posted 9:54 pm on Mar 25, 2012 (gmt 0)

I 403 accesses with no UA but allow valid UA with no referer PROVIDING the referer itself is valid. It should begin with http or https.

This past few days I've begun seeing the referer given as about:blank rather than actual blank from (I assume) a clean browser access. No sign of http, of course. No idea how long this has been going on but I suspect not too long (only 3 cases this month).

Why it's sent from the browser I have no idea (I know WHY it's shown in the browser).

I expected this to result from Firefox but to my surprise it was MSIE 8 in various guises BUT always (at least in these three instances) with GTB7.3, so I'm assuming G has something wrong somewhere (surprise!).

To avoid losing real visitors I've now allowed about:blank as a genuine referer. The things we do to appease G users. :(

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4432838 posted 11:35 pm on Mar 25, 2012 (gmt 0)

"allow valid UA with no referer PROVIDING the referer itself is valid"

?

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4432838 posted 12:39 am on Mar 26, 2012 (gmt 0)

I guess there could be no more valid referrer than Google but what now since its referrer string shows : q=&esrc,
hiding the search string

It could be any bot parading as Google since they don't need to know any relevant keyword from the site

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4432838 posted 1:09 am on Mar 26, 2012 (gmt 0)

Anything claiming to be a SE bot but coming from a non-SE IP range gets knocked back.
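
One hedged way to express that in .htaccess, assuming Apache with mod_rewrite. The 66.249.64.0 to 66.249.95.255 range used here is an assumption for illustration (a block long associated with Googlebot); crawler ranges change, so check the search engine's own published information before relying on any hard-coded list.

```apache
RewriteEngine On

# The User-Agent claims to be Googlebot ...
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# ... but the address is outside 66.249.64.x through 66.249.95.x
# (illustrative range only; verify before use).
RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.
RewriteRule ^ - [F]
```

The same two-condition pattern can be repeated for other engines, each with its own UA token and address ranges.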

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4432838 posted 1:23 am on Mar 26, 2012 (gmt 0)

"Anything claiming to be a SE bot but coming from a non-SE IP range gets knocked back."

That was my first response but then I realized we're talking about the referer. Search engines don't operate on the fly like a shopping service :)

"It could be any bot parading as Google"

You mean, it could be any bot parading as a human sent by google. But unless you've got a completely free-standing page with no peripheral files, you can at least see after the fact if it behaved like a human or a robot.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4432838 posted 1:24 am on Mar 26, 2012 (gmt 0)

"Anything claiming to be a SE bot but coming from a non-SE IP range gets knocked back."


Aye.
Especially those numerous google fakers (many old threads), which appear frequently.

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4432838 posted 11:14 am on Mar 26, 2012 (gmt 0)

First there were the fake UAs
then came referrer = page accessed
are we now going to see referrer is Google with q=&esrc but it will be a bot and not a human ?
Finding out after the fact is pretty pointless

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4432838 posted 7:57 pm on Mar 26, 2012 (gmt 0)

Lucy - Oops! Should have been...

"I 403 accesses with no UA but allow valid UA with no referer OR valid UA with valid referer. It should begin with http or https."

Sorry. Got carried away.
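
For illustration only, the corrected policy could be sketched in .htaccess roughly like this, assuming Apache with mod_rewrite. The about:blank exemption reflects the allowance described earlier in the thread. "Valid UA" is reduced here to "not blank", which is a simplification.

```apache
RewriteEngine On

# No User-Agent at all: 403.
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]

# A referer is present ...
RewriteCond %{HTTP_REFERER} !^-?$
# ... but it does not begin with http:// or https:// ...
RewriteCond %{HTTP_REFERER} !^https?:// [NC]
# ... and it is not the about:blank case allowed for real visitors.
RewriteCond %{HTTP_REFERER} !^about:blank$ [NC]
RewriteRule ^ - [F]
```

A blank referer passes the second rule untouched, so visitors with a UA and no referer are still allowed.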

bakedjake

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4432838 posted 8:54 pm on Mar 26, 2012 (gmt 0)

Timely that this came up now - I'm struggling with this myself right now. No UAs are a lot worse to me than no referrers.

I have no problems blocking No UAs.

I'm currently blocking no referrers but am reconsidering my position. That said, I'm not sure I'd agree that there are "lots" of legit no referrers out there. There are certainly some legit users, but my no referrers also catch a lot of evil traffic as well.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4432838 posted 3:12 am on Mar 27, 2012 (gmt 0)

I just took a detour to grab a chunk of raw logs and look at the no-referer visits. They ran about 10 times as many total daily hits as my human page visits, which was disheartening :( (Also inaccurate, since the initial count included all images, but still.)

What was more disheartening was that after filtering out the empty UAs (about 20% right there), and the ones that would have been blocked anyway, and the ones that would have been allowed anyway (that is, known robots or pre-approved files like robots.txt) ... I was still left with about 2% utterly unaccounted for.* And that included visitors from an area I really don't want to exclude, and one whom I know in person-- and, ahem, at least one WebmasterWorld member.

By the time I got all the named robots filtered out, recent Firefox versions were definitely over-represented, especially among the requests that don't send a referer for images and associated files. More I can't tell. But I blocked a few IPs just so I wouldn't feel my time was entirely wasted.


* Bing in its various authorized forms also totaled around 20%, thanks mainly to its insatiable appetite for robots.txt. Then Yandex and Google. I myself accounted for about 2% of the no-referers. A new one on me is msnbot-NewsBlogs, which never showed its face in January when I was fine-tooth-combing all robots.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved