
Forum Moderators: Ocean10000 & incrediBILL & keyplyr

Google referers

Not HTTPS

     
10:34 am on Oct 11, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3165
votes: 8


Thinking all google URLs were HTTPS now, I blocked referers carrying HTTP, both www and non-www.

Today I had to unblock 30-odd blocked IPs because of it.

Is anyone actually seeing HTTPS in google referers? And google - what are you playing at?!
10:53 pm on Oct 11, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


I have this line in shared htaccess:
SetEnvIf Referer ^http://(www\.)?google\.com/?$ bad_ref
See the closing anchor? Some people do still come in from http google--especially http google dot something-other-than-com, which I haven't bothered about yet. But there will always be a visible query. If not, they're fakers.
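To see what the closing anchor buys, here is a minimal Python sketch that runs the same pattern against a few referer strings. The sample referers are made up for illustration, and Python's `re` module stands in for Apache's regex engine, which should behave the same way for a pattern this simple.

```python
import re

# Same pattern as in the SetEnvIf line above.
BARE_GOOGLE = re.compile(r"^http://(www\.)?google\.com/?$")

def is_bad_ref(referer):
    """True when the referer is a bare http google.com with no path or query."""
    return BARE_GOOGLE.search(referer) is not None

# Hypothetical sample referers, not taken from real logs.
samples = [
    "http://www.google.com/",                  # bare: flagged
    "http://google.com",                       # bare, no trailing slash: flagged
    "http://www.google.com/search?q=widgets",  # carries a query: not flagged
    "https://www.google.com/",                 # https: not covered by this pattern
    "http://www.google.co.uk/",                # national google: not covered either
]

for ref in samples:
    print(f"{ref:45} {'bad_ref' if is_bad_ref(ref) else 'ok'}")
```

Without the `$`, every referer that merely starts with http google.com would be flagged, including ordinary search referers that carry a query.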

Edit: Cursory research points especially to Androids using assorted national googles.
2:38 am on Oct 12, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10116
votes: 550


Well Google Inc and the various Google proxies don't necessarily use a secure protocol. They probably will at some point.
10:18 am on Oct 12, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3165
votes: 8


Thanks, Lucy. I had that previously but changed it to trap all non-SSL google referers.

Keyplyr - I know that now! :) Considering their attempt to enforce SSL on everyone, it seems stupid not to apply it to their own referers.
10:25 am on Oct 13, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3165
votes: 8


Well, I put back the block on bare google URLs and got two rejects which I am 90% sure are genuine with http[://]www[.]google[.]com/ referers (my []). So for now, only trapping the non-www version.

Google - the company of which it is said "What?!"
5:35 pm on Oct 13, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10116
votes: 550


Don't forget, anyone can lease Google proxy ranges and the lower half of Google Inc was used as development space for a long time.
9:05 pm on Oct 13, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


By weird coincidence, only yesterday I found a blocked human ... and poring over logs, the culprit seems to have been a bare
http://www.google.com/

referer. No query.

UA was an iPhone. Further investigation confirms that I've never bothered to unset my bad_ref variable for iPhones, only for Androids. But, dammit, what is the point of unsetting every possible environmental variable for select user-agents when all a robot has to do is claim to be that user-agent? So far, mercifully, bad_ref is rarely the only factor.
9:36 am on Oct 20, 2017 (gmt 0)

New User

joined:Aug 21, 2016
posts:17
votes: 0


The requests I receive are exclusively with an HTTPS scheme.

^https?://(?:www\.)?google\.com/?$

works fine for me in this case.
8:23 pm on Oct 20, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


Why do you want to block all requests with a google.com referer? Do you deny the Googlebot, or set noindex tags, so you know upfront that they're not really from Google?
8:40 pm on Oct 20, 2017 (gmt 0)

New User

joined:Aug 21, 2016
posts:17
votes: 0


Lucy, this wouldn't block all requests (end anchor). It is actually almost identical to the regular expression you posted earlier, with the difference that it also includes https and does not capture the www (which does not matter much matching-wise).
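As a rough check of that claim, the sketch below puts the two patterns side by side in Python. The test referers are invented, and `compare` is just a hypothetical helper name.

```python
import re

HTTP_ONLY = re.compile(r"^http://(www\.)?google\.com/?$")          # the earlier pattern
EITHER_SCHEME = re.compile(r"^https?://(?:www\.)?google\.com/?$")  # the https? variant

def compare(referer):
    """Return (matched by HTTP_ONLY, matched by EITHER_SCHEME)."""
    return (HTTP_ONLY.search(referer) is not None,
            EITHER_SCHEME.search(referer) is not None)

print(compare("http://www.google.com/"))             # both patterns match
print(compare("https://www.google.com/"))            # only the https? variant matches
print(compare("https://www.google.com/search?q=x"))  # neither: the end anchor rejects it

# (www\.) captures and (?:www\.) does not; the set of matched strings is the same.
print(HTTP_ONLY.search("http://www.google.com/").groups())
print(EITHER_SCHEME.search("http://www.google.com/").groups())
```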
1:11 am on Oct 21, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


this wouldn't block all requests

I stand corrected. It would only block requests from google.com with no national extension--i.e. the "default" google, as found in the US. If you are based in another country, this may or may not be a risk worth taking.
1:48 am on Oct 21, 2017 (gmt 0)

New User

joined:Aug 21, 2016
posts:17
votes: 0


It would only block requests claiming to come from Google's root directory; actual Google links should contain a path.

But again, that regular expression is identical to yours, only with the HTTPS difference.
5:22 am on Oct 21, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


actual Google links should contain a path
Do you happen to have a site that has already been HTTPS for years? On ordinary, non-secure sites, HTTP Google referers include the query string, while HTTPS referers don't; that's the point of making the distinction. Even on secure sites, there will be legitimate HTTPS Google referers that come in without a path and/or query. And, as we've recently established, some smartphones don't send a query string even with HTTP Google.

Edit: I spent some time poring over headers for my HTTPS site. The https google.com/ referers tend to coincide with the "Upgrade-Insecure-Requests" header, but I don't think it's dispositive. It's a pretty common header--and an awfully small sample size.
11:43 am on Oct 21, 2017 (gmt 0)

New User

joined:Aug 21, 2016
posts:17
votes: 0


My bad, this time I shall stand corrected. I wasn't aware of Referrer-Policy and its implications.

In this particular case the referrer should only be sent if the destination is on HTTPS as well and on HTTP not at all. Even in the case of HTTPS however only the origin (only protocol and host, no path or query string). I haven't tested it with other browsers yet, but Chrome at least seems to always send the origin anyhow.

Under these circumstances I do retract my s? "enhancement" :)
6:06 pm on Oct 21, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


In this particular case the referrer should only be sent if the destination is on HTTPS as well and on HTTP not at all. Even in the case of HTTPS however only the origin (only protocol and host, no path or query string).
I'll be darned. I always thought the rule was that in https >> http only the query string is to be omitted. (This would have been a quaint, old-fashioned rule, since “friendly” URLs from the CMS of your choice means there is absolutely no difference between path and query. Everything-after-the-hostname makes more sense.)

In the fine tradition of experiments with only a single data point, I made a link to my personal https site, Previewed, clicked, and then scurried over to my logged headers. And sure enough, there is no referer at all, not even the bare sitename. That's in Firefox 56.

###. And ### again. Does that mean that many of the visits I've been thinking of as direct--assuming bookmarks and the like--actually were sent by some site or other?

And double ###. Does this, in turn, mean that when my “real” sites go HTTPS, then piwik--which lives on my personal site--will break, since it depends in its entirety on a long complicated query string being sent from one place to another?
10:19 pm on Oct 21, 2017 (gmt 0)

New User

joined:Aug 21, 2016
posts:17
votes: 0


I'll be darned. I always thought the rule was that in https >> http only the query string is to be omitted. (This would have been a quaint, old-fashioned rule, since “friendly” URLs from the CMS of your choice means there is absolutely no difference between path and query. Everything-after-the-hostname makes more sense.)

To be completely honest, up until my last response I was not even aware of this kind of "special" referrer handling when TLS is involved.

Section 15.1.3 of HTTP's "old" RFC 2616 is pretty clear about that:
Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.

It is a should and not a must but still quite clear.

In Google's case it appears to be sent as their referrer policy explicitly requests the hostname to be sent.

And double ###. Does this, in turn, mean that when my “real” sites go HTTPS, then piwik--which lives on my personal site--will break, since it depends in its entirety on a long complicated query string being sent from one place to another?

If it takes the long complicated query string from the referrer, I'd assume it would break. You could change Referrer-Policy but that would likely still not include the query string.


At the end of the day though, learnt something new today (respectively yesterday :) ) .... TLS completely screws with the referrer :D
12:29 am on Oct 22, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10116
votes: 550


HTTP to HTTP = referrer
HTTP to HTTPS = no referrer usually, but depends on first request made by browser*
HTTPS to HTTP = no referrer
HTTPS to HTTPS = referrer

* it is not the page alone that affects this, it is also the browser's negotiation with the server.

Workarounds:
<meta name="referrer" content="always">
Header set Referrer-Policy: "no-referrer" 
Header set Referrer-Policy: "no-referrer-when-downgrade"
Header set Referrer-Policy: "origin"
Header set Referrer-Policy: "origin-when-cross-origin"
Header set Referrer-Policy: "same-origin"
Header set Referrer-Policy: "strict-origin"
Header set Referrer-Policy: "strict-origin-when-cross-origin"
Header set Referrer-Policy: "unsafe-url"
Personally, I use "no-referrer" to respect my visitor's privacy and to not misrepresent being a secure website.
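For anyone trying to keep the policy values straight, here is a simplified Python sketch of what a browser would send under each one. It is an approximation of the Referrer-Policy rules, not a full implementation: it treats only `https` as secure, ignores default ports and credentials embedded in URLs, and uses example.com as a placeholder destination. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

def origin_of(url: str) -> str:
    """Scheme and host only, with a trailing slash."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

def full_url(url: str) -> str:
    """The URL minus its fragment, which is never sent as a referrer."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", parts.query, ""))

def referer_sent(policy: str, source: str, dest: str) -> Optional[str]:
    """Approximate Referer value for a link from source to dest, or None."""
    src, dst = urlsplit(source), urlsplit(dest)
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    downgrade = src.scheme == "https" and dst.scheme != "https"

    if policy == "no-referrer":
        return None
    if policy == "unsafe-url":
        return full_url(source)
    if policy == "origin":
        return origin_of(source)
    if policy == "same-origin":
        return full_url(source) if same_origin else None
    if policy == "origin-when-cross-origin":
        return full_url(source) if same_origin else origin_of(source)
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full_url(source)
    if policy == "strict-origin":
        return None if downgrade else origin_of(source)
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full_url(source)
        return None if downgrade else origin_of(source)
    raise ValueError(f"unknown policy: {policy}")

search = "https://www.google.com/search?q=widgets"
for policy in ("origin", "strict-origin", "no-referrer-when-downgrade"):
    print(policy, "->", referer_sent(policy, search, "http://example.com/"))
```

Under this model, a policy such as "origin" on the linking page would produce exactly the kind of bare scheme-plus-host referer discussed earlier in the thread, even when the destination is plain HTTP.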
2:24 am on Oct 22, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


In Experiment A, linking from this site to my personal HTTPS site, no referer was sent.

In Experiment B, linking from my HTTPS test site to my personal HTTPS site, both living on the same server, a full referer was sent.

Is it really "referrer" with two Rs* just when we've got totally used to misspelling it?

:: wandering off to figure out what the heck a “Refer(r)er-Policy” header does ::

Oh. It's brand-new. That explains why I've never heard of it. (Not, ahem, that I would necessarily have heard of it anyway. It just gives me a better excuse.)


* Yeah, yeah, all right. Four Rs.
3:00 am on Oct 22, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10116
votes: 550


Is it really "referrer" with two Rs* just when we've got totally used to misspelling it?
Proper spelling is with two Rs. Some geek spelling uses one. Pretty stupid. The only reason I even care is which way works with the server.
4:19 am on Oct 22, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


I need to be doubly careful here, because misspelled headers are one of my robot flags. Yes, it's an actual environmental variable called bot_header.
5:10 am on Oct 22, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10116
votes: 550


BTW - using Header set Referrer-Policy: "no-referrer" on your HTTPS site does not stop referrers in your logs, it stops reverse tracking from your page to the next HTTPS page/site the visitor goes to, if following a direct link from your page.

Using this Header (or meta tag equivalent) protects your visitor (as I mentioned above) and IMO is what all this secure stuff is really about.
5:30 pm on Oct 22, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


What does “reverse tracking” mean?

I can see where “Header set” can’t possibly affect your logs, since that’s a record of requests, while what you’re setting is a response header. The request has already happened.
5:59 pm on Oct 22, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10116
votes: 550


By "reverse tracking" I meant someone like you, looking at their piwik, or other stats/analytics report, and following referrers. This is also done effectively by bots, and when it's done all the way to the point of sale origin, it could be compromising.
6:08 pm on Oct 22, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


You mean like tracking outlinks from my site? Heh, that's my favorite feature of piwik. I love knowing that when they leave, as leave they must, they’re going where I sent them.
6:32 pm on Oct 22, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10116
votes: 550


Not you... someone *like* you (sounds like a broadway musical tune.)

Other people, man-in-the-middle hacks, marketing bots, etc are blocked from seeing where the user came from by using the Referrer-Policy: "no-referrer"
11:15 am on Oct 23, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3165
votes: 8


Referrer policy is not the only HTTPS security header required to make a site REASONABLY secure. I've made a new posting on this called "HTTPS Security Headers".
6:34 pm on Oct 24, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14259
votes: 552


One more google-referer quirk. In recent weeks I've seen a fair number of robots.txt requests coming in with a bare-google referer, whether http or https. This is of course a flagrant lie--also a pointless one, since absolutely everyone is allowed to see robots.txt--but wtf?
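If anyone wants to count these in their own logs, a minimal Python sketch along these lines might do. It assumes Apache's "combined" log format and user-agent strings without embedded quotes; the log lines, IPs and bot names below are invented for illustration.

```python
import re

# Assumed layout (Apache "combined"):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
BARE_GOOGLE = re.compile(r"^https?://(?:www\.)?google\.com/?$")

def suspicious_robots_requests(lines):
    """Yield (host, referer, agent) for robots.txt hits with a bare google referer."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        if m["path"] == "/robots.txt" and BARE_GOOGLE.search(m["referer"]):
            yield m["host"], m["referer"], m["agent"]

# Made-up log lines, not real traffic.
sample = [
    '203.0.113.5 - - [24/Oct/2017:18:30:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "https://www.google.com/" "ExampleBot/1.0"',
    '203.0.113.9 - - [24/Oct/2017:18:31:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/2.0"',
    '198.51.100.7 - - [24/Oct/2017:18:32:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "http://www.google.com/" "Mozilla/5.0"',
]

for hit in suspicious_robots_requests(sample):
    print(hit)
```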