cookies and redirecting serps

GOOGLE.COM COOKIES and DATAMINING

         

plumsauce

11:05 pm on Aug 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



+++

A while ago there was a thread about seeing redirecting
URLs as the links in the google search result pages.
These typically had endings including e=933.

At that time I started noticing the same thing was
happening when I searched google myself for the keywords
that I have an interest in. I also noticed that I
*always* had redirections as target links rather than
direct target links.

Today, I decided to experiment. As soon as I deleted and
banned the google.com cookie, I started getting direct links
again. Allowing a cookie to be set by going to the advanced
search preferences did not start the redirections again. I
may not have made it back into the lab rat sample just yet.

If you actually examine the cookie, there is a lot more
there than needed for the preferences that were set.

Especially an ID= value, which looks like a unique hash to
these eyes.

This might be irrelevant to most users, whether or
not they believe the party line that no "personally
identifiable" information is collected or retained.

However, if you constantly check certain websites
for keywords, and commands like site:, allinurl:, etc.
then having a unique id attached to all those searches
might not be a good idea.

It doesn't take too much imagination to see the
next step.

A data miner might say, hey, I have all these clustered
searches on related keywords, site statistics, and sites
*ALL CORRELATED BY A UNIQUE ID*, that I can run through my
BANBOT. Oh! And I can correlate that against ADWORDS
account logins since the cookie domain is google.com.
Great!

So who's using cookies and seeing these redirects in
the serps most of the time?

And yes, negative effects have been observed on
certain sites.

+++

GoogleGuy

7:29 am on Aug 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi plumsauce, welcome to WebmasterWorld. It's a long-known fact that Google sometimes does url redirections in order to do quality checks. Some of the other members may be able to provide pointers to this discussion in past threads. If you want more info, read about it in our privacy policy:
[google.com...]
(It's in the "Links to Other Sites" paragraph.)

For what it's worth, many search engines track every click, instead of only a tiny sample of clicks as a spot check. I think AV and ATW always do url redirects but use mouseover code to make it look like the clicks are direct, for example.

[P.S. Searching on Google will work fine if you want to disable cookies. We just won't be able to store your search settings/preferences.]

plumsauce

8:32 am on Aug 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googleguy, thanks for the welcome.

The first thing I did was search for previous threads
before posting using site search. And read them in full.

I have read the google privacy policy before. It is,
in my opinion, less than forthcoming. This is neither
your fault nor mine. It is also not unusual in its
vagueness in the internet world.

I am not commenting on the fact that google is usable
without cookies, although it is less than convenient,
since I used to set results=100 in the preferences.
Nor am I commenting on the use of cookies by other
search engines, partly since this is a google forum,
and partly because of the market power of google.

What I *am* commenting on is the datamining possibilities
given the use of a unique id assigned to a machine, or
in the case of windows nt/ie, machine user profile.

It should not be glossed over that once my profile
became of interest, perhaps due to the number of searches
done per day, the redirection became permanent until
I removed the cookie.

To take this a bit further, I would say that all of the
search preferences recorded in the cookie would be just
as effective from a user perspective without the need
to also record and pass back a unique user id hash.

To these eyes, the ID= value in the cookie looks very
much like an MD5 hash. These, when properly generated,
are unique to well past values in the trillions.
Actually, the number of unique possibilities (2^128) has
39 digits in it. For the sake of comparison, this
is far greater than the number of all available IPv4
addresses.
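
The arithmetic is easy to check. A minimal Python sketch,
assuming the ID really were a full 128-bit MD5 digest (that
is an assumption; nothing here confirms how the value is
actually generated):

```python
# Size of a 128-bit identifier space (the width of an MD5 digest)
# compared with the 32-bit IPv4 address space.
md5_space = 2 ** 128
ipv4_space = 2 ** 32

print(md5_space)             # total possible 128-bit values
print(len(str(md5_space)))   # number of decimal digits in that total
print(ipv4_space)            # total possible IPv4 addresses
print(md5_space // ipv4_space == 2 ** 96)  # how many times larger
```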

Finally, to emphasise how weak the privacy policy
is, I will say that while the data may not be
"personally identifiable", this is not very far
from *uniquely identifiable*.

One additional note to readers,

Since the cookie domain is .google.com, it is also
passed back to adwords.google.com as part of standard
browser behaviour. Unless you disable cookies.
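
To illustrate that browser behaviour, here is a rough
Python sketch of the usual suffix rule for cookie domains.
The function name domain_matches is made up for this
example, and real browsers apply more checks (path, secure
flag, public suffixes) than this simplified version does:

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Return True if a cookie scoped to cookie_domain is sent to host.

    Simplified suffix rule: a leading dot is ignored, and the host must
    either equal the domain or end with "." + domain.
    """
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


for host in ("www.google.com", "adwords.google.com", "example.com"):
    print(host, domain_matches(host, ".google.com"))
```

Under this rule any host ending in .google.com receives
the same cookie, and so the same ID= value.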

+++

andrewrab

12:23 pm on Aug 3, 2003 (gmt 0)

10+ Year Member



Plumsauce... nice post... and something I had noticed before myself...

I've always taken significant efforts to avoid 'tracking' of any kind --- but only by search engines -- I'm not paranoid in general...

What you're pointing out -- though maybe not 'USED' -- could very easily be used... just like the Google API for checking results, etc.

That's why I (and I'm guessing you have for good now too) stopped allowing cookies, stopped using the API, and all sorts of things... heck, better to be safe than sorry... even though we're NOT spamming -- we do A LOT and have A LOT of sites and client sites, and the last thing I need is for a new rule that says, 'okay, you're not spamming, but you're doing so much and getting such good results that you must be a good SEO and therefore, we're going to make life a lot harder for you.'

Great post though... and I think an important one.

Dolemite

6:55 pm on Aug 3, 2003 (gmt 0)

10+ Year Member



>[P.S. Searching on Google will work fine if you want to disable cookies. We just won't be able to store your search settings/preferences.]

Hey GoogleGuy,

How about an in-between option that can store these options and not any identifiers? I block google cookies, but sometimes end up allowing them briefly to store preferences.

That's my personal PITA, which I can deal with, but it seems like if Google took a stand on privacy, it might become the "hip" thing to do.

Yidaki

7:26 pm on Aug 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Unless you disable cookies.

... well said. :)

You can also trash your cookie on a daily or even more frequent basis.

However, all in all, i think "giving away the secrets of your google usage" is to your advantage. And you can always disable cookies if you're doing seo research or pron surfing. :)

Kackle

8:17 pm on Aug 3, 2003 (gmt 0)



A typical Google cookie:

Expires on January 17, 2038

PREF=ID=54c7d98b63a83c65

TM=1059941268

LM=1059941268

S=UFB_20EwIfi4mr8T

The ID number is your very own. If you delete your Google cookie, you'll get a new cookie with a new ID the next time you visit Google.

The toolbar also sends the cookie if you phone home for PageRank.

You may have some preferences set between the ID and the TM.

TM=1059941268 means 2003-08-03 20:07:48 GMT -- the time when you first got this cookie.

LM=1059941268 means 2003-08-03 20:07:48 GMT -- the last time you set some preferences.
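
If you want to decode the timestamps yourself, here is a short Python sketch. It assumes the fields are joined with ":" inside the PREF value (an assumption for this example; the fields are listed on separate lines above), and the helper names parse_pref and to_gmt are made up for illustration.

```python
from datetime import datetime, timezone

# Sample value from the post above, joined with ":" (the separator is
# an assumption; the post lists the fields on separate lines).
raw = "ID=54c7d98b63a83c65:TM=1059941268:LM=1059941268:S=UFB_20EwIfi4mr8T"


def parse_pref(value: str) -> dict:
    """Split a PREF-style cookie value into a {name: value} dict."""
    return dict(field.split("=", 1) for field in value.split(":"))


def to_gmt(seconds: str) -> str:
    """Format a Unix timestamp (seconds since 1970) as a GMT string."""
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


fields = parse_pref(raw)
print(fields["ID"])
print(to_gmt(fields["TM"]))   # 2003-08-03 20:07:48
print(to_gmt(fields["LM"]))   # 2003-08-03 20:07:48
```

Note that the sample ID is 16 hex characters, which is 64 bits.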

S= is probably a checksum to ensure data integrity. If you fiddle with your cookie, it will probably be detected and you'll get a new one.
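
That is only a guess about S=. For illustration, here is one common way a server can detect a fiddled cookie: a keyed hash (HMAC) over the other fields. The secret key, the SHA-256 choice, and the truncation to 16 characters are all assumptions for this sketch, not anything known about the real cookie.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it is never sent to the browser.
SECRET = b"example-secret-key"


def sign(payload: str) -> str:
    """Return a short hex tag derived from the payload and the secret."""
    digest = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return digest[:16]


def is_intact(payload: str, tag: str) -> bool:
    """True if the tag matches the payload, i.e. nothing was edited."""
    return hmac.compare_digest(sign(payload), tag)


payload = "ID=54c7d98b63a83c65:TM=1059941268:LM=1059941268"
tag = sign(payload)

print(is_intact(payload, tag))                           # unchanged cookie
print(is_intact(payload.replace("54c7", "0000"), tag))   # edited ID
```

Without the secret, a user who edits the ID cannot produce a matching tag, so the server can throw the cookie away and issue a fresh one.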

Obviously, the ID number is not needed to set preferences. Google is using it for some sort of tracking. Your IP number is recorded at the same time your cookie is read, and your search terms are recorded, and it's all date-time stamped. The only thing that remains is to zap it over to John Poindexter (if they aren't doing this already).

Yidaki

8:23 pm on Aug 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Expires on January 17, 2038

brrr - this makes it really evil, no!?

>The only thing that remains is to zap it over to John Poindexter (if they aren't doing this already).

Really? Damn!

hahahaha ... you guys sound really paranoid.

Don't tell me your daily searches - i might get paranoid as well. :)

>Obviously, the ID number is not needed to set preferences.

hmm, so how should google identify you (to set your personal prefs) without storing and reading a unique value? IP isn't unique, Timestamp isn't unique, User-Agent isn't unique ... so how?

Dolemite

9:04 pm on Aug 3, 2003 (gmt 0)

10+ Year Member



>For what it's worth, many search engines track every click, instead of only a tiny sample of clicks as a spot check. I think AV and ATW always do url redirects but use mouseover code to make it look like the clicks are direct, for example.

Yes, other SE's are generally worse about this than google. Though the cookie that expires in 2038 is a bit Orwellian.

The fact that some other SE's track every click and most others are at least worse than Google implies that they rely on this information more than Google.

If Google took a stand on privacy, it would make their competitors do the same, and assuming they do indeed rely upon this information more, it would affect them more significantly. A simple link on google.com that said "We Respect Your Privacy" and a page or two explaining the situation would set things off in a big way.

IMO, privacy is and will be one of the linchpin issues of the new internet. 95% of users might not know what a cookie is, but tell them that you're respecting their privacy (and follow through with it), then that's something they'll understand.

claus

9:55 pm on Aug 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> how should somebody .. set your personal prefs without storing and reading a unique value?

If the prefs aren't unique this is easy, just store and read the prefs. If prefs are unique, no chance. With the site in question i doubt that there is a setting not shared by at least two cookie-recipients.

So, to store prefs a unique id is not really necessary - plumsauce is absolutely right here.
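
A minimal Python sketch of that idea: the cookie value carries the settings themselves, so the server can read them back without any ID. The preference names (num, lang, safe) and the helper names are hypothetical, chosen only for this example.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical preference names; the real cookie fields may differ.
prefs = {"num": "100", "lang": "en", "safe": "off"}


def encode_prefs(settings: dict) -> str:
    """Serialize preferences into a cookie value with no ID field."""
    return urlencode(sorted(settings.items()))


def decode_prefs(value: str) -> dict:
    """Recover the preferences from the cookie value alone."""
    return dict(parse_qsl(value))


cookie_value = encode_prefs(prefs)
print(cookie_value)
print(decode_prefs(cookie_value) == prefs)
```

Two visitors with the same settings get the same cookie value, which is exactly why it can't be used to tell them apart.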

What the id is good for, on the other hand, is tracking. Personally, i don't know John Poindexter (really, i don't) or what he could potentially use a bunch of cookie-id's for, but assuming he is not a G employee this amount of data would probably be of no use to him. Even if you combine the id's with a total search history over, say 35 years i doubt it would be of interest to anyone outside Google.

Just the other day i G'ed a special term and found out that this had a significant other meaning as an obscure niche in the pr0n industry. I say this only to make a point not for the amusement - i'm not particularly abused by pr0n as it's been legal here since 1972 but that's not really the point either, let's say it was some extremist site instead or whatever is on the public opinion's agenda-of-the-day.

I've been using (cookie-based) tracking stats for years, and they are really important and can be extremely useful. What one tends to forget, though, is that the figures derived from such monitoring are never looked at at an individual level. I'll never look at, say, YOUR prefs or YOUR searches - i'll always look at THE TOTAL NUMBER of such prefs or such searches. Individuals - no matter how important they are, are just not interesting enough from a tracking perspective.

(unless, of course they are presidents or something else that may be able to generate a good story in the news, but that we just never know about when looking at the stats - plus, then it won't be tracking, there's another term for that: journalism)

A sample size of 1 is for almost all purposes statistically insignificant and largely unreliable - you'll always need a certain amount of individuals before you can conclude anything.

It's stupid that firms using these methods forget to inform clearly about this. I've had many talks with privacy-concerned people over the tracking methods i've used, and they are always comforted when i tell them just how totally ignorant i am to their pr0n-viewing habits and that i really don't give a d*** about who they are. Openness about this helps a great deal i have found.

Oh... of course i don't know if G really tracks SEOs or others on an individual basis. The "worst-case-scenario" is indeed possible as well, although it's far from personal in the literal sense (individual, yes - personal, no). Without information, however, one tends to add imagination.

/claus



<added>

Individual information is needed in order to derive the aggregate numbers. It is not important if it's jack, joe, or jill - what matters is that it's not all jack. Individual tracking is thus not necessarily individual - it's only a prerequisite to assure that it's not just one person's data you observe, as well as to derive some sense of scale (how large a set of observations are we watching?).

Collection and use of data are two separate processes, they're not the same. The individual nature of the data collection is actually ensuring us that we are not looking at individuals when we analyze data.
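
A small Python sketch of the difference, using made-up log rows (the ids and search terms are purely illustrative): the id is only there to count distinct visitors, and the report contains no ids at all.

```python
from collections import Counter

# Made-up log rows: (cookie_id, search_term). Purely illustrative.
log = [
    ("a1", "widgets"), ("a1", "widgets"), ("a1", "widgets"),
    ("b2", "widgets"), ("c3", "gadgets"), ("b2", "gadgets"),
]

# Raw counts: how many times each term was searched.
searches = Counter(term for _, term in log)

# Distinct ids per term: how many different visitors searched it.
visitors = Counter(term for _, term in set(log))

for term in sorted(searches):
    print(term, searches[term], visitors[term])
```

Here "widgets" has four searches but only two visitors, because three of the four came from the same id. Without the id that would look like four separate people.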

Here "we/us" does NOT mean Google as i'm in no way affiliated with the company. Rather, "we/us" refers to the special breed of people that find stats amusing or perhaps even work with such issues... they are out there, beware ;)

</added>