Tracking what searches are being performed through AdSense websearch

Using a form of a dynamic image bean

         

asp4bunnies

3:02 am on Jun 20, 2004 (gmt 0)

10+ Year Member



I'm not sure if Google intended to give us this kind of functionality or not, but the ability for us to include our own logo on their web search page gives us the ability to see the following on our own:

a) Exactly how many searches are performed from our sites. This is useful for comparing our own statistics with what Google shows us in the Search report statistics.
b) Exactly what search terms are being used on our sites, and from which specific pages. This is useful for a variety of reasons, most of which involve getting to know exactly what your visitors are looking for when they visit your page and providing it to them so they won't need to search. There is also a major concern for privacy abuse, which I will address below.

So here's how to do it. First off, I'll state for the record that you DO need some ASP, CGI or PHP background. I will be developing a script in ASP that does what I'm about to describe (except for the privacy-abusive aspects) and I will post it in this thread when it's ready. If someone could produce one in PHP that would be great:

1) In ASP: Get a third-party DLL, such as ASPJpeg, AspImage or Shotgraph (search Google for them), that allows you to import an image and display its output via your script. This is not strictly needed though, as you can do this largely through XML functionality (though it's not as stable to do it this way). PHP and ASP.NET have this functionality built in (fairly stable too), so no third-party DLL is needed.

Essentially what you'll be doing with the script is reading binary data (your image file) and outputting it through your script (i.e. ASP). So even though your script's URL ends in .php or .asp, when you go to that URL in the browser, all you will see is the image file it has processed.

2) Add logging functionality. Because of your access to different data via your script you can see and log the following relevant data:

a) the http referrer of your image script (i.e. the google page that called your image script when attempting to display it as your logo). This referrer is the actual page url of google's search results.
b) the page it was performed from (you'll need to be using page-specific channels to get this data).
c) the time it was performed.
d) the IP address of the person who performed the search (privacy issues - be careful with this one), which can be useful for determining the searcher's geographic location, but probably can't be used for much else. If someone is searching for child pornography however, the fact that your website is tracking by IP address on an individual basis might create some obligations on you, the webmaster, to alert authorities, depending on the laws of your country.
e) the cookies of the person who performed the search (as applies to the web domain where the image script is located, NOT to Google's cookies). This is also a MAJOR privacy issue, but is useful if you run a community site and store a member's username in their cookies. This would theoretically allow you to track searches performed by identifiable individuals about whom you have more information than just an IP address.

3) Post the script into Adsense: When you go to fill in the web search form in Google's Adsense report page, simply give the url of the script in the spot where you are supposed to put in the logo.

That's all there is to it. You now have data-mining gold. Or a very ethically-sticky situation.
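Steps 1 and 2 could be sketched roughly as follows. This is only an illustration in server-side JavaScript (Node.js) rather than ASP or PHP, and several things in it are assumptions: the names `LOGO_BYTES`, `buildLogEntry` and `handleLogoRequest` are made up, the inlined 1x1 GIF stands in for a real logo file, and `channel` is a guess at the parameter name a page-specific channel would use in the results URL. It logs only the referrer, the search terms, the channel and the time, and deliberately leaves out the IP address and cookies.

```javascript
// 1x1 transparent GIF, inlined as a stand-in for the real logo file.
const LOGO_BYTES = Buffer.from(
  'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7', 'base64');

// Build one log record from the incoming image request (items a to c above).
function buildLogEntry(req, now = new Date()) {
  // The referrer should be the URL of the Google results page showing the logo.
  const referrer = req.headers['referer'] || '';
  let query = null;
  let channel = null;
  try {
    const params = new URL(referrer).searchParams;
    query = params.get('q');          // search terms
    channel = params.get('channel');  // assumed parameter name for the channel
  } catch (err) {
    // Missing or malformed referrer: keep only the raw value.
  }
  return { time: now.toISOString(), referrer, query, channel };
}

// Request handler: record the hit, then answer with the image bytes.
function handleLogoRequest(req, res, log = console.log) {
  log(JSON.stringify(buildLogEntry(req)));
  res.writeHead(200, {
    'Content-Type': 'image/gif',
    'Content-Length': LOGO_BYTES.length,
    'Cache-Control': 'no-store', // ask browsers to re-request the logo on every search
  });
  res.end(LOGO_BYTES);
}

// Wiring, not run here:
// require('http').createServer(handleLogoRequest).listen(8080);
```

The handler's URL is what would go in the logo field in step 3. Writing to `console.log` is a placeholder for whatever storage you actually use.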

EDIT: I don't think I'm going to create the script after all, as I see too much room for abuse. It's my personal recommendation that Google limit the file extensions to .gif, .png or .jpg to limit this ability in webhosts (though that still won't limit everyone).

richmondsteve

3:36 pm on Jun 20, 2004 (gmt 0)

10+ Year Member



asp4bunnies wrote:
EDIT: I don't think I'm going to create the script after all, as I see too much room for abuse. It's my personal recommendation that Google limit the file extensions to .gif, .png or .jpg to limit this ability in webhosts (though that still won't limit everyone).

If your web server is logging requests for image files then this type of data is recorded in your web server logs anyway. On standard Apache installs everything you mentioned except cookie details is logged. It seems you're already aware that a web server can be configured to parse these image files as PHP, ASP or another language regardless of the file extension used. Mining the data to learn about user search behavior can definitely be quite useful. Thanks for pointing this out for others, asp4bunnies.
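As a rough illustration of pulling this out of the logs, the sketch below picks the logo hits out of Apache access-log lines. It assumes the "combined" log format (the one that records the referrer) and a hypothetical logo path of `/images/logo.gif`; `parseLogoHit` is an illustrative name, and the pattern is deliberately simple, so lines with escaped quotes in the user agent will not match.

```javascript
// Apache "combined" format:
// host ident user [time] "request line" status bytes "referrer" "user agent"
const COMBINED_LINE =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"$/;

// Return the interesting fields if the line is a request for the logo, else null.
function parseLogoHit(line, logoPath) {
  const match = COMBINED_LINE.exec(line);
  if (!match) return null;
  const requestedPath = match[4].split('?')[0]; // ignore any query string
  if (requestedPath !== logoPath) return null;
  return { ip: match[1], time: match[2], referrer: match[6] };
}
```

The `referrer` field of each hit is then the URL of the search results page that displayed the logo.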

asp4bunnies

3:48 pm on Jun 20, 2004 (gmt 0)

10+ Year Member



Yes, I was. I'm still rather surprised that Google is letting websites track this kind of information by letting us use our own logo file. If it's an oversight, it's not the kind I'd expect from Google's engineers. If it's deliberate, then what's in it for them? As far as I can see, associating their logo with this kind of potential for privacy abuse can only have negative consequences.

killroy

2:02 am on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm, doesn't give me anything my own site search doesn't give me already. Where is the great potential for abuse?

PS: Hasn't the age-old regular SiteSearch let you do the same thing for years now?

SN

level80

5:45 am on Jun 21, 2004 (gmt 0)

10+ Year Member



This facility to track searches through the logo was available before their websearch was extended to Adsense. For a very long time Google offered a free website search (extremely similar to Adsense's except you didn't get paid) which allowed exactly the same tracking of search terms used by checking through the log files for the logo you are using.

To be perfectly honest, all you need is the raw log file. In any type of analyser, such as Analog (or even a text editor), you can separate out the hits to the logo graphic.

From the referrer recorded with each of those hits you can see the search terms used. When I first discovered this (at least a year ago) there were so many search terms for pages not on the site that I made a list, from which I tried to add a few pages that people wanted there but hadn't found. Of course I sorted the search terms list by frequency, and due to the popularity of the site I wouldn't have been physically able to keep adding pages covering all the search terms.
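A frequency-sorted list like that could be produced along these lines. This is only a sketch: `countSearchTerms` is a made-up name, the input is assumed to be the referrer URLs already pulled out of the log, and the search terms are assumed to travel in a `q` parameter.

```javascript
// Count how often each search term appears in a list of referrer URLs
// and return [term, count] pairs, most frequent first.
function countSearchTerms(referrers) {
  const counts = new Map();
  for (const referrer of referrers) {
    let query;
    try {
      query = new URL(referrer).searchParams.get('q');
    } catch (err) {
      continue; // "-" or otherwise unparseable referrer
    }
    if (!query) continue;
    // Treat "Blue Widgets" and "blue widgets" as the same term.
    const term = query.trim().toLowerCase();
    if (!term) continue;
    counts.set(term, (counts.get(term) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

Terms near the top of the result that have no matching page on the site are the candidates for new content.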

jomaxx

5:53 am on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think "abuse" is rather an exaggeration. Plus if someone does a web search on a page prominently co-branded with your site, I don't see how there can be a very high expectation of privacy anyway.

killroy

12:03 pm on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In fact there is more potential for abuse in the fact that Google has this info (which rightfully belongs to you, if you use your own in-site search engine) as they can combine many searches from all over the web.

So it's not really the issue that you know; you SHOULD know, it's a search of YOUR site after all. The issue is more that it's none of Google's business what people search for on your site.

SN

jonathanleger

1:01 pm on Jun 21, 2004 (gmt 0)

10+ Year Member




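// Runs when the search form is submitted: requests a tracking image whose
// query string carries the search details to the PHP logging script.
function ast_logsearch(e) {
    var bug = new Image();
    // "web" when the first sitesearch radio button is checked,
    // otherwise the domain(s) the search is restricted to.
    var sd = ast_formObj.sitesearch[0].checked ? "web" : encodeURIComponent(ast_formObj.domains.value);
    bug.src = 'http://www.mydomain.com/at.php' +
        '?ref=' + encodeURIComponent(document.location.href) +  // page holding the search box
        '&q=' + encodeURIComponent(ast_formObj.q.value) +        // search keywords
        '&sd=' + sd +                                            // search domain
        '&r=' + encodeURIComponent(document.referrer) +          // how the visitor reached this page
        '&dt=' + new Date().valueOf();                           // timestamp, also makes each URL unique
}

// Find the Google search form on the page and hook its submit event.
var ast_formObj;
var elements = document.getElementsByTagName("form");
for (var i = 0; i < elements.length; i++) {
    if (elements[i].action.indexOf('www.google.com/custom') > -1) {
        elements[i].onsubmit = ast_logsearch;
        ast_formObj = elements[i];
    }
}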

This is the script I'm using to capture the WebSearches and post them into a MySQL database (modified somewhat to pull out the PHP tags).

It traps the submit event of the form so that when the form is submitted it can post the search keywords, search domain, referrer and page data to the PHP script.

It's working great for me. :)
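The receiving side is not shown above. Purely as an illustration, and in server-side JavaScript rather than the PHP actually used, the beacon's query string could be unpacked like this before being stored. `parseBeacon` and the output field names are made up, and the sketch assumes the values arrive as ASCII or UTF-8 percent-encoded text.

```javascript
// Turn the beacon request URL (e.g. "/at.php?ref=...&q=...") into a record
// ready to be written to a database.
function parseBeacon(requestUrl) {
  // The base URL is only needed so that a path-only URL can be parsed.
  const params = new URL(requestUrl, 'http://localhost').searchParams;
  const dt = Number(params.get('dt'));
  return {
    page: params.get('ref') || '',            // page the search box was on
    keywords: params.get('q') || '',          // what the visitor searched for
    searchDomain: params.get('sd') || 'web',  // "web" or the restricted domain
    referrer: params.get('r') || '',          // how the visitor reached that page
    // dt is the browser's clock in milliseconds; null if absent or not a number.
    clientTime: Number.isFinite(dt) && dt > 0 ? new Date(dt).toISOString() : null,
  };
}
```

Since `dt` comes from the visitor's own clock, a server-side timestamp recorded at insert time is probably the more trustworthy of the two.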

asp4bunnies

3:15 pm on Jun 21, 2004 (gmt 0)

10+ Year Member



Well, the abuse comes from the combination of Google's search platform and your access to your members' identifiable cookies.

If a user sees the Google logo they may think it's purely a separate form and isn't tracked by you. They know that Google doesn't have their personal user information and might do a search for "porn" whereas on your own site search, where they know you do have access to their identity, they wouldn't perform such a search.

That's my take on it anyway. I'm not saying we're not entitled to know what people are searching for on our own sites. But where someone thinks he's searching the web through Google only (and doesn't realize we can track the info too), that's where it gets tricky.

As for the ability to already track this through Google's web search, I hadn't known that, but it does make sense.

jomaxx

10:05 pm on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It definitely is a security loophole of sorts, but think about the fact that every time someone clicks on a search engine link to get to a website, that site is shown the exact search the user performed in the referrer field.

I have a mainstream site, but nonetheless I've seen some pretty funky searches in my logs.

Referrer field info is the sort of thing that would be considered a major security breach if it weren't absolutely commonplace. Like the way email is transmitted unencrypted, or the way ad networks can track you from site to site to site.

level80

11:46 pm on Jun 21, 2004 (gmt 0)

10+ Year Member



If a user sees the Google logo they may think it's purely a separate form and isn't tracked by you.

There's nothing to stop you having a 1 pixel by 1 pixel white logo. ;)