
Forum Moderators: incrediBILL & martinibuster


Secret AdSense Publisher ID Data Harvesting by Domain Company

Privacy compromised revealing entire domain/site portfolios

     
2:08 am on Feb 27, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I just happened to Google my AdSense publisher ID a short time ago--not sure why, just curiosity, I guess--and, lo and behold, I find a domain company has been crawling the web with its bots, paying special attention to the AdSense ID of each website it crawls. Then it automatically compiles a list of every single website that uses that ID.

I can't reveal the URL of the company doing it, but you may want to see if your ID has been lifted from your site. (I checked somebody else's ID found via View Source, but Google doesn't have it indexed in SERPs.)

Hmmm... not sure it's a bad thing for me personally, but I do have a mixture of websites some of which are personal, some only business. That's because Google only lets publishers have one account, and that has been its strict policy for a long time/always.

In a related issue, it seems the same website that harvests your Google AdSense information also looks at Google Analytics IDs--the UA number.
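To make the exposure concrete, here is a minimal sketch of how such a harvester could work, assuming it simply pattern-matches page source. The function names (`extractIds`, `buildIndex`) and the ID shapes in the regular expressions are my own assumptions for illustration, not anything taken from the site in question.

```javascript
// Hypothetical sketch: the ID shapes below are assumptions based on how
// the IDs commonly appear in page source ("pub-" plus 16 digits, "UA-n-n").
const PUB_ID_PATTERN = /\b(?:ca-)?pub-\d{16}\b/g;
const UA_ID_PATTERN = /\bUA-\d{4,10}-\d{1,4}\b/g;

// Return the distinct publisher and Analytics IDs found in one page.
function extractIds(html) {
  const pubIds = (html.match(PUB_ID_PATTERN) || []).map((id) =>
    id.replace(/^ca-/, "") // treat "ca-pub-..." and "pub-..." as the same ID
  );
  const uaIds = html.match(UA_ID_PATTERN) || [];
  return [...new Set([...pubIds, ...uaIds])];
}

// Group crawled pages by ID: Map of ID -> sorted list of domains using it.
function buildIndex(pages) {
  const index = new Map();
  for (const { domain, html } of pages) {
    for (const id of extractIds(html)) {
      if (!index.has(id)) index.set(id, new Set());
      index.get(id).add(domain);
    }
  }
  return new Map([...index].map(([id, domains]) => [id, [...domains].sort()]));
}
```

Feeding `buildIndex` a list of `{ domain, html }` objects yields one entry per ID, which is all it takes to tie a portfolio together once the pages have been crawled.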

Is it impossible for Google to give publishers privacy? Couldn't the snippet of Google Adsense code be set up by domain instead of the ID tag?

I don't know how many publishers would be interested in this but I suspect potentially a few. The bad news is I don't know of anything that can be done immediately on the publishers' side to enable privacy.

Speaking of privacy, even if you have Domain Privacy on every domain in your portfolio but one, each domain still gives away your AdSense ID, so people can find out who you are via this site. This is good to know for anyone with a controversial site amongst a large portfolio of domains!

p/g

5:31 am on Feb 27, 2009 (gmt 0)

5+ Year Member



Google only lets publishers have one account, and that has been its strict policy for a long time/always.

I have several accounts (with Google's prior approval) - One corporate, one personal, another for a separate joint venture. It seems that all Google really wants is 1) a separate tax id, and 2) to know that you aren't scamming them.

And yes, someone harvesting AdSense Publisher IDs is disturbing, though I'm sure it's been going on longer than we all thought. After all, those Publisher IDs are sitting there just ready to be plucked...

[edited by: inactivist at 5:35 am (utc) on Feb. 27, 2009]

5:31 am on Feb 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree with you this would be troubling.

Is it impossible for Google to give publishers privacy? Couldn't the snippet of Google Adsense code be set up by domain instead of the ID tag?

One problem with that is some web sites have multiple publishers working on it. Other web sites use some kind of revenue sharing system.

7:00 am on Feb 27, 2009 (gmt 0)

5+ Year Member



My ID# has been out there since 2004 and is not indexed by Google.

Your search - "my pub id#" - did not match any documents.

7:07 am on Feb 27, 2009 (gmt 0)

WebmasterWorld Senior Member lame_wolf is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Mine does, but only on MySpace, where kids have copied things from my site incorrectly.

7:57 am on Feb 27, 2009 (gmt 0)

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I don't know of anything that can be done immediately on the publishers' side to enable privacy.

Identify their spider and the IP ranges they're coming from, then block them. More information here [webmasterworld.com].

8:59 am on Feb 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting - when I Google my pub ID, I find the site - a German site.
It only has one of my sites.

Very odd thing. Could any legitimate use be made of this?

Mods - should we out the site?

9:21 am on Feb 27, 2009 (gmt 0)

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



No site outing, thanks. However, you may want to check with incredibill, the spider mod, and see what can be done about identifying the spider and what can be published in the spider forum.

9:36 am on Feb 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good idea, MB - I'll shoot him a sticky.

10:00 am on Feb 27, 2009 (gmt 0)

5+ Year Member



@leadegroot "Could any legitimate use be made of this?"

Well, I know of two sites for spying on Google AdSense IDs, and I do use such a database of trackable site IDs because it's a nice way to check what my competitors are doing. Also, you can uncover site networks by spying on AdSense IDs and learn quite a few interesting things.

Now about how to keep them off your butt: it's not really possible as of now if they are determined to get your ID... But you can make things more difficult for their bots by using a script that calls your AdSense script (encapsulating your AdSense script in another one, I mean). But if the bot collects IDs once the page is in the DOM (once HTML and JavaScript have been interpreted), then there can be no protection unless Google AdSense decides to add some new feature to create random IDs on a per-site or per-ad basis.
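A rough sketch of that idea, assuming the harvester only scans static source: keep the ID in pieces and join them when the script runs. Everything here (`assembleClientId`, `ID_PIECES`, `naiveScan`) is hypothetical and the ID is a placeholder. Whether the AdSense terms allow wrapping the ad code this way is not something this sketch answers.

```javascript
// Hypothetical helper: join the pieces only at run time, so the full ID
// never appears as one literal string in the page source.
function assembleClientId(pieces) {
  return pieces.join("");
}

// Placeholder pieces, not a real publisher ID.
const ID_PIECES = ["pub-", "00000000", "00000000"];

// What a simple bot might do: scan text for the usual ID shape (assumed).
function naiveScan(source) {
  return source.match(/pub-\d{16}/g) || [];
}

// The line that would sit in the wrapper script served to visitors.
const staticSource =
  `var google_ad_client = assembleClientId(${JSON.stringify(ID_PIECES)});`;

// Static scan finds nothing; scanning the evaluated value finds the ID.
console.log(naiveScan(staticSource).length);
console.log(naiveScan(assembleClientId(ID_PIECES)).length);
```

The second scan is the catch: a bot that executes the JavaScript and reads the resulting value sees the complete ID, so this only raises the bar for the simplest crawlers.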

12:43 pm on Feb 27, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Clever trick. I'm not sure what's worse, this particular site that's fully indexed by Google, or another similar service (in English) that actually sells this information and has a much bigger database but isn't publicly available (i.e. the results aren't indexed, you have to pay first). The latter, which is quite easy to find, seems to be relatively popular and has over 700k sites in its index, whereas the German site claims 200k. I tried a few Adsense IDs, without paying, and they seemed to provide pretty complete results (they show you how many domains and subdomains use the ID and then make you pay to see them). This allows anyone to see if competitors are doing anything shady like arbitrage. A new can of "outing" worms. Additionally, the owner claims that what they are doing is perfectly legal and says they do not listen to, or even open, robots.txt files. Any blocking of the robot would have to happen on the basis of IP(s), but for some reason I doubt they are crawling with a user-agent that will easily identify them.

Correction: it looks like they have a 'sitemap' of search result pages, so it appears they are at least trying to get all the IDs indexed by search engines.

6:44 am on Feb 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



they do not listen to, or even open, robots.txt files

Which is actually great if you have bot traps that block them automatically. Spam bots that follow robots.txt rules, slowly index your site from many different IPs, with various user agents, and at a reasonable speed, are a lot more difficult. Did I say too much?
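For anyone who hasn't set one up, a bot trap can be sketched roughly like this, assuming a path that robots.txt disallows and that no human-visible link points to. The `BotTrap` class and the `/trap/` path are made up for illustration; a real setup would hook this into the web server and persist the block list.

```javascript
// Hypothetical bot trap: any client that requests a disallowed trap path
// has ignored robots.txt, so its IP is blocked from then on.
class BotTrap {
  constructor(trapPaths) {
    this.trapPaths = new Set(trapPaths);
    this.blockedIps = new Set();
  }

  // robots.txt body telling compliant crawlers to stay out of the trap.
  robotsTxt() {
    const rules = [...this.trapPaths].map((path) => `Disallow: ${path}`);
    return ["User-agent: *", ...rules].join("\n") + "\n";
  }

  // Return the HTTP status to send for this request.
  handle(ip, path) {
    if (this.blockedIps.has(ip)) return 403;
    if (this.trapPaths.has(path)) {
      this.blockedIps.add(ip); // remember the offender
      return 403;
    }
    return 200;
  }
}
```

Usage would be something like `const trap = new BotTrap(["/trap/"])`, then calling `trap.handle(ip, path)` on every request. It catches exactly the bots described above, the ones that never open robots.txt, and does nothing against the patient, well-behaved kind.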

8:06 am on Feb 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



and spookily enough, someone visited my site via the shadow site some 3 hours after I posted.
Coincidence? Probably reconfirming the data once it had been viewed.
I think we aren't supposed to post IPs? But the whois says it's a static ADSL in .nl

8:14 am on Feb 28, 2009 (gmt 0)

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I think we aren't supposed to post IPs...

Discuss with incredibill. He's the Spider Mod. ;)

1:10 pm on Feb 28, 2009 (gmt 0)

WebmasterWorld Senior Member hobbs is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Someone I know reported the German site to be removed from the Google index last week.

Finding and blocking those sites is a simple matter of Googling a pub ID, finding out the IP of the site, doing an arin.net or ripe.net IP lookup, then blocking the hosting company's IP range (if you have cPanel, its IP Deny Manager does it for you). But they could very well be crawling and populating their database from an ADSL connection, which renders the above useless.
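If you'd rather do the range check in your own code than in cPanel, a minimal sketch looks like this, assuming IPv4 and a CIDR range taken from the whois lookup. The function names are hypothetical and the addresses in the usage note are reserved documentation ranges, not the crawler's.

```javascript
// Convert a dotted IPv4 address to a number, rejecting malformed input.
function ipv4ToInt(ip) {
  const octets = ip.split(".");
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) throw new Error(`Invalid IPv4 address: ${ip}`);
  return octets.reduce((acc, o) => acc * 256 + Number(o), 0);
}

// True if `ip` falls inside the CIDR range, e.g. "192.0.2.0/24".
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`Invalid CIDR range: ${cidr}`);
  }
  const blockSize = 2 ** (32 - bits); // number of addresses in the range
  return (
    Math.floor(ipv4ToInt(ip) / blockSize) ===
    Math.floor(ipv4ToInt(base) / blockSize)
  );
}

// True if `ip` is inside any of the blocked ranges.
function isBlocked(ip, blockedRanges) {
  return blockedRanges.some((range) => inCidr(ip, range));
}
```

For example, `isBlocked(visitorIp, ["192.0.2.0/24"])` could gate each request. As noted above, it only helps while the crawler stays inside ranges you have identified.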

The interesting question is whether it is illegal to list someone else's pub id, and whether Google has a legal standing to go after those sites other than traffic starving them by removing them from the index.

Perhaps the owners of such sites know that their odds for monetizing are grim, and they are doing it just to gain notoriety.

2:17 pm on Feb 28, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



and spookily enough, someone visited my site via the shadow site some 3 hours after I posted.

Oops! Sorry, Lea, that was me, actually, conducting what I thought was a harmless little experiment to find out how difficult it would be to find anyone's site network with these tools. Not too difficult, as it turned out. I apologize for the confusion and, particularly, the intrusion.

I don't think the owners of the other, more popular, English tool will care much about complaints. Everything looks rather shady. Some research on the info found in their (Hong Kong) whois record reveals a stunning number of criminal allegations, although I can't be sure if the registrant info also reflects the ownership of their site. The site is apparently hosted in the UK and payments are processed in the Netherlands. Fishy. Will Google care enough to find a solution for publishers? I doubt it.

2:58 pm on Feb 28, 2009 (gmt 0)

5+ Year Member



I wonder if we are allowed to put our pub ID in an external javascript file and not put it in the Adsense script in the HTML file.

<script type="text/javascript"><!--
// google_ad_client is global
google_ad_width = 300;
google_ad_height = 250;
...

But the Adsense bot may take exception to not seeing a proper ID and trigger an account review.

6:37 pm on Feb 28, 2009 (gmt 0)

WebmasterWorld Senior Member swa66 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I'd love to have multiple publisher IDs on my one account ... Can't be that hard to allow us to have alternate IDs due to us not wanting to link some of our sites.

7:46 pm on Feb 28, 2009 (gmt 0)

5+ Year Member



My ID has been out there since '05 and isn't listed anywhere except on one site, but not the one mentioned here.

6:01 am on Mar 1, 2009 (gmt 0)

5+ Year Member



Why not use DoubleClick-style tags? Google now owns DoubleClick. Each website's JavaScript comes with a unique URL like:
doubleclick.net/yoursite.com/blah

Or they can simply use domain or encrypted public ID for each ad slot.

6:13 am on Mar 1, 2009 (gmt 0)

WebmasterWorld Senior Member lame_wolf is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Better still...
Now that Google owns DoubleClick, why not just shut it down?
Things have become worse since including them.
My 2c

3:22 pm on Mar 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Create a few pages on your site that are pretty worthless. Interlink them. Have just one minor link to one of these pages, i.e. give Google and other bots a way in. Then put other people's publisher IDs on those pages. Lots of different IDs on different pages. Get reputable IDs from the big boys - NYT, Amazon, Ask, AOL. You don't lose any income, you don't lose any PR, it's within the AdSense TOS and you frustrate those data collectors ;)

2:08 pm on Mar 20, 2009 (gmt 0)

10+ Year Member



How is this different than any other "spy" tool out there that grabs advertisers' keyword lists, ad copy, SEO rankings, backlinks, etc.?

It's been going on for years in other areas of online marketing; it was only a matter of time before it hit the AdSense side of things.

 
