|Secret AdSense Publisher ID Data Harvesting by Domain Company|
Privacy compromised revealing entire domain/site portfolios
I happened to Google my AdSense publisher ID a short time ago--just curiosity, I guess--and, lo and behold, I found that a domain company has been crawling the web with its bots, paying special attention to the AdSense ID of each website it crawls. It then automatically compiles a list of every website that uses that ID.
I can't reveal the URL of the company doing it, but you may want to see if your ID has been lifted from your site. (I checked somebody else's ID found via View Source, but Google doesn't have it indexed in SERPs.)
Hmmm... not sure it's a bad thing for me personally, but I do have a mixture of websites, some personal and some strictly business. That's because Google only lets publishers have one account, and that has been its strict policy for a long time/always.
In a related issue, it seems the same website that harvests your Google AdSense information also looks at Google Analytics IDs--the UA number.
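For anyone wondering how little effort this kind of harvesting takes, here is a rough Python sketch of the idea. It is not the company's actual code, which none of us has seen. The ID patterns are assumptions based on what appears in View Source ("pub-" plus digits, sometimes prefixed with "ca-", and "UA-" plus two number groups), and the function names and example domains are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed ID shapes, based on what is visible in page source:
#   AdSense:   "pub-<digits>", optionally written as "ca-pub-<digits>"
#   Analytics: "UA-<account>-<profile>"
PUB_ID = re.compile(r"\b(?:ca-)?(pub-\d+)\b")
UA_ID = re.compile(r"\bUA-\d+-\d+\b")


def extract_ids(html):
    """Return every AdSense or Analytics ID found in one page's HTML."""
    pub_ids = set(PUB_ID.findall(html))  # "ca-" prefix is dropped by the group
    ua_ids = set(UA_ID.findall(html))
    return pub_ids | ua_ids


def group_by_id(pages):
    """Map each ID to the set of domains whose HTML contains it.

    `pages` is a dict of {domain: html} that a crawler would have collected.
    """
    index = defaultdict(set)
    for domain, html in pages.items():
        for ident in extract_ids(html):
            index[ident].add(domain)
    return index
```

Feed it the HTML of a few thousand crawled pages and every site sharing one ID ends up in the same bucket, which is exactly the "portfolio" view described above. You can also run `extract_ids` over your own pages to see what you are exposing.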
Is it impossible for Google to give publishers privacy? Couldn't the snippet of Google Adsense code be set up by domain instead of the ID tag?
I don't know how many publishers would be interested in this, but I suspect at least a few. The bad news is I don't know of anything that can be done immediately on the publishers' side to enable privacy.
Speaking of privacy: even if you have Domain Privacy on every domain but one in your portfolio, each domain still gives away your AdSense ID, so people can find out who you are via this site. This is good to know for anyone with a controversial site among a large portfolio of domains!
|Google only lets publishers have one account, and that has been its strict policy for a long time/always. |
I have several accounts (with Google's prior approval) - One corporate, one personal, another for a separate joint venture. It seems that all Google really wants is 1) a separate tax id, and 2) to know that you aren't scamming them.
And yes, someone harvesting AdSense Publisher IDs is disturbing, though I'm sure it's been going on longer than we all thought. After all, those Publisher IDs are sitting there just ready to be plucked...
[edited by: inactivist at 5:35 am (utc) on Feb. 27, 2009]
I agree with you this would be troubling.
|Is it impossible for Google to give publishers privacy? Couldn't the snippet of Google Adsense code be set up by domain instead of the ID tag? |
One problem with that is that some web sites have multiple publishers working on them. Other web sites use some kind of revenue sharing system.
My ID# has been out there since 2004 and is not indexed by Google.
Your search - "my pub id#" - did not match any documents.
Mine does, but only on myspace where kids have copied things from my site incorrectly.
|I don't know of anything that can be done immediately on the publishers' side to enable privacy. |
Identify their spider and the IP ranges they're coming from then block. More information here [webmasterworld.com].
Interesting - when I google my pub ID, I find the site - a German site.
It only has one of my sites.
Very odd thing. Could any legitimate use be made of this?
Mods - should we out the site?
No site outing, thanks. However you may want to check with incredibill the spider mod and see what can be done about identifying the spider and what can be published in the spider forum.
Good idea, MB - I'll shoot him a sticky
@leadegroot "Could any legitimate use be made of this? "
Well, I know of two sites for spying on Google AdSense IDs, and I do use such a database of trackable site IDs because it's a nice way to check what my competitors are doing. You can also uncover site networks by spying on AdSense IDs and learn quite a few interesting things.
Clever trick. I'm not sure what's worse, this particular site that's fully indexed by Google, or another similar service (in English) that actually sells this information and has a much bigger database but isn't publicly available (i.e. the results aren't indexed, you have to pay first). The latter, which is quite easy to find, seems to be relatively popular and has over 700k sites in its index, whereas the German site claims 200k. I tried a few Adsense IDs, without paying, and they seemed to provide pretty complete results (they show you how many domains and subdomains use the ID and then make you pay to see them). This allows anyone to see if competitors are doing anything shady like arbitrage. A new can of "outing" worms. Additionally, the owner claims that what they are doing is perfectly legal and says they do not listen to, or even open, robots.txt files. Any blocking of the robot would have to happen on the basis of IP(s), but for some reason I doubt they are crawling with a user-agent that will easily identify them.
Correction: it looks like they have a 'sitemap' of search result pages, so it appears they are at least trying to get all the IDs indexed by search engines.
|they do not listen to, or even open, robots.txt files |
Which is actually great if you have bot traps that block them automatically. Spam bots that follow robots.txt rules and slowly index your site from many different IPs, with various user agents, at a reasonable speed, are a lot more difficult. Did I say too much?
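For those who haven't set one up, a bot trap in its simplest form looks something like the Python sketch below. The assumptions: your robots.txt disallows a path that no human visitor would ever reach (the `/trap/` prefix here is made up), and your server can ask this object whether to serve each request. Well-behaved crawlers never touch the path, so anything that requests it has ignored robots.txt and gets banned.

```python
# Hypothetical trap location. robots.txt would contain:
#   User-agent: *
#   Disallow: /trap/
TRAP_PREFIX = "/trap/"


class BotTrap:
    """Bans any client IP that requests a robots.txt-disallowed trap path."""

    def __init__(self, trap_prefix=TRAP_PREFIX):
        self.trap_prefix = trap_prefix
        self.banned = set()  # IPs that have hit the trap

    def allow(self, ip, path):
        """Return True if the request should be served, False if refused."""
        if ip in self.banned:
            return False
        if path.startswith(self.trap_prefix):
            # Only a bot that ignores robots.txt ends up here.
            self.banned.add(ip)
            return False
        return True
```

As noted above, this only catches bots that ignore robots.txt. A crawler that obeys the rules and rotates IPs never trips it.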
and spookily enough, someone visited my site via the shadow site some 3 hours after I posted.
Coincidence? Probably reconfirming the data once it had been viewed.
I think we aren't supposed to post IPs? But the whois says it's a static ADSL in .nl
|I think we aren't supposed to post IPs... |
Discuss with incredibill. He's the Spider Mod. ;)
Someone I know reported the German site to be removed from the Google index last week.
Finding and blocking those sites is a simple matter of Googling a pub ID, finding the IP of the site, doing an arin.net or ripe.net IP lookup, and then blocking the hosting company's IP range (if you have cPanel, the IP Deny Manager does it for you). But they could very well be crawling and populating their database from an ADSL connection, which renders the above useless.
The interesting question is whether it is illegal to list someone else's pub ID, and whether Google has legal standing to go after those sites other than traffic-starving them by removing them from the index.
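If you don't have cPanel and want to do the range check in your own application, a minimal Python sketch follows. The two ranges are reserved documentation addresses used as placeholders, not the real ranges of any site discussed here. You would substitute whatever the arin.net or ripe.net lookup returns.

```python
import ipaddress

# Placeholder CIDR ranges (reserved documentation blocks), standing in for
# whatever range a WHOIS lookup reports for the hosting company.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(remote_addr, ranges=BLOCKED_RANGES):
    """Return True if remote_addr falls inside any blocked range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address; leave the decision to other checks.
        return False
    return any(addr in net for net in ranges)
```

The same caveat applies as with any range block: it only helps if the crawler actually runs from the range you looked up.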
Perhaps the owners of such sites know that their odds for monetizing are grim, and they are doing it just to gain notoriety.
|and spookily enough, someone visited my site via the shadow site some 3 hours after I posted. |
Oops! Sorry, Lea, that was me, actually, conducting what I thought was a harmless little experiment to find out how difficult it would be to find anyone's site network with these tools. Not too difficult, as it turned out. I apologize for the confusion and, particularly, the intrusion.
I don't think the owners of the other, more popular, English tool will care much about complaints. Everything looks rather shady. Some research on their (Hong Kong) whois info reveals a stunning number of criminal allegations, although I can't be sure whether the registrant info also reflects the ownership of their site. The site is apparently hosted in the UK and payments are processed in The Netherlands. Fishy. Will Google care enough to find a solution for publishers? I doubt it.
// google_ad_client is global
google_ad_width = 300;
google_ad_height = 250;
But the Adsense bot may take exception to not seeing a proper ID and trigger an account review.
I'd love to have multiple publisher IDs on my one account ... Can't be that hard to allow us to have alternate IDs due to us not wanting to link some of our sites.
My ID has been out there since '05 and is not listed anywhere except on one site, but not the one mentioned here.
Or they can simply use domain or encrypted public ID for each ad slot.
Now that Google owns Doubleclick, why not just shut it down?
Things have become worse since including them.
Create a few pages on your site that are pretty worthless. Interlink them. Have just one minor link to one of these pages, i.e. give Google and other bots a way in. Then put other people's publisher IDs on those pages. Lots of different IDs on different pages. Get reputable IDs from the big boys - NYT, Amazon, Ask, AOL. You don't lose any income, you don't lose any PR, it's within the AdSense TOS and you frustrate those data collectors ;)
How is this different than any other "spy" tool out there that grabs advertisers' keyword lists, ad copy, SEO rankings, backlinks, etc.?
It's been going on for years in other areas of online marketing; it was only a matter of time before it hit the AdSense side of things.