| 8:26 pm on Mar 2, 2014 (gmt 0)|
The AdSense Management API might offer some helpful features for this.
There may already be tools built on it. If not, I may just try to build something from scratch, because this could really blow the lid off this issue, or at least help people get closer to the causes of this scraping on a case-by-case basis.
@wa desert rat -- this is tackling the problem head on. Awesome!
BTW this may not give you real time but perhaps regular polling will be enough.
|wa desert rat|
| 9:05 pm on Mar 2, 2014 (gmt 0)|
I have already discovered that the mix (the channels, products, clicks, and CPC/RPM/CTR) all shifts around as Google massages the data. Normally I save at 30-minute intervals, but in the evenings I get it down to 15 minutes.
It is fascinating to see just how much manipulation of the data G is doing. Clicks appear and disappear, "earnings" go up and down, CPC and CTR move up and down.
At one point I'll have 4 clicks on my bottom banner but 30 minutes later those are gone and I have 4 clicks on my top banner.
It would take a "waterfall" type analysis tool to watch this in real time. Seriously.
| 9:29 pm on Mar 2, 2014 (gmt 0)|
Best analytics tool is Google's own, Analytics. You can't track ads/advertisers, but you get much more detailed information on earnings. I also log earnings/clicks etc. every 15 mins.
|wa desert rat|
| 9:49 pm on Mar 2, 2014 (gmt 0)|
|Best analytics tool is Google's own, Analytics. You can't track ads/advertisers, but you get much more detailed information on earnings. I also log earnings/clicks etc. every 15 mins. |
I suspect you've never used Piwik. You don't have to choose between them, of course. You can use both. Piwik gives much more information - and much more granular information - than GA does but has no way to track earnings. But if you want to know what your visitors are doing on your site and where they go when they leave your site, Piwik is much better.
And the thrust of this thread is that I think that Google's tracking of earnings might be flawed. And I am not alone in that suspicion. So if GA is reporting flawed earnings reports, then what?
| 10:34 pm on Mar 2, 2014 (gmt 0)|
Sorry, I meant to say, best analytics tool to analyze Adsense is Google Analytics. AFAIK, no other analytics tool has direct access to Adsense data.
For example, you can go to the referral report and select Adsense (under Explorer) to see detailed stats of Adsense revenue, CTR, etc. for each referring domain. Most of the reports have an option to explore based on Adsense data.
|wa desert rat|
| 10:56 pm on Mar 2, 2014 (gmt 0)|
Yes, of course. When it comes to analyzing Adsense the only game in town is Google. Nothing I know of will correlate clicks to ads to users.
This is not to say that it might not be possible. I can envision a couple of methods:
1. Nagios with a plugin that correlates an action by a user with a corresponding packet to Google.
2. A way to detect when a user's mouse has clicked outside of the standard browser window and onto an embedded ad's area.
The first method would imply control over one's own entire server; otherwise virtual web sites would confuse things. This would make it feasible only for people who can afford to co-locate a box at a hosting site. Or, possibly, someone in an environment serving fiber access with static IPs and reasonable bandwidth rates.
The second is beyond my expertise entirely. But if a piece of malware can do it then I expect someone else can do it. :P
I have not yet looked at the API link provided by webcentric. But the fact that analytic tools like Piwik have not implemented anything (despite its obvious attraction) makes me think that it's a lot harder than it seems.
| 11:15 pm on Mar 2, 2014 (gmt 0)|
|The issue is that even though visitors are seeing ads and clicking on ads we are not getting any real information about why those clicks are not generating earnings. |
I'm fairly sure it has to do with click bots running through the network. Google is trying to reassure advertisers about the stability and honesty of Adwords. Advertisers are Google's lifeblood. So Adsense is being more aggressive in monitoring and shaving clicks that seem even a little iffy. If advertisers give up on Adwords, then we're all in trouble.
No, we publishers do not have any control. Might as well get used to it.
|wa desert rat|
| 4:20 am on Mar 3, 2014 (gmt 0)|
If Google is misidentifying and being more aggressive and the result is that legitimate clicks are being called back, then it's the publishers who are being defrauded.
Maybe we should figure out if that's what is going on.
|wa desert rat|
| 5:56 am on Mar 3, 2014 (gmt 0)|
I suspect that there may be more than a few publishers who don't actually have a clear idea how a "bot" works in conjunction with click fraud.
This is an excellent explanation along with an example of a specific click bot showing how it works with one (or more) associated C&C (Command and Control) servers: [icsi.berkeley.edu...]
Knowing how they work could better help us recognize them. Especially when we use better user analytics (instead of concentrating on earnings analytics).
|wa desert rat|
| 6:35 am on Mar 3, 2014 (gmt 0)|
Reading through the .pdf document explaining how click bots work, a few details become clearer:
1. Publishers with more ad content were more likely to be complicit in click fraud in the past. However times have changed;
2. Middle men are now involved with click fraud who apparently enter into revenue sharing schemes with advertisers (or ad agencies);
3. Any web site can now be a conduit for click fraud since PPC servers hosted at the C&C center direct bots to web sites hosting the targeted ads;
4. High-bid ads are preferred over low-bid ads. Thus, a high-CPC ad would be more likely to be suspect than a low-CPC ad. This explains why so many $25 CPC ads are scraped back, by the way;
5. A user on your site who is not registered is more likely to be suspicious (though this is not conclusive, given the abilities of spam bots to avoid registration pitfalls designed to entrap them);
6. A user on a mobile device (tablet or smartphone) is less likely to be a click fraud bot;
7. A user who arrived at your site via a Google, Bing, or Yahoo search (or any recognized search engine) is less likely to be a bot. This is because the sophisticated bots are more likely to use the search engines hosted at the Command & Control center. Having said that, it would not surprise me to find bots that are able to spoof real search results from real search engines; and,
8. Users who remain on your site and explore content are much less likely to be click fraud bots.
An analytical tool that gives a publisher more granular details about his (or her) web site visitors is more likely to be able to identify suspicious activity than an analytical tool that concentrates on granular information about earnings.
Being able to correlate certain characteristics of click fraud is essential to identifying the IP addresses of the most likely culprits. So you need some data in real time to determine which user that meets all the criteria above is most likely aligned with a click take-back. Of course, this is all easier if the take-back is done in a timely manner. If Google waits some significant time period after the fraudulent click, then it becomes a lot more difficult to determine which user that did not arrive by a search result and did not visit any pages before leaving was responsible for the fraudulent click.
About 9 months ago I had a sudden visit by a flock of "users" which were identified by Google Analytics as being from the same town in S. Calif. I could not see them on the control panel of my forum but GA insisted that they were there. The next day I searched for a tool that could give me IP addresses of my users and found Piwik. It also gives me real time data on what, exactly, individual users are reading or accessing on my web site.
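To make the criteria above a bit more concrete, here is a rough sketch of how a visit could be scored against them. It is only an illustration: the `Visit` fields, the `suspicion_score` function and the sample addresses are all made up, and a real analytics export (Piwik or otherwise) will name and structure these things differently. A high score is a reason to look closer, not proof of anything.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical fields; a real analytics export will name these differently.
    ip: str
    registered: bool
    mobile: bool
    from_search_engine: bool
    pages_viewed: int


def suspicion_score(v: Visit) -> int:
    """Count how many of the rough heuristics from the list a visit trips (0-4)."""
    return sum([
        not v.registered,          # unregistered visitors are more suspect
        not v.mobile,              # mobile visitors are less likely to be bots
        not v.from_search_engine,  # search arrivals are less likely to be bots
        v.pages_viewed <= 1,       # bots tend not to explore content
    ])


if __name__ == "__main__":
    # Sample data invented for illustration (documentation-range addresses).
    visits = [
        Visit("203.0.113.7", False, False, False, 1),
        Visit("198.51.100.2", True, True, True, 6),
    ]
    for v in sorted(visits, key=suspicion_score, reverse=True):
        print(v.ip, suspicion_score(v))
```

The equal weighting is arbitrary; nothing in the paper or in this thread says how much each signal should count.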
| 5:25 pm on Mar 3, 2014 (gmt 0)|
I've not used piwik but do use another third party analytics app that does something similar (can access IP info in real time) and can say without a doubt that it's an invaluable feature. I use it to identify malicious users on a regular basis.
Some things I've noticed that make this type of tracking difficult are:
1. A delay between the Adsense home screen and the performance reports screen. Some clicks never make it into the performance reports because of this delay. This means you may see clicks in the home page summaries but never get a chance to figure out what unit was clicked, what country generated the click, etc.
2. Actual delays in click reporting mean it's almost impossible to identify when the click actually happened. This can be a simple delay of a few minutes or could be the result of a click dump on Monday for clicks recorded over the weekend that were delayed for some reason. I'm seeing this particular pattern virtually every Monday these days. Too many clicks in a short period of time for normal traffic and they're not being taken back. Just a big dump all at once.
3. I've also seen clicks/revenue appear, disappear and then reappear again. I thought at first this was related to having a couple of browser windows open to Adsense at the same time, with caching somehow involved, but that doesn't appear to be the case after closer inspection. It could be report caching on G's end, or the data may be running through a series of algorithms, one of which resets the report until it makes its determination about whether the previously reported click is valid or not. This yo-yo effect is what's really maddening, and while I don't see a lot of takebacks, I'm still seeing the yo-yo, which indicates that the algorithms aren't all integrated and data is bouncing around between them. I'm guessing that the whole system looks like Frankenstein's monster right now. A patchwork of algorithms stitched together in a very ad hoc fashion. And I'm pretty certain there will be even more stitches tomorrow.
|wa desert rat|
| 5:44 pm on Mar 3, 2014 (gmt 0)|
That Berkeley essay on click bots was, to say the least, enlightening. And Webcentric's comments illustrate how Google's performance reports don't make it any easier to identify them. I have to admit that I was naive in my thinking about the bots and the problems they create; but that piece certainly changed that.
A second reading showed that the click bot can change its apparent browser (in one case to Mozilla). So I looked through CherryTree notes showing periodic saves of performance reports and found a 30 minute period in which two clicks were made and then removed.
Since Piwik keeps a database of users and actions, I figured that it would be relatively easy to find a "user" (bot) that seemed to match the parameters. Fortunately this all happened late in the evening when the traffic to my site is very light.
The two clicks were not in the performance report at 2245 and appeared in the 2300 report but at 2315 they were gone.
I found only one user who matched the criteria (not mobile, not reading other pages... just connected and left). It was a Comcast IP address. The OS was NT and the browser was Mozilla.
I figured I had the culprit... however, if Webcentric is right and there are unseen delays in the performance reports, it's all for naught.
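For anyone wanting to repeat that lookup, the window search can be sketched like this. It assumes the click landed between the last snapshot that did not show it (2245) and the first one that did (2300), which is exactly the assumption that falls apart if the reports are delayed. The row format, the date and the addresses are placeholders, not anything exported from Piwik.

```python
from datetime import datetime
from typing import List, Tuple

# (timestamp, ip) pairs standing in for rows from a visitor log; the format
# and the addresses are made up for illustration.
VisitRow = Tuple[datetime, str]


def visits_in_window(rows: List[VisitRow], absent_at: datetime,
                     present_at: datetime) -> List[str]:
    """IPs seen after the last snapshot without the clicks, up to the first with them."""
    return [ip for ts, ip in rows if absent_at < ts <= present_at]


if __name__ == "__main__":
    fmt = "%Y-%m-%d %H:%M"
    day = "2014-03-02 "  # arbitrary date, chosen only so the example runs
    rows = [
        (datetime.strptime(day + "22:40", fmt), "192.0.2.10"),
        (datetime.strptime(day + "22:52", fmt), "203.0.113.7"),
        (datetime.strptime(day + "23:10", fmt), "198.51.100.2"),
    ]
    absent = datetime.strptime(day + "22:45", fmt)
    present = datetime.strptime(day + "23:00", fmt)
    print(visits_in_window(rows, absent, present))
```

Widening the window backwards is the obvious adjustment if reporting delays are suspected, at the cost of more candidates to sift through.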
| 6:00 pm on Mar 3, 2014 (gmt 0)|
Malicious bots are notorious for spoofing the "user agent", so it is almost a useless piece of information where hackers and bots are concerned.
I don't know about tracking actual users but I do think Adsense performance reports can be used to narrow a few things down where click scraping is concerned.
Record the country of each click as it shows up in the country report and then watch to see if that country's click count decrements later. This can help with identifying problems that may be coming from outside your country.
Use the same process for Ad Units. This can help determine if the issue is related to a specific ad slot (maybe identifying a high rate of accidental clicks in a specific location).
This could also be done with platforms and other reports in the performance report area.
This technique alone isn't going to give you the IP address of the user, but you can start to get a more focused view of the problem and may see some interesting patterns emerge. It's undoubtedly what G is doing...looking for patterns.
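A minimal sketch of that record-and-watch idea, assuming each periodic report has already been saved as a simple mapping from a dimension value (country, ad unit, platform) to a click count. The snapshot shape, the `find_decrements` name and the sample figures are assumptions made for the example; this is not something AdSense provides.

```python
from typing import Dict, List, Tuple

# A snapshot is (label, {dimension_value: click_count}); both the shape
# and the sample figures below are assumptions made for illustration.
Snapshot = Tuple[str, Dict[str, int]]


def find_decrements(snapshots: List[Snapshot]) -> List[Tuple[str, str, str, int]]:
    """Return (earlier_label, later_label, key, drop) for every count that fell."""
    drops = []
    for (prev_label, prev), (cur_label, cur) in zip(snapshots, snapshots[1:]):
        for key, before in prev.items():
            after = cur.get(key, 0)  # a key that vanished counts as zero
            if after < before:
                drops.append((prev_label, cur_label, key, before - after))
    return drops


if __name__ == "__main__":
    history = [
        ("22:45", {"US": 3, "CA": 1}),
        ("23:00", {"US": 5, "CA": 1}),
        ("23:15", {"US": 3, "CA": 1}),
    ]
    for earlier, later, key, drop in find_decrements(history):
        print(f"{key}: {drop} click(s) removed between {earlier} and {later}")
```

The same function works for countries, ad units or platforms, since it only looks at whatever keys the saved snapshots contain.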
|wa desert rat|
| 6:19 pm on Mar 3, 2014 (gmt 0)|
If you use Piwik and can identify which user(s) you are looking for, you can get IPs. Take a look at their web site. They have a demo running that gives you access to the entire application. Click on "Visitors" at the top, then "Visitor Log" to see the database listings. The "Dashboard" (when you first log in) gives real time user data along the left.
But after studying that essay I am not sure we can do much. Perhaps that "adsense click fraud" plugin is the best we can hope for. There's little use in banning one IP address when the numbers of exploited Windows machines number in the millions.
My original concern centered around wondering why click fraud would reduce earnings over a long period of time. I still don't see how that would happen but I am certainly a bigger believer in the click fraud bots today than I was a week ago.
|wa desert rat|
| 6:54 am on Mar 4, 2014 (gmt 0)|
The pattern is that there really is no pattern. The clicks with the largest bids go before anything else. Mobile clicks are the fewest to go. One would think that clicks that generated the highest CTR would go right away but they often do not.
There is no practical way for a publisher to identify click bots. And absolutely no way for us to identify accidental clicks (where a user clicks but leaves the ad page content right away or even before it's loaded).
I am convinced that bots are out there. I am not convinced that Google is identifying them correctly.
| 10:50 am on Mar 4, 2014 (gmt 0)|
|The clicks with the largest bids go before anything else. |
Just how would they know this? Surely they'd have to have inside information, since we certainly do not get that kind of info. Or do they simply target the supposedly higher paying ads?
| 3:04 pm on Mar 4, 2014 (gmt 0)|
As to the high paying clicks - either they're targeting them, or the bot activity has figured out some further way to flummox Google's reporting - when I get a click bomb, the clicks always report with a very high EPC.
Were they actually that high-paying? Nobody knows but Google. But the reporting in general is so far off that it wouldn't surprise me if it's off there too. Couldn't it be possible that the bots *aren't* actually targeting high paying clicks (because they can't), but that the clicks come so fast and furious that every aspect of the reporting can't keep up, at least not simultaneously?
Bear in mind *all* the actions that have to be taken into account (and validated) at the exact time a click occurs, on the publishers side AND the advertisers side. There's a lot going on there. Times all the several million publishers and even more advertisers and the Google search results ads, all happening simultaneously. Now you throw in a mess of bots. It's a minor miracle we get same day reports at all.
So taking it all into account, it would not surprise me if, for example, clicks don't update (and get validated) at the same time that earnings do, resulting in some really weird reporting, outlandish CTRs and unrealistic EPCs. Temporarily. In Google's eyes, when those stats catch up, it's not a take-back, because they were never really ours to begin with. It'd help a lot if they just came out and said that.
And making it all worse was the change last year to make the reports more 'real time' - on the plus side, you get to see your take-backs as they happen instead of at the end of the month, but on the minus side, you get to see your take-backs as they happen instead of at the end of the month. People sometimes don't like to see how the sausage is actually made. (I'm one of them)
But - of course - everything we think or say is all speculation, because we are not going to know. I'm pretty sure it would take a lot of years and a major court decision to get that information out of Google, because it undermines the advertising network, and that can't be allowed to happen at any cost.
|wa desert rat|
| 5:28 pm on Mar 4, 2014 (gmt 0)|
|As to the high paying clicks - either they're targeting them, or the bot activity has figured out some further way to flummox Google's reporting - when I get a click bomb, the clicks always report with a very high EPC. |
Actually, these new bots DO know what the higher paying ads are. If you read that Berkeley link I posted you will see that the C&C servers know what ads are high-bid and what ads are not and the bots (which are hosted, for the most part, on compromised Windows PCs) are given instructions in XML format how to find them.
This is because they have that information. The only entity that would profit from this information would be ad agencies. The publishers (namely: us) aren't told what ads are high-bid until after a click and the advertisers (the people who are ultimately trying to get users to buy their products or services) actually LOSE money through click fraud. And Google, of course, loses.
The only way a web publisher could profit in this type of scheme is if the ad agency and the publisher are conspiring together. This would require the publisher to select ad candidates carefully up front. The Berkeley report is also somewhat confused as to exactly how the profits fall.
Middlemen in this system profit.
Reading this document is absolutely vital to understanding how click-fraud works: [icsi.berkeley.edu...]
| 6:04 pm on Mar 4, 2014 (gmt 0)|
I read it; I'm not convinced that's the major factor though. I'm sure there are advertisers gaming the system, I'm not sure that there are as many in the AdWords networks as that paper would imply.
|wa desert rat|
| 6:30 pm on Mar 4, 2014 (gmt 0)|
Looks to me like my theory that mobile devices are not considered to be likely click-fraud components is beginning to be substantiated by the data.
This morning I had 4 clicks on a mobile ad (only one ad, a banner at the top) which, mostly (I suspect) because of relatively fewer page views, increased the CTR to over 4% (generally my CTR is 0.6% or so). Normally this would make me think it was a click-fraud candidate; and on a full page I think it would be. However, for several days I have noticed that clicks on that mobile ad seem to be given a "pass" by Adsense and not scraped back.
This is in stark contrast to the three other clicks I had this morning; none of which increased the CTR or even had a high CPC (one for 66-cents and two for 28-cents). Nevertheless... they were snatched back.
|wa desert rat|
| 6:45 pm on Mar 4, 2014 (gmt 0)|
|I read it; I'm not convinced that's the major factor though. I'm sure there are advertisers gaming the system, I'm not sure that there are as many in the AdWords networks as that paper would imply. |
I think that Google is convinced that these new click-bots are a major factor and that's more to the point. If Google thinks that they're being attacked by a squadron of bots linked to Command and Control servers then it doesn't much matter if it's true or not. Google can just snatch back ads it thinks are even on the edge. It's not good for G's bottom line but we're no longer the biggest player in the Google stadium and the threat of advertiser litigation might be more important than the (apparently declining) network members' revenue.
I suspect that if you are seeing click-bombs that target high-bid ads then that is a good indication of the presence of sophisticated bots on the Adwords networks.
Google's first quarter reports for 2014 should be pretty interesting. If "member networks" revenue is down again then we are headed for some bumpy times.
| 7:01 pm on Mar 4, 2014 (gmt 0)|
I would also be very wary of using too small a sample to determine trends. I think you need hundreds if not thousands of clicks to really see anything that might possibly be statistically significant.
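One way to put a number on the sample-size concern is a confidence interval around the observed CTR. The sketch below uses the Wilson score interval, which is a standard choice for small counts; the function name and the click and impression figures are hypothetical and only there to show how wide the range is with a handful of clicks.

```python
import math
from typing import Tuple


def wilson_interval(clicks: int, impressions: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a click-through rate (z=1.96 is roughly 95%)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    p = clicks / impressions
    denom = 1 + z * z / impressions
    centre = (p + z * z / (2 * impressions)) / denom
    half = z * math.sqrt(p * (1 - p) / impressions
                         + z * z / (4 * impressions ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical figures: the same 4% CTR from a small and a large sample.
    for clicks, impressions in [(4, 100), (400, 10000)]:
        low, high = wilson_interval(clicks, impressions)
        print(f"{clicks}/{impressions}: {low:.2%} to {high:.2%}")
```

The same observed rate gives a far narrower range with the larger sample, which is the point about needing more clicks before reading anything into the numbers.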
|wa desert rat|
| 7:41 pm on Mar 4, 2014 (gmt 0)|
|I would also be very wary of using too small a sample to determine trends. I think you need hundreds if not thousands of clicks to really see anything that might possibly be statistically significant. |
I've been on this board for a year and the amount of REAL information put forward by anyone pales into insignificance when compared with the prattle of advice to improve content, change ad positions, etc. It's all speculation, after all. Someone said that recently.
Oh... it was you. :)
How much better is a small set of data than zero data? How much better would more people sharing more data be than guesswork? So far, as near as I can tell, no one but Google knows what's going on. But I've reverse-engineered things before and I think that if we spend less time being afraid of being specific we might get a step up.
Or not. But at least we'd be dealing in something more than simply whining about declining earnings.
No one wants to be specific... but everyone wants everyone else to be more specific.
You mentioned that in your experience click-bombs seemed to follow high-cost ads. That was interesting, I thought. In fact, in light of how that Berkeley study described how two different click-bots work, I thought it was significant.
So maybe we can share more data. That way we get more significance.
But if you prefer, I'll just go back to whining.
| 8:12 pm on Mar 4, 2014 (gmt 0)|
I don't share specific data, because that information is A) company confidential, B) in some cases against AdSense TOS, and C) in other cases, against WebmasterWorld policy (in that order). Other people have their own reasons.
AdSense is opaque by design. If you really think you are going to be the one to reverse engineer it, good luck.
|wa desert rat|
| 8:27 pm on Mar 4, 2014 (gmt 0)|
Well, looking at Google's revenue reports for the Adsense/Adwords programs, it's not working too well right now.
But maybe more whining will improve things.
| 9:19 pm on Mar 4, 2014 (gmt 0)|
*WE* are not going to improve AdSense performance. Yea, I said it. Not gonna happen.
What we *can* do is make sure that AdSense is just one of the tools in the monetization toolbox.
|wa desert rat|
| 10:52 pm on Mar 4, 2014 (gmt 0)|
|What we *can* do is make sure that AdSense is just one of the tools in the monetization toolbox. |
Standard response. Complete with standard jargon. Probably a good idea if you manage dozens of websites. Certainly wouldn't want you to go against company policy.
If that's the way most publishers on this forum feel far be it from me to interfere.
| 2:29 am on Mar 5, 2014 (gmt 0)|
I've been following this conversation while riding in a car today and have been dying to pipe in but haven't been able to until now. Anyway, my first thought is related to a question that's been bouncing around the board for a while now, and that is, "Why would a click bot click on my ads? What does it have to gain by doing so?"
Here's my theory. If I were involved in a network that was operating as described in the Berkeley document, I would use sites that are not in my network as places of discovery, i.e. that's where I would go to find and document the high paying ads. I don't stand to draw attention to my ad accounts if I'm doing all my discovery on someone else's site. I can run rampant through other people's sites and, once I've got the information I need, I can carefully turn my bot (maybe even a completely different bot) loose on my own sites to actually make me some money by taking advantage of what I learned from some poor publisher's site.
Another reason for this behaviour could be to simply obfuscate the surfing behavior of a bot. If the bot only visits sites it can make money from, that could help G identify the accounts involved, so it randomly hits everyone's account and just works yours into the mix to hide which accounts are benefiting from the fraudulent activity. That's one observation on this discussion. More to come, I believe.
| 3:00 am on Mar 5, 2014 (gmt 0)|
Next observation. Just reiterating a point I've made before. The fluctuations in reporting and obvious disconnects between various reports in real time lead me, as a long-time database developer and programmer, to the conclusion that G is layering new algorithms (or batch processes) on top of the old system in a way that's causing the data to be run through multiple processes, each of which is updating the reporting system as it is run. That's why we can see the takeaways. If they waited until all the processes had run to update the reports, all these changes would be far less evident. This points to an evolving process whose end result may become a seamless one, but when you're combatting an evolving problem in real time (as I believe G is doing right now), you're going to need to put some band-aids in place, and that isn't necessarily going to be pretty. You don't stitch the patient up until you're done with the surgery, so to speak.
Also, I do believe that the many changes in text ads we've seen recently are directed at a different kind of fraud (that being publishers trying to disguise their ads as content) so, looking at the larger picture, G is in a war with fraudsters on a variety of levels.
There was a time, many years ago, when I thought that the whole PPC model was dead for good. Fraud destroyed virtually every PPC program ever launched prior to the introduction of Adsense. I contend that Google single-handedly revived the model at a time when I truly believed it had gone or was going the way of the dodo bird. So, those of us who have benefited from this program and may now be sour on it could do with stepping back and understanding what an accomplishment it was to make this program work as long as it has in a world full of operators out to cheat such systems. I don't think Google is out to run its own product into the ground. I believe they have been beating the odds in keeping it running as long as they have. I see what they are doing as truly swimming upstream when it might actually be easier to just walk away. This may be the last gasp of the dodo, or this company may once again be doing what is necessary to keep doing what so many others have failed at. Time will tell, but my hunch is that all these changes point to the fact that no one over at Google is ready to throw in the towel on this venture quite yet. It's an ugly process, but we live in a sometimes ugly world.