Forum Moderators: martinibuster

Message Too Old, No Replies

Information from Raw Access Log

Where are fraudulent clicks coming from


azlinda

7:42 pm on Nov 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I downloaded my October 2020 Raw Access Log and carefully went through it. The only thing I found that looked dicey was scrapy dot org, which I have blocked. I again had a 30% clawback from Google. It's beyond me where the "fraudulent clicks" may be coming from, especially as many as Google claims.

What I found:

google dot com/bot.html
semrush dot com/bot.html
bing dot com/bingbot.htm
ahrefs dot com/robot/
aspiegel dot com/petalbot
opensiteexplorer dot org/dotbot (forwards to moz dot com/link-explorer)
pinterest dot com/bot.html
facebook dot com/externalhit_uatext.php
help dot baidu dot com/question?
mj12 bot dot com
comscore dot com
admantx dot com/service-fetcher.html (site undergoing maintenance)
grapeshot dot co.uk/crawler.php resolves to oracle.com
napoveda dot seznam.cz/en/seznambot-intro/
eyeota dot com
yandex dot com/bots
bombora dot com/bot
scrapy dot org (blocked)
megaindex dot com/crawler

I also found things like this without a URL. I have no idea what they are.

Jaunt/1.5

Mozilla/5.0 (Linux; Android 10; SM-G973U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Mobile Safari/537.36

CommandDork

8:25 pm on Nov 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Same here, 27% for me. It would be REALLY NICE to know which site, ad spot, or country was causing the issue.

I wish they gave us something to work with; instead, they find new and fancy ways to deliver the same old reports in the dashboard.

azlinda

8:33 pm on Nov 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If Google KNOWS these are fraudulent clicks, they must know where they are coming from. As I said, the only dicey bot I saw was scrapy dot org. I could see no reason for them to be crawling my site, so I blocked them. I'm at a complete loss to know where these "fraudulent clicks" are coming from.

lucy24

9:12 pm on Nov 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I also found things like this without a URL. I have no idea what they are.

That would be a User-Agent string. In isolation, there is no telling whether it is a human or a humanoid robot. But you, with the full logs in front of you, can easily tell. If the page request is followed up by a request for all supporting files, preferably including the favicon, it is almost certainly a human. If the page is requested all by itself, it is almost certainly a robot.

If a click is reported at a time when logs don't show a human visit (does AdSense tell you exact times, or just an aggregate?), there's a problem.

If you have never before studied your own access logs, you may find it useful to visit the site yourself, making careful note of the time you open each page. Now go get the logs, and see what you--who are indisputably human ;)--look like.
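If you want to automate that check rather than eyeball the log, here is a minimal sketch in Python, assuming the common Apache/Nginx "combined" log format (the regex and the asset-extension list are my assumptions; adjust them to your server's actual format):

```python
import re
from collections import defaultdict

# Matches the common Apache/Nginx "combined" log format (an assumption;
# adapt the pattern to your server's actual configuration).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d+) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# File types a real browser fetches alongside a page (an assumed list).
ASSET_EXTS = ('.css', '.js', '.png', '.jpg', '.gif', '.ico',
              '.woff', '.woff2', '.svg')

def classify_visitors(log_lines):
    """Return {ip: 'likely human' | 'likely robot'} based on whether the
    IP ever requested supporting files (CSS, images, favicon) in addition
    to pages -- the heuristic described above."""
    requests = defaultdict(list)
    for line in log_lines:
        m = LOG_RE.match(line)
        if m:
            requests[m.group('ip')].append(m.group('path'))
    result = {}
    for ip, paths in requests.items():
        fetched_assets = any(
            p.lower().split('?')[0].endswith(ASSET_EXTS) for p in paths
        )
        result[ip] = 'likely human' if fetched_assets else 'likely robot'
    return result
```

You could then compare any click time AdSense reports against the IPs flagged "likely human" around that timestamp.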

NickMNS

10:05 pm on Nov 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



(does AdSense tell you exact times, or just an aggregate?),

No, AdSense does not report any details; in fact, they don't even tell you which pages received the ad clicks. Now, if you implement unique ad units for specific pages or groups of pages, you can get some information from that. Google Analytics can show you which pages received ad clicks and at what time, but you obviously need to be using GA and have your AdSense account linked to it; then, depending on how frequently ads are clicked per page, you may need to be somewhat imaginative with the filtering options.

That said, in the raw logs, the one thing I would look for is patterns. That is, repeat visits from the same IP at the same time of day, or always going to the same pages. You could also do this using the user agent instead of the IP. AdSense is most likely detecting invalid clicks this way, except they can search for suspect patterns across many websites (which you can't).

To do this, filter your logs by most frequent IPs; these should mostly be well-known bots, such as Googlebot, Bingbot, and others, but if there are some that are not known, investigate those. You can also filter on other features, such as user agent or even time of day: arrivals at your site should be random, even for most well-behaved bots, but malicious bots targeting your site specifically may be programmed to hit it at a specific time. You can do this for any feature; it may not turn anything up, but it could reveal a hidden pattern. This is tedious work, especially if you're doing it in Excel.
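As a rough illustration of that filtering, a short Python script can tally the most frequent IPs, user agents, and hours of day straight from a "combined"-format access log (the regex is an assumption about your log layout; adapt it as needed):

```python
import re
from collections import Counter

# Pull the client IP, hour of day, and user-agent out of each line of a
# "combined"-format access log (an assumed layout; adjust to your server).
LINE_RE = re.compile(
    r'(?P<ip>\S+) .*?\[\d+/\w+/\d+:(?P<hour>\d{2}):.*?\] '
    r'".*?" \d+ \S+ ".*?" "(?P<ua>[^"]*)"'
)

def top_patterns(log_lines, n=10):
    """Count the most frequent IPs, user agents, and hours of day --
    the repeat-visitor patterns worth investigating."""
    ips, uas, hours = Counter(), Counter(), Counter()
    for line in log_lines:
        m = LINE_RE.match(line)
        if m:
            ips[m.group('ip')] += 1
            uas[m.group('ua')] += 1
            hours[m.group('hour')] += 1
    return ips.most_common(n), uas.most_common(n), hours.most_common(n)
```

Any unfamiliar IP or user agent near the top of these counts, or a suspicious spike at one hour of the day, is a candidate for closer inspection.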

I hope this helps.

azlinda

11:05 pm on Nov 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thank you, Lucy24 and NickMNS. You gave me a lot to work with, and I will start on it right away. If there is a problem coming from a User-Agent string, how would I block their IPs, since it's impossible to do a DNS lookup without a URL?

lucy24

12:36 am on Nov 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the problem is associated with a particular user-agent, then you don't deal with the IP at all. Instead you block the full UA--after making sure it is not currently in use by legitimate humans. Exact details will, of course, depend on your server type; there are subforums hereabouts for Apache and IIS.

I think you are giving too much weight to the URL in the UA string. It has nothing to do with where the robot actually comes from; it's just an information page for major law-abiding robots, especially search-engine spiders. The actual IP is typically the very beginning of each line in the access log. Again, details will depend on your server type, so you may need to ask follow-up questions in the appropriate subforum.
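For Apache 2.4, for example, a user-agent block along those lines might look like this (a sketch only; "Jaunt" is just the example UA token quoted earlier in this thread, and you should first confirm no legitimate visitors use it):

```apache
# Tag any request whose User-Agent contains "Jaunt" (case-insensitive),
# then refuse tagged requests. Adjust the pattern to the UA you found.
BrowserMatchNoCase "Jaunt" bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```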

ronron

5:27 am on Nov 2, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



AdSense also includes unintentional (i.e., accidental) clicks under the umbrella of invalid traffic; they wrote that into their definition of invalid traffic. That's where you can have normal users who cause invalid traffic even though the user agent, IP, and ISP all come back clean. That's one area where I wish Google would at least give some broad breakdown of the source of invalid-traffic deductions: malicious bots, or bad ad placements.

dolcevita

2:02 pm on Nov 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



27% clawback on my earnings. Slightly lower than last month, but still high. Until April this year, it was never higher than 5%.

I would also like Google to be transparent about that. We are not getting any additional information now, which is frustrating.

I use Cloudflare, which has extra options for blocking bots like 'Bot Fight Mode', 'JavaScript Detections', 'Browser Integrity Check', and the Cloudflare special ruleset, and the firewall level is set to medium, but despite everything the clawback is still high.

MayankParmar

3:20 pm on Nov 2, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Before the pandemic began, my clawback was between 1% and 2%, but it increased to 6% in May and June, and jumped to 9% in September without any changes to the site. Last month, the clawback was under 2%, and again I barely made any changes.

I have two ads on mobile (both non-AMP and AMP) and five ads on desktop, so this should not have happened. I don't even have a header or above-the-fold ad on mobile, so there's no chance of accidental clicks when opening the pages.

I still believe it's Google's fault (intentional or unintentional) and there's nothing that we can do to eliminate it completely.

This clawback trend started for a lot of people after the pandemic or when Google's earnings dropped :)