
Forum Moderators: phranque

Seeing hits to ads.txt from non-google IP's

started about a week ago

     
1:30 am on Jul 5, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 8, 2016
posts:87
votes: 0


I can (or will) post more details about the IPs and user-agents involved, but all of a sudden I'm seeing hits to ads.txt from non-Google IPs. Don't know what's up with that...

(if this should be moved to bots forum, go ahead...)
6:03 am on July 5, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9914
votes: 972


Should see from a wide variety ... last time I looked it was over 20 (non-g) on one of my sites.

All get 404s since I do not have any IAB style (third party) advertising active.

Ads.txt is for advertisers ... not specifically g itself.
6:23 am on July 5, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15705
votes: 812


Oh, heck, everyone asks for ads.txt, even if they have no earthly reason to believe such a file exists on your site.

:: detour for quick check ::

Yup. Out of the truly ridiculous number of requests I get, no more than 1/8 are from Google. The rest come from what can loosely be described as The Usual Suspects, including but not limited to 34/52/54 and the like. Interestingly, every single request except Google gets a 403, meaning that they have all caused offense in some way.

User-agents range from hilarious, like
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50

to improbable, like
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:58.0) Gecko/20100101 Firefox/58.0

or, most popular of all,
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36

In fact a remarkable number (about 3/4 of the total) claim to be some recent Mac.

And finally a tiny handful profess to be
adstxt.com/1.2
which I tend to doubt, since no two have even similar IPs.

I suppose someone hereabouts can explain what information they hope to glean. (Like asking for robots.txt to learn the names of your roboted-out directories, I guess.)
1:34 pm on July 5, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2625
votes: 774


Ads.txt is not a Google thing. It was implemented to allow programmatic ad buyers to verify that the ad networks selling the ad space in fact had the right to show ads where they claimed. It is therefore normal and expected that requests for ads.txt come from a variety of sources.
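To make the verification idea concrete, here is a minimal Python sketch of how a buyer-side crawler might read an ads.txt file and check whether a given seller is listed. It assumes the comma-separated record layout from the IAB spec (ad system domain, seller account ID, DIRECT or RESELLER, optional certification authority ID). The domains and account IDs in the sample are hypothetical, and the function names are mine, not part of any library.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ad_system_domain: str             # domain of the advertising system
    seller_account_id: str            # publisher's account ID in that system
    relationship: str                 # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]  # optional fourth field


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Return the data records in an ads.txt body, skipping comments,
    blank lines, and variable lines such as contact=..."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # a variable line or a malformed record
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert)
        )
    return records


def is_authorized(records: List[AdsTxtRecord], domain: str, account_id: str) -> bool:
    """True if the (ad system, account) pair is listed in the file."""
    return any(
        r.ad_system_domain == domain.lower() and r.seller_account_id == account_id
        for r in records
    )


# Hypothetical file contents, for illustration only.
SAMPLE_ADS_TXT = """\
# ads.txt for a made-up publisher
adnetwork.example, pub-1234, DIRECT, abc123
reseller.example, 5678, RESELLER
contact=ads@publisher.example
"""

records = parse_ads_txt(SAMPLE_ADS_TXT)
print(is_authorized(records, "adnetwork.example", "pub-1234"))
print(is_authorized(records, "adnetwork.example", "pub-9999"))
```

A site with no ads.txt at all (the 404 case described above) simply gives the crawler nothing to verify against.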
1:51 pm on July 5, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 8, 2016
posts:87
votes: 0


Hmmm. All I've ever seen (since I started seeing ads.txt requests maybe 2 years ago?) was from Google. But over the past 1.5 years I've moved and expanded my IP blocking list to my router, so for anything blocked there I have no way of knowing what it was trying to get. And I also don't have ads.txt, so anyone asking would get a 404. That's why I thought it was a Google thing.
1:52 pm on July 5, 2019 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts:457
votes: 16


They could be probing for a particular plugin's existence that may be exploitable.

About a month ago I set up live stats. Instead of looking at an overview created from the site logs, I can now refresh a page and see the latest hits... all of them. From that I deduced that the 1400 unique visitors each day consisted mostly of malevolent ones looking for a means to exploit the web site and/or server.

It was pretty obvious at first... hundreds of hits on PHP pages that did not exist on a Windows server. Hundreds of hits looking for particular WordPress plugins to exploit. Untold hits containing SQL commands and so forth.

So I started blocking the repeat offenders and even whole network blocks. Along the way I compiled a list of IP addresses for the popular search engines and some code that alerted me to new ones. Conclusion = there is a lot of hacking software out there claiming to be a searchbot.
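For anyone wanting to try the same kind of check, a rough Python sketch follows. This is not my actual code, just an illustration of the idea: compare the IP of any request whose user-agent claims to be a searchbot against a list of networks you have verified yourself. The single Googlebot range below is an assumption for the example and should be checked against the engine's own published list. The names KNOWN_BOT_NETWORKS and classify_claimed_bot are made up.

```python
import ipaddress

# Assumed allowlist, keyed by a token expected in the user-agent.
# The range shown is illustrative; verify it before relying on it.
KNOWN_BOT_NETWORKS = {
    "googlebot": [ipaddress.ip_network("66.249.64.0/19")],
}


def classify_claimed_bot(ip: str, user_agent: str) -> str:
    """Return 'verified' if the UA names a known bot and the IP is inside
    one of that bot's networks, 'unverified' if the UA names a known bot
    but the IP is outside them, and 'no-claim' otherwise."""
    addr = ipaddress.ip_address(ip)
    ua = user_agent.lower()
    for bot_token, networks in KNOWN_BOT_NETWORKS.items():
        if bot_token in ua:
            if any(addr in net for net in networks):
                return "verified"
            return "unverified"  # worth an alert: new range or an impostor
    return "no-claim"


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# 203.0.113.7 is a documentation-only address, used here as a stand-in.
print(classify_claimed_bot("66.249.66.1", GOOGLEBOT_UA))
print(classify_claimed_bot("203.0.113.7", GOOGLEBOT_UA))
print(classify_claimed_bot("203.0.113.7", "Java/1.8.0_161"))
```

The "unverified" result is the one to look at by hand, since it covers both a genuinely new bot address and software faking the user-agent.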

Since I started, the 1400 unique visitors per day has been reduced to 800. I still see a lot of mischief from individual IPs that are randomly assigned. However I did manage to kill off the traffic from SEO spiders... the ones that study your site and then sell the info to your competitors!
1:59 pm on July 5, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2625
votes: 774


Here is a detailed explanation of Ads.txt, how it works, why use it, and who it applies to.
[iabtechlab.com...]
12:25 am on July 7, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 8, 2016
posts:87
votes: 0


As of today, looking as far back as Jan 2015, I have a grand total of 507 requests for ads.txt. The first hit came on 12/21/2017 from IP 198.148.27.17 (Pulsepoint Inc). The next hit happened on 1/4/2018 and was from 66.249.x.x - what I consider to be Google's "googlebot" subnet. That marks the start of Google's usually once-per-day request for ads.txt.

Of the 507 requests for ads.txt, all but 7 of them came from 66.249. (hence why I thought it was something specific to G). All of G's requests have the same (typical) user-agent:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

The non-G hits to ads.txt have come from:

IP              Date      Network / host             User-agent
198.148.27.17   12/21/17  PulsePoint                 Ads.txt-Crawler/1.0
54.173.25.142   1/29/18   Amazon AWS                 (none)
54.88.97.127    3/21/18   Amazon AWS                 IndustryIndexBot/1.0 (+http://industryindex.com/bot/)
45.79.71.25     4/1/18    members.linode.com         Java/1.8.0_161
18.234.171.45   10/3/18   Amazon AWS                 python-requests/2.4.3 CPython/2.7.9 Linux/4.9.93-41.60.amzn1.x86_64
74.128.145.x    7/1/19    Road Runner Lexington KY   (none; HEAD only)
67.226.210.4    7/3/19    Tremor Video DSP (?)       Dispatch/0.13.2

The reason for my relatively low (and G-focused?) requests for ads.txt must be because my site has absolutely zero cross-linking to outside domains for any reason (ie no tracking, FB, ad networks, googletag stuff, adwords, etc). Otherwise I have no idea why...
12:40 am on July 7, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9914
votes: 972


Chuckles. :)

Looking back at LAST MONTH on a rather obscure hobby site, 108 ads.txt requests and only three were from g.

Go figure.

YMMV ... A LOT ...
12:47 am on July 7, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9914
votes: 972


Aside: the hobby site mentioned above has NO ADS and is not commercial in any way (but is linked by many as internationally authoritative for that niche).
2:07 am on July 7, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15705
votes: 812


> absolutely zero cross-linking to outside domains for any reason

Also zero cross-linking from outside domains? At latest count, I see around 900 requests within this calendar year, and that's on teeny weeny sites.

:: digression ::

Oh, will you look at that. On two separate dates in March for my test site, and a single (different) date for my personal site, and further-removed dates on other sites, there is:
80.248.227.abc - - [13/Mar/2019:01:52:01 -0700] "GET /robots.txt HTTP/1.1" 200 309 "-" "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)" 
80.248.227.abc - - [13/Mar/2019:01:52:04 -0700] "GET /humans.txt HTTP/1.1" 403 929 "-" "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)"
80.248.227.abc - - [13/Mar/2019:01:52:07 -0700] "GET /ads.txt HTTP/1.1" 403 929 "-" "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)"
80.248.227.abc - - [13/Mar/2019:01:52:11 -0700] "GET / HTTP/1.1" 403 929 "-" "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)"
Wasn't I only just talking about bad reasons to request robots.txt? On the test site--the only one they hit twice--they would have met a comprehensive, all-encompassing Disallow.
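For anyone who wants to run the same search on their own logs, here is a minimal Python sketch. It assumes the combined log format shown in the lines above. The regular expression and function names are illustrative rather than taken from any particular log tool, and the sample entries are simply the four requests quoted in this post.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status size "referer" "ua"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def ads_txt_hits(lines):
    """Return one dict per log line that requests /ads.txt."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not in the assumed format
        path = m.group("path").split("?", 1)[0].lower()  # ignore query string
        if path == "/ads.txt":
            hits.append(m.groupdict())
    return hits


def tally(hits, field):
    """Count hits by one field, e.g. 'ip', 'ua' or 'status'."""
    return Counter(hit[field] for hit in hits)


UA = "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)"
SAMPLE_LOG = [
    f'80.248.227.abc - - [13/Mar/2019:01:52:01 -0700] "GET /robots.txt HTTP/1.1" 200 309 "-" "{UA}"',
    f'80.248.227.abc - - [13/Mar/2019:01:52:04 -0700] "GET /humans.txt HTTP/1.1" 403 929 "-" "{UA}"',
    f'80.248.227.abc - - [13/Mar/2019:01:52:07 -0700] "GET /ads.txt HTTP/1.1" 403 929 "-" "{UA}"',
    f'80.248.227.abc - - [13/Mar/2019:01:52:11 -0700] "GET / HTTP/1.1" 403 929 "-" "{UA}"',
]

hits = ads_txt_hits(SAMPLE_LOG)
print(tally(hits, "status"))
print(tally(hits, "ua"))
```

Feeding it a real file is a matter of passing the open file object instead of SAMPLE_LOG. Tallying by "ua" is what turns up the handful of named robots among the made-up humanoid ones.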
11:59 am on July 7, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 8, 2016
posts:87
votes: 0


> Also zero cross-linking from outside domains?

There are a couple of links from wikipedia. I've just done a google advanced search for "mydomain.tld" and I see results from zoominfo, bloomberg (private company information), frasers, a few in books.google.com, google scholar and scientific journals, various corporate directories, about a dozen other companies in related fields, some university lab and personal websites. Our company's domain has had an active website going back to about 1998. Fecebook auto-generated what is I guess a place-holder or dummy page for us a few years ago - we don't use it.

I've probably blocked domaincrawler.com (probably the entire AS network that hosts it) but I don't know why you brought up domaincrawler in connection with my observations related to ads.txt.
5:54 pm on July 7, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15705
votes: 812


> I see results from

Wow, that sounds like a solid collection.

> I don't know why you brought up

Because I was searching for requests for ads.txt and this was one of the very few with a name as opposed to a made-up humanoid UA. This led to noting that they were one of the very few that requested ads.txt as part of a set of requests for other stuff. If nothing else, it suggests that ads.txt is becoming a standard file that robots expect to find. (But humans.txt? Seriously? That one never did become a standard.)