Mozilla/5.0 (compatible; ips-agent)

         

tangor

4:00 am on Sep 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



New one for me...

UA: Mozilla/5.0 (compatible; ips-agent)
IP: 72.13.46.n
Robots.txt: YES
ISP: VeriSign Infrastructure & Operations

One hit and gone. What is this?

Note: site is HTTP not HTTPS...

lucy24

5:09 am on Sep 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



IP: also 69.58.178.various, 72.13.62.various
Requests: nothing but robots.txt (I cross-checked the three IPs to make sure), once every few weeks

:: poring over Header Access info file ::

Oh, whoops, that would explain it. At some point they failed the ordinary robots.txt test
User-Agent: name
User-Agent: othername
User-Agent: thirdname
Disallow: /
so in 2016 (really) I put them on a separate-line test:
User-Agent: ips-agent
Disallow: /
and, er, I guess I forgot all about them.

:: closer look at archived logs ::

Looks as if by the end of 2018 they had decided to become compliant, although even before then they had always complied on my test site (fully roboted-out, so robots.txt is a simpler file).
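For anyone wanting to compare the two robots.txt layouts above, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a parser that handles grouped `User-Agent` lines reads each form; it says nothing about how ips-agent itself parses the file. The placeholder "thirdname" is swapped for "ips-agent" purely for illustration, and the helper name `allowed` is made up.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Grouped form: several User-Agent lines share one Disallow rule.
# (Names are placeholders; "ips-agent" is substituted in for illustration.)
GROUPED = """\
User-Agent: name
User-Agent: othername
User-Agent: ips-agent
Disallow: /
"""

# Separate-line form: the agent gets a record of its own.
SEPARATE = """\
User-Agent: ips-agent
Disallow: /
"""

if __name__ == "__main__":
    for label, text in (("grouped", GROUPED), ("separate", SEPARATE)):
        print(label, allowed(text, "ips-agent", "/index.html"))
```

A parser that groups correctly treats both forms the same, so a robot that obeys only the separate-line form is probably mishandling stacked `User-Agent` lines. Note that the short token ("ips-agent") is passed in rather than the full UA string, since this parser matches on the part before the first slash.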

dstiles

9:11 am on Sep 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Mozilla/5.0 (compatible;

From observation, no true web browser uses the word "compatible" any longer. Bots often do but not true browsers.

jmccormac

10:29 am on Sep 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is Verisign's crawler and should be coming from a Verisign IP address. It has been used to measure web usage of domain names in the .COM/.NET gTLDs. Don't think I've seen it on any other TLD. The Verisign Domain Name Industry Brief used to include a breakdown of single-page sites versus multi-page sites. The number of responding websites is lower than the number of websites with an IP address, and bots like ips-agent are used to measure this and other elements of the usage of TLDs.

Regards...jmcc

tangor

10:49 am on Sep 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the info!

I allow robots.txt to anyone, but take a hard look at any robots.txt request to see if there is abuse (I only allow Bing and G and DDG).

lucy24

6:00 pm on Sep 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> From observation, no true web browser uses the word "compatible" any longer.
Amusingly, I have just this moment come from looking up the PetalBot (sister crawler to AspiegelBot), whose UA starts with the element “(compatible)”. Compatible with what? Ah, who knows, who cares.

tangor

12:28 am on Sep 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"(compatible)" is about to join my naughty list ... I suspect it is old-school hacking still in use by hackers/scrapers. Why not use it, they might say, after all computers are dumb and never get tired as long as there is power to the machine that runs the bots...

wilderness

1:09 am on Sep 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bing & Google both use compatible in their UA and there may be others.

lucy24

2:54 am on Sep 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> there may be others

:: quick check for 200 response to request other than robots.txt with UA containing string "compatible" ::

In addition to G and B there's Yandex, Seznam, Mail.RU, MJ12, LineSpider ... AhrefsBot, DotBot, BLEXBot ... Daum, DuckDuckGo faviconbot ... (At this point I got tired.) In short, all the better-known robots.

In fact, checking in the other direction indicates that the element “compatible” shows up at least ten times as often in legitimate robots as in, er, illegitimate ones.
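A rough sketch of that kind of quick check, assuming the access log is in the common "combined" format. The regex, the function name `compatible_page_hits`, and the sample lines are all made up for illustration (the IPs are documentation addresses, not real crawler IPs). It counts 200 responses to anything other than robots.txt where the UA contains "compatible".

```python
import re
from collections import Counter

# Assumed "combined" log format: ip ident user [time] "request" status size "referer" "ua"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def compatible_page_hits(lines):
    """Count UAs containing "compatible" that got a 200 on a non-robots.txt request."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match["status"] != "200" or match["path"] == "/robots.txt":
            continue
        if "compatible" in match["ua"].lower():
            hits[match["ua"]] += 1
    return hits


# Hypothetical sample lines, not taken from any real log.
SAMPLE_LINES = [
    '192.0.2.10 - - [06/Sep/2020:04:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; ips-agent)"',
    '192.0.2.20 - - [10/Sep/2020:02:54:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.30 - - [10/Sep/2020:02:55:00 +0000] "GET /page.html HTTP/1.1" 403 310 "-" "Mozilla/5.0 (compatible; ExampleBot/1.0)"',
    '192.0.2.40 - - [10/Sep/2020:02:56:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"',
]

if __name__ == "__main__":
    for ua, count in compatible_page_hits(SAMPLE_LINES).most_common():
        print(count, ua)
```

Sorting the resulting UAs into "legitimate" and "illegitimate" is still a manual judgment; the script only narrows down which ones to look at.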

Oh, and will you look at that:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2)
That’s something to do with bing, but not the once-common plainclothes bingbot.

tangor

7:21 am on Sep 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I filter by both UA and IP, poking holes for IP I like. :)
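As a minimal sketch of the "filter by UA and IP, then poke holes" idea, assuming a simple rule order: trusted networks pass first, then a UA token check. The network list, the token list, and the function name `decide` are hypothetical, and the ranges are documentation addresses rather than any real crawler's.

```python
from ipaddress import ip_address, ip_network

# Hypothetical "holes": networks the site owner has chosen to trust.
# 192.0.2.0/24 is a documentation range, used here only as a stand-in.
TRUSTED_NETWORKS = [ip_network("192.0.2.0/24")]

# Hypothetical UA tokens that trigger a block when no hole applies.
BLOCKED_UA_TOKENS = ("compatible",)


def decide(ip: str, user_agent: str) -> int:
    """Return the HTTP status this sketch would serve: 200 (allow) or 403 (block)."""
    addr = ip_address(ip)
    # Holes are checked first, so a trusted IP passes whatever its UA says.
    if any(addr in net for net in TRUSTED_NETWORKS):
        return 200
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return 403
    return 200


if __name__ == "__main__":
    bot_ua = "Mozilla/5.0 (compatible; ExampleBot/1.0)"
    print(decide("192.0.2.7", bot_ua))     # inside the trusted range
    print(decide("198.51.100.7", bot_ua))  # outside it
```

Checking the IP hole before the UA rule means a spoofed UA alone never earns a pass; the request has to come from a range already trusted.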

Seriously, and this is a serious question, are the "compatible" well known bots feeding good traffic, or just better behaved than the bad actors?

My adventures in bot control have been spotty ... as in it has been 6-7 years since I was "this interested".

dstiles

9:25 am on Sep 11, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From what I've seen, compatible is used by many bots but no "real people". I use it as a "go away" telltale.

lucy24

5:06 pm on Sep 11, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> many bots but no "real people".
Comparatively rare, but does exist.

While looking this up, I noted that “compatible” occurs in the UA string of Google-Read-Aloud, which I believe is one of those YMMV entities (like anything with “translate” in the name).

dstiles

8:57 am on Sep 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm willing to dump those.

lucy24

3:49 pm on Sep 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just make sure to poke holes for authorized robots, since all the major search-engine spiders do contain the “compatible” element. You may not need to, though; the unwanted ones probably have other deficits that would get them banned regardless.

dstiles

9:43 am on Sep 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The way my env-based traps are organised, good bots get a free pass through most traps. I have few "it's a bot, let it pass" holes.

SumGuy

1:32 pm on Sep 27, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



I have seen these Verisign hits for a few months now. I've decided not to block them (so far). I figure Verisign is looking to see who is using what certificate authority, so is this business / competition reconnaissance for them?

lucy24

3:29 pm on Sep 27, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This may be one of those cases where it actually doesn’t matter whether they are blocked or not, because a 403 response itself can carry the information the tool is looking for.