panscient.com

grouchy sysadmin

1:38 am on Mar 8, 2016 (gmt 0)

10+ Year Member



Previously mentioned at:
[webmasterworld.com...]
[webmasterworld.com...]

Agent: panscient.com
Host: Verizon
72.73.128.0 - 72.87.47.255

I find the IP weird for a bot, since the PTR is static-72-76-243-86.nwrknj.fios.verizon.net. It's hitting multiple unrelated websites and seems to follow the same aggressive pattern described in the threads above.
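
For anyone who wants to check their own logs against that range, here is a minimal Python sketch. It assumes the PTR name embeds the address as `static-a-b-c-d`, as the one quoted above does; the helper names are made up for illustration.

```python
import ipaddress
import re

# Range quoted in the post above.
RANGE_START = ipaddress.IPv4Address("72.73.128.0")
RANGE_END = ipaddress.IPv4Address("72.87.47.255")

# The same range expressed as CIDR blocks, which is the form most
# firewall and deny rules expect.
CIDR_BLOCKS = list(ipaddress.summarize_address_range(RANGE_START, RANGE_END))


def in_reported_range(ip: str) -> bool:
    """Return True if ip falls inside the range quoted in the post."""
    addr = ipaddress.IPv4Address(ip)
    return RANGE_START <= addr <= RANGE_END


def ip_from_ptr(ptr: str):
    """Pull the dotted address out of a 'static-a-b-c-d.' style PTR name.

    Assumption: the hostname embeds the IP this way. Returns None otherwise.
    """
    m = re.match(r"static-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.", ptr)
    return ".".join(m.groups()) if m else None


ptr = "static-72-76-243-86.nwrknj.fios.verizon.net"
ip = ip_from_ptr(ptr)
print(ip, in_reported_range(ip))
for net in CIDR_BLOCKS:
    print(net)
```

A PTR match on its own proves nothing about who runs the bot; it only confirms the request came from inside the quoted range.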

[edited by: keyplyr at 6:09 pm (utc) on Mar 17, 2016]
[edit reason] depersonalized IP address [/edit]

keyplyr

2:32 am on Mar 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Likely either a faked UA or this bot has gone back to random IPs. It also could have become distributed, although there's nothing on their site to indicate that.

lucy24

2:59 am on Mar 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, hey, I remember panscient. They're the ones who are too dumb to grasp the difference between a relative and an absolute link, so they're always asking for things like
/artists/piwik.php
that don't exist.
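
A rough Python sketch of the difference, using the standard `urljoin`. The thread doesn't show how the pages actually reference Piwik, so the root-relative `/piwik.php` and the `example.com` host are assumptions for illustration only.

```python
from urllib.parse import urljoin

# Hypothetical page URL; example.com stands in for the real site.
page = "http://example.com/artists/"

# A root-relative reference resolves against the site root.
correct = urljoin(page, "/piwik.php")

# Treating the same file name as a bare relative reference resolves it
# against the page's own directory, which yields the bogus request.
wrong = urljoin(page, "piwik.php")

print(correct)  # http://example.com/piwik.php
print(wrong)    # http://example.com/artists/piwik.php
```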

GET /boilerplate/piwik.js HTTP/1.1" 404 1458 "-" "panscient.com" 
Er, panscient, did you not see the "Disallow:" line? I know you've seen robots.txt; you request multiple copies on each visit. (Still haven't figured out TextWrangler's alphabet, since February 2014 cannot possibly be either the oldest or the newest of panscient's visits.)
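
What a compliant robot would do with that "Disallow:" line can be sketched with Python's standard `urllib.robotparser`. The robots.txt contents below are hypothetical, since the real file isn't quoted in the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /boilerplate/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "panscient.com") -> bool:
    """Return True if the given agent may fetch path under ROBOTS_TXT."""
    return rp.can_fetch(agent, path)


print(allowed("/boilerplate/piwik.js"))  # False under the rules above
print(allowed("/index.html"))            # True under the rules above
```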

Oh, and they're one of those robots that always make two parallel requests for the front page, one of which gets a 301, meaning that they requested the page before learning (by studying the response to a robots.txt request) which domain name is correct.

:: memo to self now that I've been reminded: study headers and make decisions about hole-poking ::

[edited by: keyplyr at 6:09 pm (utc) on Mar 17, 2016]
[edit reason] depersonalized IP address [/edit]

keyplyr

3:15 am on Mar 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've never seen evidence that allowing panscient.com (the real one) would benefit my interests. From their description, they seem like a *people info* data miner.

grouchy sysadmin

3:57 am on Mar 8, 2016 (gmt 0)

10+ Year Member



I just noticed the same issue with Piwik URLs. For example,
/about/piwik.php HTTP/1.1" 404 134 "-" "panscient.com" "-"0.001- MISS
/about/our-community/piwik.js HTTP/1.1" 404 134 "-" "panscient.com" "-"0.001- MISS

It also seems to go after other weird URL strings that I don't normally see bots trying. For example,
/wp-content/plugins/revslider/rs-plugin/js/?)!=e&p.data( HTTP/1.1" 403 134 "-" "panscient.com" "-"0.000- MISS

Definitely not the smartest bot on the net.

keyplyr

4:21 am on Mar 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If your site is *not* a WordPress CMS, this UA is most likely a fraud and the bot is looking for vulnerabilities to exploit.

If your site *is* a WordPress build, the bot may be authentic and the 404s due to the noted "stupidity."

See... we figured it all out :)

lucy24

5:27 am on Mar 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I pored over headers and found that yes indeedy, panscient has a highly distinctive, bordering-on-malformed header pattern. I would add it to the lockout list, but it seems to coincide with other behaviors that are already enough to get them banned, so why add another line?

:: champing at the bit to inaugurate new access controls everywhere, because it's so ... much ... fun but I know very well I have to test everything slowly, no more than one added site every few weeks ::

keyplyr

6:02 am on Mar 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You're showing incredible restraint

tangor

11:03 pm on Mar 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I probably should use more surgeon-like exclusions, but I like the shotgun method: take 'em all out. :)

UA, IPs, referers (when found), headers... but this post reminds me it might be time to look back through my bans. Things might have changed. (but I doubt it)

lucy24

11:56 pm on Mar 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"showing incredible restraint"

I had a bit of a setback when I discovered that
SetEnvIf HeaderName !. blahblah
does not mean "HeaderName is absent". Why doesn't anyone tell me this stuff? ("Errr, Lucy, the leading ! is Apache syntax, and the docs explicitly say you're using a Regular Expression. What more do you want?") For that, you have to say
SetEnvIf HeaderName ^-?$ blahblah
(same locution as for the empty or--more often--absent UA string, where the -? is really superfluous but makes me feel safer). Unfortunately the !. mistake did not lead to a comprehensive 500 error --or I'd have noticed it right away-- because apparently a leading ! in a Regular Expression is interpreted as a literal exclamation mark (I double-checked in the text editor) though a leading ? is an error no matter how you slice it.
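
The two patterns can be tried outside the server with a short Python sketch. Caveats: Apache uses PCRE rather than Python's `re`, though the two should agree on patterns this simple, and the sketch assumes, as the post implies, that an absent header is tested as an empty string. The helper names are made up.

```python
import re


def matches(pattern: str, value: str) -> bool:
    """Unanchored search, which is how the pattern is assumed to be applied."""
    return re.search(pattern, value) is not None


def leading_question_mark_is_error() -> bool:
    """A quantifier with nothing before it fails to compile."""
    try:
        re.compile("?.")
    except re.error:
        return True
    return False


# "!." means a literal exclamation mark followed by any one character.
# It is not a negation, so ordinary or empty header values never match.
print(matches("!.", "text/html"))  # False
print(matches("!.", ""))           # False
print(matches("!.", "!x"))         # True

# "^-?$" matches only an empty value or a lone hyphen.
print(matches("^-?$", ""))             # True
print(matches("^-?$", "-"))            # True
print(matches("^-?$", "Mozilla/5.0"))  # False

print(leading_question_mark_is_error())  # True
```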

I really wish I had figured this out before one unattractive robot took advantage of the removal of old lockout rules to make a comprehensive sweep of my test site, which is 100% roboted-out aside from Twitter.* Not just the human-visible links but the invisible one inserted for that very purpose, as in '<a class = "honeypot" href' etcetera.


* Because if someone blunders across the site, whether via Bing search or type-in, and is amused and tweets about it, more power to 'em ;)

keyplyr

12:16 am on Mar 9, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Things might have changed. (but I doubt it)"
IMO it's prudent to revalidate *all* your bans constantly. With mobile, almost all server farms have opened cloud services and numerous ISPs have arisen to handle streaming, etc.

I have added hundreds of rewrites and poked holes for even more UAs in once-blocked ranges.

Also, I've discovered dozens of reallocated & repurposed IP ranges once dedicated to hosting.