UA sstntechnology
tangor

6:34 am on Jul 21, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

New one for me ... does not "appear" to be a bot ...

robots.txt: Yes ... then ignores
UA: SSTN/1.0.3 (compatible; support@sstntechnology.com)
IP: 103.139.9.xxx (Bangladesh)

The rip was images, CSS, favicon, and three (3) HTML files. Denied for unusual behavior.

dstiles

8:39 am on Jul 21, 2020 (gmt 0)

Surely, if it hits robots.txt then it's a bot?

lucy24

4:06 pm on Jul 21, 2020 (gmt 0)

Once in a blue moon, a human will look at your robots.txt just out of curiosity. (On my primary site, this generally nets them the same disallow-everyone version I serve to humanoid fakers.)

Or it could be a human working for a bot-running company. If so, you’d want to operate out of a region with exceptionally low wages. Like, say, Bangladesh.

blend27

11:52 pm on Jul 23, 2020 (gmt 0)

Are they missing human headers?

If so, the next call is the last one, in my book. Mask or not.
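(For anyone curious what a "missing human headers" check can look like: the sketch below is purely illustrative, not blend27's actual config. It flags requests that claim a browser UA but arrive without headers real browsers always send; the header names are standard, the rule itself is invented for the example.)

```apache
# Hypothetical .htaccess sketch. A request whose UA claims Chrome or Firefox
# but arrives with no Accept-Language or no Accept-Encoding header is almost
# certainly not a real browser, so refuse it with a 403.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Chrome|Firefox) [NC]
RewriteCond %{HTTP:Accept-Language} ^$ [OR]
RewriteCond %{HTTP:Accept-Encoding} ^$
RewriteRule ^ - [F]
```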

tangor

4:39 am on Jul 24, 2020 (gmt 0)

Heck, I look at other sites' robots.txt all the time. That said, I don't generally turn around and hit a bunch of other pages, too.

It's called "research".

That's my story and I'm sticking to it.

blend27

4:07 pm on Jul 24, 2020 (gmt 0)

Research... for that there is humans.txt. Mine are in several different languages.

lucy24

6:25 pm on Jul 24, 2020 (gmt 0)

But humans.txt wouldn't tell you which robots a given site likes or dislikes, or tell you which parts of the site you don’t want crawled.

I don’t like robots that simply pretend to be human and don’t have anything robotic in their UA--even if it’s just a “HappyBot” thrown into the middle of an otherwise humanoid string--so I set an environment variable called lying_bot on anything that claims to be Chrome or Firefox. It isn’t used for access control, but is used when determining which version of robots.txt is served out. (Lesson: If you want to snoop into my robots.txt, use a browser that contains neither of those two names ;))
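(The lying_bot trick described above could be done roughly like this in Apache; this is a guess at one possible setup, not lucy24's real rules, and the /robots-disallow.txt filename is made up for the example.)

```apache
# Hypothetical .htaccess sketch. Flag any request whose UA contains
# "Chrome" or "Firefox"; SetEnvIfNoCase runs early enough that mod_rewrite
# can see the variable via %{ENV:...}.
SetEnvIfNoCase User-Agent "Chrome|Firefox" lying_bot
# Not used for access control -- only to pick which robots.txt is served:
# flagged requests get a disallow-everything variant instead.
RewriteEngine On
RewriteCond %{ENV:lying_bot} 1
RewriteRule ^robots\.txt$ /robots-disallow.txt [L]
```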

tangor

11:37 pm on Jul 27, 2020 (gmt 0)

Since I have been hammered for humans.txt, I finally put one up; it's 129 bytes smaller than my 404. Sharing with you folks.
/* SITE */
Created: Last century
Standards: Low
Software: Works
Plugins: 120v IEC
Design: KISS

lucy24

11:56 pm on Jul 27, 2020 (gmt 0)

I took a quick look at logs and was bemused to find several dozen requests for “humans.txt” in the last half-year or so, all blocked, all with UA Go-http-client/1.1 (a UA I’ve never bothered to block by name because it always comes in with deficient headers anyway), and almost all from IP 192.36 or 192.72 (I think it's that Swedish server farm whose name I’ve gone blank on).

dstiles

1:05 pm on Jul 29, 2020 (gmt 0)

I only get the occasional humans.txt request, but at least one /.well-known/assetlinks.json per site per day and a few /ads.txt. All from Googlebot.