UA sstntechnology
tangor

6:34 am on Jul 21, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

New one for me ... does not "appear" to be a bot ...

robots.txt: Yes ... then ignores
UA: SSTN/1.0.3 (compatible; support@sstntechnology.com)
IP: 103.139.9.xxx (Bangladesh)

The rip was images, CSS, favicon, and three (3) HTML files. Denied for unusual behavior.

dstiles

8:39 am on Jul 21, 2020 (gmt 0)

Surely, if it hits robots.txt then it's a bot?

lucy24

4:06 pm on Jul 21, 2020 (gmt 0)

Once in a blue moon, a human will look at your robots.txt just out of curiosity. (On my primary site, this generally nets them the same disallow-everyone version I serve to humanoid fakers.)

Or it could be a human working for a bot-running company. If so, you’d want to operate out of a region with exceptionally low wages. Like, say, Bangladesh.

blend27

11:52 pm on Jul 23, 2020 (gmt 0)

Are they missing human headers?

If so, the next call is the last one, in my book. Mask or not.
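(For anyone curious what a "missing human headers" check can look like: the sketch below is purely illustrative, not blend27's actual config. It flags requests that claim a browser UA but arrive without headers real browsers always send; the header names are standard, the rule itself is invented for the example.)

```apache
# Hypothetical .htaccess sketch. A request whose UA claims Chrome or Firefox
# but arrives with no Accept-Language or no Accept-Encoding header is almost
# certainly not a real browser, so refuse it with a 403.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Chrome|Firefox) [NC]
RewriteCond %{HTTP:Accept-Language} ^$ [OR]
RewriteCond %{HTTP:Accept-Encoding} ^$
RewriteRule ^ - [F]
```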

tangor

4:39 am on Jul 24, 2020 (gmt 0)

Heck, I look at other sites' robots.txt all the time. That said, I don't generally turn around and hit a bunch of other pages, too.

It's called "research".

That's my story and I'm sticking to it.

blend27

4:07 pm on Jul 24, 2020 (gmt 0)

Research... for that there is humans.txt. Mine are in several different languages.

lucy24

6:25 pm on Jul 24, 2020 (gmt 0)

But humans.txt wouldn't tell you which robots a given site likes or dislikes, or tell you which parts of the site you don’t want crawled.

I don’t like robots that simply pretend to be human and don’t have anything robotic in their UA--even if it’s just a “HappyBot” thrown into the middle of an otherwise humanoid string--so I set an environment variable called lying_bot on anything that claims to be Chrome or Firefox. It isn’t used for access control, but is used when determining which version of robots.txt is served out. (Lesson: If you want to snoop into my robots.txt, use a browser that contains neither of those two names ;))
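(The lying_bot trick described above could be done roughly like this in Apache; this is a guess at one possible setup, not lucy24's real rules, and the /robots-disallow.txt filename is made up for the example.)

```apache
# Hypothetical .htaccess sketch. Flag any request whose UA contains
# "Chrome" or "Firefox"; SetEnvIfNoCase runs early enough that mod_rewrite
# can see the variable via %{ENV:...}.
SetEnvIfNoCase User-Agent "Chrome|Firefox" lying_bot
# Not used for access control -- only to pick which robots.txt is served:
# flagged requests get a disallow-everything variant instead.
RewriteEngine On
RewriteCond %{ENV:lying_bot} 1
RewriteRule ^robots\.txt$ /robots-disallow.txt [L]
```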

tangor

11:37 pm on Jul 27, 2020 (gmt 0)

Since I have been hammered for humans.txt, I finally put one up; it's 129 bytes smaller than my 404. Sharing with you folks.
/* SITE */
Created: Last century
Standards: Low
Software: Works
Plugins: 120v IEC
Design: KISS

lucy24

11:56 pm on Jul 27, 2020 (gmt 0)

I took a quick look at logs and was bemused to find several dozen requests for “humans.txt” in the last half-year or so, all blocked, all with UA Go-http-client/1.1 (a UA I’ve never bothered to block by name because it always comes in with deficient headers anyway), and almost all from IP 192.36 or 192.72 (I think it's that Swedish server farm whose name I’ve gone blank on).

dstiles

1:05 pm on Jul 29, 2020 (gmt 0)

I only get the occasional humans.txt request, but at least one /.well-known/assetlinks.json per site per day and a few /ads.txt. All from Googlebot.