User agent : KOCMOHABT

User agent of new browser under development

         

Martin Potter

3:55 pm on Oct 3, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



This morning I noticed a new (to me) User Agent : KOCMOHABT (https://kozmonavt.ml/) Mozilla/5.0 (Web Explorer)

The spelling is the Cyrillic word КОСМОНАВТ ("cosmonaut") rendered in look-alike Latin letters rather than transliterated. There is a project on GitHub, registered in June of 2021, for a browser called Kosmonaut ("a web browser agent for the space age"), associated with Twilco. In the project description, the developer includes a famous quotation by Yuri Gagarin.

The associated TLD (.ml) equates to Mali, possibly one of those African nations strongly influenced by Russia, which might explain the spelling. Yet the IP address (129.153.xx.xxx) of the hits on my site leads to a block of IPs assigned to Oracle in California.

The browser only asked for three things, robots.txt, the default home page (/), and one of the apple-touch-icons.

lucy24

4:46 pm on Oct 3, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The browser only asked for three things, robots.txt, the default home page (/), and one of the apple-touch-icons.
In that order? We are all too familiar with malign entities that ask for the front page, possibly other pages, and only later “Oh, yeah, and let me have robots.txt too.”

Will keep my eyes peeled in any case. (Logs don’t show any blocked requests, which I wouldn’t notice unless I’m actively searching.)

Martin Potter

8:34 pm on Oct 3, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



Yes, in that exact order. Robots.txt was first, and the requests were all made very quickly. Server log showed them all within the same second, but how is that possible, given latency and propagation delay, etc?

There was no referer, like Google. If I understand correctly how this works (?), then I think the request was made directly at my IP address. Probably (?) sequentially by some script.

I will keep an eye out for this one again.

lucy24

11:10 pm on Oct 3, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think the request was made directly at my IP address
Do you have your own server, with your site set as the default? Otherwise, requests without a hostname would never reach any particular site.

But I don't understand the connection between no-referer and IP address. Robots don't normally send a referer unless, paradoxically, they are up to no good and are lying in their teeth about who sent them.

If your logs are like most, they don't show anything less than a full second. There could be multiple consecutive requests within that same second. And if your logs are like mine, there can be hiccups between the request itself and the act of logging, so you can never be 100% certain that a series of requests was received in the exact order shown in logs.

:: detour to experiment in logs using
^[^:]+:(\d\d:\d\d:\d\d) (.+\n[^:]+:\1 ){some-number-here}
(RegEx pattern for Apache log format, ymmv) ::

Yup. By the time I (and BBEdit) got tired, I’d arrived at well over 100 requests within the same second--happily in my case not evil robots, but a handful of pages that include a colossal number of images.* I do hope those human visitors at least stayed on the page long enough to look at all those pictures.

* I do have one isolated page that uses sprites, but generally I think it's more trouble than it's worth, especially with smartphones that defer loading on their own initiative, without being told.
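
For anyone who would rather not do this by hand in a text editor, here is a minimal Python sketch of the same idea: count how many requests share each one-second timestamp. It assumes the Apache common/combined log format; the function names and the sample lines (documentation-range IPs, made-up paths and bot name) are illustrative only, not taken from anyone's real logs.

```python
import re
from collections import Counter

# Timestamp portion of an Apache common/combined log line,
# e.g. [04/Oct/2022:14:16:31 -0700]; only whole seconds are recorded.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_per_second(lines):
    """Return a Counter mapping each one-second timestamp to its request count."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def busiest_seconds(lines, top=5):
    """Return the `top` timestamps with the most requests, busiest first."""
    return requests_per_second(lines).most_common(top)


# Made-up sample lines for illustration (hypothetical IPs, paths, and UA).
sample_log = [
    '203.0.113.5 - - [04/Oct/2022:14:16:31 -0700] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot"',
    '203.0.113.5 - - [04/Oct/2022:14:16:31 -0700] "GET / HTTP/1.1" 200 3495 "-" "ExampleBot"',
    '203.0.113.5 - - [04/Oct/2022:14:16:31 -0700] "GET /icon.png HTTP/1.1" 200 900 "-" "ExampleBot"',
    '198.51.100.7 - - [04/Oct/2022:14:16:32 -0700] "GET / HTTP/1.1" 200 3495 "-" "OtherAgent"',
]

if __name__ == "__main__":
    for stamp, count in busiest_seconds(sample_log):
        print(stamp, count)
```

To run it against a real file, pass the open file object instead of `sample_log`; it is read line by line. As noted above, the order of lines inside one second still proves nothing about the order in which the requests actually arrived.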

Martin Potter

2:54 am on Oct 5, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks very much, Lucy, for that information and explanation.

It would be nice to have my own server, but, no, my site is hosted. After what you said, I had a more careful look at the logs (LiteSpeed/Apache server viewed from cPanel) and, of course, there were lots of examples of multiple requests from other visitors that occurred "all within the same second". Learn something every day! Thank you.

Haven't seen KOCMOHABT again yet.

tangor

7:38 am on Oct 5, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just wait.... once around comes around! :)

Most bots that ignore my robots.txt (I do whitelisting; all others are denied) won't get me overexcited if the take is as minimal as in the OP.

Chalk it up as the cost of doing business with stupid bots.

Martin Potter

4:25 pm on Oct 6, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



Yes, and it seems there is no end to it.

("If there is any way to abuse a system used by the public, for personal or corporate gain, it will be found." -- Finagle's 9th Law)

martinibuster

4:36 pm on Oct 6, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Bots change user agents sometimes. I have seen different bots coming from Oracle IP addys. Not sure why.

[webmasterworld.com...]

Martin Potter

1:36 am on Oct 8, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks, martinibuster. Your referenced thread is good info to remember. The bot world is full of twists and turns.

lucy24

4:28 am on Oct 8, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, hey, I got one. Logs for 4 October, HTTP branch of main site:
192.18.145.abc - - [04/Oct/2022:14:16:31 -0700] "GET / HTTP/1.1" 403 3495 "-" "KOCMOHABT (https://kozmonavt.ml/) Mozilla/5.0 (Web Explorer)"

Blocked due to a single header deficit (of which they don't need to know the details).

blend27

1:56 pm on Oct 8, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Shouldn't "Mozilla/5.0 (Web Explorer)" in a UA be blocked by default anyway?

BTW,

1. It never hurts to ask for a static IP when you are on shared hosting without your own server (a couple of dollars more a month), then drop all requests made directly to the IP.
2. Also ask the hosting rep for full-string timestamps. Look at what other data might be useful (read the server docs for details) and ask the hosting provider to include it in the logs. It never hurts to ask...
3. NOTE: that data is available at request time anyway, whenever a request is made to the server.

Serving content via a script is part of most CMSs at this point; e.g. /category/widgets/ is not really a sub-folder within a sub-folder anyway, is it?

nor is:

RewriteRule robots\.txt robotsq.php [NC,L]

where the request headers give you all the info, from the UA all the way up to BoomShakaLaka level

lucy24

5:10 pm on Oct 8, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Shouldn't "Mozilla/5.0 (Web Explorer)" in a UA be blocked by default anyway?
Should it? I just checked logs, and our font-flattened Cosmonaut is literally the only occurrence of “Web Explorer” in this entire calendar year.
RewriteRule robots\.txt robotsq.php [NC,L]
I just say robots.php--but if they ask for mis-cased ROBOTS.txt, that's on them and why should they get anything? Same goes for things like “blogs/robots.txt”, so I use a ^ anchor.
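
A rough way to see the difference between the two patterns is to try them in Python's `re` module. This is only an approximation: mod_rewrite uses PCRE rather than Python's engine, `re.IGNORECASE` stands in for the `[NC]` flag, and the anchored pattern `^robots\.txt` is an assumed example of the kind of rule described here, not a quote of the actual one. The paths are written without a leading slash, as they would be seen in a per-directory (.htaccess) context.

```python
import re

# Unanchored and case-insensitive, like `robots\.txt` with the [NC] flag.
loose = re.compile(r"robots\.txt", re.IGNORECASE)

# Anchored at the start and case-sensitive (assumed example pattern).
strict = re.compile(r"^robots\.txt")

# Request paths as seen in .htaccess context (no leading slash).
paths = ["robots.txt", "ROBOTS.txt", "blogs/robots.txt"]


def matches(pattern, path):
    """True if the pattern matches anywhere in the path, as RewriteRule would."""
    return pattern.search(path) is not None


# Map each path to (loose result, strict result).
results = {path: (matches(loose, path), matches(strict, path)) for path in paths}

if __name__ == "__main__":
    for path, (loose_hit, strict_hit) in results.items():
        print(f"{path:18} loose={loose_hit} strict={strict_hit}")
```

The loose pattern rewrites all three requests; the anchored, case-sensitive one rewrites only the plain `robots.txt`.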

Currently the rewrite enables three extras:
#1 incorporate material from an allrobots.txt that can be shared by multiple sites
#2 log headers
#3 throw in some extra rules so that certain requests get only a minimalist Disallow-everyone (because some aspect of the request makes it plain they are up to no good)
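
The three extras could be sketched roughly as follows. This is a Python illustration of the logic only, not the actual PHP script, whose details are not given in this thread. Everything named here is hypothetical: `SHARED_RULES` stands in for the contents of a shared allrobots.txt, and `looks_suspicious` uses a deliberately trivial placeholder test (missing User-Agent), since the real criteria are not disclosed.

```python
# Stand-in for the contents of a shared allrobots.txt (hypothetical rules).
SHARED_RULES = "User-agent: *\nDisallow: /private/\n"

# Minimalist response for requests that look like trouble.
DENY_ALL = "User-agent: *\nDisallow: /\n"


def looks_suspicious(headers):
    """Placeholder check (an assumption): flag requests with no User-Agent."""
    normalized = {name.lower(): value for name, value in headers.items()}
    return not normalized.get("user-agent", "").strip()


def build_robots(headers, site_rules, shared_rules=SHARED_RULES, log=None):
    """Return the robots.txt body to serve for one request.

    1. Combine shared rules with this site's own rules.
    2. Record the request headers in `log` (any list-like object) if given.
    3. Serve a bare Disallow-everyone to suspicious requests.
    """
    if log is not None:
        log.append(dict(headers))
    if looks_suspicious(headers):
        return DENY_ALL
    return shared_rules + site_rules


if __name__ == "__main__":
    header_log = []
    print(build_robots({"User-Agent": "ExampleBot/1.0"}, "Disallow: /drafts/\n", log=header_log))
    print(build_robots({"Accept": "*/*"}, "Disallow: /drafts/\n", log=header_log))
    print(len(header_log), "requests logged")
```

In a real setup the header log would go to a file or database rather than an in-memory list, and the suspicious-request test would be whatever the site owner has found to work.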

At my host, an IPv4 costs money but an IPv6 is free. A side effect of having an IPv6 is that logs show when a request comes from IPv6; otherwise everything is shown as IPv4. (Disclaimer: I do not actually know how this works. I do know that the SeznamBot, in particular, was understandably extatic* to be able to use IPv6 and bypass the overcrowded IPv4 field.) But requests without a hostname simply don’t reach the site, even on those that have a fixed IP.

But I digress.

* Oh, er, oops. That’s not the current century’s spelling, is it. I’ll leave it, though.

blend27

9:24 pm on Oct 29, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That’s not the current century’s spelling


And then imagine if every request for Robots.txt were redirected to MyOwnRobots.txt on my own site and.... if you spider the latter, "IT" would be allowed to lurk with the same credentials (we control those at site level).

Damn those who create HTML pages and ask HOW spiders work without understanding that 'THEY ARE BEING SCRAPED'.

blend27

9:32 pm on Oct 29, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



re:User agent : KOCMOHABT,


Holy moly, Cyrillic is allowed on a forum? Haven't been posting here in a while....