Yeti

new UA

         

keyplyr

11:12 pm on Aug 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/spd)
Protocol: HTTP/1.1
Robots.txt: ?
Host: navercorp.com (Korean Search Engine)
125.209.192.0 - 125.209.255.255
125.209.192.0/18
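To check whether a logged address falls inside the range above, Python's standard `ipaddress` module is enough. This is a minimal sketch. `in_yeti_range` is a hypothetical helper name. Range membership only tells you where a request came from, not that it was the crawler.

```python
import ipaddress

# Range quoted in the post for Naver's Yeti crawler.
YETI_RANGE = ipaddress.ip_network("125.209.192.0/18")

def in_yeti_range(ip: str) -> bool:
    """Return True if ip falls inside 125.209.192.0/18 (hypothetical helper)."""
    return ipaddress.ip_address(ip) in YETI_RANGE

# The /18 spans exactly the first and last addresses listed above (2**14 addresses).
print(YETI_RANGE[0], YETI_RANGE[-1], YETI_RANGE.num_addresses)
```

The same `ip_network` object can be reused for every log line, so the check stays cheap even on large logs.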

The only real issue I have with naver.com search is that their index caches and translates web pages, even if you use noarchive or notranslate meta tags or headers.

While my properties rank very well on naver.com, I rarely see any referral traffic from it.

Previous:
[webmasterworld.com...]

lucy24

1:14 am on Aug 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Robots.txt: No
I’d call it a “Yes, but...” Since their reappearance last fall (although, now that you mention it, I haven't seen much of them in recent months), Yeti visits have typically looked like this:
125.209.235.abc - - [20/Mar/2018:20:40:35 -0700] "GET /robots.txt HTTP/1.1" 200 976 "-" "Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/bot)" 
125.209.235.abc - - [20/Mar/2018:20:40:38 -0700] "GET / HTTP/1.1" 200 2808 "-" "Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/bot)"
125.209.235.abc - - [20/Mar/2018:20:40:39 -0700] "GET /sharedstyles.css HTTP/1.1" 200 2066 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
125.209.235.abc - - [20/Mar/2018:20:40:39 -0700] "GET /piwik/piwik.js HTTP/1.1" 403 1838 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
The 403 is there because I don’t let robots crawl Piwik, and sometimes a No Admittance sign isn't enough. Note the humanoid UA, used consistently for non-page files.
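The pattern in the log excerpt above can be checked mechanically. In that excerpt, the declared Yeti UA fetches robots.txt and the page, and a browser UA fetches the stylesheet and script from the same address. This is a minimal sketch. It assumes Apache's combined log format and treats any UA containing `Yeti/` as the declared bot. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Apache "combined" format: host ident user [time] "request" status size "referer" "UA"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def requests_by_host(lines):
    """Map each client host to a list of (path, user_agent) tuples, in log order."""
    by_host = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m:  # lines that don't parse are skipped
            by_host[m["host"]].append((m["path"], m["ua"]))
    return by_host

def mixed_ua_hosts(lines, bot_token="Yeti/"):
    """Hosts that sent both a UA containing bot_token and a UA that does not."""
    flagged = []
    for host, reqs in requests_by_host(lines).items():
        uas = {ua for _path, ua in reqs}
        if any(bot_token in ua for ua in uas) and any(bot_token not in ua for ua in uas):
            flagged.append(host)
    return flagged

# The four lines quoted above ("abc" is the obfuscated last octet from the post).
SAMPLE = [
    '125.209.235.abc - - [20/Mar/2018:20:40:35 -0700] "GET /robots.txt HTTP/1.1" 200 976 "-" "Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/bot)"',
    '125.209.235.abc - - [20/Mar/2018:20:40:38 -0700] "GET / HTTP/1.1" 200 2808 "-" "Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/bot)"',
    '125.209.235.abc - - [20/Mar/2018:20:40:39 -0700] "GET /sharedstyles.css HTTP/1.1" 200 2066 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"',
    '125.209.235.abc - - [20/Mar/2018:20:40:39 -0700] "GET /piwik/piwik.js HTTP/1.1" 403 1838 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"',
]

for host in mixed_ua_hosts(SAMPLE):
    for path, ua in requests_by_host(SAMPLE)[host]:
        label = "declared bot" if "Yeti/" in ua else "browser UA"
        print(f"{host} {path} -> {label}")
```

Keying on the host alone is deliberate here. Grouping by UA would split the visit and hide exactly the switch being looked for.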

keyplyr

1:38 am on Aug 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I did originally say "Robots.txt: No", then changed it to "?" (probably while you were looking through your logs). I couldn't find a robots.txt request from this visit, but I didn't have the motivation to look through the last 72 hours.

lucy24

2:05 am on Aug 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



while I couldn't find a robots.txt request from this visit, I didn't have the motivation to look through the last 72 hours
Yeah, I did find a couple of individual visits that happened not to start with a robots.txt request, so you may have hit one of those.

Anyway, I should have put this on a separate line:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36
This UA is not unique to Yeti (other random robots use it too), but it's definitely several years out of date for humans.
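A rough way to flag outdated browser UAs like the one above is to extract the Chrome major version and compare it with a cutoff. This is a minimal sketch. `min_major` is a cutoff you choose yourself, and no particular value is implied here. Other browsers that embed a `Chrome/` token will be caught too. A stale version is a hint, not proof of a robot.

```python
import re

CHROME_RE = re.compile(r"Chrome/(\d+)\.")

def chrome_major(ua: str):
    """Return the Chrome major version claimed in a UA string, or None if absent."""
    m = CHROME_RE.search(ua)
    return int(m.group(1)) if m else None

def looks_stale(ua: str, min_major: int) -> bool:
    """True if the UA claims a Chrome major version below min_major.

    min_major is a site-chosen cutoff (an assumption, not a standard value).
    UAs with no Chrome token are never flagged by this check.
    """
    major = chrome_major(ua)
    return major is not None and major < min_major

ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36")
print(chrome_major(ua))
```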

keyplyr

2:22 am on Aug 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, again, IMO just because requests come from an SE IP range doesn't mean those requests are related to the known bot from that range (think MS and Bing).

That Naver /18 could include residential or private-company connectivity, VPNs, schools, etc., any of which may use a browser UA. The fact that the UA is outdated *may* not be much of an indicator, given that this is Korea.

In other words, your example may not be a "plain-clothes" Yeti (to use a favorite term of yours).
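A common way to tell a search engine's crawler apart from other traffic in the same address block is a forward-confirmed reverse DNS check. First, look up the PTR name for the IP and require it to end in a domain the search engine controls. Then resolve that name forward and confirm that it maps back to the same IP. This is a minimal sketch using Python's `socket` module. The accepted suffix list is a placeholder assumption, not a documented Naver hostname, and `forward_confirmed` is a hypothetical name. The resolver functions are parameters so the logic can be exercised without network access.

```python
import socket

def forward_confirmed(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    1. Reverse-resolve ip to a hostname (PTR lookup).
    2. Require that hostname to equal or end with one of allowed_suffixes.
    3. Forward-resolve the hostname and require ip among its addresses.
    Any lookup failure counts as "not confirmed".
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses
```

A request that passes this check came from a host the search engine's DNS vouches for. A request that fails it may still come from the same /18, which is the point being made above.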

lucy24

5:38 pm on Aug 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Look at the timestamps. On my site, requests that arrive in consecutive seconds from the same IP and/or UA come from the same visitor and appear on consecutive lines of the access logs.* (The most common exception is Yandex-related activity, where two or three requests tend to get entangled: Page A, Page B, stylesheet A, stylesheet B, all in the same second.) In the case of Yeti, that's always the pattern. I just picked one at random.

In fact, this is one of the cases where patterns are easier to detect on a low-traffic site, because you don't have multiple unrelated requests mixed up together in logs.


* As you've probably noticed, this particular host can be a bit capricious about the exact sequence, so I'll often see a page request logged several lines after supporting-file requests that have a later timestamp.
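The timestamp heuristic above can be approximated in code. This is a minimal sketch that assumes Apache-style access log lines. It groups requests by client IP only, not by UA. It sorts each IP's requests by timestamp rather than trusting log order, since lines can be written out of sequence. It then starts a new visit whenever the gap to the previous request exceeds `max_gap` seconds. The 5-second default is an arbitrary assumption, chosen so that a robots.txt fetch a few seconds before the page stays in the same visit. `parse` and `visits` are hypothetical names.

```python
import re
from datetime import datetime
from itertools import groupby

# Only the fields needed here: host, [time], and the request path.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 20/Mar/2018:20:40:35 -0700

def parse(line):
    """Return (host, timestamp, path) for a log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    return m["host"], datetime.strptime(m["time"], TIME_FMT), m["path"]

def visits(lines, max_gap=5):
    """Split requests into visits: same host, each within max_gap seconds of the last.

    Records are sorted by (host, timestamp) first, so out-of-order log lines
    are reassembled; ties keep their original log order (stable sort).
    """
    records = [r for r in map(parse, lines) if r]
    records.sort(key=lambda r: (r[0], r[1]))
    result = []
    for _host, group in groupby(records, key=lambda r: r[0]):
        current = []
        for rec in group:
            if current and (rec[1] - current[-1][1]).total_seconds() > max_gap:
                result.append(current)
                current = []
            current.append(rec)
        if current:
            result.append(current)
    return result
```

As noted above, this works best on a low-traffic site. On a busy one, unrelated clients behind the same IP (a proxy or NAT) would be merged into one "visit".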

keyplyr

6:25 pm on Aug 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Look at the timestamps
OK, I see that in your first post now.

I'm usually on mobile. Longer quotes are obscured and I wouldn't see more than 3 or 4 (short) lines unless I scroll.