Linespider

lucy24

6:54 pm on Oct 31, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



IP: 203.104.154.xyz
UA: Mozilla/5.0 (compatible;Linespider/1.1;+https://lin.ee/4dwXkTH)
robots.txt: YES
Requests: front page and all authorized pages linked therefrom
xyz = various values from .135 to .144 (sic: not .136-.143, which would fit a /29, so the range is at least .128/27)
Didn't check, but I think .ee is Estonia, though the IP range was Japan last time I looked it up (definitely APNIC, not RIPE).
Appears to be robots.txt compliant, as they did not ask for any material in the roboted-out directory that is linked from the front page.
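
For anyone who wants to check the range arithmetic above, here is a minimal Python sketch using the standard `ipaddress` module. The two endpoint addresses come from the observed octets; the helper name `smallest_covering_network` is made up for illustration. It only finds the smallest single CIDR block covering both ends, which says nothing about how the range is actually allocated.

```python
import ipaddress

# Lowest and highest addresses seen in the logs (last octets .135 and .144).
first = ipaddress.IPv4Address("203.104.154.135")
last = ipaddress.IPv4Address("203.104.154.144")

def smallest_covering_network(lo, hi):
    """Return the smallest single CIDR block that contains both addresses."""
    # Walk from the narrowest prefix (/32) toward the widest (/0).
    for prefix in range(32, -1, -1):
        net = ipaddress.ip_network(f"{lo}/{prefix}", strict=False)
        if hi in net:
            return net
    raise ValueError("unreachable: /0 contains every address")

net = smallest_covering_network(first, last)
print(net)  # 203.104.154.128/27
```

A /28 will not do because .135 and .144 fall on opposite sides of the .143/.144 boundary, hence the /27.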

If they had waited a few days I could have verified that they can use HTTPS, because this particular site is about to move. (Having run out of excuses not to...)

lucy24

3:28 am on Nov 2, 2019 (gmt 0)

Update: Well, they came back three days later. I can therefore say that their behavior when faced with HTTPS is this:

First, an HTTP request for robots.txt, followed by one for the root, which gets a redirect. (I always serve robots.txt as-is, free of canonicalization, so nobody has any excuse to say they couldn't see it.)

Then they start over at HTTPS, beginning with a fresh robots.txt request.

Interesting line from the page in the UA:
> Even if information is published on the internet, it does not mean that the copyright holder or website owner has given permission to publish, copy, or use information obtained using any search method, including search robots.
Ay-yup.

lucy24

5:15 pm on Nov 18, 2019 (gmt 0)

Further update: Turns out it's Yeti by another name. Linespider visits come in tandem with

Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.0 Safari/537.36 (compatible; Yeti/1.1; +http://naver.me/spd)

Aside from operating from the same IP at the same time, the giveaway is that page requests from Linespider are immediately followed by CSS requests from Yeti, citing the page as referer.
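
The tandem pattern described above could be picked out of parsed logs with something like the following Python sketch. Everything here is an assumption for illustration: the `Hit` record, the `tandem_pairs` helper, the five-second window, and the sample hits (example.com paths, an arbitrary address in the observed range) are invented, not taken from real logs.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    ip: str
    time: float   # seconds, any consistent clock
    path: str
    referer: str
    ua: str

def tandem_pairs(hits, max_gap=5.0):
    """Pair each Linespider page hit with a Yeti CSS hit that cites it as referer."""
    pages = [h for h in hits if "Linespider/" in h.ua and not h.path.endswith(".css")]
    styles = [h for h in hits if "Yeti/" in h.ua and h.path.endswith(".css")]
    pairs = []
    for page in pages:
        for css in styles:
            same_ip = css.ip == page.ip
            soon_after = 0 <= css.time - page.time <= max_gap
            # Crude referer check: the referer URL ends with the page path.
            cites_page = css.referer.endswith(page.path)
            if same_ip and soon_after and cites_page:
                pairs.append((page, css))
    return pairs

# Invented sample data, shaped like the behavior described in the post.
LINESPIDER = "Mozilla/5.0 (compatible;Linespider/1.1;+https://lin.ee/4dwXkTH)"
YETI = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/63.0.3239.0 Safari/537.36 "
        "(compatible; Yeti/1.1; +http://naver.me/spd)")
hits = [
    Hit("203.104.154.140", 100.0, "/page.html", "", LINESPIDER),
    Hit("203.104.154.140", 101.0, "/style.css", "https://example.com/page.html", YETI),
    Hit("198.51.100.7", 102.0, "/style.css", "https://example.com/page.html", YETI),
]
pairs = tandem_pairs(hits)
print(len(pairs))
```

The referer test is deliberately loose; a page path of "/" would match any referer ending in a slash, so real use would want to compare full URLs.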

The interesting part is that, when last seen, the pattern was:
pages: Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/spd)
css: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.0 Safari/537.36

That was from 125.209.235, but it's been a few months.

Huh.

not2easy

6:59 pm on Nov 18, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Aha! You're ahead of the news - Yahoo Japan to merge with Naver's Line [webmasterworld.com] app ;)

dstiles

11:07 am on Nov 19, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does this mean linespider needs to be allowed?

And is yeti only fetching css now?

lucy24

6:32 pm on Nov 19, 2019 (gmt 0)

> Does this mean linespider needs to be allowed?
Well, they appear to be robots.txt compliant. But then, I'm pretty lax about who I let in; most of the time they just have to ask. In the specific case of Linespider, they send humanoid headers, and have yet to request anything that's disallowed, so I never bothered to formally evaluate them. Curious detail: the Accept-Language header consistently says Japanese, not Korean. Wonder why?
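
The "has yet to request anything that's disallowed" test can be made mechanical with Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt rules and the list of requested paths below are hypothetical stand-ins, not the real site's file or log.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one roboted-out directory.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Hypothetical paths pulled from the log for this user-agent.
requested = ["/", "/about.html", "/fonts/index.html"]

# Any path the parser says this agent may not fetch counts as a violation.
violations = [p for p in requested if not rp.can_fetch("Linespider", p)]
print(violations)
```

An empty list means nothing disallowed was requested in that sample, which is evidence of compliance rather than proof of it.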

The URL in the UA redirects to
[help.naver.com...]
which would be more useful to me if I knew Korean. (Fascinatingly, Google Translate claims to be unable to translate it. You'd think an About Our Robots page would be just so much boilerplate.)

When last seen--a few days ago--Linespider did pages (and robots.txt) while Yeti did css. When previously seen--about half a year ago--Yeti did pages and Chrome did css. The pattern is otherwise identical.

dstiles

11:48 am on Nov 20, 2019 (gmt 0)

> Japanese, not Korean.

I always thought yeti was Japanese. Maybe they were taken over?

I'll enable it and see what happens. :)

Point of (no) interest:
Yeti/1.1;\s+http://etc
Linespider/1.1;+https://etc

lucy24

6:15 pm on Nov 20, 2019 (gmt 0)

Careful with the \s+ sequence or there will be Unintended Consequences ;)
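
To spell out the pitfall: in a regular expression, an unescaped `\s+` means "one or more whitespace characters", not "a whitespace character followed by a literal plus". A small Python sketch, using the two UA strings quoted earlier in the thread (the pattern names are just for illustration):

```python
import re

yeti = "Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/spd)"
linespider = "Mozilla/5.0 (compatible;Linespider/1.1;+https://lin.ee/4dwXkTH)"

# Unescaped: "\s+" is a quantifier, so the literal "+" in the UA is never matched.
loose = re.compile(r"Yeti/1\.1;\s+http")
# Escaped: one whitespace character, then a literal plus sign.
strict = re.compile(r"Yeti/1\.1;\s\+http")
# Optional whitespace before the literal plus covers both spacing styles.
either = re.compile(r"(?:Yeti|Linespider)/1\.1;\s*\+https?://")

print(bool(loose.search(yeti)))                   # False: the real UA is missed
print(bool(loose.search("Yeti/1.1; http://x")))   # True: a plus-less UA gets through
print(bool(strict.search(yeti)))                  # True
print(bool(either.search(yeti)), bool(either.search(linespider)))  # True True
```

So the unescaped form both misses the genuine UA and accepts a lookalike without the plus sign, which is the kind of Unintended Consequence to watch for.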

dstiles

11:16 am on Nov 21, 2019 (gmt 0)

Good call, Lucy! Never thought of it that way, although I have already escaped it. I used the \s above for clarification.