Yahoo Japan

         

lucy24

8:57 pm on Aug 17, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From the name, one would swear it had been around forever. But I never set eyes on it until late July (2023).

IP: 182.22.30 (Yahoo Japan) 

UA:
Mozilla/5.0 (compatible; Y!J-WSC/1.0; +https://yahoo.jp/3BSZgF)
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; Y!J-WSC/1.0; +https://yahoo.jp/3BSZgF) Chrome/113.0.0.0 Safari/537.36

robots.txt: yes, appears to be compliant

headers: humanoid
The first (shorter) UA is for pages and robots.txt; the longer one is for scripts and stylesheets. It follows a slightly unusual pattern, where each page request is accompanied by at most one script or stylesheet, giving the HTML as referer. (I think most search engines do this now, probably wisely, since some sites will serve different stylesheets under the same name.) If the first stylesheet associated with a page is something it has previously picked up, it gets something further down the list, if any. To date I haven’t seen it pick up other types of supporting files such as images or fonts.
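For anyone who wants to separate the two variants in their own logs, here is a minimal Python sketch. It assumes only the two UA strings quoted above; the function name `classify_wsc_ua` and the "page"/"asset" labels are made up for illustration, and the split simply follows the pattern described in this post rather than anything Yahoo documents.

```python
# Token taken from the UA strings quoted in the post; the rest is illustrative.
WSC_TOKEN = "Y!J-WSC/"

def classify_wsc_ua(ua: str) -> str:
    """Return 'page', 'asset', or 'other' for a User-Agent string."""
    if WSC_TOKEN not in ua:
        return "other"
    # The longer variant carries a full Chrome-style browser signature.
    if "AppleWebKit/" in ua and "Chrome/" in ua:
        return "asset"
    return "page"

short_ua = "Mozilla/5.0 (compatible; Y!J-WSC/1.0; +https://yahoo.jp/3BSZgF)"
long_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko; compatible; Y!J-WSC/1.0; +https://yahoo.jp/3BSZgF) "
    "Chrome/113.0.0.0 Safari/537.36"
)

print(classify_wsc_ua(short_ua))  # page
print(classify_wsc_ua(long_ua))   # asset
```

Matching on the `Y!J-WSC/` token rather than the whole string should keep this working if the version number or the Chrome version changes, though that is a guess about future behavior.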

“Appears to be” compliant because, thanks to showing up out of nowhere with humanoid headers, it never went through the usual access tests. But so far it hasn't requested anything from a roboted-out directory, notably including the analytics script that is attached to all pages.

Oh, and the URL in the UA redirects to an information page in Japanese. (Forgot to check this before posting.) I have no Japanese-language content.

:: business with lang ?= ?"(?!en|iu|de|kl|la|fr) to ensure I'm not talking out of my hat ::

Oh, look at that. One occurrence of <i lang = "ja">kami-shimo</i> and three of <i lang = "ja">sake</i>, all in a single book. It would be entertaining if this had proved to be the very first page the robot homed in on, but this is not the case.
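The search described above can be sketched in Python roughly as follows. The pattern is the one quoted in the post; the capture group, the function name `unexpected_langs`, and the sample markup are additions for illustration only and are not taken from the actual site.

```python
import re

# Pattern from the post, plus a capture group (an addition) to report the code found.
NON_LISTED_LANG = re.compile(r'lang ?= ?"(?!en|iu|de|kl|la|fr)([^"]*)"')

def unexpected_langs(html: str) -> list[str]:
    """Return every lang value that does not start with one of the listed codes."""
    return NON_LISTED_LANG.findall(html)

# Made-up sample markup, only to exercise the pattern.
sample = (
    '<p lang="en">They wore <i lang = "ja">kami-shimo</i> and drank '
    '<i lang = "ja">sake</i>, then said <i lang="fr">adieu</i>.</p>'
)

print(unexpected_langs(sample))  # ['ja', 'ja']
```

Note that the negative lookahead tests prefixes, so a value such as "lat" would also be skipped because it starts with "la".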

<tangent>
Long ago I used this very page for experimenting with G*** translate. I learned that they don’t look at "lang" tags, and hence come to grief over “sake”.
</tangent>

not2easy

6:06 pm on Aug 18, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Someone drank Pete's sake, huh?

They might have followed a link somewhere to send a bot. I have not seen it, ever.

SumGuy

12:07 am on Aug 19, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



Yahoo has about 66k IPs in AS23816, of which 182.22.0.0/17 accounts for about half. I happen to not be blocking any of that AS.
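As a quick check on the "about half" arithmetic, here is a small sketch using Python's standard `ipaddress` module. The /17 size is computed; the 66k total is just the rough figure quoted above, not something looked up by the code.

```python
import ipaddress

block = ipaddress.ip_network("182.22.0.0/17")
print(block.num_addresses)  # 32768

# Rough total quoted in the post for AS23816 (an approximation, not computed).
quoted_as_total = 66_000
print(round(block.num_addresses / quoted_as_total, 2))  # 0.5
```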

What I've seen from 182.22/17 is Yahoo's crawler grabbing my site's handful of major index pages on Dec 14 / 2020 including robots.txt. The UA was:

Y!J-ASR/1.0 crawler (https :// www.yahoo-help.jp/app/answers/detail/p/595/a_id/42716/)

It looks like that URL is no longer functional. That visit went directly to my https site. On Jan 16 this year it hit robots.txt and my landing page, but used http and did not follow the 301 redirect to https.

Then just a few weeks ago, on July 28, it asked (over https) for robots.txt and a single PDF file, but botched the file name and got a 404. This time the UA was:

Y!J-ASR/1.0 crawler (https :// support.yahoo-net.jp/PccSearch/s/article/H000007955)

All these hits came from the same /24 (182.22.28.0/24) but from different IPs.
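To test whether a given log entry falls in that /24, and to confirm that the /24 sits inside 182.22.0.0/17, something like the following sketch would do. It uses only the standard `ipaddress` module; the helper name `in_seen_block` and the sample addresses are hypothetical.

```python
import ipaddress

seen = ipaddress.ip_network("182.22.28.0/24")
parent = ipaddress.ip_network("182.22.0.0/17")

# The /24 is wholly contained in the /17.
print(seen.subnet_of(parent))  # True

def in_seen_block(ip: str) -> bool:
    """True if the address belongs to 182.22.28.0/24."""
    return ipaddress.ip_address(ip) in seen

# Hypothetical addresses, for illustration only.
print(in_seen_block("182.22.28.17"))  # True
print(in_seen_block("182.22.30.5"))   # False
```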

I searched my logs for your UA string Y!J-WSC but found nothing. I re-searched my logs for my UA string Y!J-ASR and got exactly what I described above and nothing else.