

Yandex bots - incorrect rDNS?

         

dstiles

8:58 pm on May 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I posted this in the European SE forum hereabouts a week ago but so far no answer. Since it's to do with bots...

I've whitelisted the Yandex bot IPs, as far as I can discover them - they all have "spider" in their rDNS.

I'm now getting several hits purporting to be Yandex bots (the User-Agent is correct) but with rDNS names like sticker00.yandex.ru and piano2.yandex.ru, plus the occasional "slovo". Does anyone know what these IPs are doing? Are they some kind of alternative bot with a slightly different function (e.g. for thumbnails), or perhaps just badly named?
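Whitelisting on the rDNS name alone can be spoofed, since whoever controls an IP's reverse zone can make it say anything. A common safeguard is forward-confirmed reverse DNS: look up the PTR name, check that it sits under a Yandex domain (and, per the rule above, contains "spider"), then resolve that name forward and make sure it maps back to the same IP. This is a minimal sketch under those assumptions; the helper name, the yandex.ru-only domain list and the "spider" requirement are illustrative choices taken from this thread, not anything Yandex publishes here.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex:
# (hostname, alias_list, ip_address_list)
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_yandex_spider(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
    domains: Iterable[str] = ("yandex.ru",),  # assumption: adjust to suit
    require: str = "spider",  # assumption: the naming rule seen in this thread
) -> bool:
    """Forward-confirmed reverse DNS check (hypothetical helper)."""
    try:
        host, _, _ = reverse(ip)  # PTR lookup
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False

    host = host.lower().rstrip(".")
    # The name must sit under a trusted domain, not merely contain it somewhere.
    if not any(host == d or host.endswith("." + d) for d in domains):
        return False
    # Optional naming rule; pass require="" to accept sticker/piano/slovo hosts.
    if require and require not in host:
        return False

    try:
        _, _, addrs = forward(host)  # A lookup on the PTR name
    except OSError:
        return False
    # Only trust the name if it resolves back to the requesting IP.
    return ip in addrs
```

With the defaults, a host such as sticker00.yandex.ru fails the check even if it forward-confirms; calling it with `require=""` accepts any verified yandex.ru host. Each check costs two DNS lookups, so caching the result per IP is worthwhile.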

Pfui

9:56 pm on May 4, 2010 (gmt 0)




I'm not sure what they're doing but I don't like what they're doing when they do it on my sites. For example, from February:

plane03.yandex.ru
Yandex/1.01.001 (compatible; Win16; H)

robots.txt? Yes, BUT promptly ignored. The pattern is similar each time: two robots.txt requests, then three or four requests for /:

12:57:53 /robots.txt
12:57:53 /robots.txt
12:57:55 /
12:57:55 /
12:57:59 /
12:57:59 /

And here's more, from last October:

Yandex (redux)
[webmasterworld.com...]

keyplyr

10:46 pm on May 4, 2010 (gmt 0)




Yandex is an enigma. I've got 6 different ranges whitelisted. I gave up trying to understand their rDNS; I'm just happy they obey robots.txt, and I'm hoping they'll eventually send traffic my way.

dstiles

12:32 am on May 5, 2010 (gmt 0)




Yandex is an rDNS nightmare. Almost every /24 contains just two adjacent bot IPs, so whitelisting by IP results in over 70 separate records, two IPs to a record. But every one of them has the word "spider" in its rDNS, except for the ones I mentioned, which seem as busy as any of the true "spiders".

Pfui - I've seen a few odd ones like that; piano stands out, though not as a high-activity one.

I wonder if they are thumbnail or image bots. Or, the thought just hit me, perhaps rented out to other engines or businesses.
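If the whitelist is generated from a list of verified bot IPs rather than maintained by hand, Python's standard `ipaddress` module can collapse the addresses into the fewest CIDR blocks that cover exactly those IPs, and answer membership checks. A sketch, using hypothetical documentation-range addresses (RFC 5737) rather than real Yandex IPs. It also shows why adjacent pairs don't always shrink: 192.0.2.10 and .11 form one aligned /31, but 198.51.100.5 and .6 straddle a /31 boundary and stay as two /32 entries.

```python
import ipaddress
from typing import Iterable, List


def build_whitelist(ips: Iterable[str]) -> List[ipaddress.IPv4Network]:
    """Collapse individual IPv4 bot addresses into the fewest CIDR blocks
    that cover exactly those addresses (no extra addresses are let in)."""
    nets = (ipaddress.ip_network(ip) for ip in ips)  # bare IP -> /32 network
    return list(ipaddress.collapse_addresses(nets))


def is_whitelisted(ip: str, whitelist: List[ipaddress.IPv4Network]) -> bool:
    """True if ip falls inside any whitelisted block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in whitelist)


# Hypothetical addresses: one aligned pair and one unaligned pair.
wl = build_whitelist(["192.0.2.10", "192.0.2.11", "198.51.100.5", "198.51.100.6"])
for net in wl:
    print(net)
```

Aligned pairs become one record each; unaligned ones still need two, so the record count only drops where the bot pairs happen to sit on even boundaries.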

Pfui

3:43 pm on Jul 11, 2010 (gmt 0)




This just in from the "sticker" subdomain the OP mentioned. Mirror detector? Hm...

sticker01.yandex.ru
Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)

robots.txt? Yes

dstiles

8:42 pm on Jul 11, 2010 (gmt 0)




Yeah. I just got that as well - dozens of the things. Not sure what it is unless it's what it seems to claim: a download mirror detector.

I enabled the IP as a bot a few days ago: it had been behaving exactly like a bot, so I gave in about the rDNS. Ditto a couple of other Yandex IPs that resolve to names like piano and slovo.

In checking this one I noticed that Yandex has changed its bot UAs (which is how I detected this). There are several UAs but the original "^yandexbot/1..." seems to be no longer one of them.

Their bots page is helpful (and in English as well as Russian) - follow the FAQ given. In particular they seem to identify their bots by function, such as "media" and "blog".
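Because the UA format changed, a single anchored pattern such as `^yandexbot/1` stops matching. A sketch that recognises both shapes quoted in this thread, the older "Yandex/1.01.001 (compatible; Win16; H)" form and the newer "Mozilla/5.0 (compatible; YandexBot/3.0; ...)" form, and pulls out any extra function tokens such as MirrorDetector. The regexes are assumptions fitted to these two examples only, and other Yandex UAs may need adjusting. A UA string is trivially forged, so this is a classifier to pair with an rDNS check, not a substitute for one.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Older style seen in this thread: "Yandex/1.01.001 (compatible; Win16; H)"
LEGACY = re.compile(r"^(?P<product>Yandex(?:Bot)?)/(?P<version>[\d.]+)", re.IGNORECASE)
# Newer style: "Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)"
COMPATIBLE = re.compile(
    r"\(compatible;\s*(?P<product>Yandex\w*)/(?P<version>[\d.]+)(?P<rest>[^)]*)\)",
    re.IGNORECASE,
)


class YandexUA(NamedTuple):
    product: str
    version: str
    functions: Tuple[str, ...]  # extra tokens such as "MirrorDetector"


def parse_yandex_ua(ua: str) -> Optional[YandexUA]:
    """Return the parsed Yandex UA, or None if it doesn't look like one."""
    m = COMPATIBLE.search(ua)
    if m:
        tokens = (t.strip() for t in m.group("rest").split(";"))
        # Drop empty pieces and the "+http://..." info URL.
        funcs = tuple(t for t in tokens if t and not t.startswith("+"))
        return YandexUA(m.group("product"), m.group("version"), funcs)
    m = LEGACY.match(ua)
    if m:
        return YandexUA(m.group("product"), m.group("version"), ())
    return None
```

Keeping the function tokens separate means a policy like "let YandexBot in but refuse MirrorDetector" becomes a simple membership test on `functions`.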

dstiles

9:40 pm on Aug 6, 2010 (gmt 0)




Note on MirrorDetector:

It's hitting forms blocked by robots.txt. It must be picking up the URLs from links on other pages or menus. I don't know whether it's actually reading robots.txt (haven't had time to check the site logs).

MirrorDetector is now banned, although I let in the normal Yandex bot.
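One way to answer the "is it actually reading robots.txt?" question from the logs is to replay the bot's requested paths against the site's robots.txt with Python's standard `urllib.robotparser` and list anything it should not have fetched. A sketch: the `/forms/` rule and the request paths are hypothetical placeholders, and `urllib.robotparser` implements plain prefix matching, which may not match every extension a given crawler honours.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def disallowed_hits(robots_txt: str, user_agent: str, paths: Iterable[str]) -> List[str]:
    """Return the requested paths that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # load rules from text, no network fetch
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical robots.txt and log extract, for illustration only.
ROBOTS = """\
User-agent: *
Disallow: /forms/
"""
requests = ["/robots.txt", "/", "/forms/contact.asp", "/about.html"]
print(disallowed_hits(ROBOTS, "YandexBot", requests))
```

If the bot fetched robots.txt shortly before the flagged requests, that points to it ignoring the rules rather than never reading them.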