
Forum Moderators: Ocean10000 & incrediBILL


new Yandex range?

     

lucy24

6:58 pm on Jul 22, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Can't find any dates on this, and Forums search comes up cold.

141.8.128.0/18
Yandex (confirmed by free lookup, though I had to do some spot-checking before they'd admit to the full /18)

Just met a 141.8.189.112 asking for robots.txt with UA
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
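For anyone wanting to double-check the arithmetic, here is a minimal sketch using Python's standard `ipaddress` module. It only confirms that the visiting address falls inside the /18 quoted above; the helper name `in_range` is made up for illustration, and nothing here says anything about who actually owns the range.

```python
import ipaddress

# Range and visitor address exactly as quoted in the post above.
YANDEX_RANGE = ipaddress.ip_network("141.8.128.0/18")
visitor = ipaddress.ip_address("141.8.189.112")


def in_range(ip, network):
    """Return True when the address lies inside the network (hypothetical helper)."""
    return ip in network


# A /18 starting at 141.8.128.0 runs through 141.8.191.255.
print(YANDEX_RANGE[0], "-", YANDEX_RANGE[-1])
print(in_range(visitor, YANDEX_RANGE))
```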

wilderness

8:24 pm on Jul 22, 2014 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



lucy,
Yandex is robots.txt compliant.

dstiles

8:38 pm on Jul 22, 2014 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Thanks. Didn't have that one. Now blocked with hole for bot at 141.8.189.96 - 141.8.189.127.

The range above that is fairly deadly - 141.8.192.0/18
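A rough sketch of the "block the range, leave a hole for the bot" idea, written in Python purely for illustration (the actual rule presumably lives in server config that isn't shown here). The function name `is_blocked` is hypothetical; the addresses are the ones quoted in this thread.

```python
import ipaddress

# The /18 from the first post, blocked as a whole.
BLOCKED = ipaddress.ip_network("141.8.128.0/18")

# The hole left open for the bot, given above as a first-last pair.
first = ipaddress.ip_address("141.8.189.96")
last = ipaddress.ip_address("141.8.189.127")
HOLE = list(ipaddress.summarize_address_range(first, last))


def is_blocked(ip_text):
    """Deny anything in the /18 unless it sits inside the hole."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in HOLE):
        return False
    return ip in BLOCKED


print(HOLE)  # the 32-address hole collapses to a single /27
print(is_blocked("141.8.189.112"), is_blocked("141.8.130.1"))
```

Checking the exception before the block keeps the logic order-independent of how many ranges get added later.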

lucy24

9:48 pm on Jul 22, 2014 (gmt 0)




"Yandex is robots.txt compliant."

Oh, I've got no problems with Yandex. At least not in the last few years. This time they didn't even ask for anything but robots.txt-- testing the waters?-- though I suppose eventually they will. I haven't flagged them as Ignore yet, so I'll notice.

"The range above that is fairly deadly - 141.8.192.0/18"

Does it all belong to a single host/colo type of entity? I've got it listed as about half a dozen different countries, mostly in /21 pieces.

wilderness

10:07 pm on Jul 22, 2014 (gmt 0)




;)

RewriteCond %{REMOTE_ADDR} ^141\.

incrediBILL

10:22 pm on Jul 22, 2014 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The real Yandex spider has the following RDNS:
spider-100-43-85-7.yandex.com

Your Yandex IP range isn't their spider.

jmccormac

10:50 pm on Jul 22, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's Yandex on the lower address lookup. 141.8.128.0 - 141.8.131.255.
Free lookups tend to be quite iffy. If it is a RIPE address then it is best to use RIPE's (www.ripe.net) site directly. Yandex tends to be quite well behaved, unlike Bing and its muppets who think they know about large website sitemaps.

Regards...jmcc

lucy24

11:57 pm on Jul 22, 2014 (gmt 0)




Well, since their sole sign of life so far has been a request for robots.txt, any vigorous reaction is probably premature :)

Pfui

3:01 am on Jul 23, 2014 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Well, I'm not happy with Yandex.

Aside from their asking x20/day forEVER for the same-old-same-old full Disallow robots.txt that they get, they spawn robots.txt-noncompliant pests like today's twin hitters using a ridiculous UA to hit root x4:

master01h.kp.yandex.net
(a.k.a. 5.45.235.44)
Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
17:12:35 /
19:00:43 /

test-ro-bk01h.kp.yandex.net
(a.k.a. 5.255.215.226)
Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
17:18:22 /
19:13:30 /

Re the above, hailing from Yandex Russia:

5.45.235.0 - 5.45.235.127
5.45.192.0/18

5.255.192.0 - 5.255.255.255
5.255.192.0/18

lucy24

3:41 am on Jul 23, 2014 (gmt 0)




In fairness, Yandex's own docs stress that you should look at UA rather than IP. So you can make lockouts that look just like your existing rules for fake googlebots ("If it claims to be googlebot and isn't from this IP" and vice versa).
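The cross-check could be sketched like this in Python. Treat it as an illustration only: the table holds just the one range from this thread (not a real or complete list), and `CLAIMED_BOT_RANGES` and `cross_check` are invented names.

```python
import ipaddress

# Assumption: a single range taken from this thread, NOT an authoritative list.
CLAIMED_BOT_RANGES = {
    "yandexbot": [ipaddress.ip_network("141.8.128.0/18")],
}


def cross_check(user_agent, ip_text):
    """Return 'deny' when the UA claim and the IP range disagree, else 'allow'."""
    ip = ipaddress.ip_address(ip_text)
    ua = user_agent.lower()
    for token, networks in CLAIMED_BOT_RANGES.items():
        claims_bot = token in ua
        from_range = any(ip in net for net in networks)
        # Covers both directions: a claimed bot from the wrong IP,
        # and a known bot IP showing up with some other UA.
        if claims_bot != from_range:
            return "deny"
    return "allow"


ua = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
print(cross_check(ua, "141.8.189.112"))
print(cross_check(ua, "203.0.113.5"))
```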

It's a lot harder with robots that are decently behaved in and of themselves, but can't get it together to pick an IP and stick with it (looking at you, mj12bot). At least with Yandex you can look up the IP. Until they introduce the yandex-app-engine, anyway.

keyplyr

7:40 am on Jul 23, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




I get an increasing amount of traffic from Yandex. IMO they're one of the good guys. The only thing that bugs me is they've always used way too many IPs to effectively set simple fraud filters.

dstiles

7:56 pm on Jul 23, 2014 (gmt 0)




Lucy - 141.8.192.0/18

Separate blocks, yes, but I lumped them into one for practical purposes.

Incredibill - what? Was that for me or something else? I checked the short range that I quoted and it was approximately that sub-range, give or take the odd IP.

incrediBILL

1:01 pm on Jul 24, 2014 (gmt 0)




"In fairness, Yandex's own docs stress that you should look at UA rather than IP."


Which, as we know, is 100% garbage, and in all fairness that makes Yandex stupid if their docs really say that.

Anyone can fake their UA, nobody can fake their IP.

lucy24

6:17 pm on Jul 24, 2014 (gmt 0)




Two prongs. For flat-out denial, use the IP-and-UA cross-check. Once you've locked out the ones that don't belong, you can safely leave the rest to robots.txt.

:: detour to check ::

Version 1:
[help.yandex.com...]
All Yandex robots have names ending in “yandex.ru”, “yandex.net” or “yandex.com”. If the host's name has a different ending the robot does not belong to Yandex


Version 2:
[help.yandex.com...]
There are many IP addresses that Yandex robots can originate from, and these IP addresses are subject to change. We are therefore unable to offer a list of IP addresses and we do not recommend using a filter based on IP addresses.
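Going by the hostname rule in Version 1, a check might look roughly like the Python sketch below. The forward-confirmation step (resolving the name back and making sure it returns the same IP) is a common practice for verifying crawlers and is an addition here, not something taken from the excerpts quoted above. The function names are hypothetical, and the leading dot in each suffix is a deliberate tightening so that a name like "fakeyandex.com" doesn't pass.

```python
import socket

# Suffixes from the quoted Yandex help text, with a leading dot added.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def has_yandex_suffix(hostname):
    """True when the host name ends in one of the quoted Yandex domains."""
    return hostname.lower().rstrip(".").endswith(YANDEX_SUFFIXES)


def is_yandex_host(ip_text,
                   reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the suffix, then confirm it resolves back.

    The resolver functions are parameters so the logic can be exercised
    without touching live DNS.
    """
    try:
        hostname = reverse(ip_text)[0]
        if not has_yandex_suffix(hostname):
            return False
        return ip_text in forward(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (no PTR record, lookup failure).
        return False


print(has_yandex_suffix("spider-100-43-85-7.yandex.com"))
```

DNS lookups are slow, so in practice the result would normally be cached per IP rather than looked up on every request.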

incrediBILL

1:32 pm on Jul 25, 2014 (gmt 0)




"you can safely leave the rest to robots.txt."


Robots.txt is hardly 'safe', even with Google. I might post a thread about that in the robots.txt forum and start a small riot later, but I digress.

That's only true if you're running a script that validates all inbound requests against your robots.txt file; otherwise robots.txt overall is a joke, even with the "valid" search engines.

I took a PHP robots.txt script used by scraper routines to process robots.txt and reversed it so all inbound requests to a site run through the same code.

Simply point the code at your robots.txt file and feed it the inbound user agent and requested page and it spits out whether it's allowed or denied, and suddenly robots.txt has actual teeth and can respond with a 403 forbidden.
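The PHP script itself isn't shown, so the following is not that code. It is a minimal Python sketch of the same idea using the standard library's `urllib.robotparser`, with an invented example robots.txt and hypothetical helper names (`load_rules`, `status_for`). One caveat: the standard-library parser only looks at the part of the agent string before the first "/", so feed it the bot's product token (e.g. "YandexBot") rather than the whole User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# Invented example file; point load_rules() at your own robots.txt text.
EXAMPLE_ROBOTS = """\
User-agent: Yandex
Disallow: /

User-agent: *
Disallow: /private/
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a reusable parser object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def status_for(parser, agent_token, path):
    """Return the HTTP status an enforcing server would send for this request."""
    return 200 if parser.can_fetch(agent_token, path) else 403


rules = load_rules(EXAMPLE_ROBOTS)
print(status_for(rules, "YandexBot", "/index.html"))
print(status_for(rules, "SomeBrowser", "/index.html"))
```

Exempting /robots.txt itself from enforcement is probably wise, so that a blocked bot can still read the rules it is being held to.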

Cute, eh?

Overall, the real Yandex seems to behave properly, with the exception some have mentioned: if it's 100% blocked, it throws a tantrum, banging on robots.txt all day long waiting for you to change your mind.

They aren't the only spider that throws a robots.txt denial tantrum, either; they're one of many.