
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


new Yandex range?

     
6:58 pm on Jul 22, 2014 (gmt 0)

lucy24 (Senior Member from US)
joined: Apr 9, 2011 | posts: 13267 | votes: 363


Can't find any dates on this, and Forums search comes up cold.

141.8.128.0/18
Yandex (confirmed by free lookup, though I had to do some spot-checking before they'd admit to the full /18)

Just met a 141.8.189.112 asking for robots.txt with UA
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
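For anyone spot-checking ranges like this, Python's stdlib ipaddress module will do the arithmetic. A quick sketch using the IPs from this thread:

```python
import ipaddress

# The range reported above: 141.8.128.0/18 covers 141.8.128.0 - 141.8.191.255
yandex_net = ipaddress.ip_network("141.8.128.0/18")

print(yandex_net.network_address)    # 141.8.128.0
print(yandex_net.broadcast_address)  # 141.8.191.255

# The visitor that asked for robots.txt falls inside it:
print(ipaddress.ip_address("141.8.189.112") in yandex_net)  # True
```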
8:24 pm on July 22, 2014 (gmt 0)

wilderness (Senior Member)
joined: Nov 11, 2001 | posts: 5459 | votes: 3


lucy,
Yandex is robots.txt compliant.
8:38 pm on July 22, 2014 (gmt 0)

dstiles (Senior Member from GB)
joined: May 14, 2008 | posts: 3135 | votes: 4


Thanks. Didn't have that one. Now blocked, with a hole for the bot at 141.8.189.96 - 141.8.189.127.

The range above that is fairly deadly - 141.8.192.0/18
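That hole (141.8.189.96 - 141.8.189.127) is a clean /27. A minimal sketch of the same allow-hole-then-deny logic in Python's stdlib ipaddress (the function name is just illustrative):

```python
import ipaddress

BLOCKED = ipaddress.ip_network("141.8.128.0/18")  # the whole Yandex range
HOLE = ipaddress.ip_network("141.8.189.96/27")    # 141.8.189.96 - 141.8.189.127

def verdict(ip_str: str) -> str:
    """Allow the bot's /27 hole, deny the rest of the /18, allow everything else."""
    ip = ipaddress.ip_address(ip_str)
    if ip in HOLE:
        return "allow"
    if ip in BLOCKED:
        return "deny"
    return "allow"

print(verdict("141.8.189.112"))  # allow (inside the hole)
print(verdict("141.8.150.1"))    # deny  (rest of the /18)
```

The order matters: the narrower hole has to be tested before the wider block, which is the same rule as putting an Allow line before a broader Deny in most access-control syntaxes.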
9:48 pm on July 22, 2014 (gmt 0)

lucy24 (Senior Member from US)


Yandex is robots.txt compliant.

Oh, I've got no problems with Yandex. At least not in the last few years. This time they didn't even ask for anything but robots.txt-- testing the waters?-- though I suppose eventually they will. I haven't flagged them as Ignore yet, so I'll notice.

The range above that is fairly deadly - 141.8.192.0/18

Does it all belong to a single host/colo type of entity? I've got it listed as about half a dozen different countries, mostly in /21 pieces.
10:07 pm on July 22, 2014 (gmt 0)

wilderness (Senior Member)


;)

RewriteCond %{REMOTE_ADDR} ^141\.
10:22 pm on July 22, 2014 (gmt 0)

incrediBILL (Administrator from US)
joined: Jan 25, 2005 | posts: 14650 | votes: 94


The real Yandex spider has the following RDNS:
spider-100-43-85-7.yandex.com

Your Yandex IP range isn't their spider.
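That rDNS test can be automated. Here's a sketch of a forward-confirmed reverse DNS check in Python -- the suffix list follows Yandex's published domains, and verify_yandex does live lookups, so treat it as illustrative rather than definitive:

```python
import socket

YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")

def is_yandex_hostname(host: str) -> bool:
    """Pure check: does the rDNS name end in one of Yandex's domains?"""
    return host.lower().rstrip(".").endswith(YANDEX_SUFFIXES)

def verify_yandex(ip: str) -> bool:
    """Forward-confirmed reverse DNS: the rDNS name must be a yandex.*
    host AND must resolve back to the same IP (performs live lookups)."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not is_yandex_hostname(host):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

print(is_yandex_hostname("spider-100-43-85-7.yandex.com"))  # True
print(is_yandex_hostname("fake.yandex.com.evil.example"))   # False
```

The suffix check alone isn't enough -- anyone can point their own reverse zone at a yandex.com name -- which is why the forward lookup back to the original IP is the step that actually proves ownership.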
10:50 pm on July 22, 2014 (gmt 0)

jmcc (Senior Member)
joined: Aug 30, 2002 | posts: 2529 | votes: 47


That's Yandex on the lower address lookup: 141.8.128.0 - 141.8.131.255.
Free lookups tend to be quite iffy. If it's a RIPE-region address, it's best to query RIPE's own site (www.ripe.net) directly. Yandex tends to be quite well behaved, unlike Bing and its muppets, who think they know about large website sitemaps.

Regards...jmcc
11:57 pm on July 22, 2014 (gmt 0)

lucy24 (Senior Member from US)


Well, since their sole sign of life so far has been a request for robots.txt, any vigorous reaction is probably premature :)
3:01 am on July 23, 2014 (gmt 0)

Senior Member
joined: Nov 5, 2005 | posts: 2038 | votes: 1


Well, I'm not happy with Yandex.

Aside from asking x20/day, forEVER, for the same old full-Disallow robots.txt they always get, they spawn robots.txt-noncompliant pests like today's twin hitters, using a ridiculously outdated UA to hit root x4:

master01h.kp.yandex.net
(a.k.a. 5.45.235.44)
Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
17:12:35 /
19:00:43 /

test-ro-bk01h.kp.yandex.net
(a.k.a. 5.255.215.226)
Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
17:18:22 /
19:13:30 /

Re the above, hailing from Yandex Russia:

5.45.235.0 - 5.45.235.127
5.45.192.0/18

5.255.192.0 - 5.255.255.255
5.255.192.0/18
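Dotted ranges like these can be collapsed to exact CIDR with ipaddress.summarize_address_range; a quick sketch using the ranges quoted above:

```python
import ipaddress

def to_cidr(first: str, last: str) -> list[str]:
    """Collapse an inclusive dotted range into the minimal CIDR list."""
    return [str(n) for n in ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))]

# The narrow range quoted above is a /25 inside the wider /18 allocation:
print(to_cidr("5.45.235.0", "5.45.235.127"))    # ['5.45.235.0/25']
print(to_cidr("5.255.192.0", "5.255.255.255"))  # ['5.255.192.0/18']
```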
3:41 am on July 23, 2014 (gmt 0)

lucy24 (Senior Member from US)


In fairness, Yandex's own docs stress that you should look at UA rather than IP. So you can make lockouts that look just like your existing rules for fake googlebots ("If it claims to be googlebot and isn't from this IP" and vice versa).

It's a lot harder with robots that are decently behaved in and of themselves, but can't get it together to pick an IP and stick with it (looking at you, mj12bot). At least with Yandex you can look up the IP. Until they introduce the yandex-app-engine, anyway.
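The cross-check described above ("claims to be the bot but isn't from the right host", and vice versa) boils down to a two-way comparison. A sketch as a pure function -- the hostname passed in is assumed to be a forward-confirmed rDNS name already:

```python
def fake_bot(ua: str, rdns_host: str) -> bool:
    """Flag mismatches in either direction:
    - UA claims YandexBot but the rDNS host isn't a yandex.* name
    - rDNS host is a yandex.* name but the UA doesn't claim YandexBot
    (rdns_host is assumed forward-confirmed already)."""
    claims = "yandexbot" in ua.lower()
    is_yandex = rdns_host.lower().endswith(
        (".yandex.ru", ".yandex.net", ".yandex.com"))
    return claims != is_yandex

# Fake: claims YandexBot, resolves to a random colo host
print(fake_bot("Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
               "host.colo.example"))              # True
# Real: claim and hostname agree
print(fake_bot("Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
               "spider-100-43-85-7.yandex.com"))  # False
```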
7:40 am on July 23, 2014 (gmt 0)

keyplyr (Moderator This Forum, from US)
joined: Sept 26, 2001 | posts: 7029 | votes: 181



I get an increasing amount of traffic from Yandex. IMO they're one of the good guys. The only thing that bugs me is that they've always used way too many IPs to set simple fraud filters effectively.
7:56 pm on July 23, 2014 (gmt 0)

dstiles (Senior Member from GB)


Lucy - 141.8.192.0/18

Separate blocks, yes, but I lumped them into one for practical purposes.

Incredibill - what? Was that for me or something else? I checked the short range that I quoted, and it was approximately that sub-range, give or take the odd IP.
1:01 pm on July 24, 2014 (gmt 0)

incrediBILL (Administrator from US)


In fairness, Yandex's own docs stress that you should look at UA rather than IP.


Which, as we know, is 100% garbage, and in all fairness, that makes Yandex stupid if their docs really say that.

Anyone can fake their UA; nobody can fake their IP.
6:17 pm on July 24, 2014 (gmt 0)

lucy24 (Senior Member from US)


Two prongs. For flat-out denial, use the IP-and-UA cross-check. Once you've locked out the ones that don't belong, you can safely leave the rest to robots.txt.

:: detour to check ::

Version 1:
[help.yandex.com...]
All Yandex robots have names ending in “yandex.ru”, “yandex.net” or “yandex.com”. If the host's name has a different ending, the robot does not belong to Yandex.


Version 2:
[help.yandex.com...]
There are many IP addresses that Yandex robots can originate from, and these IP addresses are subject to change. We are therefore unable to offer a list of IP addresses and we do not recommend using a filter based on IP addresses.
1:32 pm on July 25, 2014 (gmt 0)

incrediBILL (Administrator from US)


you can safely leave the rest to robots.txt.


Robots.txt is hardly 'safe', even with Google. I might post a thread about that in the robots.txt forum and start a small riot later, but I digress.

It's only 'safe' if you're running a robots.txt validation script that checks all inbound requests against your robots.txt file; otherwise robots.txt overall is a joke, even with the "valid" search engines.

I took a PHP robots.txt script used by scraper routines to process robots.txt and reversed it so all inbound requests to a site run through the same code.

Simply point the code at your robots.txt file and feed it the inbound user agent and requested page and it spits out whether it's allowed or denied, and suddenly robots.txt has actual teeth and can respond with a 403 forbidden.

Cute, eh?
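The same reversal can be sketched with Python's stdlib urllib.robotparser instead of PHP: parse your own robots.txt, then run each inbound (user agent, path) pair through it and 403 whatever a compliant bot would have skipped. The robots.txt contents and bot names here are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; in practice, read your real file from disk
robots_txt = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

def status_for(user_agent: str, path: str) -> int:
    """403 any request that robots.txt says this UA shouldn't be making."""
    return 200 if rp.can_fetch(user_agent, path) else 403

print(status_for("BadBot", "/"))            # 403
print(status_for("SomeBot", "/private/x"))  # 403
print(status_for("SomeBot", "/index.html")) # 200
```

That's the "teeth": a bot that ignores a Disallow gets a hard 403 on the very request it wasn't supposed to make, instead of relying on its good manners.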

Overall, the real Yandex seems to behave properly, with the exception some have mentioned: if it's 100% blocked, it throws a tantrum, banging on robots.txt all day long waiting for you to change your mind.

They aren't the only spider that throws a robots.txt-denial tantrum, either -- one of many.
 
