Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
new Yandex range?
lucy24

Msg#: 4689812 posted 6:58 pm on Jul 22, 2014 (gmt 0)

Can't find any dates on this, and Forums search comes up cold.

141.8.128.0/18
Yandex (confirmed by free lookup, though I had to do some spot-checking before they'd admit to the full /18)

Just met a 141.8.189.112 asking for robots.txt with UA
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
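For anyone checking their own logs against this range, the containment test is easy to script with Python's stdlib ipaddress module. A quick sketch using the values above:

```python
import ipaddress

# The /18 attributed to Yandex (from the whois lookup above) and the
# IP that requested robots.txt:
net = ipaddress.ip_network("141.8.128.0/18")
ip = ipaddress.ip_address("141.8.189.112")

print(ip in net)          # True: the hit falls inside the range
print(net.num_addresses)  # 16384 addresses in a /18
```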

 

wilderness

Msg#: 4689812 posted 8:24 pm on Jul 22, 2014 (gmt 0)

lucy,
Yandex is robots.txt compliant.

dstiles

Msg#: 4689812 posted 8:38 pm on Jul 22, 2014 (gmt 0)

Thanks. Didn't have that one. Now blocked with hole for bot at 141.8.189.96 - 141.8.189.127.

The range above that is fairly deadly - 141.8.192.0/18
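As a side note, the "hole" above (141.8.189.96 - 141.8.189.127) is 32 aligned addresses, so it collapses to a single CIDR block, which ipaddress can confirm:

```python
import ipaddress

# Express the hole left open for the bot as CIDR notation:
blocks = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("141.8.189.96"),
    ipaddress.ip_address("141.8.189.127")))
print(blocks)  # [IPv4Network('141.8.189.96/27')]
```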

lucy24

Msg#: 4689812 posted 9:48 pm on Jul 22, 2014 (gmt 0)

Yandex is robots.txt compliant.

Oh, I've got no problems with Yandex. At least not in the last few years. This time they didn't even ask for anything but robots.txt-- testing the waters?-- though I suppose eventually they will. I haven't flagged them as Ignore yet, so I'll notice.

The range above that is fairly deadly - 141.8.192.0/18

Does it all belong to a single host/colo type of entity? I've got it listed as about half a dozen different countries, mostly in /21 pieces.

wilderness

Msg#: 4689812 posted 10:07 pm on Jul 22, 2014 (gmt 0)

;)

RewriteCond %{REMOTE_ADDR} ^141\.

incrediBILL

Msg#: 4689812 posted 10:22 pm on Jul 22, 2014 (gmt 0)

The real Yandex spider has the following RDNS:
spider-100-43-85-7.yandex.com

Your Yandex IP range isn't their spider.
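The rDNS check can be automated as forward-confirmed reverse DNS: resolve the PTR record for the IP, check the name ends in a Yandex domain, then resolve that name forward and confirm it maps back to the same IP. A sketch (function names are mine; real code would want caching and DNS timeouts):

```python
import socket

YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")

def has_yandex_name(host):
    """True if the hostname ends in one of Yandex's domains."""
    return host.endswith(YANDEX_SUFFIXES)

def verified_yandex_ip(ip):
    """Forward-confirmed reverse DNS: PTR-resolve the IP, check the
    name is a Yandex domain, then resolve the name forward and make
    sure it maps back to the original IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not has_yandex_name(host):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

Note that the suffix test alone is not enough: a scraper on `fake.yandex.com.example.org` would fail it, but only the forward-confirmation step catches a forged PTR record.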

jmccormac

Msg#: 4689812 posted 10:50 pm on Jul 22, 2014 (gmt 0)

That's Yandex on the lower address lookup: 141.8.128.0 - 141.8.131.255.
Free lookups tend to be quite iffy; if it is a RIPE address, it is best to query RIPE's own site (www.ripe.net) directly. Yandex tends to be quite well behaved, unlike Bing and its muppets who think they know about large website sitemaps.

Regards...jmcc

lucy24

Msg#: 4689812 posted 11:57 pm on Jul 22, 2014 (gmt 0)

Well, since their sole sign of life so far has been a request for robots.txt, any vigorous reaction is probably premature :)

Pfui

Msg#: 4689812 posted 3:01 am on Jul 23, 2014 (gmt 0)

Well, I'm not happy with Yandex.

Aside from their asking x20/day forEVER for the same-old-same-old full Disallow robots.txt that they get, they spawn robots.txt-noncompliant pests like today's twin hitters using a ridiculous UA to hit root x4:

master01h.kp.yandex.net
(a.k.a. 5.45.235.44)
Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
17:12:35 /
19:00:43 /

test-ro-bk01h.kp.yandex.net
(a.k.a. 5.255.215.226)
Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
17:18:22 /
19:13:30 /

Re the above, hailing from Yandex Russia:

5.45.235.0 - 5.45.235.127
5.45.192.0/18

5.255.192.0 - 5.255.255.255
5.255.192.0/18

lucy24

Msg#: 4689812 posted 3:41 am on Jul 23, 2014 (gmt 0)

In fairness, Yandex's own docs stress that you should look at UA rather than IP. So you can make lockouts that look just like your existing rules for fake googlebots ("If it claims to be googlebot and isn't from this IP" and vice versa).

It's a lot harder with robots that are decently behaved in and of themselves, but can't get it together to pick an IP and stick with it (looking at you, mj12bot). At least with Yandex you can look up the IP. Until they introduce the yandex-app-engine, anyway.
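That UA-and-IP cross-check can be sketched in a few lines of Python (names are mine; the hostname is assumed to have already been forward-confirmed via rDNS, as discussed above, with None meaning the lookup failed):

```python
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")

def is_fake_yandexbot(user_agent, verified_host):
    """True when a request claims the YandexBot UA but its
    forward-confirmed rDNS name is not in a Yandex domain --
    the "claims to be X but isn't from X" rule."""
    claims = "YandexBot" in user_agent
    confirmed = (verified_host is not None
                 and verified_host.endswith(YANDEX_SUFFIXES))
    return claims and not confirmed

# The genuine hit from earlier in the thread passes; the same UA
# from an unverifiable host does not.
ua = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
print(is_fake_yandexbot(ua, "spider-100-43-85-7.yandex.com"))  # False
print(is_fake_yandexbot(ua, None))                             # True
```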

keyplyr

Msg#: 4689812 posted 7:40 am on Jul 23, 2014 (gmt 0)


I get an increasing amount of traffic from Yandex. IMO they're one of the good guys. The only thing that bugs me is they've always used way too many IPs to effectively set simple fraud filters.

dstiles

Msg#: 4689812 posted 7:56 pm on Jul 23, 2014 (gmt 0)

Lucy - 141.8.192.0/18

Separate blocks, yes, but I lumped them into one for practical purposes.

Incredibill - what? Was that for me or something else? I checked the short range that I quoted and it was approximately that sub-range, give or take the odd IP.

incrediBILL

Msg#: 4689812 posted 1:01 pm on Jul 24, 2014 (gmt 0)

In fairness, Yandex's own docs stress that you should look at UA rather than IP.


Which as we know is 100% garbage and in all fairness, that makes Yandex stupid if their docs really say that.

Anyone can fake their UA, nobody can fake their IP.

lucy24

Msg#: 4689812 posted 6:17 pm on Jul 24, 2014 (gmt 0)

Two prongs. For flat-out denial, use the IP-and-UA cross-check. Once you've locked out the ones that don't belong, you can safely leave the rest to robots.txt.

:: detour to check ::

Version 1:
[help.yandex.com...]
All Yandex robots have names ending in “yandex.ru”, “yandex.net” or “yandex.com”. If the host's name has a different ending, the robot does not belong to Yandex.


Version 2:
[help.yandex.com...]
There are many IP addresses that Yandex robots can originate from, and these IP addresses are subject to change. We are therefore unable to offer a list of IP addresses and we do not recommend using a filter based on IP addresses.

incrediBILL

Msg#: 4689812 posted 1:32 pm on Jul 25, 2014 (gmt 0)

you can safely leave the rest to robots.txt.


Robots.txt is hardly 'safe', even with Google. I might post a thread about that in the robots.txt forum and start a small riot later, but I digress.

It's only safe if you're using a robots.txt validation script that validates all inbound requests against your robots.txt file; otherwise robots.txt overall is a joke, even with the "valid" search engines.

I took a PHP robots.txt script used by scraper routines to process robots.txt and reversed it so all inbound requests to a site run through the same code.

Simply point the code at your robots.txt file and feed it the inbound user agent and requested page and it spits out whether it's allowed or denied, and suddenly robots.txt has actual teeth and can respond with a 403 forbidden.
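The same idea can be reproduced with Python's stdlib robotparser run in reverse: instead of a crawler checking what it may fetch, the server checks each inbound request against its own robots.txt and answers 403 when the rules say Disallow. A minimal sketch (the rules are illustrative, not incrediBILL's actual script):

```python
from urllib import robotparser

# Illustrative robots.txt content; in practice, load your real file.
ROBOTS_TXT = """\
User-agent: YandexBot
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def status_for(user_agent, path):
    """Return 200 if robots.txt allows the fetch, else 403."""
    return 200 if rp.can_fetch(user_agent, path) else 403

print(status_for("YandexBot", "/private/page.html"))  # 403
print(status_for("YandexBot", "/index.html"))         # 200
print(status_for("BadBot", "/index.html"))            # 403
```

A real deployment would run this check in a request hook (e.g. a WSGI middleware or an Apache handler) so disallowed bots get the 403 instead of the page.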

Cute, eh?

Overall, the real Yandex seems to behave properly, with the exception some have mentioned: if it's 100% blocked, it throws a tantrum, banging on robots.txt all day long waiting for you to change your mind.

They aren't the only spider that throws a robots.txt denial tantrum either; they're one of many.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved