
Search Engine Spider and User Agent Identification Forum

This 46 message thread spans 2 pages.
Yandex.ru now running yandex.com spider
Wanted to let others know they may want to update their black/white list.
JAB Creations




msg:4390155
 7:09 am on Nov 23, 2011 (gmt 0)

I noticed a higher than average number of rejects and investigated my glorious reject log for my site to find that a bot running from yandex.com was being denied. Well, I did a little research (and even found Brett posting about yandex.com, though not specifically the associated bot). It looks legit as far as I can tell, though someone mentioned that the IPs are in a block associated with some undesirables. The main point is that you may or may not want to update your black/white lists accordingly. I would not mind hearing about the associated undesirables here though.

- John

 

Staffa




msg:4390207
 10:23 am on Nov 23, 2011 (gmt 0)

Is the yandex.com bot coming from RU IPs or from elsewhere ?

So far I have only seen their bot coming from RU.

lucy24




msg:4390215
 10:59 am on Nov 23, 2011 (gmt 0)

Should be 77.88.0-63 for the regular yandexbot, 95.108.128-255 for images. I've gone back and forth on them; currently they're behaving themselves and even sending the occasional visitor. Mostly for images, as you might expect.
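
For anyone who wants to test addresses against those two ranges, here is a minimal sketch. It assumes the shorthand above means 77.88.0.0 to 77.88.63.255 and 95.108.128.0 to 95.108.255.255; the function name `classify_yandex_ip` is made up for illustration.

```python
import ipaddress

# Assumed expansion of the shorthand ranges quoted above.
YANDEX_MAIN = ipaddress.ip_network("77.88.0.0/18")       # 77.88.0.0 - 77.88.63.255
YANDEX_IMAGES = ipaddress.ip_network("95.108.128.0/17")  # 95.108.128.0 - 95.108.255.255


def classify_yandex_ip(ip):
    """Return 'main', 'images', or None for an IPv4 address string."""
    addr = ipaddress.ip_address(ip)
    if addr in YANDEX_MAIN:
        return "main"
    if addr in YANDEX_IMAGES:
        return "images"
    return None
```

An address outside both blocks returns None, so the caller decides whether that means "block" or "check further".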

Staffa




msg:4390251
 1:16 pm on Nov 23, 2011 (gmt 0)

Looking through last month's files I noticed the Y###.com in the UA, however, since the IP numbers are RU no cigar for them.

I never took notice of the .com because of the IPs.

dstiles




msg:4390482
 10:00 pm on Nov 23, 2011 (gmt 0)

I have it here from 199.36.240.1 and 199.36.240.5 as of about 6 weeks ago on the latter and early august on the former. Both from the Yandex US range of IPs.

Fine by me - they always seem to be reasonably good except for the occasional robots.txt infringement, which I catch by other means anyway.

I only did a brief check for rDNS on this range so there are probably others - eg there is an image bot further up the 240/24 range (which I block anyway).

JAB Creations




msg:4390488
 10:20 pm on Nov 23, 2011 (gmt 0)

These IPs were rejected until I added them to my whitelist: 77.88.42.27, 188.132.239.189, 199.36.240.1 and 199.36.240.5.

dstiles, try contacting them, they seem very responsive. I reported that after doing a search for my site (not site:, just the terms), the second result was a scraper/reported attack site (with content stolen from sites other than mine) in Firefox. They're looking into it and hopefully it'll be removed from the results.

I have gotten actual visitors from Yandex too and their bots seem to behave themselves just fine as far as I can see.

- John

lucy24




msg:4390502
 11:21 pm on Nov 23, 2011 (gmt 0)

I reported that after doing a search for my site (not site:, just the terms), the second result was a scraper/reported attack site (with content stolen from sites other than mine) in Firefox.

That reminds me of something to watch out for. There's a whole family of Russian robots that have their own spin on the auto-referer: their forged referer says in full

http://yandex.ru/yandsearch?text=example.com

where example.com is your domain name. The "text=" part is the standard Yandex version of "q=" in genuine referers. These are pretty predictable and can generally be blocked by IP.
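
A rough sketch of how that pattern could be spotted in a log or request handler. The function name and the exact matching rules are assumptions; a real visitor could in principle search for the bare domain name, so treat a match as a signal to combine with the IP check rather than proof.

```python
from urllib.parse import urlparse, parse_qs


def looks_like_forged_yandex_referer(referer, own_domain):
    """True if the referer is a yandex.ru search whose 'text' is just our own domain."""
    parsed = urlparse(referer)
    host = (parsed.hostname or "").lower()
    # Accept yandex.ru itself or any subdomain of it, nothing else.
    if host != "yandex.ru" and not host.endswith(".yandex.ru"):
        return False
    if parsed.path != "/yandsearch":
        return False
    terms = parse_qs(parsed.query).get("text", [])
    return any(t.strip().lower() == own_domain.lower() for t in terms)
```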

keyplyr




msg:4390545
 12:49 am on Nov 24, 2011 (gmt 0)

I get an increasing amount of traffic from Yandex. However dealing with their numerous IP crawl ranges can be tedious.

dstiles




msg:4390835
 9:56 pm on Nov 24, 2011 (gmt 0)

JAB, keyplyr - I have 103 bot IP ranges currently in my database, most comprising only three or four actual IPs (eg 77.88.42.27 is actually 77.88.42.25 - 77.88.42.27). This is certainly one of the annoyances of the bot.

Lucy - I've seen several bots and non-bots try the referrer scam. It's not just Russia by a long way.

keyplyr




msg:4390854
 11:11 pm on Nov 24, 2011 (gmt 0)

I have 103 bot IP ranges currently in my database, most comprising only three or four actual IPs

AFAIK many early ranges have been dropped. Yandex uses 8 ranges specifically for crawling, although your experience may be different:

77.88.22.0 - 77.88.22.127
77.88.42.0 - 77.88.42.255
87.250.255.0 - 87.250.255.255
93.158.146.0 - 93.158.146.255
178.154.233.0 - 178.154.233.255
188.132.239.0 - 188.132.239.255
199.36.240.0 - 199.36.243.255
213.180.207.0 - 213.180.207.255

I allow only their UA from these ranges and have not blocked any legit Yandex bots in over a year.
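
The "only their UA from these ranges" rule could be sketched roughly like this. The eight ranges are copied from the list above; the names `CRAWL_RANGES` and `allow_request`, and the choice to match on the substring "yandex" in the UA, are assumptions for illustration.

```python
import ipaddress

# The eight crawl ranges listed above, as (first, last) pairs.
CRAWL_RANGES = [
    ("77.88.22.0", "77.88.22.127"),
    ("77.88.42.0", "77.88.42.255"),
    ("87.250.255.0", "87.250.255.255"),
    ("93.158.146.0", "93.158.146.255"),
    ("178.154.233.0", "178.154.233.255"),
    ("188.132.239.0", "188.132.239.255"),
    ("199.36.240.0", "199.36.243.255"),
    ("213.180.207.0", "213.180.207.255"),
]


def _to_networks(ranges):
    """Convert first/last pairs into a flat list of CIDR networks."""
    nets = []
    for first, last in ranges:
        nets.extend(ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)))
    return nets


CRAWL_NETWORKS = _to_networks(CRAWL_RANGES)


def allow_request(ip, user_agent):
    """Allow a Yandex UA only from the listed ranges; other UAs are not judged here."""
    if "yandex" not in user_agent.lower():
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CRAWL_NETWORKS)
```

A request claiming a Yandex UA from any other address is refused, which is the surgical part; everything else falls through to whatever other rules are in place.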

lucy24




msg:4390871
 2:40 am on Nov 25, 2011 (gmt 0)

Another quirk: With me, Yandex always uses the exact same two IPs: one for the regular bot, one for Yandex Images. If I didn't know better I would think it was a teeny little robot that only owned those two addresses. They also consistently use the same two addresses-- different from mine, but internally always the same-- at my art-studio site. It's like having your own personal yandexbot. Feel like I should invite it in for tea ;)

keyplyr




msg:4390876
 3:59 am on Nov 25, 2011 (gmt 0)

Yandex always uses the exact same two IPs: one for the regular bot, one for Yandex Images.

Pray tell

lucy24




msg:4390878
 4:17 am on Nov 25, 2011 (gmt 0)

Oh, there's nothing special about them. They're each in the appropriate range. But I would love to know if Yandex behaves that way with all sites, or if there's some criterion. It does raise a great mental picture of Robots That Specialize. Since my sister site has a different pair of bots, it obviously isn't something obvious like shared userspace on a server.

dstiles




msg:4391127
 9:27 pm on Nov 25, 2011 (gmt 0)

keyplyr - I'll run a yandex DNS check again and see what I come up with. Most are June 2010 and earlier.

JAB Creations




msg:4391185
 1:52 am on Nov 26, 2011 (gmt 0)

dstiles, yes, that's how a lot of them work and I can only imagine that the same spammers own multiple IP spans and rotate them accordingly based on what they're doing.

- John

keyplyr




msg:4391192
 2:26 am on Nov 26, 2011 (gmt 0)

I can only imagine that the same spammers own multiple IP spans and...

What "spammers?"

dstiles




msg:4391358
 9:59 pm on Nov 26, 2011 (gmt 0)

JAB - have to agree with keyplyr - haven't seen yandex spamming in any reasonable form of the concept.

dstiles




msg:4391809
 9:30 pm on Nov 28, 2011 (gmt 0)

I have now run a DNS check against a yandex DNS server and grep'd for "spider" with the following results. The list is roughly similar to my previous list but with about two dozen old ones no longer included and about the same moved to new ranges. A couple of those I removed I put back a few hours later: one had a DNS of "sticker...", another of "stest..." (can't recall what "..." is off-hand).

Apart from the bot list below I previously discovered a short range used for web site verification (equiv of webmaster tools) at 95.108.234.34 - 95.108.234.38 - one of my clients registered a site with them.

Also in my list as "kill" is a set labelled "piano..." at 95.108.151.5 - 95.108.151.6 which appear to be media bots (yandexmedia): worth noting but in my case accorded Kill status.

There is a bot at 95.108.151.244 named "sticker001..." that seems to crop up with "mirrordetector" in the UA. I have a note against it that it ignores robots.txt and hits my forms as a result, so it's currently blocked (about the only one that does ignore robots.txt in my experience).

Another bot I kill is named "imparser" at 95.108.158.230 - 95.108.158.245 - can't recall what it does.

I have noted a few img-spider IPs. I do not permit image bots on most of my sites but they are about the only ones I inhibit.

Not every DNS entry was in the format "spider - IP". There were a few odd ones given as "spider lang", "spider - nnn fb" and "turbospider"; I have included them in the list below without comment (I can email the expanded list if required).

Odd that the US range is mostly img-spider...

I do not claim the list below is complete: my DNS scan may have missed a few. I ran the scan only on likely IP ranges and there may be a few others skulking around.

77.88.11.88
77.88.22.224
77.88.24.25 - 77.88.24.28
77.88.25.26 - 77.88.25.28
77.88.26.25 - 77.88.26.27
77.88.27.25 - 77.88.27.27
77.88.28.246 - 77.88.28.248
77.88.29.246 - 77.88.29.248
77.88.30.246 - 77.88.30.248
77.88.31.246 - 77.88.31.248
77.88.42.25 - 77.88.42.27
77.88.43.25 - 77.88.43.27
87.250.252.240 - 87.250.252.242
87.250.253.241 - 87.250.253.243
87.250.254.241 - 87.250.254.243
87.250.255.241 - 87.250.255.243
93.158.144.27 - 93.158.144.28
93.158.145.27 - 93.158.145.28
93.158.148.30 - 93.158.148.31
93.158.149.31 - 93.158.149.32
93.158.150.20 - 93.158.150.21
93.158.151.24 - 93.158.151.25
93.158.162.224
93.158.164.25 - 93.158.164.28
93.158.165.26 - 93.158.165.28
93.158.166.25 - 93.158.166.27
93.158.167.25 - 93.158.167.27
93.158.168.246 - 93.158.168.248
93.158.169.246 - 93.158.169.248
93.158.170.246 - 93.158.170.248
93.158.171.246 - 93.158.171.248
93.158.172.25 - 93.158.172.27
93.158.173.25 - 93.158.173.27
93.158.178.10 - 93.158.178.11
93.158.178.251
93.158.180.30 - 93.158.180.31
93.158.181.31 - 93.158.181.32
93.158.182.20 - 93.158.182.21
93.158.183.24 - 93.158.183.25
93.158.186.27 - 93.158.186.28
93.158.187.27 - 93.158.187.28
93.158.188.240 - 93.158.188.242
93.158.189.241 - 93.158.189.243
93.158.190.241 - 93.158.190.243
93.158.191.241 - 93.158.191.243
95.108.128.240 - 95.108.128.242
95.108.154.251 - 95.108.154.252
95.108.155.251 - 95.108.155.252
95.108.156.251
95.108.157.251 - 95.108.157.252
95.108.158.133 - 95.108.158.134
95.108.184.251 - 95.108.184.252
95.108.185.251 - 95.108.185.252
95.108.191.251 - 95.108.191.253
95.108.202.251 - 95.108.202.252
95.108.203.251 - 95.108.203.252
95.108.206.240 - 95.108.206.242
95.108.210.251
95.108.211.251 - 95.108.211.252
95.108.216.251 - 95.108.216.252
95.108.217.251 - 95.108.217.252
95.108.240.250 - 95.108.240.252
95.108.241.250 - 95.108.241.252
95.108.244.251 - 95.108.244.253
95.108.245.251 - 95.108.245.253
95.108.246.252 - 95.108.246.253
95.108.247.251 - 95.108.247.253
95.108.248.29 - 95.108.248.30
95.108.249.29 - 95.108.249.30
178.154.148.250 - 178.154.148.251
178.154.149.250 - 178.154.149.251
178.154.160.29 - 178.154.160.30
178.154.161.29
178.154.162.29
178.154.163.29 - 178.154.163.30
178.154.164.250 - 178.154.164.251
178.154.165.250 - 178.154.165.251
178.154.172.29 - 178.154.172.30
178.154.173.29 - 178.154.173.30
178.154.174.251 - 178.154.174.252
178.154.175.251 - 178.154.175.252
178.154.178.248 - 178.154.178.249
178.154.178.251
178.154.179.248 - 178.154.179.250
178.154.180.250 - 178.154.180.251
178.154.181.250 - 178.154.181.251
178.154.184.250 - 178.154.184.251
178.154.185.250 - 178.154.185.251
178.154.187.4
178.154.202.250 - 178.154.202.251
178.154.203.250 - 178.154.203.251
178.154.204.250 - 178.154.204.251
178.154.205.250 - 178.154.205.251
178.154.206.250 - 178.154.206.251
178.154.207.250 - 178.154.207.251
178.154.209.4
178.154.210.250 - 178.154.210.252
178.154.211.250 - 178.154.211.252
178.154.243.90 - 178.154.243.119
178.154.254.139 - 178.154.254.150
199.36.240.1
199.36.240.5
199.36.240.17 - 199.36.240.29 (img-spider)
213.180.209.10 - 213.180.209.11
213.180.209.251
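
Since most entries above cover only a handful of addresses, they are easier to load as first/last pairs than as CIDR blocks. A minimal sketch for reading lines in the format used above; the function names are hypothetical and the parsing assumes exactly the "first - last (note)" layout shown.

```python
import ipaddress


def parse_range_line(line):
    """Parse 'a.b.c.d' or 'a.b.c.d - a.b.c.e (note)' into (first, last) addresses."""
    text = line.split("(")[0].strip()  # drop a trailing note such as "(img-spider)"
    parts = [p.strip() for p in text.split(" - ")]
    first = ipaddress.ip_address(parts[0])
    last = ipaddress.ip_address(parts[-1])  # same as first for a single-IP line
    return first, last


def count_addresses(lines):
    """Total number of individual IPs covered by the given lines."""
    total = 0
    for line in lines:
        first, last = parse_range_line(line)
        total += int(last) - int(first) + 1
    return total


def ip_in_lines(ip, lines):
    """True if the address falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(first <= addr <= last
               for first, last in map(parse_range_line, lines))
```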

keyplyr




msg:4391825
 10:18 pm on Nov 28, 2011 (gmt 0)

@dstiles As I said above, while those ranges may be tagged "spider" they've almost all been discontinued for YandexBot; not to say they won't be used again in the future. Yandex had a rough start, but for the last year or so it appears to me they've got their act together.

Lucy's comment about only one IP for HTML and one IP for Images got me to review logs for the last 6 months, and I also found only two active IP ranges:

77.88.42.0 - 77.88.42.255 (HTML)
95.108.158.128 - 95.108.158.255 (Images)*

*Note: I didn't have the images IP range included in the 8 ranges listed earlier because I only recently started allowing images to be crawled and never saw these requests.

JAB Creations




msg:4391959
 10:10 am on Nov 29, 2011 (gmt 0)

I wasn't suggesting Yandex was spamming at all, I was referring to spammers who have blocks of four IP addresses.

- John

keyplyr




msg:4391965
 10:44 am on Nov 29, 2011 (gmt 0)

...the IPs are in a block associated with some undesirables

I guess I missed this part; so those were the "spammers" you were referring to :)

I think most of us are pretty specific about blocking. Sometimes ya need to be surgical.

keyplyr




msg:4391966
 11:09 am on Nov 29, 2011 (gmt 0)

And one more:

Mozilla/5.0 (compatible; YandexAntivirus/2.0; +http://yandex.com/bots)

178.154.233.0 - 178.154.233.255
(mentioned earlier)

dstiles




msg:4392258
 11:26 pm on Nov 29, 2011 (gmt 0)

keyplyr - I see a few different IPs to you, I think. For example, 95.108.240.250 for me is a standard bot, not images, and 93.158.147.8 arrived only a few minutes later but to a different domain. I had 93.158.149.31 a while before that (one hit only) followed by 95.108.155.252. All in under 2 hours.

As far as I can tell, at least many of the IP ranges I listed are still in use. I wonder if they are deployed according to geo-location and even to specific IP (I run a block of 16 IPs).

I do not have 178.154.233.0/24 in the bots list - no rDNS entry for spider. If it's only for AV then I'm not interested.

keyplyr




msg:4392275
 12:28 am on Nov 30, 2011 (gmt 0)

Very possible IPs are geo-specific. My point was it is not necessary to allow/block all those dozens of Yandex ranges. I whitelisted 8, now 3. Works for my purposes.



Just a FYI:

In addition to HTML, 77.88.42.0 - 77.88.42.255 is also used for Yandex.Translate

UA: Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0 YB/6.1.0 Yandex.Translate
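
To keep the different Yandex agents apart in a log, something like the sketch below could work. It only matches substrings that appear in UAs quoted in this thread (YandexAntivirus, Yandex.Translate, mirrordetector); the labels and the function name are made up, and other Yandex agents simply fall into a catch-all.

```python
def yandex_ua_kind(user_agent):
    """Label a UA string by the Yandex token it carries, or None if it has none."""
    ua = user_agent.lower()
    if "yandexantivirus" in ua:
        return "antivirus"
    if "yandex.translate" in ua:
        return "translate"
    if "mirrordetector" in ua:
        return "mirrordetector"
    if "yandex" in ua:
        return "other-yandex"  # catch-all, including the regular crawler
    return None
```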

dstiles




msg:4392705
 10:25 pm on Nov 30, 2011 (gmt 0)

Haven't yet come across yandex translate. I'll watch out for it. Thanks.

I think my point was: you may only see 3 now but next month you may see 4? :)

keyplyr




msg:4393431
 1:58 pm on Dec 2, 2011 (gmt 0)

And one more:

Mozilla/5.0 (compatible; YandexAntivirus/2.0; +http://yandex.com/bots)

93.158.146.0 - 93.158.146.255
(mentioned earlier)

dstiles




msg:4393640
 11:33 pm on Dec 2, 2011 (gmt 0)

Neither of your ranges are in my list because their rDNS is not for bots, but I'll look further at those when I have time.

keyplyr




msg:4393648
 12:08 am on Dec 3, 2011 (gmt 0)

I'm getting about 20 to 50 (real AFAIK) Russian visitors per day from Yandex; one even bought some stuff last month ;) I couldn't care less whether the crawl IP resolves to a range tagged as crawl. I've never cared about that; only that the IP is registered to the company.

Block the non-crawl ranges and who loses?

incrediBILL




msg:4393788
 11:21 am on Dec 3, 2011 (gmt 0)

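The real Yandexbot, from my observations, does seem to behave and supports full trip rDNS validation (a reverse lookup on the IP, then a forward lookup on the returned hostname that comes back to the same IP), with hostnames such as "spider-77-88-31-248.yandex.com".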
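
A minimal sketch of that kind of full trip check, using only the standard `socket` module. The function name is hypothetical, and the suffix list is an assumption: only yandex.com appears in the hostname above, so adjust it to what the reverse lookups actually return. The `reverse` and `forward` parameters exist only so the resolvers can be swapped out for testing.

```python
import socket


def verify_full_trip_rdns(ip,
                          allowed_suffixes=(".yandex.com", ".yandex.ru", ".yandex.net"),
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-resolve back to the IP."""
    try:
        hostname = reverse(ip)[0]            # PTR lookup
    except OSError:                          # covers socket.herror / socket.gaierror
        return False
    if not hostname.lower().endswith(tuple(allowed_suffixes)):
        return False
    try:
        forward_ips = forward(hostname)[2]   # A lookup on the returned name
    except OSError:
        return False
    return ip in forward_ips
```

Both lookups hit DNS on every call, so in practice the result would be cached per IP rather than checked on each request.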

I've been letting it crawl for quite some time now and I'm even picking up traffic from Yandex, not a lot, but it seems to be legit and traffic is marginally increasing over time.

I think it's worth a shot, mainly because it's easy to rank well like the old Google ;)

FYI, I'm not getting Russian visitors, EU and US mostly.

dstiles




msg:4397838
 3:57 pm on Dec 14, 2011 (gmt 0)

Hit from new yandex USA range today. Following it up...

Yandex range: 199.21.96.0 - 199.21.99.255
YandexBot range (spider*): 199.21.99.65 - 199.21.99.125
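
For firewall or .htaccess rules that want CIDR notation, the two ranges above can be converted as sketched below; `to_cidr` is a made-up helper name. The full Yandex range collapses to a single /22, while the narrower spider range does not align on a block boundary and comes out as several small blocks.

```python
import ipaddress


def to_cidr(first, last):
    """Return the minimal list of CIDR blocks covering first..last inclusive."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))
    return [str(net) for net in nets]


print(to_cidr("199.21.96.0", "199.21.99.255"))   # ['199.21.96.0/22']
print(to_cidr("199.21.99.65", "199.21.99.125"))  # several blocks, starting with a /32
```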


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved