Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 46 message thread spans 2 pages (page 2 shown).
Yandex.ru now running yandex.com spider
Wanted to let others know they may want to update their black/white list.
JAB Creations




msg:4390155
 7:09 am on Nov 23, 2011 (gmt 0)

I noticed a higher than average number of rejects and investigated my glorious reject log for my site to find that a bot running from yandex.com was being denied. Well, I did a little research (and even found Brett posting about yandex.com, though not specifically the associated bot). It looks legit as far as I can tell, though someone mentioned that the IPs are in a block associated with some undesirables. The main point is that you may or may not want to update your black/white lists accordingly. I would not mind hearing about the associated undesirables here though.

- John

 

lucy24




msg:4397987
 9:25 pm on Dec 14, 2011 (gmt 0)

> I'm not getting Russian visitors, EU and US mostly.

Well, they may be immigrants or vacationers sticking with what's familiar ;) I haven't bothered to check where mine come from, but the queries are always in Russian.

keyplyr




msg:4398025
 11:25 pm on Dec 14, 2011 (gmt 0)

> I'm not getting Russian visitors, EU and US mostly.

> Well, they may be immigrants or vacationers sticking with what's familiar ;) I haven't bothered to check where mine come from, but the queries are always in Russian.

All mine are wearing furry hats.

incrediBILL




msg:4398026
 11:41 pm on Dec 14, 2011 (gmt 0)

I'm now seeing both .ru and .com bots, too lazy, er busy, to update whitelist yet ;)

dstiles




msg:4398431
 10:25 pm on Dec 15, 2011 (gmt 0)

> All mine are wearing furry hats.

Ah. Northern part of USA and Canada, then. :)

lucy24




msg:4406129
 8:35 am on Jan 12, 2012 (gmt 0)

Ooh, I've passed the Minimum Size Threshold. As of January 1, I too have started getting visits from the YandexBot at its US address, 199.21.99.nn. I noticed it while

:: insert boilerplate here ::

checked back and it really did start precisely on January 1. Well, maybe it was already the 2nd in their time zone; I'm on the west coast.

They are unequivocally the same robot. In addition to the identical UA and same behavior, the US one is drawing 304's from pages that it has never visited before from that IP.
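For anyone wondering why the 304s matter: a server only answers 304 when the request is conditional, meaning the client sent a header such as If-Modified-Since (or If-None-Match) describing a copy it already holds. A rough sketch of that server-side decision follows, assuming date-based validation only. The function name `status_for` is made up for illustration and is not how any particular server implements it.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for(last_modified, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified must be a timezone-aware datetime.
    if_modified_since is the raw header value, or None if the header is absent.
    """
    if if_modified_since is None:
        return 200  # unconditional request: the client claims no cached copy
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable header is simply ignored
    if cached_at.tzinfo is None:
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    return 304 if last_modified <= cached_at else 200
```

So a 304 on a first visit from a new IP suggests the request carried a date from an earlier fetch made elsewhere, which fits the idea of one crawler sharing its cache across addresses.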

At some point when I wasn't looking, the YandexBot started using what I'd always thought of as the imagebot's IP at 95.108. This has bumped the imagebot over to 178.154.243.83, an address I don't remember seeing before. But I must not have been paying attention; Yandex paid a couple of visits from 178.154 way back in May (thank you, Spotlight) and started using it sporadically in November.

What ever will they think of next? :)

Pfui




msg:4412671
 2:25 pm on Jan 31, 2012 (gmt 0)

FWIW: The second of two Yandex bot 'sessions' in six hours from two Hosts totally ignored a total Disallow in robots.txt:

spider-199-21-99-95.yandex.com
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
19:46:47 /robots.txt [200]

sticker03.yandex.ru [93.158.147.8]
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
01:36:21 /robots.txt [200]
01:36:21 /robots.txt [200]
01:36:21 / [403]
01:36:22 / [403]
01:36:22 / [403]
01:36:22 / [403]
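Since the UA string is identical on both hosts, the hostname is the only thing that distinguishes a real Yandex visit from an impostor. Here is a minimal sketch of a forward-confirmed reverse DNS check. The suffix list is an assumption based only on the two hostnames in this post, and the `reverse`/`forward` parameters are hypothetical hooks so the logic can be exercised without touching the network.

```python
import socket

# Assumed list: only the two domains seen in the log excerpt above.
YANDEX_SUFFIXES = (".yandex.com", ".yandex.ru")


def is_yandex_host(hostname):
    """Return True if the reverse-DNS name ends in an accepted Yandex domain."""
    return hostname.lower().rstrip(".").endswith(YANDEX_SUFFIXES)


def verify_yandex_ip(ip, reverse=None, forward=None):
    """Reverse-resolve ip, check the domain, then confirm the name maps back to ip."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse(ip)
        if not is_yandex_host(hostname):
            return False
        # The forward lookup guards against someone faking their own PTR record.
        return ip in forward(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

Called as `verify_yandex_ip("93.158.147.8")` it performs live DNS lookups, so results depend on whatever DNS says at the time.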

lucy24




msg:4412848
 9:18 pm on Jan 31, 2012 (gmt 0)

Bummer. I've found them very well behaved ever since I let them back in a few months ago.

In fact, as long as we're here, I had a "D'oh" moment recently.

-- posts in assorted forums complaining about the ever-increasing clutter in g### SERPs
-- discussion of Yandex

1 + 1 =

Yup. An absolutely clean SERP. Nothing but results, as far as the eye can see. No big suspicious white spaces implying that my Ad Blocker is doing its stuff.

mslina2002




msg:4428102
 12:54 pm on Mar 12, 2012 (gmt 0)

Saw this one yesterday.

Didn't bother to look at robots.txt

From Palo Alto, CA:

100.43.83.136
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

dstiles




msg:4428284
 9:40 pm on Mar 12, 2012 (gmt 0)

Thanks for the heads-up, mslina.

For a reverse lookup within the range 100.43.64.0 - 100.43.95.255 for the word "spider" the bot range is currently 100.43.83.129 - 100.43.83.161
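If you keep your lists as CIDR blocks, the ranges above can be converted and tested with the standard `ipaddress` module. This is only a sketch using the figures quoted in this post; the function names are made up, and the bot range is whatever the reverse lookups showed at the time, so treat it as a snapshot.

```python
import ipaddress

# Figures quoted in the post above.
ALLOCATION_FIRST = ipaddress.ip_address("100.43.64.0")
ALLOCATION_LAST = ipaddress.ip_address("100.43.95.255")
BOT_FIRST = ipaddress.ip_address("100.43.83.129")
BOT_LAST = ipaddress.ip_address("100.43.83.161")


def allocation_cidrs():
    """Express the whole allocation as a list of CIDR strings."""
    nets = ipaddress.summarize_address_range(ALLOCATION_FIRST, ALLOCATION_LAST)
    return [str(net) for net in nets]


def in_bot_range(ip):
    """Return True if ip falls inside the observed 'spider' range."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        return False  # the ranges here are IPv4 only
    return BOT_FIRST <= addr <= BOT_LAST
```

The allocation collapses to the single block 100.43.64.0/19, and the 100.43.83.136 address reported earlier falls inside the observed bot range.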

erlandc




msg:4463026
 4:55 pm on Jun 8, 2012 (gmt 0)

Yandex did this (I'm no expert) today.
199.21.99.91 - - [08/Jun/2012:03:11:22 -0700] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:03:11:22 -0700] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:05:41:46 -0700] "GET / HTTP/1.1" 200 26386 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:07:54:44 -0700] "GET / HTTP/1.1" 200 26386 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

Do my logs tell me that Yandex ignored my robots.txt?

thanks

g1smd




msg:4463051
 5:39 pm on Jun 8, 2012 (gmt 0)

That depends on what is in your robots.txt file.
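To make that concrete, here is a small sketch using Python's standard `urllib.robotparser` to ask whether a given path is allowed under a given robots.txt. The helper name `allowed` and the sample rules are made up for illustration. Note that this is Python's reading of the rules, which may not match Yandex's own parser in every edge case.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="YandexBot"):
    """Return True if robots_txt permits agent to fetch path.

    Pass the short bot token ("YandexBot"), not the full Mozilla/5.0 UA
    string: the parser matches on the text before the first slash.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

With a total Disallow (`User-agent: *` followed by `Disallow: /`), a GET of `/` is a violation. With an empty or missing robots.txt, everything is allowed and the same log lines show nothing wrong.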


Yandex always uses the exact same two IPs: one for the regular bot, one for Yandex Images.

For me, one in 178.154.243.nnn and one in 77.88.30.nnn.

Yes, quite consistent for YandexBot/3.0.

erlandc




msg:4463072
 6:35 pm on Jun 8, 2012 (gmt 0)

Whoops! I forgot I had just rebuilt my site and accidentally deleted my robots.txt file! I've now denied their robot, so I'll wait and see if they obey.

Thanks for the reminder g1smd!

dstiles




msg:4463137
 9:21 pm on Jun 8, 2012 (gmt 0)

I allow yandex - have done for some time - but their bot does seem to have one bug: it ignores some folder exclusions in robots.txt IF it finds, within one of the site's pages, a link to a file that lives there.

At least, that seems to be the case here.

Otherwise it's a good bot; better than some I could mention for SEs of much higher prominence. :(
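For anyone wanting to check their own logs for the folder-exclusion behaviour described above, a rough sketch follows. It assumes Apache combined log format (as in the excerpt earlier in this thread). The function name and the `/private/` folder used in the usage note are hypothetical, and simple prefix matching stands in for full robots.txt parsing.

```python
import re

# Apache combined log format: ip, identd, user, [time], "request", status, size,
# "referer", "user agent".
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def hits_in_excluded_folders(log_lines, excluded_prefixes, agent_token="YandexBot"):
    """Return (ip, time, path) for each bot request under an excluded folder."""
    prefixes = tuple(excluded_prefixes)
    hits = []
    for line in log_lines:
        match = LOG_RE.match(line)
        if not match or agent_token not in match.group("agent"):
            continue  # unparseable line, or a different visitor
        if match.group("path").startswith(prefixes):
            hits.append((match.group("ip"), match.group("time"), match.group("path")))
    return hits
```

Calling `hits_in_excluded_folders(open("access.log"), ["/private/"])` would list every YandexBot request under that folder. Whether each one counts as a violation still depends on the robots.txt in force at the time of the request.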

erlandc




msg:4463210
 1:10 am on Jun 9, 2012 (gmt 0)

Thanks for the note dstiles, I'll keep an eye on their behaviour. Don't really like a pounding by some bots. Cheers!

Sapo




msg:4486497
 2:51 pm on Aug 20, 2012 (gmt 0)

I block them by UA.

Today there was a new one from a known (by me at least) bad network.

184.82.128.0/18 Scranton NOC. I have never seen any legitimate traffic from that nest of evil. ;)
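A combined UA-plus-network block like that can be sketched in a few lines. This assumes "them" means YandexBot and uses the 184.82.128.0/18 block named above. The list contents and the function name `should_block` are illustrative only, not anyone's actual rule set.

```python
import ipaddress

# Illustrative lists: the network named in this post, plus the bot's UA token.
BLOCKED_NETWORKS = [ipaddress.ip_network("184.82.128.0/18")]
BLOCKED_UA_TOKENS = ["YandexBot"]


def should_block(ip, user_agent):
    """Return True if the request matches a blocked network or a blocked UA token."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_UA_TOKENS)
```

The network test runs first, so a visitor from that block is refused whatever UA it claims.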

keyplyr




msg:4486621
 11:58 pm on Aug 20, 2012 (gmt 0)



For the first time I'm seeing triple digit daily human traffic coming from Yandex SERPs. A few of these users have German IPs, so it's not just Russian users who use their SE.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved