Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 46-message thread spans 2 pages (page 2 shown).
Yandex.ru now running yandex.com spider
Wanted to let others know they may want to update their black/white list.
JAB Creations

Msg#: 4390153 posted 7:09 am on Nov 23, 2011 (gmt 0)

I noticed a higher-than-average number of rejects and checked my glorious reject log, only to find that a bot running from yandex.com was being denied. I did a little research (and even found Brett posting about yandex.com, though not specifically about the associated bot). It looks legit as far as I can tell, though someone mentioned that the IPs are in a block associated with some undesirables. The main point is that you may want to update your black/white lists accordingly. I wouldn't mind hearing about the associated undesirables here, though.

- John
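For anyone updating a whitelist: a common way to tell a genuine crawler from a bot that merely copies its UA is forward-confirmed reverse DNS. Look up the PTR name for the IP, check that it sits under a Yandex domain, then resolve that name and confirm it points back to the same IP. The Python sketch below assumes the suffixes .yandex.ru, .yandex.net and .yandex.com (the hostnames seen in this thread, such as spider-199-21-99-95.yandex.com and sticker03.yandex.ru, fit that pattern); treat the list as an assumption and check Yandex's own documentation before relying on it. The function name is hypothetical, and the resolvers are parameters so the logic can be tried without network access.

```python
import socket

# Assumed suffixes for genuine Yandex crawler hostnames; verify against
# Yandex's current documentation before using this in a whitelist.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_genuine_yandex(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an IP claiming to be YandexBot."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:                        # covers socket.herror / socket.gaierror
        return False
    if not hostname.lower().rstrip(".").endswith(YANDEX_SUFFIXES):
        return False                       # PTR name is not under a Yandex domain
    try:
        addresses = forward(hostname)[2]   # forward lookup: hostname -> IP list
    except OSError:
        return False
    return ip in addresses                 # name must point back to the same IP
```

In live use, call is_genuine_yandex("199.21.99.95") with the default resolvers; the answer depends on the DNS records at the time, so it is probably worth caching results rather than looking up every request.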

 

lucy24

Msg#: 4390153 posted 9:25 pm on Dec 14, 2011 (gmt 0)

> I'm not getting Russian visitors, EU and US mostly.

Well, they may be immigrants or vacationers sticking with what's familiar ;) I haven't bothered to check where mine come from, but the queries are always in Russian.

keyplyr

Msg#: 4390153 posted 11:25 pm on Dec 14, 2011 (gmt 0)

> > I'm not getting Russian visitors, EU and US mostly.
>
> Well, they may be immigrants or vacationers sticking with what's familiar ;) I haven't bothered to check where mine come from, but the queries are always in Russian.

All mine are wearing furry hats.

incrediBILL

Msg#: 4390153 posted 11:41 pm on Dec 14, 2011 (gmt 0)

I'm now seeing both the .ru and .com bots; too lazy, er, busy to update my whitelist yet ;)

dstiles

Msg#: 4390153 posted 10:25 pm on Dec 15, 2011 (gmt 0)

> All mine are wearing furry hats.

Ah. Northern part of USA and Canada, then. :)

lucy24

Msg#: 4390153 posted 8:35 am on Jan 12, 2012 (gmt 0)

Ooh, I've passed the Minimum Size Threshold. As of January 1, I too have started getting visits from the YandexBot at its US address, 199.21.99.nn. I noticed it while :: insert boilerplate here ::, checked back, and it really did start precisely on January 1. Well, maybe it was already the 2nd in their time zone; I'm on the West Coast.

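They are unequivocally the same robot. In addition to the identical UA and behavior, the US one is drawing 304s from pages it has never visited from that IP. A 304 only comes back in answer to a conditional request, so it must be reusing copies fetched from its other addresses.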

At some point when I wasn't looking, the YandexBot started using what I'd always thought of as the imagebot's IP at 95.108. This has bumped the imagebot over to 178.154.243.83, an address I don't remember seeing before. But I must not have been paying attention; Yandex paid a couple of visits from 178.154 way back in May (thank you, Spotlight) and started using it sporadically in November.

What ever will they think of next? :)

Pfui

Msg#: 4390153 posted 2:25 pm on Jan 31, 2012 (gmt 0)

FWIW: The second of two YandexBot 'sessions' in six hours, from two different hosts, totally ignored a total Disallow in robots.txt:

spider-199-21-99-95.yandex.com
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
19:46:47 /robots.txt [200]

sticker03.yandex.ru [93.158.147.8]
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
01:36:21 /robots.txt [200]
01:36:21 /robots.txt [200]
01:36:21 / [403]
01:36:22 / [403]
01:36:22 / [403]
01:36:22 / [403]
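A session like this can be checked mechanically: parse the robots.txt that was served and ask whether each requested path was allowed for the bot's UA. A minimal sketch using Python's standard urllib.robotparser; the robots.txt body is an assumption standing in for the "total Disallow" described above, the (path, status) pairs are transcribed from the sticker03.yandex.ru session, and the helper name robots_violations is hypothetical.

```python
from urllib import robotparser

# Assumed robots.txt: the "total Disallow" described in the post.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

YANDEX_UA = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

# (path, status) pairs transcribed from the sticker03.yandex.ru session above.
SESSION = [
    ("/robots.txt", 200), ("/robots.txt", 200),
    ("/", 403), ("/", 403), ("/", 403), ("/", 403),
]


def robots_violations(robots_txt, user_agent, requests):
    """Return the (path, status) pairs that robots.txt disallows for user_agent.

    Fetches of /robots.txt itself are skipped, since a bot has to read the
    file to learn the rules.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (path, status)
        for path, status in requests
        if path != "/robots.txt" and not parser.can_fetch(user_agent, path)
    ]


violations = robots_violations(ROBOTS_TXT, YANDEX_UA, SESSION)
print(f"{len(violations)} disallowed request(s):", violations)
```

Under a total Disallow all four requests for / are flagged; the 403 statuses show the server refused them anyway, so the rule-breaking cost nothing but log lines.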

lucy24

Msg#: 4390153 posted 9:18 pm on Jan 31, 2012 (gmt 0)

Bummer. I've found them very well behaved ever since I let them back in a few months ago.

In fact, as long as we're here, I had a "D'oh" moment recently.

-- posts in assorted forums complaining about the ever-increasing clutter in g### SERPs
-- discussion of Yandex

1 + 1 =

Yup. An absolutely clean SERP. Nothing but results, as far as the eye can see. No big suspicious white spaces implying that my Ad Blocker is doing its stuff.

mslina2002

Msg#: 4390153 posted 12:54 pm on Mar 12, 2012 (gmt 0)

Saw this one yesterday.

Didn't bother to look at robots.txt

From Palo Alto, CA:

100.43.83.136
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

dstiles

Msg#: 4390153 posted 9:40 pm on Mar 12, 2012 (gmt 0)

Thanks for the heads-up, mslina.

Doing a reverse lookup across the range 100.43.64.0 - 100.43.95.255 and keeping only hostnames containing the word "spider", the bot range is currently 100.43.83.129 - 100.43.83.161.
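Those numbers translate directly into address arithmetic: 100.43.64.0 - 100.43.95.255 is exactly the CIDR block 100.43.64.0/19, and the spider range can be tested by comparing addresses. A minimal sketch using Python's ipaddress module; the labels and the function name classify are illustrative, and the spider sub-range is only the one reported at the time of posting, so it may well have changed since.

```python
import ipaddress

# Allocation quoted in the post: 100.43.64.0 - 100.43.95.255.
YANDEX_BLOCK = ipaddress.ip_network("100.43.64.0/19")

# Sub-range whose reverse DNS contained "spider" when the post was written.
SPIDER_FIRST = ipaddress.ip_address("100.43.83.129")
SPIDER_LAST = ipaddress.ip_address("100.43.83.161")


def classify(ip_text):
    """Label an address relative to the ranges reported in the thread."""
    ip = ipaddress.ip_address(ip_text)
    if SPIDER_FIRST <= ip <= SPIDER_LAST:
        return "reported spider range"
    if ip in YANDEX_BLOCK:
        return "Yandex block, outside spider range"
    return "outside Yandex block"


# A first-last range is not a single CIDR block; express it as several.
SPIDER_NETS = list(ipaddress.summarize_address_range(SPIDER_FIRST, SPIDER_LAST))

if __name__ == "__main__":
    for net in SPIDER_NETS:
        print(net)
    print(classify("100.43.83.136"))  # the address mslina2002 reported
```

The /19 matches the quoted start and end addresses exactly, while the 33-address spider range needs several smaller blocks, which is worth knowing before pasting it into an allow or deny rule that only accepts CIDR notation.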

erlandc

Msg#: 4390153 posted 4:55 pm on Jun 8, 2012 (gmt 0)

Yandex did this (I'm no expert) today.
199.21.99.91 - - [08/Jun/2012:03:11:22 -0700] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:03:11:22 -0700] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:05:41:46 -0700] "GET / HTTP/1.1" 200 26386 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:07:54:44 -0700] "GET / HTTP/1.1" 200 26386 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

Do my logs tell me that Yandex ignored my robots.txt?

thanks
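Whether a log shows a robots.txt violation depends on the rules in force at the time, but the first step is the same either way: separate the robots.txt fetches from the page requests. A minimal Python sketch, assuming the server writes the Apache "combined" log format that the lines above appear to use; the regular expression and the helper name page_requests are illustrative, not part of any existing tool.

```python
import re

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The four lines quoted in the post.
LOG = """\
199.21.99.91 - - [08/Jun/2012:03:11:22 -0700] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:03:11:22 -0700] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:05:41:46 -0700] "GET / HTTP/1.1" 200 26386 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.91 - - [08/Jun/2012:07:54:44 -0700] "GET / HTTP/1.1" 200 26386 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
"""


def page_requests(log_text, agent_token="YandexBot"):
    """Return (time, path, status) for non-robots.txt requests whose UA contains agent_token."""
    hits = []
    for line in log_text.splitlines():
        m = COMBINED.match(line)
        if m is None or agent_token not in m["agent"]:
            continue  # unparseable line or a different user agent
        if m["path"] != "/robots.txt":
            hits.append((m["time"], m["path"], int(m["status"])))
    return hits


for time, path, status in page_requests(LOG):
    print(time, path, status)
```

Here that leaves the two requests for /; whether they broke the rules depends entirely on what robots.txt contained when they were made.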

g1smd

Msg#: 4390153 posted 5:39 pm on Jun 8, 2012 (gmt 0)

That depends on what is in your robots.txt file.


Yandex always uses the exact same two IPs: one for the regular bot, one for Yandex Images.

For me, one in 178.154.243.nnn and one in 77.88.30.nnn

Yes, quite consistent for YandexBot/3.0.

erlandc

Msg#: 4390153 posted 6:35 pm on Jun 8, 2012 (gmt 0)

Whoops! I forgot I had just rebuilt my site & accidentally deleted my robots.txt file! I've now denied their robot, so I'll wait & see if they obey.

Thanks for the reminder g1smd!

dstiles

Msg#: 4390153 posted 9:21 pm on Jun 8, 2012 (gmt 0)

I allow Yandex (have done for some time), but their bot does seem to have one bug: it ignores some folder exclusions in robots.txt IF one of the site's pages links to a file inside the excluded folder.

At least, that seems to be the case here.

Otherwise it's a good bot; better than some I could mention for SEs of much higher prominence. :(

erlandc

Msg#: 4390153 posted 1:10 am on Jun 9, 2012 (gmt 0)

Thanks for the note dstiles, I'll keep an eye on their behaviour. Don't really like a pounding by some bots. Cheers!

Sapo



 
Msg#: 4390153 posted 2:51 pm on Aug 20, 2012 (gmt 0)

I block them by UA.

Today there was a new one from a known (by me at least) bad network.

184.82.128.0/18 Scranton NOC. I have never seen any legitimate traffic from that nest of evil. ;)
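Blocking by UA plus networks with a bad track record amounts to a simple two-rule filter. A minimal Python sketch; the blocked UA token and the single blacklisted network come from this post, while the function name block_reason and its return strings are hypothetical.

```python
import ipaddress

# Network named in the post (Scranton NOC); extend the list as needed.
BLACKLISTED_NETWORKS = [ipaddress.ip_network("184.82.128.0/18")]

# UA substrings denied outright, compared case-insensitively.
BLOCKED_UA_TOKENS = ("yandex",)


def block_reason(ip_text, user_agent):
    """Return why a request should be denied, or None to let it through."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in BLACKLISTED_NETWORKS):
        return "blacklisted network"
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return "blocked user agent"
    return None
```

Checking the network first means a request from 184.82.128.0/18 that claims to be YandexBot is reported as a network hit, which keeps spoofed UAs from bad neighborhoods distinguishable from the real bot in the reject log.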

keyplyr

Msg#: 4390153 posted 11:58 pm on Aug 20, 2012 (gmt 0)



For the first time I'm seeing triple-digit daily human traffic coming from Yandex SERPs. A few of these users have German IPs, so it's not just Russian users who use their SE.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved