
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Yandex gets fatter
lucy24




msg:4517888
 2:05 am on Nov 11, 2012 (gmt 0)

37.140.128.0/18
Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)

Like the other yandexbots I've met, they pick an IP and stick with it. Here it was 37.140.141.13.

I've yet to see them on my own site, but a rare visit to the art studio's logs

:: insert boilerplate about finding something while looking for something else (in this case iPad misbehavior, see nearby post) ::

found them all over the place, alternating with the familiar 199.21.64.0/20 and a handful of 95.108.128.0/17.

They didn't go anywhere they weren't supposed to.
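For anyone who wants to test a logged IP against the ranges mentioned in this post, here is a minimal sketch using Python's standard ipaddress module. The list holds only the three ranges quoted above and is not a complete or current list of Yandex ranges; the names YANDEX_RANGES and in_known_range are made up for illustration.

```python
import ipaddress

# Only the ranges quoted in this post; not a complete or current list.
YANDEX_RANGES = [
    ipaddress.ip_network("37.140.128.0/18"),
    ipaddress.ip_network("199.21.64.0/20"),
    ipaddress.ip_network("95.108.128.0/17"),
]


def in_known_range(ip: str) -> bool:
    """Return True if ip falls inside any of the ranges listed above."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in YANDEX_RANGES)


# The exact IP from the post, then a documentation-only address for contrast.
print(in_known_range("37.140.141.13"))
print(in_known_range("203.0.113.7"))
```

A range match only says the address sits in a block seen in this thread. It says nothing about who is behind the request.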

 

dstiles




msg:4518119
 9:54 pm on Nov 11, 2012 (gmt 0)

Thanks for the new range, Lucy. Just ran a DNS check on the full range and the word "spider" occurs only in the range 37.140.141.1 - 37.140.141.36

Of these, 37.140.141.1 - 37.140.141.12 are basic "spider"; the others are "image-spider".

TypicalSurfer




msg:4518121
 10:11 pm on Nov 11, 2012 (gmt 0)

Yandex has a nice index, it kinda reminds me of a search engine. I have seen a few referrals and noticed that dogpile is tossing them into their meta results. They do crawl a lot but I'll cut them slack.

keyplyr




msg:4518133
 10:49 pm on Nov 11, 2012 (gmt 0)

I get a fair amount of traffic from Yandex. Their bot behaves and I have never had a problem.

TypicalSurfer: "Yandex has a nice index, it kinda reminds me of a search engine."

Probably because they are.

lucy24




msg:4540812
 5:04 am on Jan 31, 2013 (gmt 0)

:: bump ::

Here's another one:

37.9.64.0/18
possibly constrained to
37.9.84.0/22 (37.9.84-87)

Only met them twice so far.
exact IP: 37.9.84.253
full UA: Mozilla/5.0 (compatible; YandexFavicons/1.0; +http://yandex.com/bots)

Along with the favicon (first visit only), they picked up the front page both times.

No relation to 37.9.0.0/18 (main activity in the 37.9.50's, I think) which has been mentioned in a few threads hereabouts. But while cross-checking the UA I found this Yandex page [help.yandex.com] (in English) which I don't remember seeing before.
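The range arithmetic above can be double-checked with Python's ipaddress module. This is only a sketch of the CIDR math for the three blocks named in this post (the variable names are arbitrary); it makes no claim about how Yandex actually allocates them.

```python
import ipaddress

outer = ipaddress.ip_network("37.9.64.0/18")   # the full range
inner = ipaddress.ip_network("37.9.84.0/22")   # the possible constraint
other = ipaddress.ip_network("37.9.0.0/18")    # the range from other threads

# Is the /22 entirely inside the /18?
print(inner.subnet_of(outer))

# Do the two /18 blocks share any addresses?
print(outer.overlaps(other))

# First and last address of the /22, i.e. the "37.9.84-87" shorthand.
print(inner[0], inner[-1])

# Is the exact IP from the logs inside the /22?
print(ipaddress.ip_address("37.9.84.253") in inner)
```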

TypicalSurfer: "Yandex has a nice index, it kinda reminds me of a search engine."
keyplyr: "Probably because they are."

I thought he was being satirical ;)

keyplyr




msg:4540835
 8:20 am on Jan 31, 2013 (gmt 0)


"There are many IP addresses that Yandex robots can originate from, and these IP addresses are subject to change. We are therefore unable to offer a list of IP addresses and we do not recommend using a filter based on IP addresses."

This is interesting.
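If IP lists are off the table, the usual alternative is a forward-confirmed reverse DNS check: look up the hostname for the IP, check the domain, then resolve that hostname again and confirm it points back to the same IP. A rough sketch in Python follows. The suffix list is an assumption on my part and should be checked against Yandex's own help page; the function names are made up; and the forward lookup shown here is IPv4 only.

```python
import socket

# Assumption: hostname suffixes treated as Yandex-owned. Verify this list
# against Yandex's own documentation before relying on it.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_yandex_host(hostname: str) -> bool:
    """True if the hostname ends in one of the assumed Yandex suffixes."""
    return hostname.lower().rstrip(".").endswith(YANDEX_SUFFIXES)


def verify_yandex_ip(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for one IPv4 address.

    reverse and forward default to the system resolver; they are
    parameters only so the logic can be exercised without a network.
    """
    try:
        hostname = reverse(ip)[0]          # PTR lookup
        if not is_yandex_host(hostname):
            return False
        return ip in forward(hostname)[2]  # A lookup must lead back to ip
    except OSError:                        # covers socket.herror / gaierror
        return False


# Example call (needs network access, so it is left commented out):
# print(verify_yandex_ip("37.140.141.13"))
```

The suffix test uses a leading dot on purpose, so that something like notyandex.ru or yandex.com.example does not pass.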

lucy24




msg:4540840
 8:34 am on Jan 31, 2013 (gmt 0)

At least so far the IPs do seem to belong to Yandex. It isn't as hopeless as, say, the MJ12bot which can come from absolutely anywhere-- including a good many blocked server farms. And, conversely, you don't see a lot of Yandexbot spoofers. Fake yandsearch* in referers, yes. Fake UAs, not so much.


* I did some quickie experimenting. The fake referer isn't any different from a real one. Except that currently mine all end in &lr=213. I don't know what "lr" is, only that it doesn't correspond to "cd".
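For comparing referers like these, it can help to split the query string into its parameters instead of eyeballing the tail of the URL. A small sketch with Python's urllib.parse follows; the sample referer is hypothetical, shaped only after the description above (a yandsearch URL ending in &lr=213), and referer_params is a made-up helper name.

```python
from urllib.parse import parse_qs, urlsplit


def referer_params(referer: str) -> dict:
    """Return the query parameters of a referer URL as {name: [values]}."""
    return parse_qs(urlsplit(referer).query, keep_blank_values=True)


# Hypothetical referer, not copied from any real log.
sample = "http://yandex.ru/yandsearch?text=example+query&lr=213"
params = referer_params(sample)

print(params.get("lr"))
print(params.get("text"))
```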

bigtoga




msg:4540884
 11:57 am on Jan 31, 2013 (gmt 0)

dstiles: "Just ran a DNS check on the full range and the word 'spider' occurs only in the range 37.140.141.1 - 37.140.141.36"

Could you share how you did that? I've asked that question here before and never really gotten a good response.

dstiles




msg:4541117
 10:31 pm on Jan 31, 2013 (gmt 0)

I use dig on Linux, run from a bash shell script. I run the checks against one of the relevant DNS servers (e.g. Yandex's for Yandex, Microsoft's for bingbot, etc.).

Slow but a) I'm in no hurry for the results and b) I worry that a fast DNS scan would get me blocked.

Not sure how to run this under windows but there is probably an equivalent to dig.
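For readers without dig, the same idea can be sketched in Python, which runs under Windows as well. This is not the script described above: it uses the system resolver rather than querying a chosen DNS server, and the function name, the keyword parameter and the one-second default delay are my own assumptions.

```python
import ipaddress
import socket
import time


def scan_range(first, last, keyword="spider", delay=1.0,
               reverse=socket.gethostbyaddr, sleep=time.sleep):
    """Reverse-resolve every IPv4 address from first to last (inclusive).

    Returns {ip: hostname} for hostnames containing keyword. The pause
    after each lookup keeps the scan deliberately slow. reverse and
    sleep are parameters only so the loop can be tried offline.
    """
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    hits = {}
    for n in range(start, end + 1):
        ip = str(ipaddress.IPv4Address(n))
        try:
            hostname = reverse(ip)[0]
        except OSError:
            hostname = None  # no PTR record, or the lookup failed
        if hostname and keyword in hostname:
            hits[ip] = hostname
        sleep(delay)
    return hits


# Example call (needs network access and takes a while, so commented out):
# for ip, host in scan_range("37.140.141.1", "37.140.141.36").items():
#     print(ip, host)
```

Scanning a whole /18 this way means 16,384 lookups, so at one second each it takes several hours, which fits the "slow but I'm in no hurry" approach.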

Leosghost




msg:4541127
 11:14 pm on Jan 31, 2013 (gmt 0)

dstiles: "Not sure how to run this under windows but there is probably an equivalent to dig."


I was going to suggest "sam spade" ..(was a "multi tools" for win ) ..and then went for a look ..and "sam" is gone :( .."where have all the..."

So ..going to pour myself a "leapfrog"..and make sure there are no kids on my lawn..

dstiles




msg:4541575
 8:30 pm on Feb 1, 2013 (gmt 0)

I used to use sam when I was using windows machines for everyday activities but gave up when various bits of it stopped working. At the time there was still a web version running but I haven't looked for several years. It's possible that NS-Batch might still be available and capable of doing this but I've only used that once (recently) for tracking a list of IPs and can't recall its full facilities.

lucy24




msg:4541597
 9:43 pm on Feb 1, 2013 (gmt 0)

dstiles: "At the time there was still a web version running"

Heh. I didn't know there was anything but a www version. It must have closed up shop eons ago; I even took it off my bookmarks. Currently dot com times out and dot org gives you the "It Works!" header which I think is generated by testing something-or-other but, er, I can't remember what. Cursory whois'ing says that dot org is the one we want :(.

SevenCubed




msg:4541601
 9:57 pm on Feb 1, 2013 (gmt 0)

...the "It Works!" header...

It's the default message that an Apache server homepage displays once it's set up properly.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved