
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Mailru-net
stopped by snagging images only.
Bewenched




msg:4520953
 3:11 am on Nov 21, 2012 (gmt 0)

inetnum: 217.69.128.0 - 217.69.135.255
netname: MAILRU-NET

definitely not an email referral

 

dstiles




msg:4521258
 9:20 pm on Nov 21, 2012 (gmt 0)

I have a note on the range 217.69.128.0 - 217.69.143.255 (which is blocked): "may include proxies".

lucy24




msg:4521291
 10:43 pm on Nov 21, 2012 (gmt 0)

I kinda think they're a legitimate search engine-- for a given definition of "legitimate" at least. But I finally got tired and blocked them just the same. Same images over and over again. Tiny little ones of no use to anybody. Yawn. Maybe it's a common search term that brings up the same set of hotlinks in a package each time.

keyplyr




msg:4521398
 3:31 am on Nov 22, 2012 (gmt 0)

"I kinda think they're a legitimate search engine"

Absolutely. Mail.ru is the largest portal/SE in Russia. In Eastern Europe, Mail.ru is similar to Yahoo, whereas Yandex is similar to Google.

That's not to say they do not engage in "iffy" behaviors (by our standards anyway) but they are a legit organization and an important player.

lucy24




msg:4521440
 6:54 am on Nov 22, 2012 (gmt 0)

That's reassuring, since I had nothing to go by except gut feeling. Well, and their site looks exactly like Yahoo or any of those other ISP mail sites.

Maybe if I give them a month or so they'll lose their morbid appetite for that particular fistful of pictures and go for something else. Yandex tends to bring up pictures of rats. (To the point where I can recognize the word in Cyrillic ;))

keyplyr




msg:4521522
 10:23 am on Nov 22, 2012 (gmt 0)

Having said that, my records show I put a block in place to stop them from scraping image files over a year ago :)

217.69.128.0 - 217.69.135.255
217.69.128.0/21

I do let them crawl, just not retrieve image files.

I let many SEs take my images for their image search *if* they create a thumbnail that links to my image. When a user follows that link and connects to my server, a script pulls them to the parent page, i.e. my web page == traffic!

A few of the 2nd & 3rd level SEs just steal my images without linking to my site, so I block those since I don't gain anything from them.
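
The crawl-but-no-images policy described above could be sketched roughly as follows. This is only an illustration, not the poster's actual setup: the function name `should_block` and the extension list are assumptions, and the network is the /21 quoted in the post.

```python
import ipaddress
from pathlib import PurePosixPath

# Range quoted in the post: 217.69.128.0 - 217.69.135.255 is exactly this /21.
MAILRU_NET = ipaddress.ip_network("217.69.128.0/21")

# Hypothetical list of extensions treated as "image files".
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def should_block(client_ip: str, request_path: str) -> bool:
    """Return True only when an image is requested from the blocked range."""
    in_range = ipaddress.ip_address(client_ip) in MAILRU_NET
    is_image = PurePosixPath(request_path).suffix.lower() in IMAGE_EXTENSIONS
    return in_range and is_image


# Pages stay crawlable; only image requests from the range are refused.
print(should_block("217.69.135.91", "/games/images/SultanPic.jpg"))
print(should_block("217.69.135.91", "/games/index.html"))
```

The sketch ignores query strings and any server-specific rewrite rules; a real deployment would normally express the same test in the web server's own configuration.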

lucy24




msg:4521676
 9:29 pm on Nov 22, 2012 (gmt 0)

They used one of those tripartite systems with me. It seems to be popular with ex-Soviet robots in general; in the mail.ru version you get five sets of three requests (each set for a different image, but always the same UA-and-referer pattern):

217.69.135.91 - - [19/Nov/2012:05:53:31 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"
217.69.135.91 - - [19/Nov/2012:05:53:32 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"
217.69.135.91 - - [19/Nov/2012:05:53:34 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "http://go.mail.ru/search_images" "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/12.0"

where the requested images are barely bigger (2-3K) than the 403. Many are almost literally thumbnail-sized.
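
For anyone who wants to spot this three-request pattern in their own logs, here is a minimal sketch. It assumes the Combined Log Format shown in the lines above; `parse_line` and `is_tripartite` are hypothetical helper names, and the checks are only as strict as the one sample quoted here.

```python
import re

# Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The three lines quoted in the post.
SAMPLE_LINES = [
    '217.69.135.91 - - [19/Nov/2012:05:53:31 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"',
    '217.69.135.91 - - [19/Nov/2012:05:53:32 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"',
    '217.69.135.91 - - [19/Nov/2012:05:53:34 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "http://go.mail.ru/search_images" "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/12.0"',
]


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_tripartite(entries):
    """Two bot-UA hits without a referer, then one browser-UA hit with the
    go.mail.ru referer, all from the same IP for the same target."""
    if len(entries) != 3 or any(e is None for e in entries):
        return False
    same_source = len({(e["ip"], e["target"]) for e in entries}) == 1
    first_two_bot = all(
        e["referer"] == "-" and "Mail.RU" in e["agent"] for e in entries[:2]
    )
    third = entries[2]
    last_browser = (
        third["referer"].startswith("http://go.mail.ru/")
        and "Mail.RU" not in third["agent"]
    )
    return same_source and first_two_bot and last_browser


entries = [parse_line(line) for line in SAMPLE_LINES]
print(is_tripartite(entries))
```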

Matter of fact, I could exclude them from /games/images/ alone and it would have pretty much the same effect. Why on earth would someone want the "Made with FutureBasic" logo from 1997?

Oddly I've got them down as /20, not /21. But the actual crawling is from a still narrower range. Probably something like ..132.0/22.
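
The /20 versus /21 question can be checked with Python's standard `ipaddress` module. This is just a quick sketch: `range_to_cidrs` is a hypothetical helper name, and reading "..132.0/22" as 217.69.132.0/22 is an assumption.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Express an inclusive first-last address range as CIDR blocks."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(network) for network in networks]


# The two ranges quoted in this thread.
print(range_to_cidrs("217.69.128.0", "217.69.135.255"))
print(range_to_cidrs("217.69.128.0", "217.69.143.255"))

# Assumed expansion of "..132.0/22".
narrow = ipaddress.ip_network("217.69.132.0/22")
print(narrow[0], "-", narrow[-1])
print(ipaddress.ip_address("217.69.135.91") in narrow)
```

The first range comes out as a single /21 and the second as a single /20, so both notations in the thread are consistent with the ranges they were quoted against; they just describe different ranges.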

dstiles




msg:4521679
 9:48 pm on Nov 22, 2012 (gmt 0)

I have just "allowed" the mail.ru bot to see what happens (can't be worse than G, can it?).

As far as I can tell there is only one IP range for mail.ru (if anyone has others I'd be interested)...

217.69.128.0 - 217.69.143.255

Bots, according to a DNS scan and grep for "spider" and "fetcher"...

217.69.133.67 - 217.69.133.70
217.69.134.53 - 217.69.134.56
217.69.134.79 - 217.69.134.79
217.69.134.113 - 217.69.134.113
217.69.134.165 - 217.69.134.179
217.69.135.91 - 217.69.135.91
217.69.136.29 - 217.69.136.32
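
The ranges listed above are small enough to check directly. A rough sketch, with `BOT_RANGES` copied from the list and `is_listed_bot` as a hypothetical helper name:

```python
import ipaddress

# First-last pairs copied from the list above (inclusive).
BOT_RANGES = [
    ("217.69.133.67", "217.69.133.70"),
    ("217.69.134.53", "217.69.134.56"),
    ("217.69.134.79", "217.69.134.79"),
    ("217.69.134.113", "217.69.134.113"),
    ("217.69.134.165", "217.69.134.179"),
    ("217.69.135.91", "217.69.135.91"),
    ("217.69.136.29", "217.69.136.32"),
]

PARSED_RANGES = [
    (ipaddress.ip_address(first), ipaddress.ip_address(last))
    for first, last in BOT_RANGES
]


def is_listed_bot(ip):
    """Return True if the address falls inside one of the listed ranges."""
    address = ipaddress.ip_address(ip)
    return any(first <= address <= last for first, last in PARSED_RANGES)


# How many addresses the list covers in total.
total = sum(int(last) - int(first) + 1 for first, last in PARSED_RANGES)
print(total)

# Which listed ranges fall outside the narrower /21 mentioned earlier.
slash_21 = ipaddress.ip_network("217.69.128.0/21")
outside = [
    (str(first), str(last))
    for first, last in PARSED_RANGES
    if first not in slash_21 or last not in slash_21
]
print(outside)
```

Note that the last listed range (217.69.136.29 - 217.69.136.32) sits inside the /20 but outside the /21, so a block on the /21 alone would not cover it.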

Bot UA is...

Mozilla/5.0 (compatible; Mail.RU_Bot/2.0)
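
Two bot UA strings appear in this thread: the `Mail.RU_Bot/2.0` form here and the `Mail.RU/2.0c` form in the log lines earlier. A small sketch that matches both is below; the pattern and the name `looks_like_mailru_bot` are assumptions based only on those two samples, and since a UA is trivially spoofed it is best combined with an IP range check.

```python
import re

# Matches "Mail.RU/<version>" and "Mail.RU_Bot/<version>", case-insensitively.
MAILRU_UA = re.compile(r"\bMail\.RU(?:_Bot)?/[\w.]+", re.IGNORECASE)


def looks_like_mailru_bot(user_agent):
    """Return True if the UA string carries a Mail.RU bot token."""
    return MAILRU_UA.search(user_agent) is not None


print(looks_like_mailru_bot("Mozilla/5.0 (compatible; Mail.RU_Bot/2.0)"))
print(looks_like_mailru_bot("Mozilla/5.0 (compatible; Mail.RU/2.0c)"))
print(looks_like_mailru_bot(
    "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/12.0"
))
```

The third request in the tripartite pattern uses a browser UA, so it would slip past a UA-only test like this one.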


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved