| 9:20 pm on Nov 21, 2012 (gmt 0)|
I have a note on the range 220.127.116.11 - 18.104.22.168 (which is blocked): "may include proxies".
| 10:43 pm on Nov 21, 2012 (gmt 0)|
I kinda think they're a legitimate search engine-- for a given definition of "legitimate" at least. But I finally got tired and blocked them just the same. Same images over and over again. Tiny little ones of no use to anybody. Yawn. Maybe it's a common search term that brings up the same set of hotlinks in a package each time.
| 3:31 am on Nov 22, 2012 (gmt 0)|
Quote: "I kinda think they're a legitimate search engine"
Absolutely. Mail.ru is the largest portal/SE in Russia. In Eastern Europe, Mail.ru is similar to Yahoo, whereas Yandex is similar to Google.
That's not to say they do not engage in "iffy" behaviors (by our standards anyway) but they are a legit organization and an important player.
| 6:54 am on Nov 22, 2012 (gmt 0)|
That's reassuring, since I had nothing to go by except gut feeling. Well, and their site looks exactly like Yahoo or any of those other ISP mail sites.
Maybe if I give them a month or so they'll lose their morbid appetite for that particular fistful of pictures and go for something else. Yandex tends to bring up pictures of rats. (To the point where I can recognize the word in Cyrillic ;))
| 10:23 am on Nov 22, 2012 (gmt 0)|
Having said that, my records show I put a block in place to stop them from scraping image files over a year ago :)
22.214.171.124 - 126.96.36.199
I do let them crawl, just not retrieve image files.
I let many SEs take my images for their image search *if* they create a thumbnail that links to my image. When a user follows that link and connects to my server, I have a script that pulls the user to the parent page, my web page = traffic!
A few of the 2nd & 3rd level SEs just steal my images without linking to my site, so I block those since I don't gain anything from them.
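A "crawl yes, images no" rule like the one described above can be sketched roughly as follows. This is only an illustration, not the poster's actual setup: the network `192.0.2.0/24` is a documentation placeholder, and the list of image extensions and the function name `should_block` are assumptions.

```python
import ipaddress
import re

# Placeholder network (documentation range), NOT a real mail.ru range.
RESTRICTED_NETS = [ipaddress.ip_network("192.0.2.0/24")]

# Assumed list of extensions that count as "image files".
IMAGE_EXT = re.compile(r"\.(?:jpe?g|png|gif|bmp|ico)$", re.IGNORECASE)

def should_block(client_ip: str, path: str) -> bool:
    """Block only when a restricted network asks for an image file."""
    addr = ipaddress.ip_address(client_ip)
    from_restricted = any(addr in net for net in RESTRICTED_NETS)
    path_only = path.split("?", 1)[0]  # ignore any query string
    wants_image = bool(IMAGE_EXT.search(path_only))
    return from_restricted and wants_image
```

Pages requested from the restricted range still get through, so the crawler can index text while the image requests are refused.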
| 9:29 pm on Nov 22, 2012 (gmt 0)|
They used one of those tripartite systems with me. Seems to be popular with ex-Soviet robots in general; in the mail.ru version you get five sets of three requests (each set for a different image, but always the same UA-and-referer pattern):
188.8.131.52 - - [19/Nov/2012:05:53:31 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"
184.108.40.206 - - [19/Nov/2012:05:53:32 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"
220.127.116.11 - - [19/Nov/2012:05:53:34 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "http://go.mail.ru/search_images" "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/12.0"
where the requested images are barely bigger (2-3K) than the 403. Many are almost literally thumbnail-sized.
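To spot this two-bot-requests-then-one-referred-request pattern in a log, something like the sketch below might help. It assumes the combined log format shown in the lines above; the client IPs in `SAMPLE` are documentation placeholders, and the names `LOG_RE` and `summarize` are made up for the example.

```python
import re
from collections import defaultdict

# Combined log format, with the full URL in the request line as in the thread.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Placeholder IPs; request, referer and UA strings are copied from the post.
SAMPLE = [
    '192.0.2.10 - - [19/Nov/2012:05:53:31 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"',
    '192.0.2.11 - - [19/Nov/2012:05:53:32 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"',
    '192.0.2.12 - - [19/Nov/2012:05:53:34 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "http://go.mail.ru/search_images" "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/12.0"',
]

def summarize(lines):
    """Count, per URL, requests with the bot UA and requests referred by go.mail.ru."""
    summary = defaultdict(lambda: {"bot": 0, "referred": 0})
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        if "Mail.RU" in m["agent"]:
            summary[m["url"]]["bot"] += 1
        elif "go.mail.ru" in m["referer"]:
            summary[m["url"]]["referred"] += 1
    return dict(summary)

print(summarize(SAMPLE))
```

A URL that shows two "bot" hits and one "referred" hit within a few seconds matches the set described above.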
Matter of fact, I could exclude them from /games/images/ alone and it would have pretty much the same effect. Why on earth would someone want the "Made with FutureBasic" logo from 1997?
Oddly I've got them down as /20, not /21. But the actual crawling is from a still narrower range. Probably something like ..132.0/22.
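For anyone comparing the /20, /21 and /22 sizes, Python's standard `ipaddress` module shows how they nest. The `10.0.` prefix below is a placeholder, since only the `132.0/22` part is given above; the helper name `widen` is likewise just for the example.

```python
import ipaddress

# Placeholder first two octets; only "132.0/22" comes from the post.
narrow = ipaddress.ip_network("10.0.132.0/22")

def widen(net, prefix):
    """Return the enclosing network with the given (shorter or equal) prefix."""
    return net if prefix == net.prefixlen else net.supernet(new_prefix=prefix)

for prefix in (22, 21, 20):
    net = widen(narrow, prefix)
    print(f"/{prefix}: {net[0]} - {net[-1]} ({net.num_addresses} addresses)")

# /22: 10.0.132.0 - 10.0.135.255 (1024 addresses)
# /21: 10.0.128.0 - 10.0.135.255 (2048 addresses)
# /20: 10.0.128.0 - 10.0.143.255 (4096 addresses)
```

So a /20 block covers four times as many addresses as the /22 that the crawling appears to come from.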
| 9:48 pm on Nov 22, 2012 (gmt 0)|
I have just "allowed" the mail.ru bot to see what happens (can't be worse than G, can it?).
As far as I can tell there is only one IP range for mail.ru (if anyone has others I'd be interested)...
18.104.22.168 - 22.214.171.124
Bots, according to a DNS scan and grep for "spider" and "fetcher"...
126.96.36.199 - 188.8.131.52
184.108.40.206 - 220.127.116.11
18.104.22.168 - 22.214.171.124
126.96.36.199 - 188.8.131.52
184.108.40.206 - 220.127.116.11
18.104.22.168 - 22.214.171.124
126.96.36.199 - 188.8.131.52
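A DNS scan plus grep of that kind could be approximated as below. This is a rough sketch, not the method actually used for the list above: it does a reverse (PTR) lookup per address with the standard `socket.gethostbyaddr` and keeps hostnames containing "spider" or "fetcher". The function names and the injectable `resolver` parameter are assumptions made so the logic can be tried without touching the network.

```python
import socket

KEYWORDS = ("spider", "fetcher")  # the two grep terms mentioned above

def reverse_name(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return resolver(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None

def find_bots(ips, resolver=socket.gethostbyaddr):
    """Map each IP whose reverse hostname contains a keyword to that hostname."""
    hits = {}
    for ip in ips:
        name = reverse_name(ip, resolver)
        if name and any(word in name.lower() for word in KEYWORDS):
            hits[ip] = name
    return hits
```

Scanning a whole range this way is slow (one lookup per address), and a PTR record alone does not prove ownership; a forward lookup of the returned hostname is the usual confirmation step.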
Bot UA is...
Mozilla/5.0 (compatible; Mail.RU_Bot/2.0)