Msg#: 4520951 posted 10:43 pm on Nov 21, 2012 (gmt 0)
I kinda think they're a legitimate search engine-- for a given definition of "legitimate" at least. But I finally got tired and blocked them just the same. Same images over and over again. Tiny little ones of no use to anybody. Yawn. Maybe it's a common search term that brings up the same set of hotlinks in a package each time.
Msg#: 4520951 posted 6:54 am on Nov 22, 2012 (gmt 0)
That's reassuring, since I had nothing to go by except gut feeling. Well, and their site looks exactly like Yahoo or any of those other ISP mail sites.
Maybe if I give them a month or so they'll lose their morbid appetite for that particular fistful of pictures and go for something else. Yandex tends to bring up pictures of rats. (To the point where I can recognize the word in Cyrillic ;))
Msg#: 4520951 posted 10:23 am on Nov 22, 2012 (gmt 0)
Having said that, my records show I put a block in place to stop them from scraping image files over a year ago :)
126.96.36.199 - 188.8.131.52 184.108.40.206/21
I do let them crawl, just not retrieve image files.
I let many SEs take my images for their image search *if* they create a thumbnail that links to my image. When the user then connects to my server, a script pulls them to the parent page, i.e. my web page == traffic!
A few of the 2nd & 3rd level SEs just steal my images without linking to my site, so I block those since I don't gain anything from them.
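The post does not say how the "crawl yes, images no" block is implemented, so the following is only a minimal sketch of the decision logic in Python. The network `10.0.128.0/21`, the extension list, and the function name `status_for` are all placeholders invented for illustration, not the real range or the poster's setup.

```python
import ipaddress
from pathlib import PurePosixPath

# Hypothetical placeholder range; substitute the crawler's real CIDR block.
BLOCKED_IMAGE_NETS = [ipaddress.ip_network("10.0.128.0/21")]

# Assumed set of image extensions; adjust to what the site actually serves.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif"}


def status_for(client_ip: str, path: str) -> int:
    """Return 403 for image requests from a listed range, otherwise 200."""
    addr = ipaddress.ip_address(client_ip)
    is_image = PurePosixPath(path).suffix.lower() in IMAGE_EXTS
    in_range = any(addr in net for net in BLOCKED_IMAGE_NETS)
    return 403 if (is_image and in_range) else 200
```

Pages requested from the listed range still get through, so the crawler can index text; only the image files are refused.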
Msg#: 4520951 posted 9:29 pm on Nov 22, 2012 (gmt 0)
They used one of those tripartite systems with me. Seems to be popular with ex-Soviet robots in general; in the mail.ru version you get five sets of these (each set for a different image, but always the same UA-and-referer pattern):
220.127.116.11 - - [19/Nov/2012:05:53:31 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"
18.104.22.168 - - [19/Nov/2012:05:53:32 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"
22.214.171.124 - - [19/Nov/2012:05:53:34 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "http://go.mail.ru/search_images" "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/12.0"
where the requested images are barely bigger (2-3K) than the 403. Many are almost literally thumbnail-sized.
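For anyone who wants to pick this pattern out of a log automatically, here is a rough Python sketch. It assumes the combined log format shown in the excerpt and assumes the pattern is always two Mail.RU-UA requests with an empty referer followed by one browser-UA request with a go.mail.ru referer, all for the same URL within a few seconds. The IPs in `SAMPLE` are placeholders, and `find_triplets` and the 10-second window are made up for illustration.

```python
import re
from datetime import datetime

# Combined log format, with the absolute request URL seen in the excerpt.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def parse(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def find_triplets(lines, window_seconds=10):
    """Return URLs hit by bot, bot, then browser-with-referer in sequence."""
    recs = [r for r in map(parse, lines) if r]
    hits = []
    for a, b, c in zip(recs, recs[1:], recs[2:]):
        same_url = a["url"] == b["url"] == c["url"]
        bot_pair = all("Mail.RU" in r["ua"] and r["referer"] == "-" for r in (a, b))
        browser = "Mail.RU" not in c["ua"] and "go.mail.ru" in c["referer"]
        close = (c["time"] - a["time"]).total_seconds() <= window_seconds
        if same_url and bot_pair and browser and close:
            hits.append(a["url"])
    return hits


# Placeholder IPs (documentation range); the rest mirrors the excerpt above.
SAMPLE = [
    '192.0.2.1 - - [19/Nov/2012:05:53:31 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"',
    '192.0.2.2 - - [19/Nov/2012:05:53:32 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "-" "Mozilla/5.0 (compatible; Mail.RU/2.0c)"',
    '192.0.2.3 - - [19/Nov/2012:05:53:34 -0800] "GET http://www.example.com/games/images/SultanPic.jpg HTTP/1.1" 403 1442 "http://go.mail.ru/search_images" "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/12.0"',
]
```

Calling `find_triplets(SAMPLE)` flags the SultanPic.jpg URL once. Matching on the sequence rather than on IP is deliberate here, since each request in the set comes from a different address.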
Matter of fact, I could exclude them from /games/images/ alone and it would have pretty much the same effect. Why on earth would someone want the "Made with FutureBasic" logo from 1997?
Oddly I've got them down as /20, not /21. But the actual crawling is from a still narrower range. Probably something like ..132.0/22.
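To see what the difference between /20, /21 and /22 amounts to, here is a small Python sketch using the standard `ipaddress` module. Only the prefix lengths and the `132.0` third/fourth octets come from the thread; the `10.0.` base is a placeholder, so the printed addresses are illustrative only.

```python
import ipaddress

# Hypothetical base address; only the prefix lengths come from the thread.
wide = ipaddress.ip_network("10.0.128.0/20")    # the /20 block
mid = ipaddress.ip_network("10.0.128.0/21")     # the /21 block
narrow = ipaddress.ip_network("10.0.132.0/22")  # the narrower crawl range

# Show the size and the first and last address of each block.
for net in (wide, mid, narrow):
    print(net, net.num_addresses, net[0], net[-1])

# Each narrower block sits inside the wider one (requires Python 3.7+).
print(narrow.subnet_of(mid), mid.subnet_of(wide))
```

Each step down in prefix length doubles the block, so a /20 rule covers four times as many addresses as the /22 the crawler actually seems to use. Blocking the wider range is harmless only if nothing else you care about lives in the extra space.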