Scraper from Micro Synergy

keyplyr

3:18 am on Mar 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hit once a couple of months ago and came back today. It scrapes hundreds of HTML pages in a few minutes, changing its UA on every request and using my own site as the referrer.

216.23.184.## - - [15/Mar/2008:20:42:34 -0400] "GET /example.html HTTP/1.0" 200 8636 "http://www.mydomain.com" "ANTFresco/x.xx"

216.23.184.## - - [15/Mar/2008:20:42:35 -0400] "GET /example.html HTTP/1.0" 403 918 "http://www.mydomain.com" "ICE Browser/5.05 (Java 1.4.0; Windows 2000 5.0 x86)"

216.23.184.## - - [15/Mar/2008:20:42:35 -0400] "GET /example.html HTTP/1.0" 200 6492 "http://www.mydomain.com" "Lotus-Notes/4.5 ( Windows-NT )"

216.23.184.## - - [15/Mar/2008:20:42:35 -0400] "GET /example.html HTTP/1.0" 200 9588 "http://www.mydomain.com" "Lotus-Notes/4.5 ( Windows-NT )"

216.23.184.## - - [15/Mar/2008:20:42:35 -0400] "GET /example.html HTTP/1.0" 200 9596 "http://www.mydomain.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; IOpener Release 1.1.04)"

incrediBILL

6:45 pm on Mar 18, 2008 (gmt 0)

ANTFresco is supposedly a browser, not a bot, but as with any other user agent string, someone has probably slapped it onto some other tool.

Here's what I dug up:

ANT Fresco Browser

Market-leading HTML presentation platform for IPTV, hospitality and consumer electronics devices.

The UA variations I've seen are:

"Mozilla/3.04 (compatible; NCBrowser/2.35; ANTFresco/2.17; RISC OS-NC 5.13 Laz1UK1309)"
"Mozilla/3.04 (compatible; ANTFresco/2.13; RISC OS 4.02)"
"ANTFresco/x.xx"

Mostly I see it come from 80.225.x.x, which appears to be Tiscali's UK ADSL or dial-up lines. The types of accesses on my server lead me to believe that the UA has been adopted by a bot, but it's hard to say for sure.

keyplyr

12:07 am on Mar 19, 2008 (gmt 0)

Bill, the point is that these UAs are spoofed.

incrediBILL

5:57 am on Mar 19, 2008 (gmt 0)

You could be right, and probably are. I would have to see the log files to make a determination, so I'll take your word for it since you've seen them.

However, not trying to be argumentative, but based on where the IP is located, the UAs may not be spoofed IMO. If that's a DSL or T1 line supplied to an office, then multiple UAs could actually be coming over a single IP via a business firewall/router.

The Lotus-Notes UA is what makes me think this, because it's typically an in-house application running on an NT server, and ANTFresco is a very obscure product as well. The ICE Browser is a very expensive Java development tool, also an enterprise application. Combine that with Lotus-Notes and it's an expensive scraper, if it's a scraper, so at a minimum you aren't dealing with kiddie scripts.

I'm just trying to offer another explanation for how that kind of activity could come from a single IP on a business server. I learned the hard way, back when I was overly aggressive about blocking, that some things later turned out not to be what they seemed.

If it is a scraper, it's probably a data mining operation, based on all the pieces of the puzzle so far.

keyplyr

8:51 pm on Mar 19, 2008 (gmt 0)

While the log snippet is small, there were actually hundreds of hits from the same IP address, and every hit had a different UA. All were HTML requests. These were not different users from an intranet or DSL pool.

This guy is running a scraper tool that randomly changes the UA it presents.

Anyway, just a heads-up. Thanks for the alternative reasoning.
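
For anyone who wants to check their own logs for the same pattern (one IP, many different UAs in a short time), here is a rough Python sketch. It assumes Apache-style combined log lines like the ones quoted earlier in the thread. The function names, the 5-minute window, and the 20-UA threshold are made up for illustration and are not anything the posters actually used, so tune them to your own traffic.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Combined Log Format:
# ip ident user [time] "request" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def find_ua_rotators(lines, window=timedelta(minutes=5), min_agents=20):
    """Return {ip: n} for IPs that sent >= min_agents distinct UAs within `window`.

    `window` and `min_agents` are illustrative defaults, not recommendations.
    """
    hits = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            hits[record["ip"]].append((record["time"], record["ua"]))

    flagged = {}
    for ip, events in hits.items():
        events.sort(key=lambda event: event[0])
        counts = defaultdict(int)  # UA -> occurrences inside the current window
        start = 0
        best = 0
        for when, agent in events:
            counts[agent] += 1
            # Slide the window forward, dropping hits that are too old.
            while when - events[start][0] > window:
                old_agent = events[start][1]
                counts[old_agent] -= 1
                if counts[old_agent] == 0:
                    del counts[old_agent]
                start += 1
            best = max(best, len(counts))
        if best >= min_agents:
            flagged[ip] = best
    return flagged
```

Usage would be something like `find_ua_rotators(open("access.log"))`, where `access.log` is a placeholder path. As Bill pointed out, an office behind one firewall can legitimately show several UAs on a single IP, so a flagged IP is a reason to look at the log more closely, not proof of a scraper.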

Bewenched

4:43 am on Apr 13, 2008 (gmt 0)

AMI Publishing, Inc. ICI-AMIPUBLISH-2 (NET-216-23-184-0-1)
216.23.184.0 - 216.23.184.63

Hmm, the company's web site currently shows "Web Site Temporarily Unavailable".
-----
Offers access, hosting, and designs web sites aimed at target markets.
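
If you want to block or flag the whole allocation rather than single addresses, the range quoted above (216.23.184.0 - 216.23.184.63) can be turned into CIDR notation with Python's standard `ipaddress` module. This is only a sketch: the helper names `range_to_cidrs` and `in_any_network` are made up for the example, and what you do with the resulting block depends on your own server setup.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive first-last IP range into a list of CIDR strings."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(network) for network in networks]


def in_any_network(ip, cidrs):
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


blocks = range_to_cidrs("216.23.184.0", "216.23.184.63")
print(blocks)  # ['216.23.184.0/26']
```

The range is 64 addresses starting on a 64-address boundary, so it collapses to a single /26. `in_any_network("216.23.184.40", blocks)` returns True, while `in_any_network("216.23.184.64", blocks)` returns False.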