Is this a sneaky scraper?

changing UA

LeChuck

7:38 am on Nov 18, 2005 (gmt 0)



Hi, I'm getting hit by a client whose user agent changes between requests, and I'm thinking it's a bot. Agree?


Host: ***.***.***.***

/oldlocation/page
Http Code: 301 Date: Nov 18 - Http Version: HTTP/1.1 Size in Bytes: 5
Referer: valid ref where I know I have a link
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)

/newlocation/page
Http Code: 200 Date: Nov 18 - Http Version: HTTP/1.1 Size in Bytes: #*$!x
Referer: -
Agent: Mozilla/3.01 (compatible;)

/style.css
Http Code: 200 Date: Nov 18 - Http Version: HTTP/1.1 Size in Bytes: #*$!x
Referer: [mysite...]
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)

/favicon.ico
Http Code: 200 Date: Nov 18 - Http Version: HTTP/1.1 Size in Bytes: #*$!
Referer: -
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)

I'm getting this behaviour from many different IPs every day, but each one never fetches more than one page.
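For anyone who wants to check their own logs for the same pattern, here is a rough sketch, assuming the log has already been parsed into one dict per request. The field names ("host", "path", "agent") and the sample addresses are made up for illustration; they are not from my real log.

```python
from collections import defaultdict

NETSCAPE_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) "
               "Gecko/20040804 Netscape/7.2 (ax)")
GENERIC_UA = "Mozilla/3.01 (compatible;)"

def hosts_with_changing_agents(records):
    """Map each host that sent more than one distinct user agent to its agents."""
    agents_by_host = defaultdict(set)
    for rec in records:
        agents_by_host[rec["host"]].add(rec["agent"])
    # Keep only hosts whose requests carried two or more different agents.
    return {host: sorted(agents)
            for host, agents in agents_by_host.items()
            if len(agents) > 1}

# Hypothetical records modelled on the log excerpt above (hosts are placeholders).
records = [
    {"host": "192.0.2.10", "path": "/oldlocation/page", "agent": NETSCAPE_UA},
    {"host": "192.0.2.10", "path": "/newlocation/page", "agent": GENERIC_UA},
    {"host": "192.0.2.10", "path": "/style.css", "agent": NETSCAPE_UA},
    {"host": "192.0.2.10", "path": "/favicon.ico", "agent": NETSCAPE_UA},
    {"host": "192.0.2.20", "path": "/newlocation/page", "agent": NETSCAPE_UA},
]

flagged = hosts_with_changing_agents(records)
for host, agents in flagged.items():
    print(host, agents)
```

This only shows that a host mixed user agents; it can't tell a scraper apart from a proxy or filter sitting in front of a real browser.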

Seems the 301 confuses it.

I have seen the "Mozilla/3.01 (compatible;)" part be either version 3.01 or 4.0. Why would a browser claim to be older than it is?

The UA isn't Netscape most of the time either; it's some random version of IE.

Is this a smart scraper running a scraping botnet, who has discovered he can't trust any links on my page and so just jumps in, grabs one page and heads off?

It hasn't fallen into the bot trap yet.

Paranoia?

Ocean10000

6:20 pm on Nov 18, 2005 (gmt 0)



Usually when I see the following user agent:
Mozilla/3.01 (compatible;)
it's a content filter downloading the page and checking the content for items that would trip the filter and get the page blocked. This usually means your website hasn't been rated by the content filter yet, so it takes an initial look and, based on that, either lets the user keep browsing or blocks the page. It also sends a little notice back to the company that supplies the filter so the site can be rated.

This has been my experience in the corporate world.
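If you want to pick these requests out of a log, a minimal sketch could look like the one below. It assumes the agent string has already been extracted from each log line, and it covers only the two variants mentioned in this thread (3.01 and 4.0); the function name is just an illustration.

```python
import re

# Matches the bare "compatible" agent with nothing after the semicolon,
# in either of the two versions reported in this thread.
GENERIC_AGENT = re.compile(r"^Mozilla/(3\.01|4\.0) \(compatible;\)$")

def looks_like_generic_agent(agent):
    """Return True if the agent is the bare 'Mozilla/x (compatible;)' string."""
    return GENERIC_AGENT.match(agent.strip()) is not None

for ua in ("Mozilla/3.01 (compatible;)",
           "Mozilla/4.0 (compatible;)",
           "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"):
    print(ua, looks_like_generic_agent(ua))
```

A real IE agent string carries more tokens after "compatible;", so it is not matched here. A match is only a hint that a filter may be involved, not proof.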

LeChuck

11:48 pm on Nov 18, 2005 (gmt 0)



Thanks for the info. I guess this is connected to my blocking of the Websense bot (it's way too aggressive).