Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Multiple User Agents, same IP
Is this a bot?
grandma genie




msg:4451058
 9:40 pm on May 8, 2012 (gmt 0)

This visitor has been blocked for previous unacceptable behavior, but it's back again. This is just for your info. Note the many user agents and no referer:

50.58.nnn.nn - - "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; America Online Browser 1.1; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; InfoPath.1; HbTools 4.8.0)"
50.58.nnn.nn - - "GET / HTTP/1.1" 403 - "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8"
50.58.nnn.nn - - "GET / HTTP/1.1" 403 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
50.58.nnn.nn - - "GET / HTTP/1.1" 403 - "-" "Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.3) Gecko/20050924 Epiphany/1.4.4 (Ubuntu)"
50.58.nnn.nn - - "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0 )"
50.58.nnn.nn - - "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; America Online Browser 1.1; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; InfoPath.1; HbTools 4.8.0)"
50.58.nnn.nn - - "GET / HTTP/1.1" 403 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
50.58.nnn.nn - - "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0 )"

The IP is from TW Telecom Holdings, Inc. out of Littleton, CO. Just passing the info along for anyone who has had this IP sniffing around their site.
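
The pattern described above (one address, many user agents) can be checked mechanically. What follows is a minimal Python sketch, assuming log lines in the abbreviated form quoted in this post: IP as the first token, user agent as the last double-quoted field. The function names and the threshold of 3 are illustrative assumptions, not anything the posters used.

```python
import re
from collections import defaultdict

# Assumption: the IP is the first token and the user agent is the last
# double-quoted field on the line, as in the excerpts quoted in this thread.
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')

def user_agents_by_ip(lines):
    """Map each client IP to the set of distinct user agents it sent."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            ip, agent = match.groups()
            seen[ip].add(agent)
    return seen

def rotating_clients(lines, threshold=3):
    """Return the IPs that sent at least `threshold` distinct user agents."""
    by_ip = user_agents_by_ip(lines)
    return sorted(ip for ip, agents in by_ip.items() if len(agents) >= threshold)
```

Several people behind one shared address (an office NAT or proxy) can also produce multiple user agents, so a hit is a reason to look closer rather than proof of a bot.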

 

btherl




msg:4451087
 11:42 pm on May 8, 2012 (gmt 0)

I blocked that one this morning. It hit 1,994 different domains in a 24-hour period and cycled through the same user agents you have there.

keyplyr




msg:4451096
 1:28 am on May 9, 2012 (gmt 0)

It's either a bot or some retriever script that generates a different UA on each request. Many utility scripts do this, so it's difficult to say which one is being used here. Best to block by IP range.
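
As a rough illustration of what blocking by range means in practice, here is a small Python sketch using the standard `ipaddress` module. The network shown is the reserved documentation range 192.0.2.0/24, chosen as a placeholder because the thread masks the real addresses; `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Placeholder range (reserved for documentation); substitute the real
# range only after confirming who actually owns and uses it.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def is_blocked(ip_text, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked network."""
    try:
        address = ipaddress.ip_address(ip_text)
    except ValueError:
        # Not a valid IP literal, e.g. a masked "50.58.nnn.nn" from a post.
        return False
    return any(address in network for network in networks)
```

A range block catches every address in the range, which is why the later posts in this thread worry about shutting out ordinary users of the same provider.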

wilderness




msg:4451102
 1:56 am on May 9, 2012 (gmt 0)

ggenie,
all you need to obscure is Class D (last group).

If you'd provided the Class C, the IP would have been more helpful for reference.
EX:
50.58.123.nn

I'm pretty sure that Littleton, CO is just the headquarters address for Time Warner, rather than the actual GEO-locale.

I've had some TW IPs that have been real pests; however, the TW user base is so vast that it requires careful analysis.

Here's one old thread [webmasterworld.com] with different Class A.

I have a large text file of TW locals from 2005, although they hadn't acquired the Class A 50 at that time, or at least it didn't appear in the results I have.

I had a home page request in 2008:
209.163.169.z
TW is still the backbone, however the sub-net and sub-sub-net orgs have different names today.

Otherwise, I have very few saved references to TW.

lucy24




msg:4451108
 2:19 am on May 9, 2012 (gmt 0)

In any case I'm glad you posted, because it made me realize there's a block of IPs I thought I had accounted for but in fact they were missing. Now duly filled in.

Next step: ask Spotlight for 50.58. and see what turns up. Answer: not much-- but that little is quite recent.

50.58.197.206 - - [03/Apr/2012:04:53:04 -0700] "GET / HTTP/1.1" 301 506 "-" "Java/1.6.0_27"
50.58.197.206 - - [03/Apr/2012:04:53:04 -0700] "GET / HTTP/1.1" 403 1529 "-" "Java/1.6.0_27"

I never even knew they existed, because they were blocked by UA. Same for identical visits on 30 and 31 March.

OK, so is that tw telecom based in Littleton CO, or is it Confluence Networks based in Austin TX? And where's Lone Tree anyway?

Now here's a different and more interesting robot. Note the slash at the end of both the request and the auto-referer.

50.58.99.8 - - [10/Nov/2011:21:35:05 -0800] "GET /fun/AlonzoMelissa.html/ HTTP/1.0" 404 2221 "http://www.example.com/fun/AlonzoMelissa.html/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.14) Gecko/2009082707 Firefox/3.0.14 (.NET CLR 3.5.30729)"

--and (sans slash this time around)--

50.58.99.8 - - [10/Nov/2011:16:51:19 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.0" 200 623197 "http://www.example.com/fun/AlonzoMelissa.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3"

Psst! Fake-UA-generator! I don't think Firefox would ever have a .NET CLR statement at all, let alone in that location ;)

... and that's why I moved the real Alonzo and Melissa, leaving a smaller file with the same name for the robots to feast on.
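
The two oddities noted above, a referer identical to the requested URL and a slash after a file name, can be flagged with a short script. This is a sketch only, assuming Apache Combined Log Format lines like the ones quoted in this post and a site served at `http://www.example.com`; the function names are made up for illustration.

```python
import re

# Combined Log Format:
# ip ident user [time] "method path protocol" status size "referer" "agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def is_auto_referer(line, host="www.example.com"):
    """True if the referer is exactly the URL being requested."""
    match = COMBINED_RE.match(line.strip())
    if not match:
        return False
    own_url = "http://" + host + match.group("path")
    return match.group("referer") == own_url

def slash_after_file(path):
    """True if the path ends in something like '.html/'.

    Heuristic only: a directory whose name contains a dot would also match.
    """
    return re.search(r"\.[A-Za-z0-9]+/$", path) is not None
```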

wilderness




msg:4451122
 2:35 am on May 9, 2012 (gmt 0)

50.58.197.206 - - [03/Apr/2012:04:53:04 -0700] "GET / HTTP/1.1" 301 506 "-" "Java/1.6.0_27"
Austin TX

50.58.99.8 - Columbus, OH

Time Warner is a heavy-duty provider to commercial companies.

grandma genie




msg:4451143
 3:33 am on May 9, 2012 (gmt 0)

This is the one that visited my site: 50.58.197.nn
That nn varied with these numbers: 31, 32, 33, 34, 38, 41, 43, 44, 48, 49, 50, 56, 65, 67, 70, 74. And here is a list of the very entertaining user agents:
Mozilla/4.0 (compatible; MSIE 6.0; America Online Browser 1.1; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; InfoPath.1; HbTools 4.8.0)
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.3) Gecko/20050924 Epiphany/1.4.4 (Ubuntu)
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0 )
Mozilla/4.0 (compatible; MSIE 6.0; America Online Browser 1.1; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; InfoPath.1; HbTools 4.8.0)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0 )
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Acoo Browser; GTB6; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; InfoPath.1; .NET CLR 3.5.30729; .NET CLR 3.0.30618)
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7a) Gecko/20050614 Firefox/0.9.0+
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Acoo Browser; GTB6; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; InfoPath.1; .NET CLR 3.5.30729; .NET CLR 3.0.30618)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/125.4 (KHTML, like Gecko, Safari) OmniWeb/v563.15
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.3) Gecko/20050924 Epiphany/1.4.4 (Ubuntu)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
Mozilla/4.0 (compatible; MSIE 5.15; Mac_PowerPC)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
Mozilla/5.0 (X11; U; Linux; i686; en-US; rv:1.6) Gecko Debian/1.6-7
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7a) Gecko/20050614 Firefox/0.9.0+
Mozilla/5.0 (X11; U; Linux; i686; en-US; rv:1.6) Gecko Epiphany/1.2.5
Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.3) Gecko/20050924 Epiphany/1.4.4 (Ubuntu)
Mozilla/4.0 (compatible; MSIE 7.0; America Online Browser 1.1; rev1.5; Windows NT 5.1; .NET CLR 1.1.4322)
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0 )

And on the same day I got this: notice these all have the same user agent but different IPs:
85.95.187.nnn - - "GET /example HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
190.98.127.n - - "GET /example HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
190.98.127.n - - "GET /example HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
46.252.160.nn - - "GET /example HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
46.252.160.nn - - "GET /example HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"

Never sure if these are different visitors or the same botnet just trying different methods. Kind of reminds me of the Sentinels in The Matrix. They just never give up. Where is Neo when you need him?

grandma genie




msg:4451154
 3:50 am on May 9, 2012 (gmt 0)

By the way, this was in today's logs, too. Is this an indication of a cron job?

69.30.243.nnn - - "HEAD / HTTP/1.1" 404 - "-" "curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6"

Could a cron run a program that would allow for this type of activity?

keyplyr




msg:4451158
 4:12 am on May 9, 2012 (gmt 0)

Just a Linux machine retrieving documents using a libcurl program.

Cron is just a scheduler that runs a command at set times. Nothing in that UA string indicates a cron job, although cron could be employed somewhere in that process.

btherl




msg:4451160
 4:14 am on May 9, 2012 (gmt 0)

That looks like a cron job to me. I have blocked all curl UAs because curl is a command-line transfer tool and library used by scripts, not part of any web browser.

Edit: keyplyr is being a bit more precise there - it's a script, not necessarily a cron job. But in either case it's not a web browser, it's some kind of script which probably doesn't display the page to a real person.
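
To make the "script, not browser" distinction concrete, here is a minimal Python sketch of a user-agent check. The token list is an assumption drawn from the user agents quoted in this thread (curl, libcurl, Java); it is not a complete catalogue, and `is_script_client` is a hypothetical name.

```python
# Illustrative tokens only; extend or trim to suit your own logs.
SCRIPT_TOKENS = ("curl/", "libcurl", "java/")

def is_script_client(user_agent):
    """True if the user agent names a known scripting or HTTP library."""
    lowered = user_agent.lower()
    return any(token in lowered for token in SCRIPT_TOKENS)
```

A check like this only catches clients that announce themselves honestly; a script that sends a browser-style user agent passes straight through.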

lucy24




msg:4451166
 5:41 am on May 9, 2012 (gmt 0)

Safari/125.8

Wow. They've jumped on the Version Update Of The Week bandwagon big time, haven't they :)

:: detour to htaccess ::

Yup, there it is.

BrowserMatch libcurl keep_out

grandma genie




msg:4451412
 5:05 pm on May 9, 2012 (gmt 0)

This one was also sniffing around last month:

150.241.250.n HTTP/1.1" 404 - "-" "curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6"

This IP is from Spain - Laboratorios LABEIN

It always got the 404 not found server error. Then it went away and this one showed up, just once:

69.30.243.nnn - - "HEAD / HTTP/1.1" 404 - "-" "curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6"

That IP resolves to mx4.lintpm.com. They also got the 404 not found.

My developer set up a cron job for me to submit my sitemaps to the search engines, and I noticed in my logs that the cron job had a user agent similar to the ones shown above. But I haven't seen a log entry for my cron job for a while. Does that mean it is no longer working? How can I block those two above if I have a cron job running with a similar user agent? Do I just find a portion of the user agent that is not the same as mine and block that?

I'm not that familiar with crons. If I have one running, should it be showing up in my own server logs? Also, are we getting off topic? Which forum would be appropriate for discussing crons and how they work? And how to fix them if they don't.

dstiles




msg:4451506
 7:53 pm on May 9, 2012 (gmt 0)

I block all TW Telecom ranges. Never seen anything good come from them (this is in the UK).

keyplyr




msg:4451544
 9:47 pm on May 9, 2012 (gmt 0)


Where I'm at, Time Warner is one of the major cable internet ISPs. Blindly blocking TW ranges could result in blocking many regular users.

I do see they have some biz accounts and dedi server products, but without knowing precisely which ranges, I'm not taking a chance.

wilderness




msg:4451561
 11:01 pm on May 9, 2012 (gmt 0)

"Where I'm at, Time Warner is one of the major cable internet ISPs. Blindly blocking TW ranges could result in blocking many regular users."


keyplyr,
Yesterday I opened up some long denied Road Runner ranges for the same reason. I'll just have to watch closely.

lucy24




msg:4451571
 11:29 pm on May 9, 2012 (gmt 0)

"If I have one running, should it be showing up in my own server logs?"

If it doesn't, you can constrain the block to THE_REQUEST. That's a generic answer, not a specific one.
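
In Apache's mod_rewrite, THE_REQUEST is the full request line, for example `HEAD / HTTP/1.1`. Constraining a block to it means refusing a request only when both the user agent and the request line match. The Python sketch below shows that logic under stated assumptions: the `libcurl` token and the `HEAD /` pattern are taken from the log lines quoted earlier, and the idea that a legitimate job would send some other request line is hypothetical.

```python
import re

def should_block(user_agent, request_line,
                 agent_token="libcurl",
                 request_pattern=r"^HEAD / HTTP/"):
    """Block only when the user agent AND the request line both match."""
    agent_matches = agent_token in user_agent
    request_matches = re.search(request_pattern, request_line) is not None
    return agent_matches and request_matches
```

With both conditions required, a job that shares the same user agent but asks for a different URL is left alone.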

dstiles




msg:4451966
 8:49 pm on May 10, 2012 (gmt 0)

keyplyr - either only server farms have been badly behaved (and hence blocked) on my server or the DSL service does not impact my server. I have one client who gets complaints if his US (and other) customers can't access his site, even if they block themselves through stupidity.

I must admit I haven't checked all ranges I've blocked but those I have (the majority) have shown server-like DNS entries. After they became a headache I began blocking ad hoc whenever an IP misbehaved.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved