
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Is this a legitimate User Agent?
Mozilla/4.0
grandma genie



 
Msg#: 4231116 posted 6:12 pm on Nov 16, 2010 (gmt 0)

I find this in the logs almost every day, just with different IPs. The User Agent string changes back and forth. What is this doing, and is it legitimate (a real human being not up to mischief)?

204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /image1.jpg HTTP/1.1" 200 6228 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory1/image1.jpg HTTP/1.1" 200 5918 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /returnhome.jpg HTTP/1.1" 200 9375 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory1/image2.JPG HTTP/1.1" 200 13705 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory2/image1.jpg HTTP/1.1" 200 2949 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory1/image3.jpg HTTP/1.1" 200 15908 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory1/image4.jpg HTTP/1.1" 200 25962 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory1/image5.jpg HTTP/1.1" 200 22148 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory1/image6.jpg HTTP/1.1" 200 19592 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory1/image7.jpg HTTP/1.1" 200 36319 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /directory1/image8.gif HTTP/1.1" 200 37083 "-" "Mozilla/4.0 (compatible;)"
204.116.178.n - - [16/Nov/2010:09:54:35 -0500] "GET /favicon.ico HTTP/1.1" 200 19342 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.1)"

See how the user agent went from "Mozilla/4.0 (compatible;)" to "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.1)"
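A minimal Python sketch of how this pattern could be spotted automatically, assuming the log is in the Apache "combined" format shown above. The function names (`agents_by_ip`, `mixed_agent_ips`) are made up for illustration. It groups log lines by IP and reports any IP that sent more than one User-Agent string.

```python
import re
from collections import defaultdict

# Apache "combined" log format, matching the lines quoted above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def agents_by_ip(lines):
    """Map each client IP to the set of User-Agent strings it sent."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # silently skip lines that are not in combined format
            seen[match.group("ip")].add(match.group("agent"))
    return dict(seen)


def mixed_agent_ips(lines):
    """Return only the IPs that used more than one User-Agent string."""
    return {ip: uas for ip, uas in agents_by_ip(lines).items() if len(uas) > 1}
```

Typical use would be `mixed_agent_ips(open("access.log"))`, where `access.log` is a placeholder file name. A changing User-Agent from one IP is only a hint, not proof of a bot; proxies and shared connections can produce it too.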

I see these types of things all the time in my logs. I've been ignoring them unless the IP is from someplace like China. But this visitor seems to be from South Carolina.

- Jeannie

 

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4231116 posted 6:52 pm on Nov 16, 2010 (gmt 0)

Jeannie,
There are many, many old threads on this.

It's simply the footprint of a bot or browser that is caching your images. In this instance, it's a local internet provider's server.

Generally speaking, caching is beneficial to your sites. Unless, of course, you do NOT want your pages and images cached.

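If you find an IP that you feel is abusing these requests, simply create a multi-conditional deny based upon both the UA and the IP (or multiple IPs).
Example (the IP prefixes are placeholders):

# Deny only when the UA is exactly "Mozilla/4.0 (compatible;)"...
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\)$
# ...and the request comes from either of these IP prefixes.
RewriteCond %{REMOTE_ADDR} ^123\.456\. [OR]
RewriteCond %{REMOTE_ADDR} ^456\.789\.
RewriteRule .* - [F]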
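
Before putting a deny like this live, it can help to dry-run the same logic against UA/IP pairs taken from your logs. The following is a rough Python sketch of that idea, not a reimplementation of mod_rewrite: `would_deny` is a hypothetical helper, Python's `re` only approximates Apache's regex engine, and the IP prefixes are documentation placeholders.

```python
import re


def would_deny(ip, user_agent, ua_pattern, ip_prefixes):
    """Mimic the rule's logic: UA matches AND the IP starts with any listed prefix."""
    ua_matches = re.search(ua_pattern, user_agent) is not None
    ip_matches = any(ip.startswith(prefix) for prefix in ip_prefixes)
    return ua_matches and ip_matches


# Assumed pattern for the short UA seen in the logs; adjust to your own case.
UA_PATTERN = r"^Mozilla/4\.0 \(compatible;\)$"
# Placeholder prefixes (RFC 5737 documentation ranges), not real offenders.
IP_PREFIXES = ["192.0.2.", "198.51.100."]

if __name__ == "__main__":
    print(would_deny("192.0.2.10", "Mozilla/4.0 (compatible;)", UA_PATTERN, IP_PREFIXES))
```

Because the pattern is anchored at both ends, the longer "MSIE 7.0" User-Agent from the same visitor would not be denied by it, and neither would the short UA coming from any other IP.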

grandma genie



 
Msg#: 4231116 posted 9:14 pm on Nov 17, 2010 (gmt 0)

Hi Wilderness,
Please pardon my ignorance. This forum is really for the more knowledgeable webmasters and I try to glean as much as I can. I try to scan my raw server logs every day. I do not see anyone taking my content and copying it, so I don't understand why so many visitors seem to be just grabbing so much of my site. They don't buy anything and they don't act like people. I am assuming that this visitor was actually a bot. Since I am on a hosted server, I can only view the raw server logs and some visitor data in the admin section. I do not like unknown bots indexing my site. Thank you for the suggestion for denying such requests in htaccess. That will be very helpful. Hope I'm not being too much of a pest.
- Jeannie

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4231116 posted 8:34 am on Nov 18, 2010 (gmt 0)

Grandma_genie, your posts show you still puzzle a lot over what you see in your logs, stuff that's still easily looked up/learned about, often common, and quickly handled if need be.

If you like log-watching, cool. So do I. Just remember: Stuff happens. Little stuff. Big stuff. All.The.Time. For example, even a small, A-OK server grinds out scads of visitor-related error_log lines a day, the majority of which are no big deal.

So when you see stuff in your logs that's more than merely iffy, whether it's related to a Host/IP or a bot/UA, etc., just do what's oft' been explained to you:

1.) Look it up; and
2.) Ignore it; or
3.) Block it.

Period.

Five minutes, tops. Then move on to the good stuff: Making new content -- and sales!

tangor

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4231116 posted 9:29 am on Nov 18, 2010 (gmt 0)

Pfui, you beat me to it. I was only going to say "don't sweat the small stuff". Deal with full site rippers/bots, read up a bit on who does what (and from where), drop the noise by 90% in about 90 minutes, then get on with life (site). You can't stop them all, and you can't protect the site against a determined rip/scrape...

In my other life when I'm not a webmaster I'm a musician. I've played clubs and concerts for 45 years. There's all kinds of rowdies in those places BUT ONLY A FEW DRUNKS. For band and place to work okay you let the BOUNCER (.htaccess/rewrite) deal with the bad actors and ignore the rest. Meanwhile you make beautiful music and the kiddies love it, the management (that's you, too) will get rich...and the burly boys put the trash to the curb.

But there's ALWAYS another rowdy out there... just decide if they are too drunk to party in the house. Most are harmless...a bit loud at times, but harmless. Instruct the burly boys properly and then don't worry about the noise.

grandma genie



 
Msg#: 4231116 posted 5:23 pm on Nov 18, 2010 (gmt 0)

I spend hours reading up on hack attacks, but with all my reading still can't seem to recognize if someone is ripping me off or just surfing. Is there any software that will read the logs for me and tell me if the visitor is a bot or not? I'm a Mac user. Please remember that you guys have been doing this forever and it is easy for you.

I am paranoid about this because last year my site was badly hacked and the developer who helped me fix my site said the hacker deliberately targeted my site; it was not a random bot. It was an amazonaws user whose site was taken down after the attack and this year I was again attacked by an amazonaws user. Thankfully htaccess stopped the attack. And I learned all about htaccess from webmasterworld.

So, despite my slowness in learning, I do appreciate your help. I'll keep reading...

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4231116 posted 9:01 pm on Nov 18, 2010 (gmt 0)

"I spend hours reading up on hack attacks, but with all my reading still can't seem to recognize if someone is ripping me off or just surfing."


Rather than spending hours searching for solutions to problems that you have no way of knowing exist on your website(s), spend the majority of your time interpreting your raw visitor logs and how those requests interact with the structure of your website(s). (Only you are aware of the structure of your website(s).)
Then determine (from your log analysis experience) whether these visitors are viewing and/or grabbing your pages too fast, that is, faster than a normal visitor would both read and navigate the material/pages of your website(s).
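
As a rough illustration of the "too fast" check, here is a Python sketch that finds each IP's busiest stretch of requests. The name `fast_clients` and the default thresholds (more than 10 requests within 1 second) are assumptions for illustration only; what counts as "too fast" depends on how your own pages are built, since a normal browser also fetches a page's images in a burst.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Only the IP and the timestamp are needed from each combined-format line.
LINE_START = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def fast_clients(lines, max_requests=10, window_seconds=1):
    """Return {ip: peak} for IPs whose busiest window exceeds max_requests."""
    stamps_by_ip = defaultdict(list)
    for line in lines:
        match = LINE_START.match(line)
        if not match:
            continue
        ip, stamp = match.groups()
        stamps_by_ip[ip].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))

    window = timedelta(seconds=window_seconds)
    flagged = {}
    for ip, stamps in stamps_by_ip.items():
        stamps.sort()
        start = 0
        peak = 0
        # Sliding window: count requests no more than `window` apart.
        for end, current in enumerate(stamps):
            while current - stamps[start] > window:
                start += 1
            peak = max(peak, end - start + 1)
        if peak > max_requests:
            flagged[ip] = peak
    return flagged
```

The IPs it returns are candidates to look up, not a verdict; the twelve same-second image requests in the first post would be flagged at these defaults even though they came from a caching proxy.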

A key step is denying most "server farms" (and their complete server IP ranges), of which amazonaws is one. There are many more.

You identify these server farms by reviewing your raw visitor logs and chasing down the identity of the IPs (and whether each is a host/website or a normal visitor). Accumulating these references takes time and patience.
It's not likely that you'll locate a copy-and-paste solution to this or other issues (although I seem to recall an old thread on "server farms" [google.com]).
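
Once you have accumulated a list of ranges, checking a logged IP against it can be sketched in Python with the standard `ipaddress` module. Everything in the table below is a placeholder: the names are invented and the CIDR blocks are reserved documentation ranges, not the real ranges of amazonaws or any other host. You would fill in ranges you have verified yourself.

```python
import ipaddress

# Hypothetical reference table: placeholder names, RFC 5737 documentation ranges.
SERVER_FARM_RANGES = {
    "example-host-a": ["192.0.2.0/24"],
    "example-host-b": ["198.51.100.0/24", "203.0.113.0/25"],
}


def server_farm_for(ip, ranges=SERVER_FARM_RANGES):
    """Return the name of the first listed range containing ip, else None."""
    address = ipaddress.ip_address(ip)
    for name, cidrs in ranges.items():
        for cidr in cidrs:
            if address in ipaddress.ip_network(cidr):
                return name
    return None
```

For example, `server_farm_for("192.0.2.17")` falls inside the first placeholder range, while an address outside every listed block yields `None` and is left for manual lookup.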

thetrasher

5+ Year Member



 
Msg#: 4231116 posted 12:15 pm on Nov 19, 2010 (gmt 0)

[webmasterworld.com...]
Mozilla/4.0 (compatible;) is often the stupid bluecoat "proxy". It's used a lot by large companies

[webmasterworld.com...]
this is a "security check"


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved