Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
mozilla
position of "Mozilla" in UA string
lucy24




msg:4595907
 11:44 pm on Jul 23, 2013 (gmt 0)

Another day, another question:

Does the element "Mozilla" ever occur non-initially in legitimate human UAs?

I've found one possibility:
Kik/6.4.0.38 (Android 2.3.6) Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; SCH-R820 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1

I can't speak to its legitimacy but it definitely seems to be human.

Contrariwise I meet a lot of robots that identify themselves as "Mozilla blahblah" in quotes-- i.e. non-initial Mozilla. And a lot of this kind of thing:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.0.3705)
-- that is, one UA nested inside another. (Most often MSIE 6, which is handy because those are automatically redirected in any case.)

Setting aside the image-search possibility, is it safe to slap down a global block on
.Mozilla
?

 

wilderness




msg:4595929
 1:38 am on Jul 24, 2013 (gmt 0)

lucy,
There used to be some useful links within each forum's library. The Apache forum offered a very useful one on regex. They've apparently all been removed and replaced with standard lines in all libraries :(

The only exceptions in my logs today were:

ia_archiver & msnbot-media

Everything else begins with "Mozilla".

BTW I have a leftover SetEnvIf, from I'm not sure when, matching "^moz" (note the lowercase).

lucy24




msg:4595940
 2:36 am on Jul 24, 2013 (gmt 0)

I may not have worded the question right. As a Regular Expression,
.Mozilla
means "the literal string 'Mozilla' preceded by any text at all" -- as opposed to
^Mozilla
which means it's the first element in the string.

So I'm not looking for
!^Mozilla
things that don't begin "Mozilla" (a group that contains most robots, all of Opera, and an untold number of telephones).

I'm looking for
.Mozilla
things that say "Mozilla" after they have said something else-- including but not limited to UAs that say "Mozilla" twice, so at least one occurrence has to be non-initial.
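To make the distinction concrete, here is a minimal sketch in Python rather than in Apache config. It assumes an unanchored search, which is how the patterns above are meant to be read. The "kik" and "nested" strings are the two UAs quoted earlier in this thread; the "quoted", "plain" and "opera" strings are illustrative examples made up for the comparison, and `classify` is a hypothetical helper name.

```python
import re

# Unanchored searches, mirroring how the patterns are described in the post.
NON_INITIAL = re.compile(r".Mozilla")  # "Mozilla" preceded by at least one character
INITIAL = re.compile(r"^Mozilla")      # "Mozilla" as the very first element


def classify(ua: str) -> dict:
    """Report whether a UA starts with Mozilla and whether it has a non-initial Mozilla."""
    return {
        "starts_with_mozilla": bool(INITIAL.search(ua)),
        "non_initial_mozilla": bool(NON_INITIAL.search(ua)),
    }


SAMPLES = {
    # Quoted from the thread.
    "kik": "Kik/6.4.0.38 (Android 2.3.6) Mozilla/5.0 (Linux; U; Android 2.3.6; "
           "en-us; SCH-R820 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) "
           "Version/4.0 Mobile Safari/533.1",
    "nested": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 "
              "(compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.0.3705)",
    # Illustrative only (assumptions, not taken from anyone's logs).
    "quoted": '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    "plain": "Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0",
    "opera": "Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.16",
}

if __name__ == "__main__":
    for name, ua in SAMPLES.items():
        print(name, classify(ua))
```

Note that the Kik UA, the nested MSIE 6 UA and the quoted "Mozilla blahblah" form would all be caught by .Mozilla, while a UA that never says Mozilla at all (the Opera case) is caught by neither pattern.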

dstiles




msg:4596175
 6:50 pm on Jul 24, 2013 (gmt 0)

Mobile UAs are variable, and although some of the later ones obey the rules, many do not. For example, many begin with Opera or Samsung or similar. If you know they are mobiles it's probably safe to let them in.

NOTE: Checking other headers is not a good way of validating mobile devices. It depends on what they are and how they are connecting. In particular mobiles via proxies (even legit proxies) can come in with some really odd header combinations. I've had to relax header-checking quite a bit for known mobile UAs, especially those using proxies.

Multiple Mozillas in a single UA can often mean "I've just installed a really amazing tool in my browser and it has no idea how to create a proper user-agent string." I see this a lot with toolbar extensions, including G's. I suspect MS may also screw it up when updating from (eg) IE7 to IE8. As a result I tend to go by other headers, using double Mozillas as a tie-breaker.
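One possible reading of "tie-breaker", sketched in Python. The scoring scheme here is entirely an assumption for illustration: `header_score`, `threshold` and `verdict` are hypothetical names, and nothing in this thread says how the other headers are actually weighed. The only part taken from the post is that a double Mozilla decides the outcome only when the other checks are inconclusive.

```python
def mozilla_count(ua: str) -> int:
    """Count case-sensitive occurrences of the literal string 'Mozilla' in a UA."""
    return ua.count("Mozilla")


def verdict(header_score: int, ua: str, threshold: int = 5) -> str:
    """Decide 'allow' or 'block' for a request.

    header_score is a hypothetical suspicion score derived from other
    request headers; higher means more suspicious. The UA is consulted
    only when that score sits exactly on the threshold.
    """
    if header_score > threshold:
        return "block"
    if header_score < threshold:
        return "allow"
    # Tie: fall back on the double-Mozilla signal.
    return "block" if mozilla_count(ua) > 1 else "allow"


NESTED = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 "
          "(compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.0.3705)")
PLAIN = "Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0"

if __name__ == "__main__":
    print(verdict(5, NESTED))  # other headers inconclusive, double Mozilla decides
    print(verdict(2, NESTED))  # other headers look fine, double Mozilla is ignored
```

Under this arrangement a browser with a badly behaved toolbar is not blocked on the double Mozilla alone, which matches the caution above about extensions that mangle the UA string.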


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved