where's the browser?

         

lucy24

11:13 pm on Jun 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is, to all appearances, a human. But I'm still scratching my head.

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-gb) AppleWebKit/533.21.1 (KHTML, like Gecko) Vienna/2.6.0.2600

OK, so there's an obscure browser called Vienna. Except there isn't. I looked it up* and it's an open-source RSS reader. The page they visited doesn't have any RSS, and they collected all the regular human stuff: images, css, favicon.

Can anyone explicate?


* Also searched here. My goodness, Vienna is a busy place. Places, plural.

Pfui

4:01 am on Jun 24, 2011 (gmt 0)


Lots of apps and add-ons and such append their IDs to 'regular' browser names/strings and are ready to go if/when called upon by the user. RSS readers, toolbars, you name it. I reckon there are hundreds, if not thousands, a la:

AskTb
chromeframe
eSobiSubscriber
FunWebProducts
OfficeLiveConnector
OfficeLivePatch
PeoplePal
SearchToolbar
SuperSearchSearchToolbar
WebWasher
Windows-Media-Player
Zune

Even Word gets in the game with its own name:

Mozilla/5.0 (Macintosh; Intel Mac OS X) Word/12.29.0

Some UAs carry so many extras they look like those station wagons that have back windows chock-a-block with tourist stickers.

Your logs will show you the details but I wouldn't sweat the add-ons unless they're notorious, like WebWasher or similar scrapers, etc. (Those I block by default.)
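Spotting these passengers in a log is just a substring scan. A minimal sketch in Python, using only the tokens listed above (a real-world list would run to hundreds):

```python
# Sketch: flag user-agent strings that carry known add-on tokens.
# The token list is illustrative, drawn from the examples above.
ADDON_TOKENS = [
    "AskTb", "chromeframe", "eSobiSubscriber", "FunWebProducts",
    "OfficeLiveConnector", "OfficeLivePatch", "PeoplePal",
    "SearchToolbar", "WebWasher", "Windows-Media-Player", "Zune",
]

def addons_in(ua):
    """Return the add-on tokens present in a user-agent string."""
    return [t for t in ADDON_TOKENS if t.lower() in ua.lower()]

ua = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; "
      "FunWebProducts; chromeframe/13.0.782.215)")
print(addons_in(ua))  # ['chromeframe', 'FunWebProducts']
```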

lucy24

4:36 am on Jun 24, 2011 (gmt 0)


Some UAs carry so many extras they look like those station wagons that have back windows chock-a-block with tourist stickers.

You mean like the way MSIE always tacks on eighteen different variants of NET CLR? :)

Your logs will show you the details

What I quoted was my logs :( It's the entire UA section-- from open quote to final close quote-- from the raw log file. That's why I'm mystified about what the actual browser is.

Pfui

4:33 am on Jun 25, 2011 (gmt 0)


(By details, I meant additional add-on names and string twistifications.)

It looks like Vienna literally substituted itself for/lopped off the end of the original string, the part after "(KHTML, like Gecko)". I reckon the actual browser was Mac Safari:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-gb) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1
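That substitution theory is easy to check mechanically: the two strings are identical up to "(KHTML, like Gecko)" and differ only in the product tokens after it. A small Python sketch using the two UA strings from this thread:

```python
# Sketch: split two WebKit UA strings at the common marker to show they
# differ only in the trailing product tokens.
MARKER = "(KHTML, like Gecko)"

vienna = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-gb) "
          "AppleWebKit/533.21.1 (KHTML, like Gecko) Vienna/2.6.0.2600")
safari = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-gb) "
          "AppleWebKit/533.21.1 (KHTML, like Gecko) "
          "Version/5.0.5 Safari/533.21.1")

def split_tail(ua):
    """Return (everything up to and including MARKER, trailing tokens)."""
    head, _, tail = ua.partition(MARKER)
    return head + MARKER, tail.strip()

assert split_tail(vienna)[0] == split_tail(safari)[0]  # identical heads
print(split_tail(vienna)[1])  # Vienna/2.6.0.2600
print(split_tail(safari)[1])  # Version/5.0.5 Safari/533.21.1
```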

Mem'ries...

Over the years, this forum's seen scores of instances where add-ons messed with strings, either accidentally or apparently intentionally. I think AVG wreaked the most havoc... [webmasterworld.com...]

wilderness

5:16 am on Jun 25, 2011 (gmt 0)


Another MS add-on is Office Connector, with the multiple UA variations that follow it.
Even MS is now inserting syntax errors (extra spacing, absent and/or misplaced semi-colons) that result in otherwise broken UAs.

Mobile phones have so many UA variations that it is impossible to track them all.

lucy24

6:46 am on Jun 25, 2011 (gmt 0)


Mobile phones have so many UA variations that it is impossible to track them all.

Yes, and they don't get the favicon, which is otherwise the easiest way to tell between a human and a robot :(
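The favicon test is easy to automate when post-processing logs. A sketch with hypothetical (ip, path) pairs (and remember the caveat above: mobile browsers fail this test too):

```python
# Sketch: flag clients that never fetched /favicon.ico -- often, though
# not always, a sign of a robot.  The hits list is hypothetical.
from collections import defaultdict

hits = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/favicon.ico"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.2", "/page2.html"),
]

paths_by_ip = defaultdict(set)
for ip, path in hits:
    paths_by_ip[ip].add(path)

likely_bots = sorted(ip for ip, paths in paths_by_ip.items()
                     if "/favicon.ico" not in paths)
print(likely_bots)  # ['10.0.0.2']
```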

Viennaman* at least was a legitimate human visitor. I finally got so fed up with my Ukrainian robot that I redirected them to 127.0.0.1. They're still pounding on the door up to 5 times a day-- but now it's only 5 single knocks instead of 5 sets of 6 or 10. (It used to be 9 or 15 until I figured out how to unlock the "forbidden" page.) They're very predictable: interior page, then front page, and repeat for a total of 3 or 5 rounds. If only I knew why they're picking on me :sob:


* Actually MUN-man, or possibly MUN-woman.

tangor

9:04 am on Jun 25, 2011 (gmt 0)


It is usually easier to list what you ALLOW than to keep track of what you DON'T... but that's different strokes for different folks.

lucy24

9:17 am on Jun 25, 2011 (gmt 0)


It is usually easier to list what you ALLOW than to keep track of what you DON'T... but that's different strokes for different folks.

I've never understood how that works. There are hundreds of thousands of UAs, and 2^some-vast-number of IPs. How do you make your whitelist?

Pfui

11:59 am on Jun 25, 2011 (gmt 0)


This forum's very own incrediBILL got me started. Searching for --

whitelist incrediBILL

-- yielded this great beginning, from the Apache Web Server forum:

Don't waste time blocking bots, OPT-IN bots and control your content
blacklisting is a no-win endless game
by incrediBILL
[webmasterworld.com...]

See also among the results (ditto Apache forum):

Why not skip blacklisting Bots and Whitelist Browsers Only?
[webmasterworld.com...]

Whitelisting doesn't mean your .htaccess-tweaking days are over, nor is mod_rewrite coding easy to figure out (imho). But it's really nice knowing you're blocking loads of bad actors from the get-go.
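For anyone wondering what the opt-in approach boils down to, here's a minimal sketch of the logic. The patterns are illustrative only, not incrediBILL's actual rules; a production whitelist also checks headers and IPs:

```python
import re

# Sketch: allow only UAs matching known browser patterns; deny the rest.
ALLOWED = [
    re.compile(r"Mozilla/\d\.\d .*(?:Firefox|Chrome|Safari|MSIE)"),
    re.compile(r"Opera/\d"),
]

def allowed(ua):
    """True when the UA matches at least one whitelist pattern."""
    return any(p.search(ua) for p in ALLOWED)

print(allowed("Mozilla/5.0 (X11; Linux) Gecko/20100101 Firefox/5.0"))  # True
print(allowed("Python-urllib/2.5"))                                    # False
```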

wilderness

1:27 pm on Jun 25, 2011 (gmt 0)


There's never been a thorough guide to white-listing syntax provided here at WebmasterWorld. Perhaps it was provided in Bill's own blog or in another location.

The explanations provided are brief and merely basic guidelines, which require the user to expand the syntax for their own applications.

Another participant had some black-listing lines (passed around privately, not in the open forums) which provide some white-listing effect, based upon then-common browsers and the relation of browsers to operating systems. However, even the relationship between browser and OS is overwhelming for most folks.

lucy24

9:08 pm on Jun 25, 2011 (gmt 0)


Did you see the recent thread about a user who, for reasons best known to himself, insisted on staying with MSIE 5 for Mac? MSIE <6 (bordering on <7) would seem to be a no-brainer, but there you are. And by this time everyone has "Mozilla" somewhere in the UA string except the low-budget homemade robots. Not a lot of humans will introduce themselves as, in full, "Python-urllib/2.5" ;)

My Ukrainian friend used a package of forged UAs, a different one for each pair of requests. Some are definitely elderly; some are believable, especially for that part of the world.

dstiles

10:08 pm on Jun 25, 2011 (gmt 0)


MSIE 6 is now deprecated by MS, and rightly so. They no longer support it, so any holes are exploitable. It's only time that's preventing me from adding a line at the top of my own web sites saying "Idiot!" (well, a proper explanation, anyway). I already have a tracer log that tells me when MSIE 6 is used - luckily not much, but some stupid organisations are still using it (it was how google got hacked a year or so back).

I can't upgrade my windows machines to a later browser because they are Windows 2000 - also no longer supported and proving a real hassle since the machines are too puny to install anything later than 2000 even if I wanted to. I've been using linux for most things, including browsing, for a couple of years now but even before that it was always Firefox.

Note that Firefox has at least one add-on that allows the UA to be rotated, so you can't rule out Firefox as the offending bot, possibly with a downloader add-on. The only real way is to study the other browser headers.

Whitelisting bots is a reasonable way to go about things but there are pitfalls. The true bots themselves (bing, g, etc) are easy to whitelist providing the IPs are added into the mix. Except that bing is currently browsing with IPs that have no proper rDNS and g is pushing all kinds of junk bots on us (preview, proxy, translation...).
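Tying the UA to the IP usually means forward-confirmed reverse DNS. Roughly, in Python (the suffix list is illustrative and the lookups are live network calls, so errors count as "not verified"):

```python
import socket

# Illustrative crawler domains; extend as needed.
OFFICIAL_SUFFIXES = (".googlebot.com", ".search.msn.com")

def name_is_official(host, suffixes=OFFICIAL_SUFFIXES):
    """True when a PTR name belongs to a known crawler domain."""
    return host.endswith(suffixes)

def is_verified_bot(ip):
    """Forward-confirmed reverse DNS: look up the IP's PTR name, check
    it belongs to a crawler domain, then confirm that name resolves
    back to the same IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not name_is_official(host):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```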

Bots that use browser-like UAs are more tricky but can usually be controlled by common-sense regex and attention to headers. Plus, of course, blocking the IP range of every server farm that can be found and identified as such - and even then some DSL static ranges get ideas above their station, not to mention DSL botnets.
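Sketched roughly, that kind of check might look like this. The server-farm range here is a TEST-NET placeholder, and the missing-header test is just one example heuristic:

```python
import ipaddress

# Illustrative server-farm ranges -- a real list is long and changes
# often.  192.0.2.0/24 is the TEST-NET block, used purely as a placeholder.
FARM_RANGES = [ipaddress.ip_network("192.0.2.0/24")]

def looks_fake(ip, ua, headers):
    """Browser-like UA from a server farm, or missing a header a real
    browser typically sends?  Then treat it as a probable fake."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in FARM_RANGES):
        return True
    if "Mozilla" in ua and "Accept-Language" not in headers:
        return True
    return False

print(looks_fake("192.0.2.7", "Mozilla/5.0 (Windows NT 5.1)", {}))  # True
print(looks_fake("203.0.113.5", "Mozilla/5.0 (Windows NT 5.1)",
                 {"Accept-Language": "en-gb"}))                     # False
```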

The internet is in an advanced state of dissolution. Sadly it was never very robust in the first place, having been cobbled together rather than designed, and it's getting to be a dangerous place!

lucy24

12:20 am on Jun 26, 2011 (gmt 0)


I just did a rough-and-dirty check of the last couple months' processed logs. About 1/4-1/3 of the MSIE 6 users are human, defined as coming in from a search. (I did say rough-and-dirty: I just said "find all" in the text editor. "Processed logs" means I've already excluded authorized robots.) No humans below 6, so they can safely be blocked en masse. Thought I'd found an exception (human using MSIE 5), but backtracking to raw logs it looks as if it was just the translation-bot doing a dry run.

:: wandering off to pore over Apache docs to see if mod_setenvif can look for presence of string "[&?]q=" in referrer, though I guess mod_rewrite would obtain the same results ::
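The referrer test itself is one regex, whichever Apache module ends up applying it. A Python sketch of the logic:

```python
import re

# Sketch: a referrer carrying a "q=" query parameter usually means the
# visitor arrived from a search results page.
SEARCH_REF = re.compile(r"[?&]q=")

def from_search(referrer):
    """True when the referrer looks like a search-results URL."""
    return bool(SEARCH_REF.search(referrer))

print(from_search("http://www.google.com/search?q=widgets"))  # True
print(from_search("http://example.com/page.html"))            # False
```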

wilderness

3:42 am on Jun 26, 2011 (gmt 0)


MSIE 6 is now deprecated by MS


I haven't used MSIE as my primary browser since 2003.
REFUSE to update to any version newer than 6.0 and use that very sparingly.

tangor

5:03 am on Jun 26, 2011 (gmt 0)


Whitelisting is a different face on access. Done properly it will appear you've LOST tons of traffic... yet in reality you have not... you've lost tons of bots gobbling bandwidth. Making that decision (losing "hits" is problematic) is difficult. Did this some years back and gave it a year... and proved that my DOLLARS (conversions) did not change, but my bandwidth use had dropped by an average of 70%. This becomes a no-brainer, but I had to prove it to myself first before I believed it.

Next OFF my whitelist are some elder IE, FF, Safari, and even Chrome versions. The list of mobile allowed is going to be refined as well. After all, it is MY DOLLAR providing the content (Cost Of Business) and folks need to keep that in mind as they chase the Traffic numbers...

blend27

6:44 pm on Jun 26, 2011 (gmt 0)


either accidentally or apparently intentionally. I think AVG wreaked the most havoc...


For me it is Websense. Many of the visitors to the sites I manage come to see content from corporate networks, from behind Websense proxies. 208.80.19[2+].nnn is a menace to webserver logs, feeding them a 203 with a few bytes of content that describes the site, and that is it. They will fake/append random word(s) they fetch from Britannica (IMO) to/within a standard UA.

lucy24

8:52 pm on Jun 26, 2011 (gmt 0)


Done properly it will appear you've LOST tons of traffic... yet in reality you have not...

It sounds as if there are different decisions depending on whether you've got a commercial site or an informational one. I'm definitely on the informational side. (Some visitors might use a less polite word.) No money involved and I don't pay for bandwidth, so as noted elsewhere the choice is between "I don't like your face" and "No skin off my nose". F'rinstance I can't stand hotlinks so they're all blocked on principle. Besides, 90% of them have unspeakably crummy pages. (They all appear to come from the same WYSIWYG editor and to assume a monitor the size of your average wall. And, of course, take hours to load up because every single graphic is a hotlink.)
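The hotlink decision itself is simple. In .htaccess it's a RewriteCond on HTTP_REFERER; the logic amounts to this (hostnames hypothetical):

```python
from urllib.parse import urlparse

MY_HOSTS = {"example.com", "www.example.com"}  # hypothetical site hosts

def allow_image(referrer):
    """Serve an image only when the referrer is blank (direct requests
    and privacy-stripped ones) or belongs to one of our own hosts."""
    if not referrer:
        return True
    return urlparse(referrer).hostname in MY_HOSTS

print(allow_image("http://www.example.com/gallery.html"))    # True
print(allow_image("http://hotlinker.example.net/wall.html")) # False
print(allow_image(""))                                       # True
```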

Because of my content I also follow the unwritten rules (they do exist in writing, but nothing formally codified) for sites aimed partly at Native audiences. That means people with older computers and slow connections, including dialup and unpredictable satellite. A teeny bit of javascript but nothing beyond that.

Besides, sometimes they turn out to be human after all.

:: ostentatiously not looking in the direction of Bretagne ::

dstiles

9:51 pm on Jun 26, 2011 (gmt 0)


A while ago mobiles were using MSIE 5 and 6 with odd extra parameters in the UA. I haven't checked recently to see if this is still true.

One of the reasons for blocking arbitrary bots and server farms is to prevent site scraping as far as possible (not entirely possible but a pretty fair proportion).

Many "panda"-type complaints in the google forum are about scraped content ranking above the originating site.

I'm not saying this will panda-proof sites but it makes sense to prevent scrapers (aka "fake" bots) as much as possible.

lucy24

7:36 pm on Jul 27, 2011 (gmt 0)


Postscript: Before we got sidetracked onto whitelisting [webmasterworld.com]*, this thread started with a question about "Vienna". That UA element has turned out to be fortunate, because it made it very easy to keep track of MUN-human, even after they unplugged their laptop and took it home to a bellaliant IP.

Why do I want to keep track of them? Because "My Caribou's Nostrils are Full of French Robots" now has a sequel: "These Kamiik were made for Canadian Academics". (You'll have to take the page on faith, because the People Up Top got mad at me last time I posted a link. Trust me, it's of extremely limited interest-- except possibly to a few people at MUN.)

In this version, MUN-man loads up a string of pages in separate tabs until the computer either crashes or is shut down. On restart, all those tabs dutifully load up again. To date it has happened five times, most recently when MUN-human took his or her laptop back to the office for one of those obligatory midsummer visits.

I think I'll do some quick business with javascript and throw them an Alert saying something like "Willya please close a few tabs because it's driving me bonkers" :)


* A great thread to re-read in its own right, not only for its content but for the Memory Lane look at the robots of 2006 and the later developments of the googlebot.