Do you all get many or any of these kinds of UAs?
If so how do you handle them?
They never seem to take many files so I've been leaving them alone. But now I'm thinking it's time to start denying access whenever I see certain profane words. Just like I do with other words like "User Agent:".
I've never seen any one UA take more than a half-dozen files at a time. Mostly it's just one or two. But I am seeing dozens of unique UAs each week now.
Since these UAs serve no useful purpose, and I'm now concerned they might impact my SE rankings, I'm going to deny them access to all but my browser project site.
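For anyone who wants to script that kind of word-based denial, here is a minimal sketch. The blocklist contents and the function name `should_deny` are my own placeholders, not anything from a particular server; fill the tuple with whatever words you actually want to block.

```python
# Hypothetical blocklist: substrings that should never appear in a real UA.
# "user agent:" catches clients that paste the header name into the value.
BLOCKED_SUBSTRINGS = ("user agent:", "user-agent:")


def should_deny(user_agent, blocked=BLOCKED_SUBSTRINGS):
    """Return True if the UA contains any blocked substring (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(word in ua for word in blocked)
```

The match is a plain case-insensitive substring test, so short words can hit innocent UAs; check the list against your own logs before denying on it.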
As always, thanks for the info.
One particular case I noticed involves requests for favicon.ico (the default icon file). I have been seeing a lot of standalone requests for that file in my logs lately (with irrelevant, spammy referrers). So I changed the code to use a different icon file to see what would happen. As a result, the number of requests for the old, now non-existent icon file increased, presumably due to the redirects I am forcing in this case. At least I don't waste b/w.
Also, since the icon file change I have been able to identify a number of IPs that previously came in with everything set up correctly (headers, PTR, UA, etc.) but do not request the new icon file.
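A rough sketch of how that check could be run over an access log, assuming Apache combined log format. The icon path `/test.ico` and the "page" test (paths ending in `/` or `.html`) are assumptions for illustration only. A browser with a cached icon will also show up here, so treat the result as a list of candidates, not proof.

```python
import re
from collections import defaultdict

# Client IP and request path from a common/combined format log line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" \d{3} \S+')


def ips_skipping_icon(lines, icon_path="/test.ico"):
    """Return sorted IPs that requested pages but never requested icon_path."""
    page_hits = defaultdict(int)
    fetched_icon = set()
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, path = match.groups()
        if path == icon_path:
            fetched_icon.add(ip)
        elif path.endswith(("/", ".html")):
            page_hits[ip] += 1
    return sorted(ip for ip in page_hits if ip not in fetched_icon)
```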
The only time it MAY be a problem is if you let google scan your site logs. Which, unless you have a VERY good reason to allow, is a very bad idea. And drives me nuts when trying to track down strange new UAs.
I have to add: although offensive UAs have been around for a long time I don't get a lot of them. I tend to get far more idiots with alphabetic UAs such as egewoighew. Which are also inhumed with extreme prejudice.
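A crude way to flag that kind of UA, as a sketch only: treat any UA that is a single run of letters, with no slash, version number or spaces, as suspect. The function name and the `min_len` threshold are made up for illustration, and a bare product name sent as the whole UA would also match, so review the hits before blocking on this.

```python
import re

_LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def looks_like_letter_soup(user_agent, min_len=6):
    """Return True if the UA is one unbroken run of letters at least min_len long."""
    ua = (user_agent or "").strip()
    return len(ua) >= min_len and _LETTERS_ONLY.fullmatch(ua) is not None
```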
Enigma - I've kept my favicons out of sight ever since icon-leeching became a sport. Main problem is that Firefox auto-loads from the root before looking at the page header to discover where it should be looking. At least, it used to and I think it still does.
Also, I don't think you can reliably detect baddies on a bad header for an icon request, even on the resulting 404. I've seen some blank headers, depending on browser, in the past, though I can't recall details now.
Now there is another pattern I am seeing: way too many requests for the favicon file (today, in under a minute, I saw about 10 requests for the default icon file with spammy referrers and different IPs). It may be more than just leeching. What I haven't figured out yet is why this has increased since I switched the icons (perhaps the redirects?).
There are lots of tips out there for disabling favicons in FF. It's just a quick about:config edit. So I guess since FF users tend to be a bit more technically proficient, and since most of my friends, at least, cannot stand favicons, there are probably just more people telling FF not to fetch them.
Now I do not think the site owner makes the logs public. But there are some pre-defined or standard folders that these go into. The bots scan these folders and in turn make them "public".
Also, there are other reasons for inserting information into the logs like this (via the UA or referrer). Log-browsing s/w may run, say, PHP. So by inserting some code between PHP tags within the referrer or UA, the attacker has a chance to run PHP and may gain access to other things.
Eg:
127.0.0.1 - - [10/May/2005:00:50:50 -0400] "GET / HTTP/1.1" 200 2000 "<?php phpinfo(); ?>" "<?php $j=1; ?>"
So depending on the context in which the log is viewed, the PHP code can be executed. I remember seeing references to this, probably here.
For the favicon, in my case it wasn't a matter of not accessing it. What's happening is that one icon is listed in the HTML but another icon is requested.
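One way to take the sting out of such entries, sketched here on the assumption that the log viewer renders raw lines into an HTML page: escape every line before output, so the tags arrive as inert text. `render_log_line` is a hypothetical helper name; `html.escape` is from the Python standard library.

```python
import html


def render_log_line(line):
    """Escape a raw log line so embedded markup is displayed, not interpreted."""
    return html.escape(line, quote=True)


if __name__ == "__main__":
    raw = ('127.0.0.1 - - [10/May/2005:00:50:50 -0400] "GET / HTTP/1.1" '
           '200 2000 "<?php phpinfo(); ?>" "<?php $j=1; ?>"')
    print(render_log_line(raw))
```

This only covers the display side. If some script pulls the raw log file in as code, escaping at display time does not help, so the raw files should stay out of any web-accessible or includable path.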
Eg:
<link rel="shortcut icon" href="http://example.com/test.ico">
Now I would expect to see requests for test.ico only, not for favicon.ico. But I see lots of requests for the latter, a non-existent file, so why would a browser ever try to fetch it when it has read the HTML? The thing is that only that file is requested, which is why I think spam or a hack attempt is more likely. If it's not a browser it could be something else: a link posted elsewhere, perhaps, or a bot that probes or tries to insert a spammy referrer into the logs.
Oh, I misunderstood what you meant about the favicon.
1. all from a single IP with some dumb MSIE UA, often with MSN rubbish in it and probably, I think, some kind of toolbar (or possibly an allied page monitor);
2. several IPs all hitting the same site/page, again with stupid (and sometimes variable) MS UAs;
3. a few firefox UAs, probably genuine but certainly (in the few I saw) version 3.
I doubt many firefox users are "semi-techie". I recommend FF to all of my clients and contacts, and few of them have any clue, even about installing the add-ons I recommend.
If anything is asking for the wrong icon name I suspect it's either an icon scraper or, as I said, a (possibly old) version of FF or some stupid toolbar/monitor, none of which would bother to load the page first. (FF does eventually get the correct icon, but it tries, or used to try, the root default first.)
SE sitemap names worry me and it's past time I renamed them before submitting them to the SEs.
I'm also nervous about the SE verifier strings. How much would someone want to try to crack one? I'm very tempted to only present them in page headers if it's the correct SE IP asking for the page, but even then the big-3 are so lax about IP usage and UAs that it could still be an external scan. I certainly distrust the rDNS of some of them so that's not a good answer.
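On the rDNS worry: the usual answer to a spoofed PTR record is forward-confirmed reverse DNS, where you look up the hostname for the IP and then check that the hostname resolves back to the same IP. A minimal sketch follows; the function name is mine, the suffix list is whatever the SE itself documents for its crawlers, and `gethostbyname_ex` only covers IPv4. The resolver functions are parameters so the logic can be exercised without touching the network.

```python
import socket


def is_verified_crawler(ip, allowed_suffixes,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed rDNS check: PTR name must match a suffix AND map back to ip."""
    try:
        host = reverse(ip)[0].lower().rstrip(".")
    except OSError:
        return False  # no PTR record or lookup failure
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False  # PTR name is not in an allowed crawler domain
    try:
        addresses = forward(host)[2]
    except OSError:
        return False  # claimed hostname does not resolve
    return ip in addresses
```

Called as `is_verified_crawler(remote_ip, (".googlebot.com",))`, for example, it only returns True when both lookups agree. It still trusts DNS as a whole, so it narrows the problem without removing it.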
Paranoid? Me? Ha! :)