Profane user agents

Do you get any/many?

         

GaryK

8:14 pm on May 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have seen an alarming increase in the number of profanity-laden user agents. Some are quite graphic.

Do you all get many or any of these kinds of UAs?

If so how do you handle them?

They never seem to take many files so I've been leaving them alone. But now I'm thinking it's time to start denying access whenever I see certain profane words. Just like I do with other words like "User Agent:".
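A minimal sketch of that kind of word-based denial, assuming the check runs wherever the User-Agent header is available to you. The terms in BLOCKED_TERMS are placeholders and the function name is hypothetical; substitute your own list.

```python
import re

# Placeholder terms (assumptions): "badword" stands in for whatever profanity
# you want to block; the other entries catch UAs that literally contain a
# header name such as "User Agent:".
BLOCKED_TERMS = ["badword", "user agent:", "user-agent:"]

# One case-insensitive pattern built from the escaped terms.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(term) for term in BLOCKED_TERMS),
    re.IGNORECASE,
)


def is_blocked_user_agent(user_agent):
    """Return True if the UA string contains any blocked term."""
    if not user_agent:
        return False
    return _BLOCKED_RE.search(user_agent) is not None
```

A caller would return a 403 when is_blocked_user_agent(...) is true. Plain substring matching can also hit innocent UAs that happen to contain a blocked term, so it is worth logging what gets denied for a while.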

tangor

9:51 pm on May 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I see 'em. Ignore them. Never more than a half dozen files at a time. Should that change, then yes I would take action.

dstiles

9:55 pm on May 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They are blocked here. Quite a few of them in referers are also blocked.

GaryK

4:50 pm on May 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's interesting you find them in referrers. Might this in any way impact your SE rankings? Do any of these words actually exist in your site's content?

I've never seen any one UA take more than a half-dozen files at a time. Mostly it's just one or two. But I am seeing dozens of unique UAs each week now.

Since these UAs serve no useful purpose, and I'm now concerned they might impact my SE rankings, I'm going to deny them access to all but my browser project site.

As always, thanks for the info.

enigma1

5:07 pm on May 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes I see it quite a lot too with referrers. It's sort of referrer spam. One way they utilize it is through error logs or other reporting features on web sites, where the referrer ends up as a link so spiders see lots of references for a site.

One particular case I noticed was with access to favicon.ico (the default icon file). I was seeing standalone requests for that file quite a lot in my logs lately (with irrelevant spammy referrers). So I changed the code to use a different icon file to see what happens. As a result the number of requests for the old, now non-existent icon file increased, presumably due to the redirects I am forcing in this case. At least I don't waste b/w.

Also, since the icon file change I have been able to identify a number of IPs that previously came in with everything set up correctly (headers, PTR, UA etc.) but do not access the new icon file.
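A rough sketch of how that comparison could be pulled from an access log, assuming Apache-style combined log lines. The icon path "/new-icon.ico" and the function name are hypothetical. Browsers cache icons and some never fetch them, so a listed IP is only a hint, not proof of a bot.

```python
import re
from collections import defaultdict

# Matches the start of a combined-format line: client IP, two ignored fields,
# the bracketed timestamp, then the method and path of the request line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def ips_skipping_icon(log_lines, icon_path="/new-icon.ico"):
    """Return sorted IPs that made requests but never fetched icon_path."""
    paths_by_ip = defaultdict(set)
    for line in log_lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ip, path = match.groups()
        paths_by_ip[ip].add(path)
    return sorted(ip for ip, paths in paths_by_ip.items() if icon_path not in paths)
```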

dstiles

9:56 pm on May 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Gary: Much as Enigma says - it's referer spam based on recurring (often non-offensive) domains but with #*$! words in the referring filename.

The only time it MAY be a problem is if you let google scan your site logs. Which, unless you have a VERY good reason to allow, is a very bad idea. And drives me nuts when trying to track down strange new UAs.

I have to add: although offensive UAs have been around for a long time I don't get a lot of them. I tend to get far more idiots with alphabetic UAs such as egewoighew. Which are also inhumed with extreme prejudice.

Enigma - I've kept my favicons out of sight ever since icon-leeching became a sport. Main problem is that Firefox auto-loads from the root before looking at the page header to discover where it should be looking. At least, it used to and I think it still does.

Also, I don't think you can reliably detect baddies on a bad header for an icon request, even on the resulting 404. I've seen some blank headers, depending on browser, in the past, though I can't recall details now.

enigma1

9:24 am on May 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dstiles, regarding FF, at least the later versions I see do not request the default icon. In any case I do not ban anything because the file is missing; I just watch patterns, and I may block the specific request by doing a redirect instead of a 404 or 403. So if someone comes in and requests, say, the home page and the favicon file too, the latter gets a redirect and the rest of the request is fulfilled.

Now there is another pattern I am seeing: there are way too many requests for the favicon file (today, in under a minute, I saw about 10 requests for the default icon file with spammy referrers and different IPs). It may be more than just leeching. What I haven't figured out yet is why the increase since I switched the icons (perhaps the redirects?).

GaryK

6:01 am on May 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks about the referrer spam. My log files are not public, never have been, so I guess I'm safe if I don't outright ban these offensive UAs.

There are lots of tips out there for disabling favicons in FF. It's just a quick about:config edit. So I guess since FF users tend to be a bit more technically proficient, and since most of my friends, at least, cannot stand favicons, there are probably just more people telling FF not to fetch them.

enigma1

9:21 am on May 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Gary, as far as I've seen from servers I visited for clients, the various cPanels set up the logs in different configurations. Some store them inside the webspace of a site; others put password protection on top. This may not be done with bad intent but out of negligence or convenience (e.g. a security company wants to get some info from the site logs or run tests, so the admin sets them up in the web space of the server).

Now I do not think the site owner makes the logs public. But there are some pre-defined or standard folders where these go into. The bots scan these folders and in turn make them "public".

Also there are other reasons for inserting information like this into the logs (UA or referrer). Log browsing s/w may run on, say, PHP. So by inserting some code between PHP tags within the referrer or UA, the attacker has a chance to run PHP and may gain access to other things.

Eg:
127.0.0.1 - - [10/May/2005:00:50:50 -0400] "GET / HTTP/1.1" 200 2000 "<?php phpinfo(); ?>" "<?php $j=1; ?>"
So depending on the context in which the log is viewed, the PHP code can be executed. I remember seeing references about it, probably here.
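One general defence is for the log viewer to treat every logged field as untrusted text: escape it before display and never include or evaluate the log file as code. A minimal sketch in Python, assuming the viewer renders log lines into an HTML page; render_log_line is a hypothetical helper, not part of any particular log tool.

```python
import html


def render_log_line(line):
    """Return the log line as inert HTML text inside a <pre> element."""
    # html.escape turns <, >, & and quotes into entities, so markup or
    # "<?php ... ?>" fragments in the UA or referrer are shown, not interpreted.
    return "<pre>" + html.escape(line, quote=True) + "</pre>"
```

Escaping only protects the page that displays the log. If the viewer is itself PHP and pulls the raw log file in with include, the embedded tags would still run, so the file should be read as data.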

For the favicon, in my case it wasn't a matter of not accessing it. What's happening is that one icon is listed in the HTML but another icon is requested.
Eg:
<link rel="shortcut icon" href="http://example.com/test.ico">
Now I would expect to see requests for test.ico only and not for favicon.ico. But I see lots of requests for the latter, a non-existent file, so why would a browser ever try to fetch it when it reads the HTML? The thing is, only that file is requested, which is why the chances lean towards some spam or a hack. If it's not a browser it can be something else: a link posted elsewhere perhaps, or a bot that probes or attempts to insert some spammy referrer into the logs.

GaryK

7:50 pm on May 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a full-time sysadmin and we do everything manually. No control panels. So when I create a new site I create the log folders and assign them only enough permissions for the files to get written. There is no public access at all. Admin access is required to read the logs.

Oh, I misunderstood what you meant about the favicon.

dstiles

8:03 pm on May 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looking more closely at my icon 404s it looks like groups of several at a time can either be:

1. all from a single IP with some dumb MSIE UA, often with MSN rubbish in it and probably, I think, some kind of toolbar (or possibly an allied page monitor);

2. several IPs all hitting the same site/page, again with stupid (and sometimes variable) MS UAs;

3. a few firefox UAs, probably genuine but certainly (in the few I saw) version 3.

I doubt many firefox users are "semi-techie". I recommend FF to all of my clients and contacts, and few of them have any clue, even about installing the add-ons I recommend.

If anything is asking for the wrong icon name I suspect it's either an icon scraper or, as I said, a (possibly old) version of FF or some stupid toolbar/monitor, none of which would bother to load the page first (FF does eventually get the correct icon but it tries, or used to try, the root default first).

dstiles

8:05 pm on May 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Gary - I always create the logs outside of the public root. That way a permission reset doesn't affect it. MS has screwed my permissions a few times in the past!

enigma1

9:50 am on May 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's not only the server logs though. There are other targets, e.g. GA or similar logging methods, where attackers may know of certain weaknesses and craft the UA or referrer in a certain way to emit a script, link etc.

dstiles

10:17 pm on May 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I see hundreds of awstats references in various logs - I don't use it so no problem but I wonder about the attack vector for those that do.

SE sitemap names worry me and it's past time I renamed them before submitting them to the SEs.

I'm also nervous about the SE verifier strings. How much would someone want to try to crack one? I'm very tempted to only present them in page headers if it's the correct SE IP asking for the page, but even then the big-3 are so lax about IP usage and UAs that it could still be an external scan. I certainly distrust the rDNS of some of them so that's not a good answer.
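One common way to firm up a bare rDNS check is forward confirmation: look up the PTR name for the IP, check it ends in a domain you trust, then resolve that name forward and require the original IP to be among the answers. A sketch assuming IPv4 only; the function name, the suffix list and the injectable reverse/forward parameters are all hypothetical and exist mainly so the logic can be tried without live DNS.

```python
import socket


def forward_confirmed(ip, allowed_suffixes, reverse=None, forward=None):
    """Return True if ip has a PTR under an allowed suffix that resolves back to ip."""
    # Default resolvers use the standard library; callers may inject fakes.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
    except OSError:
        return False  # no PTR record or lookup failure
    host = host.lower().rstrip(".")
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False  # PTR name is outside the domains we trust
    try:
        return ip in forward(host)
    except OSError:
        return False  # PTR name does not resolve forward
```

For example, forward_confirmed(ip, [".googlebot.com"]) would only pass an IP whose PTR name sits under googlebot.com and maps back to the same IP. The suffix is an assumption to check against each SE's own documentation.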

Paranoid? Me? Ha! :)

GaryK

6:31 am on May 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Gary - I always create the logs outside of the public root. That way a permission reset doesn't affect it. MS has screwed my permissions a few times in the past!

Same here.

>example.com container folder for each website
>>logs subfolders hold all log files
>>www website root

enigma1

9:50 am on May 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For the sitemaps I also see the same: accesses to the default filenames like sitemap.xml. That's another area that can be examined, perhaps by constructing a fake sitemap on the fly, returning 200 or forcing a redirect, so you can see which IPs come in to the non-existent folders or files (listed inside the fake sitemap.xml). It may help to identify some scrapers.
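A minimal sketch of the fake sitemap idea, assuming you can serve a generated sitemap.xml and later feed (ip, path) pairs from your logs into a check. The trap paths and both function names are hypothetical; the paths should not exist or be linked anywhere else on the site.

```python
from xml.sax.saxutils import escape

# Hypothetical trap paths that exist only inside the fake sitemap.
TRAP_PATHS = ["/trap-a/index.html", "/trap-b/page.html"]


def build_fake_sitemap(base_url, trap_paths=TRAP_PATHS):
    """Return sitemap XML listing only the trap URLs under base_url."""
    base = base_url.rstrip("/")
    entries = "".join(
        "  <url><loc>{}</loc></url>\n".format(escape(base + path))
        for path in trap_paths
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries
        + "</urlset>\n"
    )


def trap_hits(requests, trap_paths=TRAP_PATHS):
    """Given (ip, path) pairs, return the sorted IPs that requested a trap path."""
    traps = set(trap_paths)
    return sorted({ip for ip, path in requests if path in traps})
```

Anything that requests a trap path must have read the fake sitemap, so it is a candidate scraper. A legitimate SE that fetches the default filename unprompted would show up too, so check the IPs before blocking.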