Apache Web Server Forum

Need Help Blocking Rogue Agent
aristotle




msg:4658455
 5:05 pm on Mar 29, 2014 (gmt 0)

About two weeks ago I noticed that one of my sites was getting repeated visits from the following User-Agent (as shown in Latest Visitors):
Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0

This agent always downloads the same two pages (but no images). The website uses shared hosting on an Apache server.

The number of visits has been increasing daily and now exceeds several hundred per day. They come from hundreds of different IPs, mostly in the U.S.

A couple of days ago I did some searching and found that this user-agent is a fake used by a worm called WORM_KELIHOS.TSH. One article says:
This worm arrives on a system as a file dropped by other malware or as a file downloaded unknowingly by users when visiting malicious sites.
It executes commands from a remote malicious user, effectively compromising the affected system. It connects to a website to send and receive information.

I did some more searching and came up with the following .htaccess code to try to block this agent:
BrowserMatchNoCase x86_64 bad_bot
Order Deny,Allow
Deny from env=bad_bot

But this code doesn't work, and this rogue agent continues to arrive every few minutes and download the same two pages from my site.

So I'm hoping that someone can suggest a way for me to block this thing before it gets even more out of control than it already is.

Edit: P.S. I've also tried the following code, but it doesn't work either:
BrowserMatchNoCase x86_64; bad_bot
Order Deny,Allow
Deny from env=bad_bot

 

aristotle




msg:4658460
 6:49 pm on Mar 29, 2014 (gmt 0)

Well I did some more searching and came up with the following completely different code:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} x86_64;
RewriteRule .* - [F]

This looks like it might be working!

But I would still appreciate any comments or suggestions, because I'm not very confident with .htaccess.

not2easy




msg:4658509
 3:18 am on Mar 30, 2014 (gmt 0)

RewriteEngine is usually already on if you are redirecting to or from the www. form of your domain; it only needs to be turned on once, so that first line can probably be dropped.
Non alphanumeric characters should be escaped with \

RewriteCond %{HTTP_USER_AGENT} x86\_64\;
RewriteRule .* - [F]


might work better - but are you sure that you want to block ALL Linux x86_64; users? x86_64 by itself is not an indicator of anything but the OS. I can't say I've seen lots of good ones, but that would be like blocking all Ubuntu or Opera or Safari users. I mean, the x86_64; part of the UA string does not positively identify only one UA.

You could study it to see if there is anything unique about the problematic UA; I have no suggestion for that, sorry. You could 'bundle' the UA with others to accomplish much more with the same basic code -- example:

RewriteCond %{HTTP_USER_AGENT} (snippets|spbot|spider|urllib|x86\_64\;) [NC,OR]

and use multiple lines to block more known bots like nutch or others that ignore instructions.

The [NC,OR] at the end of the line (it just means No Case, OR) is there so you can add another line below it; the last line in the group ends with [NC] alone, without the OR part.
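
A rough sketch of what the complete block might look like (the extra bot names here are only examples, not a recommendation):

RewriteCond %{HTTP_USER_AGENT} (snippets|spbot|spider|urllib|x86\_64\;) [NC,OR]
# last condition ends with [NC] only -- no OR
RewriteCond %{HTTP_USER_AGENT} (nutch|larbin|libwww) [NC]
RewriteRule .* - [F]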

If you decide to block all x86_64; users, be sure your 403 offers a way for humans to tell you they got blocked, and be sure you spend lots of time checking to see what and who gets blocked in case they don't bother to let you know.

[edited by: phranque at 2:15 pm (utc) on Mar 31, 2014]
[edit reason] disabled graphic smileys [/edit]

lucy24




msg:4658532
 7:52 am on Mar 30, 2014 (gmt 0)

Non alphanumeric characters should be escaped with \

Oh, my ###.

Excuse me while I go lie down and put cold compresses on my forehead.

Non-alphanumerics DO NOT NEED TO BE ESCAPED categorically. The only characters that need to be escaped in any Regular Expression, ever, are
(1) characters that have special meaning in RegEx. In most circumstances that means
.()^$
and sometimes
[]{}-
(2) characters that have special meaning within the current application. For example, if your language uses /.../ as RegExp delimiters, then literal slashes have to be escaped. Otherwise they don't. In Apache, literal spaces have to be escaped because they have syntactic meaning; line-final spaces can't be used at all.

Lowlines never, ever need to be escaped; in fact they count as "word" characters right along with alphanumerics. (One reason they're so attractive in URLs!)

Escaping non-word characters is not actively harmful-- the RegEx engine simply ignores any superfluous escaping-- but it makes the text awfully hard to read. And, of course, it adds a few bytes to the file's weight.

Order Deny,Allow
Deny from env=bad_bot

The combination of mod_setenvif with mod_authz-whatcha is generally a good bet when you're matching against something simple. It's probably much easier on your server than using mod_rewrite. I don't know if there exist formal tests; it would almost have to be done on a production server, and most people don't have spare servers with expendable websites sitting around. But gut feeling tells me that mod_rewrite is the most energy-intensive of all possible methods.

But don't use NoCase unless you're absolutely certain that all forms are bad. As an especially dramatic illustration, I recently added a
BrowserMatch GoogleBot keep_out
Obviously not a time for NoCase! But if a dimwitted robot chooses to adopt this UA, so much the easier for me :)

Have you got any other "Deny from..." directives in place? I find it very hard to believe you have none, so any additions should slot right in.

You don't want
Order Deny,Allow
(Others do, but I'm tolerably certain you don't.) That's the whitelisting format. It's for very large websites that can afford to lose some visitors-- especially sites that are attractive enough that humans will try to make contact and ask for exemptions. Typically with this you say "Deny from all" and then add a short specific list of people to allow.

If you're blacklisting, the form is
Order Allow,Deny
Start with "Allow from all" and then pile on the Deny conditions.
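
So for your case, a minimal blacklisting sketch-- using the same pattern you already tried-- would be something along these lines:

# allow everyone, then deny anything flagged as bad_bot
BrowserMatch x86_64 bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot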

aristotle




msg:4658585
 1:50 pm on Mar 30, 2014 (gmt 0)

Thanks for the replies.
Unfortunately, the problem isn't solved, because about 10 hours after I blocked visits with the user-agent string shown in my original post, that string disappeared from the logs and was replaced by the following string:
Agent: Opera/9.80 (Windows NT 6.2; Win64; x64) Presto/2.12.388 Version/12.15

So now visits with this new string occur every few minutes and fetch the same two pages as before. In other words, the only thing that has changed is the user-agent string.

I got some more information about the Kelihos worm (also called the Kelihos botnet), as follows:
The first version of the botnet was mainly involved in denial-of-service attacks and email spam, while version two of the botnet added the ability to steal Bitcoin wallets, as well as a program used to mine bitcoins itself. Its spam capacity allows the botnet to spread itself by sending malware links to users in order to infect them with a Trojan horse, though later versions mostly propagate over social network sites, in particular through Facebook.

Anyway, the problem isn't solved, and I fear that it could get much worse.

This doesn't threaten me financially, because this site is strictly non-commercial with no products or ads. Its sole purpose is to educate people about a controversial issue. But the other side is loaded with money, and they are the type of individuals who would have no scruples about hiring someone to stage an attack against me. So I'm concerned that what I'm seeing now could be a preparatory stage for a future DDOS attack. In fact there have already been DDOS attacks against several sites similar to mine during the past few months.

aristotle




msg:4658598
 3:23 pm on Mar 30, 2014 (gmt 0)

You don't want
Order Deny,Allow
(Others do, but I'm tolerably certain you don't.) That's the whitelisting format. It's for very large websites that can afford to lose some visitors-- especially sites that are attractive enough that humans will try to make contact and ask for exemptions. Typically with this you say "Deny from all" and then add a short specific list of people to allow.

Thanks Lucy -- Do you happen to know (or does anyone know) if it's practical to create a whitelist of allowed user-agent strings and block everything else? Such a list obviously couldn't include every possibility, but can a whitelist be created that would let in the vast majority of real visitors?
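
What I have in mind is something roughly like this (just a sketch of the idea; the version patterns are made up):

# let in only UAs that set the good_ua flag; everything else gets 403
SetEnvIfNoCase User-Agent "Firefox/2[0-9]\." good_ua
SetEnvIfNoCase User-Agent "Chrome/3[0-9]\." good_ua
SetEnvIfNoCase User-Agent "MSIE (9|1[01])\." good_ua
Order Deny,Allow
Deny from all
Allow from env=good_ua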

wilderness




msg:4658630
 6:54 pm on Mar 30, 2014 (gmt 0)

If these pests are going to continue coming at you with different browsers, then you'll likely need to implement some severe restrictions, while going back later and making adjustments for the innocents.

SetEnvIf User-Agent Opera keep_out
SetEnvIf User-Agent Ubuntu keep_out

RewriteCond %{HTTP_USER_AGENT} Linux [NC]
RewriteCond %{HTTP_USER_AGENT} !Linux;\ U;\ Android [NC]
RewriteRule .* - [F]
RewriteCond %{HTTP_USER_AGENT} Linux;\ U;\ Android
RewriteCond %{REMOTE_ADDR} ^12\.34[5-9]\. [OR]
RewriteCond %{REMOTE_ADDR} ^34\.56\.
RewriteRule .* - [F]

aristotle




msg:4658635
 7:30 pm on Mar 30, 2014 (gmt 0)

Here is another thread about this general topic from about two months ago.
[webmasterworld.com...]

But in my case, being on a shared server, and not knowing as much as most people here, I really don't know what to do at this point.

aristotle




msg:4658663
 8:46 pm on Mar 30, 2014 (gmt 0)

I noticed that visits by this rogue user-agent always use the home page URL as the referer. Thus, in Latest Visitors, all the visits look like this:
Host: 79.143.191.206

/
Http Code: 200 Date: Mar 30 15:28:53 Http Version: HTTP/1.1 Size in Bytes: 37392
Referer: http://www.example.com/
Agent: Opera/9.80 (Windows NT 6.2; Win64; x64) Presto/2.12.388 Version/12.15

/page2.html
Http Code: 200 Date: Mar 30 15:28:54 Http Version: HTTP/1.1 Size in Bytes: 15453
Referer: http://www.example.com/
Agent: Opera/9.80 (Windows NT 6.2; Win64; x64) Presto/2.12.388 Version/12.15

In other words, http://www.example.com/ is given as the referer for fetching both pages. It's kind of like a "self-referral". I don't think this ever happens with real human visitors or legitimate bots. So I'm wondering if it might be possible to use this behavior as a way to filter out visits from these rogues. Does anyone know if this can be done?
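
Maybe something along these lines would catch it (with example.com standing in for my real domain), though I don't know whether a real visitor clicking a link back to the home page would also get caught:

# only when the home page itself is requested with itself as referer
RewriteCond %{HTTP_REFERER} ^http://www\.example\.com/$
RewriteCond %{REQUEST_URI} ^/$
RewriteRule .* - [F]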

lucy24




msg:4658664
 8:49 pm on Mar 30, 2014 (gmt 0)

Do you happen to know (or does anyone know) if it's practical to create a whitelist of allowed user-agent strings and block everything else?

Well, it's obviously practical for some sites, because plenty of people hereabouts really use it. The thread you linked to is one recent example-- though it did get a bit derailed in spots!

SetEnvIf User-Agent Opera keep_out
SetEnvIf User-Agent Ubuntu keep_out

RewriteCond %{HTTP_USER_AGENT} Linux [NC]
RewriteCond %{HTTP_USER_AGENT} !Linux;\ U;\ Android [NC]
RewriteRule .* - [F]

Caution! Those are two different and unrelated rules. Each module issues its own 403s; an "allow" from one mod can't override a "deny" from a different one. The one big exception is that if you have URI-based lockouts, you can use mod_rewrite to change from a non-permitted to a permitted form. You can't do much about the other aspects of the request, though.
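
And remember that those two SetEnvIf lines do nothing by themselves; something on the mod_authz side has to pick up the variable, along the lines of:

Order Allow,Deny
Allow from all
Deny from env=keep_out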

mod_authz-thingummy tends to execute very late, immediately before the core. But in general you can't make assumptions about rule ordering. So make sure that your rules work, no matter what their order.

But in my case, being on a shared server

We're all on shared servers-- at least Don and I are. Everyone is, except the grownups with very large sites. And, of course, the serious technogeeks who run their own little servers out of their garage. But those come in with a different range of questions.

aristotle




msg:4659213
 4:05 pm on Apr 1, 2014 (gmt 0)

I would appreciate it if someone could take a look at some code that I've finally come up with after a lot of time and effort. It appears to work when I test it on one of my other websites. Note: This is just some temporary test code, not something I would actually use.
# Block User-Agent Strings
SetEnvIf User-Agent attach ban
SetEnvIf User-Agent "Advanced Email Extractor" ban
SetEnvIf User-Agent BlackWidow ban
SetEnvIf User-Agent "Sqworm/2.9.85-BETA" ban
SetEnvIf User-Agent Bot.mailto:craftbot@yahoo\.com ban
SetEnvIf User-Agent ChinaClaw ban
SetEnvIf User-Agent "OPR/20.0.1387.77" ban
SetEnvIf User-Agent 20100101 ban
Order Allow,Deny
Allow from all
Deny from env=ban

P.S. I originally had a ^ sign at the beginning of each term, because most of the examples I've seen have it. But nothing gets blocked in my tests when that sign is used, so that's why I don't have it in the code now.

lucy24




msg:4659325
 11:56 pm on Apr 1, 2014 (gmt 0)

The ^ anchor means "this text has to come at the very beginning of the element I'm looking at". The beginning of the IP, the beginning of the UA and so on.

I've got a recently acquired hunch that some people overuse or misuse the ^ because they've misunderstood its purpose in Apache. It's not a punctuation mark with semantic meaning; it's part of the Regular Expression. In the case of an IP it's obviously crucial if you're in a mod that uses Regular Expressions. In ordinary mod_authzzzz CIDR syntax, don't use it or everything will break.
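
To take a string from your own list: the first line below can never match, because none of those UAs begins with the date stamp-- they all begin with Mozilla or Opera. Drop the ^, as in the second line, and the pattern matches anywhere in the string:

# never matches: no UA in the list starts with 20100101
SetEnvIf User-Agent "^20100101" ban
# matches 20100101 anywhere in the UA string
SetEnvIf User-Agent "20100101" ban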

aristotle




msg:4660697
 2:41 pm on Apr 5, 2014 (gmt 0)

After investigating this matter for several days, I've identified the following user agents that have accessed this particular site and fit a particular pattern. They all come numerous times, request the same two pages, and use the home page URL as the referrer. I found some of these by going back and looking through my archived logs for March. I had noticed about three weeks ago that something kept fetching the same two pages, but didn't pay enough attention to realize that different user agents were doing it. Anyway, here is the list:
Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.0) Opera 7.02 Bork-edition [en]
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 5.8 (build 4157); .NET CLR 2.0.50727; AskTbPTV/5.11.3.15590)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; chromeframe/19.0.1084.52)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:22.0) Gecko/20100101 Firefox/22.0
Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1
Mozilla/5.0 (Windows NT 5.1; U; en) Opera 8.01
Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.02
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36
Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (Windows NT 6.2; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0
Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/28.0.1500.52 Chrome/28.0.1500.52 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/28.0.1500.52 Chrome/28.0.1500.52 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0
Mozilla/5.0 (X11; U; Linux x86_64; C) AppleWebKit/534.3 (KHTML, like Gecko) Qt/4.6.2 Safari/534.3
Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:14.0; ips-agent) Gecko/20100101 Firefox/14.0.1
Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:22.0) Gecko/20100101 Firefox/22.0
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:20.0) Gecko/20100101 Firefox/20.0
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0
Opera/9.80 (Windows NT 6.2; Win64; x64) Presto/2.12.388 Version/12.15

Another part of the pattern is that a new user agent appears every day or so as the previous one gradually fades from the logs, although the old ones continue to reappear sporadically over subsequent weeks. Each new one fetches the same two pages every few minutes, and continues to do so for 24-36 hours before being replaced by another one. There is always some time overlap of several hours when both are coming. In other words, there's a continual succession taking place, but with a time overlap and occasional reappearances by older ones. This succession appears to happen regardless of whether I block the old one in .htaccess or not.

My suspicion is that a new botnet is in the process of being created through malware distribution, and what I'm seeing now is a testing process associated with slightly different versions of the malware rolling out in succession. I also suspect that all of these user agents might be unleashed simultaneously at some future time. But who knows.

I think that most of these user-agent strings are either fakes or obsolete. That means that I should be able to block most of them individually in .htaccess without also blocking most of the real human visitors. But the ultimate list could become very long. So I'm wondering if it would be possible to catch a lot of them by blocking old versions of Firefox and Chrome, but I don't have enough knowledge at this point to know where the version cutoffs should be or the details of how to do it.

lucy24




msg:4660756
 8:06 pm on Apr 5, 2014 (gmt 0)

So I'm wondering if it would be possible to catch a lot of them by blocking old versions of Firefox and Chrome, but I don't have enough knowledge at this point to know where the version cutoffs should be or the details of how to do it.

It depends on your individual site. What I do is search raw logs (with, ahem, a text editor, not by hand!) and see if a given version has been used by humans within the past year. And even then, there are two tiers: The versions that are absolutely impossible and can be blocked without question, and the ones that get redirected to a special page "I'm sorry, but the server thinks you are a robot". On the off chance of a human with an antiquated machine, this seems kinder than saying "Get out of my sight, you foul Ukrainian!"

:: shuffling papers ::

Unconditional blocks, done via BrowserMatch in htaccess shared by all sites:
Firefox/[12]\b
Mozilla/[0-3]
MSIE [1-4]\.
Opera [3-9]

Limited access, with overrides for certain circumstances which I obviously won't spell out:
RewriteCond %{HTTP_USER_AGENT} MSIE\ [56]\.\d [OR]
RewriteCond %{HTTP_USER_AGENT} Chrome/[1-8]\.\d [OR]
RewriteCond %{HTTP_USER_AGENT} Firefox/(3\.[0-5]|[567])

This is a very conservative list, based in part on my target audience. Be sure each number ends with either \b ("word boundary") or some specific character like \. or you'll accidentally lock people out every time Chrome or FF jumps to the next multiple of 10. I think I once locked out MSIE 10 by mistake when it was first introduced.
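
For instance (illustration only):

# matches Firefox/1 and Firefox/1.5, but not Firefox/10 or Firefox/19
BrowserMatch "Firefox/1\b" keep_out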

To make things run more expeditiously, almost all mod_rewrite blocks are constrained to requests for pages.
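
One sketch of how to do that: put the page test into the rule pattern itself, so requests for images and css never even reach the UA conditions (adjust the extension to whatever your pages actually use):

# matches the home page (empty path in per-directory context),
# directory URLs ending in / and .html pages
RewriteRule (^|/|\.html)$ - [F]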

It isn't more than 1 or 2 days since I went to a site that wouldn't let me in because I was using an older version of Safari. My offense? Safari 6 (SIX), aka the last version that can be used by OS 10.6.8-- numbers which are also present in my UA string. (Many sites collaterally block Camino because of the "like Firefox/3.6" in its UA string. I'm accustomed to this.) You'd better believe I fired off an irate email at once.

aristotle




msg:4660763
 9:24 pm on Apr 5, 2014 (gmt 0)

Thanks for the code examples, Lucy. It looks like I'll have to do some more studying of Apache code in order to understand all of it, but I should figure it out eventually.

Also, does your code only block Firefox through version 12? (I know you said "conservative", but that might be too conservative for my purposes.) Because if you look at my list of user agents, there are some Firefox/20, 21, and 22 versions which I'd like to block. I've done some initial checking, and those versions don't seem to be very common. In other words, I'm concerned that there could be circumstances in which I might have to block a small percentage of real humans in order to stay online.

lucy24




msg:4660831
 9:48 am on Apr 6, 2014 (gmt 0)

My unconditional mod_setenvif block covers only 1 and 2. The next tier says
(3\.[0-5]|[567])
meaning 3.0-3.5 and then 5, 6 or 7. That's based on personal study of logs. 3.6 obviously has to be allowed in or I'd lock myself out ;) and for some reason I've seen the occasional human using 4. But then nothing until you get to 8-and-up.

If you're being less conservative you could certainly start with something like
Firefox/1?\d\b
meaning everything from 1 through 19. And some things really are no-brainers, like Mozilla < 4.

aristotle




msg:4660836
 10:37 am on Apr 6, 2014 (gmt 0)

Thanks Lucy
Apparently I misunderstood your code because of my ignorance of the syntax.

I need to find a tutorial that lists all the rules and describes the basic techniques.

aristotle




msg:4660858
 1:18 pm on Apr 6, 2014 (gmt 0)

Hmmm
I've been investigating the current usage of old versions of Firefox and came across the following Latest Visitors entry for one of my websites:
Host: 66.249.85.24
/
Http Code: 200 Date: Apr 05 23:56:55 Http Version: HTTP/1.0 Size in Bytes: 9215
Referer: -
Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon

/favicon.ico
Http Code: 200 Date: Apr 05 23:56:55 Http Version: HTTP/1.0 Size in Bytes: 70
Referer: -
Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon

This looks like a Google favicon-fetching bot that includes Firefox/6.0 in the user-agent string. So is this one old version of Firefox that shouldn't be blocked? Does anyone know?

wilderness




msg:4660899
 5:12 pm on Apr 6, 2014 (gmt 0)

Google will not penalize websites for the inability to access images (favicon or otherwise).

The file sizes for favicons are generally so small that, despite the quantity of requests, they don't put any excessive load on the server.

It's more effective to store all your images in specific image folders and simply list those image folders in robots.txt, and let the compliant bots honor your request.
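
For example, if all the images live in a folder called /images/, the robots.txt entry is just:

# keep compliant bots out of the image folder
User-agent: *
Disallow: /images/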

aristotle




msg:4660917
 5:55 pm on Apr 6, 2014 (gmt 0)

Thanks wilderness
My favicons are 1 pixel (70 kilobytes). I still don't want to block anything from Google, but that's a minor problem that I can take care of later. I just wonder why they're using such an old version of Firefox in their user-agent string
Edit: Oops - I meant 70 bytes, not 70 kilobytes

lucy24




msg:4660945
 8:07 pm on Apr 6, 2014 (gmt 0)

This looks like a Google favicon fetching bot that includes Firefox/6.0 in the user-sgent string. So is this one old version of Firefox that shouldn't be blocked? Does anyone know?

This is actually the new faviconbot. The old one-- up to, I think, less than a year ago-- sent no UA string at all. This undoubtedly got it blocked by many, many sites:

BrowserMatch ^-?$ keep_out
Technically, ^-$ ("-" in logs) in almost any category means no header was sent, while ^$ ("" in logs) means the header was empty. So cover yourself both ways.

For me personally it makes no difference, because
#1 almost all my RewriteRules are constrained to requests for pages, and
#2 I have a specific <Files> exemption-- similar to the common "robots.txt" one-- that lets everyone get the favicon. This was originally intended to help me flag humans who got locked out by mistake; the faviconbot was just a collateral benefit.
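
The kind of thing I mean, sketched in 2.2 syntax:

<Files "favicon.ico">
# everyone may fetch the favicon, whatever the rest of the file says
Order Allow,Deny
Allow from all
</Files>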

Speaking of firefox and favicons: There's an addon called, I think, Favicon Reloader. You'll see it in logs as requests for the favicon only with a recent FF as UA. Some browsers pick it up much more often than necessary, but overall it's A Good Thing because it means someone's got your site bookmarked ;) And bookmarks with attached favicons are more attractive; just look at your own bookmarks menu.

aristotle




msg:4660958
 8:54 pm on Apr 6, 2014 (gmt 0)

Thanks for the replies about the favicon bot. There are just too many things that webmasters have to deal with now. Also, as a final question about this (hopefully), can someone explain why Google's favicon bot also fetched the home page just before it got the favicon?

lucy24




msg:4660970
 10:38 pm on Apr 6, 2014 (gmt 0)

why Google's favicon bot also fetched the home page just before it got the favicon.

Dunno, but it always seems to do it that way. Maybe it wants to start by assuring itself that the site exists. (But if it doesn't exist, why would there be a favicon at all?) Maybe it's some arcane Google Database thing, where they can't have favicons lying around loose but have to associate them with a front page.

The funny part is that if the faviconbot is blocked from the page-- which is all too likely for anything that uses the FF 6 UA string or nothing at all-- it will then not even ask for the favicon. Even if, in my case, it would have had no trouble getting it :)

aristotle




msg:4663488
 12:47 pm on Apr 16, 2014 (gmt 0)

I've found strong evidence in my logs that what I'm seeing as possible preparation for a DDOS attack is connected to a company called ColoCrossing, which has a hosting facility in Buffalo, New York. Here is a link to a Google advisory page for sites hosted at that facility: [google.co.uk...]

Edit: Here is another link with information about this:
[cleantalk.org...]
