keyplyr

msg:4303727 | 6:23 pm on Apr 25, 2011 (gmt 0) |
I block all blank UAs and have for over 10 years, with no ill effects.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteRule .* - [F]
|
wilderness

msg:4303750 | 6:53 pm on Apr 25, 2011 (gmt 0) |
ditto
|
dstiles

msg:4303765 | 7:17 pm on Apr 25, 2011 (gmt 0) |
I also block all blank UAs, but I have occasionally seen legit bots come in that way. Ezine did for a brief period, until I badgered them into resetting their UA to what it previously was. If the source really is legit, then tell them to fix their problem or die.
|
lucy24

msg:4322709 | 1:40 am on Jun 7, 2011 (gmt 0) |
And just when you thought it was safe to go back in the water... Call me slow on the uptake. It only took me five weeks to figure out why GWT has been showing the generic www icon instead of my real favicon.
74.125.nnn.nnn - - [01/May/2011:01:03:24 -0700] "GET /favicon.ico HTTP/1.1" 403 476 "-" "-"
74.125.nnn.nnn - - [05/Jun/2011:19:14:19 -0700] "GET /favicon.ico HTTP/1.1" 403 908 "-" "-"
So they've got a special favicon-getting robot that's so far down the hierarchy, they couldn't even be bothered to give it a name.
:: wandering off to add line beginning !74 to .htaccess file ::
|
wilderness

msg:4322720 | 4:15 am on Jun 7, 2011 (gmt 0) |
lucy, if you're only making an exception to the blank-UA block, that is solid. On the other hand, if you're "opening the door" for all of 74.125., IMO that's not wise.
Don
|
lucy24

msg:4322896 | 1:39 pm on Jun 7, 2011 (gmt 0) |
Yes, I'm changing it to two conditions: blank UA and not 74.125... :: inescapable mental picture of aspiring googlebot spending probationary period as nameless faviconbot ::
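A minimal sketch of the two-condition version described above, assuming mod_rewrite in the site's .htaccess. RewriteConds without [OR] are ANDed, so the 403 fires only when the UA is blank and the client address does not start with 74.125. (the prefix used in this thread). Note that this exempts blank-UA requests from the whole 74.125.0.0/16, which is the door-opening wilderness warns about.

```apache
RewriteEngine On
# Empty/missing User-Agent, or one consisting of a single "-"...
RewriteCond %{HTTP_USER_AGENT} ^-?$
# ...AND the client address does not begin with 74.125.
RewriteCond %{REMOTE_ADDR} !^74\.125\.
# Both conditions true: return 403 Forbidden
RewriteRule .* - [F]
```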
|
dstiles

msg:4323169 | 9:23 pm on Jun 7, 2011 (gmt 0) |
The spec for web pages includes an option to place the favicon in another folder (e.g. /images) and point to it with a <link rel="icon"> element. I did this (and still do) after a major pest came round ripping off icons several years ago. Sadly, this seems to be a part of the spec that few browsers and bots fully adhere to. Firefox is notoriously poor at it, though it does load it eventually.
|
caribguy

msg:4323318 | 7:00 am on Jun 8, 2011 (gmt 0) |
74.125.16.nn is Google Webmaster Tools
|
g1smd

msg:4323331 | 7:56 am on Jun 8, 2011 (gmt 0) |
| RewriteEngine On
| RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
| RewriteRule .* - [F]
The [OR] looks like an error. I block on blank user agent AND blank referrer:
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteCond %{REQUEST_METHOD} !^HEAD$
RewriteCond %{REQUEST_URI} !^/robots\.txt
RewriteRule .* - [F]
|
keyplyr

msg:4323858 | 7:59 am on Jun 9, 2011 (gmt 0) |
@g1smd Not an error. I just quickly grabbed it out of context so I could post an example for the OP. Even if used standalone, some Unix/Apache server configurations may treat the superfluous [OR] as an error; mine would not. As for blocking blank referrers, too many legit users do this nowadays for my needs. Stock IE offers this feature, as do several FF add-ons.
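To illustrate "out of context": [OR] only does something when another RewriteCond follows it. A sketch of how such a rule might look in context, assuming mod_rewrite; BadBot and EvilScraper are hypothetical placeholder UA names, not taken from the thread.

```apache
RewriteEngine On
# Blank (empty or "-") User-Agent...
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
# ...OR a User-Agent containing either placeholder name (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
# Either condition being true is enough for a 403
RewriteRule .* - [F]
```

In the snippet as originally quoted, the [OR] sits on the last condition with nothing after it to chain to, which is what g1smd flagged.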
|
wilderness

msg:4323879 | 8:48 am on Jun 9, 2011 (gmt 0) |
keyplyr, FWIW, the first two RewriteConds g1smd provided must exist together (a double condition: both blank referrer and blank UA); were they separate, the first line would end with an [OR]. I don't block blank referrers on their own; however, I have some lines where I use them with multiple conditions, e.g.: blank referrer from a specific IP, blank referrer from a specific UA, or any combination. One could even include header checks. Any general visitor who goes browsing with both a blank referrer and a blank UA certainly deserves denying, despite their lack of awareness ;)
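A sketch of the kind of combined rule described above, assuming mod_rewrite. Consecutive RewriteConds are ANDed by default, and [OR] joins a condition to the one after it, so this block reads as: blank referrer AND (listed IP OR listed UA). The 192.0.2. prefix (a documentation-only range) and ExampleBot are hypothetical placeholders.

```apache
RewriteEngine On
# Blank (empty or "-") Referer; ANDed with the group below
RewriteCond %{HTTP_REFERER} ^-?$
# ...and either the client is in a placeholder IP range...
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\. [OR]
# ...or it sends a placeholder User-Agent (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteRule .* - [F]
```

A header check would be one more RewriteCond of the same shape, testing a request header via %{HTTP:Header-Name}, for example %{HTTP:Accept}.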
|
wilderness

msg:4324013 | 1:58 pm on Jun 9, 2011 (gmt 0) |
Eleven requests for the same page in 42 seconds, all with blank referrer and UA. Whether an MS user or an MS bot, it certainly deserves denial.
65.52.33.85 - - [09/Jun/2011:07:21:45 -0600] "GET /MyFolder/MyPage.html HTTP/1.1" 403 583 "-" "-"
|
keyplyr

msg:4324199 | 6:28 pm on Jun 9, 2011 (gmt 0) |
| keyplyr, FWIW, the first two RewriteConds g1smd provided must exist together (a double condition: both blank referrer and blank UA); were they separate, the first line would end with an [OR]
They must exist together in the example g1smd gave, not the way I have it written. I have other conditions combined for brevity, keeping my entire .htaccess under 10 KB, including hundreds of blocked IP ranges. I also got rid of the heavy defensive scripts to improve response time.
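keyplyr doesn't show his actual rules, so this is only a sketch of one common way to keep a long block list compact, assuming mod_rewrite: several address prefixes folded into a single regex alternation, chained with [OR] to the blank-UA test. The three prefixes are documentation-only ranges used as placeholders.

```apache
RewriteEngine On
# One condition matching any of several placeholder address prefixes...
RewriteCond %{REMOTE_ADDR} ^(192\.0\.2\.|198\.51\.100\.|203\.0\.113\.) [OR]
# ...OR a blank (empty or "-") User-Agent
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
```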
|
g1smd

msg:4324310 | 10:09 pm on Jun 9, 2011 (gmt 0) |
I am intrigued by some behaviour seen a few days ago in ranges supposedly owned by Google.
74.125.75.17 - - [nn-Jun-2011:19:11:56 +0200] "HEAD / HTTP/1.1" 200 482 "-" "urlresolver"
74.125.75.17 - - [nn-Jun-2011:09:18:11 +0200] "GET /favicon.ico HTTP/1.1" 200 234 "-" "-"
74.125.126.83 - - [nn-Jun-2011:20:56:27 +0200] "GET /favicon.ico HTTP/1.1" 200 234 "-" "-"
Are the last two GWT? What's the first one?
|
dstiles

msg:4324311 | 10:11 pm on Jun 9, 2011 (gmt 0) |
Not a bot - at least, that IP has no rDNS indicating it's a bot.
|
lucy24

msg:4324350 | 11:46 pm on Jun 9, 2011 (gmt 0) |
| 74.125.75.17 - - [nn-Jun-2011:19:11:56 +0200] "HEAD / HTTP/1.1" 200 482 "-" "urlresolver"
! I met one of those just recently too:
66.249.82.195 - - [02/Jun/2011:16:29:17 -0700] "HEAD / HTTP/1.1" 200 269 "-" "urlresolver"
There were g### visits in the immediate neighborhood, but not from the identical IP. Conversely, I've met 66.249.82. before for assorted legitimate g### purposes, mainly translate.
|
wilderness

msg:4324359 | 12:00 am on Jun 10, 2011 (gmt 0) |
Here's a 1999 explanation [web.archive.org] which includes a brief line for URLResolver. There's also an RFC Protocol page that's similar. Whether that's what's going on here is anybody's guess.
|
lucy24

msg:4325676 | 9:48 pm on Jun 13, 2011 (gmt 0) |
Back with the nameless faviconbot... Can anyone foresee any ill effects if I make the .htaccess say (in English) "block all blank UAs unless they are asking for the favicon"? Had another one from a Google range the other day, but I can't just keep throwing in IP exemptions, because they keep adding more and there will be no end to it.
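A sketch of "block all blank UAs unless they are asking for the favicon", assuming mod_rewrite and a favicon at /favicon.ico in the document root. The path is tested against %{REQUEST_URI} in a condition rather than in the RewriteRule pattern, because in .htaccess (per-directory) context the RewriteRule pattern is matched without the leading slash.

```apache
RewriteEngine On
# Blank (empty or "-") User-Agent...
RewriteCond %{HTTP_USER_AGENT} ^-?$
# ...requesting anything other than the root favicon
RewriteCond %{REQUEST_URI} !^/favicon\.ico$
RewriteRule .* - [F]
```

Unlike an IP exemption, this needs no updating as new ranges turn up, but it lets any blank-UA client fetch the favicon, which some posters here prefer not to allow.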
|
tangor

msg:4325799 | 6:42 am on Jun 14, 2011 (gmt 0) |
I don't allow favicon on blanks... regardless of source IP... if they don't want to identify themselves properly, I'm not inclined to grant access... YMMV
|
dstiles

msg:4326139 | 9:14 pm on Jun 14, 2011 (gmt 0) |
I block all blank UAs but favicon MAY get away with it - I'd have to check my code as they are treated differently.
|
cyberdyne

msg:4337340 | 9:53 am on Jul 9, 2011 (gmt 0) |
I visited Google Webmaster Tools yesterday, and on checking my logs immediately afterwards I found that Google had attempted to fetch my favicon with a blank UA. The request was 403'd, and I must admit this has made me rethink. Relevant IP: 209.85.228.83
Currently using in .htaccess:
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC,OR]
Have commented out the above for now to see how it goes.
|
wilderness

msg:4337380 | 1:57 pm on Jul 9, 2011 (gmt 0) |
If you start deleting lines to appease Google, before long you won't have any need for an .htaccess.
#Blank UA except this IP range
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteCond %{REMOTE_ADDR} !^209\.85\.(12[89]|1[3-9][0-9]|2[0-5][0-9])\.
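Completed with a RewriteRule, the exception above might look like the sketch below, assuming mod_rewrite. For any valid address, the alternation 12[89]|1[3-9][0-9]|2[0-5][0-9] matches a third octet from 128 to 255, i.e. 209.85.128.0/17, which covers cyberdyne's 209.85.228.83. The commented-out line is an assumed Apache 2.4+ alternative using an expression-based CIDR match; it is not from the thread.

```apache
RewriteEngine On
# Blank (empty or "-") User-Agent...
RewriteCond %{HTTP_USER_AGENT} ^-?$
# ...not coming from 209.85.128.0 - 209.85.255.255 (third octet 128-255)
RewriteCond %{REMOTE_ADDR} !^209\.85\.(12[89]|1[3-9][0-9]|2[0-5][0-9])\.
RewriteRule .* - [F]
# Apache 2.4+ alternative to the regex condition above (CIDR match):
# RewriteCond expr "! %{REMOTE_ADDR} -ipmatch '209.85.128.0/17'"
```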
|
Pfui

msg:4337399 | 3:37 pm on Jul 9, 2011 (gmt 0) |
I've got a bunch of sites 'in' GWT, and for as long as I can remember, my hitting GWT's "Home" means GWT hits my 'homes' for their favicons to display on that page. Hits are always by IP address, always with a blank UA, and typically hail from the standard ranges. For example, here's this morning's Home run lineup (partial listing):
72.14.212.81
72.14.212.82
72.14.212.85
74.125.52.86
74.125.126.80
74.125.126.82
74.125.126.87
I block all blank-UA requests, and I don't need my GWT Home prettified with my favicons. But where GWT's hits do come in handy is that they immediately confirm G IPs and whether or not my blocks are still working.
|
wilderness

msg:4337402 | 3:50 pm on Jul 9, 2011 (gmt 0) |
AFAIK, there's never been any penalty (in page results) from the major SE's due to a failed/403 request for a favicon.
|
cyberdyne

msg:4337403 | 3:52 pm on Jul 9, 2011 (gmt 0) |
| If you start deleting lines to appease Google, before long you won't have any need for an .htaccess.
Completely agree, as it happens; amendments made. Thanks
|