
Search Engine Spider and User Agent Identification Forum

    
a legitimate blank?
lucy24




msg:4303133
 10:45 pm on Apr 23, 2011 (gmt 0)

Just when you thought you'd got it all figured out...

I have accidentally discovered a situation where a perfectly legitimate, law-abiding activity results in a blank user-agent string. If you want to split hairs, it's a single hyphen "-" but I think my server puts that in as a placeholder. I've never seen a genuine blank "".

I've got a png that I use as part of my signature in a small forum. The png lives in my personal www space but isn't linked from anywhere, so it doesn't get crawled. It does show up in logs in the expected form, with "viewtopic" et cetera as referrer.

Recently the Forums administrator did some stuff with the signature file at my request. Logs for the relevant time period showed a series of 403's for the png-- with blank user-agent, and a source IP that agrees with the Forums. (I specifically asked her.)

Coincidentally I use the same host, but we're not on the same server so I don't think that's relevant. Oh, and the 403-- from the blank UA lockout in .htaccess-- doesn't seem to have prevented her from fixing the sig. It may be what prevented me from fixing it, though.

Huh.

 

keyplyr




msg:4303727
 6:23 pm on Apr 25, 2011 (gmt 0)

I block all blank UAs; have for over 10 years with no ill-effects.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteRule .* - [F]

wilderness




msg:4303750
 6:53 pm on Apr 25, 2011 (gmt 0)

ditto

dstiles




msg:4303765
 7:17 pm on Apr 25, 2011 (gmt 0)

I also block all blank UAs but I have seen legit bots occasionally come in that way. Ezine did for a brief period until I badgered them into resetting their UA to what it previously was.

If the source really is legit then tell them to fix their problem or die.

lucy24




msg:4322709
 1:40 am on Jun 7, 2011 (gmt 0)

And just when you thought it was safe to go back in the water...

Call me slow on the uptake. It only took me five weeks to figure out why gwt has been showing the generic www icon instead of my real favicon.

74.125.nnn.nnn - - [01/May/2011:01:03:24 -0700] "GET /favicon.ico HTTP/1.1" 403 476 "-" "-"

74.125.nnn.nnn - - [05/Jun/2011:19:14:19 -0700] "GET /favicon.ico HTTP/1.1" 403 908 "-" "-"

So they've got a special favicon-getting robot that's so far down the hierarchy, they couldn't even be bothered to give it a name.

:: wandering off to add line beginning !74 to .htaccess file ::

wilderness




msg:4322720
 4:15 am on Jun 7, 2011 (gmt 0)

lucy,
If you're only making an exception to the blank UA rule, that is solid.

On the other hand, if you're "opening the door" for 74.125., IMO that's not wise.

Don

lucy24




msg:4322896
 1:39 pm on Jun 7, 2011 (gmt 0)

Yes, I'm changing it to two conditions: blank UA and not 74.125...

:: inescapable mental picture of aspiring googlebot spending probationary period as nameless faviconbot ::
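
A minimal sketch of that two-condition version, assuming the same blank-UA pattern posted earlier in the thread and the 74.125. prefix from the log lines above. Whether the whole range deserves an exemption is a separate judgment call.

```apache
RewriteEngine On
# Condition 1: the User-Agent is empty or a literal hyphen
RewriteCond %{HTTP_USER_AGENT} ^-?$
# Condition 2: the request does NOT come from 74.125.x.x
RewriteCond %{REMOTE_ADDR} !^74\.125\.
# Conditions without [OR] are ANDed, so both must hold for the 403
RewriteRule .* - [F]
```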

dstiles




msg:4323169
 9:23 pm on Jun 7, 2011 (gmt 0)

The spec for websites includes an option to place the favicon in another folder (e.g. /images). I did this (and still do) after a major pest came round ripping off icons several years ago.

Sadly, this spec seems to be one that few browsers and bots fully adhere to. Firefox is notoriously poor at it, though it does load it eventually.

caribguy




msg:4323318
 7:00 am on Jun 8, 2011 (gmt 0)

74.125.16.nn is Google Webmaster Tools

g1smd




msg:4323331
 7:56 am on Jun 8, 2011 (gmt 0)

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteRule .* - [F]

The [OR] looks like an error.

I block on blank user agent AND blank referrer:

RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteCond %{REQUEST_METHOD} !^HEAD$
RewriteCond %{REQUEST_URI} !^/robots\.txt
RewriteRule .* - [F]

keyplyr




msg:4323858
 7:59 am on Jun 9, 2011 (gmt 0)

@ g1smd

Not an error. I just quickly grabbed it out of context so I could post an example for the OP. Even used stand-alone, although some Unix/Apache server configs may treat the superfluous [OR] as an error, mine would not.
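
For readers following along, here is a sketch of how the [OR] flag reads when the line sits in context, with another condition after it. The second condition is purely illustrative: "examplebot" is a made-up name, not taken from anyone's actual ruleset.

```apache
RewriteEngine On
# [OR] joins this condition to the one that follows it
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
# "examplebot" is a hypothetical UA standing in for whatever else is blocked
RewriteCond %{HTTP_USER_AGENT} examplebot [NC]
# The final condition carries no [OR]; the rule fires if either one matched
RewriteRule .* - [F]
```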

As for blocking blank referrers, too many legit users do this nowadays for my needs. Stock IE offers this feature as do several FF add-ons.

wilderness




msg:4323879
 8:48 am on Jun 9, 2011 (gmt 0)

keyplyr,
FWIW, the first two RewriteConds g1smd provided must exist together (a double condition of both blank referer and blank UA). Were they separate, the first line would end with an [OR].

I don't block blank referers on their own, but I do have some lines where I use that in combination with other conditions.
Ex.:
Blank referer from a specific IP
Blank referer from a specific UA
or any combination.
One could even include header checks.
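
The combinations listed above could be sketched roughly as follows. Every specific here is an assumption for illustration: 192.0.2.10 is a reserved documentation address, "examplebot" is a made-up UA, and the Accept header is just one possible header check.

```apache
RewriteEngine On

# Blank referer from a specific IP (192.0.2.10 is a placeholder address)
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
RewriteRule .* - [F]

# Blank referer from a specific UA ("examplebot" is a hypothetical name)
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} examplebot [NC]
RewriteRule .* - [F]

# Blank referer plus a header check: no Accept header sent at all
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP:Accept} ^$
RewriteRule .* - [F]
```

Each group is independent: the conditions above a RewriteRule apply only to that rule.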

Any general visitor that goes browsing with both blank referers and blank UAs certainly deserves denying, despite their lack of awareness ;)

wilderness




msg:4324013
 1:58 pm on Jun 9, 2011 (gmt 0)

Eleven requests for the same page in 42 seconds.
All requests with blank referer and UA.

Whether an MS user or an MS bot, it certainly deserves denial.

65.52.33.85 - - [09/Jun/2011:07:21:45 -0600] "GET /MyFolder/MyPage.html HTTP/1.1" 403 583 "-" "-"

keyplyr




msg:4324199
 6:28 pm on Jun 9, 2011 (gmt 0)

keyplyr,
FWIW, the first two RewriteConds g1smd provided must exist together (a double condition of both blank referer and blank UA). Were they separate, the first line would end with an [OR].

They must exist together in the example g1smd gave, not the way I have it written. I have other conditions combined for brevity, keeping my entire .htaccess under 10k, including 100s of blocked IP ranges. I also got rid of the heavy defensive scripts to improve response time.

g1smd




msg:4324310
 10:09 pm on Jun 9, 2011 (gmt 0)

I am intrigued by some behaviour seen a few days ago in ranges supposedly owned by Google.

74.125.75.17 - - [nn-Jun-2011:19:11:56 +0200] "HEAD / HTTP/1.1" 200 482 "-" "urlresolver"
74.125.75.17 - - [nn-Jun-2011:09:18:11 +0200] "GET /favicon.ico HTTP/1.1" 200 234 "-" "-"
74.125.126.83 - - [nn-Jun-2011:20:56:27 +0200] "GET /favicon.ico HTTP/1.1" 200 234 "-" "-"


Are the last two GWT?
What's the first one?

dstiles




msg:4324311
 10:11 pm on Jun 9, 2011 (gmt 0)

Not a bot - at least, that IP has no rDNS indicating it's a bot.

lucy24




msg:4324350
 11:46 pm on Jun 9, 2011 (gmt 0)

74.125.75.17 - - [nn-Jun-2011:19:11:56 +0200] "HEAD / HTTP/1.1" 200 482 "-" "urlresolver"

! I met one of those just recently too.

66.249.82.195 - - [02/Jun/2011:16:29:17 -0700] "HEAD / HTTP/1.1" 200 269 "-" "urlresolver"

There were g### visits in the immediate neighborhood, but not from the identical IP. Conversely, I've met 66.249.82. before for assorted legitimate g### purposes, mainly translate.

wilderness




msg:4324359
 12:00 am on Jun 10, 2011 (gmt 0)

Here's a 1999 explanation [web.archive.org] which includes a brief line for URLResolver.
There's also an RFC Protocol page that's similar.

Whether that's what's going on here is anybody's guess.

lucy24




msg:4325676
 9:48 pm on Jun 13, 2011 (gmt 0)

Back with the nameless faviconbot...

Can anyone foresee any ill effects if I make the htaccess say (in English) "block all blank UAs unless they are asking for the favicon"?

Had another one from a google range the other day, but I can't just keep throwing in IP exemptions, because they keep adding more and there will be no end to it.
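
A rough sketch of what "block all blank UAs unless they are asking for the favicon" might look like, modelled on the robots.txt exemption posted earlier in the thread. It assumes the favicon lives at /favicon.ico in the site root.

```apache
RewriteEngine On
# The User-Agent is empty or a literal hyphen
RewriteCond %{HTTP_USER_AGENT} ^-?$
# ...and the request is for anything other than /favicon.ico
RewriteCond %{REQUEST_URI} !^/favicon\.ico$
RewriteRule .* - [F]
```

This removes the need for per-IP exemptions, at the cost of letting any blank-UA client fetch that one file.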

tangor




msg:4325799
 6:42 am on Jun 14, 2011 (gmt 0)

I don't allow favicon on blanks... regardless of source IP... if they don't want to identify themselves properly, I'm not inclined to grant access... YMMV

dstiles




msg:4326139
 9:14 pm on Jun 14, 2011 (gmt 0)

I block all blank UAs but favicon MAY get away with it - I'd have to check my code as they are treated differently.

cyberdyne




msg:4337340
 9:53 am on Jul 9, 2011 (gmt 0)

I visited Google Webmaster Tools yesterday and, on checking my logs immediately afterwards, found that Google had attempted to fetch my favicon with a blank UA. The request was 403'd, and I must admit this has made me rethink.

Relevant IP: 209.85.228.83

Currently using in htaccess:
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC,OR]

Have commented out the above for now to see how it goes.

wilderness




msg:4337380
 1:57 pm on Jul 9, 2011 (gmt 0)

If you start deleting lines to appease google, before long you won't have any need for an htaccess.

# Blank UA, except from this IP range (209.85.128.0 through 209.85.255.255)
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteCond %{REMOTE_ADDR} !^209\.85\.(12[89]|1[3-9][0-9]|2[0-5][0-9])\.
RewriteRule .* - [F]

Pfui




msg:4337399
 3:37 pm on Jul 9, 2011 (gmt 0)

I've got a bunch of sites 'in' GWT and for as long as I can remember, my hitting GWT's "Home" means GWT hits my 'homes' for their favicons to display on that page. Hits are always by IP address, always with a blank UA, and typically hail from the standard ranges.

For example, here's this morning's Home run lineup (partial listing):

72.14.212.81
72.14.212.82
72.14.212.85
74.125.52.86
74.125.126.80
74.125.126.82
74.125.126.87

I block all blank-UA requests and I don't need my GWT Home prettified with my favicons. But where GWT's hits do come in handy is they immediately confirm G IPs and whether or not my blocks are still working.

wilderness




msg:4337402
 3:50 pm on Jul 9, 2011 (gmt 0)

AFAIK, there's never been any penalty (in page results) from the major SE's due to a failed/403 request for a favicon.

cyberdyne




msg:4337403
 3:52 pm on Jul 9, 2011 (gmt 0)

If you start deleting lines to appease google, before long you won't have any need for an htaccess.


Completely agree as it happens, amendments made.
Thanks

