Search Engine Spider and User Agent Identification Forum

    
a legitimate blank?
lucy24

Msg#: 4303131 posted 10:45 pm on Apr 23, 2011 (gmt 0)

Just when you thought you'd got it all figured out...

I have accidentally discovered a situation where a perfectly legitimate, law-abiding activity results in a blank user-agent string. If you want to split hairs, it's a single hyphen "-" but I think my server puts that in as a placeholder. I've never seen a genuine blank "".
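For context on the hyphen: Apache's access log doesn't record the request verbatim. It writes whatever the LogFormat asks for, and when a requested header such as User-Agent or Referer is missing, it writes "-" in its place. The directives below show the widely used "combined" format as it appears in many stock httpd.conf files. Your host's definition may differ, and `logs/access_log` is only an example path. Note that LogFormat and CustomLog belong in the server or virtual-host config, not in .htaccess.

```apache
# The standard "combined" format: client IP, identd, user, time, request line,
# status, bytes, then the Referer and User-Agent request headers.
# A header the client never sent is logged as "-".
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
# Example log path only; hosts vary.
CustomLog logs/access_log combined
```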

I've got a png that I use as part of my signature in a small forum. The png lives in my personal www space but isn't linked from anywhere, so it doesn't get crawled. It does show up in logs in the expected form, with "viewtopic" et cetera as referrer.

Recently the Forums administrator did some stuff with the signature file at my request. Logs for the relevant time period showed a series of 403s for the png-- with blank user-agent, and a source IP that matches the Forums. (I specifically asked her.)

Coincidentally I use the same host, but we're not on the same server so I don't think that's relevant. Oh, and the 403-- from the blank UA lockout in .htaccess-- doesn't seem to have prevented her from fixing the sig. It may be what prevented me from fixing it, though.

Huh.

 

keyplyr

Msg#: 4303131 posted 6:23 pm on Apr 25, 2011 (gmt 0)

I block all blank UAs, and have for over 10 years with no ill effects.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteRule .* - [F]
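Taken on its own, the snippet above ends its only condition with [OR], which has no following condition to chain to. A minimal standalone sketch of the same block, with that flag dropped, might read as follows. The pattern ^-?$ matches an empty User-Agent (what mod_rewrite sees when the header is missing) as well as a UA consisting of a single hyphen.

```apache
# Refuse with 403 Forbidden any request whose User-Agent header is
# empty/missing or consists of a single "-".
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
```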

wilderness

Msg#: 4303131 posted 6:53 pm on Apr 25, 2011 (gmt 0)

ditto

dstiles

Msg#: 4303131 posted 7:17 pm on Apr 25, 2011 (gmt 0)

I also block all blank UAs but I have seen legit bots occasionally come in that way. Ezine did for a brief period until I badgered them into resetting their UA to what it previously was.

If the source really is legit then tell them to fix their problem or die.

lucy24

Msg#: 4303131 posted 1:40 am on Jun 7, 2011 (gmt 0)

And just when you thought it was safe to go back in the water...

Call me slow on the uptake. It only took me five weeks to figure out why GWT has been showing the generic www icon instead of my real favicon.

74.125.nnn.nnn - - [01/May/2011:01:03:24 -0700] "GET /favicon.ico HTTP/1.1" 403 476 "-" "-"

74.125.nnn.nnn - - [05/Jun/2011:19:14:19 -0700] "GET /favicon.ico HTTP/1.1" 403 908 "-" "-"

So they've got a special favicon-getting robot that's so far down the hierarchy, they couldn't even be bothered to give it a name.

:: wandering off to add line beginning !74 to .htaccess file ::

wilderness

Msg#: 4303131 posted 4:15 am on Jun 7, 2011 (gmt 0)

lucy,
If you're only making an exception to the blank UA rule, that is solid.

On the other hand, if you're "opening the door" for 74.125., IMO that's not wise.

Don

lucy24

Msg#: 4303131 posted 1:39 pm on Jun 7, 2011 (gmt 0)

Yes, I'm changing it to two conditions: blank UA and not 74.125...
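Consecutive RewriteCond lines without an [OR] flag must all match before the rule fires, so the two conditions could be written like this. It's a sketch only: the exemption covers every blank-UA request from 74.125.x.x, which is broader than the favicon fetches that prompted it.

```apache
# Refuse blank user agents...
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$
# ...unless the request comes from an address starting with 74.125.
RewriteCond %{REMOTE_ADDR} !^74\.125\.
RewriteRule .* - [F]
```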

:: inescapable mental picture of aspiring googlebot spending probationary period as nameless faviconbot ::

dstiles

Msg#: 4303131 posted 9:23 pm on Jun 7, 2011 (gmt 0)

The HTML spec lets a site place its favicon in another folder (e.g. /images) and point to it with a <link rel="icon"> element. I did this (and still do) after a major pest came round ripping off icons several years ago.

Sadly, this spec seems to be one that few browsers and bots fully adhere to. Firefox is notoriously poor at it, though it does load it eventually.

caribguy

Msg#: 4303131 posted 7:00 am on Jun 8, 2011 (gmt 0)

74.125.16.nn is Google Webmaster Tools

g1smd

Msg#: 4303131 posted 7:56 am on Jun 8, 2011 (gmt 0)

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteRule .* - [F]

The [OR] looks like an error.

I block on blank user agent AND blank referrer:

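# Both the referrer and the user agent are empty (or a lone "-")...
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
# ...and the request is not a HEAD request...
RewriteCond %{REQUEST_METHOD} !^HEAD$
# ...and it is not for robots.txt: refuse it with 403 Forbidden.
RewriteCond %{REQUEST_URI} !^/robots\.txt
RewriteRule .* - [F]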

keyplyr

Msg#: 4303131 posted 7:59 am on Jun 9, 2011 (gmt 0)

@ g1smd

Not an error. I just quickly grabbed it out of context so I could post an example for the OP. And even if it were used standalone, while some Unix/Apache server configs may treat the superfluous [OR] as an error, mine would not.

As for blocking blank referrers, too many legit users send them nowadays for that to suit my needs. Stock IE offers this feature, as do several FF add-ons.

wilderness

Msg#: 4303131 posted 8:48 am on Jun 9, 2011 (gmt 0)

keyplyr,
FWIW, the first two RewriteConds g1smd provided must exist together (a double condition: both blank referrer and blank UA). Were they meant as alternatives, the first line would end with an [OR].

I don't block blank referrers on their own; however, I have some lines where I use that in combination with other conditions.
For example:
Blank referrer from a specific IP
Blank referrer from a specific UA
or any combination.
One could even include header checks.
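Combinations like these rely on mod_rewrite's precedence: conditions are ANDed by default, and an [OR] joins a condition to the one immediately after it. Below is a hypothetical sketch of "blank referrer from a specific IP or a specific UA"; 192.0.2. is a reserved documentation range and ExampleBot is an invented name, both standing in for whatever you actually want to target.

```apache
# Blank (or "-") referrer...
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^-?$
# ...AND either a particular IP range (hypothetical: 192.0.2.x)...
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\. [OR]
# ...OR a particular user agent (hypothetical name, case-insensitive).
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteRule .* - [F]
```

Read it as: blank referrer AND (IP match OR UA match).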

Any general visitor that goes browsing with both blank referrers and blank UAs certainly deserves denying, despite their lack of awareness ;)

wilderness

Msg#: 4303131 posted 1:58 pm on Jun 9, 2011 (gmt 0)

Eleven requests for the same page in 42 seconds,
all with blank referrer and UA.

Whether it's an MS user or an MS bot, it certainly deserves denial.

65.52.33.85 - - [09/Jun/2011:07:21:45 -0600] "GET /MyFolder/MyPage.html HTTP/1.1" 403 583 "-" "-"

keyplyr

Msg#: 4303131 posted 6:28 pm on Jun 9, 2011 (gmt 0)

keyplyr,
FWIW, the first two RewriteConds g1smd provided must exist together (a double condition: both blank referrer and blank UA). Were they meant as alternatives, the first line would end with an [OR].

They must exist together in the example g1smd gave, not the way I have it written. I combine my conditions with others for brevity, which keeps my entire .htaccess under 10 KB, including hundreds of blocked IP ranges. I also got rid of the heavy defensive scripts to improve response time.

g1smd

Msg#: 4303131 posted 10:09 pm on Jun 9, 2011 (gmt 0)

I am intrigued by some behaviour seen a few days ago in ranges supposedly owned by Google.

74.125.75.17 - - [nn-Jun-2011:19:11:56 +0200] "HEAD / HTTP/1.1" 200 482 "-" "urlresolver"
74.125.75.17 - - [nn-Jun-2011:09:18:11 +0200] "GET /favicon.ico HTTP/1.1" 200 234 "-" "-"
74.125.126.83 - - [nn-Jun-2011:20:56:27 +0200] "GET /favicon.ico HTTP/1.1" 200 234 "-" "-"


Are the last two GWT?
What's the first one?

dstiles

Msg#: 4303131 posted 10:11 pm on Jun 9, 2011 (gmt 0)

Not a bot - at least, that IP has no rDNS indicating it's a bot.

lucy24

Msg#: 4303131 posted 11:46 pm on Jun 9, 2011 (gmt 0)

74.125.75.17 - - [nn-Jun-2011:19:11:56 +0200] "HEAD / HTTP/1.1" 200 482 "-" "urlresolver"

! I met one of those just recently too.

66.249.82.195 - - [02/Jun/2011:16:29:17 -0700] "HEAD / HTTP/1.1" 200 269 "-" "urlresolver"

There were g### visits in the immediate neighborhood, but not from the identical IP. Conversely, I've met 66.249.82. before for assorted legitimate g### purposes, mainly translate.

wilderness

Msg#: 4303131 posted 12:00 am on Jun 10, 2011 (gmt 0)

Here's a 1999 explanation [web.archive.org] which includes a brief line for URLResolver.
There's also an RFC Protocol page that's similar.

Whether that's what's going on here is anybody's guess.

lucy24

Msg#: 4303131 posted 9:48 pm on Jun 13, 2011 (gmt 0)

Back with the nameless faviconbot...

Can anyone foresee any ill effects if I make the htaccess say (in English) "block all blank UAs unless they are asking for the favicon"?
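In mod_rewrite terms, that plain-English rule is two ANDed conditions. A sketch, assuming the favicon is served from the default /favicon.ico in the site root (a site that keeps it elsewhere would adjust the path):

```apache
# Refuse blank user agents...
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$
# ...except when the request is for the site-root favicon.
RewriteCond %{REQUEST_URI} !^/favicon\.ico$
RewriteRule .* - [F]
```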

Had another one from a google range the other day, but I can't just keep throwing in IP exemptions, because they keep adding more and there will be no end to it.

tangor

Msg#: 4303131 posted 6:42 am on Jun 14, 2011 (gmt 0)

I don't allow favicon on blanks... regardless of source IP... if they don't want to identify themselves properly, I'm not inclined to grant access... YMMV

dstiles

Msg#: 4303131 posted 9:14 pm on Jun 14, 2011 (gmt 0)

I block all blank UAs but favicon MAY get away with it - I'd have to check my code as they are treated differently.

cyberdyne

Msg#: 4303131 posted 9:53 am on Jul 9, 2011 (gmt 0)

I visited Google Webmaster Tools yesterday, and on checking my logs immediately afterwards I found that Google had attempted to fetch my favicon with a blank UA. The request was 403'd, and I must admit this has made me rethink.

Relevant IP: 209.85.228.83

Currently using in htaccess:
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC,OR]

Have commented out the above for now to see how it goes.

wilderness

Msg#: 4303131 posted 1:57 pm on Jul 9, 2011 (gmt 0)

If you start deleting lines to appease google, before long you won't have any need for an htaccess.

# Blank UA, except from this IP range (209.85.128.x through 209.85.255.x)
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteCond %{REMOTE_ADDR} !^209\.85\.(12[89]|1[3-9][0-9]|2[0-5][0-9])\.
RewriteRule .* - [F]

Pfui

Msg#: 4303131 posted 3:37 pm on Jul 9, 2011 (gmt 0)

I've got a bunch of sites 'in' GWT and for as long as I can remember, my hitting GWT's "Home" means GWT hits my 'homes' for their favicons to display on that page. Hits are always by IP address, always with a blank UA, and typically hail from the standard ranges.

For example, here's this morning's Home run lineup (partial listing):

72.14.212.81
72.14.212.82
72.14.212.85
74.125.52.86
74.125.126.80
74.125.126.82
74.125.126.87

I block all blank-UA requests and I don't need my GWT Home prettified with my favicons. But where GWT's hits do come in handy is they immediately confirm G IPs and whether or not my blocks are still working.

wilderness

Msg#: 4303131 posted 3:50 pm on Jul 9, 2011 (gmt 0)

AFAIK, there's never been any penalty (in page results) from the major SEs due to a failed or 403'd favicon request.

cyberdyne

Msg#: 4303131 posted 3:52 pm on Jul 9, 2011 (gmt 0)

If you start deleting lines to appease google, before long you won't have any need for an htaccess.


Completely agree as it happens, amendments made.
Thanks

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved