Forum Moderators: phranque

Traffic with no HTTP USER AGENT

csdude55

5:39 pm on Jan 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This isn't a super critical section of my site, and it's really obsolete, but I try to determine whether the user is mobile or not by looking at common user agents. I still use JavaScript and CSS to check, too, but this was something I did a few years back to determine whether to show the user Rich HTML or plain textareas via PHP before bothering to load a lot of unnecessary data if they didn't support it.

The code in question is simply:

$ua = strtolower($_SERVER['HTTP_USER_AGENT']);


I see that I get a handful of "Undefined index: HTTP_USER_AGENT" warnings, though, which implies that those requests arrived without a User-Agent header!

It's an easy fix *, but do you guys and gals think these are coming from bots that should be ignored, bad bots that should be repelled, a glitch on my server that's not storing the user agent to the $_SERVER array, or legit users with some sort of security extension?


* easy fix:
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : false;

not2easy

5:53 pm on Jan 7, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Those are generally unwanted visitors.
What are they doing? That is often enough to decide whether you want them or not and can help you rule out some server glitch. Every blank UA I've seen is unwanted and they can be blocked with a single rule.

I use
# BLOCK BLANK USER AGENTS
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]
in .htaccess files.

csdude55

7:51 pm on Jan 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's what I was thinking, too, until I worried that it COULD be a legit user with some sort of extension for blocking "personal" info. I've seen a lot of them that are weak attempts to prevent viruses and block ads, but they also block cookies and other stuff that sounds scary to the less-savvy user. I wouldn't be at all surprised for them to be blocking user agents, too.

Or, possibly a search engine bot.

Or, since it's a new VPS, some mistake somewhere along my setup.

But if I could save some server load by safely blocking them then that's much better :-)

I already block a ton of "bad" user agents, anyway (I know I could make this a one-liner using |; this is just to make it more readable):

# starts with
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]

# contains
RewriteCond %{HTTP_USER_AGENT} libwww-perl [OR]
RewriteCond %{HTTP_USER_AGENT} wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} python [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nikto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} scan [NC,OR]
RewriteCond %{HTTP_USER_AGENT} java [NC,OR]
RewriteCond %{HTTP_USER_AGENT} winhttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} clshttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} loader [NC,OR]
# escaped: a bare < or > is a comparison operator in RewriteCond, and a leading ' opens a quoted argument
RewriteCond %{HTTP_USER_AGENT} \< [OR]
RewriteCond %{HTTP_USER_AGENT} \> [OR]
RewriteCond %{HTTP_USER_AGENT} \' [OR]
RewriteCond %{HTTP_USER_AGENT} %27 [OR]
RewriteCond %{HTTP_USER_AGENT} %3C [NC,OR]
RewriteCond %{HTTP_USER_AGENT} %3E [NC,OR]
RewriteCond %{HTTP_USER_AGENT} %00 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (?:[;<>'"()]|%22|%28).*(HTTrack|archiver|email|harvest|extract|grab|miner) [NC]
RewriteRule ^ - [F]

not2easy

8:55 pm on Jan 7, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That first list you have for UAs is not very current. Also, if you use the single-line format with names separated by | and no ^ anchor, each name already matches anywhere in the UA ("contains") rather than the entire UA, and the list has many duplicates.
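
For illustration only, the pipe-separated form could look something like the sketch below. The handful of names is just a sample taken from the rules quoted earlier in the thread, not a recommended or current list, and it assumes RewriteEngine is already on:

```apache
# Anchored group: UA must START with one of these names (sample names only)
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|ChinaClaw|WebZIP) [OR]
# Unanchored group: matches if the name appears ANYWHERE in the UA, any case
RewriteCond %{HTTP_USER_AGENT} (HTTrack|libwww-perl|nikto) [NC]
RewriteRule ^ - [F]
```

Because the second group is unanchored and case-insensitive, a name listed there does not also need a ^ entry, which is where most of the duplicates come from.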

In your ^ list, it looks like you missed "Netscape".

You can spot a copy/paste list of UAs a mile away ;)

csdude55

9:13 pm on Jan 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



not2easy wrote: "You can spot a copy/paste list of UAs a mile away ;)"

Haha, of course it is... I probably picked it up here! LOL It was part of my old VPS configuration, so it dates back to at least 2013, maybe as far back as 2008. Shoot, for all I know I could have picked it up in a newsgroup in 2001 :-O

robzilla

9:42 pm on Jan 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No need to guess, just look at your access logs. Who owns their IP addresses? Are they using the site like normal users?
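
If picking them out of the main log is tedious, one possible approach is to tag those requests and send them to a log of their own. This is only a sketch: the variable name blank_ua and the log path are made up for the example, it assumes the usual "combined" LogFormat is defined, and CustomLog has to go in the server or virtual-host config rather than .htaccess:

```apache
# Tag requests whose User-Agent is missing, empty, or a literal "-"
SetEnvIf User-Agent "^-?$" blank_ua

# Write only the tagged requests to a separate file (hypothetical path)
CustomLog "logs/blank_ua.log" combined env=blank_ua
```

That leaves one short file of IP addresses and requested URLs to look over before deciding whether to block.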

lucy24

11:04 pm on Jan 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It has been a very, very long time since any legitimate visitor came in with no UA. It definitely never happens with humans: privacy extensions may leave off the referer (even for supporting-file requests, which annoys the ### out of me), but certainly not the user-agent.

Exceptions I can personally remember:

--many years ago, Google's faviconbot did not send a UA. I stress: many years.
--for a couple of years, Facebook sometimes came without a UA. Mercifully they have now dropped this unwise habit.

In short: a request with no UA can safely be blocked at the gate, so your php need never trouble its pretty little head about the possibility.

tangor

10:44 pm on Jan 10, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Been blocking for years .... Thanks, IncrediBILL!

There's enough to deal with in UAs as it is ... see no reason to revisit this at this time. Willing to be convinced otherwise, IF there is proof there's gold in those null fields!