Forum Moderators: open
BTW, it is a non-commercial site, but it has a depth of about 100,000 pages of original material. I am concerned about protecting that... does this change your opinion in any way?
I'm no expert on mod rewrites so I couldn't comment here but I'm sure someone will be along shortly.
I would think you would only want to disallow spam spiders, email collectors and such.
There is no guarantee, though, that what you are seeing is one of these. I have seen some referrals in my own logs which are probably anti-spam technology, firewalls and such, protecting the user from unwanted garbage.
A quick test of which is which: did the request grab all the references for the page (images, stylesheets, and so on), or just the HTML? Most visitors' browsers would grab everything.
This is not in reaction to any specific attempt or anything, just a thought I had last night. To be honest, I have not even checked my logs to see if no-UA requests are a problem. I do know that a site similar to mine DOES block all no-UA clients... and I thought the users of this forum would have a pretty good idea of whether the lack of a UA is 50% bad guys / 50% web privacy concerns, or more like 80% bad guys / 20% privacy... just trying to get a feel for that!
Also, so as not to be totally rude, I would not just dump them to "-"; I am creating an "Error: Unsupported or Inappropriate Client Software" page...
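For what it's worth, a custom page like that can be served in place of Apache's bare 403 response with an ErrorDocument directive. A sketch (the path is made up, use your own):

```apache
# Serve a friendly explanation page instead of the default 403 response.
# /errors/unsupported-client.html is a hypothetical path.
ErrorDocument 403 /errors/unsupported-client.html
```

That way anyone caught by the block at least sees why, and how to reach you.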
dave
My first thought is still valid: you want visitors coming in regardless of business model.
Privacy is a big issue these days. I get this often, and I don't have any banners at all: "referrer blocked by AdSubtract".
RewriteCond %{HTTP_USER_AGENT} !^$ [OR]
It's hard to tell if this is correct or not, since you don't show any more RewriteConds or RewriteRules.
What you have here reads - basically - "If the user-agent is NOT blank, OR". The [OR] implies/requires that another RewriteCond follow this one, and the "If" refers to a RewriteRule (which you didn't show) that will be applied if this RewriteCond (or the ones following it) is/are true.
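To make that concrete, here is a sketch of what a complete block might look like if the goal is to forbid requests with a blank UA or a blank referrer (hypothetical conditions, just to show the structure):

```apache
RewriteEngine On
# "If the user-agent is blank, OR the referer is blank, forbid the request."
# [OR] joins this condition to the next; without a flag, consecutive
# RewriteConds are ANDed together.
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_REFERER}    ^$
RewriteRule ^.*$ - [F]
```

Note the conditions only apply to the single RewriteRule that immediately follows them.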
If you post over in the Web Site Technology Issues forum and describe exactly what you want to do, lots of folks around here can help.
Jim
In the end, it really depends on who your target audience is. I.e., if you are running a Mulder fan site, you probably would not want to block masquerading user agents ;)
Below is the Mod_Rewrite rule that I use for such UAs:
RewriteCond %{HTTP_USER_AGENT} ^(-|None|Empty|Mozilla/[123456]\.0.*\(compatible\)|Mozilla/3\.|Mozilla/2\.01|Mozilla)$ [NC]
# deny access (403 Forbidden) and stop processing further rules
RewriteRule ^.*$ - [F,L]
I believe that what he meant was that an increasing number of real human users (potential customers) are using firewalls and applications like Norton Internet Security to protect themselves from spammers, hackers, etc. These devices and programs often block the user-agent string provided by the browser. In more than one case, I have had to "unblock" certain users with blank UA strings because of this.
In one case, the user had bought his machine with NIS installed, and had no idea how to turn it off or lower its security settings to allow his UA string to be passed.
I'm glad I included an emergency contact graphic in my custom 403 page, otherwise he would have been shut out and gone for good.
Be careful not to block proxies, many of which show up as simply "Mozilla/3.01 (compatible;)". These are often caching proxies, requesting the page for real users on the other side of the cache. In this case, the real user's UA is not passed through.
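One way to handle that (a sketch, certainly not tested against every proxy out there) is to let that signature through with a pass-through rule placed before any blocking rules:

```apache
# Let the common caching-proxy signature through before any blocks apply.
# [L] stops rule processing here, so later forbid rules never see it.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.01\ \(compatible;\)$
RewriteRule ^.*$ - [L]

# ...blocking rules for other suspect UAs would follow here...
```

Order matters: mod_rewrite evaluates rules top to bottom, so exemptions have to come first.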
Jim
The user agent is not a privacy issue! Spiders and crawlers should clearly identify themselves and there is no privacy issue in web sites knowing what browser is visiting them.
The people who block or spoof their user agent for privacy reasons are paranoid, right?
I can understand why email scrapers and other nuisance spiders and crawlers want to spoof their identity. But anybody doing it out of "privacy" concerns is just misguided. Confusion about the so-called privacy implications of this practice only serves to throw up dust and promote an environment that makes it easier for undesirable stealth spiders and agents to maneuver.
Webmasters should be clear that blocking or spoofing the user agent is a bad practice with no legitimate justification, and it should be discouraged.
Can anybody justify blocking or spoofing the user agent as a privacy issue?
My point is simply that out of several users I blocked "unintentionally" (i.e., they were caught by my blank-UA block, not because they did anything wrong), one didn't even know his Norton Internet Security program was blocking UAs. In fact, he didn't know anything about the program; it came with his computer "all set up". He was not at all technically savvy. His money was good, though.
And that was my main point.
In his case, I was able to "unblock" because he had a fixed IP address. Our site is small, which makes this feasible to do. For bigger sites, unblocking on a case-by-case basis would not be feasible.
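The unblocking itself can be done by putting a negated REMOTE_ADDR condition ahead of the blank-UA test. A sketch, with a placeholder address standing in for the user's fixed IP:

```apache
# Skip the blank-UA block for one trusted visitor's fixed IP.
# 192.0.2.17 is a documentation placeholder, not a real address.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.17$
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^.*$ - [F]
```

Both conditions are ANDed, so the forbid only fires for blank-UA requests that do NOT come from that address. You can see why this doesn't scale: every exempted user adds another line.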
I subsequently gave up on blocking by blank-UA-only because there was just too much overall maintenance involved in relation to cases like this. Now, I combine other info available from the HTTP_REQUEST and selectively block certain on-site assets. I have also re-arranged things so that the site still works if an innocent party somehow falls into a trap.
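As a rough illustration of that kind of selective blocking, the forbid rule can be scoped to particular assets instead of the whole site (the /originals/ directory here is hypothetical):

```apache
# Only protect the valuable content; blank-UA clients can still
# reach the rest of the site, so innocent users aren't locked out.
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^originals/ - [F]
```

The pattern on the RewriteRule line restricts the block to matching URLs, so a firewall-stripped UA only costs the visitor access to one section rather than the entire site.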
When I finally figured out that it was Norton Internet Security blocking UA's I was really disappointed, since I had relied heavily on UA blocking before. But I had to balance my "site security" with how many users I was willing to turn away at the door.
I wish Norton didn't do this, and I wish I could keep my .htaccess file below 10kB, but as usual, things didn't work out neatly or efficiently...
Jim