Forum Moderators: Ocean10000 & phranque

Block single backslash as user agent on Apache 2.2 server?

Correct code to block odd single characters in UA field

     
1:29 pm on May 7, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


In trying to reduce unnecessary memory load on my shared server site I've been attempting to block bots and other visitors whom I don't feel will be of value to me. BrowserMatchNoCase in my .htaccess file has been very useful for UA control, and I've also mastered a little bit of REGEX to help in odd situations.

What has me stumped currently, though, is a visitor(s), not an ordinary one, who arrives from various IPs sporting only a single backslash - the REGEX escape character - as his UA.

Under the theory that REGEX caret-dollar sign successfully blocks blank UAs (nothing between start of string and end of string), I've tried caret-backslash-dollar sign (i.e. only backslash as string), caret-backslash-backslash-dollar sign (under the theory that the first instance escapes the second as a literal character), and, from another site, even three and four backslashes given an explanation of how REGEX is ultimately interpreting this sort of character.

So far without success.

Either I am using the REGEX incorrectly with the BrowserMatchNoCase command, or the single backslash I am seeing in my access.log in the UA field is in fact something else only appearing to me to be that single character.

A question implicit in this one even if it is solved is obviously what the correct BrowserMatchNoCase command would be were Mr. Backslash to switch to a single or simple multiple alphanumeric string as his UA, e.g. I would prefer not to have to have separate individual blocks for single alphanumeric UA "a", single alphanumeric UA "b", and so forth.

That one can apparently change the UA displayed by one's browser to anything at all with but a few configuration changes does not leave me hopeful. Aggravating this is the fact that my host has only been partially successful in implementing mod_cloudflare, meaning that an IP block as an alternative would cast too broad a net for miscreants using that CDN or other proxies.

Any help appreciated.
1:45 pm on May 7, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4055
votes: 249


Have you tried using %{HTTP_USER_AGENT} rather than BrowserMatchNoCase?

You would not need a separate line for each UA, they can be combined as (a|b|c|d|e|etc) in one line.
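
A minimal sketch of that combined form, assuming mod_rewrite is usable from .htaccess on this Apache 2.2 host. The bot names are hypothetical placeholders, not taken from anyone's logs. The second condition handles the "single letter or digit" case from the opening post with a character class instead of a long alternation.

```apache
RewriteEngine On
# Hypothetical bot names, joined with | in a single pattern
RewriteCond %{HTTP_USER_AGENT} (examplebot|samplecrawler|testspider) [NC,OR]
# Any User-Agent that is exactly one letter or digit
RewriteCond %{HTTP_USER_AGENT} ^[a-z0-9]$ [NC]
# Answer matching requests with 403 Forbidden
RewriteRule ^ - [F]
```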
2:17 pm on May 7, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 494
votes: 43


I'll try:
RewriteCond %{HTTP_USER_AGENT} ^\\$

Use [regextester.com...] to test.
2:34 pm on May 7, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


I recently switched all my UA blocks from RewriteCond to SetEnvIf for two reasons: first, because my host's server logs now display RewriteCond transgressions indiscriminately as 500 server errors rather than 403 forbiddens, and second, because Lucy24, who has been of immense help to me here, had recommended BrowserMatchNoCase.

Let me try RewriteCond %{HTTP_USER_AGENT} ^\\$ for these special cases, though, to see if it can do what I have failed to achieve with BrowserMatchNoCase and I'll report back if it achieves any success.
2:59 pm on May 7, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 494
votes: 43


I do both RewriteCond and SetEnvIf, on Lucy's recommendation as well, so I know what you mean. They happily coexist. If you have multiple web sites in directories and the htaccess in the root, the SetEnvIf will apply to all.
4:15 pm on May 7, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15254
votes: 691


I use a simpler version:
^\W
No legitimate UA starts with a non-word character, so this covers all bases.
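
In SetEnvIf form this might look like the sketch below, written in Apache 2.2 access-control syntax. The variable name bad_ua is an arbitrary choice. BrowserMatch "^\W" bad_ua should behave the same way, since BrowserMatch is shorthand for SetEnvIf User-Agent.

```apache
# Flag any User-Agent whose first character is not a letter, digit or underscore
SetEnvIf User-Agent "^\W" bad_ua

# Apache 2.2 syntax: allow everyone except flagged requests
Order Allow,Deny
Allow from all
Deny from env=bad_ua
```

Note that ^\W needs at least one character to match, so an empty User-Agent is not covered by this line and still needs its own rule.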

The RegEx for a backslash is the same as the RegEx for any other special character: escape it. So
^\\
ought to work fine (and if it doesn't, your server will throw a 500-class error on every request, so you can test it instantly). As a subcategory of the above: no legitimate UA starts with \ and then goes on to other stuff, so there's no need for the closing anchor at all.
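
If counting backslashes proves unreliable, one way to sidestep the question is to write the character by its hex code. This is a sketch, not something tested in this thread: \x5c is the PCRE escape for the backslash character, and bad_ua is assumed to be a variable the file already blocks with Deny from env=bad_ua. The reasoning is that the config-file parser and the regex engine may each consume a level of escaping; how many levels apply to each directive is not verified here, and the hex form avoids having to know.

```apache
# \x5c is a backslash; the pattern contains no doubled backslashes to miscount
SetEnvIf User-Agent "^\x5c$" bad_ua

# The same pattern as a mod_rewrite condition
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^\x5c$
RewriteRule ^ - [F]
```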

NoCase or [NC] is only meaningful with alphabetic characters, so you may as well save the six bytes.
5:20 pm on May 7, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 494
votes: 43


^\W

You are the Queen. That is gold.
9:20 pm on May 7, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 494
votes: 43


^\W [rexegg.com...] Reference for those of us mortals.

\W One character that is not a word character as defined by your engine's \w
\w Most engines: "word character": ASCII letter, digit or underscore
12:26 am on May 8, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11451
votes: 172


when using the rexegg reference mentioned above, note that mod_rewrite uses the PCRE (perl compatible regular expressions) engine.

you should consult with the perl regular expressions man page:
[perldoc.perl.org...]

and note the issues regarding locale and unicode support described in the PCRE man page:
[pcre.org...]
11:15 am on May 9, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


Thanks, Lucy, as always, and forgive my apparent lack of manners for not responding sooner. I was awaiting Mr. Backslash who has not yet returned, so I cannot report on whether he has escaped or whether he requires more escaping, but a lonesome hyphen was thwarted, which seems to validate the ^\W catchall handily. Thanks again, and should any further useful info present itself I'll append it here.
4:03 pm on May 9, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15254
votes: 691


a lonesome hyphen was thwarted

An actual, literal hyphen, or a "-" in logs? When a given header isn't sent, it comes through in logs as a -. In Apache logs, you'll see this most often in the Referer area. Yes, I guess that does mean that if someone sends a lone hyphen as their UA, you can't tell from ordinary access logs, though you'd see the difference in the actual headers.

You've probably already got a block on empty user-agents--with exceptions to taste, as discussed in another long thread somewhere hereabouts.
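
For comparison, a sketch of how the two cases could be told apart in .htaccess, assuming bad_ua is a variable the file already blocks with Deny from env=bad_ua. The first line is the caret-dollar pattern from the opening post; the second would only fire on a User-Agent that really is one hyphen character.

```apache
# Empty or missing User-Agent (what the access log prints as "-")
SetEnvIf User-Agent "^$" bad_ua

# A User-Agent header whose value is literally one hyphen
SetEnvIf User-Agent "^-$" bad_ua
```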

I was awaiting Mr. Backslash
Wasn't that a famous play?
6:17 pm on May 9, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


A "-" in the log UA section, which, now that I look at it, is identical to the many "-"s in the empty referer sections. So, yes, then that is probably what it is: my existing block on empty UAs being triggered.

Frankly, I would probably never have been driven to drill this deep into the UA side were my host ever able to implement mod_cloudflare correctly and not leave me with logs full of mixed original and Cloudflare IPs, even Googlebot and Bingbot running side by side from their home IPs and Cloudflare IPs, hence my greater attention to UAs. Would that there were an .htaccess-side fix to penetrate those clouds.
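
One partial .htaccess-side approach, offered as a sketch rather than a tested fix: Cloudflare adds a CF-Connecting-IP request header carrying the visitor's original address, and SetEnvIf can test any request header. The address below is a documentation-range placeholder and bad_ip is an arbitrary variable name. The header is only trustworthy on requests that actually arrived through Cloudflare, since a direct visitor can send whatever header value they like.

```apache
# 203.0.113.7 is a placeholder address, not a real offender
SetEnvIf CF-Connecting-IP "^203\.0\.113\.7$" bad_ip

# Apache 2.2 syntax: allow everyone except flagged requests
Order Allow,Deny
Allow from all
Deny from env=bad_ip
```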

You're not thinking of "Looking for Mr. Goodbar", I hope. That would bode ill.
5:27 am on May 10, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


A very popular bookmarker starts its UA string with "\". If blocked, the user may lose your icon from their favorites menu.

A frequent Safari icon downloader will also start one of the requests with "\" presumably to associate a URL with the icon. Example: an iPad user browsing with Safari wants to save your link to their home screen.

I see these every day and allow these UA strings.
2:32 pm on May 10, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


Unfortunately, like that sound your car makes right up to the moment you take it into the shop, no backslash of any sort has reappeared since I implemented Lucy's fix. The original problem was a single \ with nothing following which I was never able to block with preceding escaping backslashes. I've never seen a backslash leading additional characters, but I appreciate the heads up and I'll keep my eyes peeled so as not to block something legitimate.
2:57 pm on May 10, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 494
votes: 43


Give it time. All bots return to the feeding trough eventually. They are hoping you will forget them.
4:58 pm on May 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15254
votes: 691


I see these every day and allow these UA strings.

###, really? The only bookmarking evidence I see routinely is the favicon requests from Firefox, which use the requesting human's own UA.

The non-word character I see most often is = followed by a complete humanoid UA, which I think means they didn't read the Your New Robot instructions carefully enough. For a few days last summer, the Yandexbot--the real one--similarly goofed and put ": " in front of its UA string.

Second-most-common is a } (close-brace) followed by string of gibberish that probably translates to a malign php script. Or, for variety's sake, it begins "<?php".

In logs, there's also the occasional \xblahblah (hexadecimal strings) but that's simply an artifact of logging non-ASCII characters; in headers it looks different.

Back to the question:
If the offending UA is just a backslash and-that's-all, you could also express it as
^.$
because no legitimate UA consists of exactly one character. (The "-" in logs doesn't count.)
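
As a directive, that could be sketched as below, again assuming bad_ua is already blocked with Deny from env=bad_ua. It also answers the opening post's follow-up question, because a lone "a" or "b" is caught by the same line as a lone backslash.

```apache
# Any User-Agent that is exactly one character, whatever that character is
SetEnvIf User-Agent "^.$" bad_ua
```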
6:20 pm on May 10, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


@Iskandrian - I find it interesting your immediate resolve is to block it because you know how, instead of doing the research to find out what this UA actually is.

Every time I've seen a single backslash, it's been one of the two UAs I mentioned above... and I pay close attention to UAs.
7:53 pm on May 10, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


"@Iskandrian - I find it interesting your immediate resolve is to block it because you know how, instead of doing the research to find out what this UA actually is. "

That's probably because I'm not a Webmaster and would have no clue how to do such research. I'm the part-time caretaker of a ghost town on a shared server packed like a slave ship which I pay for myself, the host of which, after two weeks of email correspondence with support offering me every irrelevant free link available, finally confessed to having no clue how to properly implement mod_cloudflare (and apparently completely ignorant of mod_remoteip) on my server in order to provide correct IP addresses.

They do know how to kill my processes, however, and then take the occasion to try to up-sell me to a higher priced service which, I must assume, comes with equivalent non-support.

So, if it doesn't look normal to me - and a single backslash certainly didn't look normal to me - and if it's not of potentially critical importance (e.g. dozens and dozens of Googlebot instances apparently crawling simultaneously from Cloudflare as well as Mountain View) I kill it dead at the door to save that marginal waste of bandwidth I cannot afford. In the process I'm sure I do block a few innocents.
8:51 pm on May 10, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


Helpful discussions:

Search Engine Spider & User Agent ID Forum [webmasterworld.com]

Server Farm IP Ranges [webmasterworld.com]

Blocking Methods [webmasterworld.com]

Amazon IP ranges [webmasterworld.com]
9:29 pm on May 10, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


Thanks for the links, Keyplyr. I do try to check Webmasterworld when encountering what is clearly a bot to see what you folks think of it.

I wasn't as successful when searching for a single backslash, though. I see in your Msg#:4900587 that every solely single backslash you have encountered has actually belonged to the full UA strings you describe in Msg#:4900433.

How were you able to connect the dots between the single backslash you saw in your logs and the full, benign UA strings beginning with a backslash you described? Is it possible the UA field in the access.log my host provides me with is incorrect or incomplete? Does a single backslash in the UA field actually stand for a full UA string beginning with a backslash the way the single "-" Lucy described is actually the log representation for a blank UA field?
9:34 pm on May 10, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


Do any of these look something like this?
 "\"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1\"
This is a common bookmarker, the kind that saves a link to your page on the user's Home screen on their phone. This would likely be done by someone who likes your page enough to want to visit it often.
9:52 pm on May 10, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


No. The UA that launched this thread looked like this:

\

I generally leave strings like yours alone, evaluating them instead on the basis of country or network of origin if I pursue them further.
9:55 pm on May 10, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


Just to be clear, are you reporting what you see from the raw server access log, or from some stats software report? I ask because sometimes reporting software will only display what is in the first set of parentheses.
10:15 pm on May 10, 2018 (gmt 0)

New User

joined:July 7, 2014
posts: 25
votes: 1


As raw as I can get it: the access.log my host provides me with daily along with an error.log.
10:51 pm on May 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15254
votes: 691


Do any of these look something like this?
"\"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1\"
Those aren't literal backslashes, though. They are escaped quotation marks, necessary because logs use quotation marks as a delimiter. You can verify this by cross-checking the User-Agent header.
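
If the header really does begin with a quotation mark, a broad ^\W rule would catch it. For anyone who prefers to let such visitors through, a hedged sketch of an exemption follows. It assumes bad_ua is the variable being denied; \x22 is the PCRE hex escape for the double-quote character, used here to avoid quoting problems in the config file, and the !bad_ua form unsets the variable. Order matters, since the directives are applied top to bottom.

```apache
# Flag every User-Agent that starts with a non-word character
SetEnvIf User-Agent "^\W" bad_ua

# Clear the flag again for a User-Agent wrapped in literal quotation marks
SetEnvIf User-Agent "^\x22Mozilla/" !bad_ua
```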
11:00 pm on May 10, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


Headers show the same. I look at headers first, then investigate in logs.
3:56 am on May 11, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15254
votes: 691


Here's an example that looks like what you are describing.

Access logs:
1.2.3.4 - - [09/Apr/2018:10:08:03 -0700] "GET /ebooks/barrow/ HTTP/1.1" 200 31814 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko)" 
1.2.3.4 - - [09/Apr/2018:10:08:05 -0700] "GET /apple-touch-icon-precomposed.png HTTP/1.1" 200 1583 "-" "Safari/12605.1.33.1.3 CFNetwork/811.5.4 Darwin/16.7.0 (x86_64)"
1.2.3.4 - - [09/Apr/2018:10:08:05 -0700] "GET /ebooks/barrow/ HTTP/1.1" 403 1838 "-" "\"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1\""
1.2.3.4 - - [09/Apr/2018:10:08:06 -0700] "GET /apple-touch-icon-precomposed.png HTTP/1.1" 200 1582 "-" "Safari/12605.1.33.1.3 CFNetwork/811.5.4 Darwin/16.7.0 (x86_64)"
(Well, ###. That icon request means they slipped under the radar and I didn't realize it was not a human. I need to fine-tune my log wrangling; what would a desktop be doing with the apple-touch-icon?)
Header logs:
2018-04-09:10:08:03
URL: /ebooks/barrow/
IP: 1.2.3.4
Dnt: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko)

{snip}

2018-04-09:10:08:05
URL: /ebooks/barrow/
IP: 1.2.3.4
Dnt: 1
User-Agent: "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1"


I just looked it up. 1.2.3.4--which I tend to use as a generic IP-substitute--is either a server farm in Brisbane, or the APNIC Debogon Project. No humans at risk either way.
4:37 am on May 11, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


Yeah, that's the one I mentioned above. There's another one that uses "\" all by itself, a desktop bookmarker... and I think I've seen one more a while ago.

what would a desktop be doing with the apple-touch-icon?
I would assume the same thing... to be used as an icon for a saved link.
 
