
Forum Moderators: Ocean10000 & phranque


RewriteCond %

help with script

     
11:25 pm on Oct 19, 2004 (gmt 0)

New User

10+ Year Member

joined:Oct 19, 2004
posts:16
votes: 0


Hi.

We have over 1500 informational files on the site, and recently we have had to watch spam bots, which have begun to overwhelm the site by trying to devour all 1500 files and then coming back the following week to do it again.

Recently, after checking the logs and finding only one or two possible real people a week who use the useragent "-", I decided to block every request that has just the - as its useragent.

(We get a lot of those, and denying them by useragent will cut a lot from individual deny of ip)

I came up with this, which may be good script or not, but which seems to be working.

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^-$
ReWriteRule ^.*$ no.html [L]

(no.html is a txt file renamed no.html. It contains only an email address, in case a real person should read it. Nothing else. It is 26 bytes.)

Checking the logs the last few days, I am finding this:

68.216.6.5 - - [18/Oct/2004:14:08:12 -0700] "GET /favicon.ico HTTP/1.1" 200 26 "-" "-"

Just a few times, but coming from real people who have downloaded a bunch of gif and jpe's. There is also the line denying the favicon.ico

Because the favicon.ico has "-" as the useragent.

All other download of files for the same ip show:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" (or some such user agent)

Most of the time the favicon for MSIE has a normal useragent, (Netscape, Opera, and Mozilla all seem to download with normal useragent) but a few special browsers give the favicon useragent as "-"

One individual tried to download favicon.ico three times, so I am guessing it was something noticed.

(If they were bookmarking and the bookmark didn't work I don't know)

I thought about using the following script:

RewriteEngine on
RewriteCond %{GET} ^favicon.ico$
RewriteCond %{HTTP_USER_AGENT} ^-$
ReWriteRule ^.*$ favicon.ico [L]

(I figured a request that asked for favicon.ico with a useragent of "-" would be rewritten to favicon.ico, AND being the default when two RewriteCond % lines are stacked.)

But I don't think there is any such thing as

RewriteCond %{GET}

I only came into scripting for .htaccess last week so all this is new.

So now I am stuck.
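
A minimal sketch of the favicon exception described above, assuming the rules live in a .htaccess file at the site root. mod_rewrite has no %{GET} variable; the requested path is available as %{REQUEST_URI} (with a leading slash) and the method as %{REQUEST_METHOD}. The negated condition exempts favicon.ico from the block instead of rewriting it to itself:

RewriteEngine on
# Exempt favicon.ico: REQUEST_URI is the requested path, including the leading slash
RewriteCond %{REQUEST_URI} !^/favicon\.ico$
# Match only requests whose User-Agent header is a literal hyphen
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule ^.*$ no.html [L]

Stacked RewriteCond lines are ANDed unless [OR] is given, so both conditions must hold before the rule fires.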

3:31 pm on Oct 23, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 28, 2002
posts:564
votes: 0


Thanks very much. That's been puzzling me for ages.
4:03 pm on Oct 24, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Feb 6, 2004
posts:250
votes: 0


Jim stated:
I have *never* seen a legitimate request using a user-agent or referrer of "-", so frankly, I wouldn't even worry about it. I just use:

RewriteCond %{HTTP_REFERER}<>%{HTTP_USER_AGENT} ^-<>|<>-$
RewriteRule !^custom403\.html$ - [F]

I find this very interesting. However, I am a bit confused reading this. So please allow me to pose a question regarding the above.

IIRC, I have read some posts here that suggested blocking out the ones that had a blank or a "-" for BOTH the UA AND the Referrer.

If we were to block when either UA OR the referrer is blank (or a "-"), would it not block "type-in" traffic, etc.?

Thanks for your help with this, Jim.

5:09 pm on Oct 24, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Yes, blocking a blank referrer kills type-in traffic and bookmarks, as well as client-side-scripted requests, media player requests, etc.

However, we're not looking at blank referers in the code you cited, but rather at the case where a malicious agent supplied a non-blank referer that is equal to a literal hyphen. In Apache access logs, this will look like a blank referrer, because Apache substitutes a hyphen and puts "-" in your logs for a blank referrer, rather than showing "".

So if you see "-" in your logs, it could be a blank referrer or it could be a referrer using a literal hyphen.

That is the basis of my comment about 'no legitimate referrer of "-" ' ... I mean that no legitimate request will contain only the literal hyphen.

Even though the difference between a blank and a hyphen is not visible in the log files, it is visible to mod_rewrite, and so can be blocked.

Here's what I recommend:

  • Block if both Referrer AND User-agent = blank, AND Request_Method is not HEAD
  • Block if Referrer OR User-Agent = hyphen
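
The two bullets above could be written roughly as follows. This is a sketch, assuming a .htaccess context and reusing the custom403.html error page name from the rule quoted earlier in the thread; it is not necessarily the exact code Jim runs.

RewriteEngine on

# Case 1: blank Referer AND blank User-Agent, unless the method is HEAD
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteCond %{REQUEST_METHOD} !^HEAD$
RewriteRule !^custom403\.html$ - [F]

# Case 2: Referer OR User-Agent is a literal hyphen
RewriteCond %{HTTP_REFERER} ^-$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule !^custom403\.html$ - [F]

The pattern ^$ matches only an empty header, while ^-$ matches only a header whose value is a single hyphen, which is the distinction the logs cannot show.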

    I suspect that entries that appear in Apache logs as "" (which I have only seen recently) are using a backspace or delete character; otherwise, I can't see how they bypass Apache's behaviour of substituting a hyphen for a blank. I've been meaning to try to test that, but haven't had the time.

    Jim

    12:20 pm on Oct 25, 2004 (gmt 0)

    New User

    10+ Year Member

    joined:Oct 19, 2004
    posts:16
    votes: 0


    Hi.

    I thought I had solved this one but apparently not.

    xx.xx.xxx.xxx - - [24/Oct/2004:19:35:06 -0700] "SEARCH /\x90\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02\xb1\x02
    \xb1\x02\xb1\x02

    (I've broken the string and given only 10% of it for the purposes of the query)

    The string is only broken on the site by a 414 340

    (the status code, 414, and the response size in bytes, 340)

    which is then followed by "-" "-"

    HTTP_REFERER and HTTP_USER_AGENT, I presume.

    My first attempt was to try:

    <LimitExcept GET POST>
    order allow,deny
    deny from all
    </LimitExcept>

    When that didn't work I tried:

    <LimitExcept GET HEAD POST>
    Deny from all
    </LimitExcept>

    When that didn't work I tried both:

    <LimitExcept GET HEAD POST>
    Deny from all
    </LimitExcept>

    and:
    RewriteEngine on
    RewriteCond %{REQUEST_METHOD} SEARCH [OR]

    with other RewriteCond % lines.

    (The other rewrite conditions work.)

    I looked at www.w3.org/Protocols/HTTP/Methods.html

    and it says

    "SEARCH
    Proposed only. The index (etc) identified by the URL is to be searched for something matching in some sense the enclosed message. How does the client know what message formats are acceptable to the server?"

    Back to me.

    The long search string (a virus meant to attack Windows servers) is still hitting the Apache site, giving a headache to the log every time.

    1:58 pm on Oct 25, 2004 (gmt 0)

    Full Member

    10+ Year Member

    joined:Feb 6, 2004
    posts:250
    votes: 0


    Kewen,

    Your server seems to have correctly responded with a 414 (Request-URI Too Long) response, as you noted above:

    The string is only broken on the site by a 414 340

    Here is another thread about this 414 bombardment. [webmasterworld.com]

    HTH

    2:11 pm on Oct 25, 2004 (gmt 0)

    Full Member

    10+ Year Member

    joined:Feb 6, 2004
    posts:250
    votes: 0


    jdmorgan stated:
    > Even though the difference between a blank and a hyphen is not visible in the log files, it is visible to mod_rewrite, and so can be blocked.
    >
    > Here's what I recommend:
    > # Block if both Referrer AND User-agent = blank, AND Request_Method is not HEAD
    > # Block if Referrer OR User-Agent = hyphen

    Thanks for the clarification. Greatly appreciated.

    > I suspect that entries that appear in Apache logs as "" (which I have only seen recently) are using a backspace or delete character; otherwise, I can't see how they bypass Apache's behaviour of substituting a hyphen for a blank. I've been meaning to try to test that, but haven't had the time.

    Yes, this would be quite interesting to explore. Sticky mail me if I can be of any help in running any tests, etc.

    4:17 pm on Oct 25, 2004 (gmt 0)

    New User

    10+ Year Member

    joined:Oct 19, 2004
    posts:16
    votes: 0


    Hi.

    That would work but it would screw up this:

    RewriteCond %{REQUEST_METHOD} GET
    RewriteCond %{REQUEST_METHOD}<>%{REQUEST_URI} !^GET<>/favicon\.ico$
    RewriteRule ^ /folder/file6\.html [L]

    (The - before L causes a system failure in the tests I have done. It works with F

    ReWriteRule ^ 6\.html - [L]
    ReWriteRule ^ 6\.html$ - [L]

    Both cause system failures.

    Whether the ^ in front makes a difference here I don't know.

    Here's examples from two tutorials:
    RewriteRule \.(gif|jpg)$ [mydomain.com...] [R,L]
    RewriteRule ^.*$ X.html [L]
    )

    And at the moment I am using this to stop a valid search engine bot from going to the site until it reads robots.txt.

    RewriteCond %{HTTP_USER_AGENT} ^.*dhsjeyr.*$
    RewriteCond %{REQUEST_METHOD}<>%{REQUEST_URI} !^GET<>/robots\.txt$
    RewriteRule .*\.(htm|html|jpg|gif|jpe)$ - [F]

    (I didn't redirect it to robots.txt because it has to know it is reading a robots.txt file.)

    If I have to choose between stopping the favicon icon from being downloaded and the pesky, but not threatening string from appearing daily, I probably choose to keep the favicon.

    Maybe, as some famous person asked (I can't remember who), is there a third option?

    Will not bring politics into this however.

    5:10 pm on Oct 25, 2004 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

    joined:Mar 31, 2002
    posts:25430
    votes: 0


    I have to ask... Have you studied the mod_rewrite documentation [httpd.apache.org]?

    ReWriteRule ^ 6\.html - [L]
    ReWriteRule ^ 6\.html$ - [L]

    Both of these should cause a server error, because they are invalid.
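
    As a sketch of why these fail: RewriteRule takes a pattern, a substitution, and optional flags in square brackets. With a space after ^, the pattern is ^, the substitution is 6\.html, and the bare - lands where the bracketed flags belong. Assuming the rules sit in a .htaccess file at the site root and the intent was to leave /folder/file6.html (the file named in the earlier rule) untouched and stop processing, a valid form would look something like:

    # Pattern, then substitution (- means "no change"), then flags
    RewriteRule ^folder/file6\.html$ - [L]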

    bose is correct: there is no reason to worry about the long "SEARCH /\x90\x02\xb1..." request intended to compromise Windows servers, first because your server is Apache, and therefore immune, and also because your server provided the correct 414 "Request-URI Too Long" response.

    Other than blocking these requests at your firewall, there's nothing much you can do about them.

    Jim

    5:44 pm on Oct 25, 2004 (gmt 0)

    New User

    10+ Year Member

    joined:Oct 19, 2004
    posts:16
    votes: 0


    Hi.

    I have tried the following and it seems to work as regards allowing the favicon.ico to be downloaded.

    RewriteCond %{HTTP_USER_AGENT} ^$
    RewriteCond %{HTTP_REFERER} ^$
    RewriteCond %{REQUEST_METHOD} !^HEAD
    RewriteCond %{REQUEST_METHOD}<>%{REQUEST_URI} !^GET<>/favicon\.ico$
    RewriteRule ^ /folder/file6\.html [L]

    I tested with a useragent -

    It allows normal GET

    but not GET with useragent -

    When requesting a GET favicon.ico it asks where to download.

    So I am assuming the logic sequence is correct.

    Whether this stops the SEARCH string I don't know, there being no valid REQUEST_METHOD SEARCH

    Whoever programmed the virus may be forcing something that appears as a REQUEST_METHOD SEARCH but is a GET, or other REQUEST_METHOD in disguise.

    I have a firewall but have never tried blocking anything but .exe files.

    As you know the SEARCH string comes from many sources.

    Using the above rewrite conditions is one more attempt, anyway.

    6:05 pm on Oct 25, 2004 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

    joined:Mar 31, 2002
    posts:25430
    votes: 0


    > Whether this stops the SEARCH string I don't know, there being no valid REQUEST_METHOD SEARCH
    >
    > Whoever programmed the virus may be forcing something that appears as a REQUEST_METHOD SEARCH
    > but is a GET, or other REQUEST_METHOD in disguise.

    No, what you see in your log is what the server received.

    I'm not sure exactly what your first statement above means; I can successfully block SEARCHes using


    RewriteCond %{REQUEST_METHOD} ^SEARCH$
    RewriteRule .* - [F]
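
    A related sketch, assuming the site only needs GET, HEAD and POST (the same three methods as in the LimitExcept attempt earlier in the thread): instead of naming SEARCH, refuse every method that is not on a short allow-list.

    # Forbid any request method other than GET, HEAD or POST
    RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST)$
    RewriteRule .* - [F]

    This trades a list of known-bad methods for a list of known-good ones, so it would also refuse OPTIONS and similar methods; that is only appropriate if nothing on the site relies on them. Whether it changes anything for the over-long requests that already receive a 414 is not something this sketch settles.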
    This 25 message thread spans 3 pages: 25