Downloads from a blank UA?

         

keyplyr

6:06 am on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




All the work I do tracking down Robots, UAs, IP addresses, et al. and updating rewrite rules to block access, and along comes some guy on Verizon with software that uses my site as a referrer but a blank UA and takes every file in my account - in 4 minutes!

70.16.148.** - - [28/Jun/2006:16:32:53 -0400] "GET /webpage.html HTTP/1.1" 200 9729 "http://www.mydomain.com" "-"

How does one combat this?

bobothecat

6:35 pm on Jun 29, 2006 (gmt 0)



I use:

RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC,OR]

Though there could be a better way, but it works for me.
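For anyone pasting that line in: a RewriteCond does nothing on its own, and the [OR] flag only makes sense when another RewriteCond follows it. A minimal standalone sketch, assuming mod_rewrite is enabled and that this is the only condition in the ruleset (the [NC] flag is dropped here because the pattern contains no letters):

```apache
# Refuse requests whose User-Agent is empty or a literal "-"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
```

If it is added to an existing list of blocking conditions instead, keep [OR] on every condition except the last one before the RewriteRule.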

There's also this thread:

[webmasterworld.com...]

never got Jim's example to work for me though :(

incrediBILL

6:45 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If blank UAs are just catching your attention wait until you see these:

Small sampling:
66.148.68.x 2uigq2oecesvv2nwso rwiakBsBue Bobgw2nuB
202.125.44.x efeSthqvkr11ticgo1iovjjrdwakbbd
66.148.68.x emwx4cxnd pedafhfpac
66.148.68.x ymdexin7xpebtulwnxew
202.125.44.x pepgfu wjdjqrxckulhwiflmrdsmkc mjvldn
84.180.94.x mairwthe Ifirpl8tiwotwyi lsu
84.180.94.x r9Hreiynmkxmpjh ioHmmknpdmid
66.148.68.x ewoqaohlcegoD emkdywx
66.148.68.x obtDrqhxogxsewDfcDktb
209.190.21.x bedmdFjkFhc4a noFjajakffieapvngdtpwxk
209.190.21.x gdouk6Ss6nnykg66hvojc6txjsecuu
209.190.21.x aphErvbtijj vulgctlslo
209.190.21.x jgbhwntsdlprxcwogijI8orrw b8
209.190.21.x DrbspcgyubxrpeikfiihxD mh
209.190.21.x jvAhnviAjwwud8gymvewtcqhehgbAcytyqdxq
209.190.21.x cvwkvl6kfujhqlujqblFl dffrepmrxdspmdFjq

I'd like to see that rewrite rule ;)

wilderness

7:19 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'd like to see that rewrite rule ;)

Of course it's not one rule; however, it may be reduced to three lines.
From my own point of view (and with frequent visitors from Ohio and Columbus interested in my pages), the fourth line is a bit too extreme (even for me!)

deny from 202.
deny from 84.
RewriteCond %{REMOTE_ADDR} ^66\.148\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\. [OR]
RewriteCond %{REMOTE_ADDR} ^209\.190\.([0-9]|[1-9][0-9]|1[01][0-9]|12[0-7])\. [OR]

keyplyr

8:08 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, immediately after I posted I started using:

RewriteCond %{HTTP_USER_AGENT} ^$ [OR]

80 HEAD requests from an AOL IP address were blocked because of blank UA, but I think a lot of bad guys get an AOL account to do their mischief believing that we won't block them for fear of losing legit hits from AOL users.

So - I am wondering about the downside about blocking blank UA. Any opinions?

wilderness

8:22 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Many IPs use either blank UA's or "-" UA's to cache their pages.

Many folks prefer to allow the blank UA's.
I've never. (not even from AOL).

Guess it's one of those personal choices that we must decide are beneficial or detrimental?

incrediBILL

8:45 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



wilderness, THAT won't work, that's just a small sample.

I get random UA's from all over the place so blocking by IP won't fly as you can't block comcast, roadrunner, AOL, etc.

Just tossed it in for amusement factor because there is no rewrite rule for that, it was a trick problem :)

Pointing out why I do opt-in rules, not opt-out rules, as opt-in rules trap the random UAs, blank UAs, and everything else you've never seen before.
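A minimal sketch of what an opt-in rule can look like in mod_rewrite, for comparison with the blocklists above. The allowed prefixes are illustrative assumptions, not a recommended list, and "ExampleBot" is a hypothetical crawler name:

```apache
# Opt-in: refuse any request whose User-Agent does not start
# with one of the allowed prefixes. Conditions are ANDed by default,
# so the rule fires only when none of the prefixes match.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/
RewriteCond %{HTTP_USER_AGENT} !^Opera/
# Hypothetical crawler you have decided to let in
RewriteCond %{HTTP_USER_AGENT} !^ExampleBot/
RewriteRule .* - [F]
```

A blank UA, a "-" UA and a random-letter UA all fail every prefix test, so they get the 403 without needing a rule of their own.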

[edited by: incrediBILL at 8:48 pm (utc) on June 29, 2006]

keyplyr

8:47 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I use:

RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC,OR]

Though there could be a better way, but it works for me.


Thanks bobothecat :)

wilderness

9:16 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



BILL,
Comcast is one of a few that have extensive subnet delegations to each state and regions of that state (wish all IP's did similar). It makes it very easy to deny access to a very small range that a pest is coming from.

RoadRunner only has a few ranges which seem to cluster all the "pests" together.

AOL is somebody on their own planet. Their cache requests are for the most part Mozilla and version number in the UA, however the Blank UA is frequently used for their HEAD requests.

The majority of my own pages are no cache in the meta tags and I see no reason to give any bot the chance to deviate from what I desire.
As a result I've been using the BLANK UA method (bob) provided since (I don't know when, long time.)

Don

jdMorgan

9:53 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



None of this is addressed to the blank-UA-only problem. As Don states above, some block those, and some don't, because it will make your site look broken to those who use Internet Security software or who come to your site through a caching (i.e. corporate or ISP) proxy.

However, since there are two other code-related problems posted here, I'll just toss these out:

For blank referrer and UA:


# BLOCK blank referer -AND- UA (except for HEAD and favicon requests)
RewriteCond %{REQUEST_METHOD} !^HEAD$
RewriteCond %{HTTP_REFERER}<>%{HTTP_USER_AGENT} ^<>$
RewriteRule !\.ico$ - [F]

For random-letter User-agents:

# Block random-letter, non-Mozilla user-agents
RewriteCond %{HTTP_USER_AGENT} !^Mozilla
# 15 or more chars with no "/.{};" characters
RewriteCond %{HTTP_USER_AGENT} ^[a-z0-9\ ]{15,}$ [NC]
# contains a run of 5 or more consecutive consonants (no vowels)
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]
RewriteRule .* - [F]

Both snippets from live servers.

Jim

incrediBILL

10:40 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim gets it better, but it's still blacklisting ;)

I'll see you two @ PubCon and 'splain it then ...

Pfui

12:12 am on Jun 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



keyplyr mentioned the bad guy using keyplyr's own site as referer -- "http://www.mydomain.com".

I first saw something similar with my main site, and I knew the pages hit were not accessible from the main URL. So I added these and the bad guys were 403-goners:

# Stop fake referers (note caret placement)
SetEnvIfNoCase Referer "^example.com" no_way
SetEnvIfNoCase Referer "^www.example.com" no_way

# Stop misconfigured URLs -> errors (note dot before slash)
SetEnvIfNoCase Referer "^http://example.com./" no_way
SetEnvIfNoCase Referer "^http://www.example.com./" no_way

Alas, I can't get either of the next variations to work for my little triplet to be complete. The difference between the two sets is the last slash:

SetEnvIfNoCase Referer "^http://example.com/:80" no_way
SetEnvIfNoCase Referer "^http://www.example.com/:80" no_way

SetEnvIfNoCase Referer "^http://example.com/:80/" no_way
SetEnvIfNoCase Referer "^http://www.example.com/:80/" no_way

I know this isn't the Apache forum but does anybody see where I'm goofing up with either set? TIA for telling me!

jdMorgan

12:22 am on Jun 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Again, I believe the best approach is a whitelist backed up by a blacklist (with further whitelisted exceptions added to that blacklist, like allowing Superpages to use ^Mozilla/4\.0$ on my sites).
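As a rough sketch of a blacklist entry carrying a whitelisted exception: the rule below refuses the bare "Mozilla/4.0" user-agent unless the request comes from a trusted address range. The 192.0.2.x range is a documentation placeholder and an assumption for illustration only; it is not the real address range of Superpages or anyone else:

```apache
# Blacklist the bare "Mozilla/4.0" UA ...
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0$
# ... except when it comes from the trusted range (placeholder addresses)
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.
RewriteRule .* - [F]
```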

But if I have the code to solve a simple problem that's posted, then I'll post it without trying to convert folks to my preferred overall implementation. Besides, that would be one extremely long post...

Those random-user-agent guys used to really bug me on a site that -due to its readership- needed to be very 'open', making a whitelist too large and cumbersome. So, after analyzing their 'so-called random' UAs, I found that simple solution.

---

Having diverted keyplyr's thread far enough, the only thing I can suggest for now is Key_master's and AlexK's bad-bot scripts (posted in the WebmasterWorld PERL and PHP libraries, respectively). Either of those would likely have stopped the downloader after a few pages.

Jim

abates

8:35 am on Jun 30, 2006 (gmt 0)

10+ Year Member



I'd like to see that rewrite rule ;)

Since none of the random letter user agents have punctuation in them, I just blocked user agents which didn't have /, . or ( in them. That let the vast majority of valid user agents through...
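Roughly, that check can be written as a single negated character class. This is a sketch of the idea rather than a copy of a production rule:

```apache
# Refuse any User-Agent that contains none of "/", "." or "("
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ![/.(]
RewriteRule .* - [F]
```

Note that an empty UA contains none of those characters either, so this also blocks blank user-agents.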

Romeo

9:58 am on Jun 30, 2006 (gmt 0)

10+ Year Member



I have used the `^[a-z0-9\ ]{15,}$ [NC]` condition for a long time here, too, and thanks for the `# no vowels` trick (I like to learn something new every day ...), but I fear that these two rules will work only until "they" learn and adapt by extending their random character set with a blank, some punctuation and more weight on vowels.

Another thing: instead of a `RewriteRule .* - [F]`, a silent rewrite to a valid alternate page with some short broken content would give "them" a nice '200' and the feeling that their scripts work fine and everything seems OK, instead of a harsh '403' return code, which would make "them" aware that we are aware of them -- but I see no need to tell them.
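A minimal sketch of that silent rewrite, reusing Jim's three conditions. It assumes the rules live in the .htaccess file of the document root, and /decoy.html is a hypothetical file name for the short dummy page:

```apache
RewriteEngine On
# Same random-letter UA tests as above
RewriteCond %{HTTP_USER_AGENT} !^Mozilla
RewriteCond %{HTTP_USER_AGENT} ^[a-z0-9\ ]{15,}$ [NC]
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]
# Do not rewrite the decoy page itself, to avoid a rewrite loop
RewriteCond %{REQUEST_URI} !^/decoy\.html$
# Internal rewrite: the client gets a 200 and the decoy content
RewriteRule .* /decoy.html [L]
```

Because the rewrite is internal (no [R] flag), the requested URL stays unchanged on the client side and only the served content differs.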

Kind regards,
R.

keyplyr

10:40 am on Jun 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Anyone know of an online tool to test HEAD requests with a blank UA?

wilderness

1:53 pm on Jun 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



online tool to test HEAD requests with a blank UA

[wannabrowser.com...]

keyplyr

8:34 pm on Jun 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks wilderness, I'm aware of wannabrowser. Although it does allow blank referrer for HTTP requests, I wish to test HEAD requests with a blank referrer.

Sorry for the Off Topic nature of this post.

jdMorgan

11:08 pm on Jun 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can always use the HyperTerm program bundled with Windows. You have to type in the same lines that a browser would send. Capitalization, spelling, and spacing are critical, and there is no 'backspace' available, so you have to get it right.

On XP, it's at C:\Program Files\Windows NT\hypertrm.exe

You connect using your IP address, on port 80 (typically), and using TCP/IP (Winsock) as set up under Files->Properties in HyperTerm.

Example:

HEAD /page_name.html HTTP/1.1<Enter>
Host: www.example.com<Enter>
User-Agent: <Whatever you want to test, as long as it's valid><Enter,Enter>

Using HyperTerm is primitive, and it's not the easiest tool to use, but it lets you send *any* request you like.

Jim

keyplyr

12:36 am on Jul 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Jim, but I don't have a bundled version of Windows :)

My new custom-built machine is sleek, lean and mean... Hooyah!

Wizcrafts

8:55 pm on Jul 8, 2006 (gmt 0)

10+ Year Member



Incredibill stated:

I get random UA's from all over the place so blocking by IP won't fly as you can't block comcast, roadrunner, AOL, etc.

Just tossed it in for amusement factor because there is no rewrite rule for that, it was a trick problem :)


Bill, Wilderness, KeyPlyr;
I have a solution to the random User-Agent strings that is 100% effective at this time. PM me for the details, or read this post [webmasterworld.com] (members only), with a solution I posted a few months ago.

Wiz

incrediBILL

5:47 pm on Jul 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wiz,

Took a look, not bad, but if you anchor Mozilla and Opera you'll clean up some other crud as well as these two typically don't float. Try "^Mozilla/" and "^Opera/" which should let all Opera in as it's either the first thing in the UA as I showed, or Mozilla is, so you're covered either way.

Wizcrafts

6:26 am on Jul 25, 2006 (gmt 0)

10+ Year Member



Hey Bill; thanks for that tip. I'll try it out.
BTW: Love your blog!

Wiz