Forum Moderators: phranque

Regex help for blocking linko/0.1

heini

2:30 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, for some reason (okay, actually it's my limited regex skills) I can't seem to get this bot
linko/0.1 libwww-perl/5.65
banned in my .htaccess.
No matter what syntax I use in my rewrite rule, it doesn't work.
I've spent two fruitless hours now, so I surrender... off to ask the experts.

Birdman

2:39 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello heini,

I'm no expert, but based on toolman's Perfect .htaccess ban list [webmasterworld.com], this is what I came up with.

Or, you can just wait for the real experts to drop in :)

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^linko/0.1
RewriteRule ^.* - [F]

brotherhood of LAN

2:40 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



^linko\/0\.1\slibwww\-perl\/5\.65

Have a stab using that while the experts come ;)

A lil' different from the PHP regex I'm used to, I think, but AFAIK dots and some other characters will need to be escaped.

heini

2:47 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, guys!
Birdman, that is one of the first things I tried. I have a full list of banned bots, but that one keeps escaping me.
BOL - no. Don't ask me why, but it doesn't work either.
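A possible explanation for the failure: `\s` is a Perl/PCRE extension, and older Apache 1.3 builds use the system's POSIX regex library, which likely doesn't support it. Under a Perl-style engine the pattern as posted does match (checked here with Python's `re`, which, like PCRE, supports `\s`; the extra backslashes before `/` and `-` are harmless):

```python
import re

ua = "linko/0.1 libwww-perl/5.65"

# BOL's pattern exactly as posted: \s matches the space in a PCRE-style engine,
# and escaping the punctuation characters / and - is unnecessary but legal.
print(bool(re.search(r"^linko\/0\.1\slibwww\-perl\/5\.65", ua)))  # True
```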

brotherhood of LAN

3:05 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I tried this one...got a 403

RewriteCond %{HTTP_USER_AGENT} ^linko/0\.1.libwww-perl/5\.65

Usually in PHP I use \s to match a space; here I just replaced it with the regex metacharacter dot.
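Swapping the space for a dot works because `.` matches any character, the space included. A quick check (again using Python's `re` as a stand-in for Apache's engine):

```python
import re

ua = "linko/0.1 libwww-perl/5.65"

# The metacharacter '.' matches any character, so it matches the space too:
print(bool(re.search(r"^linko/0\.1.libwww-perl/5\.65", ua)))  # True

# A literal space in the pattern is more explicit and matches just as well:
print(bool(re.search(r"^linko/0\.1 libwww-perl/5\.65", ua)))  # True
```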

jdMorgan

3:14 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The "minimalist" approach would be something like this:

RewriteCond %{HTTP_USER_AGENT} ^linko/0\.1\ libwww-perl/5\.65$
RewriteRule .* - [F]

However, that won't work as soon as the revision levels get bumped, so to future-proof it, I'd use:

RewriteCond %{HTTP_USER_AGENT} ^linko/[^\ ]*\ libwww-perl/
RewriteRule .* - [F]

(Note the omitted end-anchor)
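The future-proofed pattern can be exercised against a few hypothetical version bumps (the UA strings below other than the original are made up for illustration; Python's `re` stands in for Apache's engine, and the plain space replaces the backslash-escaped space that .htaccess syntax requires):

```python
import re

# jdMorgan's future-proofed pattern: any linko version, any libwww-perl version.
pattern = re.compile(r"^linko/[^ ]* libwww-perl/")

# Matches the current bot and any future revision bump:
for ua in ("linko/0.1 libwww-perl/5.65",
           "linko/0.2 libwww-perl/5.79",
           "linko/1.0 libwww-perl/6.02"):
    assert pattern.search(ua)

# But an agent that merely mentions libwww-perl elsewhere is not caught,
# because of the ^linko/ start anchor:
assert not pattern.search("Mozilla/5.0 (compatible; libwww-perl/5.65)")
```

Note that in the .htaccess version the spaces are written as `\ ` only because an unescaped space would end the RewriteCond argument; the regex itself just matches a literal space.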

You might also have a custom 403 page, and any User-agent should be allowed to read robots.txt if you have an entry that covers that UA, so let's not block in those cases:


RewriteCond %{HTTP_USER_AGENT} ^linko/[^\ ]*\ libwww-perl/
RewriteRule !^(custom403\.html|robots\.txt) - [F]

The custom 403 case can be important: if you have a custom 403 page and don't provide for access to it in your blocking rules, you can get into a server loop, because the custom 403 page itself is forbidden!

Jim

heini

3:41 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, thanks both, works like a charm.

Now, to improve my understanding: what if I wanted to stop just "linko"?
That's what I tried first, as I can't imagine a legitimate use of that string in a UA.

Okay, just answered my question:
RewriteCond %{HTTP_USER_AGENT} ^linko/ [OR]

Thanks again.

Birdman

3:48 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>what if I would try to stop just "linko"?

This worked for me, heini:

RewriteCond %{HTTP_USER_AGENT} ^linko.*

jdMorgan

3:52 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Birdman,

There is no need to add ".*" to the end, since there is no "$" end-anchor.

"^linko/ [OR]" is equivalent to "^linko/.* [OR]" in every way, except that it is processed faster.
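The equivalence is easy to confirm: without a `$` end-anchor, a trailing `.*` changes nothing about which strings match. A quick check over a handful of sample UA strings (Python's `re` standing in for Apache's engine):

```python
import re

uas = ["linko/0.1 libwww-perl/5.65",  # the bot itself
       "linko/2.0",                   # hypothetical future version
       "not-linko/0.1",               # ^ anchor keeps this from matching
       "Mozilla/4.0"]                 # ordinary browser

with_star    = [bool(re.search(r"^linko/.*", ua)) for ua in uas]
without_star = [bool(re.search(r"^linko/",   ua)) for ua in uas]

# Identical results on every input; the trailing .* is pure overhead:
print(with_star == without_star)  # True
```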

Jim

Birdman

4:35 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks jd, I know I should stay out of these regex threads, I just can't help myself ;) Maybe one day I'll actually get one right :)

Thanks again

jdMorgan

5:06 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Birdman,

No, keep at it - there are a lot of these posts and more hands make for easier work!

Thanks,
Jim