Forum Moderators: open
Just a heads up.
Requested the same two specific pages, six times over fourteen minutes.
No robots.txt, no images.
Although most everybody already has that UA denied (in one form or another).
The subnet provider (which is the same company as the backbone) offers a specific explanation on document retrieval.
viw1219675461840484619yvmtlibwww-perl/5.801
kxd1219610210589019775qelflibwww-perl/5.801
wuw1219636009107147216fobklibwww-perl/5.801
(CPAN shows the current mod is v5.814, circa July 25, 2008.)
I don't know if that's simply a misconfigured script, or if the prefixed data is intentionally designed to break "^libwww-perl" rewrites, etc. FWIW
designed to break "^libwww-perl" rewrites
That's one reason I mostly don't use anchors: a plain "libwww-perl" match would zap any variation on that theme.
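The difference is easy to demonstrate. A minimal Python sketch, using one of the actual UA strings logged above, shows how the random prefix defeats an anchored pattern while an unanchored match still catches it:

```python
import re

# One of the UA strings from the logs above: random junk prefixed
# to the real libwww-perl identifier.
ua = "viw1219675461840484619yvmtlibwww-perl/5.801"

anchored = re.compile(r"^libwww-perl")   # the common anchored rewrite pattern
unanchored = re.compile(r"libwww-perl")  # matches anywhere in the string

print(bool(anchored.search(ua)))    # False: the prefix defeats the anchor
print(bool(unanchored.search(ua)))  # True: still caught
```

The same logic carries over to Apache rewrite rules: dropping the `^` from the pattern makes it a substring match rather than a prefix match.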
Then again, I whitelist, so any variation wouldn't pass the whitelist in the first place; it's limited to googlebot, slurp, msnbot, teoma, MSIE, Firefox and Opera.
Everything else goes away.
I also post-filter MSIE, Firefox and Opera for bad keywords like "crawl" or "download" or "http:" addresses and dump those as well.
The downside is mobile devices get whacked but the upside is I don't get many of them in the first place.
[edited by: incrediBILL at 7:46 pm (utc) on Aug. 25, 2008]
I let Safari, Mozilla, Netscape and Konqueror pass, but only because they satisfy my browser filter rules.
Why would anyone let Lynx in? It's usually used by tools that strip off the HTML just to scrape the text so it's kicked to the curb.
Anyone who installs junk that adds promotional HREFs to the UA gets the boot, so this month 800+ hits from one browser plug-in with its promotional URL went straight into the trash ;)
[edited by: incrediBILL at 4:29 am (utc) on Aug. 26, 2008]