Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
RobotPal
lucy24




msg:4390104
 2:35 am on Nov 23, 2011 (gmt 0)

PeoplePal is apparently a legitimate add-on. But these folks are not my pals:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)

I just got hit by a blizzard of auto-referers,* all from different IPs. (I'll look them up in case some turn out to be from regions I don't mind blocking on general principle, but overall the IP route is not going to cut it.) Took up a huge slab of log space, but on closer inspection it all happened within a single span of less than ten minutes. Another batch with different UA came by later in the day.

Made me so riled, I'm now off to fiddle with htaccess. Auto-referer, meet auto-redirect. No more Mr. "I don't like your face" Nice Guy; they're all going to 127.0.0.1 from here on.

Final quirk: They all home in on my second-fattest page. The #1 fattest is in the e-books directory, which for some reason tends to frighten robots. Neither of the two serves any purpose except to screw up my Keyword count. Same goes for old familiar robots like my Ukrainians. There's got to be something in the robotic algorithm that tells them to go for weight.


* I don't know if there's an official term. I mean the requests that give the requested page as its own referer.

 

Pfui




msg:4390126
 4:36 am on Nov 23, 2011 (gmt 0)

1.) Sounds like a botnet. You can chase 'em forever but it's pretty futile because they're programmed shape-changers.

What to do? Most of the time the UA's fake -- [projecthoneypot.org...] -- so block by UA with caution.

Instead, run the worst of the IPs through Project Honey Pot and see if they're compromised; chances are they will be. Then note the threat levels: >30-40 means you'll likely see them again. Block by IP/CIDR/country accordingly.

2.) When you describe things, it would really help if you include log excerpts because descriptions can be tough to decipher.

3.) FWIW, this self-referrer is fake --

http://www.yourdomainhere.com

-- and thus blockworthy as-is. Is that what you saw?
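
The "block by IP/CIDR" step in point 1 could be sketched roughly like this. It is only an illustration: the ranges in `BLOCKED_NETWORKS` are documentation placeholders, not ranges anyone in this thread recommends blocking, and `is_blocked` is a made-up helper name.

```python
import ipaddress

# Hypothetical block list. These are reserved documentation ranges used as
# placeholders; substitute the CIDR ranges you actually decide to block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_blocked(ip_text, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked CIDR range."""
    try:
        address = ipaddress.ip_address(ip_text)
    except ValueError:
        # A malformed address cannot match any range.
        return False
    return any(address in network for network in networks)
```

Checking whole ranges rather than single addresses is what keeps the list manageable when the same network keeps coming back from neighbouring IPs.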

wilderness




msg:4390140
 5:44 am on Nov 23, 2011 (gmt 0)

FunWebProducts is the key here.

It's been a compromised tb for some years.

lucy24




msg:4390149
 6:55 am on Nov 23, 2011 (gmt 0)

I looked up the IPs. They came from all over the place, but the batch included about half a dozen previously unsuspected China ranges, so I blocked those on general principle.

2.) When you describe things, it would really help if you include log excerpts because descriptions can be tough to decipher.

:-p

Like this. (We don't have to protect robots' identities do we?)

119.167.225.1 - - [22/Nov/2011:08:54:46 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.0" 200 998 "http://www.example.com/fun/AlonzoMelissa.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)"
94.199.182.46 - - [22/Nov/2011:08:54:54 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.1" 200 1035 "{exactly the same}"
189.16.82.34 - - [22/Nov/2011:08:55:03 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.0" 200 1035 "{ditto}"
79.143.182.253 - - [22/Nov/2011:08:55:14 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.1" 200 1035 "{ditto}"
79.143.182.253 - - [22/Nov/2011:08:55:15 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.1" 200 1035 "{ditto}"
114.215.28.125 - - [22/Nov/2011:08:55:50 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.1" 200 1035 "{ditto}"
50.56.84.106 - - [22/Nov/2011:08:56:01 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.0" 200 794 "{ditto}"
94.199.182.46 - - [22/Nov/2011:08:56:24 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.1" 200 1035 "{ditto}"


The size of the file they're aiming for is in six digits. What you're seeing here is the "I don't like your face" page that they got rewritten to. I don't understand why it doesn't come out exactly the same size every time. HTTP 1.1 is always a little bigger, but HTTP 1.0 alone generates five or six different sizes. The file itself is 716 bytes.
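
For anyone who wants to line the sizes up against the protocol version, here is a minimal sketch that pulls both fields out of log lines like the ones above. It assumes the usual combined log layout; `LOG_PATTERN` and `sizes_by_protocol` are illustrative names, and the sample lines are trimmed copies of three of the entries above.

```python
import re
from collections import defaultdict

# Leading fields of a combined-format line:
# ip ident user [time] "METHOD path PROTOCOL" status bytes ...
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def sizes_by_protocol(lines):
    """Collect the distinct logged response sizes for each protocol version."""
    sizes = defaultdict(set)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None or match.group("size") == "-":
            continue  # skip lines that do not parse or carry no size
        sizes[match.group("protocol")].add(int(match.group("size")))
    return {protocol: sorted(values) for protocol, values in sizes.items()}


SAMPLE = [
    '119.167.225.1 - - [22/Nov/2011:08:54:46 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.0" 200 998',
    '94.199.182.46 - - [22/Nov/2011:08:54:54 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.1" 200 1035',
    '50.56.84.106 - - [22/Nov/2011:08:56:01 -0800] "GET /fun/AlonzoMelissa.html HTTP/1.0" 200 794',
]

print(sizes_by_protocol(SAMPLE))
```

This only groups what the log already says; it does not explain why the sizes differ.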

3.) FWIW, this self-referrer is fake --

http://www.yourdomainhere.com

-- and thus blockworthy as-is. Is that what you saw?

Yup. I don't have a global block because they're not that common overall. I check this specific file, and also the ebooks directory. The current batch of robots would have been rewritten to the same page anyway because I've got a separate routine for MSIE [56]. But the auto-referer rewrite-- which has now been replaced by a redirect-- comes earlier. The rare human might show up using MSIE 5, so I can't send them off to contemplate their navels. But no human is ever going to give the requested page as its own referer.
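
The ordering described above (self-referer check first on the watched paths, then the separate MSIE 5/6 routine) might look something like this as plain logic. It is a sketch under assumptions, not the actual htaccess: the `/ebooks/` prefix is a guess at the directory's path, and `classify` and the returned labels are invented for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed names: the watched file is taken from the log excerpt, while the
# "/ebooks/" prefix is a guess at how the e-books directory is addressed.
WATCHED_FILES = {"/fun/AlonzoMelissa.html"}
WATCHED_PREFIXES = ("/ebooks/",)
OLD_MSIE = re.compile(r"MSIE [56]\.")


def is_self_referer(host, path, referer):
    """True when the referer names the very page being requested."""
    if not referer or referer == "-":
        return False
    parts = urlsplit(referer)
    return parts.netloc.lower() == host.lower() and parts.path == path


def classify(host, path, referer, user_agent):
    """Return 'redirect', 'rewrite' or 'pass' for one request."""
    watched = path in WATCHED_FILES or path.startswith(WATCHED_PREFIXES)
    # The self-referer rule comes first and only applies to watched paths.
    if watched and is_self_referer(host, path, referer):
        return "redirect"
    # The MSIE 5/6 routine runs afterwards and gets the gentler rewrite.
    if OLD_MSIE.search(user_agent):
        return "rewrite"
    return "pass"
```

Keeping the self-referer rule scoped to a few paths avoids testing every single request, at the cost of missing the same trick on other pages.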

FunWebProducts is the key here.

It's been a compromised tb for some years.

Well, that's nice to know, because I always thought it was a stupid name. But a quick riffle through logs tells me there are still humans using it, so I can't take the easy way out.

Anyone have any bright ideas about filesize? I honestly can't think of any other factor that would send robots flocking to this one little-visited page. It has always been unduly popular with the Wrong Kind of robot.

keyplyr




msg:4390163
 7:42 am on Nov 23, 2011 (gmt 0)

From a couple years ago: [webmasterworld.com...]

Pfui




msg:4390255
 1:27 pm on Nov 23, 2011 (gmt 0)

Thanks for the additional info. Thoughts...

1.) A malicious botnet is not merely a batch of robots. Infected machines, a.k.a. zombies, are remotely activated and orchestrated, like so many sleeper secret agents in spy stories.

2.) As suspected, a sampling of your visitors are long-compromised:

Hungary Szervernet - Threat 33 [projecthoneypot.org...]
Slicehost cloud-ips.com - Threat 16 [projecthoneypot.org...]
Giga-Hosting giga-dns.com - Threat 19 [projecthoneypot.org...]

(Aside: I don't always include full IPs in posts because I routinely see individuals' machines controlled right along with the big dogs, like those in server farms.)

3.) Botnets also do not function like 'regular' robots. Your page may have a word, phrase, title, e-mail address, something, anything that makes it hit-worthy to some master program. Or your page may have landed on some master hit list, reported by zombies as a likely target to vandalize. Or -- whatever.

4.) "not that common overall" ... Why wouldn't you put the kibosh on a proven botnet tell?

lucy24




msg:4390442
 8:31 pm on Nov 23, 2011 (gmt 0)

Why wouldn't you put the kibosh on a proven botnet tell?

I generally don't like rules that force the server to stop and investigate every single page request. And my informational niches are soooo narrow that I really don't want to risk locking out even one human by mistake.

otoh, I don't know why I don't block slicehost wherever I find them. Not much to choose between them and softlayer or limestone, which I think do get blocked on sight. I've got huge slabs of Hetzner locked out, even though not all their residents are malicious. I feel guilty locking out theplanet, because everything emanating from my corner of the country has to pass through Planet to reach the rest of the internet. (This is literally true. We are connected to the rest of the world via one physical cable that runs under a landslide-prone stretch of highway.)

It may simply depend on what mood I'm in when I find the IP range ;) And I'm more likely to slap global blocks on non-English-speaking, especially non-Roman-script-using, sources.

wilderness




msg:4390447
 8:39 pm on Nov 23, 2011 (gmt 0)

I generally don't like rules that force the server to stop and investigate every single page request. And my informational niches are soooo narrow that I really don't want to risk locking out even one human by mistake.


Over time, you'll realize that the potential solitary user does not offer enough benefit to offset the open-door vulnerabilities, nor will you ever be repaid (cash or otherwise) for your compassion.

Pfui




msg:4390469
 9:15 pm on Nov 23, 2011 (gmt 0)

@lucy24: I don't like slamming the door on real people either. So I set up a reCAPTCHA Mailhide [google.com...] link in my custom 403 so real people can reach me.

lucy24




msg:4390495
 10:47 pm on Nov 23, 2011 (gmt 0)

Ooh, lovely. That's just what I needed for the "I don't like your face" page, which is specifically meant for the ones I'm not sure about.

:: wandering off to play with popup window so the color scheme doesn't clash quite as glaringly with the rest of the site ::

Pfui




msg:4390549
 1:00 am on Nov 24, 2011 (gmt 0)

GMTA... The pop-up 'version' works like a charm:)

wilderness




msg:4418330
 1:41 pm on Feb 16, 2012 (gmt 0)

lucy,
Nothing like inactivity to get a person out-of-touch.

Previously I had a multi-conditional deny based upon
the "PeoplePal" UA and then specific IPs.

These were configured to thwart some pest active on an internet forum. (I had removed them in my dissecting, but I just added the custom setup back in.)
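
As a rough sketch of that kind of multi-conditional deny, reading it as "both conditions must hold": the UA has to contain "PeoplePal" and the IP has to be on a specific list. The addresses below are documentation placeholders, since the actual IPs are not given here, and `should_deny` is a made-up name.

```python
import ipaddress

# Hypothetical addresses from reserved documentation ranges; the real list
# is not part of this post.
DENIED_ADDRESSES = {
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("203.0.113.25"),
}


def should_deny(ip_text, user_agent):
    """Deny only when the UA mentions PeoplePal AND the IP is on the list."""
    if "PeoplePal" not in user_agent:
        return False
    try:
        return ipaddress.ip_address(ip_text) in DENIED_ADDRESSES
    except ValueError:
        # An unparseable address cannot be on the list.
        return False
```

Requiring both conditions leaves ordinary PeoplePal users on other addresses alone.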

Ten or more years ago, Ford Motor Co. offered employees and other affiliates computers at a very low price. In addition, with the purchase the buyer was given a basic dial-up connection for $5.00 a month.
All these computers were by PeoplePC and had the PeoplePal software. In fact, I have one of these computers, given by a friend, with a whopping 768CPU.

It's surprising that the machines are still being used, and/or that the affiliation has survived the past few years.
I looked at the PeoplePC websites and saw some notes about 2010, however nothing recent.
Guess if they weren't still in business, they wouldn't have a website ;)

Don

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved