For the last two months, our website hit report has listed the second-largest number of hits as coming from
66-194-6-78.gen.twtelecom.net.
I think it is Time Warner, so is this coming from AOL?
Any help appreciated.
Your hits are coming from the company, but not from the ISP. As whois shows:
OrgName: Time Warner Telecom
OrgID: TWTC
Address: 10475 Park Meadows Drive
City: Littleton
StateProv: CO
PostalCode: 80124
Country: US
66.***.6.73 - - [06/Dec/2003:12:04:23 -0500] "GET / HTTP/1.1" 403 916 "-" "Mozilla/5.0 (compatible; Konqueror/3.1-rc4; i686 Linux; 20020810)"
66.***.6.70 - - [06/Dec/2003:12:26:28 -0500] "GET /index.html HTTP/1.1" 403 916 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312469)"
Plonk.
Jim
That's a new IP address range for an old pest that typically starts out with a User-Agent of "Konqueror" and then switches to "MSIE" on a nearby IP address if that attempt doesn't work. Both user-agents are spoofs.
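For anyone who wants to spot this switching pattern in their own logs, here is a minimal sketch in Python. It assumes Apache combined log format like the lines quoted above, and it groups user-agents by the first three octets of the client address. The function names are made up for illustration, and the grouping is only a hint: plenty of legitimate visitors share an address block and use different browsers, so treat the result as a list of ranges to look at by hand.

```python
import re
from collections import defaultdict

# Matches one combined-log-format line; captures the client address and user-agent.
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"$'
)

def agents_by_prefix(lines):
    """Map each address prefix (first three octets) to the set of user-agents seen."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        ip, agent = match.groups()
        prefix = ip.rsplit(".", 1)[0]  # drop the last octet
        seen[prefix].add(agent)
    return seen

def suspicious_prefixes(lines):
    """Return prefixes where more than one distinct user-agent showed up."""
    grouped = agents_by_prefix(lines)
    return sorted(p for p, agents in grouped.items() if len(agents) > 1)
```

Feeding it the lines of an access log (for example, `suspicious_prefixes(open("access.log"))`, where the file name is just a placeholder) gives the prefixes worth a closer look.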
Why would anyone want to spoof that? I've heard reports of people doing it, but I never really understood why anyone would even bother.
Some webmasters ban harvesters by user-agent name instead of by individual IP addresses. So, spoofing is an attempt to look like a human using a legitimate browser in order to bypass user-agent blocking.
However, if the harvesting company switches user-agents, you have to block 'em by individual IP address or IP address range.
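The two kinds of check described above can be sketched together in a few lines of Python. This is only an illustration of the idea, not anyone's actual server configuration: the `66.194.6.0/24` range is a guess based on the hostname earlier in the thread, and `exampleharvester` is a made-up user-agent name.

```python
import ipaddress

# Hypothetical block lists, for illustration only.
BLOCKED_NETWORKS = [ipaddress.ip_network("66.194.6.0/24")]  # assumed range
BLOCKED_AGENT_SUBSTRINGS = ["exampleharvester"]             # made-up name

def is_blocked(remote_ip, user_agent):
    """Return True if the request should be answered with a 403."""
    addr = ipaddress.ip_address(remote_ip)
    # The address check catches a harvester no matter which user-agent it claims.
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # The user-agent check only catches harvesters that identify themselves.
    agent = user_agent.lower()
    return any(name in agent for name in BLOCKED_AGENT_SUBSTRINGS)
```

The address test comes first because it is the one a spoofed user-agent cannot get around.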
sgkohler,
I haven't looked up Sqworm recently, but it's blocked from accessing my sites. And as stated previously, I don't allow access from that 66.194. IP address range, either.
The main point of this is to prevent "expeditionary incursions," where a harvester will come in and grab a few pages. It will then build a site map of your site and come back to download the whole thing, looking for e-mail addresses. Whether it finds any or not, it's a huge waste of bandwidth and clutters up the log files.
So, among others, the following types of visitors are not welcome around here:
All of these 'services' have several things in common: They feel free to download your entire site without regard to robots.txt restrictions, and often use a false or variable user-agent name to escape detection. Then they gather data that is sold to competitors, to spammers, or to firms looking to make a buck by offering to sue you for mentioning someone's brand name. In most cases, the harvesters charge someone a fee to do this. So they try to sneak in, steal your bandwidth, costing you money, slow your server down, costing you legitimate visitors, and then make money off your losses. I don't think much of that, so I just serve them nice, short 403-Forbidden pages. Maybe I'm just cranky.
Jim