Forum Moderators: DixonJones

Tracking a repetitive hit

         

spayne

5:42 pm on Dec 22, 2003 (gmt 0)



I'm new at this.

Our website hit report has listed 66-194-6-78.gen.twtelecom.net as the
second-largest source of hits for the last two months.

I think it is Time Warner, so is this coming from AOL?

Any help appreciated.

WebJoe

6:24 pm on Dec 22, 2003 (gmt 0)

10+ Year Member



Welcome to WebmasterWorld [webmasterworld.com], spayne

Your hits are coming from the company, but not from the ISP, as whois shows:


OrgName: Time Warner Telecom
OrgID: TWTC
Address: 10475 Park Meadows Drive
City: Littleton
StateProv: CO
PostalCode: 80124
Country: US
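
The hostname in the report appears to embed the visitor's IP address as four dash-separated numbers, which is the form a whois lookup needs. A minimal Python sketch for recovering it, assuming that naming pattern holds (the function name `ip_from_hostname` is made up for illustration):

```python
import re
from typing import Optional

# Assumed pattern: four dash-separated numbers at the start of the hostname.
_DASHED_IP = re.compile(r"^(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def ip_from_hostname(hostname: str) -> Optional[str]:
    """Return the dotted IP embedded in a dashed hostname, or None."""
    match = _DASHED_IP.match(hostname)
    if match is None:
        return None
    octets = match.groups()
    # Reject anything that cannot be a valid IPv4 octet.
    if any(int(octet) > 255 for octet in octets):
        return None
    return ".".join(octets)


print(ip_from_hostname("66-194-6-78.gen.twtelecom.net"))  # 66.194.6.78
```

Not every provider names its hosts this way, so treat a `None` result as "look the address up in the raw log instead".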

Elijah

6:47 pm on Dec 22, 2003 (gmt 0)

10+ Year Member



My homepage seems to get visited by 66-194-6-78.gen.twtelecom.net every day. Does anyone know what it is? Is it some sort of Time Warner spider or something?

Thanks,

Elijah

WebJoe

5:55 am on Dec 24, 2003 (gmt 0)

10+ Year Member



I don't think anyone can tell what it is just from the address it's coming from. What user-agent string comes with it? Does it read and obey robots.txt? Does it go for all files, or just the HTML, leaving pictures out? Answers to questions like these might help determine an answer to yours.
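
Those questions can be answered from the raw access log. Here is a minimal Python sketch, assuming the server writes Apache's "combined" log format; the sample lines, the `192.0.2.x` addresses, and the `ExampleBot` user agent are made up for illustration, and `summarize` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumed list of image extensions; adjust to the site.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def summarize(lines):
    """Per IP: user agents seen, robots.txt fetched?, page and image counts."""
    summary = defaultdict(
        lambda: {"agents": set(), "robots_txt": False, "pages": 0, "images": 0}
    )
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        entry = summary[match.group("ip")]
        entry["agents"].add(match.group("agent"))
        path = match.group("path").split("?", 1)[0].lower()
        if path == "/robots.txt":
            entry["robots_txt"] = True
        elif path.endswith(IMAGE_SUFFIXES):
            entry["images"] += 1
        else:
            entry["pages"] += 1
    return dict(summary)


# Hypothetical sample data, not taken from any real log.
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2004:10:00:00 -0500] "GET /robots.txt HTTP/1.0" 200 120 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [01/Jan/2004:10:00:05 -0500] "GET /index.html HTTP/1.0" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.20 - - [01/Jan/2004:11:00:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.20 - - [01/Jan/2004:11:00:01 -0500] "GET /logo.gif HTTP/1.1" 200 2048 "/index.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

result = summarize(SAMPLE)
for ip, info in sorted(result.items()):
    print(ip, info["robots_txt"], info["pages"], info["images"])
```

A visitor that never asks for robots.txt and never loads images is not proof of a robot, but it is a useful hint when combined with the user-agent string.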

jdMorgan

6:30 am on Dec 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's a new IP address range for an old pest that typically starts out with a User-Agent of "Konqueror" and then switches to "MSIE" on a nearby IP address if that attempt doesn't work. Both user-agents are spoofs.

66.***.6.73 - - [06/Dec/2003:12:04:23 -0500] "GET / HTTP/1.1" 403 916 "-" "Mozilla/5.0 (compatible; Konqueror/3.1-rc4; i686 Linux; 20020810)"
66.***.6.70 - - [06/Dec/2003:12:26:28 -0500] "GET /index.html HTTP/1.1" 403 916 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312469)"

Plonk.

Jim

sgkohler

3:30 pm on Dec 26, 2003 (gmt 0)

10+ Year Member



Hello - another "newbie" here~
I have also noticed the same thing happening at my website! Here are my log entries, if anyone cares to decipher them for me, please :-)
66.194.6.2 - - [23/Dec/2003:05:58:07 -0500] "GET /entry/entry.html HTTP/1.0" 404 17547 "-" "Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux-gnu)"
66.194.6.2 - - [23/Dec/2003:05:58:14 -0500] "GET /kitchen/kitchen.html HTTP/1.0" 404 17543 "-" "Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux-gnu)"
66.194.6.2 - - [23/Dec/2003:05:58:20 -0500] "GET /masterbath/masterbath.html HTTP/1.0" 404 17547 "-" "Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux-gnu)"
66.194.6.2 - - [23/Dec/2003:05:58:22 -0500] "GET /outlines/outlines.html HTTP/1.0" 404 17539 "-" "Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux-gnu)"
66.194.6.72 - - [23/Dec/2003:06:06:18 -0500] "GET /toddsarahkohler/familyroom/familyroom.html HTTP/1.1" 200 21591 "-" "Mozilla/5.0 (compatible; Konqueror/3.1-rc6; i686 Linux; 20020126)"
66.194.6.72 - - [23/Dec/2003:18:45:14 -0500] "GET /toddsarahkohler/index.html HTTP/1.1" 200 14367 "-" "Mozilla/5.0 (compatible; Konqueror/3.0-rc2; i686 Linux; 20021024)"
66.194.6.74 - - [23/Dec/2003:16:53:53 -0500] "GET /craft/craftroom.html HTTP/1.1" 404 17388 "-" "Mozilla/5.0 (compatible; Konqueror/3.0-rc5; i686 Linux; 20021115)"
66.194.6.74 - - [23/Dec/2003:16:54:01 -0500] "GET /playroom/playroom.html HTTP/1.1" 404 17383 "-" "Mozilla/5.0 (compatible; Konqueror/3.0-rc3; i686 Linux; 20020923)"
66.194.6.74 - - [23/Dec/2003:16:54:20 -0500] "GET /laundry/laundry.html HTTP/1.1" 404 17384 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312468)"
66.194.6.74 - - [23/Dec/2003:16:59:47 -0500] "GET /guestroom1/guestroom1.html HTTP/1.1" 404 17386 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312466)"
66.194.6.74 - - [23/Dec/2003:16:59:51 -0500] "GET /guestbath/guestbath.html HTTP/1.1" 404 17383 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312467)"
66.194.6.74 - - [23/Dec/2003:20:48:46 -0500] "GET /outlines/outlines.html HTTP/1.1" 404 17386 "-" "Mozilla/5.0 (compatible; Konqueror/3.1-rc5; i686 Linux; 20020620)"
66.194.6.75 - - [23/Dec/2003:11:07:49 -0500] "GET /masterbath/masterbath.html HTTP/1.1" 404 17384 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312468)"
66.194.6.75 - - [23/Dec/2003:14:07:54 -0500] "GET /kitchen/kitchen.html HTTP/1.1" 404 17386 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312462)"
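
One way to read entries like these is to group them by network block and see which user-agent families turn up in each block. The Python sketch below does that for three of the lines above. Grouping by /24 is an assumption (the real allocation may be larger or smaller), and `agent_family` and `families_by_network` are hypothetical helper names.

```python
import ipaddress
import re
from collections import defaultdict

# Client IP at the start, user agent as the last quoted field.
LINE = re.compile(r'^(?P<ip>\d+\.\d+\.\d+\.\d+) .* "(?P<agent>[^"]*)"$')


def agent_family(agent: str) -> str:
    """Reduce a full user-agent string to a coarse family name."""
    for name in ("Sqworm", "Konqueror", "MSIE"):
        if name in agent:
            return name
    return "other"


def families_by_network(lines, prefix_len=24):
    """Map each network block to the set of agent families seen from it."""
    groups = defaultdict(set)
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue
        try:
            network = ipaddress.ip_network(
                f"{match.group('ip')}/{prefix_len}", strict=False
            )
        except ValueError:
            continue  # not a valid IPv4 address
        groups[str(network)].add(agent_family(match.group("agent")))
    return dict(groups)


# Three of the log lines quoted above.
SAMPLE = [
    '66.194.6.2 - - [23/Dec/2003:05:58:07 -0500] "GET /entry/entry.html HTTP/1.0" 404 17547 "-" "Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux-gnu)"',
    '66.194.6.72 - - [23/Dec/2003:06:06:18 -0500] "GET /toddsarahkohler/familyroom/familyroom.html HTTP/1.1" 200 21591 "-" "Mozilla/5.0 (compatible; Konqueror/3.1-rc6; i686 Linux; 20020126)"',
    '66.194.6.74 - - [23/Dec/2003:16:54:20 -0500] "GET /laundry/laundry.html HTTP/1.1" 404 17384 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312468)"',
]

for network, families in families_by_network(SAMPLE).items():
    print(network, sorted(families))
```

Several different "browsers" arriving from one small block, none of them sending a referrer, is the kind of pattern worth asking about.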

panic

6:43 pm on Dec 26, 2003 (gmt 0)

10+ Year Member



That's a new IP address range for an old pest that typically starts out with a User-Agent of "Konqueror" and then switches to "MSIE" on a nearby IP address if that attempt doesn't work. Both user-agents are spoofs.

Why would anyone want to spoof that? I've heard reports of people doing it, but I never really understood why anyone would even bother.

jdMorgan

7:35 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



panic,

Some webmasters ban harvesters by user-agent name instead of by individual IP addresses. So, spoofing is an attempt to look like a human using a legitimate browser in order to bypass user-agent blocking.

However, if the harvesting company switches user-agents, you have to block 'em by individual IP address or IP address range.
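
The two approaches can be sketched together in a few lines of Python. This is only an illustration of the decision logic, not anyone's actual server configuration: the blocked user-agent list and the `66.194.6.0/24` range are assumptions chosen to match the log entries in this thread, and `response_status` is a hypothetical function name.

```python
import ipaddress

# Assumed block lists, for illustration only.
BLOCKED_AGENT_SUBSTRINGS = ("sqworm",)
BLOCKED_NETWORKS = (ipaddress.ip_network("66.194.6.0/24"),)


def response_status(ip: str, user_agent: str) -> int:
    """Return 403 for blocked visitors, 200 otherwise."""
    # First check: user-agent name. This is defeated by spoofing.
    agent = user_agent.lower()
    if any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS):
        return 403
    # Second check: IP address range. This still works when the agent is spoofed.
    address = ipaddress.ip_address(ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return 403
    return 200


print(response_status("192.0.2.1", "Sqworm/2.9.85-BETA"))
print(response_status("66.194.6.74", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
print(response_status("192.0.2.1", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

The second call shows why the range check matters: the user agent looks like an ordinary browser, so only the address gives the visitor away.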

sgkohler,

I haven't looked up Sqworm recently, but it's blocked from accessing my sites. And as stated previously, I don't allow access from that 66.194. IP address range, either.

The main point of this is to prevent "expeditionary incursions," where a harvester will come in and grab a few pages. It will then build a site map of your site and come back to download the whole thing, looking for e-mail addresses. Whether it finds any or not, it's a huge waste of bandwidth and clutters up the log files.

So, among others, the following types of visitors are not welcome around here:

  • E-mail address harvesters
  • Web site update detection services
  • Copyright infringement detectors
  • Brand name protection services
  • Competitive information harvesters

All of these 'services' have several things in common: They feel free to download your entire site without regard to robots.txt restrictions, and often use a false or variable user-agent name to escape detection. Then they gather data that is sold to competitors, to spammers, or to firms looking to make a buck by offering to sue you for mentioning someone's brand name. In most cases, the harvesters charge someone a fee to do this. So they try to sneak in, steal your bandwidth, costing you money, slow your server down, costing you legitimate visitors, and then make money off your losses. I don't think much of that, so I just serve them nice, short 403-Forbidden pages. Maybe I'm just cranky.

Jim

sgkohler

7:49 pm on Dec 26, 2003 (gmt 0)

10+ Year Member



Thank you for the info. Now my question is this: I took my site offline over a month ago, yet this "spoofer" continues to come every single day. What is the use of them doing that? Are they just using bandwidth, 'cause there isn't anything else there!?