Forum Moderators: DixonJones
For example, these are common to almost all of the new sites: 66-194-6-67.gen.twtelecom.net. Change the last 2 digits of the IP to these: 68 71 72 73 74 75 76 77 78 79 80 81 83 and 84. There are at least these 15 gen.twtelecom.net IPs frequenting the new sites during the early stages of each website.
Strangely, those visits all consist of 1 or 2 hits spread out over the first few weeks online and the same small KB usage of 37.08 KB per visit.
Since the sites are new and usually get little in the way of other traffic, these visits really stand out in the logs. Why they occur is becoming somewhat disturbing and even intimidating to us.
Often, during the first week a site is online, almost all the visits are from these IPs: sometimes 7 out of 10 visits, or 4 out of 5, or perhaps the first 8 hits from anyone other than ourselves (spread over, say, 8 of the past 10 days in the 8-hit example).
These mysterious hits are not from us, as our own visits and IP are identified separately in the Hosts Report, and we do not use twtelecom anyway.
It appears that after the first week or two online those IPs are no longer obvious in the server host logs for some odd reason. That could be because the sites tend to pick up more traffic, so the twtelecom hits simply stand out less after a while.
Another oddity is that, from what I can tell, this happens with the majority of the new websites we put online but not all of them (some never get a hit from those 15 IPs), which also seems strange.
Does anyone know, or can anyone even guess, what this could be? We are worried that it is some kind of adware, virus or spybot, perhaps residing on our server. We do not have spyware or viruses on the local PC, which is scanned often for them, but as far as we know the colocation firm does not scan the colocated dedicated server, as we were always told a server cannot get a virus. Thanks.
How would they even know they exist? I see gen.twtelecom.net visits the same day (even within minutes) when a new site first starts resolving on the web, and long before they get any other traffic or the SE's index them.
In addition to all the visits to our newly created sites, today I see in my hosts logs that twtelecom is also visiting some of my old websites (1 site with a number of new visits has been online for 5 yrs) too with frequent visits from that same cluster of IP's mentioned in post 1.
This is very confusing and also quite disturbing. Anyone else comment on this?
On the two occasions (out of thousands and thousands and THOUSANDS of hits) a real person replied, each was using a work network. So I banged their specific IPs to let them in:
RewriteCond %{REMOTE_HOST} ^[^.]+\.gen\.twtelecom\.net$
RewriteCond %{REMOTE_HOST} !^specific-numeric-IP-1\.gen\.twtelecom\.net$
RewriteCond %{REMOTE_HOST} !^specific-numeric-IP-2\.gen\.twtelecom\.net$
RewriteRule ^.*$ [numeric.redirect.address...] [R,L]
(Apologies to mod_rewrite expert jdMorgan for possibly poor form but that works for me:)
Unfortunately, because Time Warner Telecom is so HUGE, it's not practical to killfile the whole 66.194.6.0-255 range (66.194.6.0/24) via our firewall. But at least whatever or whoever is behind the robotic hits is no longer getting at my most content-rich IPs.
I would love to find out who or what is behind an obviously huge, frequent, intrusive and tenacious effort that's usually unrelated, at least on my IPs, to real-time visitor activity.
Each webmaster has to decide whether the bandwidth Websense consumes is 'worth it' in regard to the number of visitors that Websense may block from your site if you block them or redirect them. Since there is no way to tell how many visitors you might lose, it's rather a toss-up.
---
Where possible, test %{REMOTE_ADDR} and IP addresses or ranges, rather than using %{REMOTE_HOST} and hostnames. Testing %{REMOTE_HOST} forces your server to request a reverse-DNS lookup for each HTTP request, negatively affecting your server's performance, and also introduces a performance and reliability dependency of your server upon the DNS server that it uses. No request to your server can proceed until after the reverse-DNS query is satisfied, and if the DNS server is slow or fails, then your server will also be slow or fail.
If you must test %{REMOTE_HOST}, then use any request characteristics you can think of to disqualify as many requests as possible from testing %{REMOTE_HOST}. For example, if you notice that these guys request pages and not images, add a RewriteCond to disqualify images. Combining both of these recommendations with the previously-posted example, we get something like this:
RewriteCond %{REMOTE_ADDR} !^specific-numeric-IP-1$
RewriteCond %{REMOTE_ADDR} !^specific-numeric-IP-2$
RewriteCond %{REMOTE_HOST} \.gen\.twtelecom\.net$
RewriteRule !\.(gif|jpg|png)$ http://numeric.redirect.address/index.html [R,L]
Jim
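To make the ordering in the post above concrete, here is a minimal Python sketch of the same decision logic, not an Apache configuration. It assumes the cheap tests (file extension, known addresses) run first and the reverse-DNS lookup runs only if they cannot decide. The names `should_redirect`, `ALLOWED_ADDRS` and `reverse_lookup` are made up for illustration, and the allow-listed addresses are placeholders.

```python
import re

# Hypothetical allow-list: addresses of known human visitors (placeholders).
ALLOWED_ADDRS = {"192.0.2.10", "192.0.2.11"}
IMAGE_RE = re.compile(r"\.(gif|jpg|png)$")
BLOCKED_HOST_RE = re.compile(r"\.gen\.twtelecom\.net$")


def should_redirect(path, remote_addr, reverse_lookup):
    """Return True if the request should be redirected.

    reverse_lookup is a callable that takes an address and returns a
    host name. It is only called when the cheaper tests cannot decide.
    """
    if IMAGE_RE.search(path):           # image requests are never redirected
        return False
    if remote_addr in ALLOWED_ADDRS:    # known people are let through
        return False
    host = reverse_lookup(remote_addr)  # the expensive step, done last
    return bool(BLOCKED_HOST_RE.search(host))
```

Passing the lookup in as a function keeps the sketch testable without touching a real DNS server; in practice it could wrap something like `socket.gethostbyaddr`.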
Second, sometimes I see visits from these IPs almost instantly after the site is created. There have been times when we registered a new domain, say, 1 hr ago (the new fast propagation can do that) and within 1 more hr our small temporary website already resolves on the web. Amazingly, there are immediate visits from one or more of those IPs.
That raises the obvious question as to how in the world would Websense know we reg'd the domain and also know we created a new site right away as they do?
BTW, I googled some of the IPs and the gen.twtelecom.net name and see lots of references to them, including many suspected spam or black hole reports, but I'm not sure what to make of it all as it is quite confusing.
Or maybe your domain registrar gets $0.05 for each new registration they tell WS about. Or maybe it's your hosting provider, or even your own ISP triggering WS to visit. It could be anyone.
Websense's services are purchased by corporations and by ISPs to filter content and provide 'security' -- That means when a user requests a page from your site, it may trigger an instant request from WS, with the requested content "passing through" WS to the client. Or it may trigger WS to queue a request for that page or even your whole site for an audit at a later time. It depends on whether they're providing real-time filtering or simple black- and white-listing.
Why you? Because you're only looking at your logs. I'll sticky you my logs too, if it'll make you feel any better, but this is happening to pretty much everybody. Websense thinks they're doing a stand-up job of keeping their filters current and their customers well-taken-care-of, and Webmasters typically feel they're a nuisance and a drain on bandwidth. Block 'em if you think it's a good idea -- or don't.
One of the sites I look after is somewhat controversial -- some people might like it and some might not, and those that don't like it might want to block it. So, I theorize that that is the reason that this one site gets more-frequent attention from WS than some of the others, but I see WS requests on all of the others as well.
Jim
On some sites I see as many as 50 to 100 visits from those IP's. In fact I noticed the more traffic the site gets the more visits from gen.twtelecom.net, whereas smaller sites normally have fewer visits.
The new sites with low traffic (or even no traffic) make their visits extremely obvious, sometimes accounting for almost all the visits to the new sites.
What continues to baffle me is why Websense would be so interested in my sites as I am a relatively small player as far as total traffic goes (even though I do have a lot of small websites)?
Why in the world would Websense be willing to pay anyone 5 cents (or whatever) to be notified that we put a new site online? What possible reason would there be for it to be worth their money and time?
Because they consider it their business to know what is on any site that any user of any ISP (or any employee of any company) that they sell their services to might request. As I stated above, it is likely that they fetch pages from a site for two reasons: first, because it comes up as a new domain, and second, because a user of one of their customers visits the site. If your ISP uses their services, for example, then your own visits to your web site would trigger a WS 'scan.'
You're not seeing anything that I haven't seen on new or small sites. Your description of their activity sounds entirely normal to me from what I've seen on my own new and small sites. I'd personally be much happier if they would reduce the number of their requests to my sites -- I think they're significantly over-doing it -- but I'm reluctant to ban them because I do not know how much of a customer base they represent. Other webmasters ban them without a second thought. Again, this is a decision that every webmaster has to make for himself or herself.
Jim
We are becoming quite concerned about this, so we have reported the IPs below to datashaping dot com / blacklist.shtml, whose website says they will investigate before the IPs are put on the black list.
66.194.6.67
66.194.6.68
66.194.6.71
66.194.6.72
66.194.6.73
66.194.6.74
66.194.6.75
66.194.6.76
66.194.6.77
66.194.6.78
66.194.6.79
66.194.6.80
66.194.6.81
66.194.6.83
66.194.6.84
123-45-678-9.gen.twtelecom.net
My server has "HostnameLookups on" so it's easier to work with a REMOTE_HOST in .htaccess rather than an ADDR. (Aside: Most of my visitors' hosts don't include a Canonical string and sometimes it's hard to figure those out, even with whois and/or DNS lookups. ANYway...)
1.) Is it even possible to rewrite .gen.twtelecom.net (or any) unwanted 'visitors' when the variable(s) are hyphenated, such that a range can be isolated? E.g. --
RewriteCond %{REMOTE_HOST}^123-45-678-[(0-255)]\.gen\.twtelecom\.net$
(I realize that won't work but neither would a wildcard.)
2.) Where available, is REMOTE_ADDR always favored over REMOTE_HOST when rewriting?
Thanks in advance for your reply.
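On question 1: a character class such as [(0-255)] matches only a single character, so a numeric range has to be spelled out digit by digit with alternation. Below is a small Python sketch that checks such patterns with the `re` module. The 67-84 range is only an assumption based on the addresses reported earlier in this thread, and the helper name `in_range` is made up. The same alternation syntax should also be usable in a RewriteCond pattern, but test that on your own server before relying on it.

```python
import re

# Final octet 67-84 spelled out digit by digit: 67-69, 70-79, 80-84.
HOST_RANGE = re.compile(r"^66-194-6-(6[7-9]|7[0-9]|8[0-4])\.gen\.twtelecom\.net$")
# The same range expressed against a dotted address instead of a host name.
ADDR_RANGE = re.compile(r"^66\.194\.6\.(6[7-9]|7[0-9]|8[0-4])$")

# Any final octet from 0 to 255, for matching the whole block by host name.
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
HOST_ANY = re.compile(r"^66-194-6-" + OCTET + r"\.gen\.twtelecom\.net$")


def in_range(value):
    """True if value is a host name or address inside the 67-84 range."""
    return bool(HOST_RANGE.match(value) or ADDR_RANGE.match(value))
```

With these patterns, 66-194-6-67.gen.twtelecom.net and 66.194.6.84 fall inside the range, while 66-194-6-12.gen.twtelecom.net does not.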
Almost immediately, visits somehow started on Nov 18 from these IPs, with only 3 visits from other places since then, apparently because the site is not listed in the SEs yet and has no traffic source.
What in the world is going on? I know Jim tried to explain it but I still fail to understand why Websense.com is so interested in so many of my sites, which are often very small and unimportant players like this one.
Reprinted below is the Hosts Report Nov/Dec. Visits arranged by my stats program in order of bandwidth used, not date order.
We see very similar hosts reports on many other sites we run. Any more opinions on this?
66-194-6-68.gen.twtelecom.net 15 Dec 2005 - 05:36
66-194-6-75.gen.twtelecom.net 14 Dec 2005 - 20:27
66-194-6-74.gen.twtelecom.net 09 Dec 2005 - 02:45
66-194-6-80.gen.twtelecom.net 05 Dec 2005 - 13:45
66-194-6-83.gen.twtelecom.net 16 Dec 2005 - 08:17
66-194-6-73.gen.twtelecom.net 02 Dec 2005 - 09:59
66-194-6-84.gen.twtelecom.net 10 Dec 2005 - 12:35
dex-252-16.dxi.net 1 1 14.83 15 Dec 2005 - 00:26
66-194-6-70.gen.twtelecom.net 13 Dec 2005 - 00.45
gen.twtelecom.net 27 Nov 2005 - 17:02
66-194-6-68.gen.twtelecom.net 24 Nov 2005 - 05:37
66-194-6-73.gen.twtelecom.net 25 Nov 2005 - 02:21
66-194-6-76.gen.twtelecom.net 28 Nov 2005 - 13:54
64.124.85.79.become.com Nov 2005 - 13:49
66-194-6-12.gen.twtelecom.net Nov 2005 - 18:53
66-194-6-71.gen.twtelecom.net Nov 2005 - 01:30
66-194-6-83.gen.twtelecom.net 26 Nov 2005 - 19:33
66-194-6-74.gen.twtelecom.net 20 Nov 2005 - 05:07
38.118.42.36 1 1 570 Bytes 21 Nov 2005 - 04:58
66-194-6-84.gen.twtelecom.net 25 Nov 2005 - 22:55
66-194-6-78.gen.twtelecom.net 18 Nov 2005 - 14:01
66-194-6-80.gen.twtelecom.net 30 Nov 2005 - 12:17
66-194-6-70.gen.twtelecom.net 19 Nov 2005.
P.S. Anyone know why the IP's use hyphens instead of dots?
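For anyone who wants a number rather than an impression, the share of visits coming from one host family can be tallied from a hosts report like the one above. This is a rough Python sketch; the function name `share_from_suffix` is made up, and the sample list is just the host column of the first eight rows of the report.

```python
def share_from_suffix(hosts, suffix=".gen.twtelecom.net"):
    """Return (matching, total) for host names under the given suffix."""
    bare = suffix.lstrip(".")
    # Count names that end with the suffix, plus the bare domain itself.
    matching = sum(1 for h in hosts if h.endswith(suffix) or h == bare)
    return matching, len(hosts)


# Host column of the first eight rows of the report above.
sample = [
    "66-194-6-68.gen.twtelecom.net",
    "66-194-6-75.gen.twtelecom.net",
    "66-194-6-74.gen.twtelecom.net",
    "66-194-6-80.gen.twtelecom.net",
    "66-194-6-83.gen.twtelecom.net",
    "66-194-6-73.gen.twtelecom.net",
    "66-194-6-84.gen.twtelecom.net",
    "dex-252-16.dxi.net",
]
matching, total = share_from_suffix(sample)
print(f"{matching} of {total} hosts are under gen.twtelecom.net")
```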
66-194-6-11.gen.twtelecom.net
66-194-6-12.gen.twtelecom.net
66-194-6-2.gen.twtelecom.net
66-194-6-68.gen.twtelecom.net
66-194-6-70.gen.twtelecom.net
66-194-6-71.gen.twtelecom.net
66-194-6-72.gen.twtelecom.net
66-194-6-73.gen.twtelecom.net
66-194-6-74.gen.twtelecom.net
66-194-6-75.gen.twtelecom.net
66-194-6-76.gen.twtelecom.net
66-194-6-77.gen.twtelecom.net
66-194-6-79.gen.twtelecom.net
66-194-6-80.gen.twtelecom.net
66-194-6-81.gen.twtelecom.net
66-194-6-83.gen.twtelecom.net
66-194-6-84.gen.twtelecom.net
(Listed quasi numerically by Host, courtesy of my stats program.)
Clearly these guys are seemingly everywhere, all the time. Google the following and you'll see:
"66-194-6-"
Thing is, none of us may ever know what's going on, or precisely who/what's behind the mystery, let alone why. But if you're looking for a solution of sorts, I guess what you opt to do will depend on how much the hits impact or annoy you.
Again, like you, I was seeing 'them' a LOT and they were clearly automated. Because they never asked for robots.txt, I decided to redirect them away. (Sure beats obsessing about them:) And as Jim already mentioned, others block them, and still others simply ignore them.
All the best with whichever option(s) you choose.
Did you notice the IP addresses use hyphens and not dots as you would expect?
It appears the numbers are NOT really IP's but are in fact sub-domain names. It seems odd that whoever is behind this wants to fool everyone into thinking they are IP addresses when they apparently are not?
Anyone else got some thoughts on all this?
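On the hyphens: what the logs show is the reverse-DNS host name of the visitor, not the address itself. Dots separate the labels of a host name, so providers commonly embed the address in the first label with hyphens instead. A minimal Python sketch for recovering the dotted address from such a name follows. The function name `embedded_address` is made up, and the result is only a hint, since a provider can name its hosts however it likes.

```python
import ipaddress
import re

# Four hyphen-separated numbers forming the first label of the host name.
HYPHENATED = re.compile(r"^(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def embedded_address(hostname):
    """Return the IPv4 address embedded in hostname, or None if there is none."""
    m = HYPHENATED.match(hostname)
    if not m:
        return None
    try:
        return ipaddress.IPv4Address(".".join(m.groups()))
    except ValueError:  # e.g. an octet above 255
        return None


addr = embedded_address("66-194-6-67.gen.twtelecom.net")
print(addr, addr in ipaddress.ip_network("66.194.6.0/24"))
```

Once the dotted address is known, a rule can test it directly instead of relying on the host name.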
This is normal.
Websense is an internet content filtering service. As such, they feel they need to know what is on *all* web sites.
They will scan all domains -- and possibly even all IP addresses -- as often as they feel necessary.
They don't care if you are Joe-Bob's Bait Shack or Amazon.com -- They want to know what's on your site.
Companies and ISPs pay them to block 'bad' sites, and they don't want to miss any.
In short, you're seeing a well-known company doing what they've historically always done to pretty much *all* sites on the Web, and there is no mystery at all. It's like posting that a very weird bright ball rises in the sky east of my house every day, and why is it shining on me, and how come it sneaks away and hides when I'm sleepy, and why does it hurt my eyes if I try to examine it carefully? Sorry, but that's why there's little interest in this subject, except as regards the decision to block them to limit bandwidth or to allow them so that they don't block your site from viewing by their clients or their clients' clients.
Clearly only one web segment can make an easy decision: If you are sure that a content filtering company would block your web site because of its content, then go ahead and block these guys to save some bandwidth. Otherwise, it's a business decision, trading bandwidth for wider exposure of your site.
If you'd like to do some research to make this thread valuable, try to find out (by searching) how many customers they have, and what number of end-users they claim to 'protect.' Maybe looking through public financial (e.g. Securities Exchange) filings or marketing material might turn up some of this info. If you can find out how 'big' they are, then that can serve as a guideline to decide how important it is to allow them to examine your site and be listed as "OK" in their database.
Jim
What confuses me more than anything is why they need to look at the sites so disturbingly frequently (especially my new and very small sites) to such a degree they become the #1 traffic source to so many of my newly established websites?
Perhaps a bright spot is that if I ever decide to sell the domain it appears to a buyer to have much more unique visitor traffic than it really does (of value).