twtelecom.net bot identifying as Konqueror

Doesn't read robots.txt

         

abates

10:51 pm on Aug 1, 2005 (gmt 0)

10+ Year Member



I noticed that my site was hit fairly regularly last month by IPs in the range 66.194.6.XX, with a user agent of "Mozilla/5.0 (compatible; Konqueror/3.0; i686 Linux; 20020222)" and variants. Sometimes I'd get sequential hits from the same IP but with different Konqueror versions in the UA.

The hits were all to HTML pages and not followed up by hits to stylesheets or images. 301 responses were not followed. I believe this to be a bot of some kind, but I have no idea what it's collecting pages for...
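
The pattern described above (requests from the 66.194.6.XX range that never fetch stylesheets or images, with several user agents per IP) could be checked against a log with something like the following. This is only a rough sketch: the (ip, path, user_agent) record layout, the list of asset extensions, and the function name are assumptions for illustration and would need adapting to a real log format.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# 66.194.6.XX from the post, written as a /24 network.
SUSPECT_NET = ip_network("66.194.6.0/24")

# Assumed list of asset extensions a normal browser would usually fetch.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".ico")


def find_html_only_clients(records, network=SUSPECT_NET):
    """Return {ip: sorted user agents} for IPs inside `network` that never
    requested an asset and sent more than one distinct user agent.

    `records` is an iterable of (ip, path, user_agent) tuples; this
    layout is hypothetical, not taken from any particular server.
    """
    agents = defaultdict(set)
    fetched_asset = set()
    for ip, path, user_agent in records:
        if ip_address(ip) not in network:
            continue
        agents[ip].add(user_agent)
        # Ignore any query string when looking at the file extension.
        if path.lower().split("?", 1)[0].endswith(ASSET_EXTENSIONS):
            fetched_asset.add(ip)
    return {
        ip: sorted(uas)
        for ip, uas in agents.items()
        if ip not in fetched_asset and len(uas) > 1
    }
```

An IP that shows up in the result matches the behaviour described, but that alone doesn't say who is behind it or why.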

[edited by: volatilegx at 1:37 am (utc) on Aug. 2, 2005]
[edit reason] removed specifics [/edit]

wilderness

3:31 am on Aug 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Websense.

Not sure how long I've had the entire Class D denied.

[google.com...]

Using the search method that balam provided in Msg#10 of this thread:
[webmasterworld.com...]

abates

4:13 am on Aug 2, 2005 (gmt 0)

10+ Year Member



Aha, thanks, I'll ban away.

jdMorgan

7:02 pm on Aug 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That IP address belongs to a fairly well-known web filtering company. If you block them, it is possible that they will add your site to their blacklist.

I'm not sure how many customers they have, but it might be a considerable number. Then the impact would depend on whether the people using that filtering service fall into the demographic that your site addresses.

So, blocking them might be bad for your site, or it might not matter at all. The point is that every Webmaster should research an IP range in depth before blocking it, and determine the impact on their own site.

Jim

abates

9:23 pm on Aug 11, 2005 (gmt 0)

10+ Year Member



If they're running a legitimate bot, it should identify as such, and not hide behind randomised user agents. Hiding in such a way makes it appear as though they're up to something shady. I will look into this further...

jdMorgan

9:13 pm on Aug 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They're not running a bot at all. These services operate in two ways:

In some cases, they proxy all user requests, check the content, and either pass it through to the user or block it. This is fine for small services with a limited number of users where the bandwidth won't be too high.

In other cases, like this one, they function asynchronously to the user requests. They track users' requests, investigate the URLs the user requests, and then either whitelist or blacklist the sites. So, if you block the user-agent, you run the risk of blacklisting your own site.

In most cases, these services are used by corporate clients. But there are a few ISPs who offer this 'filtering' as a service as well.

I agree that the changing user-agents look dodgy, but in fact this is necessary to prevent 'bad' sites from cloaking by user-agent -- supplying innocent-looking pages to the filter, while providing their real content to users.

So, I advise caution before 'banning away,' especially if you see a lot of these requests. Think about your site's demographics, and whether it's likely that many of your visitors may be behind corporate proxies and filters, or whether your site might attract a group of people who would be likely to want to filter pages for their own or their children's use.
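
One way to put a number on "a lot of these requests" before deciding is to measure what share of logged requests comes from the range in question. The snippet below is a minimal sketch under assumptions: the function name is made up, the input is simply a list of client IP strings pulled from a log, and 66.194.6.0/24 is the 66.194.6.XX range from the first post written as a network.

```python
from ipaddress import ip_address, ip_network


def share_from_network(ips, network="66.194.6.0/24"):
    """Return the fraction (0.0 to 1.0) of requests whose client IP
    falls inside `network`. `ips` is any iterable of IP strings."""
    net = ip_network(network)
    ips = list(ips)
    if not ips:
        return 0.0
    hits = sum(1 for ip in ips if ip_address(ip) in net)
    return hits / len(ips)
```

This only counts the checker's own requests. It says nothing about how many ordinary visitors sit behind the filter, which is the number that matters for the decision.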

Jim

DanA

9:21 pm on Aug 14, 2005 (gmt 0)

10+ Year Member



I wouldn't ban them, but they use not only a fake Konqueror UA but also an MSIE 6 user agent.
In my case, I get more hits from their robot or link checker than from their users asking for a specific page.