

Search Engine Spider and User Agent Identification Forum

This 39 message thread spans 2 pages; this is page 1.
63.148.99.247
palmpal




msg:395853
 7:15 am on Jan 11, 2003 (gmt 0)

Is this a good bot to have visit?

Thanks!

 

Key_Master




msg:395854
 7:44 am on Jan 11, 2003 (gmt 0)

It belongs to Cyveillance.com

It's a spybot and you should ban it.

SetEnvIf Remote_Addr ^63\.148\.99\.(22[4-9]|2[3-5][0-9])$ ban
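
A pattern like this is easy to get subtly wrong, so it can be worth checking which addresses it really matches before deploying it. The sketch below is only an illustration: it uses Python's `re` module rather than Apache, writes the alternation as a plain `|`, and the function name is made up for the example. Keep in mind that the SetEnvIf line on its own only sets the `ban` variable; a separate deny rule keyed on that variable is still needed for the ban to take effect.

```python
import re

# Pattern from the SetEnvIf line above, with the alternation written as "|".
BAN_PATTERN = re.compile(r"^63\.148\.99\.(22[4-9]|2[3-5][0-9])$")

def banned_last_octets():
    """Return the last octets (0-255) of 63.148.99.x that the pattern matches."""
    return [n for n in range(256) if BAN_PATTERN.match(f"63.148.99.{n}")]

if __name__ == "__main__":
    octets = banned_last_octets()
    # Show the first and last matched octet and how many addresses are covered.
    print(octets[0], octets[-1], len(octets))
```

For valid octets the pattern covers 224 through 255, which includes the 63.148.99.247 address asked about.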

pendanticist




msg:395855
 9:55 am on Jan 11, 2003 (gmt 0)

It's a spybot and you should ban it.

I can't attest to the validity at the moment due to that IP Number timing out after running it thru CMD as a ping. SpamCop resolves it to Qwest. That's the best I can find.

Cyveillance's Unauthorized Distribution Solution helps companies actively prevent revenue leakage and brand dilution by identifying unauthorized partners and product distribution. By identifying sites diverting revenues and damaging reputations, Cyveillance arms companies with the intelligence they need to mitigate this common risk.

However, if it is indeed Cyveillance's bot, then I'd have to disagree with the assessment of this being a bot in need of banning. At least from what it says above, which is a quote from www.cyveillance.com/web/solutions/unauth_dist.htm

For a site whose content is 100% above board, being visited by a bot of this nature should be no problem.

Perhaps it's a perspective thing.

Pendanticist.

wilderness




msg:395856
 1:10 pm on Jan 11, 2003 (gmt 0)

<snip>Perhaps it's a perspective thing.>

pendanticist,
It has been discussed here often.
The decision is left to each webmaster to decide how their own resources are used.

Cyveillance collects a fee for its services while using your/my resources and bandwidth. Not compensating either of us in the process.
It's no different than somebody deep-linking to your images or duplicating your pages.
Not a hard concept to accept, no matter what your perspective is?
Unless you're an employee of Cyveillance or of a host of other such bots?

palmpal




msg:395857
 1:44 pm on Jan 11, 2003 (gmt 0)

My host has an option called IP filtering. Is this where I ban the IP? Do I need to also modify my robots.txt and how?

Thanks!

pendanticist




msg:395858
 2:09 pm on Jan 11, 2003 (gmt 0)

Cyveillance collects a fee for its services while using your/my resources and bandwidth. Not compensating either of us in the process.

Just so we don't get too far afield here. How is it that when I run the IP Number (the only data provided so far) thru SamSpade or SpamCop all I get is Qwest? I've not seen any kind of UA string posted, just an IP Number. See where I'm going here?

Looking at this thread chronologically, that's all I see: a question about an IP Number and a response urging banning.

It's no different than somebody deep-linking to your images or duplicating your pages.

I think that's the concept of these types of bots: to find those who steal or otherwise infringe upon one's intellectual property rights (et al.). Deep-linking is an entirely different issue.

Let's take a hypothetical: If your content was duplicated, how else would you expect to find the perpetrator (in a timely fashion), if not for using a bot? Wait for someone, or you to 'discover' that infringement? <-Rhetorical Questions.

Not a hard concept to accept, no matter what your perspective is?

I recognize that you have a differing opinion on this matter, yes.

Unless you're an employee of Cyveillance or of a host of other such bots?

Not sure if there's a question here, <shrug> but I'll address it anyway. For the record, not only am I unemployed, my domain doesn't even set cookies much less have anything to do with bots.

What I'm opposed to are blanket, categorically declarative statements that don't provide enough supportive information to allow folks to make up their own minds, other than to say: ban it based solely on an IP Number.

One might argue those who have the most to hide, will take the strongest measures to hide what they have.

Pendanticist.

wilderness




msg:395859
 2:25 pm on Jan 11, 2003 (gmt 0)

I despise tearing mails/responses apart in such a manner.
As a result I'll make these separate.

First I rarely use trace or ping. They don't provide enough information.
A Whois/Arin Search results in:
Qwest Communications NET-QWEST-BLKS-2 (NET-63-144-0-0-1)
63.144.0.0 - 63.151.255.255
Cyveillance QWEST-63-148-99-224 (NET-63-148-99-224-1)
63.148.99.224 - 63.148.99.255
end of quote.
Qwest is the backbone provider for Cy.
This has been provided here before in this forum repeatedly.
It can be found in the archive search.

So the Qwest "not sure" Cy connection is NOT valid.
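
The two ranges in the whois output above can also be checked mechanically. A minimal sketch using Python's standard `ipaddress` module follows; the helper and variable names are illustrative only. It collapses each first-last range into CIDR form and confirms that the address from the thread title sits inside the Cyveillance block, which is itself nested inside the larger Qwest allocation.

```python
import ipaddress

def range_to_networks(first, last):
    """Collapse an inclusive first-last address range into CIDR networks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))

# Ranges copied from the whois output quoted above.
qwest = range_to_networks("63.144.0.0", "63.151.255.255")
cyveillance = range_to_networks("63.148.99.224", "63.148.99.255")
visitor = ipaddress.ip_address("63.148.99.247")

if __name__ == "__main__":
    print(qwest, cyveillance)
    # Is the visiting address inside the Cyveillance block?
    print(visitor in cyveillance[0])
    # Is the Cyveillance block a sub-allocation of the Qwest block?
    print(cyveillance[0].subnet_of(qwest[0]))
```

This is why a ping or a lookup that stops at the outer allocation reports only Qwest: the Cyveillance assignment is the smaller block inside it.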

pendanticist




msg:395860
 2:29 pm on Jan 11, 2003 (gmt 0)

My host has an option called IP filtering. Is this where I ban the IP? Do I need to also modify my robots.txt and how?

I'd say you could use the IP filtering you mentioned, yes.

Keep in mind you can ban by IP in .htaccess too, if you have those capabilities. Individual host server services vary considerably.

As for robots.txt, I think you need bot names, not just IP Numbers.
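
To illustrate the point about robots.txt: its rules are keyed on user-agent names and have no field for IP addresses, and obeying them is voluntary on the bot's part. The sketch below uses Python's standard `urllib.robotparser`; the bot name `ExampleBot` and the rules themselves are hypothetical, chosen only to show how matching works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is shut out, everyone else
# is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, url="/index.html"):
    """Return True if robots.txt permits this user-agent to fetch url."""
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    print(allowed("ExampleBot"))
    print(allowed("Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"))
```

A bot that sends a generic browser user-agent string falls under the `*` rules, so robots.txt cannot single it out; that takes an IP-based ban on the server.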

In any case, here are a couple of good threads to get you started:

[webmasterworld.com...]

[webmasterworld.com...]

[webmasterworld.com...]

Pendanticist.

pendanticist




msg:395861
 2:33 pm on Jan 11, 2003 (gmt 0)

So the Qwest "not sure" Cy connection is NOT valid.

I stand better informed.

:)

Pendanticist.

wilderness




msg:395862
 2:33 pm on Jan 11, 2003 (gmt 0)

<snip1>I think that's the concept of these types of bots: to find those who steal or otherwise infringe upon one's intellectual property rights (et al.). Deep-linking is an entirely different issue.>

The concept of these bots is to use YOUR $$$ to be compensated by their customers for a service provided. It has nothing to do with the legitimacy of your content.

<snip2>Let's take a hypothetical: If your content was duplicated, how else would you expect to find the perpetrator (in a timely fashion), if not for using a bot? Wait for someone, or you to 'discover' that infringement? <-Rhetorical Questions.>

I surely wouldn't use services of CY. Instead I'd use my own method of accumulating data and evidence from within my logs. It's obvious to me that you are not as adept at this as I.

wilderness




msg:395863
 2:39 pm on Jan 11, 2003 (gmt 0)

<snip>Not sure if there's a question here, <shrug> but I'll address it anyway>

I was being sarcastic. The result of my perception of your viewpoint.

wilderness




msg:395864
 2:44 pm on Jan 11, 2003 (gmt 0)

<snip>What I'm opposed to are blanket, categorically declarative statements that don't provide enough supportive information to allow folks to make up their own minds, other than to say: ban it based solely on an IP Number.>

In the beginning when I first began denying IP ranges I would use the smallest ranges possible from that probe. The result (over and again) was a return by the bot with a major spidering of something that should have been resolved by denying a larger IP range initially.
EXPERIENCE and my kindness taught me otherwise. :-(

pendanticist




msg:395865
 2:45 pm on Jan 11, 2003 (gmt 0)

I surely wouldn't use services of CY. Instead I'd use my own method of accumulating data and evidence from within my logs.<snipo>

Your own methods from within your logs? Please elaborate: How would you know I've copied your content from your logs? What logs are you referring to?

<snipo>It's obvious to me that you are not as adept at this as I.

Be nice. :) Otherwise I might get the impression you're becoming condescending and I wouldn't want to think that.

I was being sarcastic. The result of my perception of your viewpoint.

LOL, you did it well, my friend. :)

Pendanticist.

pendanticist




msg:395866
 2:48 pm on Jan 11, 2003 (gmt 0)

In the beginning when I first began denying IP ranges I would use the smallest ranges possible from that probe. The result (over and again) was a return by the bot with a major spidering of something that should have been resolved by denying a larger IP range initially.
EXPERIENCE and my kindness taught me otherwise. :-(

Interesting concept, lower ranges I mean. Never thought of doing it that way.

Pendanticist.

wilderness




msg:395867
 2:48 pm on Jan 11, 2003 (gmt 0)

<snip>One might argue those who have the most to hide, will take the strongest measures to hide what they have.>

hide/protect?

The perception depends entirely on what side of the fence you're standing on and the purpose and/or audience of your website!

pendanticist




msg:395868
 2:57 pm on Jan 11, 2003 (gmt 0)

<snip>One might argue those who have the most to hide, will take the strongest measures to hide what they have.>

hide/protect?

Sure. If a site hasn't pilfered anything, that site has nothing to hide, nor to protect from discovery. Such to say, If I had pilfered material residing on my site, I'd take measures to ban the bots that are out there seeking such pilferages too. Then again, noindex, nofollow might be a better way to go, eh?

The perception depends entirely on what side of the fence you're standing on and the purpose and/or audience of your website!

That's exactly what I'm getting at.

Pendanticist.

wilderness




msg:395869
 3:27 pm on Jan 11, 2003 (gmt 0)

<snip>Your own methods from within your logs? Please elaborate: How would you know I've copied your content from your logs? What logs are you referring to?>

There are three questions here, although the first two are one and the same.
The third question is visitor logs.

This could get complicated and extend very fast outside the parameters of this forum. However I'll attempt to provide a parallel using WIDGETS :-)

My interest is in Bwidgets. Six plus years ago my internet activity began in a very large (500+ participant) email forum on Bwidgets. Three plus years ago I created my own email forums. Then a few months afterwards websites to support those newly created forums and add better depth to Bwidgets on the entire internet.
The overall goal is two-fold. This process is to create a domination of the Bwidget "internet" market: not in the production of Bwidgets, but in services to Bwidgets on the internet, while providing resources and promotion of the Bwidget market for its very narrow users.

The Bwidget market is very narrow. It requires extensive knowledge about most every Bwidget on the internet. As it should be.
Keeping abreast of every new Bwidget site or product on the internet is gained from chasing down many things/leads. Among those sources are the referring pages, referring searches (perhaps by adding some keywords the user omitted), and the previously mentioned email forums.

It's a very simple task to inject typographical errors in content which only I would be aware of.

In summary there is not much that goes on in the Bwidget internet world that I'm not aware of. All these tools together provide the necessary insight.
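
The deliberate-typo idea above can be sketched in a few lines: keep a private list of misspellings planted in your own pages, then check whether a suspect page reproduces them. Everything in this example (the marker strings, the function name, the sample text) is hypothetical and is not meant to represent the actual method described in this post.

```python
def find_planted_markers(page_text, markers):
    """Return the planted misspellings that appear in page_text (case-insensitive)."""
    lowered = page_text.lower()
    return [m for m in markers if m.lower() in lowered]

# Hypothetical markers: misspellings deliberately left in the original pages.
PLANTED = ["bwidget restoraton guide", "orignal factory finish"]

if __name__ == "__main__":
    suspect = "Our Bwidget Restoraton Guide covers the orignal factory finish."
    hits = find_planted_markers(suspect, PLANTED)
    print(f"{len(hits)} of {len(PLANTED)} markers found: {hits}")
```

One marker turning up could be coincidence; several identical misspellings on the same page are much harder to explain away.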

pendanticist




msg:395870
 5:04 pm on Jan 11, 2003 (gmt 0)

Your approach certainly is very situationally specific.

Pendanticist.

wilderness




msg:395871
 5:40 pm on Jan 11, 2003 (gmt 0)

<snip>very situationally specific>

Why are Bwidgets any different from Widgets and Swidgets?

Aren't all webmasters regardless of their product looking to narrow the internet to their particular market share rather than the entire internet?

pendanticist




msg:395872
 6:04 pm on Jan 11, 2003 (gmt 0)

Aren't all webmasters regardless of their product looking to narrow the internet to their particular market share rather than the entire internet?

Look, you're basing your opinion on an assumption that all webmasters own commercial sites.

I think now would be a good time to agree to disagree and put this issue to bed. That's what I'm doing.

Pendanticist.

Key_Master




msg:395873
 8:45 pm on Jan 11, 2003 (gmt 0)

Cyveillance's ability to use its technology to study competitors provides detailed insights into competitive branding and partnering strategies.

[cyveillance.com...]

Cyveillance can also be hired by your competitors to spy on you. Their bots do not identify who they are and they do not follow robots.txt. They try to evade attention further by leasing IP blocks that will not resolve back to them. They will eat up your bandwidth and leave you with the bill. You have absolutely zero to gain from allowing them to spider your site.

So, who else besides pendanticist thinks this is a "good bot to have visit"?

pendanticist




msg:395874
 9:15 pm on Jan 11, 2003 (gmt 0)

Hey Key_Master, one of the mods broke the url in my post earlier today so as to prevent referrers showing up in Cy's logs. Perhaps you'd like to do the same.

However, if it is indeed Cyveillance's bot, then I'd have to disagree with the assessment of this being a bot in need of banning. At least from what it says above, which is a quote from www.cyveillance.com/web/solutions/unauth_dist.htm

See the bold text above?

Cyveillance can also be hired by your competitors to spy on you. Their bots do not identify who they are and they do not follow robots.txt. They try to evade attention further by leasing IP blocks that will not resolve back to them. They will eat up your bandwidth and leave you with the bill. You have absolutely zero to gain from allowing them to spider your site.

If you'da said all that in your initial response much of this could have been avoided. As with anything in life, the more information one has the better informed one can be. Capish?

Pendanticist.

Romeo




msg:395875
 9:31 pm on Jan 11, 2003 (gmt 0)

My server got several visits by that address (and others from the same net) during the last months. It pretended to be a "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" UA, but is in fact a rude bot spidering lots of pages in a very short time.
But since it was in such a hurry that it had no time to check my /robots.txt first, it just ran into a bot trap shortly after ... hehe.

A whois lookup on the address says "Cyveillance QWEST-63-148-99-224 ...".

I don't care much about Cyveillance. It just got banned here due to its antisocial behavior.
I don't have to hide anything on my sites, but I don't like aggressive, rude and ignorant bots (aka bad bots) eating up my bandwidth and my paid traffic.
We all really don't need this.

R.
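
The "lots of pages in a very short time" pattern described above is something an access log can show even when the user-agent string looks like a browser. The following is a rough sketch only: it assumes the log has already been parsed into (ip, timestamp) pairs, and the function name and thresholds are arbitrary illustrations rather than recommended values.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def fast_crawlers(hits, max_hits=30, window=timedelta(seconds=60)):
    """Return IPs that made more than max_hits requests within any window.

    hits is an iterable of (ip, datetime) pairs; thresholds are illustrative.
    """
    by_ip = defaultdict(list)
    for ip, when in hits:
        by_ip[ip].append(when)

    flagged = set()
    for ip, times in by_ip.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Slide the window start forward until it is within `window` of t.
            while t - times[start] > window:
                start += 1
            # Count of requests currently inside the window.
            if end - start + 1 > max_hits:
                flagged.add(ip)
                break
    return flagged
```

A human visitor rarely requests dozens of pages a minute, so a burst like that from a single address is a reasonable cue to look closer, whatever the user-agent claims.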

Key_Master




msg:395876
 9:37 pm on Jan 11, 2003 (gmt 0)

If you'da said all that in your initial response much of this could have been avoided.

Why? Because I wrote it or is it because you researched Cyveillance on your own and came to the same conclusion?

pendanticist




msg:395877
 9:46 pm on Jan 11, 2003 (gmt 0)

Why? Because I wrote it or is it because you researched Cyveillance on your own and came to the same conclusion?

  • The bot has never visited me.

  • I didn't 'research' a thing. In fact, I see no posted log files backing up anything said in this entire thread.

  • My concerns from the beginning were declarative sentences without supportive material. Hell, they still are.

  • Having said that, I still would have taken you at your word.......initially.

Is there going to be a point to this?

Pendanticist.

Key_Master




    msg:395878
     11:56 pm on Jan 11, 2003 (gmt 0)

    The point being, do your own research. I identified the IP as belonging to Cyveillance.com and suggested a course of action. It's up to each individual Webmaster to follow up on the lead, do the research, and make the final decision.

    jdMorgan




    msg:395879
     12:59 am on Jan 12, 2003 (gmt 0)

    My friends...

    This is an easy one for me... Respect robots.txt, or eat a 403! (carfac loves it when I say that.)

    Cyveillance is a pain in the bandwidth... They come back too often, disregard robots.txt, ban themselves by requesting bad-bot trap files, and use bandwidth that I pay for to do all this mickey-mouse and sell the results to their clients. Even if they didn't pester my logs so often, I'd ban them because their 'bot is so badly written - I got tired of all the "pollution" they injected in my error logs.

    Anyone who wants to search my site for infringement of their copyrighted material or intellectual property is welcome to do so - Using a browser and a search engine. May I recommend Google? We're quite well-indexed. :)

    Jim
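
For anyone unfamiliar with the bad-bot trap mentioned above, the usual idea is a URL that robots.txt disallows and that no human visitor would normally follow, so any client requesting it has ignored robots.txt. A minimal sketch of the detection half, assuming log entries have already been reduced to (ip, path) pairs; the trap path and function name are hypothetical.

```python
def trapped_ips(requests, trap_paths):
    """Return the set of client IPs that requested any trap path.

    requests is an iterable of (ip, path) pairs taken from an access log.
    """
    traps = set(trap_paths)
    return {ip for ip, path in requests if path in traps}

# Hypothetical trap URL: disallowed in robots.txt and not linked visibly.
TRAP_PATHS = ["/trap/do-not-follow.html"]

if __name__ == "__main__":
    sample = [
        ("63.148.99.247", "/index.html"),
        ("63.148.99.247", "/trap/do-not-follow.html"),
        ("192.0.2.10", "/index.html"),
    ]
    print(trapped_ips(sample, TRAP_PATHS))
```

What happens to a trapped address afterwards (a 403, an added deny rule) is a separate server-side step and is up to each webmaster.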

    andreasfriedrich




    msg:395880
     1:33 am on Jan 12, 2003 (gmt 0)

    Wow, quite an interesting read.

    If a site hasn't pilfered anything, that site has nothing to hide, nor to protect from discovery.

    This is one of the worst kinds of reasoning I have ever heard. While it is widely accepted in the general public (which is quite amazing since this is a reasoning employed by totalitarian systems¹), the courts haven't followed that, stressing privacy instead.

    You do not need to have anything to hide to stress and enforce your privacy rights.

    Andreas


    ¹ To be sure, I'm not calling you totalitarian, pendanticist, and I do understand your reluctance to heed any advice given without a whole lot of facts, reasons and explanations.

    pendanticist




    msg:395881
     10:59 am on Jan 12, 2003 (gmt 0)

    The one thing about living in a free country is everyone (no matter how diverse the philosophical belief), is entitled to their own opinion.

    mgream




    msg:395882
     9:17 pm on Jan 13, 2003 (gmt 0)

    Any user agent that fails to honour robots.txt and accesses my site contrary to acceptable terms that I outline is subject to having its access revoked: entirely fair.

    If any organisation wants to engage in covert collection of website information, then they should do it properly and not in a half baked manner :-). I don't have any IPR to hide, but I do have a limited bandwidth budget to manage.

    If you search for cyveillance, you will find an article in a popular business magazine with the title "Helping to Keep the Web Honest". This is hypocritical.

    Matthew.


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved