deepak-USC/ISI-1.0

Forum Moderators: open

Message Too Old, No Replies

deepak-USC/ISI-1.0

2nd or 3rd visit

wilderness

2:45 pm on Jul 26, 2004 (gmt 0)

to my sites

deepak-USC/ISI-1.0(+http://www.isi.edu/~ravichan/deepak-usc-isi.html)

Information Sciences Institute

volatilegx

2:52 pm on Jul 26, 2004 (gmt 0)

"I'm Deepak's Robot."

Classic

bcolflesh

2:59 pm on Jul 26, 2004 (gmt 0)

[webmasterworld.com...]

jdMorgan

3:01 pm on Jul 26, 2004 (gmt 0)

Yeah, this user-agent string sets a precedent that others should follow: It not only identifies the using organization, but actually identifies the user's full name! And the Web page does a pretty good job of describing what the collected data will be used for. Nice picture of the robot itself, too.

Deepak's example should be followed by more robot users.

Jim

Goober

3:07 pm on Jul 26, 2004 (gmt 0)

"Bill Clinton was the former President of United States."

As far as I know, he still IS.

Goobs

wilderness

4:11 pm on Jul 26, 2004 (gmt 0)

Is there any considerable difference between what this bot is doing and what is currently ongoing at :

http:// www.infoplease.com/
or
http:// en.wikipedia.org/wiki/Main_Page

As related to the content of my sites, all these two do is distract visitors with links to insufficient explanations, while waiting for somebody else to add plagarized data from my sites :( ALL the while somehow grabbing irrelavant PR's.

Even the "this day in time" type pages although enjoyable and perhaps beneficial, in most instances provide sparse data, while detracting visitors who are in most instances looking for depth.
JMHO

Here's a prime example which is nothing but keyword spamming:
http:// en.wikipedia.org/wiki/Hambletonian
The links go nowwhere in most instances while awaiting some unknowing soul to come along and copy data from another source in which credit is never provided.

or is deepak gathering data which will perhaps help SE's in the future provide some significant explantion of what the following means?
Knight Dream, p, 3, 1:59
:):)

[edited by: volatilegx at 5:55 pm (utc) on July 27, 2004]
[edit reason] delinked URLs [/edit]

jdMorgan

4:24 pm on Jul 26, 2004 (gmt 0)

I'd recommend reading his papers... He is studying semantics and extracting "meaning" from natural-language text. This kind of robot usage is "classic internet" from years ago when the internet was mostly academic and scientific. He needs pages of natural text to analyze and test his algorithms. As such, these accesses may be seen by some as "non-beneficial" traffic, but the 'bot is not malicious.

I think he did a very good job on the user-agent string and on his info Web page - Lots better than many commercial robot users provide. Call it enlightened self-interest if you like, but at least he *is* enlightened regarding the fact that Webmasters don't like "mysterious visitors with inknown intent."

Jim

wilderness

5:08 pm on Jul 26, 2004 (gmt 0)

Jim,
As a webmaster the words "university, research and grants" are items that get my dander up :( Most U's have resources and benefactor's beyond the imagination of John Q. Public and even many websmasters. Students have access to many vendues, under the theme of research, that we as webamsters do not have available to us. And yet, these vendues are filling their appetite with data from John Q. Public's resources.

The old "keebler thread" in the archives is a prime example. When I initiated that thread I was unaware of the real source of all the prodding and poking around. I later in a land-line conversation realted to an article I used as an example in one of the threads, found out that all the prodding was the result of a MAJOR corporation seeking data for their 100-year anniversary. They besiged the same Public Library-Historical Center that I retrieved some data from with a couple of dozen techs gathering data or at least what they could find (path and keywords unknown.)
All these techs getting paid good money from said corp.

To think that all this nonsense, secrecy and expense was generated from an accidental reading of my page by somebody at Keebler and how much expense was wasted when all they needed to do was to send a cordial email inquiry :)
Had I not stopped their spidering of my sites, they would have likely still been crawling. All the time at the expense of my bandwidth and my research.

I do realize that my example is extreme as related to most websites. However, severity doesn't lessen what takes place or how actions should be reacted to, at least IMO.

As previously stated many times, each webmaster must determine what is both benefical and detrimemtal for their own website and visitors.

I personally see no difference between university research and other bots who collect and provide data to 3rd parties under a fee-based umbrella.

BTW, I did read some portions from a couple of his papers and found it quite interesting, in fact, one portion which I read provide nearly a duplicate example of the question I asked at the end of the previous post.

deepak-USC/ISI-1.0

2nd or 3rd visit

wilderness

volatilegx

bcolflesh

jdMorgan

Goober

wilderness

jdMorgan

wilderness

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week