Forum Moderators: open
I understand... I was once also of the mind that the more search engines indexed your site the better, etc. But you can take 90% of the bots out and lose maybe 2% of your traffic. There is *a lot* of click fraud, spyware redirects, email harvesters, etc. associated with many pseudo wannabe engines. Once you see your content cached and cloaked to the bots for domains that are selling casinos, adult content, and bogus male enlargement vitamins... well, it changed my perspective. You might wonder why I keep an eye on this thread, then... it's just to be proactive. ;)
Again, regarding the bot that is the topic of this thread... I don't know what their intentions are... They could intend to be the next Google for all I know... if so, maybe they could at least put up a single page saying what it is they intend to do.
Your point about blocking the next Google is harder for me to agree with. If I were starting out building a new search engine, would I be anxious to tip my hand to the world before it was ready to go live? I don't think so.
I run a regional directory site and in its early days I gathered data from a variety of sources in order to avoid starting with an empty directory. I did not alert all of the sources of my data to my intentions. All of the businesses that were quietly added to the directory (at no charge) are very happy to be there. Now the trick is to get them to start paying, but that's a topic for a different thread.
Here's an even bigger twist. I tried to get permission from a trade association to add their members to my directory and could never get a straight answer from them, either yes or no. They made their membership info public on the Internet, so that told me that I could use it. Google doesn't ask permission before adding new information to its site, so why should I?
Usually I visually scan my log file in WordPad, just scrolling down until I notice any new bots or unusual entries.
Upon finding a new bot, I look up its IP and check out any potential domain, whether contained in the UA or in the reverse DNS.
If I find a company with information that explains the bot (either a specific description, or the site itself is an engine), I then decide whether to allow or block it. If the odds are it won't generate any useful traffic, I block the bot to keep down bandwidth.
If there is no description or explanation of the bot, or if it is a private bot, then I presume it is for some sort of personal gain to them: email harvesting, code theft, or theft of pages or images (and I do consider my images copyrighted; they are my work, my property, and copyright is stated). Or the bot could be used to find the few legitimate complaints that I have posted against certain corporations. In those cases I block by UA and/or by IP (and I watch for the bot to sneak back via another IP).
Simply put, if they can't explain who they are and what their purpose is, then I consider that the bot is probably of no good to me and may not be in the best interest of the general public (suppression of freedom of speech, email spam, etc.). Consistent use of IPs by bots, with proper identification (of the corporation or individual) and a stated purpose, would eliminate many hassles and could simplify work for webmasters.
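For what it's worth, the screening routine described above (scan the log for unfamiliar user agents, then check each IP's reverse DNS) can be sketched in a few lines of Python. This is only a sketch under assumptions: it presumes an Apache/NGINX "combined" log format, and the function names, sample file path, and known-agent list are hypothetical, not from anyone's actual setup.

```python
import re
import socket

# Assumes the common Apache/NGINX "combined" log format; adjust the regex
# if your server logs in a different layout.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def scan_log(path, known_agents):
    """Collect the IPs used by any user agent not already in known_agents."""
    new_bots = {}
    with open(path, errors="replace") as log:
        for line in log:
            m = LOG_LINE.match(line)
            if not m:
                continue  # skip lines that don't fit the expected format
            ua = m.group("ua")
            if ua and ua not in known_agents:
                new_bots.setdefault(ua, set()).add(m.group("ip"))
    return new_bots

def reverse_dns(ip):
    """Look up the PTR record for an IP; many dubious bots have none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return "(no reverse DNS)"
```

From there, the allow/block decision is manual, exactly as described: a UA whose reverse DNS points at an identifiable company with a bot page gets a pass; one with no PTR record and no explanation gets added to the block list.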
Standards were "somewhat" established for bots, e.g., the format of the UA and the usage of the robots.txt file; why couldn't there have been a requirement for proper identification of the owner/user of the bot and its intended purpose?
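For a sense of how much of that convention is already mechanized, Python's standard-library `urllib.robotparser` implements the UA-matching side of robots.txt. The rules below are a hypothetical robots.txt written for illustration, not anyone's real file; the bot names are made up too.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one named bot is blocked outright,
# everyone else is only kept out of /private/.
rules = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("BadBot", "/index.html"))      # False: blocked by name
print(parser.can_fetch("Jetbot", "/index.html"))      # True: falls under *
print(parser.can_fetch("Jetbot", "/private/x.html"))  # False: * disallows it
```

Note what the standard does and doesn't cover: a bot can be named and restricted, but nothing in robots.txt obliges the bot's operator to say who they are or what the crawl is for, which is exactly the gap being complained about.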
Just my views.
[jeteye.com...]
Notice [cloud.he.net...] serving as the default error page. See the company involved.
Hollywood
After reviewing over 3 full months of logs from a half dozen sites, the only thing that Jetbot seems guilty of is excessively grabbing robots.txt. That was in August when it was still being run from Gigablast IPs; since moving to their own IPs (within a block owned by Hurricane Electric - think that might have anything to do with that 404 URL a few messages back?), requests for robots.txt have been more reasonable.
The first file request comes a maximum of 5 seconds after requesting robots.txt. Jetbot has not requested anything it shouldn't; that is, it completely respects robots.txt. Jetbot also has not hammered my servers. Jetbot has not followed dynamic links (even though it's welcome to). Jetbot does follow 301s. I have received no traffic from the (known) JetEye IP range, except for Jetbot, which has always identified itself.
Of course, none of this surprises me since, as I correctly suspected, Jetbot is rebranded, licensed technology from Gigablast (who has also been squeaky clean).
Hollywood, if you know something, do us all a favor and bloody-well spit it out. So far, your air of mystery is more of a stink of...