
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 43 message thread spans 2 pages.
new private search engine?

 5:32 pm on Jul 11, 2004 (gmt 0)

Been having this "engine" spider me today. Will post IP and other details in a bit.



 5:43 am on Jul 12, 2004 (gmt 0)

Well, waited for a while for the thread to be posted. Found out it was gigabot. Anyone seen any activity from this bot lately?


 10:06 am on Jul 12, 2004 (gmt 0)

Gigabot hit a few of my sites yesterday. Each time it grabbed only the robots.txt and the index page, nothing more.


 2:25 pm on Jul 12, 2004 (gmt 0)

Started seeing it 7/9.
UA: Gigabot/1.0
IP: -
Host: jeta.jeteye.com (


 2:29 pm on Jul 12, 2004 (gmt 0)

Excellent! Thanks macrost.


 4:52 am on Jul 27, 2004 (gmt 0)

Now it's crawling with the UA: Jetbot/1.0


 3:10 pm on Jul 27, 2004 (gmt 0)

Is this someone perhaps licensing technology from Gigablast?

It was only a couple of days ago that I saw the Gigabot go from v1.0 to v2.0...

(I've also seen Gigabot visit from to .26, as recently as a couple of weeks ago. I thought there was a post somewhere around here from Matt, the Gigablast owner, listing IPs he crawls from, but I can't find it...)


 3:46 pm on Jul 27, 2004 (gmt 0)

JetBot/1.0 has been hitting everything on my site including trackbacks and comments. Is it benign or no way to tell?


 10:36 pm on Jul 31, 2004 (gmt 0)

Jetbot/1.0 - Took the robots.txt and left. (Hit 404 redirect first.)


 2:19 pm on Aug 2, 2004 (gmt 0)

Ok, seems that this bot is well... stupid in some regards. It sees what the url looks like, but sometimes doesn't follow the actual link. Kinda weird, it generated 8 errors within my application because of that yesterday.

Basically, instead of following what the actual href is, it will take something out of the href display and add it to the actual href.

Am I making any sense? :)
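
For anyone trying to picture the href-versus-display-text mix-up described above, here is a minimal Python sketch of what a crawler is expected to do: build its request URL from the href attribute only and treat the visible link text as plain text. The class name `LinkExtractor`, the function `crawl_targets`, and the sample URLs are made up for illustration; nothing here is taken from JetBot's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def crawl_targets(base_url, html):
    """Return absolute URLs built from href attributes only.

    The anchor text is deliberately ignored: it may look like a URL,
    but it is display text and must never be merged into the target.
    """
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href, _text in parser.links]
```

With a link such as `<a href="/app/view?id=7">example.com/app/item-seven</a>` the only URL to request is `http://example.com/app/view?id=7`; a bot that splices pieces of the display text into that address will produce the kind of application errors mentioned above.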


 12:58 am on Aug 5, 2004 (gmt 0)

Hello all. I meant to post this sooner. Yes, JetBot is the spider for JetEye.com, a new Internet technology company with a public beta due in a couple months. Yes, we are benign, and yes we are licensing some technology from GigaBlast. So JetBot is as well behaved as gigabot. If you have any questions or concerns you can use the contact page on our site at [jeteye.com...]

All the best,
The JetEye Team


 7:14 pm on Aug 5, 2004 (gmt 0)

Thanks for posting your info JetEye, and welcome to WebmasterWorld.


 11:01 pm on Aug 8, 2004 (gmt 0)


Why is your site so vague about your business? What do you actually mean by JetBot being benign? Am I supposed to interpret that as the spider is not a mail harvester?

Given that JetEye is collecting all this info and offering limited access via a login, my suspicions lead me to believe that JetBot is a spybot.


 11:20 pm on Aug 8, 2004 (gmt 0)

Aren't they all? We just decide which ones we like - targeted and qualified traffic generators:))


 11:29 am on Aug 9, 2004 (gmt 0)

I also was visited by JetBot. If only googlebot indexed as many pages as JetBot, I'd be happy.

Wonder what JetBot is? My site has lots of cities & states on it. Is that consistent with other sites it's indexed?


 11:41 am on Aug 9, 2004 (gmt 0)

JetBot seems to have been acting up over the weekend. I have a site on example.com with a 301 redirect from www.example.com to example.com. There are no inbound links anywhere on the web pointing to www.example.com; all links point to example.com. Yet JetBot repeatedly requests pages from www.example.com. This includes the robots.txt file, which it then ignores.

Finally decided to ban it at the firewall.


 12:49 pm on Aug 9, 2004 (gmt 0)

Interesting. I was wondering what that was when I saw it in my logs. Anyone have any information on how this company plans to gain any market share worth letting them spider me for?


 4:09 pm on Aug 9, 2004 (gmt 0)

Given that JetEye is collecting all this info and offering limited access via a login, my suspicions lead me to believe that JetBot is a spybot.

Looks like they are not quite ready for a public beta test, that's probably why a login is required. I don't know why that would set any alarms ringing.


 12:45 am on Aug 13, 2004 (gmt 0)

Jetbot has visited 500 of my site's pages so far today. It's still at it, but seems well behaved in that it's taking them slowly.

This is the first time I've noticed them. I agree that it's annoying that they (jetbot/jeteye) won't reveal anything about what they're up to. Has anyone registered there or contacted them to try to get more info? (now don't violate that NDA of theirs ;-)
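
A rough way to check figures like "500 pages today" is to count hits per day for one user agent straight from the access log. The sketch below assumes the Apache "combined" log format, which is an assumption about the server setup and not something stated in this thread; `hits_per_day` is a hypothetical helper name and the sample IP in the usage note is from a documentation-only range.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format.
# ip ident user [day:time zone] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def hits_per_day(lines, ua_substring):
    """Count requests per day whose user agent contains ua_substring.

    Matching is case-insensitive, so "jetbot" also catches "JetBot/1.0".
    Lines that do not fit the assumed format are skipped silently.
    """
    counts = Counter()
    needle = ua_substring.lower()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("ua").lower():
            counts[match.group("day")] += 1
    return counts
```

Typical use would be something like `hits_per_day(open("access.log"), "jetbot")`, where the log path is whatever your server writes to. Comparing the daily totals with the timestamps also shows whether the bot really is "taking them slowly".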


 2:04 am on Aug 13, 2004 (gmt 0)

When the UA was first submitted here, I added it into my robots.txt.

Thus far, the bot has respected that entry on my sites.
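
For reference, a robots.txt entry of the kind described above can be checked with Python's standard `urllib.robotparser` before relying on it. The rules below are only an example of what such an entry might look like (the poster did not share the actual file), and `allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; not the poster's actual robots.txt.
ROBOTS_TXT = """\
User-agent: Jetbot
Disallow: /

User-agent: Gigabot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Here `allowed("Jetbot/1.0", "http://example.com/page.html")` is False, while an unlisted agent such as `"Googlebot/2.1"` is allowed. Keep in mind that robots.txt is only a request: it works as long as the bot chooses to honor it.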


 1:00 pm on Aug 13, 2004 (gmt 0)

I can't decide whether to block them or not. They could turn out to be a good source of traffic for me in the future. Anyone know a good reason not to wait and see?


 2:03 pm on Aug 13, 2004 (gmt 0)

PERSONALLY, I previously had Gigabot denied, and this bot is an extension of that technology. For me, a simple choice.

For you?
If there is any possibility that the bot will increase desired traffic to your site(s), then why deny it?


 2:26 pm on Aug 13, 2004 (gmt 0)

Thanks, wilderness. Why did you decide to block Gigabot?

Lord Majestic

 2:28 pm on Aug 13, 2004 (gmt 0)

Anyone know a good reason not to wait and see?

Would you pick a free lottery ticket with some potential of win, big or small?

And what are the costs of having these crawled documents served to that spider, Gigabot or not? Next to zero? Then why even contemplate banning it so long as it does not overload your site?

Easy: live and let live - you might benefit from this in the future.


 3:07 pm on Aug 13, 2004 (gmt 0)

Why did you decide to block Gigabot?

Hopefully it's not a misunderstanding of words?

I relate the use of "block" to htaccess.
"Deny" may be used in both htaccess and robots.txt; however, it means two entirely different things in those two places.

It's not my desire to appear facetious here, and please do not interpret it that way.

Giga is listed in my robots.txt and has thus far honored that request as has JET.

I do NOT have giga in my htaccess under a reference to either the UA or IP range.
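
To make the distinction concrete: a robots.txt "deny" asks the bot to stay away, while a server-side block refuses the request whether or not the bot cooperates. The sketch below is a Python WSGI stand-in for that second kind of block, matching on the user agent string. It is not htaccess syntax and not anyone's actual configuration; `block_bots` and the pattern are illustrative names chosen here.

```python
import re

# User agents to refuse outright (illustrative list).
BLOCKED_UA = re.compile(r"\b(jetbot|gigabot)\b", re.IGNORECASE)


def block_bots(app):
    """Wrap a WSGI app so that blocked user agents receive 403 Forbidden."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA.search(user_agent):
            # Enforced by the server: the bot's cooperation is not needed.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

A user agent string is trivial to fake, so a block like this only stops bots that identify themselves honestly; banning by IP range is the stricter option.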

I'm not exactly sure when I began utilizing htaccess. It was even before I came to Webmaster World.
Over time I have simply made decisions based on what had transpired and on what might possibly transpire.
In the beginning, I didn't keep the detailed notes to myself for later reference that I do today.

When my sites first began, I used one of those submission pages and I recall Gigabot being part of those submissions.
Some bots are just not worth the effort they take for what they return to websites. I've "apparently" made that determination about giga without documenting it in my notes. Were it a major personal issue, I'd go back into my monthly back-ups, review the logs, and make an evaluation. It's just not an urgent issue.

Jeeves is another that has far too much activity at my sites for the small amount of visitors the SE returns.


 6:24 pm on Aug 13, 2004 (gmt 0)

I have heard good reasons to allow these spiders access to my site and none to deny them, so I'll let them have their fun. Thanks for your help.


 2:49 pm on Aug 20, 2004 (gmt 0)

Real nice he shows up here, but jeteye, you need to provide a little more info.
You are grabbing hundreds of my pages, why?


 4:10 pm on Sep 27, 2004 (gmt 0)

Jetbot has indexed EVERY page of my forums and has been on my site for 6 days, 24-7.
Reading this thread, I am getting a bit nervous about whether it is some sort of spam harvester.

Come back and tell us what you are doing, Jeteye.


 3:21 pm on Sep 28, 2004 (gmt 0)

I banned them straight off by i.p. when I saw their homepage. You need to really be careful these days who you let download your content in whole... especially with all the cloaking and hijacking going on. Not to imply this is going to happen from this bot... only that to me the risk is not worth the perceived benefit to my sites... not that in this case I see any perceived benefit to start with.


 12:53 pm on Oct 1, 2004 (gmt 0)

I can understand wanting to block spam harvesters from getting email addresses. I accomplish that by requiring anyone interested in obtaining an email address to allow me to set a cookie in their browser first. I use the cookie to count how many email addresses they've taken today and cut them off after 5.
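
The cookie scheme described above could be sketched roughly like this in Python. Only the limit of 5 per day comes from the post; the class name `EmailRevealLimiter`, the in-memory dictionary, and the random cookie ID are assumptions made for the example, and a real site would keep the counts in a session store or database.

```python
import datetime
import uuid

DAILY_LIMIT = 5  # addresses per cookie per day, as described in the post


class EmailRevealLimiter:
    """Track how many email addresses each cookie has been shown today."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self._counts = {}  # (cookie_id, date) -> addresses revealed so far

    def new_cookie(self):
        """Generate an opaque ID to set as the visitor's cookie value."""
        return uuid.uuid4().hex

    def may_reveal(self, cookie_id, today=None):
        """Return True and count the reveal if this cookie is under its limit.

        Visitors without a cookie (for example, bots that refuse cookies)
        are never shown an address.
        """
        if not cookie_id:
            return False
        today = today or datetime.date.today()
        key = (cookie_id, today)
        used = self._counts.get(key, 0)
        if used >= self.limit:
            return False
        self._counts[key] = used + 1
        return True
```

The first five calls to `may_reveal` for a given cookie on a given day return True and the sixth returns False. A harvester that accepts cookies can still discard them and start over, so this slows bulk collection down without fully preventing it.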

As for sites that view all of my content, I don't have a problem with that. I want my site to be more widely known and allowing bots to view the content is a good way to do that.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved