It was only a couple of days ago that I saw the Gigabot go from v1.0 to v2.0...
(I've also seen Gigabot visit from 207.114.174.2 to .26, as recently as a couple of weeks ago. I thought there was a post somewhere around here from Matt, the Gigablast owner, listing IPs he crawls from, but I can't find it...)
Basically, instead of following the actual href, it takes something out of the displayed link text and appends it to the actual href.
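To illustrate with made-up markup and URLs (this is just my reading of what it's doing, not an actual log entry): given a link like

    <a href="http://www.example.com/section/page.html">www.example.com/page.html</a>

the bot ends up requesting something like http://www.example.com/section/page.html/www.example.com/page.html, gluing the visible text onto the real href.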
Am I making any sense? :)
Why is your site so vague about your business? What do you actually mean by JetBot being benign? Am I supposed to interpret that as the spider is not a mail harvester?
Given that JetEye is collecting all this info and offering limited access via a login, my suspicions lead me to believe that JetBot is a spybot.
Finally decided to ban it at the firewall
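For anyone wanting to do the same, a minimal sketch assuming a Linux server with iptables, using the 207.114.174.* range mentioned earlier (the exact range is an assumption - verify the source IPs against your own logs first):

    # drop everything from the /24 the bot has been crawling from
    iptables -A INPUT -s 207.114.174.0/24 -j DROP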
This is the first time I've noticed them. I agree that it's annoying that they (jetbot/jeteye) won't reveal anything about what they're up to. Has anyone registered there or contacted them to try to get more info? (now don't violate that NDA of theirs ;-)
Anyone know a good reason not to wait and see?
Would you pick up a free lottery ticket with some potential of a win, big or small?
And what does it cost to have these crawled documents served to that spider, Gigabot or not? Next to zero? Then why even contemplate banning it, so long as it does not overload your site?
Easy: live and let live - you might benefit from this in the future.
Why did you decide to block Gigabot?
surfin,
hopefully this is just a misunderstanding over terminology?
When I say "block", I mean blocking via htaccess.
"deny" may be used in both htaccess and robots.txt however in those instances there are two entirely different definitions.
It's not my desire to appear facetious here, so please don't interpret it that way.
Giga is listed in my robots.txt and has thus far honored that request, as has JET.
I do NOT have Giga in my htaccess, under either its UA or its IP range.
I'm not exactly sure when I began utilizing htaccess. It was even before I came to Webmaster World.
Over time I have simply made decisions based on what had transpired and on what might possibly transpire.
In the beginning, I didn't keep the detailed notes for later reference that I do today.
When my sites first began, I used one of those submission pages, and I recall Gigabot being part of those submissions.
Some bots are just not worth the effort they take for what they return to websites. I've "apparently" made that determination about Giga without documenting it in my notes. Were it a major personal issue, I'd go back into my monthly back-ups, review the logs, and make an evaluation. It's just not an urgent issue.
Jeeves is another that has far too much activity at my sites for the small number of visitors the SE returns.
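If the crawl rate itself is the complaint, some spiders honor the non-standard Crawl-delay directive in robots.txt (whether Jeeves' crawler respects it, and which UA token it matches on, is something to verify - "Teoma" is the commonly cited one):

    User-agent: Teoma
    Crawl-delay: 30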
As for sites that view all of my content, I don't have a problem with that. I want my site to be more widely known and allowing bots to view the content is a good way to do that.