OrgName: Brown University
OrgID: BROWNU
Address: 115 Waterman Street
City: Providence
StateProv: RI
PostalCode: 02912
Country: US
NetRange: 128.148.0.0 - 128.148.255.255
CIDR: 128.148.0.0/16
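For anyone who wants to test a log entry against this range, here is a minimal sketch using Python's standard `ipaddress` module. The function name `in_brown_range` is just an illustrative choice, not something from the WHOIS record; only the CIDR block itself comes from the record above.

```python
import ipaddress

# CIDR block copied from the WHOIS record above.
BROWN_NET = ipaddress.ip_network("128.148.0.0/16")


def in_brown_range(ip: str) -> bool:
    """Return True if the given IPv4 address falls inside 128.148.0.0/16."""
    return ipaddress.ip_address(ip) in BROWN_NET


if __name__ == "__main__":
    # First and last addresses of the block should match the NetRange line.
    print(BROWN_NET[0], "-", BROWN_NET[-1])
    print(in_brown_range("128.148.1.2"))
    print(in_brown_range("128.149.0.1"))
```

The same check works for any other range you care about by swapping in a different CIDR string.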
Probably not worth letting it crawl unless it graduates ;)
Here's something using the name Taiga:
"Taiga: Internet-scale computing"
[cs.brown.edu...]
Another candidate for your crawler might be this:
"Stochastic Models for Web Agents and the Web Environment"
[cs.brown.edu...]
I spotted this thing back in November and the IP address is for some research project at Brown University.
You are to be commended then for taking the time and effort to share it with the participants of Forum 11 as well as remaining "proactive" ;)
[google.com...]
[google.com...]
[google.com...]
Let me put it this way, ever since I wrote my own web site bot blocking software, I find so many new bots and crawlers that I could spend all day posting and/or blogging about them and never be finished.
Therefore, I just post and/or comment about the ones that amuse me or have something particularly interesting to note. For instance, today I've already had over 300 bots that are knocking on my door asking for things they will never get, many new IPs never seen before, and this is a daily event.
Luckily for you, I'm proactively archiving this information into the ultimate bot blocking quarantine list!
Ever heard of "O#*$!earch/1.x (www.o#*$!earch.com)"?
WebmasterWorld software scrambled it, it's OPEN and I and SEARCH.
How about "BilgiBot/1.0(beta) (http://www.bilgi.com/; bilgi at bilgi dot com)"?
Or maybe "ICC-Crawler(Mozilla-compatible; [kc.nict.go.jp...] icc-crawl@ml.nict.go.jp)"?
and on and on and on...
See, that's the difference with whitelisting, they are all blocked by default.
[edited by: incrediBILL at 2:12 am (utc) on Jan. 24, 2007]
Do I detect an amount of discernible sarcasm in that snide remark?
You mean the indirect word "jaded" confuses you ;)
Luckily for you, I'm proactively archiving this information into the ultimate bot blocking quarantine list!
And that will likely appear here as soon as the active search engine for the former bot participant of forum 11 that has been crawling pages for more than three years ;)
BTW, your list will be about useless to me.
The majority of bots and/or new bots appear from RIPE or APNIC ranges, which do not get into my websites.
Hell! I don't even take note of bots from those IP ranges.
The majority of bots and/or new bots appear from RIPE or APNIC ranges, which do not get into my websites.
Hell! I don't even take note of bots from those IP ranges.
I think your definition of a bot is too narrow, as most things crawling these days rarely have an identifiable user agent, and they aren't from Asia either.
Forum 11 is about "Search Engine Spider Identification". Reporting all non-browser actions on the net is not the topic here. Too many bots are on the net. Sometimes when I examine log files, I think I'm the only man in a bot world. Posting all bot occurrences would lead to chaos, hence it would not be helpful.
Blacklisting is a Sisyphean task. New bots emerge continuously. Forum 11 provides a little help in this eternal fight, but don't expect too much.
If you want to block nearly all bad bots, you need whitelisting. Open internet must be closed. Is your front-door always open?
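To make the blacklist versus whitelist difference concrete, here is a rough sketch that looks only at the user-agent string. The prefix and token lists are hypothetical placeholders, not anyone's real configuration, and a UA check alone is easy to fool since the string can be faked; treat it as an illustration of "blocked by default", not a complete blocker.

```python
# Hypothetical whitelist: only user agents starting with these prefixes pass.
ALLOWED_PREFIXES = ("mozilla/", "opera/")

# Hypothetical blacklist: only user agents containing these tokens are blocked.
BLOCKED_TOKENS = ("bilgibot",)


def whitelist_allows(user_agent: str) -> bool:
    """Default deny: anything not explicitly allowed is rejected."""
    return user_agent.strip().lower().startswith(ALLOWED_PREFIXES)


def blacklist_allows(user_agent: str) -> bool:
    """Default allow: anything not explicitly listed gets through."""
    ua = user_agent.lower()
    return not any(token in ua for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    # Shortened form of a UA quoted earlier in the thread; it is not on
    # either list, so the two policies disagree about it.
    new_bot = "ICC-Crawler(Mozilla-compatible)"
    print("blacklist lets it in:", blacklist_allows(new_bot))
    print("whitelist lets it in:", whitelist_allows(new_bot))
```

A bot that nobody has listed yet passes the blacklist and fails the whitelist, which is the whole point of the default-deny approach.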
Forum 11 is about "Search Engine Spider Identification". Reporting all non-browser actions on the net is not the topic here.
For the most part I agree.
The problem is that we have numerous bots and/or crawls starting out unidentified, with patterns that have proven in the past to be the beginnings of a yet-to-be-named bot or SE.
(One example is the recently reactivated Amazon thread, which began crawling anonymously and then switched to a Java UA and recently two bot names.)
Don
Forum 11 is about "Search Engine Spider Identification". Reporting all non-browser actions on the net is not the topic here.
In addition to the valid point that Don made, I think Forum 11 would die a slow death from boredom and disuse if it was confined to SE spiders only.
Obviously all major and minor SE spiders have already been identified and only the occasional new one pops up. The activity created by threads on SE spiders alone would not justify being a separate forum.
Maybe this forum should be renamed "Bot, Spider and Crawler Identification" to match the topics currently being posted. There is no other forum suitable to move to.
Therefore, without tracking things that pretend to be browsers but behave like bots you would never identify these crawlers.
It's just more challenging is all ;)
If your definition of a spider is only something you can see with an identifiable user agent then pack up and close shop now because I can think of a bunch of corporate spiders that don't want to be seen such as Picscout, Cyveillance, NetSweeper, etc. which can only be detected by their activity and nothing else.
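As one illustration of spotting something "by its activity", here is a minimal sketch of a request-rate check per IP over a sliding time window. This is a generic signal, not the method any poster in this thread or any of the named crawlers is known to use; the class name `RateWatcher` and the thresholds are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class RateWatcher:
    """Flag an IP when it makes too many requests within a time window."""

    def __init__(self, max_hits: int = 30, window: float = 60.0):
        self.max_hits = max_hits      # hypothetical threshold
        self.window = window          # window length in seconds
        self.hits = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one request; return True if the IP now looks bot-like."""
        recent = self.hits[ip]
        recent.append(timestamp)
        # Drop requests that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_hits


if __name__ == "__main__":
    watcher = RateWatcher(max_hits=3, window=10.0)
    for t in (0, 1, 2, 3):
        print(t, watcher.record("203.0.113.7", t))
```

Real detection would combine several clues (pages requested, missing image fetches, robots.txt behavior and so on); request rate is just the easiest one to show.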
Bill,
Everybody's aware you're a busy man with massive resources dedicated to tracking and identifying crawls, spiders and UAs far beyond the capabilities of any other participant in this forum, however. . . . for the sake of cordiality, communication and understanding?
Could you possibly spare a few seconds and provide what submission this brilliant deduction was the result of?
Possibly in a quote?
Many thanks for your understanding and tolerance of everybody else's incompetence.
Don
Many thanks for your understanding and tolerance of everybody else's incompetence.
Huh?
I was just giving an example of why we should discuss some bot activity that wasn't so simple to identify as sometimes it takes more than a few clues to figure out it's Cyveillance or Picscout at work, assuming we can ever figure them out.
BTW, there are numerous threads of Cyveillance and NetSweeper in the Webmaster World archives.
Did I say there weren't?
Again, examples...
[edited by: incrediBILL at 8:15 pm (utc) on Jan. 25, 2007]
BTW, there are numerous threads of Cyveillance and NetSweeper in the Webmaster World archives.
Did I say there weren't? Again, examples...
If you need examples for your blog references?
I'd suggest searching the Webmaster World archives or Google.
If your interest is in assisting other participants in this forum?
I haven't seen anybody other than yourself (at least in this thread) inject or request information on Cyveillance and NetSweeper.
Thus you are just as capable as I am of searching for "examples".
Happy hunting.