Forum Moderators: open

Ultimate short list of banned bots

Wanna contribute?

         

silverbytes

1:29 pm on Jul 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a short list of my recently banned bots. Some are very well known threats, but most of these were added yesterday and weren't crawling my sites until recently, so perhaps you could take a look and see whether they are visiting your website too, or whether there are some "good robots" in this list that don't deserve to be there ;)

aipbot
BecomeBot
Cerberian Drtrs
COMBINE
ConveraCrawler
Custom Bot/Robot #20
e-collector
Faxobot/1.0
Faxobot
Fetch API Request
FAST Enterprise Crawler
Html Link Validator (www.lithopssoft.com)
iaea
ichiro
INDEXU Spider Link Checker
IRLbot
linkwalker
LinksManager
Microsoft URL Control
microsoft.url
NaverBot
Nutch
NutchOrg
OmniExplorer_Bot
Spam Bot
Test.Com
T-H-U-N-D-E-R-S-T-O-N-E
Twiceler
Xenu Link Sleuth
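If you want to check whether any of these are hitting your own site, a quick grep of an Apache-style access log is enough. A minimal sketch — the log lines and path here are made-up stand-ins for your real access.log, and the pattern covers only a few names from the list:

```shell
# Made-up sample of an access log (stand-in for your real one).
cat > access.log <<'EOF'
203.0.113.7 - - [25/Jul/2005:13:29:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0 (compatible)"
203.0.113.9 - - [25/Jul/2005:13:30:00 +0000] "GET /page HTTP/1.0" 200 512 "-" "IRLbot/2.0"
EOF

# Count requests matching a few UAs from the list
# (-i: case-insensitive, -c: count matching lines, -E: extended regex).
grep -icE 'aipbot|BecomeBot|ConveraCrawler|IRLbot|OmniExplorer_Bot|Twiceler' access.log
# prints 1 for the sample above
```

A non-zero count means the bot has already been by.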

wilderness

6:50 pm on Jul 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Most of these are still applicable:
[webmasterworld.com...]

silverbytes

12:06 am on Jul 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. I posted to that thread myself, but it's virtually impossible to follow a thread spanning 25 pages.

GaryK

12:34 am on Jul 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I tried to post my list but WW formatting changes it to smileys and stuff. You can look at the site in my profile for the file if you want to.

GaryK

1:59 am on Jul 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jim, I thought it was all right to refer folks to our profile page. Considering how pedantic I am about my own TOS, I don't want to push the TOS on someone else's site. Please delete my post and I'll repost it following your directions. :)

ahmedtheking

12:46 am on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So how can we get the IPs for these bots (to block that IP)? Otherwise, how do we block them? And what do they do?

wilderness

2:29 am on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



silverbytes writes:
"This is a short list"

Silver, you've listed 29 lines and called it a short list.
This portion of my htaccess is 297 lines. I'm not about to post that in its entirety in this forum or any other, in or out of WebmasterWorld.
The "Close to Perfect htaccess" thread
( [webmasterworld.com...] ) is a prime example of how ugly and long these types of threads can get when each participant begins adding their lines, as compared to what has previously been added. (You were aware of that, having participated in that thread, and you both inquired and started a similar thread.)
Perhaps I'm narrow-minded, however it doesn't make sense to me to re-invent the wheel.

ahmed writes:

"So how can we get the IPs for these bots (to block that IP)? Otherwise, how do we block them? and what do they do?"

Ahmed,
You have multiple questions:
1) IPs?
a) You either view the forum tools
[webmasterworld.com...]
or utilize links to other tools given in previous threads.
b) In most instances the preference is to deny access by UA rather than IP, to lessen the denial of innocents.
c) In the case of the 29 provided by silver, you either dig through the htaccess examples or seek an alternative source and learn how to write your own lines.
[webmasterworld.com...]

2) What do they do?
Personally, I don't have any desire to understand what they do, what they intend to do, or what they say they are going to do.
If the crawling UA contains a link to a URL which offers an explanation of their goals, and those goals ARE parallel to my website's goals, then I allow them. If NOT the same goals, they are denied.

In short: is there any benefit to the spidering of my data, and is that spidering beneficial or detrimental to my websites?
Spending time going through page after page of google searches on some asinine name, because the software maker intends to deceive webmasters, is hardly productive or a cooperative use of my time.
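For anyone taking option c) and writing their own lines, a minimal sketch of UA-based denial in .htaccess might look like the following (mod_setenvif syntax, with the Apache 1.3/2.0-era allow/deny directives current at the time; the three UA fragments come from silver's list, and a real file would carry many more):

```apache
# Flag any request whose User-Agent contains one of the banned fragments
SetEnvIfNoCase User-Agent "aipbot"    bad_bot
SetEnvIfNoCase User-Agent "BecomeBot" bad_bot
SetEnvIfNoCase User-Agent "IRLbot"    bad_bot

# Deny flagged requests, allow everyone else
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```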


The majority of my insights come from ARIN, RIPE and APNIC inquiries and direct links to IPs.

All other insights into making a determination, webmasters either learn over time (by monitoring their visitor logs) or from reading materials provided by various methods (one of which is this forum).

For ANY person to expect somebody to provide a functioning htaccess file that is compatible with two websites of entirely different content is very short-sighted.

Don

silverbytes

3:29 am on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



wilderness: as you said, 29 is short compared with 297, which is certainly still an incomplete list of threats. I don't share your opinion at all, and don't care either. However, the purpose of this thread is talking about the spiders posted here, not the long and difficult-to-read thread you mention, and you certainly didn't say much about that.

fischermx

3:44 am on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think these lists are only useful if they are provided with the proper full user agent.

Let's discuss the htaccess thing in another, more technical thread, because, for example, many IIS users would not even care about it.

This is just about sharing the known useless robots.
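On the full-user-agent point: with Apache's combined log format the UA is the sixth double-quote-delimited field, so a frequency-sorted list of complete UA strings can be pulled with a one-liner. The sample log lines below are made up; point the command at your real access.log:

```shell
# Made-up sample of a combined-format log (stand-in for your real one).
cat > access.log <<'EOF'
203.0.113.7 - - [26/Jul/2005:00:34:00 +0000] "GET / HTTP/1.0" 200 512 "-" "BecomeBot/2.0"
203.0.113.8 - - [26/Jul/2005:00:35:00 +0000] "GET /a HTTP/1.0" 200 512 "-" "BecomeBot/2.0"
203.0.113.9 - - [26/Jul/2005:00:36:00 +0000] "GET /b HTTP/1.0" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
EOF

# Split on double quotes; field 6 is the full User-Agent string.
# Sorting and counting gives each complete UA with its hit count.
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn
```

The most frequent UAs float to the top, full strings intact, ready for a list like the one opening this thread.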

balam

10:18 pm on Jul 27, 2005 (gmt 0)

10+ Year Member



> However the purpose of this thread is talking about those spiders posted

What you really mean is, "Hey, let's rehash the hundreds of threads that already cover the few spiders I mentioned, and probably contain the answers I'm looking for, but am too lazy to look for myself."

Ok, to be fair, I didn't find any quick results for "Test.Com" (a fake domain often used as an example around here) or for "Spam Bot" (often appearing as "Is this a spam bot?" around here), but for ALL the other listed spiders I found plenty of threads going back years with info, speculation, IPs, responsible companies & more, on said spiders.

Since some folk need others to make the determination as to whether a given spider is bad or not, as a public service I'd like to point out that "Googlebot", "Yahoo! Slurp" & "msnbot" are the three worst spiders you could have visiting your site. Ban them now, before it's too late and the damage is done!

Having trouble finding what you're looking for? You should check out this thread: FAQ: Additional Search Tools for WebmasterWorld [webmasterworld.com]

While humorous, the linked-to Flash presentation in the first message of this thread really should be required viewing: Posting and You... [webmasterworld.com]

wilderness

12:10 am on Jul 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Utilizing balam's link and suggestion:

A search of the Webmaster World pages at google with forum11 (this forum) returns the following:

[google.com...]

Should you desire to search for a particular bot, UA or IP, just add it after forum11 in the search box, prefixed with a +.

Some more Forum tools:

Valid Search Engine?
[webmasterworld.com...]

IIS and Global.asa
[w3schools.com...]

dbm Maps
[webmasterworld.com...]

Reduce harvests
[webmasterworld.com...] Msg#16

Throttle runaways
[webmasterworld.com...]

Block Methods (Scroll past opening Advertisements)
[diveintomark.org...]

Regular Expressions
[etext.lib.virginia.edu...]
[gnosis.cx...]

Close To Perfect I
[webmasterworld.com...]
Close To Perfect II
[webmasterworld.com...]
Close To Perfect III
[webmasterworld.com...]

Concise htaccess
[webmasterworld.com...]

robots.txt on a diet
[webmasterworld.com...]

Search Tools
[webmasterworld.com...]

silverbytes

3:07 am on Jul 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



balam: you waste your time judging others' posts, and btw your post is not only boring but also useless to the purpose of the thread.

widlerness: thanks

Test.com is still unknown to me and probably others.

wilderness

4:09 am on Jul 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



balam: you waste your time judging others' posts, and btw your post is not only boring but also useless to the purpose of the thread.

Actually!
I thought this part (below) was rather funny and, on a couple of passing thoughts, almost submitted a note to remind others that it WAS tongue-in-cheek without the emoticon.

I'd like to point out that "Googlebot", "Yahoo! Slurp" & "msnbot" are the three worst spiders you could have visiting your site. Ban them now, before it's too late and the damage is done!

BTW silver,
ALL those links came from the Close to Perfect htaccess thread :)

This forum has been primarily htaccess (in addition to SESID) at least since I've been here. (My profile says 2001, however I was previously registered under another screen name.)
I even used balam's google link to do a search on "forum11+IIS" and there wasn't much. There was an IIS inquiry in the "Close to Perfect" thread that went unanswered.

Prior to this forum going down, it was a bundle of activity, and many of the onetime participants here have not returned. (balam was a regular at one time.)

The moderated forum (no pun intended, as the forum does exist) has a lesser reach than the old forum. In the old forum, submissions were not delayed and a spider could be stopped in its tracks.
Nor was separating private from commercial IP ranges an issue in the old forum. Today, even though a private IP may be doing massive crawls, we are apparently limited by Charter, moderation and delay.
In all fairness though, the old forum was shut down because some posters were submitting their competitors as crawls.

That Bret decided to bring this forum back is commendable.

My question is:

If one of the 29 UAs you submitted came from a private IP range, how could the poster provide the IP range without violating forum rules while still sharing accurate information?

Don

volatilegx

3:23 pm on Jul 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Guess it's my turn to weigh in here.

If one of the 29 UAs you submitted came from a private IP range, how could the poster provide the IP range without violating forum rules while still sharing accurate information?

You can't. If an IP address appears to be that of a private individual, please don't post it. If you need to post it, please do not post the last group of numbers in the IP address.

This forum was closed for liability reasons and was only reopened when I promised to enforce this strict guideline.

Remember, the primary purpose of the forum is to identify search engine spiders. Identifying other types of spiders (including building ban-lists) is a secondary function of this forum.

That being said, I have no problem with starting new threads listing "bannable bots", assuming those threads don't turn into flame-fests or some such. There are pre-existing threads with this information, but it takes a lot of digging to get to the meat of the info in them because they are so long. It would be nice to have one comprehensive thread of bad-bots with no extraneous posts in it, but I am probably dreaming.

balam

6:10 pm on Jul 29, 2005 (gmt 0)

10+ Year Member



I'm only here for a couple of minutes - to see who I'm peeving off now [webmasterworld.com] - but this thread fulfills most of volatilegx's dream: WebmasterWorld forum11 Updated and Collated Bot List [webmasterworld.com]

volatilegx

8:10 pm on Jul 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, bull's list is even in the forum's library [webmasterworld.com], but it doesn't cull the 'good' bots from the 'bad'.

Of course, everybody's definitions of bad and good will be different. In my opinion, any bot attached to a public search engine (one that doesn't cache, or has opt-out caching) is good. Anything else is bad.

RailMan

8:34 am on Aug 2, 2005 (gmt 0)

10+ Year Member



it's virtually impossible to follow a 25 pages span thread

It isn't that bad!
It's an excellent thread, well worth spending a bit of time on.