
Forum Moderators: Ocean10000 & incrediBILL


Site friendly spiders : the list

     
5:41 pm on Jun 12, 2001 (gmt 0)

brett_tabke (WebmasterWorld Administrator)



Friendly Spider:
Obeys robots.txt.
Does not request more than one page per minute.
Visits in low traffic hours or reduces requests during peak traffic hours.
Has info on its site about the spider.
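The rate rule above is easy to check against your own logs. Here is a minimal sketch in Python, assuming you have already pulled one spider's request times out of the log. The function name, the threshold constant and the two sample spiders are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed threshold, taken from the rule above: one page per minute.
MIN_GAP = timedelta(minutes=1)

def is_rate_friendly(request_times, min_gap=MIN_GAP):
    """Return True if no two consecutive requests are closer than min_gap."""
    ordered = sorted(request_times)
    return all(later - earlier >= min_gap
               for earlier, later in zip(ordered, ordered[1:]))

# Hypothetical request times for two made-up spiders.
polite = [datetime(2001, 6, 12, 3, 0, 0) + timedelta(minutes=2 * i) for i in range(5)]
greedy = [datetime(2001, 6, 12, 14, 0, 0) + timedelta(seconds=i) for i in range(5)]

print("polite spider friendly:", is_rate_friendly(polite))
print("greedy spider friendly:", is_rate_friendly(greedy))
```

This only covers the rate rule. Whether a spider obeys robots.txt or backs off at peak hours needs a separate check.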

The List
Slurp : Inktomi.com
GoogleBot : google.com
Scooter : altavista.com
DirectHit : directhit.com
Fast : alltheweb.com
teoma : teoma.com
ArchitextSpider : excite.com
Gulliver : northernlight.com
T-Rex : Lycos.com

6:22 pm on Jun 12, 2001 (gmt 0)

littleman (WebmasterWorld Senior Member)



I'd take Slurp out, especially the Japanese-based bots. They will rip through a C-name layout very aggressively. Here is an example. All page requests go to hostnames like these:
category1.domain.com
category2.domain.com

The bot is 202.212.5.32 -> goo311.inktomi.com
The requests come in like this:
at 1:59:58 PM on Monday, June 9, 2001
at 2:00:00 PM on Monday, June 9, 2001
at 2:00:01 PM on Monday, June 9, 2001
at 2:00:02 PM on Monday, June 9, 2001
at 2:00:02 PM on Monday, June 9, 2001
at 2:00:03 PM on Monday, June 9, 2001
at 2:00:04 PM on Monday, June 9, 2001
at 2:00:06 PM on Monday, June 9, 2001
at 2:00:06 PM on Monday, June 9, 2001
at 2:00:08 PM on Monday, June 9, 2001
at 2:00:09 PM on Monday, June 9, 2001
at 2:00:10 PM on Monday, June 9, 2001
at 2:00:11 PM on Monday, June 9, 2001
at 2:00:12 PM on Monday, June 9, 2001
at 2:00:14 PM on Monday, June 9, 2001
at 2:00:15 PM on Monday, June 9, 2001
at 2:00:16 PM on Monday, June 9, 2001
and on, and on...

That adds up to tens of thousands of requests per day per server.
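If you want to put numbers on a burst like the one above, here is a rough Python sketch that parses lines in that same "at ... on ..." format and prints the gap between consecutive requests. It assumes the log format is exactly as pasted and an English locale. The weekday is skipped rather than parsed, and the function names are made up for illustration.

```python
import re
from datetime import datetime

# First few lines copied from the log above.
LOG = """\
at 1:59:58 PM on Monday, June 9, 2001
at 2:00:00 PM on Monday, June 9, 2001
at 2:00:01 PM on Monday, June 9, 2001
at 2:00:02 PM on Monday, June 9, 2001
at 2:00:02 PM on Monday, June 9, 2001
"""

# Capture the clock time and the date; the weekday name is matched but ignored.
LINE = re.compile(r"at (\d{1,2}:\d{2}:\d{2} [AP]M) on \w+, (\w+ \d{1,2}, \d{4})")

def parse_times(text):
    """Turn each matching log line into a datetime."""
    times = []
    for match in LINE.finditer(text):
        clock, date = match.groups()
        times.append(datetime.strptime(f"{clock} {date}", "%I:%M:%S %p %B %d, %Y"))
    return times

def gaps_in_seconds(times):
    """Seconds between each request and the one before it."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

times = parse_times(LOG)
gaps = gaps_in_seconds(times)
span = (times[-1] - times[0]).total_seconds()
print(f"{len(times)} requests in {span:.0f} seconds, gaps: {gaps}")
```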

6:58 pm on Jun 12, 2001 (gmt 0)

WebmasterWorld Senior Member



I'm going to have to second Littleman. I've been getting swamped by Inktomi as well. I went ahead and added robots.txt files on a few domains, and that stopped most of the Slurps except one: Slurp/cat

This version is like a virus. Sometimes it will grab pages on one domain with only 2 seconds in between, BUT the requests (Slurp/cat) come from multiple IPs on the same Inktomi C-block. Just goes to show how much they coordinate with one another. Unless they are running off separate lists of URLs from their dozen or so databases. But even so, that could still clog up a server.

So for two weeks or more this lil bugger wouldn't even glance at the robots.txt file. After that, if you don't have it disallowed in the robots.txt file, it takes a lil break for a week and starts all over again.
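For anyone who wants to double-check that their robots.txt really does shut Slurp out, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content is an assumed example, not the file from any real domain, and "Slurp/cat" is used as the user-agent string only because that is how the bot is named above. It shows what a compliant spider should conclude from the file, which says nothing about a bot that never reads it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt that blocks Slurp from the whole site.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hostname taken from the example earlier in the thread.
url = "http://category1.domain.com/"

# The parser matches on the part of the user-agent before the slash.
slurp_allowed = parser.can_fetch("Slurp/cat", url)
googlebot_allowed = parser.can_fetch("GoogleBot", url)

print("Slurp/cat allowed:", slurp_allowed)
print("GoogleBot allowed:", googlebot_allowed)
```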

7:56 pm on Jun 12, 2001 (gmt 0)

brett_tabke (WebmasterWorld Administrator)



OK, debatable. Slurp goes on the gray list, since we don't have a lot of choice if we want Inktomi traffic.

Who else is in on the Friendly list?

 
