
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


Marvin/1.0 - arthur4.sda.t-online.de

     
7:48 pm on Feb 5, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 19, 2000
posts:193
votes: 0


I used to know who this was.

62 hits in one minute. It looks like it did not request robots.txt.

Ideas?
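For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch that counts hits per user agent per minute and notes which agents ever requested /robots.txt. It assumes Apache's Combined Log Format; the function name `burst_report` and the 60-hits threshold are illustrative choices, not part of any existing tool.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed Combined Log Format, e.g.
# 1.2.3.4 - - [05/Feb/2001:19:47:12 +0000] "GET /a.html HTTP/1.0" 200 512 "-" "Marvin/1.0"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\S+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def burst_report(lines, threshold=60):
    """Return ({(user_agent, minute): hits} above threshold, set of agents that fetched /robots.txt)."""
    per_minute = Counter()
    fetched_robots = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in Combined Log Format
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        minute = ts.replace(second=0)  # bucket every hit into its minute
        per_minute[(m["ua"], minute)] += 1
        if m["path"] == "/robots.txt":
            fetched_robots.add(m["ua"])
    bursts = {key: n for key, n in per_minute.items() if n > threshold}
    return bursts, fetched_robots
```

Feed it the lines of an access log (for example an open file object); any agent that shows up in the bursts but not in the robots.txt set behaved like the spider described above.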

10:44 pm on Feb 5, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 19, 2000
posts:193
votes: 0


www.sda.t-online.de is the portal site of T-Online in Germany, which uses Infoseek Germany (www.infoseek.de) for search.

What is still a mystery to me is why Marvin (the Paranoid Android :) ) isn't programmed to 'ingest' sites a little less 'vigorously', i.e. to wait a little between requests.

I mailed them about their robot's behaviour; no response yet.
(No referrals either.)
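What "wait a little between requests" means for a crawler can be sketched in a few lines of Python. This is a hypothetical helper (`PoliteThrottle` is an invented name), not how Marvin or any real spider works: it remembers, per hostname, the earliest time the next request is allowed and sleeps until then. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from urllib.parse import urlsplit

class PoliteThrottle:
    """Hypothetical helper: enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = {}  # host -> earliest clock time for its next request

    def wait(self, url):
        """Block until the host of `url` may be fetched again; return the seconds slept."""
        host = urlsplit(url).hostname
        now = self.clock()
        ready_at = self.next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay:
            self.sleep(delay)
        # The next request to this host must come at least min_delay after this one
        self.next_allowed[host] = now + delay + self.min_delay
        return delay
```

A crawler that honoured robots.txt could take `min_delay` from a `Crawl-delay` line (Python's `urllib.robotparser.RobotFileParser.crawl_delay()` reads it), though that directive is a non-standard extension.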

6:06 am on Feb 6, 2001 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38071
votes: 16


62 is aggressive. There has been a lot of backroom talk about what to do about Google's and Fast's spiders. If you have multiple domains (hundreds), it isn't uncommon for Google to hit 500-1000 pages a minute. Fast used to be worse, but they started some randomization routine on the URL addition order that eliminates most of the problems. Google is still a pretty big problem.

If you have ted-bill.com, tedbill, teds-bill, teds-bills-adventure, etc., up to even a few dozen domains, Google can unleash a torrent of requests in a short time.
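Fast's fix is described only as a randomization of the URL addition order. A related, deterministic technique (plainly a different one, swapped in here for illustration) is to interleave the fetch queue round-robin across hosts, so consecutive requests land on different machines. In this Python sketch, `interleave_by_host` is a hypothetical name, and the `key` parameter decides what counts as "the same host". With dozens of domains on one box, keying by server IP rather than hostname is what actually spreads the load.

```python
from collections import deque
from urllib.parse import urlsplit

def interleave_by_host(urls, key=lambda u: urlsplit(u).hostname):
    """Reorder `urls` round-robin across hosts so consecutive fetches avoid the same host.

    `key` maps a URL to the unit being protected: the hostname by default, or
    (via a caller-supplied function) the IP address of the server behind it.
    """
    queues = {}  # dicts keep insertion order, so hosts are visited in first-seen order
    for url in urls:
        queues.setdefault(key(url), deque()).append(url)
    ordered = []
    while queues:
        for host in list(queues):  # copy the keys so exhausted hosts can be removed
            q = queues[host]
            ordered.append(q.popleft())
            if not q:
                del queues[host]
    return ordered
```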

Thanks for info on Marvin.

5:22 pm on Feb 6, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:July 28, 2000
posts:134
votes: 0


Yep, Google can take down any server in no time. They go by hosts, so whether you have a large number of domains or one domain with a large number of subdomains, they will pound the server like there is no tomorrow. A partial solution is to limit MaxClients: httpd will still be busy, but the whole server won't crash.

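Apache's MaxClients (renamed MaxRequestWorkers in Apache 2.4) caps how many requests are served at the same time; connections beyond the cap wait in the listen queue instead of spawning more processes. The Python sketch below is only an analogy, not Apache code: a `BoundedSemaphore` limits how many simulated handlers run at once, so a burst of spider hits queues up rather than exhausting the machine. `MAX_CLIENTS = 4` and `handle_request` are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CLIENTS = 4  # hypothetical cap, playing the role of Apache's MaxClients
slots = threading.BoundedSemaphore(MAX_CLIENTS)
lock = threading.Lock()
active = 0  # requests currently being served
peak = 0    # highest concurrency observed

def handle_request(i):
    """Serve one simulated request, but only while a slot is free."""
    global active, peak
    with slots:  # blocks here while MAX_CLIENTS requests are already in progress
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for the real work
        with lock:
            active -= 1
    return i

# A burst of 50 spider hits, all accepted at once by 50 worker threads
with ThreadPoolExecutor(max_workers=50) as pool:
    served = list(pool.map(handle_request, range(50)))
print(len(served), peak)
```

As the post says, the server stays busy (every request is still served, just later), but the number doing work at once never exceeds the cap.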
6:27 pm on Feb 6, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member littleman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 17, 2000
posts:2924
votes: 0


>partial solution is to limit MaxClients
Yeah, that is exactly what I did.

1:34 pm on Mar 28, 2001 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 15, 2000
posts:482
votes: 0


The problem is:
the German provider T-Online owns 25% of Infoseek.de, and Infoseek is the search engine on T-Online's homepage. Okay, but I don't think that automatically means it's a spider from Infoseek.de. It might be that a new spider is around with a connection via T-Online.
(Really, I hope that it's a spider from Infoseek, because their spider hasn't been out on its way for months.)

The data for Infoseek Sidewinder are:
#UA Infoseek Sidewinder/0.9
idefix.sda.t-online.de
195.145.119.24
#UA Infoseek Sidewinder/0.9
miraculix.sda.t-online.de
195.145.119.25

And the data for Marvin/1.0 are quite different (as I figured out):
#UA Marvin/1.0
212.184.44.10
212.184.44.13
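The fingerprints above can be turned into a quick log check. This Python sketch hard-codes only the user agents and IPs posted in this thread (so it is a snapshot from 2001, not a maintained list); `classify` and `reverse_name` are hypothetical helper names. `reverse_name` uses the standard `socket.gethostbyaddr` call to look up names like idefix.sda.t-online.de for an IP.

```python
import socket

# User agents and IPs exactly as posted in this thread (historical data)
KNOWN_SPIDERS = {
    "Infoseek Sidewinder/0.9": {"195.145.119.24", "195.145.119.25"},
    "Marvin/1.0": {"212.184.44.10", "212.184.44.13"},
}

def classify(user_agent, ip):
    """Return 'match', 'unknown-ip' or 'unknown-agent' for one log entry."""
    ips = KNOWN_SPIDERS.get(user_agent)
    if ips is None:
        return "unknown-agent"
    return "match" if ip in ips else "unknown-ip"

def reverse_name(ip):
    """Reverse DNS lookup for an IP; returns None if it has no PTR record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
```

An "unknown-ip" result for a known agent name is the interesting case: either the spider moved to new machines, or someone else is sending that user agent.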

3:33 pm on Mar 28, 2001 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 15, 2000
posts:482
votes: 0


I checked my log files, and I'd like to say:
yes, I am almost sure that it is an IS.de spider!!