Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Marvin/1.0 - arthur4.sda.t-online.de
skirril




msg:405839
 7:48 pm on Feb 5, 2001 (gmt 0)

Used to know who this was.

62 hits in one minute. Looks like it did not fetch robots.txt.

Ideas?
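A rough way to confirm both observations (hits per minute, and whether robots.txt was ever requested) is to scan the access log. The sketch below assumes the Apache "combined" log format, which the thread does not actually specify; the function name `hits_per_minute` and the regular expression are illustrative, not anything the posters used.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def hits_per_minute(lines, agent_substring):
    """Count requests per minute for user agents containing agent_substring.

    Returns (Counter keyed by minute, whether /robots.txt was requested).
    """
    per_minute = Counter()
    fetched_robots = False
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or agent_substring not in m.group("agent"):
            continue
        # Timestamps look like 05/Feb/2001:19:48:07 +0000;
        # the first 17 characters identify the minute.
        per_minute[m.group("time")[:17]] += 1
        if m.group("path") == "/robots.txt":
            fetched_robots = True
    return per_minute, fetched_robots


if __name__ == "__main__":
    import sys

    # Usage: python hits.py access.log Marvin
    with open(sys.argv[1], encoding="latin-1") as log:
        counts, saw_robots = hits_per_minute(log, sys.argv[2])
    for minute, count in counts.most_common(5):
        print(minute, count)
    print("requested robots.txt:", saw_robots)
```

The five busiest minutes are printed first, so a burst like the one described above shows up at the top of the output.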

 

skirril




msg:405840
 10:44 pm on Feb 5, 2001 (gmt 0)

www.sda.t-online.de is the portal site of T-Online in Germany, which uses Infoseek Germany (www.infoseek.de) for search.

What is still a mystery to me is why Marvin (the Paranoid Android :) ) isn't programmed to 'ingest' sites a
little less 'vigorously', i.e. to wait a little between requests.

Mailed them about their robot's behaviour. No response yet.
(no referrals either)
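For what "wait a little between requests" could look like on the crawler side, here is a minimal sketch of a per-host delay. Nothing is known about how Marvin is actually built; the class name `PoliteFetcher`, the 5-second default, and the injectable `clock` and `sleep` parameters are all assumptions made for illustration.

```python
import time


class PoliteFetcher:
    """Enforce a minimum pause between successive requests to the same host."""

    def __init__(self, min_delay_seconds=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait_turn(self, host):
        """Block until min_delay has elapsed since the last request to host."""
        now = self._clock()
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_delay - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request[host] = now
```

A crawler would call `wait_turn(host)` right before each fetch. Keying the delay on host name does not help when many host names live on one machine; keying on the resolved IP address instead would be one way to cover that case.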

Brett_Tabke




msg:405841
 6:06 am on Feb 6, 2001 (gmt 0)

62 is aggressive. There has been a lot of backroom talk about what to do about Google's and Fast's spiders. If you have multiple domains (hundreds), it isn't uncommon for Google to hit 500-1000 pages a minute. Fast used to be worse, but they started some randomization routine on the URL addition order that eliminates most of the problems. Google is still a pretty big problem.

If you have ted-bill.com, tedbill, teds-bill, teds-bills-adventure, etc., up to even a few dozen domains, Google can unleash a torrent of requests in a short time.

Thanks for info on Marvin.

PeteU




msg:405842
 5:22 pm on Feb 6, 2001 (gmt 0)

Yep, Google can take down any server in no time. They go by hosts, so whether you have a large number of domains or one domain with a large number of subdomains, they will pound it like there is no tomorrow. A partial solution is to limit MaxClients: httpd will still be busy, but the whole server won't crash.

littleman




msg:405843
 6:27 pm on Feb 6, 2001 (gmt 0)

>partial solution is to limit MaxClients
Yeah, that is exactly what I did.

oLeon




msg:405844
 1:34 pm on Mar 28, 2001 (gmt 0)

The problem is:
The German provider T-online is a part owner of Infoseek.de (25%), and IS is the search engine on the T-online homepage. Okay, but I don't think that this automatically means it's a spider from IS.de. It might be that a new spider is around with a connection via T-online.
(Really, I hope that it's a spider from IS, because theirs hasn't been on its way for months.)

The data for Infoseek Sidewinder are:
#UA Infoseek Sidewinder/0.9
idefix.sda.t-online.de
195.145.119.24
#UA Infoseek Sidewinder/0.9
miraculix.sda.t-online.de
195.145.119.25

and the data for Marvin/1.0 are quite different (as far as I could figure out):
#UA Marvin/1.0
212.184.44.10
212.184.44.13
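The listing above can be turned into a small lookup table for checking log entries. This is only a sketch: the addresses and user-agent strings are the ones posted in this thread, while the table name `KNOWN_SPIDERS`, the function `identify`, and its return strings are made up for illustration. No host names were given for the Marvin addresses, so they are left as `None`.

```python
# IP address -> (user agent, host name), as listed in this thread.
KNOWN_SPIDERS = {
    "195.145.119.24": ("Infoseek Sidewinder/0.9", "idefix.sda.t-online.de"),
    "195.145.119.25": ("Infoseek Sidewinder/0.9", "miraculix.sda.t-online.de"),
    "212.184.44.10": ("Marvin/1.0", None),  # host name not given in the listing
    "212.184.44.13": ("Marvin/1.0", None),  # host name not given in the listing
}


def identify(ip, user_agent):
    """Compare a logged (ip, user_agent) pair against the table above."""
    entry = KNOWN_SPIDERS.get(ip)
    if entry is None:
        return "unknown address"
    expected_agent, _hostname = entry
    if user_agent.startswith(expected_agent):
        return "match"
    # The address is listed, but the agent string differs from the recorded one.
    return "known address, unexpected user agent"
```

An "unexpected user agent" result only says the agent string does not match what was recorded here; user-agent strings are easy to fake, so a match on the address is the stronger signal.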

oLeon




msg:405845
 3:33 pm on Mar 28, 2001 (gmt 0)

Checked my logfiles, and I'd like to say:
Yes, I am almost sure that it is an IS.de spider!!

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved