
Search Engine Spider and User Agent Identification Forum

    
Marvin/1.0 - arthur4.sda.t-online.de
skirril

10+ Year Member



 
Msg#: 367 posted 7:48 pm on Feb 5, 2001 (gmt 0)

I used to know who this was.

62 hits in one minute. It looks like it did not request robots.txt.

Ideas?
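Both observations (hits per minute, and whether robots.txt was ever requested) can be checked straight from the access log. Here is a minimal Python sketch, assuming the server writes Apache combined log format; the function name `summarize_agent` and the regular expression are illustrative, not something from the original post.

```python
import re
from collections import Counter

# Assumed layout (Apache combined log format):
# host ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_agent(lines, agent_substring):
    """Return (hits per minute, robots.txt requested?) for one user agent."""
    per_minute = Counter()
    fetched_robots = False
    for line in lines:
        m = LOG_RE.match(line)
        if not m or agent_substring not in m.group("agent"):
            continue
        # "05/Feb/2001:19:47:12 +0000" -> "05/Feb/2001:19:47" (minute bucket)
        per_minute[m.group("time")[:17]] += 1
        if m.group("path") == "/robots.txt":
            fetched_robots = True
    return per_minute, fetched_robots
```

Calling `summarize_agent(open("access.log"), "Marvin")` (the file name is a placeholder) would give the busiest minutes via `per_minute.most_common()` and a flag telling you whether the robot ever asked for robots.txt.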

 

skirril

10+ Year Member



 
Msg#: 367 posted 10:44 pm on Feb 5, 2001 (gmt 0)

www.sda.t-online.de is the portal site of t-online in Germany, which uses Infoseek Germany (www.infoseek.de) to search.

What is still a mystery to me is why Marvin (the Paranoid Android :) ) isn't programmed to 'ingest' the sites a little less 'vigorously', i.e. wait a little between requests.

Mailed them about their robot's behaviour. No response yet.
(No referrals either.)
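For what "wait a little between requests" could look like on the robot's side, here is a rough Python sketch. It is not Marvin's code (nobody here has seen that); the class name `PoliteDelay` and the per-host bookkeeping are assumptions made for illustration.

```python
import time


class PoliteDelay:
    """Enforce a minimum pause between consecutive requests to the same host."""

    def __init__(self, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.sleep = sleep
        self.last_request = {}      # host -> time the previous request was released

    def wait_turn(self, host):
        """Sleep until min_delay has passed since the last request to host.

        Returns the number of seconds actually waited.
        """
        now = self.clock()
        last = self.last_request.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay - (now - last)
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        self.last_request[host] = now + waited
        return waited
```

A crawler would call `wait_turn(host)` right before each fetch. With a delay of even a few seconds per host, 62 hits in one minute on a single site could not happen.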

Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 367 posted 6:06 am on Feb 6, 2001 (gmt 0)

62 is aggressive. There has been a lot of backroom talk about what to do about Google's and Fast's spiders. If you have multiple domains (hundreds), it isn't uncommon for Google to hit 500-1000 pages a minute. Fast used to be worse, but they started some randomization routine on the URL addition order that eliminates most of the problems. Google is still a pretty big problem.

If you have ted-bill.com, tedbill, teds-bill, teds-bills-adventure, and so on, up to even a few dozen domains, Google can unleash a torrent of requests in a short time.
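Fast's actual routine isn't described anywhere in this thread, so the following is only a guess at the general idea: if URLs are queued in the order they were discovered, related domains end up next to each other and get fetched in a burst, while shuffling the queue spreads them out. A tiny Python sketch, with `randomized_crawl_order` as a made-up name:

```python
import random


def randomized_crawl_order(urls, seed=None):
    """Return a shuffled copy of the queue.

    Pages from one site then tend to be scattered through the queue
    instead of sitting in one block. The input list is left untouched.
    """
    order = list(urls)
    random.Random(seed).shuffle(order)
    return order
```

Shuffling alone gives no guarantee for a single server that hosts many domains; it just makes long back-to-back runs less likely.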

Thanks for info on Marvin.

PeteU

10+ Year Member



 
Msg#: 367 posted 5:22 pm on Feb 6, 2001 (gmt 0)

Yep, Google can take down any server in no time. They go by hosts, so whether you have a large number of domains or one domain with a large number of subdomains, they will pound like there is no tomorrow. A partial solution is to limit MaxClients: httpd will still be busy, but the whole server won't crash.

littleman

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 367 posted 6:27 pm on Feb 6, 2001 (gmt 0)

>partial solution is to limit MaxClients
Yeah, that is exactly what I did.

oLeon

10+ Year Member



 
Msg#: 367 posted 1:34 pm on Mar 28, 2001 (gmt 0)

The problem is:
The German provider T-online is the owner of Infoseek.de (25%), and IS is the search engine on the T-online homepage. Okay, but I don't think this automatically means it's a spider from IS.de. It might be that a new spider is around with a connection via T-online.
(Really, I hope it's a spider from IS, because theirs hasn't been on the move for months.)

The data for Infoseek Sidewinder are:
#UA Infoseek Sidewinder/0.9
idefix.sda.t-online.de
195.145.119.24
#UA Infoseek Sidewinder/0.9
miraculix.sda.t-online.de
195.145.119.25

And the data for Marvin/1.0 are quite different (as I figured out):
#UA Marvin/1.0
212.184.44.10
212.184.44.13
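If you want to flag these two robots in your own logs, the values above can be dropped into a small lookup. A Python sketch, assuming the user agents and addresses listed in this post are still accurate; the names `KNOWN_SPIDERS` and `identify_spider` are made up for this example.

```python
# User agents and IP addresses exactly as listed in this post.
KNOWN_SPIDERS = {
    "Infoseek Sidewinder/0.9": {"195.145.119.24", "195.145.119.25"},
    "Marvin/1.0": {"212.184.44.10", "212.184.44.13"},
}


def identify_spider(user_agent, remote_ip):
    """Return (spider name, ip_matches) for a listed spider, else (None, False).

    ip_matches is False when the user agent is known but the request
    comes from an address that is not in the list above.
    """
    for name, ips in KNOWN_SPIDERS.items():
        if user_agent.startswith(name):
            return name, remote_ip in ips
    return None, False
```

A known user agent coming from an unlisted address may be a new crawler machine or somebody faking the UA string, so that case is worth a closer look in the logs.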

oLeon

10+ Year Member



 
Msg#: 367 posted 3:33 pm on Mar 28, 2001 (gmt 0)

I checked my logfiles, and I'd like to say:
Yes, I am almost sure that it is an IS.de spider!!


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved