

Search Engine Spider and User Agent Identification Forum

    
Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.myfamilyinc.com)
Took disallowed files
GaryK

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3019553 posted 10:39 pm on Jul 23, 2006 (gmt 0)

Mozilla/4.0 (compatible; MyFamilyBot/1.0; [myfamilyinc.com...] )
66.43.16.199
nat.myfamilyinc.com

Read robots.txt but still took a disallowed file from each of my sites.

This is apparently the parent company behind Ancestry.com and other such sites.
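To confirm that a fetched file really was off-limits to this bot, the robots.txt rules can be replayed offline. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and the paths are hypothetical, since the actual rules on the affected sites are not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# robotparser matches on the part of the agent string before the first "/",
# so the bot's product token is passed rather than the full
# "Mozilla/4.0 (compatible; MyFamilyBot/1.0; ...)" string.
BOT_TOKEN = "MyFamilyBot"


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html"):
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS_TXT, BOT_TOKEN, path) else "disallowed"
        print(path, verdict)
```

Any logged request from the bot for a path that this check reports as disallowed is a robots.txt violation under these assumed rules.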

[edited by: volatilegx at 12:55 am (utc) on July 24, 2006]
[edit reason] fixed broken link - added space between URL and closing parens [/edit]

 

Ocean10000

WebmasterWorld Administrator 10+ Year Member



 
Msg#: 3019553 posted 5:53 pm on Aug 4, 2006 (gmt 0)

Just saw this today.
No robots.txt file requested.
Looked at the homepage and was greeted with a 403 Permission Denied. No further requests were made.
Also came from 66.43.16.199.
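How a 403 like this gets decided varies by server. As a rough sketch only, a deny check on user agent or IP could look like the Python below. The deny lists are hypothetical examples built from the values reported in this thread, not anyone's actual configuration.

```python
from ipaddress import ip_address, ip_network

# Hypothetical deny rules for illustration only.
DENIED_UA_SUBSTRINGS = ("myfamilybot",)
DENIED_NETWORKS = (ip_network("66.43.16.199/32"),)


def response_status(remote_ip: str, user_agent: str) -> int:
    """Return 403 if the request matches a deny rule, otherwise 200."""
    ua = user_agent.lower()
    # Case-insensitive substring match on the User-Agent header.
    if any(token in ua for token in DENIED_UA_SUBSTRINGS):
        return 403
    # Match the client address against each denied network.
    addr = ip_address(remote_ip)
    if any(addr in net for net in DENIED_NETWORKS):
        return 403
    return 200
```

Matching on both fields means the block still applies if the bot changes its user agent string but keeps the same address, or the other way around.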

aeronautic

5+ Year Member



 
Msg#: 3019553 posted 2:24 am on Aug 5, 2006 (gmt 0)

Just hit me for the first time. Got the robots.txt file and the / page.

Why they provide zero information on their website only they know for sure... grrr.

GaryK

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3019553 posted 3:00 pm on Aug 5, 2006 (gmt 0)

Dan,

I see you edited the user agent. I'm not sure why as it's now a different user agent than the one I posted.

Mozilla/4.0 (compatible; MyFamilyBot/1.0; [myfamilyinc.com)...]

That's what I saw in my logs. There is no space after the URL. :)

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3019553 posted 3:50 pm on Aug 5, 2006 (gmt 0)

The real problem with the genealogy sites is that data is accumulated to be sold on CDs.
I have some family members who are heavily involved in family history, and they have reported that some of the data from their databases (after using similar sites for free hosting) has been included in CDs sold to third parties.

From my point of view and regarding my widgets ;)
I've had some unique contacts and correspondence result from folks who found older items on my pages while tracing family genealogy.

Last night when this thread came up, I was most sure that I had a portion of MyFamily's ranges denied. I was unable to find any deny based on UA or IP.
Possibly I did have it denied at one time and removed it?

Has anybody seen extensive crawling, or is the bot only hitting a page or two (perhaps a specific page they previously marked and crawled)?

Don
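One way to answer the "page or two versus extensive crawl" question is to count the paths a given agent requested in the access log. This is a minimal sketch that assumes the server writes the common "combined" log format; the sample lines are made up for illustration and are not taken from anyone's real logs.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "method path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines for illustration only.
SAMPLE = [
    '66.43.16.199 - - [05/Aug/2006:02:24:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" '
    '"Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.myfamilyinc.com)"',
    '66.43.16.199 - - [05/Aug/2006:02:24:01 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.myfamilyinc.com)"',
    '192.0.2.10 - - [05/Aug/2006:02:25:00 +0000] "GET /about.html HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0"',
]


def paths_fetched(lines, agent_substring):
    """Count requested paths for log lines whose user agent contains agent_substring."""
    needle = agent_substring.lower()
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts


if __name__ == "__main__":
    for path, hits in paths_fetched(SAMPLE, "MyFamilyBot").most_common():
        print(hits, path)
```

A short list of paths points to a spot check, while hundreds of distinct paths would indicate a full crawl.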

aeronautic

5+ Year Member



 
Msg#: 3019553 posted 11:26 pm on Aug 5, 2006 (gmt 0)

I wrote to them via their posted corporate contact e-mail:

MyFamily.com, Inc. Corporate <---
360 West 4800 North
Provo, UT 84604
Phone: 801-705-7000
FAX: 801-705-7001
pr *--AT--* myfamilyinc.com (hate spambots though their site posts this in the open)

and the response I got back today was:

Your message

To: PR
Subject: MyFamilyBot/1.0
Sent: Fri, 4 Aug 2006 20:34:52 -0600

was deleted without being read on Sat, 5 Aug 2006 16:58:03 -0600

A user with the name "A Hathaway" was the final recipient I believe (first name reduced to single letter by me for privacy).

FYI: The UA that hit me was:

Agent: Mozilla/4.0 (compatible; MyFamilyBot/1.0; http:// www myfamilyinc com) (I pulled the .'s to prevent a link)

host: nat.myfamilyinc.com which resolves to 66.43.16.199

Black hat or unprofessional? Hard to tell at this point - a toss-up.

I've not seen them back though. Yet.
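The host and address pairing reported above can be double-checked with a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, then resolve that hostname back and see whether the original IP is among the results. This is a minimal sketch using Python's standard `socket` module. The injectable lookup parameters are an assumption added so the logic can be exercised without network access, and DNS records from 2006 may no longer exist.

```python
import socket


def forward_confirmed(ip, reverse_lookup=None, forward_lookup=None):
    """Return the reverse-DNS hostname if it resolves back to ip, else None.

    reverse_lookup and forward_lookup are optional hooks (hypothetical,
    for testing); by default the system resolver is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        addresses = forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return host if ip in addresses else None


if __name__ == "__main__":
    print(forward_confirmed("66.43.16.199"))
```

A user agent string is trivial to fake, so a check like this is generally a more reliable way to tie a visit to the claimed operator than the agent name alone.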

Ocean10000

WebmasterWorld Administrator 10+ Year Member



 
Msg#: 3019553 posted 3:18 am on Aug 6, 2006 (gmt 0)

After digging through my First Sightings log files,
I found the following header, which I assume is where they want everyone to email problems, etc.

From: SearchBot@myfamilyinc.com


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved