Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.myfamilyinc.com)
Took disallowed files
GaryK




msg:3019555
 10:39 pm on Jul 23, 2006 (gmt 0)

Mozilla/4.0 (compatible; MyFamilyBot/1.0; [myfamilyinc.com...] )
66.43.16.199
nat.myfamilyinc.com

Read robots.txt but still took a disallowed file from each of my sites.

This is apparently the parent company behind Ancestry.com and other such sites.

[edited by: volatilegx at 12:55 am (utc) on July 24, 2006]
[edit reason] fixed broken link - added space between URL and closing parens [/edit]
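
For anyone who wants to confirm that a fetch really was disallowed, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the requested paths are made up for illustration (the thread does not show the actual file or log lines), and `disallowed_hits` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file is not shown in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# User agent string as reported in the thread title.
UA = "Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.myfamilyinc.com)"


def disallowed_hits(robots_txt, user_agent, requested_paths):
    """Return the requested paths that robots.txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in requested_paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # Paths as they might be pulled from an access log (illustrative only).
    for path in disallowed_hits(ROBOTS_TXT, UA, ["/", "/private/a.html"]):
        print("disallowed fetch:", path)
```

Feed it the paths the bot requested according to your own logs; anything it returns was fetched against the rules in the file you supply.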

 

Ocean10000




msg:3034581
 5:53 pm on Aug 4, 2006 (gmt 0)

Just seen this today.
No Robots.txt file requested.
Looked at the homepage and was greeted with 403 Permission Denied. No further requests were made.
Also came from 66.43.16.199.

aeronautic




msg:3034986
 2:24 am on Aug 5, 2006 (gmt 0)

Just hit me for the first time. Got the robots.txt file and the / page.

Why they provide zero information on their website only they know for sure... grrr.

GaryK




msg:3035366
 3:00 pm on Aug 5, 2006 (gmt 0)

Dan,

I see you edited the user agent. I'm not sure why as it's now a different user agent than the one I posted.

Mozilla/4.0 (compatible; MyFamilyBot/1.0; [myfamilyinc.com)...]

That's what I saw in my logs. There is no space after the URL. :)

wilderness




msg:3035404
 3:50 pm on Aug 5, 2006 (gmt 0)

The real problem with the genealogy sites is that data is accumulated to be sold on CDs.
I have some family members who are heavily involved in family history, and they have reported that some of the data from within their databases (after using similar sites for free hosting) has been included in CDs sold to third parties.

From my point of view and regarding my widgets ;)
I've had some unique contacts and correspondence result from folks who found older items on my pages while tracing family genealogy.

Last night when this thread came up, I was most sure that I had a portion of MyFamily's ranges denied. I was unable to find any deny based on UA or IP.
Possibly I did have it denied at one time and removed it?

Has anybody seen an extensive crawl, or is the bot only hitting a page or two (perhaps a specific page they previously marked and crawled)?

Don
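
A deny based on UA or IP, as mentioned above, could be sketched like this in Python. It only uses the single address and the bot token reported in this thread; any wider MyFamily range would be a guess, so none is listed. `should_deny`, `BLOCKED_UA_TOKENS`, and `BLOCKED_NETWORKS` are hypothetical names, not anyone's actual configuration.

```python
import ipaddress

# Values reported in this thread. A wider network range would be an assumption.
BLOCKED_UA_TOKENS = ("MyFamilyBot",)
BLOCKED_NETWORKS = (ipaddress.ip_network("66.43.16.199/32"),)


def should_deny(remote_ip, user_agent):
    """Return True if the request matches a blocked UA token or blocked network."""
    ua = (user_agent or "").lower()
    if any(token.lower() in ua for token in BLOCKED_UA_TOKENS):
        return True
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Unparseable address: fall back to the UA decision only.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Matching on either condition catches the bot if it changes its UA but keeps the address, or the other way around.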

aeronautic




msg:3035696
 11:26 pm on Aug 5, 2006 (gmt 0)

I wrote to them via their posted corporate contact e-mail:

MyFamily.com, Inc. Corporate <---
360 West 4800 North
Provo, UT 84604
Phone: 801-705-7000
FAX: 801-705-7001
pr *--AT--* myfamilyinc.com (hate spambots though their site posts this in the open)

and the response I got back today was:

Your message

To: PR
Subject: MyFamilyBot/1.0
Sent: Fri, 4 Aug 2006 20:34:52 -0600

was deleted without being read on Sat, 5 Aug 2006 16:58:03 -0600

A user with the name "A Hathaway" was the final recipient I believe (first name reduced to single letter by me for privacy).

FYI: The UA that hit me was:

Agent: Mozilla/4.0 (compatible; MyFamilyBot/1.0; http:// www myfamilyinc com) (I pulled the .'s to prevent a link)

host: nat.myfamilyinc.com which resolves to 66.43.16.199

Black hat or unprofessional? Hard to tell at this point - a toss-up.

I've not seen them back though. Yet.
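
Since the host and IP above match up, a forward-confirmed reverse DNS check is one way to verify that a visitor claiming this UA really comes from that network. The following is only a sketch: `forward_confirmed` is a hypothetical helper, the `reverse` and `forward` parameters exist so the lookups can be swapped out for testing, and the DNS records may well have changed since this thread was written.

```python
import socket


def forward_confirmed(ip, expected_suffix,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Check that ip reverse-resolves under expected_suffix and that the
    resulting hostname resolves back to the same ip."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
        addresses = forward(hostname)[2]   # (hostname, aliases, addresses)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
    host = hostname.rstrip(".").lower()
    suffix = expected_suffix.lower()
    if host != suffix and not host.endswith("." + suffix):
        return False
    return ip in addresses


if __name__ == "__main__":
    # Performs live DNS lookups; the result depends on current records.
    print(forward_confirmed("66.43.16.199", "myfamilyinc.com"))
```

The forward lookup matters because anyone who controls reverse DNS for their own address space can make it claim any hostname.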

Ocean10000




msg:3035797
 3:18 am on Aug 6, 2006 (gmt 0)

After digging through my First Sightings log files,
I found the following header, which I assume is where they want everyone to email problems, etc.

From: SearchBot@myfamilyinc.com


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved