Forum Moderators: open
It was suggested that someone from Mirago might like to make comment. As Head of Technology, I'm happy to oblige.
To begin with, our IP address. There are two reasons for the change in Henry's IP address. Firstly, in the last week Mirago moved from one ISP to another; henceforth you'll see Henry operating from within the address range 217.205.60.xxx to 217.205.61.xxx instead of the previous 217.33.60.xxx to 217.33.61.xxx range. Secondly, we use NAT (network address translation) for security reasons, so the externally visible addresses of individual machines within the server room will in any case change from time to time.
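For webmasters who want to verify whether a visit came from Henry's current address range, a quick sketch: the two xxx ranges quoted above each correspond to a /23 network. This is purely an illustration of the range check, not an official Mirago tool.

```python
import ipaddress

# The new range (217.205.60.xxx - 217.205.61.xxx) is the network
# 217.205.60.0/23; the old range (217.33.60.xxx - 217.33.61.xxx)
# was 217.33.60.0/23.
NEW_RANGE = ipaddress.ip_network("217.205.60.0/23")
OLD_RANGE = ipaddress.ip_network("217.33.60.0/23")

def is_henry(ip: str) -> bool:
    """Return True if the address falls within Henry's current range."""
    return ipaddress.ip_address(ip) in NEW_RANGE

print(is_henry("217.205.61.14"))  # True  (new range)
print(is_henry("217.33.60.5"))    # False (old range)
```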
Next, the name change. Henry is actually a cluster of machines rather than a single computer. The temporary name change from HenryTheMiragoRobot to ExperimentalHenryTheMiragoRobot reflects the fact that we are currently testing a new system. Certain domains have been selected for test purposes, and webmasters who see the Experimental... version are on that list.
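If you'd like to check whether either Henry variant has visited your site, a minimal sketch along these lines will do, assuming your server writes a combined-format access log with the user-agent as the final quoted field (the sample log lines below are hypothetical; only the two robot names come from the post above):

```python
import re

# Both names are taken from the post above; matching on the shorter
# one would also catch the Experimental variant, but we list both
# for clarity.
HENRY_NAMES = ("HenryTheMiragoRobot", "ExperimentalHenryTheMiragoRobot")

def henry_hits(log_lines):
    """Yield log lines whose user-agent mentions either Henry variant."""
    for line in log_lines:
        # In combined log format the user-agent is the last quoted field.
        ua_match = re.search(r'"([^"]*)"\s*$', line)
        if ua_match and any(name in ua_match.group(1) for name in HENRY_NAMES):
            yield line

# Hypothetical sample lines for illustration only.
sample = [
    '217.205.60.9 - - [.] "GET / HTTP/1.0" 200 1043 "-" "ExperimentalHenryTheMiragoRobot"',
    '10.0.0.1 - - [.] "GET / HTTP/1.0" 200 1043 "-" "Mozilla/4.0"',
]
print(len(list(henry_hits(sample))))  # 1
```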
I hope that adding the following explanation will not be deemed a violation of the posting terms here.
The experimental system is aimed at improving two aspects of Mirago's service. Firstly, the new robot design allows much greater flexibility in digesting the content of sites and improves hardware fault tolerance. Secondly, it builds a new format of searchable index to improve the user experience. In line with certain other search engines, we may choose to make a test search system available on-line for comments before actually releasing it.
I hope that this answers your questions; however, please feel free to post additional ones here and I'll endeavour to enlighten as appropriate.
Best regards
Oh and [webmasterworld.com...] :)
Many thanks for the welcome. The aforementioned testing is progressing extremely well, and you should see more evidence of the experimental system later today.
In response to NFFC's question about where we source information, I can confirm that we use information supplied by Nominet. We also take a feed from Network Solutions for .com and other domains.
We use these (and other) sources to ensure that our indexes contain the most recent sites. The question of duplicated (or aliased) domains is one that we deal with in a different way: it is not realistic to try to deduplicate the master list of domains before it enters the robot's database, so instead we focus on preventing duplicate results from being returned to searchers.
Result deduplication is an ongoing area of development. Due to the changed nature of the new (forthcoming) results format, we'll also be changing the algorithms used to detect identical or near identical results.
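One common family of techniques for spotting near-identical documents is shingling with a set-similarity measure. The sketch below is purely illustrative of that general approach and makes no claim about the algorithms Mirago actually uses; the sample documents are invented.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of word k-grams (shingles) of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |intersection| / |union| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Invented example documents: two near-duplicates and one unrelated page.
doc1 = "the apache documentation is widely mirrored across the web"
doc2 = "the apache documentation is widely mirrored across many sites"
doc3 = "a completely unrelated page about search engines"

print(jaccard(shingles(doc1), shingles(doc2)) > 0.5)   # True: near-duplicates
print(jaccard(shingles(doc1), shingles(doc3)) == 0.0)  # True: unrelated
```

A result filter built this way would drop any candidate whose similarity to an already-returned result exceeds a chosen threshold.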
Incidentally did you know that the Apache documentation is amongst the most prolifically duplicated web content available on-line?
Best regards