Forum Moderators: open
It was suggested that someone from Mirago might like to make comment. As Head of Technology, I'm happy to oblige.
To begin with, our IP address. There are two reasons for the change in Henry's IP address. Firstly, in the last week Mirago moved from one ISP to another; henceforth you'll see Henry operating from within the address range 217.205.60.xxx to 217.205.61.xxx instead of the previous 217.33.60.xxx to 217.33.61.xxx range. Secondly, we use NAT (network address translation) for security reasons, so the externally visible addresses of individual machines within the server room will in any case change from time to time.
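For webmasters who want to verify whether a visit came from Henry's current address range, a quick sketch: the two xxx ranges quoted above each correspond to a /23 network. This is purely an illustration of the range check, not an official Mirago tool.

```python
import ipaddress

# The new range (217.205.60.xxx - 217.205.61.xxx) is the network
# 217.205.60.0/23; the old range (217.33.60.xxx - 217.33.61.xxx)
# was 217.33.60.0/23.
NEW_RANGE = ipaddress.ip_network("217.205.60.0/23")
OLD_RANGE = ipaddress.ip_network("217.33.60.0/23")

def is_henry(ip: str) -> bool:
    """Return True if the address falls within Henry's current range."""
    return ipaddress.ip_address(ip) in NEW_RANGE

print(is_henry("217.205.61.14"))  # True  (new range)
print(is_henry("217.33.60.5"))    # False (old range)
```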
Next, the name change. Henry is actually a cluster of machines rather than a single computer. The temporary name change from HenryTheMiragoRobot to ExperimentalHenryTheMiragoRobot reflects the fact that we are currently testing a new system. Certain domains have been selected for test purposes, and webmasters who see the Experimental... version are on that list.
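If you'd like to check whether either Henry variant has visited your site, a minimal sketch along these lines will do, assuming your server writes a combined-format access log with the user-agent as the final quoted field (the sample log lines below are hypothetical; only the two robot names come from the post above):

```python
import re

# Both names are taken from the post above; matching on the shorter
# one would also catch the Experimental variant, but we list both
# for clarity.
HENRY_NAMES = ("HenryTheMiragoRobot", "ExperimentalHenryTheMiragoRobot")

def henry_hits(log_lines):
    """Yield log lines whose user-agent mentions either Henry variant."""
    for line in log_lines:
        # In combined log format the user-agent is the last quoted field.
        ua_match = re.search(r'"([^"]*)"\s*$', line)
        if ua_match and any(name in ua_match.group(1) for name in HENRY_NAMES):
            yield line

# Hypothetical sample lines for illustration only.
sample = [
    '217.205.60.9 - - [.] "GET / HTTP/1.0" 200 1043 "-" "ExperimentalHenryTheMiragoRobot"',
    '10.0.0.1 - - [.] "GET / HTTP/1.0" 200 1043 "-" "Mozilla/4.0"',
]
print(len(list(henry_hits(sample))))  # 1
```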
I hope that adding the following explanation will not be deemed a violation of the posting terms here.
The experimental system is aimed at improving two aspects of Mirago's service. Firstly, the new robot design allows much greater flexibility in digesting the content of sites and improves hardware fault tolerance. Secondly, it builds a new format of searchable index to improve the user experience. In line with certain other search engines, we may choose to make a test search system available on-line for comments before actually releasing it.
I hope that this answers your questions; however, please feel free to post additional ones here and I'll endeavour to enlighten as appropriate.
Best regards
Oh and [webmasterworld.com...] :)
Many thanks for the welcome. The aforementioned testing is progressing extremely well, and you should see more evidence of the experimental system later today.
In response to NFFC's question about where we source information, I can confirm that we use information supplied by Nominet. We also take a feed from Network Solutions for .com and other domains.
We use these (and other) sources to ensure that our indexes contain the most recent sites. The question of duplicated (or aliased) domains is one that we deal with in a different way: it is not realistic to try to deduplicate the master list of domains before it enters the robot's database, so instead we focus on preventing duplicate results from being returned to searchers.
Result deduplication is an ongoing area of development. Due to the changed nature of the new (forthcoming) results format, we'll also be changing the algorithms used to detect identical or near identical results.
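One common family of techniques for spotting near-identical documents is shingling with a set-similarity measure. The sketch below is purely illustrative of that general approach and makes no claim about the algorithms Mirago actually uses; the sample documents are invented.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of word k-grams (shingles) of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |intersection| / |union| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Invented example documents: two near-duplicates and one unrelated page.
doc1 = "the apache documentation is widely mirrored across the web"
doc2 = "the apache documentation is widely mirrored across many sites"
doc3 = "a completely unrelated page about search engines"

print(jaccard(shingles(doc1), shingles(doc2)) > 0.5)   # True: near-duplicates
print(jaccard(shingles(doc1), shingles(doc3)) == 0.0)  # True: unrelated
```

A result filter built this way would drop any candidate whose similarity to an already-returned result exceeds a chosen threshold.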
Incidentally did you know that the Apache documentation is amongst the most prolifically duplicated web content available on-line?
Best regards