If we are talking about the difference between 1/10th second and 3 seconds, then I don't suppose it matters much at all and the quality would be more important.
Relevant results: 35.20%
Well Organised Design (good overview and easy to use): 29.60%
A lot of results: 12.00%
Speed: 8.80%
Not too many ads: 8.80%
Advanced search options: 2.40%
Good looks: 1.60%
Not too many results: 0.80%
Checked out your profile site: very nice :-)
I’ve actually set up a meta search engine already, not sure if I’m allowed to mention it on here.
There’s one problem: because it searches 8 engines, it takes a while to bring back the results. I’m thinking about setting up a search engine that only searches 1 engine, so it brings back the results much quicker.
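For what it's worth, a meta search does not have to wait for the slowest of its engines. A rough sketch of the idea, assuming the engines can be queried concurrently: send all the queries at once and keep whatever comes back before a deadline. The `query_engine` function, the engine names and the delays below are made-up stand-ins for real HTTP requests, not anyone's actual setup.

```python
import asyncio

async def query_engine(name, delay, query):
    # Hypothetical stand-in for a real HTTP request to one engine;
    # `delay` simulates that engine's response time in seconds.
    await asyncio.sleep(delay)
    return [f"{name}: result for {query}"]

async def meta_search(query, engines, timeout):
    """Query every engine at once; keep only answers that arrive within `timeout`."""
    tasks = [asyncio.create_task(query_engine(name, delay, query))
             for name, delay in engines.items()]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # stop waiting for the slow engines
    results = []
    for task in done:
        results.extend(task.result())
    return sorted(results)

def run_demo():
    # Assumed response times: two quick engines and one very slow one.
    engines = {"fast": 0.01, "medium": 0.05, "slow": 2.0}
    return asyncio.run(meta_search("uk hotels", engines, timeout=0.5))
```

With this approach the total wait is capped by the timeout rather than by the slowest engine, at the cost of sometimes dropping that engine's results.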
[edited by: IanTurner at 9:55 am (utc) on Jan. 19, 2004]
I very much doubt that any engine will let you use their data without either presenting their PPC adverts with it or paying them a fee per thousand searches.
Apologies if you are already aware of that; I just get the impression that you weren't, and I don't want you to receive any nasty legal letters from them in a couple of months :-/
Altavista and AlltheWeb are your best bets for an engine I would think, although I am unsure of their long term future.
Did you have something specific in mind?
However, I think that the world, and the UK, needs more original search facilities - something new. I say that as someone who also owns a meta search engine, as well as a couple of other engines that reuse other engines' results.
If you are going to spend the time and money developing something, come up with something new. There are over 50 UK search engines already doing the affiliate PPC/backfill thing now and the number is growing each month. If you want to get users, you will need something new and clever.
Hope you don't mind my butting in but I have a question that from your posts around the forums I think you could well help me with.
We run a UK directory (originally a mixture of DMOZ and 3 existing specialised directories we already run), now building through submissions. It seems to be going quite well; we've done a lot of cleaning and we're currently at around 140,000 UK specific sites. We have concentrated on building SE friendly directory pages and have about 28,000 indexed by Google and almost 100,000 by Teoma, so referrals are growing well.
As you say, a lot of people are just serving Espotting/Inktomi and Mirago results in the UK and we're keen to do something a bit different.
One of the things we're looking at is a complete spidering UK engine (along the lines of Mirago); however, there is a huge overhead in terms of technology. We thought initially we might continue to allow site submissions to the directory - and then spider each of the submitted sites to pull more information. Then maybe move to a complete spidering engine in 6-12 months after we are more au fait with the issues.
My question (to everybody else as well) is: how receptive do you think the UK market currently is to a new spidering engine or directory/spider hybrid? From reading a lot of forums it seems people are getting a little sick of yet another Espotting affiliate and maybe are willing to look at something new and unique?
thanks in advance
f
The advantage of this approach is that you have total control - all sites in the crawled index have been approved by an editor, and can just as easily be removed from the index for spamming. You just have to be careful that the directory doesn't become top heavy with business sites and not enough non-commercials.
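To make the "total control" point concrete, here is a minimal sketch of a crawler step that only follows links on editor-approved hosts. The function name `in_scope_links` and the example hosts are illustrative assumptions; a real crawler would also need robots.txt handling, politeness delays and so on.

```python
from urllib.parse import urljoin, urlparse

def in_scope_links(page_url, hrefs, approved_hosts):
    """Resolve the links found on a page and keep only those on approved hosts."""
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)      # resolve relative links
        parts = urlparse(absolute)
        host = (parts.hostname or "").lower()
        # Ignore mailto:, javascript: etc. and anything an editor has not approved.
        if parts.scheme in ("http", "https") and host in approved_hosts:
            kept.append(absolute)
    return kept

# Hypothetical approved list, as maintained by the directory editors.
approved = {"www.example.co.uk"}
links = in_scope_links(
    "http://www.example.co.uk/a/",
    ["b.html", "http://spam.example.com/", "mailto:x@example.co.uk", "/c"],
    approved,
)
```

Removing a spammer then comes down to deleting the host from `approved` (and purging its pages from the index); the crawler stops picking it up on the next pass.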
As other discussions have noted, Google is becoming full of junk because of its brilliant spidering ability. I think that there is plenty of room in the market for a better quality, smaller index.
Your next issue is of course financing it. I think that you would need at least 6 PCs/servers to make it work and probably more. Then there is the bandwidth and server location, and those don't come cheaply. I have just looked into the cost of colocating 10 servers with Mistral in the UK including a 6Mb leased line. The cost was £25K for the year, excluding the hardware costs, which would be somewhere in the region of £10K.
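Putting those figures together as a quick back-of-the-envelope calculation (only the £25K, £10K and 10 servers come from the quote above; the per-server monthly figure is simply derived from them):

```python
COLO_PER_YEAR = 25_000     # quoted colocation cost for 10 servers plus the 6Mb line, in GBP
HARDWARE_ONE_OFF = 10_000  # rough hardware estimate, in GBP
SERVERS = 10

def first_year_cost():
    """Colocation plus the one-off hardware purchase."""
    return COLO_PER_YEAR + HARDWARE_ONE_OFF

def colo_per_server_per_month():
    """Running cost per server per month, hardware excluded."""
    return COLO_PER_YEAR / SERVERS / 12
```

That works out to about £35K for the first year, or a little over £200 per server per month in running costs.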
I think it's a good idea to make your own UK search engine because most of the UK search engines are similar. I just think it may take a little while to get enough UK sites listed, but I think it would definitely be worthwhile in the end :-)
---------------------------------------
Bobby
Which search engines do you run?
If each site is being supplied by our own engine, then I could justify offsetting some costs to them for additional hardware.
It's a shame that colo in the UK is so expensive compared to the US. We've always confined ourselves to the London datacentres in the past (Redbus and Telehouse), so we would be interested in maybe taking a look at other parts of the country (Leeds ideally, Manchester at a pinch).
f
You may also be able to provide backfill for the likes of Espotting and Webfinder, but your results will have to be top notch, better than Inktomi for example, and cheaper.
You may also find that there will be more competition in the near future for the same market space once Wotbox is crawling regularly (is it already?) and I may be building one as well, and there could well be at least one other appearing. And the big UK crawler, Mirago, is already well established.
I think Bobby is right about people using your search if it gave good enough UK results; I know I would :-) You could make good money licensing the search to businesses.
-----------------------------------------------
Bobby_Davro
Checked out your sites. I've come across them before; looking good :-) not as good as mine ;-)
only joking.
One of them looks like it's doing really well. Do you run the businesses from home? Would you ever sell any of the businesses off?
Licensing results to external sites isn't something I've really considered at the moment, although you never know how things will pan out.
As the operator of a smaller country-wide search engine, I don't think that this point of a premium quality initial dataset can be stressed strongly enough.
Over the past few weeks, I've been working on a purely .ie sub-index and the amount of rubbish that I have had to purge from the sub-index is amazing. One of the main problems is hosters that point the website of a newly registered domain to a holding page that is unique (e.g. it has the website name and unique login details such as a hostname). The other major problem is breaking the index down into sites that have to be frequently spidered and sites that are just vegetating.
The .ie webspace is fairly small due to the incompetence of the Irish Domain Registry (IEDR) management in failing to grow the ccTLD and the high price of .ie compared to a .com domain. At most the number of .ie websites is around 22K. This is a far cry from the size of .uk sites and the number of UK owned com/net/org/info/biz (CNOib) sites. At a guess, there are about 2.5 million active .uk websites and about 2.6 million UK CNOib websites (of which I'd guess that approximately 60% were active). The last time I ran a relatively crude detection algorithm for UK owned CNOib domains and followed these up with preliminary spidering it picked up about 800K UK owned CNOib websites (I think). As you can see, the size of the problem for UK is on a vastly different scale.
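On the holding-page problem mentioned above, one possible way to catch them (a sketch only, not the detection method actually used here) is to strip out the parts that make each page unique, such as the domain name and any digits, and then look for large groups of pages that hash to the same value. The sample pages, the `min_group` threshold and the helper names are all assumptions for illustration.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(domain, html):
    """Hash a page after removing the bits that make each holding page unique."""
    text = html.lower().replace(domain.lower(), "")  # drop the site's own name
    text = re.sub(r"<[^>]+>", " ", text)             # crude tag removal
    text = re.sub(r"\d+", "", text)                  # drop numbers (e.g. in hostnames)
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def find_holding_pages(pages, min_group=3):
    """Return domains whose normalised page is shared by at least `min_group` domains."""
    groups = defaultdict(list)
    for domain, html in pages.items():
        groups[fingerprint(domain, html)].append(domain)
    suspects = set()
    for domains in groups.values():
        if len(domains) >= min_group:
            suspects.update(domains)
    return suspects

# Made-up sample: three parked domains from one hoster and one real site.
SAMPLE_PAGES = {
    "alpha.ie": "<html><h1>alpha.ie</h1><p>This domain is parked. Login host: web12.hoster.example</p></html>",
    "beta.ie": "<html><h1>beta.ie</h1><p>This domain is parked. Login host: web7.hoster.example</p></html>",
    "gamma.ie": "<html><h1>gamma.ie</h1><p>This domain is parked. Login host: web301.hoster.example</p></html>",
    "realshop.ie": "<html><h1>Real Shop</h1><p>Hand-made furniture from Galway.</p></html>",
}
```

Anything flagged this way would still want a manual look before purging, since legitimate sites built from the same template could also end up grouped together.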
The commonest way for most SEs to get a preliminary boost is to use the DMOZ/ODP dataset for their country. This is a double-edged sword because (a) you are relying on another's opinion of what is a good/relevant/local website and (b) the quality of the ODP dataset is not reliable enough to use without further processing. For the UK, the DMOZ/ODP dataset could give a preliminary set of about 150K worthwhile websites.
The depth of search is also important. It could be the difference between having a search database footprint of 1G and one of 200G. This is not a decision that can be made without first spidering a lot of sites to get a rough percentage of the live web (as in webpages/sites updated within the last six to twelve months). It requires a lot of manual decision making from this point onward.
I tend to regard the spidering of a particular realm (where a crawler endlessly crawls looking for links with the relevant ccTLD or TLD, in this case .uk, in the URL) as utterly wasteful: the equivalent of an infinite number of monkeys on an infinite number of word processors attempting to recreate the works of Shakespeare. Most small SEs do not have the resources of Google. Far from being a disadvantage, this can be a benefit because they will have to use their resources carefully and create a better search index. Google and the like are great macro/generic search engines but they fall down drastically when they come to localised searching. The best that they can do at the moment is to use IP and ccTLD restrictions to determine sites hosted in a particular country. And as anyone with a website hosted in the US and on .com knows, that is not the most efficient way of doing things. Speaking purely on .ie figures, over 50% of Irish sites and domains are hosted outside of Ireland. The figure for Irish .com/net/org/biz/info would be higher. The UK could probably have a figure in the region of 10-30%.
Running an active SE requires more care and attention than a field of Bonsai trees. It is basically three parts: preliminary indexing and detection, active spidering, and follow up. A lot of the smaller SEs (that only tend to last for a year or less) fall into the trap of thinking that it is a simple case of letting crawlers loose on a ccTLD and then following up with spidering. A lot of them, especially the ones that fail, don't put a lot of thought into the first part, and not paying attention to the third part (continually updating the index) can produce a dead index very quickly.
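As an illustration of the follow-up part, here is one simple way (an assumed scheme, not a description of any particular engine) to separate sites that need frequent spidering from those that are vegetating: halve a site's recrawl interval whenever its content has changed since the last visit and double it when nothing has changed. The bounds and the starting interval are arbitrary.

```python
from dataclasses import dataclass

MIN_DAYS = 1.0   # never recrawl more often than daily (arbitrary bound)
MAX_DAYS = 90.0  # never leave a site longer than about three months (arbitrary bound)

@dataclass
class SiteRecord:
    url: str
    interval_days: float = 7.0  # assumed starting interval
    last_hash: str = ""         # content hash from the previous visit

def update_schedule(site, new_hash):
    """Adjust the recrawl interval after a visit; return True if the content changed."""
    changed = new_hash != site.last_hash
    if changed:
        site.interval_days = max(MIN_DAYS, site.interval_days / 2)
    else:
        site.interval_days = min(MAX_DAYS, site.interval_days * 2)
    site.last_hash = new_hash
    return changed
```

A site that keeps changing drifts towards daily visits, while a dormant one backs off to the maximum interval and stops eating crawl capacity.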
The UK is definitely doable but it would require good planning, good execution and good marketing. I think that a lot of people are getting irritated with the Espotting/Overture feeds that they find dominating SERPs. Producing a high quality UK search engine could cash in on that irritation.
Regards...jmcc
Some very good points you made there :-)
You’re right, a lot of people think it's easy to run a search engine. I don't think they know how much work goes into maintaining and running it.
By the way I like the comparison:
'active SE requires more care and attention than a field of Bonsai trees' very good ;-)