The project has already detected 1.2 million UK com/net/org websites, and the Dmoz.org UK dataset would give it another 144K or so sites. This float would probably put it close to the Google class in terms of depth, and the UK lists are updated every month.
What I have to work out is whether there is a market for the feed among existing UK directories/SEs, or a market in selling 'searches' on a new UK search engine to existing directories and SEs. Do any of the UK directory/SE operators here have any ideas on this?
Regards...jmcc
I think that if you can set up a UK specific search engine, then other SEs may be interested in using the results. If the results are good enough, you may be able to offer the results to Espotting and Overture as their backfill.
How would your list of URLs be helpful to you starting a new UK directory?
If you are trying to get started as a new engine, then you will be competing with 100+ UK specific directories and "engines" in the UK. Having your own search database will be a big advantage, but it will still have to provide better (or at least comparable) results than the others. Can you do that?
The problem is that the only UK crawler that I know of is Mirago. There are plenty of directories out there, some are good, others are poor; but how will they use your raw URL lists?
The URL lists could be tweaked to provide title/keywords/description where available. Categorisation, however, may be more time-consuming. If the directory follows the Dmoz architecture, it would probably be possible to add this categorisation. This would be the more people-intensive part of the project, as the rest is highly automated.
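To give an idea of what "title/keywords/description where available" could look like in practice, here is a minimal sketch using only the Python standard library. The names MetaExtractor and extract_listing are made up for illustration and are not part of the project; a real spider would also need to deal with character encodings and badly broken markup.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the keywords/description meta tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            # Meta names are matched case-insensitively.
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_listing(html):
    """Return title/keywords/description for a page, None where missing."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip() or None,
        "keywords": parser.meta.get("keywords") or None,
        "description": parser.meta.get("description") or None,
    }
```

Fields come back as None when the page does not supply them, so a listing with nothing usable can be flagged for a human to look at rather than published blind.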
I think that if you can set up a UK specific search engine, then other SEs may be interested in using the results. If the results are good enough, you may be able to offer the results to Espotting and Overture as their backfill.
That is something I had not considered. It is an interesting option though.
How would your list of URLs be helpful to you starting a new UK directory?
The main problem with sites relying on Dmoz/ODP data is that the listed sites often do not exist any more. Since the Dmoz float would be actively spidered, the quality of that float would be better. It would provide an immediate footprint with the smallest possible outlay of resources. Adding the CNO sites would then begin to grow the directory.
If you are trying to get started as a new engine, then you will be competing with 100+ UK specific directories and "engines" in the UK. Having your own search database will be a big advantage, but it will still have to provide better (or at least comparable) results than the others. Can you do that?
Regards...jmcc
I think that you may be better off separating the services that you are looking to offer. Spidering the ODP data for UK SEs is very different to offering a million uncategorized URLs.
The URL lists could be tweaked to provide title/keywords/description where available
The main problem is that the titles and descriptions that spidering provides are not up to the standard of ODP descriptions. In fact they can be misleading, poor and even offensive. Mixing such results with ODP listings would dramatically reduce the quality. Hence, each site would have to be human-reviewed anyway.
In terms of a directory, it doesn't seem that you are offering a great deal. The main problem for the UK directories is not finding UK sites to add, but funding the editors to add the sites. Anything people-intensive, like editing, is expensive, and not many of the UK search/directory sites can afford it properly.
I think that your database of URLs is only useful from a spidering perspective. Either it can be used to form a new search engine from scratch, or you could use it to provide a UK filter for a larger engine, such as Fast or Inktomi, with UK sites being flagged as such.
You could do what Fast, Google and Inktomi do and charge a fee per thousand searches performed. I believe that Fast, for example, charge a $50K setup fee and $2 per thousand searches. You could easily undercut that, perhaps even waiving the setup fee. That would make it much more attractive to the likes of Espotting.
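To put rough numbers on that, here is a small sketch of the arithmetic. The $50K and $2 figures are the ones recalled above and are not confirmed pricing. The monthly search volume and the undercutting rate are purely hypothetical values picked for illustration.

```python
def first_year_cost(searches_per_month, setup_fee, rate_per_thousand):
    """First-year cost in dollars: one-off setup fee plus 12 months of usage."""
    monthly_charge = searches_per_month / 1000 * rate_per_thousand
    return setup_fee + 12 * monthly_charge


# Hypothetical customer running one million searches a month.
volume = 1_000_000

# Pricing as recalled above: $50K setup fee, $2 per thousand searches.
recalled = first_year_cost(volume, setup_fee=50_000, rate_per_thousand=2.0)

# Hypothetical undercut: no setup fee, $1.50 per thousand searches.
undercut = first_year_cost(volume, setup_fee=0, rate_per_thousand=1.5)

print(recalled, undercut)  # 74000.0 18000.0
```

At that volume the setup fee accounts for most of the first-year difference, which is why waiving it would matter more to a smaller customer than shaving the per-thousand rate.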