Forum Moderators: IanTurner & engine


fast new uk search - that has its own results

         

penfold25

3:16 am on Jun 16, 2004 (gmt 0)

10+ Year Member



Just stumbled across a UK engine called [ukwizz.com...] that seems to be very new. The results are not too bad and the SERPs seem to come up pretty fast. Has anyone else heard of it?
I think it could take a small percentage of the UK market, maybe.

[edited by: Brett_Tabke at 8:12 pm (utc) on June 16, 2004]
[edit reason] [webmasterworld.com...] [/edit]

jmccormac

3:33 am on Jun 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is based on the Aspseek Open Source search engine software. The index does not seem to have been cleaned yet. The name is a problem as <snack> is registered as well.

I'm not sure where it got its initial float of UK websites, but I would not be surprised if it was from the Dmoz RDF dump. It is very much a work in progress, but a fair bit of crawling seems to have been done to get it to this stage.

Regards...jmcc

[edited by: Brett_Tabke at 8:11 pm (utc) on June 16, 2004]
[edit reason] no linkless urls please anywhere on webmasterworld [/edit]

penfold25

3:43 am on Jun 16, 2004 (gmt 0)

10+ Year Member



To be honest, there is a need for a new engine in the UK market. I think Google does very well at answering most queries, but there is still a need for a standalone UK engine. I believe the ones that exist at present are not doing that well.
Maybe with a bit of modification and good marketing it could do pretty well in the UK market.

jmccormac

3:57 am on Jun 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> To be honest, in the UK market there is a need for a new engine. I think Google does very well at answering most queries, but there is still a need for a standalone UK engine. I believe the ones that exist at present are not doing that well.

I'm almost tempted to set up a UK search engine. :) But it looks very much like Google et al have reached the limits with their simplistic localisation based on IP and ccTLD. The last time I ran a test on UK domains/sites in the com/net/org, I had a list of approximately one million websites.

> Maybe with a bit of modification and good marketing it can definitely do pretty well in the UK market.

It probably needs a significant budget to do any decent marketing in the UK. I am not sure about the modifications though, since I don't know what kind of spider cycle it is on. The other key factor is the SE's acquisition strategy: how it detects new sites. It has got to have some kind of edge over the competition, and though the initial results look promising, there is little to sustain it in the face of well funded and well organised competition. It is a tough business and most SE start-ups seem to have operational lifetimes of about fifteen months, especially if they cannot monetize the results. There is no indication that the SE is commercial.

Regards...jmcc

vaniaul

4:27 am on Jun 16, 2004 (gmt 0)

10+ Year Member



Hey jmccormac

> The last time I ran a test on UK domains/sites in the com/net/org, I had a list of approximately one million websites.

What method did you use to identify UK domains/sites in the com/net/org?

Thanks in advance, looking forward to your reply.

jmccormac

4:43 am on Jun 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Vaniaul, the first level was simple IP searching (cross checking website/domain IPs against country sorted IP lists). The second level was cross checking with other sources for non-UK hosted sites and the third level was linguistic analysis. The level two and level three searching wasn't deep as the priority was identifying Irish domains. The same basic algorithms work for most countries but each needs its own data sources for level two and level three searching.

The UK is a complex country to model. It is a major hoster country and a lot of non-UK websites are hosted in UK IP space. This means that you can often find websites from a few countries hosted on a large UK hoster. The simple nameserver location approach may give you a coarse set of domains, but it is necessary to go deeper, to the domain level, for precision. This is why level 2 and level 3 analysis is necessary to produce a good search index for any country.
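
As a rough illustration of the level one check described above (cross checking site IPs against country sorted IP lists), here is a minimal Python sketch. It assumes the site IPs have already been resolved. The country IP list, the networks in it (reserved documentation ranges, not real allocations), the site data and the function names are all hypothetical, and this is not the actual method or data used in the test mentioned in this thread.

```python
import ipaddress

# Hypothetical country-sorted IP list. These are reserved documentation
# ranges (RFC 5737), not real UK or Irish allocations.
COUNTRY_NETWORKS = {
    "uk": [ipaddress.ip_network("192.0.2.0/24")],
    "ie": [ipaddress.ip_network("198.51.100.0/24")],
}


def country_for_ip(ip_text):
    """Return the country code whose list contains ip_text, or None."""
    address = ipaddress.ip_address(ip_text)
    for country, networks in COUNTRY_NETWORKS.items():
        if any(address in network for network in networks):
            return country
    return None


def level_one_candidates(site_ips, country):
    """Keep the sites whose resolved IP falls inside the country's ranges."""
    return sorted(
        site
        for site, ip_text in site_ips.items()
        if country_for_ip(ip_text) == country
    )


# Hypothetical site-to-IP data, as if already resolved from DNS.
SITE_IPS = {
    "example.com": "192.0.2.10",
    "example.net": "198.51.100.7",
    "example.org": "203.0.113.5",
}

uk_candidates = level_one_candidates(SITE_IPS, "uk")
print(uk_candidates)
```

A check like this only gives the coarse set: a site that passes may still be a non-UK site on a UK hoster, and a UK site hosted abroad is missed entirely, which is where the level two and level three checks come in.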

Regards...jmcc

[edited by: jmccormac at 4:58 am (utc) on June 16, 2004]

vaniaul

4:52 am on Jun 16, 2004 (gmt 0)

10+ Year Member



Dear jmcc

Thanks for the prompt reply. I'll try the same approach and let you know how far I get.

Cheers
Vani

sidyadav

5:07 am on Jun 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



UKWizz had some pretty decent results there. Google may be leading now, but I think niches are the future.

This is the first time I've seen it, so I wouldn't expect it to have more than 20 million in its index, also considering the fact that it's UK-specific. Does anybody know exactly how old it is?

Sid

jmccormac

5:20 am on Jun 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The domain was registered on 16th January 2004 and the .co.uk was registered on 29th April 2004. It looks like a relatively new SE. It does not have grouping by site on the SERPs so it may give a lot more results than are necessary. Apart from that it is possible to defeat Google on a niche basis but the UK is a very big niche. :)
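
For anyone unfamiliar with grouping by site, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `per_site` cap and the sample URLs are hypothetical, and it says nothing about how ASPseek or UKWizz handle results internally.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


def group_by_site(result_urls, per_site=2):
    """Group ranked URLs by host, keeping at most per_site URLs per host.

    Hosts keep the order in which they first appear in the ranking.
    """
    grouped = OrderedDict()
    for url in result_urls:
        host = urlsplit(url).hostname or ""
        bucket = grouped.setdefault(host, [])
        if len(bucket) < per_site:
            bucket.append(url)
    return grouped


# Hypothetical ranked results, best first.
RANKED = [
    "http://www.example.co.uk/a",
    "http://www.example.co.uk/b",
    "http://www.example.co.uk/c",
    "http://shop.example.org/widgets",
]

grouped = group_by_site(RANKED)
for host, urls in grouped.items():
    print(host, urls)
```

Without a cap like this, one large site can fill a whole results page with its own pages.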

Regards...jmcc

johnser

10:15 pm on Jun 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Looks like dmoz to me.
Not a lot of sites in the index either.

Try the obscure misspelt phrase of "cheep donkey"
Compare results also with UK-specific Google.

They need an add URL page ASAP - & some way to get Adsense on there if they want to cash in....
J

creative craig

8:44 am on Jun 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The search engine is new, only been online for a couple of weeks. The owner is attempting to keep it strictly UK results, which is a huge task.

The owner is a well known UK member of this forum, I will let him post though ;)

PCInk

8:56 am on Jun 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have just put a search in for a common term that gets me a lot of orders:

UKWizz - 0 results, Google UK (on UK only) 963 results.

It needs a lot more in the index if it has not got one page with that term in it, as it is quite popular.

I don't know who runs the site, but they could face legal action from Google. The whole site looks like 'passing off' (which is a legal term for something that can get you sued) of Google's design. Take a look:

1) First result (Google has I'm feeling lucky)
2) Cache results (take a look at the text at the top of the page - it is clear that UKWizz have copied and changed Google's text - copyright infringement)

That was just from a very quick couple of searches. Needs a bit more thought on the design front. (Esp. first result - does anybody ever use it anyway? - accidents don't count!)

sem4u

9:01 am on Jun 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Needs some work on the results and the design of the results pages.

Still it's good to see another UK search engine around :)

sidyadav

9:27 am on Jun 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> 1) First result (Google has I'm feeling lucky)
> 2) Cache results (take a look at the text at the top of the page - it is clear that UKWizz have copied and changed Google's text - copyright infringement)

That all comes from ASPseek originally; UKWizz only does it because it uses ASPseek: [aspseek.com...]

Sid

PCInk

9:42 am on Jun 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks sidyadav! ASPSeek.org is even worse at copyright infringement, judging by their homepage!

It is good to see a UK-based engine and it would be good to see it improved.

sidyadav

9:56 am on Jun 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> ASPSeek is even worse at copyright infringement - looking at their homepage!

Yup. I think Google ain't taking action because it's open-source/non-commercial - or else... ;)

Sid

Brett_Tabke

1:10 pm on Jun 18, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You guys are looking at the wrong site or just intentionally being obtuse for the sake of appearing coy. Aspseek is open source software at its finest. [aspseek.org...] If you are talking about caching - then take it up with Google - they set the standard.

Now, back on topic. A vortal is a tough thing for a one man show to run. The results have a tendency to get contaminated with side topics pretty easily. Hats off to [ukwizz.com...] for giving it a try.

christopher

11:46 pm on Jun 19, 2004 (gmt 0)



UK Wizz has no contact information. That always makes me nervous.

What is the purpose of not giving its contact info?

Also, from looking at the format of the search results, they look like they can't be their own, because who on earth would write descriptions in that manner?

No company would stagger its descriptions in that way.

Company descriptions look too messy, which would reflect badly on a company, plus if you input companies like that, then you may as well make the descriptions tidy and readable in the first place.

It's a feed.

jmccormac

7:47 am on Jun 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Company descriptions look too messy, which would reflect badly on a company, plus if you input companies like that, then you may as well make the descriptions tidy and readable in the first place.
>
> It's a feed.


Christopher, I don't think you know how to analyse search engine sites because it takes a bit of experience and familiarity with search engine software to be able to identify the key characteristics of different SE software. In this case, because it is still in development, some of the vanilla results page html was unchanged from the search results template page included in the software package. It was obvious if you looked at the raw html. It is not a feed.

This is how searches really look from a search engine/search engine results point of view. There is a serious difference between running a simple directory and running a search engine. When you are running a simple directory, you can control the quality of the data included and render it in a more readable fashion. With search results, you can only include or block sites. You can limit the spidering level but the reality is that the vast majority of websites are NOT search engine optimised. This is not a mere anecdotal estimate, this is based on hard data from running country level search engines for the last four years or so. And even more surprising, the majority of websites in any given country are brochureware sites that may have one or two updates per year.

Running a SE covering the UK is a tough operation and it is not the same as running a little directory where you have a lot of control. You are dealing with at least 5 million potential *.uk websites (though this may drop to approximately 2 million active websites) and at least another few million com/net/org websites. At a guess, a UK SE could be looking at approximately 4 million top level sites before it even gets around to personal subdomain/directory websites. Sorting these sites, and figuring out what is and is not a UK relevant website is very difficult. And then you have to factor in the processes of acquisition of new sites and spidering.

Regards...jmcc
[1] Luckily it is a small country.

[1][edited by: IanTurner at 8:48 am (utc) on June 21, 2004]
[edit reason] language edit [/edit]

sidyadav

8:10 am on Jun 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well said jmccormac!

I can't imagine what it would be like to own a search engine and get accused of using a feed after all the hard work you've done for it -- absolutely terrible.

Also, for those people who think UKWizz was set up within 5 minutes using ASPseek - you're absolutely wrong.

Because for all I know, ASPseek does not include an IP/domain filter - and the people who own UKWizz have done a fine job with that.

Sid

steve40

11:18 am on Jun 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well done to the guy that is going for it with UKWizz. It must take a lot of guts and determination, not to mention time and money, so I hope they succeed. Others on the board who are quick to criticise should learn to curb their words unless they have taken on a project as large and complex as this and succeeded.
steve

[edited by: IanTurner at 8:50 am (utc) on June 21, 2004]

sidyadav

11:47 am on Jun 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> others on the board who are quick to criticise should learn to curb their words unless they have taken on a project as large and complex as this and succeeded

Exactly. Taking a challenge is good - but doing it is excellent.

What some don't understand is that not everybody is Google, or Yahoo, or MSN, or Ask.

Hats off to the people who take on such challenges, even knowing they have bigger and better competition.

I actually wish there was a funding organisation for this kind of project -- imagine what the world could be! ;)

Sid

steve40

11:56 am on Jun 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sid, there is. It's any one of the following:

bank (no chance)
VC (only if you're willing to let bean counters change your idea)
remortgage house (just the wife to worry about)
steve

christopher

12:55 pm on Jun 20, 2004 (gmt 0)



Steve - I wasn't criticising his site, and I don't think anyone else was either. It was more like trying to understand how it gets its data.

I imagine that the other guys wanted to establish that it wasn't a brochure site.

And I have taken on a project as large as this.

[edited by: IanTurner at 8:51 am (utc) on June 21, 2004]

jmccormac

11:29 pm on Jun 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> And I have taken on a project as large as this

With all due respect Christopher, I don't think that you even have any idea of the magnitude and complexity of a country level search engine project.

Just in the initial dataset, excluding the personal and subdomain websites, you are looking at approximately 30 million potential sites. It is a case of reducing this set, through various techniques and then making decisions on what to spider and what not to spider. The big problem is that by the time you start spidering these sites, some sites have been added to the initial dataset and some have been removed. And of course since the UK is a key hoster country, a lot of the sites hosted in the UK are not necessarily UK relevant. So just on the basic acquisition phase, it is very like cryptanalysis. (Not James Bond but rather the Purple/Enigma/Bletchley Park/Breaking Sky stuff.)
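
The churn problem described above (sites appearing in and dropping out of the initial dataset while spidering is under way) can be sketched as a comparison of two snapshots of the site list. This is a minimal Python illustration only. The function name and the sample domains are hypothetical, and a real acquisition process would involve far more than a set difference.

```python
def dataset_churn(previous, current):
    """Return (added, removed) site lists between two dataset snapshots."""
    previous_set = set(previous)
    current_set = set(current)
    added = sorted(current_set - previous_set)
    removed = sorted(previous_set - current_set)
    return added, removed


# Hypothetical snapshots taken before and after a spidering run.
BEFORE = ["example.co.uk", "example.org.uk", "example.com"]
AFTER = ["example.co.uk", "example.com", "example.net"]

added, removed = dataset_churn(BEFORE, AFTER)
print("new sites to consider:", added)
print("sites to drop or recheck:", removed)
```

The added list feeds back into the reduction and spidering decisions, and the removed list marks entries in the index that may already be stale.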

Running a country level search engine is a very difficult thing to do. It takes a lot of guts, a lot of work and a lot of on-going research. This is why there are so few country level search engines.

Regards...jmcc

exmoorbeast

12:03 am on Jun 21, 2004 (gmt 0)

10+ Year Member



Last count Mirago had over 700 servers and have been hard at it for years. An earlier post mentioned that they were in profit.

They spent years searching for revenue streams, and are now growing very quickly indeed. If their plan is anything to go by, some reasonably deep pockets are required.

Correct me if I am wrong, but aren't they the only genuine UK index of any serious size?

I think gigablast (maybe someone else?) has a fascinating diary of how they built their engine, mentioning that at a bare minimum they needed 7 beefy servers with 600 GB storage on each to do the job...and that's before they looked at documents such as pdf etc....

I take my hat off to anyone that is doing this. Not an easy job at all. In Christopher's defence though, is this a genuine index? I haven't had the chance to have a look.

christopher

2:57 am on Jun 21, 2004 (gmt 0)



Interesting comments, regarding Mirago and others.

But I'm curious to know why they don't outsource to hosting services. It would cut their (Mirago) personal server costs wouldn't it?

--------------------------

"With all due respect Christopher, I don't think that you even have any idea of the magnitude and complexity of a country level search engine project"

--------------------------

hmmmm, although I'm not a techy person - I think it's unfair to suggest the above, mainly because you are talking in vague terms, and without any knowledge of my operation or the people involved in the running of it.

I'm not going to explain my technology, plans or naming of clients etc on here, as members have previously accused me of lying etc.

But I guess we shall see, what we shall see.

sidyadav

3:45 am on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Coming back on topic...I wonder how many pages UKWizz has now?

Sid

[edited by: IanTurner at 8:56 am (utc) on June 21, 2004]
[edit reason] see sticky [/edit]

IanTurner

2:24 pm on Jun 21, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think that the engine is good at the moment, though the real challenge will be scaling up to a significantly larger index without a drop in performance.

steve40

2:58 pm on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think if I were to undertake a project of this size, the following would be the minimum requirement:
8 - 10 servers, plus a mirror in another location
4 - 8 techy staff at the start of the project

PR and marketing outsourced to an external company
A specialist in finance, funding and business partnerships

The above would only skim the surface, and it would also require sufficient funding to allow 3 - 6 years before a break-even point

and, most important, lots of skill and dedication and a large sprinkling of luck.

Having looked at the current offering, there is much work to do on data collection and manipulation. But as anyone on here could verify, if we looked at our original web offering three years later, we learn to adapt and grow, to change and change again.
I hope that those involved with UKWizz have the finance, tenacity and skills to do so,
as there is the room and traffic in the UK market to support a new search engine.
Best of luck
steve
