Forum Moderators: bakedjake
Right now, the most important thing you can do is help with the "miniarticles" that appear at the top of popular search terms.
[webmasterworld.com...]
A quick comparison to Google:
* Neither engine can count. A Wikia search that returned 3 results said "Results 1-5 of approximately 5 for [..]". At the bottom there is a button labelled "Results 4 to 13" which doesn't return any extra results.
* Wikia does not put Wikipedia articles at the top of most searches the way Google does.
Wikia messes with the browser's history, which essentially breaks the back button. DON'T BREAK MY BACK BUTTON! Playing with JavaScript like this has resulted in bad coding, IMHO.
Another thing is the use of the # character in the URL to indicate the query string. They are thereby ignoring the widely accepted standard of using # for page fragments and the question mark (?) for query strings.
Using JavaScript with standard-breaking # fragments betrays a lack of understanding of how search engines actually work: anyone who seriously does search engine development will hate JavaScript (and possibly CSS), DHTML or anything non-text. To use JavaScript this way on your own search site, knowing full well it is not search friendly, is telling.
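The fragment point is easy to demonstrate: everything after # is a client-side fragment and is never sent to the server, so a crawler fetching the URL sees no query at all. A minimal sketch (the URLs are made up for illustration, not Wikia's actual ones):

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URLs: one hides the query behind '#', one uses the
# standard '?' form. Only the '?' form reaches the server.
fragment_style = "http://search.example.com/search#query=widgets"
standard_style = "http://search.example.com/search?query=widgets"

print(urlsplit(fragment_style).query)        # empty -- the server sees nothing
print(urlsplit(fragment_style).fragment)     # 'query=widgets', client-side only
print(parse_qs(urlsplit(standard_style).query))  # {'query': ['widgets']}
```

Any crawler that only issues HTTP requests therefore indexes the fragment-style URL as one undifferentiated page, whatever the query was.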
I also find it amusing that there is no big public credit given to Nutch on their site. Despite its shortcomings, Nutch certainly has far more search features than any web-scale search software developed by Wikia itself, which to me appears to be non-existent. Naturally, giving public credit would probably take some of the gloss off Wikia Search. Or maybe they just did not think of it as the right thing to do - either way, when you base your main work on someone else's (freely taken), it is only fair to give them credit.
OK, there are a fair few examples where a company takes source code and improves it so much that a great new product is created - Half-Life (the game) was built on the Quake engine, for example, yet it was modified so heavily that the final work was awesome. Right now it is not clear what modifications (if any) were made to Nutch, but I suppose we will find out soon.
To sum up my mixed feelings: there are some great, well-implemented new search engine projects out there (I don't mean us here - Powerset, say, is very interesting), yet they won't get a tenth of the publicity Wikia Search received before it had even shown anything. Life is so unfair, eh?
They say:
We are aware that the quality of the search results is low.
And it is. This project has a long way to go before it's worthy of our attention, because who knows whether it will ever rival any of the other major engines in quality or traffic? And if it doesn't, time spent trying to rank on it will be time wasted.
This is where engines like Yahoo and Google have a major advantage. They have years of web history behind them and can see patterns of abuse that a newcomer will have to learn from scratch. It may take Wikia years to catch up, if ever.
I guess it's too early to cast a real judgement as this is obviously a work in progress.
It is never too early to cast a real judgement, especially in this instance.
I performed 10 searches, the ones that I perform on Google, Yahoo! and Live periodically just to see how close the three are with their algos. Out of those 10 searches, not one of the sites I normally see in those top 10 results is there. In fact, what I do see is quite a few results in Chinese and Russian. For one search, 7 of the first 10 were in characters other than English.
Nothing to see here. I guess we'll all keep waiting for the "next" search engine to arrive on the scene, it surely won't be this one from the looks of it.
If they are using Nutch, then they won't get far, as many sites block that user agent. They had better come up with their own UA.
I think this is the least of their worries as Nutch allows change of UA.
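For what it's worth, Nutch reads its crawler identity from configuration, so no code changes are needed; a sketch of the usual override in conf/nutch-site.xml (the value shown is an invented name, not anything Wikia actually uses):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides http.agent.name from nutch-default.xml;
       this string becomes the crawler's advertised user agent. -->
  <property>
    <name>http.agent.name</name>
    <value>MyWikiaLikeCrawler</value>
  </property>
</configuration>
```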
The real issue is not crawling billions of pages but actually making sense of them, so as to show ten very good results from such a big index. This is the hard bit, and Nutch itself does not solve it. One would have assumed this is exactly where Wikia could help, by throwing the money they raised from investors at precisely this kind of work - but so far it seems Wikia just took Nutch and added a GUI on top.
A fancy new GUI does not win the market. A9 (from Amazon, which incidentally is listed as an investor in Wikia) used Google's code and database under license; even though they had no relevancy problem, they had a differentiation problem. If people get no better results than on Google, they won't switch, no matter how fancy the GUI is.
If you look at MSN, they clearly still have pretty big problems, and they launched nearly two years ago.
The nice thing about this particular launch is that win or lose, it will keep the existing search engines on their toes.
It uses GRUB, running out of swlabs.org, to build its index:
149.20.54.195 "Grub/2.0 (Grub.org crawler; [grub.org...] bot@grub.org)"
Now at least you can block it if you want.
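Blocking it is the usual robots.txt exercise; a sketch that sanity-checks the rules with Python's stdlib robotparser (the rules and example.com URLs are illustrative, and matching is by the user-agent token, not the IP):

```python
from urllib import robotparser

# Illustrative robots.txt: shut out the Grub crawler by its
# advertised user-agent token, leave every other bot alone.
rules = """\
User-agent: Grub
Disallow: /

User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Grub", "http://example.com/any/page"))          # False
print(rp.can_fetch("SomeOtherBot", "http://example.com/any/page"))  # True
```

Of course this only works against crawlers that honour robots.txt; a server-side user-agent or IP block is the fallback.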
...
*dreaming that somehow... somehow, after sorting out the fraudulent sites, spam, MFA and promo sites, Wikia WILL get a fair share of the SE market, even with its buggy little version of an open source SE*
[edited by: Miamacs at 10:43 pm (utc) on Jan. 7, 2008]
* Wikia does not put Wikipedia articles at the top of most searches the way Google does.
That was the first thing I noticed too, and it is a really good feeling to use a search engine where Wikipedia entries don't push other results down to third position or even lower.
But that is about the only positive thing I can say. One niche-specific keyword returns 1550 results on Google and 1280 on Yahoo, but only 4 on Wikia. Those 4 are all from my websites, so actually I should be happy with this result ;) but I am not. Going by that figure, the crawler's reach seems to be only 0.3% of the big boys'.
A keyword for another niche returns webpages in all types of languages: The first ten were: English, Russian, French, Korean, English, English, Russian, Czech, English, English.
It seems that they want to return a result in the native language of every volunteer working on the Wikia project :) but that is not exactly how it should work - especially since the best websites for this topic are in German (not represented at all) and English, and certainly not in the other languages listed.
So much to do, or probably just another DOA project.
Even so, the whole link-based approach to search engine ranking seems to be a solution to a 1998 problem rather than a 2008 one. I think that a link-based strategy is, given the state of the modern web, broken.
It may be interesting when it gets a real index but I think that it requires a lot of time to develop.
Regards...jmcc
I think that a link-based strategy is, given the state of the modern web, broken.
There is no viable, algorithmically scalable alternative to it. It may have been possible to avoid links in 1998, but in 2008 there is far too much data to avoid discriminating results based on links - at least whenever the number of matches is significant, which in most cases it is.
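To make "link-based discrimination" concrete, here is a toy power-iteration PageRank, the canonical link-based scheme the posts are debating. The three-page graph and the 0.85 damping factor are textbook illustrations, not anyone's production setup (and every page here has outlinks, so dangling-node handling is skipped):

```python
damping = 0.85
links = {            # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}   # start from a uniform rank

for _ in range(50):                          # power iteration to convergence
    new = {p: (1 - damping) / len(pages) for p in pages}
    for src, outs in links.items():
        for dst in outs:                     # spread src's rank over its outlinks
            new[dst] += damping * rank[src] / len(outs)
    rank = new

print(sorted(pages, key=rank.get, reverse=True))  # pages with most link weight first
```

Page "c", linked by both "a" and "b", ends up on top - which is exactly the property that sub-domain spam farms exploit by manufacturing links.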
I just did five more searches for popular phrases I track. Man, someone sure figured out how to sub-domain spam your index. Nine out of the ten results are from the same root domain.
I say leave search up to Google for now. Take your alpha offline and go back to the drawing board. When you think you are ready, go back to the drawing board again. :)
Wikia is working to develop and popularize a freely licensed (open source) search engine.
Popularize it by not mentioning its name anywhere?
Joking aside, I really hope this thing works well. The idea behind it is warm and fuzzy.
I was toying with the idea of using Amazon Web Services' (Alexa's) search results to build a similar wiki-style search engine. At the very least I'd have started off with much better search results. Alexa doesn't give great results, but at least they're usable.
I've put the idea on hold for now because I realized it's such a chicken-and-egg problem. Wikia will need a huge number of voting users to actually make a difference, but how are you going to get enough users onto a search engine that sucks? How many votes are they going to need to rank 12+ billion pages for the, say, 10 billion most common search phrases? Even if you can get a massive following of voting users, will they ever be able to compete with Google's years of algorithm development, along with their search and Google Analytics data?
At the end of the day, for there to be another Google-style rise to fame (search only, no existing web portal), a new engine is going to have to be as revolutionary as Google was back when it started. How is a search engine that admits "the quality of the search results is low" going to do anything near revolutionary? Wikia can't be aiming for anything short of revolutionary results - they need that critical mass. Will users really start a mass exodus from the big three just because Wikia Search is warm and fuzzy? It's not like we're paying to use Google.
Two very small points, so going small
When is the donate button [wikimediafoundation.org] for the search engine coming, eh?
I hope they don't read this thread anytime soon, or they might give up in a few days' time.
Interesting to see Wikipedia and Google squaring off again. Accomplishments aside, Wikipedia seems more true to their slogan ("Be Bold") than Google is to "Don't Be Evil".
Take that, Knol!
Tried voting five stars for one of my sites
I think that comment sums it up nicely.
Multiply that by zillions of other webmasters all voting for their own sites (understandable), and organised spam teams doing the same, and it's a short matter of time before the SERPs are as useful as a chocolate teapot.
I won't write it off just yet - I'm all for new projects and new ideas, but you can't reinvent the wheel. Should Wiki Search gain any traction whatsoever in the market, spam will be a major problem for it; currently it doesn't have enough data to offer any search service at all, so there's no telling how long it will take to collect the base data before it even gets to sorting the good from the bad.
Also, I just don't see anything different, or anything that makes me think "wiki it" - it's just another search engine, only with less financial backing and market share than MSN or possibly even Ask Jeeves.
The big difference from other alternative search ventures is that it is funded.
I think the biggest difference is not that (a fair few search engine projects have been funded very well), but the fact that this venture actually uses software developed by others (mainly for free, i.e. Nutch).
In a way this is like what Red Hat is to Linux, only Red Hat has added value - insurance (tech support).
I think the biggest difference is not that (a fair few search engine projects have been funded very well), but the fact that this venture actually uses software developed by others (mainly for free, i.e. Nutch).
And it wants people to work for free too.
In a way this is like what Red Hat is to Linux, only Red Hat has added value - insurance (tech support).
I am still trying to work out whether social networking can be applied to search. It looks like they are taking the idea of an edited directory and applying it to search. However, when searching, people do not necessarily want to know what others think of a site or a topic. They want to get to the relevant site(s) as quickly as possible. That is the fundamental test of quality for any search engine.
Regards...jmcc