|Massive Cuil Search Engine Launched|
"World's Largest Search Engine" is the claim
|SAN FRANCISCO — In her two years at Google, Anna Patterson helped design and build some of the pillars of the company’s search engine, including its large index of Web pages and some of the formulas it uses for ranking search results. |
The makers of the Cuil search engine say it should provide better results and show them in a more attractive manner.
Now, along with her husband, Tom Costello, and a few other Google alumni, she is trying to upstage her former employer.
On Monday, their company, Cuil, is unveiling a search engine that they promise will be more comprehensive than Google’s and that they hope will give its users more relevant results.
Let's see where the traffic numbers are next month :). I bet they'll still be close to 106...for the search engine category.
The article I quoted above has a few interesting observations on the infrastructure side of a search engine.
|Don't use NFS (network file system)... Current NFS implementations can't stand the punishment inflicted by the runtime system, or the indexing phase without using "spendy" specialized hardware... Next, using NFS in the runtime system, you will get machines that don't have fault tolerance. If one of the NFS'd machines is sick, then the rest just seize. Not good. |
...and on the final page:
|When you look at all these steps and all the complications, this process is rife with things that can go wrong. The hardest part about writing a search engine is that you're going to process billions of URLS and serve millions, if not billions, of queries. This does not leave a lot of room for error. |
I wonder how many of the warnings in Patterson's article came up. Serving up those error messages on day one must have been a great frustration for the Cuil team, because they were not exactly naive about the hardware, software and bandwidth challenges going in. But I'll bet we don't see those error messages very often.
As for relevance improving - well, it certainly has to if they're going to compete. In that old paper, Patterson said essentially that you need to start measuring relevance by using all the on-page factors you can muster. The dynamic processes involved in off-page systems like PageRank are prohibitive at start-up and should only be added in later. (Like after another round of funding, I assume.)
I don't understand how this SE could even remotely be considered a contender. For the searches I've tried, the results are horrendous.
city state hiking trails: I get B&Bs, real estate companies, even a B&B in a town that is in a different city and state 90 miles away. Not one list of hiking trails, not one page for even a park. 3 of the listings are to pages on the same site (none about hiking trails)
city state mexican restaurants: at least in this mix of results, one listing was at least a dining guide for the area. The next closest in terms of relevancy were hotels in the area, most of those spam pages on .cz & .pt sites. (the .pt site was a single page and the only html or text in the source was 5134 various links, no theme whatsoever, no other html structures at all, just a list of a href's). One result was a beach club resort in a different state 1200 miles away. Another interesting result was a yellow page list that listed a Mexican restaurant, but it was a different city & state - but the city was close, one vowel different. The actual city was not mentioned anywhere in the source of that page.
So to say they aren't focusing on relevance seems to be an understatement at least with the little I've looked at it.
Couple of quick thoughts:
I was just playing with it now. I consciously waited until today to give the initial traffic a chance to die down and see how the engine performs on "non-launchday load."
They're taking a page from google in the simple page design category from the front page. An input box, a search button, and not much else. The black background is refreshing after staring at white screens all the time, but they thankfully limit that to the front page. (Black backgrounds get old, fast).
The drop down "suggest as you type" in the search box is pretty intuitive. It seems to be getting the idea of what I'm searching for quite quickly. However, in FF at least, when you scroll down over one of the suggestions and hit Tab it moves the system caret to the search button without auto completing. Forward Arrow achieves nothing. This is a "must fix" because it gets really annoying really fast.
I didn't look really closely at them, although what comes up seems fairly relevant to the terms I'm typing in. This may tick off some WebmasterWorld members, but we're the worst crowd to be judging search results. The people here look at search, and search results, completely differently from how a normal user looks at them. Hold off judgement of the search results until we can get some analytics of how the general surfing population sees them. You never know, Cuil might be hitting a "sweet spot" for the general population.
Speed of results: impressively fast. I'm glad I waited until today when the traffic had died down and I could get a better sense of what their speed is going to be like on an average day.
Results layout (presentation): Very nice. I've had to Greasemonkey Google to make it display half as well. Also, try punching in a number of different terms and phrases. Depending on what you put in, you either get a page of results, a page of results with a "tab" bar of suggestions across the top, a page of results with a nice expanding menu selection in the top right corner, or a combination of the tabs and the expanding menu. This gives the user a lot of options for refining results right off the bat. Watch for Google to copy this, because it's a winning design feature.
Overall first impression: I like it. I'm going to go as much of the next week using it as I can without reverting to Google, just to see if it "works for me." It's the first new engine in years to capture my attention, so I'll give it a fair trial. Most likely, it will fail me, but I'm going to give it the chance to fail on its own merits.
Although my main website shows up on the first page of Cuil for many important keywords for my business, I still don't like this search engine. It's pretty clear they don't know how to do deep crawling. They would rather show a load of spam than interesting pages that are "deep" (3 clicks from the homepage) in my website.
Also the images they show are a joke. The funniest example I saw today was the Google logo when I was searching for something that had nothing at all to do with Google. Actually, they are infringing Google's copyrights here by using that logo ... If I was in Google's place, I would sue them.
I know I won't make many friends for saying this but the searches I tried came up with some very good results.
We didn't get a flood of traffic (I don't think anyone will have), but we got enough to see the spike and tail off.
Here are the traffic percentages that Cuil sent on a pretty busy site:
28th July: 0.145%
29th July: 0.244%
30th July: 0.122%
31st July: 0.075%
That suggests to me that by Friday they were already down by 75% against their busiest day. Given that people will still have been discovering them I'd say they have not converted many people yet.
I do hope they can fix the issues, but I feel they have made a huge mistake and now face a steeper hill to climb.
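For anyone who wants to check the decline against the four daily figures listed above, here is a rough sketch. The function name is made up for illustration, and it only uses the numbers as posted; on those, the last day sits roughly 69% below the busiest day, a little short of the 75% quoted, though the poster may have had a later figure to hand.

```python
# Cuil's share of referred traffic (percent) as listed in the post above.
shares = {
    "28 July": 0.145,
    "29 July": 0.244,
    "30 July": 0.122,
    "31 July": 0.075,
}


def drop_from_peak(values):
    """Return how far the last value sits below the peak, as a percentage."""
    peak = max(values)
    last = values[-1]
    return (peak - last) / peak * 100


decline = drop_from_peak(list(shares.values()))
print(f"Last listed day is {decline:.1f}% below the busiest day")
```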
Very interesting stats inbound, would be great if you could post an update next week! :)
I found a use for cuil. Search for your domain name minus the TLD and then note all the scraper sites it brings up instead of your site.
|Here are the traffic percentages that Cuil sent on a pretty busy site: |
28/7 = 0.3643%
29/7 = 0.5073%
30/7 = 0.1332%
31/7 = 0.2049%
In general, the portion of referred traffic from Cuil seems to be on a slightly higher level. But I see the same "spike" on Tuesday (if one can call it a "spike"), and after that it goes down steeply. I guess the SEO folks have heavily explored it; a number of consumers have tried it as well. And both groups found it, erm, "not so good" (to stay within TOS).
It randomly pulls up pictures from the page whether they are relevant to the term or not. Weird results for our niches as well. I did find a few scraper sites though.
Perhaps it's me but I just don't see any ADVANTAGE to using Cuil.
Taking everything out of the equation speed, relevance, depth etc etc for a search engine to take market share from google it has to be better than google - currently Cuil is not
Early days I know, and whilst as webmasters we need traffic shared more evenly between all engines rather than the market being dominated by one, I just can't see Cuil doing it - I don't see it as innovative enough or offering anything better, so what factor is going to make search engine users change from google to cuil?
I really do wish them well and hope it's a roaring success but I don't see anything that makes me want to change from using google so far.
The amount of hype this launch got is the only reason it got impressive numbers like this. The fact is, do some basic searches and you will see it is not that useful a search engine. Results are repeated, unrelated, and images are useless (often not belonging to the site they are next to). The layout is refreshing to say the least as Microsoft’s Live and Yahoo are blatant copies of Google's trademark style.
Personally I won't stop using google any time soon; it is still lightning fast and gives me the most relevant results. Unless cuil makes some serious changes I think it will be very short lived.
|Search for your domain name minus the TLD and then note all the scraper sites it brings up instead of your site. |
I am also seeing this. My site is arguably the most visible on the Internet in my niche. It is up there because it deserves to be up there. When I search for my domain name it is not in the top 100 results. When I search for the main keyword (an acronym) it is still nowhere but my images are used on all the results.
It has therefore found my site, decided the content is irrelevant but that the images will do nicely thank you!
|Taking everything out of the equation speed, relevance, depth etc etc for a search engine to take market share from google it has to be better than google - currently Cuil is not |
I would say that at this stage it does not have to be better than Google. Most people would be willing to accept some teething problems, but it has to be as good as or almost as good as Google.
Here's a comment from them [cuil.com] ...
|Wow. That was intense. Looking back at the first 48 hours since launch, it was quite an experience. After a lot of hard work, we were thrilled to begin offering our new approach to search. We were even more thrilled with the interest, and traffic, we received. |
In fact, it was overwhelming—literally. While we had planned for a large number of searches on our first day, we hadn’t planned on more than 50 million. After all, that’s in the same ballpark as Microsoft’s Live Search and approaching Yahoo!. And they have a bit more infrastructure than our small start-up.
So for a good part of the first day, the traffic volume simply outstripped our ability to respond. Some machines failed. Some bugs were found. Some of our redundancies…weren’t so redundant. This meant some searches didn’t get the best results. Some didn’t get any.
And yet, for a lot of searches, Cuil did provide users with new results, different from the ones folks have gotten in the past, according to the reports we’ve received. This is one of our goals—to give people an alternative to existing approaches.
Thank you very much for the feedback. The emails we’ve gotten at #*$!x@#*$!#*$!#*$!x.com have been very helpful, telling us areas you enjoy—such as the layout and the search by category feature—and areas where we need to improve—image matching, for example. (my emphasis) We read them, so please keep them coming.
How hard can it be to match an image from the result to the result?
[edited by: BeeDeeDubbleU at 6:51 am (utc) on Aug. 3, 2008]
This has to be worth something when evaluating what Cuil has to offer. I am sick of Google's never-ending quest to pry into my details.
"We believe that analyzing the Web rather than our users is a more useful approach, so we don’t collect data about you and your habits, lest we are tempted to peek. With Cuil, your search history is always private.”
"when you search with Cuil, we do not collect any personally identifiable information, period. We have no idea who sends queries: not by name, not by IP address, and not by cookies (more on this later). Your search history is your business, not ours."
the results are not truly horrible, and I am certain we are going to see a gradual refinement. I see promise. let us wait and see.
I feel we must support cuil as much as possible, with searches and feedback - we need successful alternatives urgently. google's total dominance & arrogance is a very bad thing, as we all very well know. and competition is a very good thing.
I would love to see them improve and succeed.
|Wow. That was intense. Looking back at the first 48 hours since launch, it was quite an experience. |
Intense. Yep, also from a copyright point of view.
|Some machines failed. Some bugs were found. Some of our redundancies…weren’t so redundant. This meant some searches didn’t get the best results. |
So, how difficult is it to fix the images "bug"? Not too difficult I think. Unless you are ripping the images knowingly and for a purpose.
I am talking to my lawyer now. It's going to be ugly (not because of me, but because of others doing the same).
|Several court cases have found that thumbnails are acceptable use. |
Playing devil's advocate is fine; but in this context, that is factually incorrect.
|the images will do nicely thank you! |
The crawler appears to respect robots.txt conventions, so I would block it from your images directory until they get it fixed.
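As a rough sketch of what that block could look like, the snippet below holds a sample robots.txt and uses Python's standard `urllib.robotparser` to check what the rule would and would not cover. The user-agent token `Twiceler` is what I understand Cuil's crawler to call itself, but treat that as an assumption and confirm it against your own logs; `/images/` and `example.com` are placeholders. It only shows what a compliant crawler should do with the rule, not what Cuil actually does.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt. "Twiceler" is an ASSUMED crawler token and
# "/images/" is a placeholder for your own images directory.
ROBOTS_TXT = """\
User-agent: Twiceler
Disallow: /images/
"""


def is_allowed(agent: str, url: str) -> bool:
    """Report whether the sample rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# Images are blocked for the named crawler; ordinary pages are not.
print(is_allowed("Twiceler", "http://example.com/images/logo.png"))
print(is_allowed("Twiceler", "http://example.com/index.html"))
# Crawlers with no matching entry are unaffected by this rule.
print(is_allowed("Googlebot", "http://example.com/images/logo.png"))
```

Scoping the rule to one user-agent leaves the other engines free to keep indexing the images.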
|The crawler appears to respect robots.txt conventions |
That has not been my experience at all, and "noindex" meta tags are not respected either.
|block it from your images directory until they get it fixed |
That does not stop them using other people's inappropriate images as part of my listing.
I cannot support a search engine that treats webmasters with such contempt.
|That does not stop them using other people's inappropriate images as part of my listing. |
Blocking them from crawling your site will solve both of those issues. You can force the issue via the contact link here:-
Strange because it appears to be obeying the robots.txt on some of my sites it's blocked from. If you've only just updated your robots.txt it could take a day or two to get recached.
Definitely UNcuil... at least from a UK perspective. Terrible search results in my sector: for some of my search terms the first few pages show sites that are obviously scrapers, and others where the text displayed in the search results is massively stuffed with keywords. Homepage, black is nice but results page looks rubbish.
I so wanted this to be good, but it's got a massive way to go to be half decent
|I feel we must support cuil as much as possible, with searches and feedback - we need successful alternatives urgently. google's total dominance & arrogance is a very bad thing, as we all very well know |
Why should we help Cuil? They have enough investors' money to do their job properly. They still claim they are "the world's biggest search engine". People who are this arrogant and liars don't need our help!
I don't think we need alternatives urgently. Google is doing a fine job! Did you ever meet Google people in person? I did, on several occasions. No arrogance, just friendly and down to earth people trying to develop the best possible products.
|Several court cases have found that thumbnails are acceptable use |
Thumbnails are ok. Using people's images to promote other websites is not ok. That is copyright violation. How would you like to see your company's logo being associated with a p*rn or hackers website?
|I don't think we need alternatives urgently. Google is doing a fine job! Did you ever meet Google people in person? I did, on several occasions. No arrogance... |
I also do not see Google as being particularly "arrogant", but this has nothing to do with whether or not they hire nice people, so I disagree about the urgency to get more competition -- we need it, and we need it sooner rather than later. It is never good for one company to thoroughly dominate any field, be it search, auto supplies, or anything else. Yahoo, MS, Cuil, and every other SE in the world combined do not equal Google's market share. Why is that? Because in the eyes of the hundreds of millions of people who use the WWW, Google is the best. I applaud them for what they have accomplished, but their dominance will seriously hurt you or me and everyone else who may drop significantly in the SERPs, because of some algo twist that we did not see coming and cannot find in our own pages. So I hope Cuil takes 10% of Google's traffic in the next year, and 10% more in the year after that. Then, maybe we'll not feel quite so stressed out about the behavior of the thousand pound gorilla.
|I feel we must support cuil as much as possible, with searches and feedback - we need successful alternatives urgently. google's total dominance & arrogance is a very bad thing, as we all very well know. and competition is a very good thing. |
If you hate Google so much, you'd do better to lend your support to a better SE - weak competition only strengthens Google's superiority.
I'm not saying this SE will never be able to compete - just that it has so far provided no reason to make assumptions on its behalf.
|I feel we must support cuil as much as possible, with searches and feedback |
I would've agreed IF they did their job of delivering at least a basic bug-free engine with relevant results that respects copyrights.
But what I see is a spam-laden engine that hardly keeps what it promises, blatantly disrespecting copyright. Why would I help such "geniuses" to build a better engine? What do *I* get in return for helping them to get rich off other people's content? Why help grow yet another Google?
Do you use alpha software? Then why whine about an alpha based search engine? Do a couple of searches - say, hrmph - and move on for a few months and check back again. A search engine has to be trained, and this was just the first day of school for them.
>blatantly disrespecting copyright.
At least they are not republishing pages like the other search engines. Clearly, they respect copyright quite a bit.
>What do *I* get in return
Same thing as other engines - the possibility of traffic.
|Between the new look at Live Search and Cuil - ummm, Google is looking a bit old and dated. |
|All Google would need to do is reskin their look and feel and it would be their second wind. I do believe that change will come. |
Google did this already...it's called iGoogle. I get my news, weather, etc. along with a different skin every day. If you can think of it, Google has done it, is building it, or has realized it's a bad idea (Cuil's images).
just to the Register article inc strawberries and muffins
|'You have to be nice to people |
- and, it seems, continue being nice even when results lousy, and photos from other sites slapped around willy-nilly
350 employees, did I read? Some US$33 million?
= and results worse than I recall from search engines in days before google? (If the web was new, might think cuil has potential; but as noted, it's 2008)
Yes, Asia proved a flop, despite hype; sold albums, but when did you last hear a track by them?
Indeed wonder if fandom flying in face of quality results from shares bought and/or unduly strong friendships, or simply having rosy tinted monitor.
Register article sums up in first sentence:
[edited by: tedster at 3:35 am (utc) on Aug. 4, 2008]
Brett, you are still playing devil's advocate, aren't you? :)
|A search engine has to be trained, and this was just the first day of school for them. |
They may have got away with it if they had told us that rather than implying that they were bigger and better than Google. No mention of first day at school in their blurb. ;)
Remember this is what they told us ...
|Welcome to Cuil—the world’s biggest search engine. The Internet has grown. We think it’s time search did too. |
The Internet has grown exponentially in the last fifteen years but search engines have not kept up—until now. Cuil searches more pages on the Web than anyone else—three times as many as Google and ten times as many as Microsoft.
Rather than rely on superficial popularity metrics, Cuil searches for and ranks pages based on their content and relevance. When we find a page with your keywords, we stay on that page and analyze the rest of its content, its concepts, their inter-relationships and the page’s coherency.
Then we offer you helpful choices and suggestions until you find the page you want and that you know is out there. We believe that analyzing the Web rather than our users is a more useful approach, so we don’t collect data about you and your habits, lest we are tempted to peek. With Cuil, your search history is always private.
The emphasis is mine but the implications are theirs. ;)
|>What do *I* get in return? |
Same thing as other engines - the possibility of traffic.
Brett, how am I supposed to get traffic when MY images are standing next to results for COMPETING web sites, leading not to my site, but to the competitors' sites? And my page is nowhere to be seen?