|brotherhood of LAN|
| 3:10 pm on Jun 19, 2013 (gmt 0)|
>small number compared to google (per day)
IMO that number may be inflated, though, by 'defaults' like inadvertent searches via the address bar.
Any search engine achieving 100M searches a day would be pretty heavyweight IMO. Even a fraction of that would be, if it were country-specific.
| 6:45 pm on Jun 19, 2013 (gmt 0)|
|Is this the opportunity for the independent search engines to start making inroads? |
I've said elsewhere that the door is wide open for a "best of the rest" search engine. The biggest webmasters seem to globally exclude all unfamiliar robots. But you don't need an alternative search engine to find amazon, wikipedia and about dot com; it's the overlooked other sites that will make the difference.
|brotherhood of LAN|
| 7:22 pm on Jun 19, 2013 (gmt 0)|
How about a list of known/up-and-coming/regional engines?
I know of 2 members here that have written their own. Not sure of jmccormac's URL though :)
Member glacai [webmasterworld.com] wrote Mojeek [mojeek.com]... written in C from scratch as it says on there.
Alongside DuckDuckGo, both use privacy as a major selling point.
Majestic, Ahrefs and other backlink providers are international search engines of sorts... perhaps by also storing page text in searchable indexes they could get on board?
For anyone considering the task CommonCrawl [commoncrawl.org] has a good sized dataset to work with.
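For anyone who wants a feel for that dataset before committing, here's a minimal sketch of iterating one CommonCrawl WARC segment. It assumes the warcio Python library and a locally downloaded segment file; the filename is a placeholder.

```python
# Minimal sketch: iterate a CommonCrawl WARC file and yield page contents.
# Assumes 'pip install warcio'; the segment filename below is a placeholder.
from warcio.archiveiterator import ArchiveIterator

def iter_pages(warc_path):
    """Yield (url, raw_html_bytes) for each HTTP response in the archive."""
    with open(warc_path, 'rb') as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == 'response':
                url = record.rec_headers.get_header('WARC-Target-URI')
                yield url, record.content_stream().read()

for url, html in iter_pages('CC-MAIN-segment.warc.gz'):
    print(url, len(html))
```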
| 8:09 pm on Jun 19, 2013 (gmt 0)|
Any SE that tries to use distributed crawling (eg majestic) is a non-starter in my book. There is no real way to control scraping activities of fake-UA bots (yes, I know majestic has a solution but it's the wrong approach!).
If potential SE operators would like to post tech details here (or better still in the search engine arm of this site) I for one would look favourably on letting them onto my server, assuming they are not parasites.
Part of the data-gathering aspect is a lack of webmaster-targeted information on a) SE names/URLs; b) crawling bot UAs; c) IP ranges (if possible). Some of the SEs I have looked at (including DDG) seem to have very sparse information.
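For what it's worth, the only robust defence against fake-UA bots I know of is the reverse-then-forward DNS check that Google and Bing document for their own crawlers. A minimal sketch follows; the suffix list is illustrative, and the approach only works where an SE publishes its reverse-DNS domains, which is exactly the sparse-information problem above.

```python
# Sketch: verify a claimed crawler IP via reverse-then-forward DNS lookup.
# The suffix list is illustrative; smaller SEs would need to publish theirs.
import socket

KNOWN_SUFFIXES = ('.googlebot.com', '.google.com', '.search.msn.com')

def verify_crawler_ip(ip):
    try:
        host = socket.gethostbyaddr(ip)[0]               # reverse lookup
    except socket.herror:
        return False
    if not host.endswith(KNOWN_SUFFIXES):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]    # forward confirm
    except socket.gaierror:
        return False

print(verify_crawler_ip('66.249.66.1'))  # a Googlebot range, as an example
```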
| 9:19 pm on Jun 19, 2013 (gmt 0)|
I've been working on a new version of the SE, Brotherhood of LAN.
The biggest problem is not actually finding new websites but rather sorting out clones and compromised sites. I've the process working well on a monthly basis so that it now detects at least 90% of all new Irish websites before Google and works as a self-cleaning index. When it is ready, it will be time to take Ireland back from Google. ;)
The big problem, at the moment, with Google isn't the tracking issue. It is simply that it is trying to guess what people want rather than giving them what they want. While it is a good global/non-specific SE, it really sucks on a national/country level basis. That's Google's real vulnerability.
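On the clone/compromised-site point above: nothing there says how the self-cleaning index actually works, but a common generic technique for flagging clone sites is content fingerprinting, e.g. a simhash over the page text plus a Hamming-distance threshold. A rough sketch of that generic approach, not a description of the system described above:

```python
# Generic near-duplicate detection via simhash fingerprints.
# Pages whose fingerprints differ by only a few bits are likely clones.
import hashlib

def simhash(text, bits=64):
    weights = [0] * bits
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16) & ((1 << bits) - 1)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a, b):
    return bin(a ^ b).count('1')

a = simhash("widgets and more widgets for sale in dublin")
b = simhash("widgets and more widgets for sale in galway")
print("likely clone" if hamming(a, b) <= 3 else "distinct")
```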
| 11:49 pm on Jun 19, 2013 (gmt 0)|
So who here promotes an alt SE on their website(s)?
| 12:12 am on Jun 20, 2013 (gmt 0)|
|The big problem, at the moment, with Google...is simply that it is trying to guess what people want rather than giving them what they want. |
Agreed. A search engine could train a new generation (and willing members of older generations) how to become expert searchers - basically, how to navigate a complicated and treacherous new territory. Instead, the dominant search engines opt (for obvious reasons) to serve a "feed-me-whatever's-big-n-popular" market exactly what that market's accustomed to being served. It's limiting and short-sighted.
| 1:00 am on Jun 20, 2013 (gmt 0)|
Personally I use and promote Ixquick to anyone interested; it reminds me of how it used to be to use Google, as far as finding what I ask for goes. They offer a browser search-bar plug-in, same as Bing, Google and Amazon, and they even offer a way to search anonymously through Google results.
| 2:17 am on Jun 20, 2013 (gmt 0)|
The other major problem for Google is that Search fragmented a few years ago into Generic, Specific and Local. Generic is where the width of Google's search wins out. Specific used to be dominated by Google before the Animal Farm events and the emergence of Wikipedia. But Wikipedia managed to take a major part of this market away from Google. Schoolkids doing their homework and assignments now check Wikipedia rather than Google. That's a massive loss for Google and its damage has yet to play out since there is a generation coming up that doesn't consider Google to be the all-powerful search engine. Local is also a problem for Google. While it buys some credibility with Google maps and various alliances and purchases, there's a critical element in Local Search - it requires local knowledge.
At the time Facebook floated on the stock markets, Google was engaging in search engine development by press release. Perhaps some PR flunkies had come up with the Google 'knowledge graph' in a bid to attack Facebook's far more famous Social Graph. The press release story was recycled by clueless "technology" journalists (who wouldn't know a search index from a hole in the ground) along with what was apparently Amit Singhal's ambition to make Google capable of answering questions like the ship's computer in Star Trek (which he had apparently watched growing up). There was just one problem with this story - the ship's computer wasn't really capable of answering questions and wasn't really used for such things. Spock (Star Trek: The Original Series) and later Data (Star Trek: The Next Generation) provided the answers. Perhaps the PR flunkies in Google didn't really watch either Star Trek series. When a search engine starts engaging in development by press release, it should be like a blood trail to a shark where independent search engine developers are concerned.
There's a lot of opportunity for independent search engine developers but most people who think that they are capable of building a search engine (virtually every webmaster thinks that they can do it) are not capable of doing so. There is a lot more to it than relying on blind crawling (where search engines detect new sites by crawling links). This thread from 2005 actually details some of the issues, especially when it comes to country level Search. ( [webmasterworld.com...] )
In some respects, relying on a blind crawling model is less effective now. It is also far more dangerous because some hacked sites can have links to very dodgy sites, and crawling those sites could create a toxic index (legally and technically). Due to a chronic overreliance on Google, many sites don't have a lot of outbound links to other in-context sites. A survey of Irish websites that I run every month counts zero-depth (index page) outbound links from sites, and the most commonly linked sites are Facebook and Twitter. The internal link graph for Irish sites is quite sparse. If this is being played out on a global scale, it would explain why Google and other SEs have developed problems in detecting new websites that don't have Google Adsense or Google Analytics. Google Plus, in the Irish dataset, is being roundly beaten by Facebook, Twitter, Youtube (which might be helpful for Google) and Linkedin. But these are outbound links rather than inbound links to individual sites. And without these small sites linking to each other, the blind crawling model currently used by the large SEs is dying. SEOs can help, but most of the websites are brochureware sites and the owners are probably not even interested in paying for SEO services. Country-level and Local Search are two areas where people can actually compete with Google and win.
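A zero-depth survey of that kind is straightforward to prototype. Here is a rough stdlib-only sketch; the site list is a placeholder, not the actual Irish dataset.

```python
# Sketch: tally which external hosts the index pages of a site list link to.
# Standard library only; the survey list below is a placeholder.
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hosts = []
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            host = urlparse(href).netloc
            if host:
                self.hosts.append(host.lower())

def outbound_hosts(url):
    parser = LinkParser()
    parser.feed(urlopen(url, timeout=10).read().decode('utf-8', 'replace'))
    own = urlparse(url).netloc.lower()
    return [h for h in parser.hosts if h != own]

tally = Counter()
for site in ['http://www.example.ie/']:       # placeholder survey list
    tally.update(set(outbound_hosts(site)))   # one vote per linking site
print(tally.most_common(10))                  # do facebook/twitter dominate?
```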
| 9:05 am on Jun 20, 2013 (gmt 0)|
"duck it" instead of "google it" sounds good :-)
| 9:28 am on Jun 20, 2013 (gmt 0)|
The first few times I used a search engine like Google, I NEVER expected they would collect what a user was searching for, because that's simply not good manners. Now they also let the gov have access, tsk tsk tsk. I think they just don't want to be popular anymore. I have used DuckDuckGo for about 2 years now and I like it; it's better than Google when you use 3 or more keywords, and you don't get all that domain stuffing, videos, news..... between the results.
| 10:02 am on Jun 20, 2013 (gmt 0)|
Just heard on the radio: France is fed up with the way Google handles privacy and has given them 3 months to sort the matter out. Germany, the UK and Italy will follow.
Another story, also related - Google has acquired a 512-qubit quantum computer from a company called D-Wave. This computer will be deployed in a network of self-learning machines - a new PRISM?
| 11:39 am on Jun 20, 2013 (gmt 0)|
down with google!
they are always trying to force your details out of you.. mobile verification on sign up... 1 account for everything..... google maps....... taking photographs of people's houses.
Eventually, if Google has its way, they will open an online bank..... where they will have control of finances too.
Imagine that: just like people get banned from AdSense..... you have been banned from your bank account for not abiding by a set of rules we make up as we go along..... thanks for the free money!
| 1:23 pm on Jun 20, 2013 (gmt 0)|
This topic is about alternative search engines. Please keep this on topic.
| 2:34 pm on Jun 20, 2013 (gmt 0)|
|Another story, also related - Google has acquired a 512-qubit quantum computer from a company called D-Wave. This computer will be deployed in a network of self-learning machines - a new PRISM? |
No. This is so Google can give the users all the possible results that they think the user wants. It is the new Many-Words theory of SERPs. :)
| 2:54 pm on Jun 20, 2013 (gmt 0)|
@ken_b -- so far it's been mostly ixquick and duckduckgo that I've been promoting .. I never have liked the so-called Filter Bubble Of Personalized Results that is Google -- even whilst signed out, the tracking via IP that Google does is a pain. I get tired of looking at the same stuff over and over again.
I know that Google's search numbers are huge, but I'm wondering how much of a hit those numbers would take if Google wasn't "forcing" the use of its properties ..
In the end, it all boils down to whether or not you prefer to have someone else do your thinking for you .. I don't accept 3rd-party cookies and clear cache and cookies after every browser session. I also use Ghostery and other such add-ons on occasion .. The results in Google are wildly amusing, while the results in duckduckgo and ixquick remain fairly consistent.
| 3:20 pm on Jun 20, 2013 (gmt 0)|
The big problem for independent search engines remains unchanged - funding. Due to the number of high profile search ventures that crashed and burned (few as spectacularly as Cuil), it might be hard to raise funding but then there is the issue of monetising the SERPs.
| 3:20 pm on Jun 20, 2013 (gmt 0)|
@jmccormac, that sounds good, and I agree on G going the wrong way.
Duck sounds interesting and the results don't seem bad at all, at least on my first tries; I will keep playing with it.
Lycos? Who owns Lycos today? It was good. It was a property of Telefonica.
What happened to Cuil? There were a lot of promises behind that launch.
I'm curious: how does an independent SE grow? Where do they get the money? There was a time when getting indexed was a paid service; it was interesting because we carefully decided where to invest based on each SE's traffic and results. But now that it's free... I mean, running a SE is a difficult job.
|brotherhood of LAN|
| 3:31 pm on Jun 20, 2013 (gmt 0)|
Cuil... well remembered!
After a little read I see their crawl data is available on archive.org, which contains 60 billion URLs [archive.org...]. Due to its age I'd suspect that a good 20-25% of those links are non-200 now... but interesting that it's available for use.
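If anyone wants to gauge the rot before downloading anything that size, here's a quick sketch for sampling URLs from the dump and counting how many still return a 200; the URLs below are placeholders.

```python
# Sketch: HEAD-check a sample of crawl URLs and count live (200) responses.
# The sample list is a placeholder for URLs drawn from the archive.org dump.
import urllib.error
import urllib.request

def status_of(url):
    req = urllib.request.Request(url, method='HEAD')
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as e:
        return e.code                 # 4xx/5xx still tells us something
    except Exception:
        return None                   # DNS failure, timeout, refused...

urls = ['http://example.com/', 'http://example.org/old-page']
alive = sum(1 for u in urls if status_of(u) == 200)
print("%d/%d still return 200" % (alive, len(urls)))
```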
| 3:32 pm on Jun 20, 2013 (gmt 0)|
|The big problem for independent search engines remains unchanged - funding. Due to the number of high profile search ventures that crashed and burned (few as spectacularly as Cuil), it might be hard to raise funding but then there is the issue of monetising the SERPs. |
This is so true. And if you did get something significant going, it would mean that you have a ton invested. The G-men will just come knocking on your door demanding access, or else.
The system, therefore, needs to be offshore in a freedom-friendly nation. The nation of choice used to be the US. What nation that is rich enough to withstand an assault on its banking fits the bill now? No country that I can see.
| 3:37 pm on Jun 20, 2013 (gmt 0)|
Some other SEs to mention
Start Page [startpage.com...]
| 3:49 pm on Jun 20, 2013 (gmt 0)|
To paraphrase the immortal line from the movie "Jaws": you're going to need a bigger hard drive. :) That collection is 310TB of data.
@explorador This is what happened to Cuil: [techcrunch.com...]
@Chris13 The monitoring is only the superficial problem with Google. The real problem, for most people, has been the falling quality of SERPs.
Jimbo Wales tried to launch Search Wikia (aka Wikia Search) and that didn't work out either. The theory was to use the same approach as Wikipedia (free editing and contribution of expertise) to build a social-media-influenced search engine. Again, the big problem with that wasn't the idea as such but rather the simple issue of quality control. The GIGO approach of spidering everything and hoping that the algorithms will give the data some relevance does not work for a Google alternative when there is no spidering strategy other than blind crawling. Google has more resources and people.
| 5:22 pm on Jun 20, 2013 (gmt 0)|
Thanks for the link, that was painful, poor cuil.
Start Page? It says "Enhanced by Google" [scratching head]
| 7:10 pm on Jun 20, 2013 (gmt 0)|
|The big problem for independent search engines remains unchanged - funding. |
That's true in the classical sense. What about crowd-funding or open/crowd-sourcing a new type of search engine? If recent events have shown one thing, it's that resting our livelihoods in any one basket (especially a corporate one) is going to be bad in the long run.
With a strong open codebase, we would see many instances pop up, many variations - which is good for everyone. Otherwise it's just a case of 'the king is dead, long live the king'. The whole thing would also make life much harder for blackhats: they wouldn't need one 'game plan' - they'd need many. We'd have protection in variety.
There is a wider movement towards decentralisation afoot. I've said this before: We need a peer-to-peer application layer. I *wish* I could drop everything and start on that now.
| 7:18 pm on Jun 20, 2013 (gmt 0)|
Not sure what a new type of search engine would involve. However the codebase is not an issue. The real issue is content and keeping a clean and useful index. That's where the real work lies for search engine operators.
| 7:28 pm on Jun 20, 2013 (gmt 0)|
I use ixquick almost exclusively for daily searches and almost always find what I want. In extremis I use bing but that's only once or twice a month at most.
startpage - I used to use that until they included G results, then I went back to ixquick.
Yandex has some of "my" sites indexed but not all.
I have never had SE links on my sites except within the help pages ("search here for more info on problem"). That used to be G. Now it's bing.
| 7:53 pm on Jun 20, 2013 (gmt 0)|
This thread should be a sticky on the home page of WebmasterWorld until there is a NEW SE.
|Google has more resources and people. |
They do have resources, but so what? They have to "use the algo" to drive profits, or so they claim.
| 7:58 pm on Jun 20, 2013 (gmt 0)|
|I *wish* I could drop everything and start on that now. |
|brotherhood of LAN|
| 8:03 pm on Jun 20, 2013 (gmt 0)|
|We need a peer-to-peer application layer |
Seoskunk already posted yacy.net [yacy.net], which does look interesting.