I am planning to start my own google. So I need your ideas about
* What should be the database? MsSQL2000 MySQL etc.
* What should be the programming language? PHP, ASP, JSP
* How will I build the database of websites? I am planning to spider Google's directories to fetch sites and add them to my database. Will that work?
* How can I make it handle 700K searches per day?
Thanks
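For a sense of scale, the 700K searches per day figure can be turned into a per-second load. This is only a rough sketch: the function names are made up for illustration, and the peak factor of 3 is an assumption, not something stated in the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(queries_per_day):
    """Average queries per second if traffic were spread evenly over the day."""
    return queries_per_day / SECONDS_PER_DAY


def peak_qps(queries_per_day, peak_factor=3.0):
    """Rough peak load; peak_factor is an assumed multiplier, not a measured one."""
    return average_qps(queries_per_day) * peak_factor


if __name__ == "__main__":
    print(f"average: {average_qps(700_000):.1f} queries/second")
    print(f"assumed peak: {peak_qps(700_000):.1f} queries/second")
```

700,000 divided by 86,400 is roughly 8 queries per second on average, so the raw query rate is modest. The harder questions are how large the index is and where the users come from.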
It's hard enough for companies that have invested tens of millions of dollars in the technology to compete with the likes of Google. If it were easy to spend, say, only $100,000 and, without a much better mousetrap in mind (as Google's founders had), come up with a search engine that users wanted to use (i.e. your self-described 700K searches a day), hundreds of people would have done it already.
Also, Google will not allow you to spider their results. They forbid the practice and police it. So if you do go ahead with this idea, you'll need another solution.
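Whatever site a spider visits, the usual first step is to check that site's robots.txt before fetching anything. Below is a minimal sketch using Python's standard urllib.robotparser. The rules, the crawler name and the example.com URLs are all hypothetical and are not taken from any real site's robots.txt. Note that a site's terms of service may forbid scraping even where robots.txt is silent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /about
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS)

if __name__ == "__main__":
    for url in ("https://example.com/search?q=test", "https://example.com/about"):
        allowed = parser.can_fetch("ExampleCrawler", url)
        print(url, "allowed" if allowed else "disallowed")
```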
* What should be the database? MsSQL2000 MySQL etc.
* What should be the programming language? PHP, ASP, JSP
Of course it may be possible to run a small so-called search engine with - let's say - several million pages using an RDBMS like MS SQL. But if you want the size of your index to grow so that it can somehow be compared to that of a real search engine, you won't be successful with it.
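The post above doesn't say what to use instead of an RDBMS. One common structure for text search is an inverted index, which maps each term to the set of documents containing it. The toy in-memory version below is only a sketch: the function names and sample documents are invented for illustration, and a real engine would add ranking, compression and on-disk storage.

```python
from collections import defaultdict


def tokenize(text):
    """Very crude tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "cheap flights to london",
        2: "london hotels and flights",
        3: "cheap hotels",
    }
    index = build_index(docs)
    print(search(index, "london flights"))
```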
Welcome to WebmasterWorld.
To be blunt, asking the questions you've asked I would say you're approaching from the wrong direction.
The starting point of any search engine is not the database, it's the algorithm. You need to come up with a google-busting idea. Building databases and spidering pages is easy, organising and indexing the data in a meaningful fashion is the tricky part.
Google itself was founded on a clever idea - the concept that authoritative high quality sites will attract links pointing to them. That was a unique idea at the time. It became the fundamental basis for Google's algorithm when it was started. Brin & Page went and wrote a paper on it and then got funding.
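To make the link idea concrete, here is a simplified sketch of link-based ranking in the spirit of PageRank. It is not Google's actual algorithm. The function name, the damping value of 0.85, the iteration count and the tiny link graph are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links using simple power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(link_rank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, which is the intuition behind treating links as votes.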
Come up with a unique and novel google-busting algorithm first. Without it, all you'll have is a meaningless database.
If you already have a unique idea for the algo, then get some help on building the database and spiders and get a demo running. Then go get funding. You'll need lots of funding if you're to stand any chance at all.
TJ
Google itself was founded on a clever idea - the concept that authoritative high quality sites will attract links pointing to them. That was a unique idea at the time.
I think you're right that the idea was unique to search engines at the time, trillianjedi.
The concept itself was first proposed by Eugene Garfield in the 1950s. He called it Citation Indexing. Garfield went on to develop it as a way to index scientific publications.
Google became the SE market leader using marketing differentiation to promote a marginally better computational differentiation.
They did it when barriers to entry were low. Today they are not.
Why should the barriers be higher now? It's still as easy to switch from search engine A to search engine B as it was then.
I think the social/psychological leap required to get people away from the leading brands is greater than ever.
Google is good. Alta-Vista and the others were not as good at the time.
Beating Google requires both a great engine and the marketing to persuade people that Google is bad, or at least not as good.
MSN have that kind of mighty influence. I doubt that statbat has (with all due respect to statbat).
I take your point and it is rather chicken and egg. You need a good algo to have a product worth marketing, and you need good marketing to get people to hear about your good algo.
The point I wanted to make to statbat was simply to forget worrying about building spiders and a database - that part is relatively easy. The hard part is building a good product around the data (let's say part algo, part marketing).
In my original post, I was coming from the direction that it's the algo that requires the brains. It needs a clever idea to get market share and persuade people away from the established brands.
But I agree you can also look at it from either direction. It would also require very clever marketing and branding.
Or the biggest stroke of luck in the history of the internet (Google gets hit by a virus taking it down for a month and Bill Gates goes bust in the same week).
TJ
Statbat: Check out Gigablast - [gigablast.com...]
It has basically done (and is still doing) what you're trying to do. Articles on the site detail the problems he has come up against.
I agree that it will be almost impossible to bootstrap a search engine today (to bootstrap is to pull oneself off the ground by one's own bootlaces, for those not sure of the expression). Gigablast is the closest I've seen.
So here is a step by step list for you:
1. Get a huge amount of money
2. Hire some very smart people
3. Produce your search engine (with as yet unknown google killing feature)
4. Market like crazy
5. Reap rewards until post-google-killing-search-engine-killer is invented and marketed.
I would recommend a less ambitious project at first.
I would recommend a less ambitious project at first.
Or target a group of users who currently do not search the internet.
The ambitious part of the project is taking market share from the existing engines (hence all the talk of a google-buster).
To take 700k search queries a day, those users have to come from somewhere.
Rather than try to compete for a share of an existing market, I think I would look at new pastures.
That would also make life a lot easier for the marketeers.
Still going to require one hell of a lot of money though. And it's still going to require one fantastic idea.
Neither of the above is as easy as spidering pages and building databases.
TJ
Can I get some stock options? ;)
Sure:-
1. Paxo. For the instant stuff, not bad. Good price. Nice packaging.
2. Oxo. The original, not especially good though (in my opinion).
3. Fresh home-made from the local deli. This stuff is the best you can get, although tricky to get hold of. Makes great soup.
hth,
TJ