I'd recommend hyperseek. Can't post a link, but do a search. I use a heavily modified version of it and I like it a lot. It's very easy to add things and customize.
You're planning to spider Google's directory? Why not just download the ODP data and be done with it? It just seems like you're going around your elbow a bit.
I am sorry but I want to start my own search engine & do not want to buy some software. I want to get custom developed software.
I also want to know how my database will be populated?
You'd have to have a very good reason for doing this without knowing the first thing about your technological solution.
It's hard enough for companies that have invested tens of millions of dollars in the technology to compete with the likes of Google. Without having a much better mousetrap in mind (as Google's founders had), if it was easy enough to spend, say, only $100,000 and come up with a search engine that users wanted to use (i.e. your self-described 700K hits), hundreds of people would have done it already.
Also, Google will not allow you to spider their results. They forbid the practice and police it. So if you do go ahead with this idea, you'll need another solution.
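For anyone writing a spider anyway, here is a minimal sketch of checking a site's robots.txt before fetching a page. It only covers the robots.txt side of things, not a site's terms of service. The rules in `ROBOTS_TXT`, the `MyCrawler` user agent and the example.com URLs are all made up for illustration; this is not Google's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/search?q=test", "https://example.com/about"):
        print(url, allowed(ROBOTS_TXT, "MyCrawler", url))
```

A real spider would download each site's own robots.txt and cache it rather than hard-coding the rules.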
I already created my own google buster, but never released it as I have too much money.
Suggest you think how you will get your 700K searches a day.
Google claims 200 million a day.
Marketing is all!
Seriously, there's nothing easy about it:
"I am sorry but I want to start my own search engine & do not want to buy some software. I want to get custom developed software."
I wish you good luck. When you say your own google, will you have your own algo with math wizards or just a normal search engine?
|* What should be the database? MsSQL2000 MySQL etc. |
You can't run a (real) search engine with a "classical" RDBMS. They are too slow for this kind of task. You have to develop your own data structures and programs to access them.
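To give an idea of what "your own data structures" usually means: the classic one is an inverted index, which maps each word to the set of documents containing it. The sketch below is a toy in-memory version in Python, only to show the shape of the thing. The function names and the sample documents are invented for the example, and a production engine would keep compressed posting lists on disk rather than Python sets.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start with the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "cheap flights to paris",
        2: "paris hotels",
        3: "cheap hotels",
    }
    index = build_index(docs)
    print(search(index, "cheap hotels"))
    print(search(index, "paris"))
```

A query then costs a few set intersections instead of a table scan, which is the main reason this structure is preferred over generic SQL lookups for full-text search.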
|* What should be the programming language? PHP, ASP, JSP |
You can use Perl for some offline processing (e.g. index building) where the speed of your disks is the real constraint. But your query engine (the front end which has to deliver the search results) has to be pure C (or something similar).
Of course it may be possible to run a small so-called search engine with - let's say - several million pages using an RDBMS like MS SQL. But if you want the size of your index to grow so that it can somehow be compared to that of a real search engine, you won't be successful with it.
Welcome to WebmasterWorld.
To be blunt, asking the questions you've asked I would say you're approaching from the wrong direction.
The starting point of any search engine is not the database, it's the algorithm. You need to come up with a google-busting idea. Building databases and spidering pages is easy, organising and indexing the data in a meaningful fashion is the tricky part.
Google itself was founded on a clever idea - the concept that authoritative high quality sites will attract links pointing to them. That was a unique idea at the time. It became the fundamental basis for google's algorithm when it was started. Brin & Page went and wrote a paper on it and then got funding.
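To make the "links as votes" idea concrete, here is a rough sketch of link-based ranking by power iteration, in the spirit of the published PageRank idea. It is a simplified illustration, not Google's actual algorithm. The function name, the damping value of 0.85, the iteration count and the three-page link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages from a dict mapping each page to the list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: a and b both link to c, and c links back to a.
    ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph c comes out on top because two pages point at it, a comes second because the well-linked c points at it, and b comes last because nothing links to it.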
Come up with a unique and novel google-busting algorithm first, without it all you'll have is a meaningless database.
If you already have a unique idea for the algo, then get some help on building the database and spiders and get a demo running. Then go get funding. You'll need lots of funding if you're to stand any chance at all.
|Google itself was founded on a clever idea - the concept that authoritative high quality sites will attract links pointing to them. That was a unique idea at the time. |
I think you're right that the idea was unique to search engines at the time trillianjedi.
The concept itself was first proposed by Eugene Garfield in the 1950s. He called it Citation Indexing. Garfield went on to develop it as a way to index scientific publications.
|Garfield went on to develop it as a way to index scientific publications. |
Sorry Utica - only just seen this.
That's interesting - I wonder if that's where Brin/Page got the idea from?
Great post TJ but I think the key is not algos and never was.
Google became the SE market leader using marketing differentiation to promote a marginally better computational differentiation.
They did it when barriers to entry were low. Today they are not.
|Google became the SE market leader using marketing differentiation to promote a marginally better computational differentiation. |
LOL - "marginally better". You seem to have forgotten how bad the results from Altavista, Excite or Infoseek were back in 1999.
|They did it when barriers to entry were low. Today they are not. |
Why should the barriers be higher now? It's still as easy to switch from search engine A to search engine B as it was then.
|Why should the barriers be higher now? It's still as easy to switch from search engine A to search engine B as it was then. |
I think the social/psychological leap required to get people away from the leading brands is greater than ever.
Google is good. Alta-Vista and the others were not as good at the time.
Beating google requires both a great engine, and the marketing to persuade people that google is bad, or at least not as good.
MSN have that kind of mighty influence. I doubt that statbat has (with all due respect to statbat).
I take your point and it is rather chicken and egg. You need a good algo to have a product worth marketing, and you need good marketing to get people to hear about your good algo.
The point I wanted to make to statbat was simply to forget worrying about building spiders and a database - that part is relatively easy. The hard part is building a good product around the data (let's say part algo, part marketing).
In my original post, I was coming from the direction that it's the algo that requires the brains. It needs a clever idea to get market share and persuade people away from the established brands.
But I agree you can also look at it from either direction. It would also require very clever marketing and branding.
Or the biggest stroke of luck in the history of the internet (google gets hit by a virus taking it down for a month and Bill Gates goes bust in the same week).
I love this thread ;)
Statbat: Check out Gigablast - [gigablast.com...]
Basically has done/is doing what you're trying to do, articles on the site detail problems he has come up against etc.
I agree that it will be almost impossible to bootstrap (pull oneself off the ground by one's own bootlaces, for those not sure of the expression) a search engine today. Gigablast is the closest I've seen.
So here is a step by step list for you:
1. Get a huge amount of money
2. Hire some very smart people
3. Produce your search engine (with as yet unknown google killing feature)
4. Market like crazy
5. Reap rewards until post-google-killing-search-engine-killer is invented and marketed.
I would recommend a less ambitious project at first.
|I would recommend a less ambitious project at first. |
Or target a group of users who currently do not search the internet.
The ambitious part of the project is taking market share from the existing engines (hence all the talk of a google-buster).
To take 700k search queries a day, those users have to come from somewhere.
Rather than try to compete for a share of an existing market, I think I would look at new pastures.
That would also make life a lot easier for the marketeers.
Still going to require one hell of a lot of money though. And it's still going to require one fantastic idea.
Neither of the above is as easy as spidering pages and building databases.
700k search queries a day x 30 = 21 million a month (based on 30 days at that level)
that is more than Lycos.co.uk and Tiscali.co.uk put together
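For anyone who wants to play with the numbers, here is the same arithmetic as a tiny script, plus the average load per second it implies. The 700k figure is the one quoted in this thread. The function names are made up, and the per-second figure assumes queries arrive evenly around the clock, which real traffic will not do.

```python
QUERIES_PER_DAY = 700_000          # target figure quoted in the thread
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds


def monthly_queries(per_day, days=30):
    """Total queries over a month of `days` days at a constant daily rate."""
    return per_day * days


def average_qps(per_day):
    """Average queries per second, assuming perfectly even traffic."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(monthly_queries(QUERIES_PER_DAY))        # 21000000
    print(round(average_qps(QUERIES_PER_DAY), 1))  # 8.1
```

So the query engine has to answer roughly eight searches a second on average, and more than that at busy times of day.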
From the questions you are asking, it sounds like you need to start from the basics. Read up on the history of Google and Altavista and see how their technology works and how they implement it.
How much are you going to spend on running your own Google?
He could be starting something up in China. I wouldn't say Google has a lock on the market there just yet.
|I am planning to start my own google |
Can I get some stock options? ;)
|Can I get some stock options? ;) |
1. Paxo. For the instant stuff, not bad. Good price. Nice packaging.
2. Oxo. The original, not especially good though (in my opinion).
3. Fresh home-made from the local deli. This stuff is the best you can get, although tricky to get hold of. Makes great soup.