
Forum Moderators: phranque


Starting my own Google

I am planning to start my own google so help

     

statbat

3:00 am on Feb 29, 2004 (gmt 0)

10+ Year Member



Hello,

I am planning to start my own google. So I need your ideas about
* What should be the database? MsSQL2000 MySQL etc.
* What should be the programming language? PHP, ASP, JSP
* How will I build the database of websites? I am planning to spider the Google directory to fetch sites and add them to my database. Will that work?
* How can I make it handle 700K searches per day?

Thanks

walkman

3:12 am on Feb 29, 2004 (gmt 0)



I'd recommend hyperseek. Can't post a link, but do a search. I use a (heavily modified) version of it and I like it a lot. It's very easy to add things and customize.

RBuzz

3:19 am on Feb 29, 2004 (gmt 0)

10+ Year Member



You're planning to spider Google's directory? Why not just download the ODP data and be done with it? It just seems like you're going around your elbow a bit.

statbat

3:28 am on Feb 29, 2004 (gmt 0)

10+ Year Member



I am sorry, but I want to start my own search engine and do not want to buy some software. I want to get custom-developed software.

I also want to know how my database will be populated.

Thanks

diamondgrl

3:38 am on Feb 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You'd have to have a very good reason for doing this without knowing the first thing about your technological solution.

It's hard enough for companies that have invested tens of millions of dollars in the technology to compete with the likes of Google. If it were easy to spend, say, only $100,000 and come up with a search engine that users wanted to use (i.e. your self-described 700K searches a day) without having a much better mousetrap in mind (as Google's founders had), hundreds of people would have done it already.

Also, Google will not allow you to spider their results. They forbid the practice and police it. So if you do go ahead with this idea, you'll need another solution.
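
As a rough illustration of what respecting a site's crawl rules looks like in practice, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the example.com URLs and the "MyCrawler" user agent are all made up for the example; they are not Google's actual rules, and passing a robots.txt check says nothing about a site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (not any real site's file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "MyCrawler" and example.com are placeholder names.
print(allowed(ROBOTS_TXT, "MyCrawler", "https://example.com/search?q=test"))
print(allowed(ROBOTS_TXT, "MyCrawler", "https://example.com/about.html"))
```

A polite spider would run a check like this before every fetch and simply skip anything that is disallowed.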

SlowMove

3:40 am on Feb 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I already created my own google buster, but never released it as I have too much money.

hurlimann

3:44 am on Feb 29, 2004 (gmt 0)

10+ Year Member



I suggest you think about how you will get your 700K searches a day.

Google claims 200 million a day.

Marketing is all!

Good Luck

SlowMove

4:00 am on Feb 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seriously, there's nothing easy about it:
[www-db.stanford.edu...]

walkman

5:00 am on Feb 29, 2004 (gmt 0)



"I am sorry, but I want to start my own search engine and do not want to buy some software. I want to get custom-developed software."
I wish you good luck. When you say your own google, will you have your own algo with math wizards or just a normal search engine?

Fischerlaender

11:24 am on Feb 29, 2004 (gmt 0)

10+ Year Member



* What should be the database? MsSQL2000 MySQL etc.

You can't run a (real) search engine with a "classical" RDBMS. They are too slow for this kind of task. You have to develop your own data structures and programs to access them.

* What should be the programming language? PHP, ASP, JSP

You can use Perl for some offline processing (e.g. index building) where the speed of your disks is the real constraint. But your query engine (the front end which has to deliver the search results) has to be pure C (or something similar).

Of course it may be possible to run a small so-called search engine with, let's say, several million pages using an RDBMS like MS SQL. But if you want the size of your index to grow so that it can somehow be compared to that of a real search engine, you won't be successful with it.
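
To give an idea of the kind of custom data structure meant here, below is a minimal sketch of an inverted index, the usual structure for full-text lookup. It is written in Python only for readability and is purely in-memory; the function names and sample documents are made up for the example, and a production engine would need compressed on-disk postings, ranking and much more.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to a sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents that contain every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Made-up sample documents, keyed by document id.
docs = {
    1: "custom search engine data structures",
    2: "relational database search",
    3: "search engine index building",
}
index = build_index(docs)
print(search(index, "search engine"))
```

The point of the structure is that a query only touches the posting lists of its own terms, instead of scanning every stored page.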

trillianjedi

11:43 am on Feb 29, 2004 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Statbat,

Welcome to WebmasterWorld.

To be blunt, asking the questions you've asked I would say you're approaching from the wrong direction.

The starting point of any search engine is not the database, it's the algorithm. You need to come up with a google-busting idea. Building databases and spidering pages is easy, organising and indexing the data in a meaningful fashion is the tricky part.

Google itself was founded on a clever idea - the concept that authoritative high quality sites will attract links pointing to them. That was a unique idea at the time. It became the fundamental basis for google's algorithm when it was started. Brin & Page went and wrote a paper on it and then got funding.
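
For anyone who wants to see the "links as votes" idea in concrete terms, here is a simplified sketch of link-based ranking by repeated redistribution of scores, in the spirit of PageRank. It is not Google's production algorithm; the tiny link graph, the function name and the choice of 50 iterations are assumptions for illustration, and 0.85 is just a commonly cited damping value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page c comes out on top because three pages link to it, and a ranks above b and d because its single inbound link comes from the highest-scoring page.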

Come up with a unique and novel google-busting algorithm first; without it, all you'll have is a meaningless database.

If you already have a unique idea for the algo, then get some help on building the database and spiders and get a demo running. Then go get funding. You'll need lots of funding if you're to stand any chance at all.

TJ

utica

2:34 am on Mar 1, 2004 (gmt 0)

10+ Year Member



Google itself was founded on a clever idea - the concept that authoritative high quality sites will attract links pointing to them. That was a unique idea at the time.

I think you're right that the idea was unique to search engines at the time, trillianjedi.

The concept itself was first proposed by Eugene Garfield in the 1950s. He called it Citation Indexing. Garfield went on to develop it as a way to index scientific publications.

trillianjedi

12:00 am on Mar 8, 2004 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Garfield went on to develop it as a way to index scientific publications.

Sorry Utica - only just seen this.

That's interesting - I wonder if that's where Brin/Page got the idea from?

TJ

hurlimann

12:23 am on Mar 9, 2004 (gmt 0)

10+ Year Member



Great post TJ but I think the key is not algos and never was.

Google became the SE market leader using marketing differentiation to promote a marginally better computational differentiation.

They did it when barriers to entry were low. Today they are not.

Fischerlaender

12:23 pm on Mar 10, 2004 (gmt 0)

10+ Year Member



Google became the SE market leader using marketing differentiation to promote a marginally better computational differentiation.

LOL - "marginally better". You seem to have forgotten how bad the results from Altavista, Excite or Infoseek were back in 1999.

They did it when barriers to entry were low. Today they are not.

Why should the barriers be higher now? It's still as easy to switch from search engine A to search engine B as it was then.

trillianjedi

12:56 pm on Mar 10, 2004 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Why should the barriers be higher now? It's still as easy to switch from search engine A to search engine B as it was then.

I think the social/psychological leap required to get people away from the leading brands is greater than ever.

Google is good. Alta-Vista and the others were not as good at the time.

Beating google requires both a great engine and the marketing to persuade people that google is bad, or at least not as good.

MSN have that kind of mighty influence. I doubt that statbat has (with all due respect to statbat).

I take your point and it is rather chicken and egg. You need a good algo to have a product worth marketing, and you need good marketing to get people to hear about your good algo.

The point I wanted to make to statbat was simply to forget worrying about building spiders and a database - that part is relatively easy. The hard part is building a good product around the data (let's say part algo, part marketing).

In my original post, I was coming from the direction that it's the algo that requires the brains. It needs a clever idea to get market share and persuade people away from the established brands.

But I agree you can also look at it from either direction. It would also require very clever marketing and branding.

Or the biggest stroke of luck in the history of the internet (google gets hit by a virus taking it down for a month and Bill Gates goes bust in the same week).

TJ

gethan

1:13 pm on Mar 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I love this thread ;)

Statbat: Check out Gigablast - [gigablast.com...]

It has basically done (and is still doing) what you're trying to do; articles on the site detail the problems its author has come up against.

I agree that it will be almost impossible to bootstrap (pull oneself off the ground by one's own bootlaces, for those not sure of the expression) a search engine today. Gigablast is the closest I've seen.

So here is a step by step list for you:

1. Get a huge amount of money
2. Hire some very smart people
3. Produce your search engine (with as yet unknown google killing feature)
4. Market like crazy
5. Reap rewards until post-google-killing-search-engine-killer is invented and marketed.

I would recommend a less ambitious project at first.

trillianjedi

1:18 pm on Mar 10, 2004 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I would recommend a less ambitious project at first.

Or target a group of users who currently do not search the internet.

The ambitious part of the project is taking market share from the existing engines (hence all the talk of a google-buster).

To take 700k search queries a day, those users have to come from somewhere.

Rather than try to compete for a share of an existing market, I think I would be looking at new pastures.

That would also make life a lot easier for the marketeers.

Still going to require one hell of a lot of money though. And it's still going to require one fantastic idea.

Neither of the above is as easy as spidering pages and building databases.

TJ

Shak

1:31 pm on Mar 10, 2004 (gmt 0)

WebmasterWorld Senior Member shak is a WebmasterWorld Top Contributor of All Time 10+ Year Member



700k search queries a day x 30 = 21 million a month (based on 30 days at that level)

that is more than Lycos.co.uk and Tiscali.co.uk put together
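
The same arithmetic can be pushed one step further to get a feel for server load. This is a back-of-the-envelope sketch in Python; the function name is made up, and the peak factor of 3 is purely an assumption, since real traffic is never spread evenly across the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def load_estimate(searches_per_day, days_per_month=30, peak_factor=3.0):
    """Return (monthly searches, average queries/sec, assumed peak queries/sec)."""
    monthly = searches_per_day * days_per_month
    average_qps = searches_per_day / SECONDS_PER_DAY
    # peak_factor is a guess at how much busier the busiest moments are.
    peak_qps = average_qps * peak_factor
    return monthly, average_qps, peak_qps


monthly, average_qps, peak_qps = load_estimate(700_000)
print(f"{monthly:,} searches per month")
print(f"{average_qps:.1f} queries per second on average")
print(f"{peak_qps:.1f} queries per second at the assumed peak")
```

So 700K searches a day works out to roughly 8 queries every second, around the clock, before allowing for any peaks.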

Shak

creative craig

1:35 pm on Mar 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From the questions you are asking, it sounds like you need to start from the basics. Read up on the history of Google and Altavista to see how their technology works and how they implement it.

How much are you going to spend on running your own Google?

blaze

1:36 pm on Mar 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



He could be starting something up in China. I wouldn't say Google has a lock on the market there just yet.

rogerd

2:13 pm on Mar 10, 2004 (gmt 0)

WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I am planning to start my own google

Can I get some stock options? ;)

trillianjedi

2:16 pm on Mar 10, 2004 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Can I get some stock options? ;)

Sure:-

1. Paxo. For the instant stuff, not bad. Good price. Nice packaging.

2. Oxo. The original, not especially good though (in my opinion).

3. Fresh home-made from the local deli. This stuff is the best you can get, although tricky to get hold of. Makes great soup.

hth,

TJ