Forum Moderators: bakedjake
Costs include (but are not restricted to):
bandwidth involved in getting the information to a central database
servers that the DB sits on
electricity that powers the DB servers
somewhere for the servers to 'live' safely
backup costs
staffing/programming: you can't shove the whole web into a flat-file database; you need people with really good experience of dealing with more data than you can imagine
also
cost of making people aware of the project
cost of supporting people who opt in
$1.5 million to crawl the whole web (or at least 50 billion pages of it)? If anyone can do it for that (in a reasonable time frame) I know a company who would probably bite your hand off.
If anyone can do it for that (in a reasonable time frame) I know a company who would probably bite your hand off.
We crawl 50 million pages per day and it scales linearly, i.e. if 10 times as many people join we will get to 500 million per day; that's about 3 months (100 days) to crawl 50 billion pages.
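For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It only uses the figures quoted in this thread and takes the linear-scaling claim at face value, which is an assumption; the function and variable names are made up for illustration.

```python
def days_to_crawl(total_pages: int, pages_per_day: int) -> float:
    """Days needed to fetch total_pages at a constant daily crawl rate."""
    return total_pages / pages_per_day


# Figures quoted in the thread
current_rate = 50_000_000        # pages crawled per day today
target_pages = 50_000_000_000    # "the whole web (or at least 50 billion pages of it)"

# Assumption: throughput grows linearly with the number of participants
scale_factor = 10
scaled_rate = current_rate * scale_factor

print(days_to_crawl(target_pages, current_rate))  # 1000.0 days at today's rate
print(days_to_crawl(target_pages, scaled_rate))   # 100.0 days, i.e. roughly 3 months
```

At today's rate the same crawl would take about 1,000 days, so the 3-month figure depends entirely on getting ten times as many participants.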
Of course the challenges that you mentioned are all real, but our work has pretty much proven that it can be done without millions needing to be invested. It certainly takes time and effort, but it's doable; it is a much harder task to actually have good ranking that is competitive with top-tier search engines.
I can't say more because of the "self-promotion" rules here, but those who seek will find, just like those who dare win ;)
much harder task to actually have good ranking that is competitive with Top tier search engines.
Indeed: a little look at your project confirms the difficulty of ranking pages well.
What you've achieved is impressive, I applaud people who have the vision and determination to do things on the scale you are attempting.
However, the real value of any search engine is in the ability to remove spam/poor pages and return the most relevant ones for searches. I'm sure you are learning a great deal about how tricky that must be (I can't profess to know much about that).
It looks as though I've got another interesting site to visit on a regular basis.
Best of luck with it.