|search engine requirements|
For a support website of a web application
| 11:38 am on Oct 2, 2009 (gmt 0)|
I am putting together a requirements document for a new search engine. Did I miss anything in the list below?
Search engine requirements:
* User friendly: a search field and a button, with the cursor already in the search field
* Good help while searching: typo correction, alternative words, suggestions
* Cover most common issues and questions
* Like Google (all words are combined with AND, except when "using double quotes" for an exact phrase)
* Advanced search?
* Search result needs to be clear:
---meaningful summary (2-3 lines?) with highlights
---rank in stars (not %)
* Search engine should be able to use...
---enhanced synonym list (easily extendable)
---stemming (e.g. "searches" will find "search")
...but if used let the user know!
* Weighting and ranking factors
---Only use understandable weights (easy to maintain)
* Usage statistics
---top 50 terms and hits so documentation could be improved
---list of search terms that returned zero results, so new documentation can be written
* Must be able to index (crawl) websites, pdfs, (office) documents.
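To make the "like Google" matching and the usage statistics items a bit more concrete, here is a minimal sketch in Python. It assumes the documents are plain strings held in memory, and the names parse_query, matches and SearchLog are made up for illustration. A real engine would use an index rather than scanning every document.

```python
import re
from collections import Counter


def parse_query(query):
    """Split a query into single words and "quoted phrases" (all lowercased)."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]*"', " ", query)  # drop the quoted parts
    words = [w.lower() for w in re.findall(r"\w+", rest)]
    return words, phrases


def matches(text, words, phrases):
    """True if the text contains every word (AND) and every exact phrase."""
    lowered = text.lower()
    tokens = set(re.findall(r"\w+", lowered))
    # Phrase check is a plain substring test, which is crude but short.
    return all(w in tokens for w in words) and all(p in lowered for p in phrases)


class SearchLog:
    """Runs searches and keeps the statistics asked for in the list above."""

    def __init__(self):
        self.term_counts = Counter()  # how often each query was searched
        self.zero_hits = Counter()    # queries that returned nothing

    def search(self, docs, query):
        words, phrases = parse_query(query)
        hits = [name for name, text in docs.items()
                if matches(text, words, phrases)]
        key = query.strip().lower()
        self.term_counts[key] += 1
        if not hits:
            self.zero_hits[key] += 1
        return hits

    def top_terms(self, n=50):
        return self.term_counts.most_common(n)
```

The zero_hits counter is the part that feeds the "create new documentation" item: anything that shows up there often is a topic users expect and cannot find.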
| 1:47 am on Oct 25, 2009 (gmt 0)|
* fast (nobody wants to wait 20 minutes for their search results...)
* large index (if your index has 100 docs ppl probably won't find what they're looking for)
* relevant results (what good is size if all that comes up is junk?)
| 11:54 am on Oct 26, 2009 (gmt 0)|
* Well behaved spider.
Includes (but undoubtedly isn't limited to):
* Obey robots.txt
* Obey relevant meta-tags
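* Round-trip DNS for the IP from which the spider crawls (the reverse lookup gives a hostname that resolves back to the same IP), so site owners can verify the spider.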
* Don't crawl too fast (with the complication that this will be site specific!)
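A rough sketch of the robots.txt and crawl-speed points, using Python's standard urllib.robotparser. The user agent name, the PoliteFetcher class and the one second default delay are assumptions made up for this example, and the robots.txt text is passed in as a string so that the fetching itself stays out of the picture. Meta tags and the DNS check are not covered here.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleSupportBot"  # hypothetical spider name


class PoliteFetcher:
    """Tracks per-host robots.txt rules and the delay between requests."""

    def __init__(self, default_delay=1.0):
        self.default_delay = default_delay  # assumed fallback, in seconds
        self.rules = {}       # host -> RobotFileParser
        self.last_fetch = {}  # host -> timestamp of the last request

    def load_robots(self, host, robots_txt):
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.rules[host] = parser

    def allowed(self, url):
        # Be conservative: no robots.txt loaded for the host means no crawl.
        parser = self.rules.get(urlparse(url).netloc)
        return parser is not None and parser.can_fetch(USER_AGENT, url)

    def delay_for(self, host):
        # Site specific: use the site's Crawl-delay when it declares one.
        parser = self.rules.get(host)
        delay = parser.crawl_delay(USER_AGENT) if parser else None
        return float(delay) if delay is not None else self.default_delay

    def wait_time(self, host, now):
        """Seconds to wait before the next request to this host."""
        last = self.last_fetch.get(host)
        if last is None:
            return 0.0
        return max(0.0, last + self.delay_for(host) - now)

    def record_fetch(self, host, now):
        self.last_fetch[host] = now
```

The crawl loop would call allowed() before every request, sleep for wait_time(), and call record_fetch() afterwards, passing time.time() as now.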
| 11:10 pm on Oct 26, 2009 (gmt 0)|
* Startup costs in the tens of millions, otherwise your servers will die of overload within five minutes of starting. Ahem. Err, Cuil.
| 4:50 am on Oct 30, 2009 (gmt 0)|
Oh .. I don't know .. search utilities that bend over backwards to show me things like rank, status, and stars, won't ever get my full attention, much less any kind of devotion.
Bolded text for keywords is enough .. I treat highlighted text the same way I do rank, status, and stars ..
| 6:56 pm on Nov 16, 2009 (gmt 0)|
There's a big difference in requirements if you are going to make a site search engine, a database search engine or a full Internet search engine.
There are also a ton of search engines already available for free, so you might want to think of ways your search engine would be different, to cover some area the others don't.
The only requirement that matters in the end is that users should find what they are searching for when typing in a query. And not have to wait too long.
...and on the tech side you need to optimize the index to take as little disk space as possible and the indexing algo to be quick and low on CPU/mem usage.
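As an illustration of the disk space point, one common technique (not necessarily the one used in my engine or any other) is to store each term's sorted document IDs as gaps between neighbours and write each gap as a variable-length integer. A minimal sketch in Python, with function names made up for the example:

```python
def encode_postings(doc_ids):
    """Encode sorted, non-negative doc IDs as variable-length byte gaps."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous  # small numbers when IDs are close together
        previous = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, "more follows" flag
            gap >>= 7
        out.append(gap)  # last byte of this gap, flag bit clear
    return bytes(out)


def decode_postings(data):
    """Reverse of encode_postings: rebuild the list of doc IDs."""
    doc_ids = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to this gap
        else:
            current += gap
            doc_ids.append(current)
            gap = 0
            shift = 0
    return doc_ids
```

With IDs stored as fixed 4-byte integers, five postings take 20 bytes. With this scheme, a list like [3, 7, 130, 131, 100000] fits in 7 bytes, at the cost of having to decode sequentially.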
I'm currently actively developing a search engine myself so I have some experience in the field. It's a lot of work, but fun :)