Need help on search engines

For a university individual project

         

ac1982

9:32 pm on Nov 2, 2005 (gmt 0)

10+ Year Member



Hi
I am in my 3rd year at university, and I chose as my subject to build a small search engine to find solutions for software and hardware problems. I don't have any knowledge about this stuff, so I started studying Perl programming and robots. I would like to ask: do you think I will be able to do this project? If yes, please give me some directions. Thanks

treeline

10:18 pm on Nov 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want to write all the scripts from scratch it will be a lot of work. There are a lot of moving parts, so if you must build it totally yourself keep it really, really simple. No, simpler. If you have no programming experience it's hard to see how you can do it in a reasonable amount of time.

A better option (if allowed by your advisor) may be to purchase one of the many scripts that handle the basic functions (there are also a few free ones). This will still be a lot of work, but completely doable and very educational. You may need to combine several programs to work together. You'll learn to coexist with a database, keep a spider well fed, troubleshoot script errors in either Perl or PHP, tweak search settings, make it more usable, fight spam....

I'd suggest working with an existing script as a good first experience unless you are a very talented programmer and have already thought through all the pieces that are needed.

ac1982

10:52 pm on Nov 2, 2005 (gmt 0)

10+ Year Member



First of all thanks for the quick response.
I have some programming knowledge. I have used Java for 3 years, and PHP and MySQL for more than 6 months. And now I am studying Perl.

My purpose is to build a small search engine. I have one and a half months to deliver a research report and then another 5 months for the development phase, so I think I have plenty of time. Can you give me some more details about "the moving parts" you mentioned before?
On the other hand, I would like you to suggest some existing scripts so I could get a little practice.
Thanks

Lord Majestic

3:50 am on Nov 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You need to specify exact research goals, because building a search engine touches many aspects of Computer Science. In fact, if you were to try to build a very big search engine, you would find yourself solving pretty much all the tough problems CS deals with.

For starters, you need to decide if you are going to use a database: a highly recommended option for those who have no time and who don't plan on building a big search engine.
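To make the database option concrete, here is a minimal sketch of a database-backed search. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table layout, function names and sample pages are all made up for the example.

```python
import sqlite3

# Hypothetical sample data: (url, title, body) rows a spider might have stored.
SAMPLE_PAGES = [
    ("http://example.com/printer", "Printer driver fails to install",
     "Reinstall the driver and reboot."),
    ("http://example.com/wifi", "Wi-Fi keeps dropping",
     "Update the router firmware."),
    ("http://example.com/boot", "PC will not boot",
     "Check the power supply and the driver for the disk controller."),
]

def build_db(pages):
    """Create an in-memory database and store the given pages in it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", pages)
    conn.commit()
    return conn

def search(conn, term):
    """Return URLs (sorted) whose title or body contains the term.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    Note that % and _ inside the term act as wildcards in this sketch.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url FROM pages WHERE title LIKE ? OR body LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return [url for (url,) in rows]

if __name__ == "__main__":
    conn = build_db(SAMPLE_PAGES)
    print(search(conn, "driver"))
```

A substring scan like this is fine for a few thousand pages. It scans every row on every query, which is one reason larger engines move to an index structure instead.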

ac1982

4:26 pm on Nov 3, 2005 (gmt 0)

10+ Year Member



"Building a database first is a good idea to help me start doing something," my supervisor also said. I think for now I'll focus on building a database for a start, and maybe after that I will be able to move on.
Thanks

Lord Majestic

4:42 pm on Nov 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Read up on the topic of "inverted index".
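As a starting point for that reading, an inverted index maps each word to the set of documents that contain it, so a query only has to look up its words instead of scanning every document. The sketch below is a toy version; the tokenizer, function names and sample documents are assumptions made for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_all(index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    # set.intersection returns a new set, so the index is never mutated.
    return set.intersection(*postings)

# Hypothetical sample documents keyed by id.
SAMPLE_DOCS = {
    1: "Printer driver fails to install",
    2: "Wi-Fi driver crashes after update",
    3: "Printer prints blank pages",
}

if __name__ == "__main__":
    index = build_inverted_index(SAMPLE_DOCS)
    print(search_all(index, "printer driver"))
```

Real engines also store word positions and frequencies in the postings to support phrase queries and ranking. This sketch keeps only document ids.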

Dave_A

12:07 am on Nov 5, 2005 (gmt 0)

10+ Year Member



Hi, I would imagine that you would be able to build an engine from scratch; it just takes time and a heap of study to be able to do it.
Sifting the data is the easy part when it's in a MySQL database, but developing a whole new web spider can be a very different matter.
Web spiders need a very different way of working to be of any use, and you must be prepared to test yours out in heaps of ways.
Web spiders need to be able to read and follow the rules written in robots.txt files. If they don't, you will have heaps of people thinking that you are just swallowing bandwidth, and they will block the spider's attempts at indexing, mostly using .htaccess.
A few people who attempt to set up a search engine start crawling the web without having a working engine that can be searched, so heaps of webmasters get worried by things that waste bandwidth like that.
I have set up a working search engine in New Zealand. At present it is enabling metasearch engines to feed from its search results, and you may find that setting up a metasearch engine would enable you to develop a search engine without the problems that may arise from mistakes in programming.
Give it a try my friend!
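The robots.txt check mentioned above can be sketched with Python's standard urllib.robotparser module. This is a minimal illustration, not a full crawler: the robots.txt content, the "StudentBot" user agent and the example.com URLs are invented, and the rules are parsed from a string so the example needs no network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site might serve it.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

def make_parser(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def allowed(parser, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    parser = make_parser(SAMPLE_ROBOTS)
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, allowed(parser, "StudentBot", url))
```

In a real spider you would fetch each site's robots.txt once (RobotFileParser also offers set_url and read for that), cache the parser per host, and call the check before every request.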

Lord Majestic

12:20 am on Nov 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



setting up a metasearch engine would enable you to develop a search engine

This would hardly be development of a search engine; it is more like simple integration that won't teach anything beyond integrating things.

ac1982

1:16 pm on Nov 5, 2005 (gmt 0)

10+ Year Member



Thanks Dave_A for the info you gave me. I'll do some research on metasearch engines to learn more about this subject. For the moment I'll concentrate on building a small search engine for my project, maybe using more than one database to retrieve data. My objective is to learn as much as I can about search engines, so any info is welcome :-)

Dave_A

2:01 pm on Nov 5, 2005 (gmt 0)

10+ Year Member



I just looked at the time you posted your message. Then I realised you are possibly based in the UK because of the time difference.
It's now two thirty in the morning and I am watching the web spider indexing more websites into the search engine.
Spidering in progress...
------------

Heck it must be time for me to find a hobby?

Today I formed a partnership with an American metasearch engine that is now passing queries over to my engine for added results.
The volume of searches has risen by around 17%, which isn't bad.
Passing data between two databases is a work in progress.

Heaps of regards
Dave A

[edited by: Woz at 8:39 am (utc) on Dec. 5, 2005]
[edit reason] No URLs please, see TOS#13 [/edit]

topsites

8:33 am on Dec 5, 2005 (gmt 0)



It all depends what you want...
As a directory owner, I can tell you the basic differences:
A directory is a link list; the 'search' part of the site usually crawls only the links within it, though most may offer a world-wide-web function as well.
A search engine has NO directory; it crawls the world wide web and stores the links the spider finds in a database. Directories also store their links in a database, but with directories the links are in plain sight for the visitor as well.
A meta crawler uses a number of search engines to conduct simultaneous queries. Beyond that, I am not sure how it works, though Mamma.com is the pioneer of the meta algorithm, so you might ask them.

Other than that, both search engines AND directories need software to perform the searches. The search box is a doorway for the visitor to access the database via the software. Though the search box in and of itself can be one heck of a script, it does not perform searches without said software.

For my site, I use Zoltan Milosevic's software, an affordable shareware program with plenty of functions. Even a novice can perform an automated install and play around with the limited free version. You can find it here:
[xav.com...]
(oh, and the source is available)
Hope this helps.
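For the meta crawler idea described above, one possible sketch is to send the query to several engines at once and merge their ranked lists. Everything here is an assumption for illustration: the two "engines" are stand-in functions that return fixed lists, and the merge rule (summing 1/rank across engines) is just one simple choice, not how any particular metasearch site works.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical back ends: each takes a query and returns URLs, best first.
def engine_a(query):
    return ["http://example.com/a", "http://example.com/b"]

def engine_b(query):
    return ["http://example.com/b", "http://example.com/c"]

def metasearch(query, engines):
    """Query all engines concurrently and merge their ranked results.

    Each URL scores 1/rank per engine that returned it; scores are
    summed, so URLs that several engines rank highly come first.
    """
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

if __name__ == "__main__":
    print(metasearch("printer driver", [engine_a, engine_b]))
```

In practice each stand-in function would be replaced by a call to a real engine's results, with timeouts and error handling around it.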

ac1982

5:44 pm on Dec 12, 2005 (gmt 0)

10+ Year Member



My purpose is to make a search engine that will have both a directory It sounds a little bit inappropriate for me, as I don't have much experience in this stuff, but I will try to do it.