

Custom Search Engine

Need to build a SE for a limited topic index

         

lorax

1:52 pm on May 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello Folks,
One of the problems I have with the major SEs is that the result sets include a lot of junk that's not pertinent to what I searched for. I want to build (beg, borrow or ... code) an SE of my own that searches only the web sites I tell it to. I can parse the pages once the SE returns them to me. What I'd like to know is:

1. Can this be done with PHP and MySQL?
2. How would I pull the pages of the target site into my script for parsing? Is it as simple as fopen or readfile?
3. My plan is to parse the pages, pull the relevant info (title, meta tags) into a db, and then build a keyword list for each page from the keywords the SE finds on it, checked against a master list I tell the engine to look for.
4. Am I nuts, or is this possible without having to know something like Java or VBScript?

Your thoughts/suggestions would be most appreciated.
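
A rough sketch of questions 2 and 3, written in Python rather than PHP purely for illustration. The names `PageInfoParser`, `fetch` and `extract` are made up for this example, and the keyword check is a plain substring match against the title and meta content, which is only one possible reading of the plan above.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageInfoParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = attr.get("name")
            if name and attr.get("content") is not None:
                self.meta[name.lower()] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url):
    """Download a page as text (roughly the fopen/readfile step in question 2)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract(html, master_keywords):
    """Return the title, the meta tags, and the master-list keywords found."""
    parser = PageInfoParser()
    parser.feed(html)
    # Assumption: keywords are matched as case-insensitive substrings of the
    # title and meta content only, not the body text.
    haystack = " ".join([parser.title, *parser.meta.values()]).lower()
    found = [kw for kw in master_keywords if kw.lower() in haystack]
    return {"title": parser.title.strip(), "meta": parser.meta, "keywords": found}
```

Something like `extract(fetch(url), master_list)` would then be called once per page of each target site. A real spider would also need to honour robots.txt and pause between requests, which this sketch leaves out.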

lorax

2:04 pm on May 9, 2002 (gmt 0)




I should clarify that there are actually two parts to this puzzle. One is the spider/bot that gets the pages from the target web sites. The second is the SE on my site which searches the db I build.

It's the spider/bot that I need help with.
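
To make the two-part split concrete, here is a minimal sketch of the db that sits between the spider and the SE. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table names (`pages`, `page_keywords`) and function names are invented for the example rather than taken from any existing package.

```python
import sqlite3


def open_index(path=":memory:"):
    """Create the two tables the spider writes to and the SE reads from."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, title TEXT, description TEXT)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS page_keywords "
        "(url TEXT, keyword TEXT, UNIQUE (url, keyword))"
    )
    return db


def store_page(db, url, title, description, keywords):
    """Spider side: save one parsed page and replace its old keyword rows."""
    db.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
        (url, title, description),
    )
    db.execute("DELETE FROM page_keywords WHERE url = ?", (url,))
    db.executemany(
        "INSERT OR IGNORE INTO page_keywords VALUES (?, ?)",
        [(url, kw.lower()) for kw in keywords],
    )
    db.commit()


def search(db, keyword):
    """SE side: return (url, title) for every page tagged with the keyword."""
    rows = db.execute(
        "SELECT p.url, p.title FROM pages p "
        "JOIN page_keywords k ON k.url = p.url "
        "WHERE k.keyword = ? ORDER BY p.url",
        (keyword.lower(),),
    )
    return rows.fetchall()
```

The spider would call `store_page` for each page it fetches, and the search form on the site would only ever call `search`, so the two halves can be built and tested separately.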

john316

2:25 pm on May 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How many sites do you plan on indexing?

lorax

2:29 pm on May 9, 2002 (gmt 0)




To begin with I have a list of a few hundred but I doubt it'll get to be more than a few thousand.

brotherhood of LAN

3:06 pm on May 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You may want to try the FDSE search engine, which can spider a site's titles, tags and body text, and can be customised.

I was going to sticky it in the interest of neutrality, but you can actually try this beasty good software package for free.

I've used it and recommend it. It is written in Perl, and it can be installed automatically onto your web site from their website if you have problems.

It's an alternative anyway...

lorax

3:20 pm on May 9, 2002 (gmt 0)




Brotherhood_of_LAN,
Please do sticky me the URL. I'm open to saving myself time. I'd like to write the code someday just for the sake of learning (and being able to say "I built that") but right now I just need a good working solution to demonstrate the concept to an important contact.

lorax

3:43 pm on May 9, 2002 (gmt 0)




Ahh... I just reread your post, Brotherhood_of_LAN, and was able to find FDSE just fine. Thanks for the tip!