I read the short paper on Google's BackRub, and that has helped me get to grips with the various "subcomponents" of a search engine and its cycle. The doc is here [www-db.stanford.edu], though I'm sure most of you read it many moons ago.
For this exercise I'm not after something tiny, but nothing so big that it covers every topic on the web either (maybe one large, generic category like webmastering).
So I just want to toy with the idea of making one, hopefully using PHP as my entrance into the pearly gates of SE engineering :)
It's a big question, but first and foremost, what variables should I consider? I'm thinking of ones that don't depend too heavily on spidering, so I guess I'm thinking more along the lines of a themed directory for now, with a crawler to be added later.
What variables would you add to your ideal search engine, preferably with less emphasis on "on the page" stuff and more emphasis on an ODP-type listing, plus some variables that will hopefully bring more "precision" to the results thrown out?
CTR is obviously something that could be used, perhaps with some sort of IP filter for bogus/odd/spider clicks.
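A rough sketch of the CTR-with-IP-filter idea, written in Python just to keep it short (it should port to PHP easily). Everything here is an assumption for illustration: the click log is taken to be a list of (listing_id, ip) pairs, `impressions` is how often each listing was shown, and `blocked_ips` is a hand-maintained list of known spider or bogus addresses. Repeat clicks from the same IP on the same listing count once.

```python
from collections import defaultdict


def filtered_ctr(clicks, impressions, blocked_ips=frozenset()):
    """Return {listing_id: ctr}, counting at most one click per IP per listing."""
    unique_clickers = defaultdict(set)
    for listing_id, ip in clicks:
        if ip in blocked_ips:
            continue  # skip known spiders or otherwise suspicious addresses
        unique_clickers[listing_id].add(ip)  # a set ignores repeat clicks

    ctr = {}
    for listing_id, shown in impressions.items():
        if shown > 0:  # avoid dividing by zero for listings never displayed
            ctr[listing_id] = len(unique_clickers[listing_id]) / shown
    return ctr


# Hypothetical sample data
clicks = [
    ("a", "10.0.0.1"),
    ("a", "10.0.0.1"),  # repeat click from the same IP, counted once
    ("a", "10.0.0.2"),
    ("b", "10.0.0.9"),  # blocked address, ignored
]
impressions = {"a": 10, "b": 5}
result = filtered_ctr(clicks, impressions, blocked_ips={"10.0.0.9"})
print(result)
```

One click per IP is a blunt filter (people behind a shared proxy get merged into one), so treat it as a starting point rather than a finished fraud check.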
I'm thinking some sort of PR feed would be nice too... :)
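If no outside PR feed is available, one alternative (my assumption, not something the feed itself would give you) is to compute a PageRank-style score locally over the links between the sites in the directory. The sketch below is a plain power iteration in Python using the normalised variant where scores sum to 1; the `links` structure, the damping value of 0.85 and the fixed iteration count are all illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: score}; scores sum to 1."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)  # include pages that are only ever linked to
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page starts each round with the "random jump" share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # dangling page (no outlinks): spread its rank over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page link graph
graph = {
    "home": ["about", "links"],
    "about": ["home"],
    "links": ["home", "about"],
}
scores = pagerank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

On a small themed directory the link graph will be sparse, so a score like this is probably better as one ranking variable among several than as the main sort key.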
But I'm just looking for a general, better idea of how to approach making a small scale engine that will meet future needs and be editable, since any open source effort never "really" does exactly what you want it to do!
Help and pointers mucho appreciated
First pointer: ht://Dig [htdig.org]
Re: reinventing the wheel... that's why I was apprehensive about posting, but then again, I always like bigger wheels :)
It looks like something comparable to htdig would be excellent for the spidering process....
For me, I'm running PHP and MySQL and have downloaded a C compiler. With a bit of fuzzy logic I should be able to produce something credible by the end of it all ;)
Part of this whole exercise for me is just to go through the motions and walk through what SE spidering, indexing and sorting entails.
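To make the indexing and sorting steps concrete, here is a minimal sketch in Python (again just for brevity, not a recommendation over PHP). It assumes the spidering has already happened and the page text is sitting in a dict keyed by URL; the URLs and text are made up. It builds an inverted index (term to pages) and ranks matches by raw term count, which is the crudest scoring possible.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs: {url: text}. Returns an inverted index {term: {url: count}}."""
    index = defaultdict(dict)
    for url, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][url] = count
    return index


def search(index, query):
    """Return URLs containing every query term, highest total count first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    matches = set(postings[0])
    for p in postings[1:]:
        matches &= set(p)  # keep only pages that contain all terms
    scored = [(sum(p[url] for p in postings), url) for url in matches]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by URL
    return [url for _, url in scored]


# Hypothetical "already spidered" pages
docs = {
    "example.com/a": "PHP search engine tutorial: build a search box",
    "example.com/b": "Search engine spidering basics",
    "example.com/c": "Cooking with PHP",
}
index = build_index(docs)
hits = search(index, "search engine")
print(hits)
```

The sort key is the obvious place to mix in other variables later, such as click data or a link score, instead of relying on term counts alone.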
Something I could work with on my IIS/XP sorta effort would be nice :)
Again, I'll try not to reinvent the wheel, but whoever taught it... I hope they have a downloadable file I can read offline while I try to pick up all the pieces and make an SE similar to FDSE, but tweaked in directions that ideally (whatever the ideal is) could be used in future.
It seems that we are thinking of how to do the same thing.
A while ago I made a post here [webmasterworld.com...] and Brett came back to me with a top reply. I gave it a little more thought and have come up with a few ideas. I don't know if I am on the right lines, but I am having a lash.
cheers