Paid Inclusion Engines and Topics Forum

Search vs Indexing
Mastah Blastah

10+ Year Member



 
Msg#: 136 posted 5:36 pm on Sep 29, 2000 (gmt 0)

OK, being a programmer, I believe there are many separate algos at work in any modern search engine.

*Feel free to chime in*

What I imagine is that a simple check is run on pages submitted to the engine. This would be aimed at machine submission tools, to help weed out the mass-submission-every-hour type of situation. Using state information such as cookies and IP tracking, this would help them shuttle a mass of automated requests into the round file (/dev/null).
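A minimal sketch of that kind of pre-filter, assuming the engine keys on the submitter's IP and allows a fixed number of submissions per hour. The limit of 5 per hour and the SubmissionFilter name are hypothetical; a cookie ID could be used as the key in the same way.

```python
import time
from collections import defaultdict, deque

# Hypothetical limits: the post does not say what threshold an engine would use.
MAX_SUBMISSIONS_PER_HOUR = 5
WINDOW_SECONDS = 3600


class SubmissionFilter:
    """Sliding-window rate limiter keyed on the submitter's IP address."""

    def __init__(self, limit=MAX_SUBMISSIONS_PER_HOUR, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of accepted submissions

    def accept(self, ip, now=None):
        """True: pass the submission on to the spider queue.
        False: send it to the round file (/dev/null)."""
        now = time.time() if now is None else now
        stamps = self.history[ip]
        # Forget submissions that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True
```

Rejected submissions are not recorded, so an IP that floods the engine stays blocked only until its accepted submissions age out of the one-hour window.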

Then, if your page gets past this stage, your request would be put into a spidering queue, probably split among spiders in some way. In Alta's case, I'm sure it just goes out, gets the page at that moment and saves it for indexing.
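One way a spidering queue could be split among spiders, assuming the work is divided by hashing each URL's host name so all pages from one site go to the same spider. spider_for and build_queues are hypothetical names; nothing here says how Alta actually divides the work.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit


def spider_for(url, num_spiders):
    """Pick a spider by hashing the URL's host (case-insensitive)."""
    host = urlsplit(url).netloc.lower()
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_spiders


def build_queues(urls, num_spiders):
    """Distribute submitted URLs into one FIFO queue per spider."""
    queues = [deque() for _ in range(num_spiders)]
    for url in urls:
        queues[spider_for(url, num_spiders)].append(url)
    return queues
```

hashlib is used instead of Python's built-in hash() because string hashing is randomized per process, so the same host could land on a different spider after a restart.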

Here is where the next algo comes in. The indexing algo hacks apart your page based on its content and analyses its links, which are put into two lists: future spidering and Internet 'cartography'. The Cartographer also reports back any links pointing to your site. I would imagine that links are kept in a separate DB from page content. Page content is hacked apart and weighed for ranking assessment. In the case of AV, I imagine the spidering of links on your site is pretty quick; this is the only way they could build a theme if you submit only the root URL. (Aside: What if you no followed all your pages with AV? Maybe this is a hot tip, anyone want to try? :) )
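A rough sketch of that indexing step using Python's standard html.parser. Each link goes onto a crawl queue (future spidering) and into a separate link store keyed by the target URL (the 'cartography'), while the page text becomes word counts in a separate content store. The store layout and the use of raw word counts as the "weight" are assumptions for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageIndexer(HTMLParser):
    """Pulls one page apart into outgoing links and a bag of words."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        for word in data.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word:
                self.words[word] += 1


def index_page(url, html, crawl_queue, link_db, content_db):
    parser = PageIndexer(url)
    parser.feed(html)
    parser.close()
    for link in parser.links:
        crawl_queue.append(link)                   # list 1: future spidering
        link_db.setdefault(link, set()).add(url)   # list 2: who links to whom
    content_db[url] = parser.words                 # page content, stored separately
```

Keying link_db on the target page means looking up your own URL returns the pages that link to you. Text inside script and style tags is counted as words here too; a real indexer would skip it.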

So now your site is mapped, sliced and weighed, themed and stored. What's next?

The Search Algo: I'd guess there is another algo that looks for the best matches in the index when a search is done. First it must look at the search to figure out which words are most important; then it has to look at the phrase (if the search is more than one word). It would dip into the indexes based on this interpretation of the search phrase and look for top matches. Now there is a chance here to figure out what is going to weigh the highest in a broader sense. For instance, are links going to be more important than page makeup? What about the page's position in the 'map' of the net?
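A guess at that search step, assuming two hypothetical knobs: rarer query words count for more (an IDF-style weight), and each page's final score blends its content score with a score from its number of inbound links. The 0.7/0.3 split is made up; the point is that each weight can be turned up or down on its own. The phrase check is left out to keep it short. content_db maps URL to word counts and link_db maps URL to the set of pages linking to it.

```python
import math

# Hypothetical weights: how much page content vs. inbound links count.
CONTENT_WEIGHT = 0.7
LINK_WEIGHT = 0.3


def term_importance(terms, content_db):
    """Words that appear on fewer pages get a higher weight."""
    n_docs = len(content_db)
    weights = {}
    for t in terms:
        df = sum(1 for words in content_db.values() if t in words)
        weights[t] = math.log((n_docs + 1) / (df + 1)) + 1.0
    return weights


def search(query, content_db, link_db):
    terms = query.lower().split()
    weights = term_importance(terms, content_db)
    scores = {}
    for url, words in content_db.items():
        content_score = sum(weights[t] * words.get(t, 0) for t in terms)
        if content_score == 0:
            continue  # page contains none of the query words
        link_score = math.log1p(len(link_db.get(url, ())))
        scores[url] = CONTENT_WEIGHT * content_score + LINK_WEIGHT * link_score
    return sorted(scores, key=scores.get, reverse=True)
```

With equal content scores, the page with more inbound links wins; raising LINK_WEIGHT makes links matter more than page makeup.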

The search algo would probably pull the matches and display one page of listings. It would then cache the rest of the listings based on how many pages you could navigate to directly (the up to 10 pages on the bottom like the ol' goooooogle-> thing).

In summary, I believe the DBs are pretty separate, that weighting can be modified on any given DB, and that the search itself is subject to weighting.

\/ 3
PS: File this under the 'know thy enemy' category.

 

nowhere

10+ Year Member



 
Msg#: 136 posted 6:55 pm on Sep 29, 2000 (gmt 0)

>(the up to 10 pages on the bottom like the ol' goooooogle-> thing)

Hi

I was just wondering what you mean by that.

Thank You

grnidone



 
Msg#: 136 posted 5:15 pm on Oct 2, 2000 (gmt 0)

Blast-

>(Aside: What if you no followed all your pages with AV? Maybe this is a hot tip, anyone want to try:) )

I followed you up to the Aside. I don't understand what you mean by the "What if"...

Everything before that sounded very plausible, though. I would like to understand the rest of your theory.

Thanks,
-G

tedster

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 136 posted 8:31 pm on Oct 2, 2000 (gmt 0)

I agree with MastahBlastah: it's far from one big database table with index keys -- even though we tend to talk about it that way for the sake of simplicity. Some such scheme would be required to make indexing a fast enough process, and to ensure that one hiccup wouldn't throw the whole index build back to square one.

I assume that any substantial SE must have a parallel scheme just to deal with the sheer volume of information on the web. But Alta's is particularly sharp.
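A small sketch of the split-and-parallelize idea, assuming the pages are divided into independent shards, each built into its own inverted index (word to set of URLs), with a failed shard retried on its own instead of restarting the whole build. Threads keep the example short; an engine at AV's scale would presumably spread shards across machines. The shard count and retry limit are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_RETRIES = 2  # hypothetical


def build_shard(pages):
    """Build an inverted index (word -> set of URLs) for one slice of pages."""
    index = {}
    for url, text in pages:
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def build_with_retry(pages):
    # A hiccup only re-runs this shard, not the whole index build.
    for attempt in range(MAX_RETRIES + 1):
        try:
            return build_shard(pages)
        except Exception:
            if attempt == MAX_RETRIES:
                raise


def build_index(pages, num_shards=4):
    shards = [pages[i::num_shards] for i in range(num_shards)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        futures = [pool.submit(build_with_retry, s) for s in shards]
        return [f.result() for f in futures]
```

A query would then be sent to every shard and the matching URL sets merged.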

Alta also offers a specialized media search: images, audio and video -- and I believe they do some kind of dedicated spidering for each of these categories. For instance, you can filter an image search by "color or b&w" and by "buttons, photos, or graphics". I'd think they would have a pretty sophisticated image-reading spider to automate that database.
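As a guess at how the "color or b&w" filter could be automated, one simple heuristic is to call an image black-and-white when every pixel's red, green and blue values are nearly equal. This assumes the image has already been decoded into (r, g, b) tuples; the tolerance of 8 is a hypothetical value, and there is no indication this is what AV actually does.

```python
def is_grayscale(pixels, tolerance=8):
    """True if every (r, g, b) pixel (values 0..255) has its three channels
    within `tolerance` of each other. Decoding the image file is out of scope."""
    return all(max(p) - min(p) <= tolerance for p in pixels)


def color_filter_label(pixels):
    """Label an image for a hypothetical 'color or b&w' search filter."""
    return "b&w" if is_grayscale(pixels) else "color"
```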

I've never read anyone's ideas about the "ins and outs" of AV media search -- but I'd be all ears, especially since it looks like I'll be taking on a music client soon with all kinds of mixed media on the site.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved