
Want your own ML based app/site search?

SPTAG (Space Partition Tree And Graph) released

     
5:49 pm on May 16, 2019 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1268
votes: 393


Want to play with your very own app/site search?
Want to step into practical machine learning?

Microsoft has released Bing's SPTAG (Space Partition Tree And Graph) [github.com] under MIT licence.

A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario.

...

This library assumes that the samples are represented as vectors and that the vectors can be compared by L2 distances or cosine distances. Vectors returned for a query vector are the vectors that have smallest L2 distance or cosine distances with the query vector.

SPTAG provides two methods: kd-tree and relative neighborhood graph (SPTAG-KDT) and balanced k-means tree and relative neighborhood graph (SPTAG-BKT). SPTAG-KDT is advantageous in index building cost, and SPTAG-BKT is advantageous in search accuracy in very high-dimensional data.

Note: written in C++ with Python wrapper.
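
For anyone wondering what "smallest L2 distance or cosine distance" means in practice, here is a minimal brute-force sketch in plain Python. It is not SPTAG's API or its algorithm (SPTAG exists precisely to avoid this exhaustive scan); the function names and sample vectors are made up for illustration.

```python
import math

def l2_distance(a, b):
    # Euclidean (L2) distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, vectors, k=1, distance=l2_distance):
    # Exhaustive search: score every stored vector, return the indices
    # of the k closest ones. An ANN library approximates this result
    # without scanning everything.
    ranked = sorted(range(len(vectors)),
                    key=lambda i: distance(query, vectors[i]))
    return ranked[:k]

# Hypothetical 2-dimensional sample vectors, purely for illustration.
samples = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.9, 0.1]]
query = [1.0, 0.2]

print(nearest(query, samples, k=2))
print(nearest(query, samples, k=3, distance=cosine_distance))
```

Note that the two distances can rank the same data differently: L2 cares about magnitude, cosine only about direction, so the far-away [5.0, 5.0] places better under cosine than under L2.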
9:14 pm on May 16, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2548
votes: 717


What makes this better than the many other packages out there, most notably scikit-learn in Python?
[scikit-learn.org...]
3:19 am on May 17, 2019 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1268
votes: 393


@NickMNS: haven't a clue :)
I haven't used either the SPTAG mentioned in the OP or the scikit-learn you thought comparable. I just ran across it and thought it interesting. Had this come out a few years back, before I went to the time and trouble of reinventing the wheel (aka building my own ML-backed site search), I'd probably be neck deep in trialing it, but as is, not.

It's just nice that this sort of stuff is increasingly shared; it makes it so much easier to step in at the shallow end, as opposed to years ago when only the rather chilly deeps were available. Machine learning is not going away; to stay current, some exposure is becoming a requirement outside hobby sites.
5:21 pm on May 17, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2548
votes: 717


Ah I see now why the sudden interest.
[seroundtable.com...]

I find this very disingenuous on the part of Microsoft, as they are hyping this as a simple black-box solution that would allow someone to create a search engine able to find the height of the Eiffel Tower from a search for "how high is the tower in Paris". What they are providing is the algorithm needed to create such a search engine. Yes, there is value in that, but it is not sufficient to build anything nearly as sophisticated as they claim. The real "work" in this is determining the features on which to vectorize the terms. How would you vectorize "Eiffel Tower", "Paris", "France" and any and all related terms?

(Side note, maybe the searcher wanted the height of the "Tour de Montparnasse" and not "Tour Eiffel")

All this being said, the point of interest is that they appear to have developed a nearest-neighbor-like algorithm that is far more computationally efficient than the more common models, like the ones I referenced in scikit-learn.
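
To make the vectorization problem concrete, here is a rough sketch using the most naive features possible: raw word counts (bag of words) compared by cosine similarity. The two "documents" are invented for illustration, and none of this reflects what Bing or SPTAG actually does; it only shows why the choice of features is the hard part.

```python
import math
from collections import Counter

def bag_of_words(text):
    # Naive vectorization: one dimension per word, value = word count.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = bag_of_words("how high is the tower in paris")

# Hypothetical documents: one relevant, one that merely shares words.
relevant = bag_of_words("height of the eiffel tower")
irrelevant = bag_of_words("how high is the rent in paris")

print(cosine_similarity(query, relevant))
print(cosine_similarity(query, irrelevant))
```

With these features the rent document scores far closer to the query than the Eiffel Tower one, because "high" and "height", or "Paris" and "Eiffel", share no dimension. A fast nearest-neighbor search only helps once the vectors themselves capture that kind of relatedness.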
 
