How can I get search results from various search engines?
I'm thinking along the lines of using something like the Google API, but it is not for commercial sites and the number of searches is limited.
Does anybody know how I can get search results (via some sort of API) so that I can incorporate them in my site?
Take shak's advice. Unless you're seriously planning to compete in the SE space, writing a truly parallel meta-search from the ground up is not for the faint of heart. However, if you know what a non-blocking socket is, then perhaps it's for you after all.
If so, that sounds illegal, since I'm stealing somebody else's search results.
It's not illegal per se, but it's against their T&Cs, and from their point of view denying you access to their site would be a perfectly legitimate response.
combines the results and sorts using its own algo.
I have always been curious how that can be done. I mean, search engines use their own sorting based on parameters that they never expose to you, and then you take the sorted outputs from X search engines and try to sort them on your own, while lacking the myriad important things that the search engines took into account but you did not?
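For what it's worth, one way to merge lists when all you have is each engine's ordering is to score by position only. The sketch below uses reciprocal rank fusion, which is a generic technique and not necessarily what any particular meta-search site does. The URLs are made up, and `k=60` is just a commonly used default, not something from this thread.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of URLs into one list.

    Each engine contributes 1 / (k + rank) for every URL it returns,
    so only positions are used, never the engines' hidden scores.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from three engines (made-up URLs).
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "a.example", "d.example"]
engine_c = ["b.example", "c.example", "a.example"]

merged = reciprocal_rank_fusion([engine_a, engine_b, engine_c])
print(merged)
```

A URL that several engines rank highly floats to the top, while one that only a single engine returns sinks. It cannot recover the signals the engines used internally; it only aggregates their agreement.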
The meta search I run is a job search engine. We meta search guys like monster.com, Yahoo's HotJobs etc. The ranking algos they (the job SEs) use are simply keyword based. One problem we've had is that we don't have access to all the keywords for every job. We just get the title and summary info. We also don't have access to the location info (zip code for example) so we can't do fancy things like showing all jobs within a radius of a zip code. It's been very frustrating *sigh*.
But all that has been solved by rearchitecting our SE as a crawler rather than a meta search and indexing the full data for every job. The downside is we now have a 6-hour delay before a newly posted job appears on our SE. We've also developed a way to figure out the longitude and latitude of a job, which gives us the ability to do radius searches and distance filters.
We'll be launching later this month. msg me if you want the URL.
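To illustrate the radius search mentioned above: once each job has a latitude and longitude, a distance filter can be done with the haversine great-circle formula. This is only a sketch under assumptions. The post doesn't say how the coordinates are obtained or how the filter is implemented, and the job records, field names, and `jobs_within` helper here are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def jobs_within(jobs, center, radius_miles):
    """Return the jobs whose coordinates lie within radius_miles of center."""
    lat0, lon0 = center
    return [job for job in jobs
            if haversine_miles(lat0, lon0, job["lat"], job["lon"]) <= radius_miles]

# Hypothetical job records; coordinates are approximate city centers.
jobs = [
    {"title": "Java developer", "lat": 40.7128, "lon": -74.0060},    # New York
    {"title": "Python developer", "lat": 40.7357, "lon": -74.1724},  # Newark
    {"title": "DBA", "lat": 34.0522, "lon": -118.2437},              # Los Angeles
]

nearby = jobs_within(jobs, center=(40.7128, -74.0060), radius_miles=25)
print([job["title"] for job in nearby])
```

A linear scan like this is fine for a small list; a real index with many jobs would presumably pre-filter by bounding box before computing exact distances.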
But all that has been solved by rearchitecting our SE as a crawler rather than a meta search and indexing the full data for every job.
Ah, let me just get this right: effectively you use other search engines to narrow the choice of pages that you crawl yourself, and then rank using your own algorithms?
One in Java and the other in Python. The Java one is threaded and can be run on multiple servers. The Python one is simply a foundational class taken from the Java application as a proof of concept.
Anyway, you'll need the following:
a regex class
a class for each engine
a class to gather the results
a class to perform various algorithms on the results gathered
a class to display the results to the user
a lot of JSP and servlets to handle the above
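As a rough illustration of how those pieces could fit together, here is a minimal Python sketch. It is not the poster's code: the class names, the regex patterns, and the sample HTML are all made up, and it parses canned strings rather than fetching from any real engine.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    engine: str
    rank: int
    url: str
    title: str

class Engine:
    """One instance per engine: holds the regex that matches that engine's hits."""
    def __init__(self, name, pattern):
        self.name = name
        self.pattern = re.compile(pattern, re.IGNORECASE | re.DOTALL)

    def parse(self, html):
        # Rank is the 1-based position of the hit on that engine's page.
        return [Result(self.name, rank, m.group("url"), m.group("title").strip())
                for rank, m in enumerate(self.pattern.finditer(html), start=1)]

class Gatherer:
    """Collects the parsed results from every engine."""
    def __init__(self):
        self.results = []

    def add(self, results):
        self.results.extend(results)

class Ranker:
    """One example algorithm: keep each URL once, at its best rank."""
    @staticmethod
    def dedupe_by_best_rank(results):
        best = {}
        for r in results:
            if r.url not in best or r.rank < best[r.url].rank:
                best[r.url] = r
        return sorted(best.values(), key=lambda r: (r.rank, r.url))

class TextView:
    """Displays results as plain text (stands in for the JSP/servlet layer)."""
    @staticmethod
    def render(results):
        return "\n".join(f"{i}. {r.title} <{r.url}> (via {r.engine})"
                         for i, r in enumerate(results, start=1))

# Made-up result pages in two different (hypothetical) markup styles.
page_a = ('<a class="hit" href="http://one.example/">One</a>'
          '<a class="hit" href="http://two.example/">Two</a>')
page_b = ('<li><a href="http://two.example/">Two again</a></li>'
          '<li><a href="http://three.example/">Three</a></li>')

engine_a = Engine("EngineA", r'<a class="hit" href="(?P<url>[^"]+)">(?P<title>.*?)</a>')
engine_b = Engine("EngineB", r'<li><a href="(?P<url>[^"]+)">(?P<title>.*?)</a></li>')

gatherer = Gatherer()
gatherer.add(engine_a.parse(page_a))
gatherer.add(engine_b.parse(page_b))
ranked = Ranker.dedupe_by_best_rank(gatherer.results)
output = TextView.render(ranked)
print(output)
```

Regex scraping like this breaks whenever an engine changes its markup, which is part of why each engine gets its own class.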
You'll need to carefully consider timeouts on the various engines.
You'll need to consider how long your user must wait, and what they can do to adjust the wait.
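A small sketch of the timeout idea, in Python rather than Java: query every engine in parallel, give the whole search one deadline, and return whatever came back in time. The engines here are fake functions that just sleep, and the names `make_fake_engine` and `metasearch` are hypothetical. The deadline is a parameter, so it could be exposed to the user as an adjustable setting.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def make_fake_engine(name, delay, hits):
    """Build a stand-in for a real engine query; it sleeps instead of doing I/O."""
    def search(query):
        time.sleep(delay)
        return [f"{name}: {hit}" for hit in hits]
    return search

def metasearch(engines, query, deadline_seconds):
    """Query all engines in parallel and keep only answers that beat the deadline.

    Returns (results, timed_out) where timed_out lists the engines that were too slow.
    """
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = {pool.submit(fn, query): name for name, fn in engines.items()}
    done, not_done = wait(list(futures), timeout=deadline_seconds)
    # wait=False returns immediately; a slow worker thread keeps running
    # in the background until its own call finishes.
    pool.shutdown(wait=False)
    results = []
    for future in done:
        if future.exception() is None:  # skip engines that raised an error
            results.extend(future.result())
    timed_out = sorted(futures[f] for f in not_done)
    return sorted(results), timed_out

engines = {
    "fast": make_fake_engine("fast", 0.01, ["r1", "r2"]),
    "slow": make_fake_engine("slow", 1.0, ["r3"]),
}
results, timed_out = metasearch(engines, "python jobs", deadline_seconds=0.3)
print(results, timed_out)
```

With real HTTP requests you would also want a per-request socket timeout (for example the `timeout` argument of `urllib.request.urlopen`), so that abandoned requests do not hang around indefinitely.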
Finally, you'll need to consider if you even want to do it. I'd like to go into detail about my background but I can't. Bottom line, virtually every search engine TOS says you can NOT metasearch their results. You will be BLOCKED. Google in particular will NOT allow you to display their results ANYWHERE, unless you pay... and the fee is quite high. Bottom line, metasearch as we once knew it is probably dead, if done 100% by the TOS. Most metasearch sites are not following the TOS of the various search engines, or else they are paying big bucks for permission.