Running a search Proxy?


Everyman

1:39 pm on Aug 5, 2002 (gmt 0)



I like the /ie interface. I'm thinking that I can create a very simple Google proxy for my site. It wouldn't get more than 50 users a day, I predict, but it would draw attention to the privacy issues I'm trying to highlight. I doubt that Google would block me, because that would only give my issues more attention. I'm nonprofit, zero budget, and just want to raise some issues.

With the /ie interface, you can still add num=100 in the search command line and get 100 results instead of 10. That way you could even strip out the "Next" link. The snippets are there for your use, if you want to parse them out for non-IE users and highlight the keywords. I would strip out the Google logo, as well as their plug for their toolbar.

(This is not much different than what Google does when they cache our pages. They slap their own brand on their copy of our stuff, alter the coding, and recommend a bookmark that keeps folks away from us forever.)

An extremely simple interface would be to shell to a "lynx -source" dump to a file, and then read this file, parse out what you need, and spit it back out to the searcher. If you fork from your Linux CGI program to use lynx, you may need to escape the '&' in the long URL that you pass to lynx. Of course, you can just open a socket and parse on the fly, assuming that you are eager to figure out all that handshaking that even a browser like lynx probably does to make Google happy. (I tried a wget dump but got a Forbidden from Google.)

Also, delete the .lynx_cookies file first, if lynx is set up to accept cookies, so that Google has to issue a new ID for every search.
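In modern Python (which postdates this thread), the two steps above might be sketched like this; lynx and the ~/.lynx_cookies path are assumed to be present, and passing the URL as a separate argv element means no shell is involved, so the '&' characters need no escaping at all:

```python
import os
import subprocess
import urllib.parse

GOOGLE_IE = "http://www.google.com/ie"
COOKIE_FILE = os.path.expanduser("~/.lynx_cookies")

def build_url(terms):
    # urlencode escapes the search terms; the '&' separators are added for us
    return GOOGLE_IE + "?" + urllib.parse.urlencode(
        {"q": terms, "num": "100", "hl": "en"})

def fetch_via_lynx(terms):
    # Drop lynx's cookie jar first so Google has to issue a fresh ID each time
    if os.path.exists(COOKIE_FILE):
        os.remove(COOKIE_FILE)
    # argv-style invocation (no shell), so the URL's '&'s need no escaping
    out = subprocess.run(["lynx", "-source", build_url(terms)],
                         capture_output=True, text=True, check=True)
    return out.stdout
```

Only build_url is exercised below; the fetch itself obviously depends on lynx being installed.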

No advertising! No cookie for the user! No cache links! -- I like that, because I'm opposed to Google's cache, although a lot of searchers wouldn't like it. I would use POST instead of GET for the user's search terms, so that the terms don't end up in my httpd log, and I can brag that we do no logging of search terms.

A Google adbuster and anonymizer! It's just a gimmick, really, because if it started becoming popular, I'd have to take it down. Even if Google didn't take me down first, who'd want to waste much bandwidth on such a gimmick? Calling a shell or forking doesn't come cheap either.

I wonder what Google uses this /ie interface for, and whether they're likely to keep it going. It seems to me that raw results like that, without ads and without all that complex XML and SOAP machinery, are just too tempting a target for Google to keep offering, now that they're getting so heavy into piling new stuff onto their SERPs.

chiyo

1:43 pm on Aug 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think the /ie interface is for one of the IE 5.5 "features" - the search sidebar or something like that.

ciml

3:01 pm on Aug 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I really would not advise that, Everyman. As Chiyo says, it's for IE's search pane.

Everyman

4:46 pm on Aug 5, 2002 (gmt 0)



As Chiyo says, it's for IE's search pane.

Well then, shame on DMOZ [dmoz.org] for listing this interface as "Google Results - Bare-bones interface which returns only page titles." There are a number of Google software spinoffs listed on that DMOZ page. I agree that if it were a serious alternative to Google's interface and could handle serious traffic, Google would take action. But if it's done to illustrate a point and has fewer than 50 users a day, and is done by a recognized nonprofit with a record of pursuing privacy issues, I very much doubt that Google would retaliate.

I think I'd have to limit searches from any single IP number to ten per hour. Wouldn't want any SEOs using it for their own purposes!
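The ten-per-hour throttle could be sketched like this in Python (a hypothetical in-memory version; a real CGI setup gets a fresh process per request, so the timestamps would have to be persisted to a file, but the logic is the same):

```python
import time
from collections import defaultdict

LIMIT = 10         # searches allowed per IP
WINDOW = 3600.0    # per one-hour window, in seconds

_hits = defaultdict(list)   # ip -> timestamps of recent searches

def allow(ip, now=None):
    """Return True if this IP may search now, recording the hit if so."""
    now = time.time() if now is None else now
    # Keep only the timestamps that still fall inside the window
    recent = [t for t in _hits[ip] if now - t < WINDOW]
    if len(recent) >= LIMIT:
        _hits[ip] = recent
        return False
    recent.append(now)
    _hits[ip] = recent
    return True
```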

Everyman

6:32 pm on Aug 5, 2002 (gmt 0)



Wow, this is easier than I thought it would be. No lynx required.

Open a socket to www.google.com and make this request:

GET /ie?q=viagra&hl=en&num=100&lr=&ie=ISO-8859-1 HTTP/1.0\n\n

The first 100 hits for Viagra come back in under a second. Zero handshaking required. The headers arrive on top, with the cookie, but ignore them.
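A minimal Python sketch of that socket approach might look like the following. Note this uses the strict CRLF ("\r\n") line endings that HTTP/1.0 formally requires - the bare "\n\n" above apparently worked too, but CRLF is the safe choice - and it adds a Host header, which HTTP/1.0 doesn't strictly demand:

```python
import socket

HOST = "www.google.com"

def build_request(path):
    # A bare HTTP/1.0 request: request line, Host header, blank line
    return ("GET %s HTTP/1.0\r\n"
            "Host: %s\r\n"
            "\r\n" % (path, HOST)).encode("iso-8859-1")

def fetch(path):
    """Send the request and read the raw response, headers and all."""
    with socket.create_connection((HOST, 80)) as s:
        s.sendall(build_request(path))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Only build_request is exercised below; fetch opens a live connection, so treat it as a sketch.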

The 100 hits are nicely set off on their own lines, but if you read line by line, use a fairly large buffer (maybe 4K) in any language that's subject to buffer overflows, in case some fancy tables arrive without line terminations some day.

The hits would not be difficult to parse.
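Something along these lines would do it in Python - though the exact /ie markup isn't documented here, so the assumption that each hit is an ordinary <a href=...>title</a> pair is a guess:

```python
import re

# Hypothetical hit format: a plain anchor tag, title possibly containing markup
HIT = re.compile(r'<a href="?([^">]+)"?>(.*?)</a>', re.IGNORECASE)

def split_body(raw):
    """Drop the HTTP headers: everything up to the first blank line."""
    head, sep, body = raw.partition("\r\n\r\n")
    if not sep:  # tolerate bare-LF responses
        head, sep, body = raw.partition("\n\n")
    return body

def parse_hits(body):
    """Return (url, title) pairs, with any markup stripped from the titles."""
    return [(url, re.sub(r"<[^>]+>", "", title))
            for url, title in HIT.findall(body)]
```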

Wget must have gotten the "Forbidden" because it announces itself with its own User-agent header. I can't think of any other reason.

GoogleGuy

5:58 pm on Aug 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Everyman, Topclick was a partner of ours. They did a Google search without a cookie, without banner ads, etc. Topclick had a strong privacy emphasis as well. They went bankrupt, though; maybe not enough people are willing to pay for privacy? That's a shame, in the same way I lament that SafeWeb and ZeroKnowledge backed away from their privacy-protecting systems. That's why I'm glad that you can get privacy-protected searching from Google just by disabling cookies. Google doesn't allow third-party banners, images, or web bugs that would let other companies track our users' searches.

I guess the main difference between Topclick and your service would be that Topclick was willing to pay for the searches that they did? ;)

NFFC

7:22 pm on Aug 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



roflmao, great thread, nice to see you "two" getting along so well :)

>maybe not enough people are willing to pay for privacy?

There is a market there, no question. Sure, people will pay, but it has to be a small charge. If there is one company that could develop a profitable revenue stream here, then imho it is Google. It would be a good trial for a subscription-based, ad-free Google. Why not throw a week's programming at it and see if it flies?
Got to be better than Google Answers, surely?

Everyman

8:44 pm on Aug 6, 2002 (gmt 0)



I guess the main difference between Topclick and your service would be that Topclick was willing to pay for the searches that they did?

Another difference is that we're not trying to be a serious alternative to web searching, and we're not expecting people to pay for anything. We're just trying to make a modest point with a very modest gadget.

You can legally deduct our proxy searches from Google's taxes. You can also opt out by placing a META YOU-HAVE-ZERO-PRIVACY-ANYWAY in the header of all SERPs. (Credit is due here to Scott McNealy, and to your META NOARCHIVE opt-out model.)

I'll be finishing up our interface within a couple of weeks, and then I plan to install this proxy on our site behind an obscure inside link. I'm limiting it to 10 searches per IP number per hour. If I get more than 50 different IPs per day, I'll put additional limits on it. I think your bandwidth can handle this. Our new site gets very little traffic; it's mainly intended as a point of reference for any journalists or bloggers who might be interested in the issues raised.

I don't buy the cookie disabling argument. It's too much trouble to click seven times in Explorer to disable cookies for a Google search, then click seven times to enable them to read the New York Times, then click seven times for another Google search, and then click seven times to log into WebmasterWorld.

But cookies are only half of the problem. The other half is that you log the search terms. Most likely they're logged in two places by Google -- in the httpd log because it's a QUERY_STRING, and in the log that you maintain for the unique cookie ID numbers and IP addresses.

On our proxy, the search terms will be received by POST instead of GET, and will never even touch the surface of our hard disk. Additionally, we have a policy of destroying our httpd logs after they are 60 days old.
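In a modern Python CGI sketch (hypothetical - the proxy's actual language isn't stated here, and the field name "q" is assumed), reading the terms from the POST body instead of the query string looks like this; since the terms travel in the request body rather than the URL, the httpd access log never sees them:

```python
import os
import sys
import urllib.parse

def read_search_terms(stdin=sys.stdin, environ=os.environ):
    """Read the POSTed form body and pull out the search terms.

    With POST, the terms are in the request body, not the URL, so they
    never appear in the httpd access log the way a QUERY_STRING would.
    """
    length = int(environ.get("CONTENT_LENGTH", 0) or 0)
    body = stdin.read(length)
    form = urllib.parse.parse_qs(body)
    return form.get("q", [""])[0]
```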

By the way, I'm rather fond of your /ie interface. You accidentally did the right thing on this one. You should consider it a service you're providing for other future proxies out there, rather than a threat to your ad revenue stream. The point at which you perceive it as a threat is the point at which you've turned the corner from being primarily a search engine to being primarily an ad agency that lifts nonprofit content for backfill to support your revenue stream. I don't believe you've reached that point yet, but time will tell.