The question is, how far to go to disguise the tool? Would a combination of a spoofed user agent and simulated real-user timings be enough, or would spoofing the referrer string be a good idea as well? Anything else?
I never could get the LWP cookies figured out last year, so I went all socket-based. The only trouble I ever had there was with HotBot's MS boxes. I finally figured out that if I sent an "Accept: */*" line as part of the request header, it would work fine (just something about the way I'd set up the code).
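For anyone following along, here is a rough sketch of what that socket-based request with the Accept line might look like. It is in Python rather than Perl, and it is not the code described above: the function names, the user agent string, and the use of HTTP/1.0 are all assumptions made for illustration.

```python
import socket


def build_request(host, path, user_agent="ExampleChecker/0.1"):
    """Return the raw bytes of a minimal HTTP/1.0 GET request.

    The user agent default is a placeholder, not a recommendation.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",  # the header line that reportedly fixed the problem
        "",             # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path, port=80, timeout=10.0):
    """Send the request over a plain socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling build_request("example.com", "/") on its own lets you inspect the exact bytes before anything goes over the wire.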
If you do write an automated tool, do try to be respectful in the number of searches you run per day. As someone who runs many sites that are targets for spiders, I can sympathize with the search engines on the issue. I don't know what the right figure is, but try to keep it under 1 request a minute. Slow and steady in off hours works best. Let it run while you sleep. I have a personal limit of 500 a day max per engine. Most days I don't even run it.
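The limits above (no more than one request a minute, 500 a day per engine) could be enforced with something like the sketch below. The Throttle class and its parameter names are made up for illustration, and it assumes one run of the script per day, so the cap is counted per run rather than reset at midnight.

```python
import time


class Throttle:
    """Hypothetical helper: spaces requests out and stops at a per-run cap."""

    def __init__(self, min_interval=60.0, daily_cap=500,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self.daily_cap = daily_cap        # max requests in this run
        self.clock = clock                # injectable for testing
        self.sleep = sleep
        self.count = 0
        self.last = None                  # time of the previous request

    def wait(self):
        """Block until the next request is allowed.

        Returns False, without sleeping, once the cap has been reached.
        """
        if self.count >= self.daily_cap:
            return False
        if self.last is not None:
            remaining = self.min_interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()
        self.count += 1
        return True
```

You would keep one Throttle per engine and call wait() before each query, stopping the loop as soon as it returns False.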
Is this search tool for finding keywords, or for finding how your competition is ranked?
I am not understanding why one would do this.
-jhee
I'd be using it for two things - firstly to check my own rankings for a range of keywords on various engines (I'm totally disorganised about keeping a watch on my rankings, it'd be nice to automate the process), and secondly to gather information about the search results for a large number of randomly chosen and popular keywords. This would provide the raw data for analysis to try and figure out an engine's algo.
(Plus, of course, I'd have fun doing the scripting ;)