insight - 3:31 pm on Mar 9, 2005 (gmt 0)
And no, 10,000 results (10 results x 1000 queries) is not enough. It's not even in the right order of magnitude. There are lots of ways people search: "widget wx", "widget wx100", "widget w-100"... We want to be doing well for all of them. Just as GoogleGuy suggested, we mine our server logs to keep finding the new ways that customers find things. It's dead easy to use server logs and ranking results to figure out the clickthrough rate for a given position, and that told us it's not important to look deep into the results for a query; it's important to have something for all the variations.
So, yeah, we do a fair bit of scraping. We do it as politely as we can: big delays between searches, more than one originating IP, we spread hits across the datacenters (which has also told us a fair bit about how updates happen), etc. That politeness has apparently mostly kept us under the captcha limits.
I could invest time into looking less bot-like (more IPs, more browser-like headers, occasional clickthroughs, etc.), but I'd really rather just pay for access to the API. Let me say that again: I'd rather pay for access to results than scrape them. We could have an arms race between bot detection and human emulation, but I'd rather just be a customer. As a businessman and generally nice human being, I don't want to play cat and mouse; I want to have reliable transactions, and I don't understand why Google doesn't want that, too.
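For anyone wondering what the logs-plus-rankings calculation mentioned above might look like, here is a minimal sketch in Python. It is not our actual reporting code. It assumes you already have a dict of your own ranking position per query and a list of referrer URLs pulled from the server log, and that the engine puts the search phrase in a "q" parameter. The function names are made up for illustration. Since the logs only show clicks and not how often each phrase was searched, it reports clicks per tracked query at each position, which is a proxy for clickthrough rate rather than a true rate.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def query_from_referrer(referrer):
    """Return the normalised search phrase from a referrer URL, or None."""
    params = parse_qs(urlparse(referrer).query)
    values = params.get("q")  # assumption: the phrase is in the "q" parameter
    if not values:
        return None
    # Lowercase and collapse whitespace so variants of one phrase match.
    return " ".join(values[0].lower().split())


def clicks_per_position(rankings, referrers):
    """Average clicks per tracked query, grouped by ranking position.

    rankings:  dict mapping normalised query -> our position for it
    referrers: iterable of referrer URLs taken from the server log
    """
    clicks = defaultdict(int)
    queries_at = defaultdict(set)
    for query, position in rankings.items():
        queries_at[position].add(query)
    for referrer in referrers:
        query = query_from_referrer(referrer)
        position = rankings.get(query)
        if position is not None:  # skip phrases we have no ranking for
            clicks[position] += 1
    return {
        position: clicks[position] / len(queries)
        for position, queries in sorted(queries_at.items())
    }
```

Referrers whose phrase is not in the rankings dict are ignored here, but in practice those are exactly the new variations worth adding to the list of tracked queries.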
I've counted Google as a great tool since I read this cool research paper in the summer of 1998, and I don't see it getting less interesting anytime soon. As of this morning, Google accounted for 63.77% of leads to our site. And we shouldn't pay close attention? Like Receptional, I could tell you the conversion rate from those hits to sales. Or for any other engine, or partner, or a shopping site. It's not a lot of data if you really know what you're doing with SQL; I have a pile of interesting reports ready for me in the mornings.
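To give a flavour of the kind of report I mean, here is a rough sketch using Python's built-in sqlite3 module. The `visits` table, its two columns, and the function name are hypothetical, invented for this example; a real setup would have its own schema and probably a different database. It computes each source's share of leads and its conversion rate in one query.

```python
import sqlite3

# Assumed (hypothetical) schema:
#   visits(source TEXT, converted INTEGER)
# with one row per lead and converted set to 1 if the lead became a sale.
REPORT_SQL = """
SELECT source,
       COUNT(*) AS leads,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM visits) AS share_pct,
       100.0 * SUM(converted) / COUNT(*) AS conversion_pct
FROM visits
GROUP BY source
ORDER BY leads DESC, source
"""


def source_report(conn):
    """Return (source, leads, share_pct, conversion_pct) rows, busiest first."""
    return conn.execute(REPORT_SQL).fetchall()
```

Run from a nightly job against the day's log-derived table, something like this is enough to have the numbers waiting in the morning.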