Forum Moderators: Robert Charlton & goodroi
[google.com...]
... the collective weight of all those queries adds up ...
One might view that as the price to be paid for monetizing our content, on the way to becoming a $50 billion company.
OTOH, I'm sure that those wm queries do put a strain on Google's servers. Personally, I'm very respectful of G's servers. It's a citizen-of-the-'net thing. ;-)
Think it's time to go back into the cave for a while. :)
then how do we get the data? Sites are often dependent on Google rankings for profitability, but should we just cross our fingers and hope we do well, rather than seeing where we do well so we can make informed business decisions?
Relying on Google's results for profitability probably isn't an informed business decision. True, it comes in very handy, but only at the whim of the algorithm, which will change over time whether you like it or not.
But, given that we all take the Google gravy train from time to time, tracking Google traffic (or better, sales that convert from Google searches), rather than Google rankings, is a better plan.
Dixon.
And no, 10,000 results (10 results x 1000 queries) is not enough. It's not even in the right order of magnitude. There are lots of ways people search: "widget wx", "widget wx100", "widget w-100"... We want to be doing well for all of them. Just as GoogleGuy suggested, we mine our server logs to keep finding the new ways that customers find things. It's dead easy to use server logs and ranking results to figure out the clickthrough rate for a given position, and that told us it's not important to look deep into the results for a query; it's important to have something for all the variations.
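The log-mining step described above can be sketched roughly like this: join your rank-check results with click counts pulled from server-log referrers to estimate clickthrough rate per position. All the queries, positions, and counts below are made-up illustrations, not real data.

```python
from collections import defaultdict

# Assumed inputs: our position for each query variation, plus per-query
# clicks and impressions mined from referrer strings in the server logs.
rankings = {"widget wx": 3, "widget wx100": 1, "widget w-100": 7}
clicks = {"widget wx": 40, "widget wx100": 900, "widget w-100": 5}
impressions = {"widget wx": 1000, "widget wx100": 1000, "widget w-100": 1000}

# Group observed CTRs by the position we held for each query.
ctr_by_position = defaultdict(list)
for query, pos in rankings.items():
    ctr_by_position[pos].append(clicks[query] / impressions[query])

for pos in sorted(ctr_by_position):
    rates = ctr_by_position[pos]
    print(f"position {pos}: avg CTR {sum(rates) / len(rates):.1%}")
```

With enough query variations feeding the same table, the position-vs-CTR curve falls out, which is what supports the poster's conclusion that breadth of variations beats depth in the results.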
So, yeah, we do a fair bit of scraping. We do it as politely as we can: big delays between searches, more than one originating IP, hits spread across the datacenters (which has also taught us a fair bit about how updates happen), etc. That politeness has apparently mostly kept us under the captcha limits. I could invest time in looking less bot-like (more IPs, more browser-like headers, occasional clickthroughs, etc.), but I'd really rather just pay for access to the API.
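The "polite" part of that can be sketched as a rate-limited fetch loop: randomized delays between queries and a hard daily cap. The delay values, the cap, and the pluggable `fetch` function are all assumptions for illustration, not the poster's actual setup.

```python
import random
import time

MIN_DELAY, MAX_DELAY = 30.0, 90.0  # seconds to wait between queries (assumed)
DAILY_CAP = 500                    # stop well before any visible limits (assumed)

def polite_fetch(urls, fetch, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests."""
    results = []
    for i, url in enumerate(urls[:DAILY_CAP]):
        if i:  # no need to wait before the very first request
            sleep(random.uniform(MIN_DELAY, MAX_DELAY))
        results.append(fetch(url))
    return results
```

Rotating source IPs and spreading queries across datacenters, as the post describes, would sit around this loop (choosing which endpoint each call hits) rather than inside it.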
Let me say that again: I'd rather pay for access to results than scrape them. We could have an arms race between bot detection and human emulation, but I'd rather just be a customer. As a businessman and generally nice human being I don't want to play cat and mouse, I want to have reliable transactions and I don't understand why Google doesn't want that, too.
I second that. Querying via the API and querying manually yields differing results in almost every case. Usually the API results are off by 1-2 positions (on the negative side).
Is it 1-2 positions off, or a couple of weeks behind? I'm sure it's the latter, i.e., the key you registered for the API is somehow only allowed to query outdated results.
In which case the coder-vs-Google arms race can continue, and I'm okay with that, because Google isn't trying that hard. Or at least it's always seemed easy to me to get the information required.
blaze, the collective weight of all those queries adds up; that's one of the main reasons we ask people not to do it. But the Web API is a great way to do 1000 queries/day of whatever you're interested in for your own personal purposes.
What a shame all those programming resources have to be dedicated to preventing misuse of Google results. Publishers feel the same way about Autolink's misuse of their work.
GoogleGuy replied:
> insight, I'd start with your server logs--that's gold that's usually not mined nearly as much as rank checking. Personally, I don't see a problem with using the Web API to check rankings for your own sites--just don't sell that tool/service. :)
Forgive my ignorance, but what kind of "gold" is GG referring to? Could one determine the link popularity for each of one's pages from the server logs?
Thanks,
Ric
Forgive my ignorance, but what kind of "gold" is GG referring to? Could one determine the link popularity for each of one's pages from the server logs?
He means that there is a lot of information in your server logs that can help you improve your site. You can see where your referrals are coming from, what search terms people are using, what paths through the site they are taking, etc.
PageRank info is not available in the logs, but you can see which keywords people used in a given search engine to find you.
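The keyword "gold" in the logs comes from referrer URLs: a search-engine referrer carries the visitor's query terms right in its query string. A minimal sketch of extracting them (the referrer strings here are made-up examples; real log lines would be parsed the same way):

```python
from urllib.parse import urlparse, parse_qs

def search_terms(referrer):
    """Return the search terms if the referrer is a Google results page."""
    parsed = urlparse(referrer)
    if "google." in parsed.netloc:
        terms = parse_qs(parsed.query).get("q")
        return terms[0] if terms else None
    return None

print(search_terms("http://www.google.com/search?q=widget+wx100&start=10"))
print(search_terms("http://example.com/links.html"))
```

Tallying these per day is exactly the "mine your server logs" exercise GoogleGuy suggests: it shows which query variations actually bring visitors, no rank checking required.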
I, and a slew of others I'm sure, am waiting with bated breath for a response to insight's comments:
I'd rather pay for access to results than scrape them. We could have an arms race between bot detection and human emulation, but I'd rather just be a customer. As a businessman and generally nice human being I don't want to play cat and mouse, I want to have reliable transactions and I don't understand why Google doesn't want that, too.
I’m aware that silence is a response, but I thought I'd prod a bit more.
The lack of response to the suggestion that they charge money for using the service implies as much.
Am I right, GoogleGuy? You don't have to answer if I am right.