Is it possible to scrape Google's results - legitimately?
I'm currently working on (or at least trying to) a system that stores the positions of my sites for certain keywords over a period of time. I've looked at Google's APIs and there are a few ways that work, but all of them have a clause saying you can't do automated searches and that everything has to be done via human interaction.
Is there any legit way to be able to get these results (even if it's paid) and not require human action?
There are several ways I'd be able to work around Google's systems, but I'm looking to do everything above board... I have a feeling it might not be possible though!
Thanks in advance for any info people might have to offer!
If you are asking about having Google's public blessing to scrape search results from google.com, I think you've already got the answer. The last time we had a thread about this was about a year ago - see Crawling Google - How can it be done? [webmasterworld.com]
There's really no way to do it legitimately without Google's permission.
Of course, if you're looking for shady ways, you have to use over 100 proxies, get rid of cookies after every visit, and make sure you don't hit it too often in a short period of time. It's very possible to scrape Google 24/7 if you have lots of proxies at your disposal and you change your keywords often.
Google Webmaster Tools now gives you bulk data on what your site is ranking for. Go to Webmaster Tools -> "your site on the web" -> "search queries" -> "download this table". I get a spreadsheet with thousands of search phrases that includes average rank. You can even use the "Filters..." to restrict it to a specific country, search type, or substring match.
If you were willing to download this once a week, you could keep track over time pretty easily. I don't know if there is a programmatic way to log in and get this data automatically.
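As a rough sketch of what that weekly tracking could look like once the files are downloaded by hand: the snippet below merges several exports into a per-query history. The column names ("Query", "Avg. position"), the date labels, and the sample rows are all made up for illustration, so check them against the header row of your actual spreadsheet.

```python
import csv
import io
from collections import defaultdict


def load_export(csv_text):
    """Parse one downloaded export into {query: average position}.

    The column names here are assumptions, not the confirmed export format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Query"]: float(row["Avg. position"]) for row in reader}


def build_history(exports):
    """Merge exports keyed by an ISO date label into
    {query: [(date_label, position), ...]}, oldest first."""
    history = defaultdict(list)
    for date_label in sorted(exports):  # ISO dates sort correctly as strings
        for query, position in load_export(exports[date_label]).items():
            history[query].append((date_label, position))
    return dict(history)


# Hypothetical sample data standing in for two weekly downloads.
week1 = "Query,Avg. position\nblue widgets,4.2\nwidget shop,7.0\n"
week2 = "Query,Avg. position\nblue widgets,3.1\nwidget shop,8.5\n"

history = build_history({"2011-05-01": week1, "2011-05-08": week2})
for query, points in sorted(history.items()):
    print(query, points)
```

In practice you would read each saved file from disk instead of using inline strings, and use the download date as the label.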
The rank data looks pretty good to me, but it does have a couple problems:
1. It isn't accounting for SEM ads or Google placements (map, images, currency, etc) that show up above the organic results.
2. We rank #1, #2, AND #3 for our brand name. It reports the average rank as 2.1. It would be far more useful for them to report average rank for my best slot (which in this case would be very close to 1).
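To put numbers on the second point, here is a tiny sketch comparing an averaged rank with the best slot for a query where one site holds several positions. The function names are just for illustration. Note that a plain mean of 1, 2 and 3 is 2.0; the 2.1 in the report presumably comes from some weighting on Google's side, which is only a guess.

```python
def average_position(positions):
    """Plain mean of every slot the site holds for a query."""
    return sum(positions) / len(positions)


def best_position(positions):
    """The highest-ranking slot, i.e. the smallest position number."""
    return min(positions)


# Slots held for the brand-name query described above.
brand_positions = [1, 2, 3]

print("average:", average_position(brand_positions))
print("best:", best_position(brand_positions))
```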
The WebmasterTools API [code.google.com] may offer some help.
Hisoka, you say it's not possible without Google's permission. For such small usage I know I'd never be able to get Google's permission, but I'm curious: has anyone ever tried, or does anyone know of someone who has been given permission?
I have got a dodgy version which does work, but don't really want to go down that path.
I've been trying to avoid using the Webmaster Tools API. I've noticed several of the issues that dead sea came across, plus it's more effort on my part. I may just have to put the idea on the back burner for a while.
Thanks for the help everyone =)
Google scrapes us WITHOUT PERMISSION so grab a box of proxies and go at it IMO.
Pot calling kettle black, blah, etc.
See if 80legs would sell their botnet of crawlers with a custom user agent, would be epic.