Scrape Google pages using curl = disadvantages?

poweri

4:03 pm on Dec 6, 2008 (gmt 0)

10+ Year Member



I want to use curl to scrape pages from the Google search engine. Of course people will see that the pages are from Google, but I want to have the pages embedded in my website.

Can Google blacklist my host's server for fetching the data? I assume Google's servers can see the fetching, and I have read something about limited use for websites and software developers (1000 hits a day?), which is why I am asking. I don't want trouble...

tedster

8:39 pm on Dec 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you programmatically scrape Google search results, you can run into trouble.

"Google's Terms of Service do not allow the sending of automated queries of any sort to our system without express permission in advance from Google."

[google.com...]

One thing that can happen is that your server simply gets blocked, and your page loads a goofy warning instead of what you intended. I've also heard of cases where Google penalized the offending site when its bandwidth abuse became high.

poweri

1:41 am on Dec 7, 2008 (gmt 0)

10+ Year Member



Thank you for replying. I think I will lose this one...

willybfriendly

2:00 am on Dec 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A few years ago I had a site that scraped G's news page (under a specific topic). I simply set it up to scrape the page on the first search on any given day and cache it.

Never had a problem with it as far as getting banned or penalized. In fact, for a few years it consistently ranked #1 for the search 'widgets in the news'.

That said, it was a bit hypocritical of me to complain about scrapers and plagiarism as long as I had the page up. I took the page down about a year ago when the site in question got a major upgrade.

But my point is that, properly implemented, the idea can be done and can be quite effective, depending on topic, etc.
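For what it's worth, the "fetch on the first request of the day, then serve from cache" idea can be sketched roughly like this. This is only an illustration, not the code that site actually ran: the function name `get_daily`, the JSON cache file, and the `fetch` callable are all hypothetical, and whether you are permitted to fetch a given source at all is a separate question.

```python
import json
from datetime import date
from pathlib import Path


def get_daily(cache_file, fetch, today=None):
    """Return today's content, calling fetch() at most once per calendar day.

    cache_file: path of a small JSON file holding the last fetched body (hypothetical layout).
    fetch:      any zero-argument callable that returns the page body as a string.
    today:      optional date override, mainly useful for testing.
    """
    day = (today or date.today()).isoformat()
    path = Path(cache_file)

    if path.exists():
        cached = json.loads(path.read_text(encoding="utf-8"))
        if cached.get("day") == day:
            # Already fetched today: serve the stored copy, no outbound request.
            return cached["body"]

    # First request of the day (or no cache yet): fetch once and store the result.
    body = fetch()
    path.write_text(json.dumps({"day": day, "body": body}), encoding="utf-8")
    return body
```

Every visitor after the first one that day gets the cached copy, so the source sees at most one request per day no matter how much traffic the page gets.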

kidder

3:39 am on Dec 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google is the biggest scraper site of all... But don't you dare scrape them. LOL. Guidelines are not laws, fellas.

ZydoSEO

8:41 pm on Dec 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Guidelines are not laws in the legal sense... true. But they do spell out what Google expects of you as a webmaster if you want to 'play' in their world. Ignore/violate them, and there can be serious consequences.

johnnie

12:09 pm on Dec 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have a look at the Google AJAX Search API [code.google.com]

BradleyT

7:08 pm on Dec 8, 2008 (gmt 0)

10+ Year Member



Google is the biggest scraper site of all... But don't you dare scrape them. LOL.

From Google's robots.txt:
Disallow: /search
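
If you want to check a rule like that from code before fetching anything, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule set is an assumption built around the single Disallow line quoted above (the `User-agent: *` line is added for the example, and this is not a copy of Google's full robots.txt). The helper name `is_allowed` and the example URLs are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: only the Disallow line comes from the post above;
# "User-agent: *" is added so the parser has a group to attach the rule to.
RULES = """\
User-agent: *
Disallow: /search
"""


def is_allowed(url, user_agent="*"):
    """Return True if the assumed rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())  # parse in-memory lines, no network access
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://www.google.com/search?q=widgets"))
print(is_allowed("https://www.google.com/about"))
```

Under these assumed rules the search URL is reported as disallowed and the other path as allowed. Note that robots.txt only tells a crawler what the site asks of it; it is separate from the Terms of Service quoted earlier in the thread.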