Forum Moderators: Robert Charlton & goodroi

Developing new website - what's the point if Google will scrape your data?

     
8:15 am on Aug 17, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 7, 2003
posts:804
votes: 121


Yesterday I went looking for some information that I thought would be available on the web, and was surprised to learn that nobody had compiled that information and built the site (at least I couldn't find it). Which got me thinking....

If I did compile that info (it would cost about 10,000 - 20,000 to gather the data) and build a site, what would be the point when Google could just scrape the info and present it in the search results without the user ever having to visit my site (and give me an advertising opportunity)?
9:30 am on Aug 17, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member redbar is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Oct 14, 2013
posts:3371
votes: 564


Precisely, why do you think I removed thousands of my trade unique widget images a couple of years ago?

They cost me a lot of time, effort and money, only for Google to take 90% of my traffic plus AdSense revenue at a stroke. Yep, what is the point?
4:28 pm on Aug 17, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


I think it depends on the type of data curated. If the information is mostly one-sentence answers then perhaps I would not bother - it could be scraped easily.

If the information is something where longer answers / in-depth answers are sought then scraping the data would only give a short intro. In that case the site may be useful to visitors and could be worthwhile.

Another factor is whether you could monetise the site easily, i.e. if it takes 10-20K to build the site, would you at least get your money back and over what time, and would you earn anything out of it (or manage to channel the traffic to your money sites).
4:35 pm on Aug 17, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 28, 2013
posts:3491
votes: 787


Information is a commodity. Packaging and presentation are the "value add."

Think of compilation as a starting point, not as the finished product.
10:55 pm on Aug 17, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 4, 2001
posts:1097
votes: 7


I can't comment on the scraping aspect... I'm familiar with content being "appropriated" for use in Knowledge Graph responses, but does Google actively scrape beyond that?

What concerns me more is that Google has no qualms whatsoever about eating your lunch if they need it to support/improve their share price and appease Wall Street. The need for Google to continue to show better and better financials leaves them with no choice but to channel more and more online revenues to their own products and services.

Shopping, Flights, Travel, Hotels etc are just the tip of the iceberg. You ain't seen nothing yet and this self promotion above all else is what lies at the heart of the European Union's clash with Google.

I have no intention of ever developing another website in my lifetime. Google has already eaten my lunch and the lunches of many thousands of others and if anyone is considering placing a new website in a niche that Google has a financial interest in, then IMO you are doomed to failure.

It has nothing to do with scraping, it has everything to do with money.
12:39 pm on Aug 18, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:13012
votes: 222


You're the only one who can gauge how much your time and investment are worth. Can you drive significant traffic from other sources besides Google? For me, that would play a large part in my assessment of risk.
4:25 pm on Aug 18, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1087
votes: 83


Is the information suitable for creating an 'interactive assistant' website? That way:
- Google can't scrape a widget, or a group of widgets
- people might well prefer an interactive tool, or a set of videos, to a page of text
- site is highly consumable and link-bait by definition
- you can still build text pages around the questions asked about the information (good content for rankings)

Of course, someone might then create a text website based on stealing your answers and then Google will scrape them instead :)

I've not built any information websites but if I were to do so I think I would build a tool.
6:55 pm on Aug 18, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 28, 2013
posts:3491
votes: 787


I've got an information site, and Google has sent me about 18.5 million "sessions" (the term used by Google Analytics) since I began using GA a little over eight years ago. What's more, my Google traffic has increased substantially since Google introduced "answer boxes" and the like.

If Google is eating your lunch, maybe you need to pack a different kind of meal in your lunchbox.
7:28 pm on Aug 18, 2015 (gmt 0)

Full Member

10+ Year Member

joined:May 25, 2006
posts:300
votes: 36


Taking off my webmaster hat and putting on my accountant hat, the answer depends on what 20000 means to you. I wouldn't invest everything in any project that has significant risk (although I don't think Google scraping your data is an important risk compared with the other business risks in launching a new website), but if 20k is quite a small investment for you then take a chance: allocating at least a small percentage of resources to higher risk/higher possible returns investments is often a good idea.
9:08 pm on Aug 18, 2015 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts: 480
votes: 22


There are ways to prevent scraping and data mining, but then you are talking about "copy protection". You first need to decide whether the content is to be searchable from the web, and thus SEO fodder, or whether that information is for a restricted audience only, because you cannot have it both ways.
9:19 pm on Aug 18, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 7, 2003
posts:804
votes: 121


For the idea that I was thinking about, a one-sentence answer would definitely suffice from a user's point of view.

So no point in gathering the data from a commercial point of view.
6:24 am on Aug 19, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


After all those lunch bags and packaging stuff, let us get real. I would advise you not to look at it as a one-liner versus in-depth answers, but at whether your site/content is all about facts...

What google does is to scrape "facts" and show them in their KGs as no one can sue them for scraping facts as these cannot be copyrighted by anyone and no one can claim that these have been scraped from them...

so if your site is all about facts, leave them for the likes of wikipedia as they can run the show with donations....today google is said to be scraping one liners and it might not be far when they might scrape bigger answers too as long as they are facts...so don't waste your money on building sites with factual content alone as you will only be gifting them to the likes of google eventually...more so if you depend on google traffic and adsense revenues to run the show...
6:55 am on Aug 19, 2015 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts: 480
votes: 22


My what an interesting concept... "that anything factual cannot be copyrighted and is free to plagiarise".
1:00 pm on Aug 19, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


Facts not, but how they are presented, yes.

Otherwise any scientific or historical teaching book could be plagiarised.
1:38 pm on Aug 19, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 28, 2013
posts:3491
votes: 787


My what an interesting concept... "that anything factual cannot be copyrighted and is free to plagiarise".

That isn't the case at all. Facts themselves can't be copyrighted, but that doesn't mean "anything factual" is fair game for harvesting.

Copyright is about protecting expression, not facts or ideas. The Copyright Website's "What does copyright protect?" page may be helpful in understanding this concept:

[benedict.com...]
2:00 pm on Aug 19, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


Yes you might copyright the way you present the facts....you can present/express them differently in your books and sell them if you have the capability to market them....

Anyone harvesting facts like what Google does online is going to present them differently to save themselves from legal suits..Google does do it very well in their knowledge graphs and you cannot make a case against them for presenting those info/facts in Knowledge Graph...IMO, what they are doing cannot be challenged...

Harvesting of facts cannot be challenged (whether it is fair game or not doesn't matter), but presenting them in a unique/innovative way is what matters, and if such presentation/expression is plagiarized, it might be challenged. However, Google will never do that...
3:47 pm on Aug 19, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 28, 2013
posts:3491
votes: 787


Anyone harvesting facts like what Google does online is going to present them differently to save themselves from legal suits..

Anyone who hopes to make a living from just "harvesting facts" and dishing them up online would be in trouble with or without competition from search engines.

Facts are a dime a dozen. It's how you gather, present, and interpret them that matters.
9:23 pm on Aug 19, 2015 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts: 480
votes: 22


Unfortunately we have little control over our own intellectual property, especially when so many entities are designed to prosper from plagiarism. I don't recall ever giving Google the right to spider sites that I do not submit to them. Nor did I give them the right to spider pages that I may have typed into the address bar of their Chrome browser.
10:51 pm on Aug 19, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 28, 2013
posts:3491
votes: 787


I don't recall ever giving Google the right to spider sites that I do not submit to them.

Web crawling has been around at least since the early 1990s, before Google even existed. Search-engine spiders have been part and parcel of the Web for so long that it's hard to imagine anyone (including a court) succeeding in putting the genie back in the lamp.

If you want to remain invisible, why not block Google and other search engines with robots.txt?
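
For anyone unsure what such a block looks like, here is a minimal sketch. It assumes a site-wide "Disallow: /" rule and uses Python's standard `urllib.robotparser` to check how a compliant crawler would read it. The example.com URL and the `is_allowed` helper are hypothetical, and robots.txt is only advisory: it is honoured by well-behaved crawlers and enforces nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every crawler to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that asks only Googlebot to stay out.
BLOCK_GOOGLEBOT = """\
User-agent: Googlebot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/data/page.html"
    print(is_allowed(BLOCK_ALL, "Googlebot", page))
    print(is_allowed(BLOCK_GOOGLEBOT, "Googlebot", page))
    print(is_allowed(BLOCK_GOOGLEBOT, "SomeOtherBot", page))
```

The file itself is just the text in `BLOCK_ALL`, served at /robots.txt on the site.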
11:02 pm on Aug 19, 2015 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts: 480
votes: 22


Ahem, robots.txt can be ignored.

On some sites we do block all known spiders but there are many more unknown and many who do not want to be known. But when it comes to the real intellectual property we do block everyone except clients. In fact that is our industry, protecting the intellectual property of others.
7:46 am on Aug 20, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1087
votes: 83


Nor did I give them the right to spider pages that I may have typed into the address bar of their Chrome browser

According to this you did - [google.com...] (second bullet point)
See also 7.3 here: [google.co.uk...] - although 7.5 says you retain intellectual property rights on 'reviewed' content.
9:18 am on Aug 20, 2015 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts: 480
votes: 22


According to this you did


Are you saying that anyone who installs the Chrome web browser is agreeing to be spied upon and that any web address, private or otherwise, that is typed into what is supposed to be an address bar and not a search box, will be added to their indexes for spidering and public search results?

So all the naive web developers in this world who believe that Chrome is ideal for site testing will be testing their new site developments while Google adds those pages to their indexes before the site is actually released to the public.

Which part of "do no evil" makes that legal?
11:21 am on Aug 20, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


So all the naive web developers in this world who believe that Chrome is ideal for site testing will be testing their new site developments while Google adds those pages to their indexes before the site is actually released to the public.


Yep. Seen many test sites in SERPs.

This is where blocking the test site by robots comes into play. Or use canonical to live (if live exists already).

Or even better, test site should be behind the login, on intranet or protected by IP filtering.
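
A rough sketch of the IP-filtering idea, not tied to any particular web server: the allowed networks, the `staging_response` helper and the status/header pair it returns are all assumptions for illustration. Requests from outside the allowlist get a 403, and allowed requests also carry an `X-Robots-Tag: noindex, nofollow` header as a second line of defence.

```python
import ipaddress

# Hypothetical allowlist: an internal range plus one office range (documentation addresses).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_client_allowed(client_ip: str, allowed=ALLOWED_NETWORKS) -> bool:
    """Return True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)


def staging_response(client_ip: str):
    """Return (status, headers) a test site might send for this client."""
    if not is_client_allowed(client_ip):
        # Outsiders, crawlers included, never see the test content.
        return 403, {}
    # Allowed clients get the page, marked as not for indexing.
    return 200, {"X-Robots-Tag": "noindex, nofollow"}


if __name__ == "__main__":
    print(staging_response("10.1.2.3"))
    print(staging_response("8.8.8.8"))
```

In practice the same check usually lives in the web server or firewall configuration rather than in application code.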
12:31 pm on Aug 20, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:13012
votes: 222


Or even better, test site should be behind the login, on intranet or protected by IP filtering.


1000 times yes. I run into this about once a month with clients' developers.

So all the naive web developers in this world who believe that Chrome is ideal for site testing will be testing their new site developments while Google adds those pages to their indexes before the site is actually released to the public.


If they don't know how the web works by now, they should probably look into a new line of work.
12:18 pm on Aug 21, 2015 (gmt 0)

Preferred Member

5+ Year Member Top Contributors Of The Month

joined:Sept 12, 2014
posts:384
votes: 68


what's the point if Google will scrape your data?



Yesterday I went looking for some information that I thought would be available on the web, and was surprised to learn that nobody had compiled that information and built the site


Are you creating the data or compiling it from other sources? If you are compiling the data, as you wrote, then wouldn't you be doing the same as goog?
3:37 pm on Aug 21, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 7, 2003
posts:804
votes: 121


Are you creating the data or compiling it from other sources? If you are compiling the data, as you wrote, then wouldn't you be doing the same as goog?


This idea would involve physically gathering the data in the real world through thousands of phone calls and letter writing - the data is not available on the web.
4:05 pm on Aug 21, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 28, 2013
posts:3491
votes: 787


This idea would involve physically gathering the data in the real world through thousands of phone calls and letter writing - the data is not available on the web.

If the data isn't readily available and has value, why not sell it instead of making it available as a free Web site that anyone can scrape or rewrite? (It seems to me that Google and Bing answer boxes are the least of your potential problems.)
8:47 pm on Aug 21, 2015 (gmt 0)

Preferred Member

5+ Year Member Top Contributors Of The Month

joined:June 26, 2013
posts:454
votes: 69


Just invest the time/money to compile the data, buy some Google stock and launch the site. This way you are guaranteed to get a .0000000001% return on your investment. And as a shareholder, you will reap the benefits of scraping on a massive scale and might just break even in the end.