Will Google nail me for scanning content from books not in Google Books?
5:40 pm on Feb 16, 2012 (gmt 0)
I'd like to quote some text from books on a site, but I am concerned I might get nailed by Google in the SERPs. The books are not already listed on Google Books...
Any thoughts on how Google might react?
8:13 pm on Feb 16, 2012 (gmt 0)
Since you are planning to "scan" rather than merely type a sentence or two as "fair-use" quotes for commentary, your term "quote" does not apply. By all normal copyright laws, you are apparently planning theft. Otherwise, why worry about Google?
Yes, you will get nailed. Stealing content written by others, just to feed it to GoogleBot, should never be rewarded.
The fact that it is not in Google Books likely has no impact. Google is likely to have many more books loaded in its databases than the ones showing publicly on its Books site. The publicly viewable Google Books is merely what Google has deemed legally "publishable" and has flipped the switch for at any given time. You can be sure it will recognize the others. OCR'ing books is not a new scam, and Google knows how to protect itself from it.
9:19 pm on Feb 16, 2012 (gmt 0)
Uhm, did the OP say he was planning to use books that are under copyright?
If they're not on Google Books now, they will be. For some reason, the People In Charge think that Google Books is WorldCat; they will list books even if they don't show a single page of content.
10:19 pm on Feb 16, 2012 (gmt 0)
If content is not under copyright, and is of any value, you can be pretty sure that Google has already loaded it. So, probably, have a thousand other funky websites.
Any recent material is typically copyrighted automatically for the life of the author + 70 years. And if the author has put it into the public domain (as in given up his/her rights entirely, not merely licensed it under the GPL), it is likely to exist in many, many copies on the net already, in a big batch down on page 950+ of Google search.
Anything else is ripe for a DMCA request if it suddenly shows up on the web, causing the violating site to be immediately eliminated from Google search and defeating the whole idea of using it to attract GoogleBot to begin with.
All authors (whether of books or of web content) have to do is set up a few Google Alerts for key phrases from their content, and Google will automatically alert them when their content suddenly, magically shows up somewhere they did not expect. I have quite a few such alerts set up. On receipt of an alert, a few clicks on Google's DMCA complaint page follow, and the offending site (or parts of it) gets banned from Google search. At first the site's results still show up (sort of), but any attempt by users to use them simply jumps off to the DMCA complaint against the site. Once the DMCA complaint has been decided, the site's results are killed off.
Google cannot afford (politically speaking) to be in violation of copyright laws. It already tried going down that path with the initial version of the Google Books site. And it is big enough now to be under watchful political eyes in multiple countries. The last thing Google would want is to go the way of old Ma Bell because it gets deemed too powerful.
Not to mention what else might follow if the copyright owner decides to follow up further after the initial DMCA complaint. (Unless, as you say, the content is specifically in the public domain.)
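Google Alerts does the searching on Google's side, but the key-phrase idea behind it is easy to illustrate. Below is a minimal sketch, assuming you already have the suspect page's text in hand (fetching and searching the web are left out). The function names and sample phrases are hypothetical and are not part of any Google API:

```python
import re

# Hypothetical helpers, not a Google API: they only compare text you supply.


def normalize(text: str) -> str:
    """Lowercase and keep only word tokens, so punctuation, line breaks and
    extra spaces do not hide an otherwise verbatim copy."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def matching_phrases(key_phrases: list[str], page_text: str) -> list[str]:
    """Return the key phrases that appear verbatim (after normalization)
    in page_text. Padding with spaces makes matches respect word
    boundaries, so "cat" does not match inside "concatenate"."""
    page = f" {normalize(page_text)} "
    return [p for p in key_phrases if f" {normalize(p)} " in page]


# Hypothetical example: two distinctive phrases from your own text.
mine = ["the quick brown fox jumps", "a phrase nobody else has written"]
suspect = "Blog post: The quick, brown fox\njumps over the lazy dog!"
print(matching_phrases(mine, suspect))  # ['the quick brown fox jumps']
```

Picking long, unusual phrases matters more than the matching code: a common phrase will match innocent pages, while a distinctive one rarely appears anywhere except in a copy.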
11:29 pm on Feb 16, 2012 (gmt 0)
I'd like to quote some text from books on a site
I read this as "some text", which, IMO, should fall under Fair Use (US) or Fair Dealing (UK, Commonwealth). If properly attributed, there's no problem in doing this. That said, Fair Use means SHORT quotes for the purpose of illustration/scholarly report, with the emphasis on SHORT. What that number of words might be is rather fluid, though if the work is already short and 90% of it is "quoted", then Fair Use has been exceeded. Additionally, if these are true IMAGE scans of a printed page, not OCR'd to text and then to HTML, etc., that would also seem okay under Fair Use. Example:
Scan of an early 18th century chapbook page showing typesetting and illustration that is included in an article regarding the printing industry.
Material published by the U.S. Federal government has no copyright, ever. This does not apply to State governments!
Material published in the U.S. before 1989 without a copyright notice (in practice, this seems to apply most often to small university presses) is in the public domain.
Material published by Americans, in the U.S., before 1964 and not renewed is in the public domain. (Authors' descendants have been known to throw tantrums over this rule.)
Not all countries are Life+70. Many are Life+50. Some are more than Life+70. Some have special rules for special circumstances. Crown Copyright runs on its own schedule. Don't touch Peter Pan.
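The plain "Life+N" arithmetic behind these terms can be sketched in a few lines. This is a minimal sketch, assuming the common convention (used in the US and EU) that a life-based term runs to the end of the calendar year in which it expires; the function names are hypothetical, and it deliberately ignores publication-date and renewal rules, corporate works, Crown Copyright and every other special case listed above:

```python
def public_domain_year(death_year: int, term_years: int = 70) -> int:
    """Return the first calendar year a work is free under a plain life+N rule.

    Assumes protection runs to the end of the calendar year in which the
    term expires. Ignores corporate/anonymous works, publication-date rules,
    renewals, wartime extensions, Crown Copyright and other special cases.
    """
    return death_year + term_years + 1


def is_public_domain(death_year: int, year: int, term_years: int = 70) -> bool:
    """True if, under the same simplified rule, the work is free in `year`."""
    return year >= public_domain_year(death_year, term_years)


# An author who died in 1950, checked in 2012:
print(public_domain_year(1950))          # life+70 -> 2021
print(public_domain_year(1950, 50))      # life+50 -> 2001
print(is_public_domain(1950, 2012))      # False under life+70
print(is_public_domain(1950, 2012, 50))  # True under life+50
```

The same author's work can therefore be free in a Life+50 country and still protected in a Life+70 one, which is why the country matters.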
1:56 am on Feb 17, 2012 (gmt 0)
I provided the link above for a reason. It has all that info, plus the 1932, 1964, 1972 etc. exceptions, and the handful of public domain possibilities which do not fit with any of the above. :)
Additionally, it defines the terms of 50, 70, 95, and 120 years which can be applied, as well as the difference between corporate works and creator works.
More fun: it also discusses trademark and how it is applied, which often comes up during copyright discussions.
9:12 am on Feb 17, 2012 (gmt 0)
Thank you for the interesting comments... looks as though there is quite a bit of legislation to swot up on, although it might differ from country to country...? Not sure how Google manages that for, say, a UK author who has not yet published their work on the internet. Which I guess would mean you are OK, that is, until they complain and take you down. But by then, hopefully, one would have accrued enough of a critical mass of users to self-perpetuate enough unique content that one does not have to rely on all the "quoted material".
I saw another guy mention in another thread that he was mercilessly scanning thousands of pages of book content, which is not good. But... if I am brutally honest, I've done it once before on a much smaller scale just to give me a kick start up the ladder, and eventually it all transformed into unique content. I was blazing into the sunset before getting stung.
By the way, I would not want to encourage this.
9:37 am on Feb 17, 2012 (gmt 0)
Have you thought about creating something original yourself, instead of "kick starting" with content others invested their time in creating, and then "blazing into the sunset" before you get caught?
10:25 am on Feb 17, 2012 (gmt 0)
DeeCee... for every maverick there is a 'steady eddy'... As you can see from the thrust of this thread, it's a fine line. I'd try not to generalize negatively without context and an understanding of the degree involved, and, as we mentioned, it's important to credit the author. It also depends on your strategy and personal style of doing business. Zuckerberg is a prime case study of skirting around the fringes. So long as everyone as a whole is winning, that is the main goal.
Back on track with the original conversation... let's not discount Bing and Yahoo...