NY Times article [nytimes.com]
Google, the operator of the world's most popular Internet search service, plans to announce an agreement Tuesday with some of the nation's leading research libraries and Oxford University to begin converting their holdings into digital files that would be freely searchable over the Web. It may be only a step on a long road toward the long-predicted global virtual library. But the collaboration of Google and research institutions that also include Harvard, the University of Michigan, Stanford and the New York Public Library is a major stride in an ambitious Internet effort by various parties. The goal is to expand the Web beyond its current valuable, if eclectic, body of material and create a digital card catalog and searchable library for the world's books, scholarly papers and special collections.
Particularly after the Google Scholar announcement, I think it's another significant step forward.
[edited by: vitaplease at 9:12 am (utc) on Dec. 14, 2004; edit reason: link to original article]
Also, "some" library information will be there quite soon:
The Google effort and others like it that are already under way, including projects by the Library of Congress to put selections of its best holdings online, are part of a trend to potentially democratize access to information that has long been available to only small, select groups of students and scholars. Last night the Library of Congress and a group of international libraries from the United States, Canada, Egypt, China and the Netherlands announced a plan to create a publicly available digital archive of one million books on the Internet. The group said it planned to have 70,000 volumes online by next April.
The libraries of five of the world's most important academic institutions are to be digitised by Google. Scanned pages from books in the public domain will then be made available for search and reading online.
BBC Article [news.bbc.co.uk]
And the libraries in question: Michigan, Stanford, Harvard, Oxford and the New York Public Library
Any more sightings in the wild?
..more than 15 million books and other documents covered in the agreements. Librarians involved predict the project could take at least a decade.
Let's say 15 years...
1 million books a year...
235 working days a year...
4255 books & documents a day...
That looks like a military exercise, even if they find volunteers.
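The back-of-the-envelope numbers above can be reproduced in a few lines. This is only a sketch: the 15 million figure comes from the article, while the 15-year horizon and 235 working days a year are the poster's assumptions; the 10-year row corresponds to the librarians' "at least a decade".

```python
# Throughput needed to digitise the collections, using the thread's assumptions.
TOTAL_ITEMS = 15_000_000        # "more than 15 million books and other documents" (NYT)
WORKING_DAYS_PER_YEAR = 235     # poster's assumption, not from the article


def required_rates(total_items, years, days_per_year):
    """Return (items per year, items per working day) needed to finish in `years`."""
    per_year = total_items / years
    return per_year, per_year / days_per_year


for years in (10, 15):          # "at least a decade" vs. the poster's 15 years
    per_year, per_day = required_rates(TOTAL_ITEMS, years, WORKING_DAYS_PER_YEAR)
    print(f"{years} years: {per_year:,.0f} items/year, {per_day:,.0f} items/working day")
```

At 15 years this gives the 4,255 items a day quoted above; squeezing the job into a decade pushes it to roughly 6,400 a day.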
and use:
Two small start-up companies, 4DigitalBooks of St. Aubin, Switzerland, and Kirtas Technologies of Victor, N.Y., are selling systems that automatically turn pages to capture images.
The article comments:
Google's technology is more labor-intensive than systems that are already commercially available
I'd assume so, if they want the Optical Character Recognition done as well as in the excellent Google catalogs:
[catalogs.google.com...]
Yet:
At Stanford, Google hopes to be able to scan 50,000 pages a day within the month, eventually doubling that rate, according to a person involved in the project.
200 pages a book on average? That's 250 books a day, doubling to 500 books a day... so it would take about eight and a half (call it nine) scanning operations as efficient as Stanford's to complete it in 15 years.
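A sketch of how many Stanford-sized scanning operations that implies. The 200 pages per book and 235 working days are guesses from this thread, and treating the 50,000 pages as a per-working-day rate is an assumption. At the doubled rate the ratio comes out at about 8.5, so a ninth site would be needed to finish within 15 years.

```python
import math

PAGES_PER_DAY_NOW = 50_000                    # Stanford target "within the month"
PAGES_PER_DAY_LATER = 2 * PAGES_PER_DAY_NOW   # "eventually doubling that rate"
PAGES_PER_BOOK = 200                          # poster's guessed average, an assumption
REQUIRED_BOOKS_PER_DAY = 15_000_000 / 15 / 235  # 15M items, 15 years, 235 working days


def sites_needed(pages_per_day, pages_per_book, required_books_per_day):
    """Return (books per day at one site, number of such sites needed)."""
    books_per_day = pages_per_day / pages_per_book
    return books_per_day, required_books_per_day / books_per_day


for label, rate in (("initial", PAGES_PER_DAY_NOW), ("doubled", PAGES_PER_DAY_LATER)):
    books, ratio = sites_needed(rate, PAGES_PER_BOOK, REQUIRED_BOOKS_PER_DAY)
    # Round up: a fraction of a site still means one more site.
    print(f"{label}: {books:.0f} books/day per site, {ratio:.1f} sites -> {math.ceil(ratio)} whole sites")
```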
It seems like this is the nail in all the "start up" engines' coffins for at least a decade. How could anyone compete against Google after this without their own "library"?
I wonder how MSN search will react to THIS.
..others involved estimate the figure at $10 for each of the more than 15 million books and other documents covered in the agreements.
And the deal is not exclusive.
$150 million is not an insurmountable barrier to entry, but it really does look like the consolidation in the search engine world has taken place.
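The article's per-item estimate multiplies out as follows. The $10 figure and the 15 million count are from the article; the per-page figure reuses the 200-pages-per-book guess from earlier in the thread and is only illustrative.

```python
COST_PER_ITEM_USD = 10          # estimate quoted in the NYT article
ITEMS = 15_000_000              # "more than 15 million books and other documents"
ASSUMED_PAGES_PER_ITEM = 200    # earlier poster's guess, not from the article

total = COST_PER_ITEM_USD * ITEMS
per_page = COST_PER_ITEM_USD / ASSUMED_PAGES_PER_ITEM
print(f"total: ${total:,} (about ${total / 1_000_000:.0f} million)")
print(f"per page: ${per_page:.2f}")
```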
Hmm - anyone see the irony in that?
I want to sneak up on them and ask about the sandbox.
I doubt the hourly people that they send for this project will know anything about search.
Better to ask them some particulars about what they are doing... like how long it takes to scan a book, how they are storing the data, how they THINK it will be incorporated in search, etc.
If you push back the optimized results, then the only option to get listed noticeably is AdWords.
If I were Google, I'd try to make as much money as I could before I self-destructed, as well. Unless they're planning to buy Yahoo! or MSN, I wish them good luck - and good riddance.
You can dress up a pig, but it's still a pig.
Particularly after the Google Scholar announcement, I think it's another significant step forward.
This announcement--working with libraries and scanning significant amounts of offline books to allow people to search over them--along with Google Scholar (raising awareness of peer-reviewed and research papers), are two things I'm especially proud that we did this year.
This is a sizable commitment of time and effort, but it's absolutely worth it to my mind. I think that projects like catalogs.google.com were a great start for figuring out the issues with scanning books. I guess that seed was planted a while ago:
[researchbuzz.org...]
Ah, maybe this is the plan. Google has enough data from the web, and most new sites are getting too spammy, so the Florida update stems the flow from new sites; now add non-web data into the mix and send less traffic to websites.
It may be part of the plan.
If you read the Google mission statement: [google.com...] you'll note that it has never been their mission to provide traffic to websites.
A large number of people are attempting to secure their income by exploiting what is a side-effect of Google's purpose.
If I were one of them, I'd be deeply worried just in general terms, regardless of this development.
The reason is that this is a local story for New Yorkers. Google has made a deal with the New York Public Library.
Bad idea...
Unless they are rolling in their graves, I don't think they will care much. They are only scanning books that have fallen out of copyright and into the public domain. The Wikipedia article on 'Public domain' gives us the guidelines (in the US--for Oxford it will be a bit different):
* The work was created and first published before January 1, 1923, or at least 95 years before January 1 of the current year, whichever is later.
* The last surviving author died at least 70 years before January 1 of the current year.
* No Berne Convention signatory has passed a perpetual copyright on the work.
* Neither the United States nor the European Union has passed a copyright term extension since these conditions were last updated. (This must be a condition because the exact numbers in the other conditions depend on the state of the law at any given moment.)
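The quoted conditions can be read as a checklist. The sketch below is a hypothetical helper, not legal advice or any library's actual procedure: it encodes only the two date tests at whole-year granularity, treats all four conditions as jointly required (one reading of the list), and takes the two legislative conditions as caller-supplied flags because they cannot be computed from dates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    title: str
    first_published: int              # year of first publication
    last_author_death: Optional[int]  # None if unknown or an author is still alive


def meets_date_conditions(work: Work, current_year: int) -> bool:
    """Conditions 1 and 2 from the quoted list, checked by calendar year only."""
    # Condition 1: published before the later of 1923 and (current year - 95).
    cutoff_year = max(1923, current_year - 95)
    published_early = work.first_published < cutoff_year
    # Condition 2: last surviving author died at least 70 years before Jan 1 of this year.
    author_long_dead = (
        work.last_author_death is not None
        and work.last_author_death < current_year - 70
    )
    return published_early and author_long_dead


def is_public_domain(work: Work, current_year: int,
                     perpetual_copyright: bool = False,
                     term_extended_since_update: bool = False) -> bool:
    """Conditions 3 and 4 are legal facts the caller must supply as flags."""
    return (meets_date_conditions(work, current_year)
            and not perpetual_copyright
            and not term_extended_since_update)


print(is_public_domain(Work("Moby-Dick", 1851, 1891), 2004))
```

A library with a stricter policy could lower the cutoff year accordingly.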
At least one of the libraries that el GOOG is dealing with has authorized only works published before the year 1900.
I was quite surprised at the way the Times treated this story. In the print version it was front page, above the fold, right-hand column -- the space reserved for the most important news of the day.
This is a significant step in organizing and preserving the history and scholarship of the world, and it belongs above the fold.
It's also a significant step in the history of the internet. It's not just about a local New York business deal.
Once again... thank you, Google.
That will be an interesting legal issue. I wonder whether judges would rule it under something similar to "library-royalty-free or royalty-limited viewing".
Doing a Google search for "books subject"
and checking the content the SERP sometimes suggests:
[print.google.com...]
shows only a limited image of a page; as far as I can see, the text itself cannot be selected.
I suppose the final situation will be similar in the public (non-search-engine) area: text fully indexed and searchable, but no food for spiders or text copying.
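The model described above, text fully indexed but only page images shown, can be sketched as an inverted index whose results point at images instead of returning the OCR text. This is a hypothetical illustration, not how print.google.com actually works; the class name and the `images/...` paths are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageImageIndex:
    """Hypothetical: OCR text is searchable, but results expose only page-image locators."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> {(book_id, page)}

    def add_page(self, book_id, page, ocr_text):
        for term in tokenize(ocr_text):
            self._postings[term].add((book_id, page))

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return []
        # Pages containing every query term (AND semantics).
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        # Return image locators only; the OCR text never leaves the index.
        return [f"images/{book}/page-{page}.png" for book, page in sorted(hits)]


index = PageImageIndex()
index.add_page("moby-dick", 1, "Call me Ishmael.")
index.add_page("moby-dick", 2, "Ishmael went to sea.")
print(index.search("call ishmael"))  # ['images/moby-dick/page-1.png']
```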
It would be great if eventually some sort of page-view royalty were paid to authors, as already happens for library loans in some countries. But then authors would probably start searching for themselves? ;)
[internetnews.com...]
Many of the finest writings are so good they underlie the character of our societies. Including them among the search results is a great improvement. Much better than reading interpretations of them, or the latest crackpot theories from websites. You can't hire content writers like these anymore....
This is a couple of orders of magnitude bigger and likely to be far more useful.
A very bold move on Google's part.
I'm concerned, though, that this is an undertaking of a company and not a not-for-profit organisation such as Project Gutenberg. This is a vast part of mankind's heritage; digitising it and making it accessible is a great thing.