
Google News Archive Forum

This 59-message thread spans 2 pages.
Google adds major libraries to its database
hot on the heels of the Google Scholar announcement
Robert Charlton

Msg#: 27080 posted 9:05 am on Dec 14, 2004 (gmt 0)

Google adds major libraries to its database

NY Times article [nytimes.com]

Google, the operator of the world's most popular Internet search service, plans to announce an agreement Tuesday with some of the nation's leading research libraries and Oxford University to begin converting their holdings into digital files that would be freely searchable over the Web.

It may be only a step on a long road toward the long-predicted global virtual library. But the collaboration of Google and research institutions that also include Harvard, the University of Michigan, Stanford and the New York Public Library is a major stride in an ambitious Internet effort by various parties. The goal is to expand the Web beyond its current valuable, if eclectic, body of material and create a digital card catalog and searchable library for the world's books, scholarly papers and special collections.

Particularly after the Google Scholar announcement, I think it's another significant step forward.

(Edited by vitaplease at 9:12 am UTC on Dec. 14, 2004. Edit reason: link to original article.)

 

vitaplease

Msg#: 27080 posted 9:21 am on Dec 14, 2004 (gmt 0)

Very interesting developments...

Also, "some" library information will be there quite soon:

The Google effort and others like it that are already under way, including projects by the Library of Congress to put selections of its best holdings online, are part of a trend to potentially democratize access to information that has long been available to only small, select groups of students and scholars

Last night the Library of Congress and a group of international libraries from the United States, Canada, Egypt, China and the Netherlands announced a plan to create a publicly available digital archive of one million books on the Internet. The group said it planned to have 70,000 volumes online by next April


gethan

Msg#: 27080 posted 10:35 am on Dec 14, 2004 (gmt 0)

The libraries of five of the world's most important academic institutions are to be digitised by Google. Scanned pages from books in the public domain will then be made available for search and reading online.

BBC Article [news.bbc.co.uk]

And the libraries in question: Michigan, Stanford, Harvard, Oxford and the New York Public Library.

Any more sightings in the wild?

Clark

Msg#: 27080 posted 11:34 am on Dec 14, 2004 (gmt 0)

Ah, maybe this is the plan. Google has enough data from the web, and most new sites are getting too spammy, so the Florida update stems the flow from new sites. Now let's add non-web data into the mix and send less traffic to websites.

Mr Bo Jangles

Msg#: 27080 posted 12:20 pm on Dec 14, 2004 (gmt 0)

Does anyone know just how a project of this enormity would be completed in a reasonable time frame? Just how would you scan so much material? Can someone point to how any similar project was done?

bloke in a box

Msg#: 27080 posted 12:32 pm on Dec 14, 2004 (gmt 0)

A couple of monkeys with typewriters and a long period of time.. ;)

vitaplease

Msg#: 27080 posted 2:07 pm on Dec 14, 2004 (gmt 0)

>>how a project of this enormity would be completed in a reasonable time frame?

..more than 15 million books and other documents covered in the agreements. Librarians involved predict the project could take at least a decade.

Let's say 15 years...
1 million books a year...
235 working days a year...
4255 books & documents a day...

That looks like a military exercise, even if they find volunteers and use:

Two small start-up companies, 4DigitalBooks of St. Aubin, Switzerland, and Kirtas Technologies of Victor, N.Y., are selling systems that automatically turn pages to capture images.

The article comments:
Google's technology is more labor-intensive than systems that are already commercially available

I'd assume so, if they want the optical character recognition (OCR) done as well as in the excellent Google Catalogs:

[catalogs.google.com...]

Yet:
At Stanford, Google hopes to be able to scan 50,000 pages a day within the month, eventually doubling that rate, according to a person involved in the project.

At 200 pages a book on average, that is 250 books a day, doubling to 500 books a day. So it would take about eight or nine library scanners as efficient as Stanford's to complete it in 15 years.
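The estimate above can be rechecked in a few lines. This is only a sketch: the 15-year span, 235 working days and 200-page average book are the assumptions from the post, the doubled Stanford rate is the figure quoted from the article, and `scanning_estimate` is a hypothetical helper, not anything Google or the libraries published.

```python
import math


def scanning_estimate(total_volumes: int, years: int, working_days: int,
                      pages_per_book: int, site_pages_per_day: int):
    """Back-of-the-envelope scanning estimate (hypothetical helper).

    Returns (books needed per working day,
             books one site can scan per day,
             number of such sites needed).
    """
    books_per_year = total_volumes / years
    books_per_day = books_per_year / working_days
    site_books_per_day = site_pages_per_day / pages_per_book
    sites_needed = books_per_day / site_books_per_day
    return books_per_day, site_books_per_day, sites_needed


# 15 million volumes over 15 years, 235 working days a year, an assumed
# 200 pages per book, and Stanford's 50,000 pages/day doubled.
books_per_day, site_rate, sites = scanning_estimate(
    total_volumes=15_000_000, years=15, working_days=235,
    pages_per_book=200, site_pages_per_day=2 * 50_000)

print(f"{books_per_day:.0f} books/day needed")  # 4255
print(f"{site_rate:.0f} books/day per site")    # 500
print(f"{sites:.2f} sites -> {math.ceil(sites)} to finish on time")  # 8.51 -> 9
```

Rounding up matters here: eight sites at 500 books a day give 4,000 books a day, which falls short of the roughly 4,255 needed.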

Mr Bo Jangles

Msg#: 27080 posted 2:45 pm on Dec 14, 2004 (gmt 0)

Very interesting! Thank you, vitaplease.

SEOMike

Msg#: 27080 posted 3:38 pm on Dec 14, 2004 (gmt 0)

Wow... Something like this will only solidify Google as the #1 engine in the world. How is any other company going to come up with the resources and money for this from startup? For a company to even come CLOSE to being able to compete, it is going to have to be a world-class player to begin with.

It seems like this is the nail in all the "start up" engines' coffins for at least a decade. How could anyone compete against Google after this w/o their own "library"?

I wonder how MSN search will react to THIS.

vitaplease

Msg#: 27080 posted 4:04 pm on Dec 14, 2004 (gmt 0)

>> "start up" engines' coffins...

..others involved estimate the figure at $10 for each of the more than 15 million books and other documents covered in the agreements.

And the deal is not exclusive.

$150 million is not an insurmountable barrier to entry, but it really does look like consolidation in the search engine world has already taken place.

gethan

Msg#: 27080 posted 4:07 pm on Dec 14, 2004 (gmt 0)

OK, so this is very, very interesting. If the pages are available online for reading and the works are in the public domain, anyone can copy the works and do what they wish with them. That's how a startup or another engine can compete: Google does the digitising and everyone else spiders its archive.

Hmm, anyone see the irony in that?

hdpt00



 
Msg#: 27080 posted 4:10 pm on Dec 14, 2004 (gmt 0)

Does anyone have a schedule of when they are going to these schools, and which libraries? I attend one of them, and I want to sneak up on them and ask about the sandbox. This makes it a bit easier to ask. If anyone has the schedule of where they will be and the exact names of the libraries, let me know and I'll get to work!

jimbeetle

Msg#: 27080 posted 4:15 pm on Dec 14, 2004 (gmt 0)

I was quite surprised at the way the Times treated this story. In the print version it was front page, above the fold, right-hand column -- the space reserved for the most important news of the day.

Hmmmm.

SEOMike

Msg#: 27080 posted 4:39 pm on Dec 14, 2004 (gmt 0)

I want to sneak up on them and ask about the sandbox

I doubt the hourly people that they send for this project will know anything about search.

Better to ask them some particulars about what they are doing... like how long it takes to scan a book, how they are storing the data, how they THINK it will be incorporated into search, etc.

HyperGeek

Msg#: 27080 posted 4:40 pm on Dec 14, 2004 (gmt 0)

How much do you want to bet that, in less than a couple of years, these library results will take up the first 1-3 SERPs for every major keyphrase?

If you push back the optimized results, then the only option to get listed noticeably is AdWords.

If I were Google, I'd try to make as much money as I could before I self-destructed, as well. Unless they're planning to buy Yahoo! or MSN, I wish them good luck, and good riddance.

You can dress up a pig, but it's still a pig.

GoogleGuy

Msg#: 27080 posted 4:44 pm on Dec 14, 2004 (gmt 0)

Particularly after the Google Scholar announcement, I think it's another significant step forward.

This announcement (working with libraries and scanning significant amounts of offline books to allow people to search over them) and Google Scholar (raising awareness of peer-reviewed and research papers) are two things I'm especially proud that we did this year.

This is a sizable commitment of time and effort, but it's absolutely worth it to my mind. I think that projects like catalogs.google.com were a great start for figuring out the issues with scanning books. I guess that seed was planted a while ago:
[researchbuzz.org...]

victor

Msg#: 27080 posted 4:57 pm on Dec 14, 2004 (gmt 0)

Ah, maybe this is the plan. Google has enough data from the web, and most new sites are getting too spammy, so the Florida update stems the flow from new sites. Now let's add non-web data into the mix and send less traffic to websites.

It may be part of the plan.

If you read the Google mission statement: [google.com...] you'll note that it has never been their mission to provide traffic to websites.

A large number of people are attempting to secure their income by exploiting what is a side-effect of Google's purpose.

If I were one of them, I'd be deeply worried in general terms, regardless of this development.

blaketar

Msg#: 27080 posted 5:40 pm on Dec 14, 2004 (gmt 0)

I wonder if the content will be in the Sandbox for some amount of time until some backlinks to their new content appear :)

whoisgregg

Msg#: 27080 posted 6:32 pm on Dec 14, 2004 (gmt 0)

It's nice to see a private company undertaking a public works project. Very pleased, very impressed. :D

Chndru

Msg#: 27080 posted 6:33 pm on Dec 14, 2004 (gmt 0)

Very pleased, very impressed.

Yup. I am excited too ;)

BReflection

Msg#: 27080 posted 6:35 pm on Dec 14, 2004 (gmt 0)

I was quite surprised at the way the Times treated this story. In the print version it was front page, above the fold, right-hand column -- the space reserved for the most important news of the day.

The reason is that this is a local story for New Yorkers. Google has made a deal with the New York Public Library.

itisgene

Msg#: 27080 posted 6:37 pm on Dec 14, 2004 (gmt 0)

Well, I don't think it is a good idea. Who owns the copyright of those books? The universities? Google? I think not. There must be millions of writers who would not be happy to have their work published for free without being asked. If Google does this, what is the difference between music file-swapping servers and Google Libraries?

Bad idea...

BReflection

Msg#: 27080 posted 6:50 pm on Dec 14, 2004 (gmt 0)

Well, I don't think it is a good idea. Who owns the copyright of those books? The universities? Google? I think not. There must be millions of writers who would not be happy to have their work published for free without being asked. If Google does this, what is the difference between music file-swapping servers and Google Libraries?

Unless they are rolling in their graves, I don't think they will care much. They are only scanning books that have fallen out of copyright and into the public domain. The Wikipedia article on 'Public domain' gives us the guidelines (for the US; for Oxford it will be a bit different):

* The work was created and first published before January 1, 1923, or at least 95 years before January 1 of the current year, whichever is later.
* The last surviving author died at least 70 years before January 1 of the current year.
* No Berne Convention signatory has passed a perpetual copyright on the work.
* Neither the United States nor the European Union has passed a copyright term extension since these conditions were last updated. (This must be a condition because the exact numbers in the other conditions depend on the state of the law at any given moment.)

At least one of the libraries that el GOOG is dealing with has authorized only works published before the year 1900.
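The first two conditions in the quoted list are date arithmetic, and the "whichever is later" clause is easy to misread. Here is a minimal sketch that encodes just those two at whole-year granularity. The function names are hypothetical, the Berne and term-extension conditions are deliberately left out, and it illustrates the quoted rule of thumb only.

```python
def publication_cutoff(current_year: int) -> int:
    """First quoted condition: published before Jan 1, 1923, or at least
    95 years before Jan 1 of the current year, whichever is later.
    Returns the year a work must have been published *before*."""
    return max(1923, current_year - 95)


def meets_date_conditions(pub_year: int, last_death_year: int,
                          current_year: int) -> bool:
    """Checks only the two date-based conditions from the quoted list.
    The Berne and term-extension conditions are not modelled."""
    published_early = pub_year < publication_cutoff(current_year)
    # Second condition: last surviving author died at least 70 years before
    # Jan 1 of the current year. Treating the whole death year conservatively,
    # that means death year + 70 < current year.
    author_long_dead = last_death_year + 70 < current_year
    return published_early and author_long_dead


# Hypothetical examples, evaluated for the year of this thread (2004).
print(meets_date_conditions(1843, 1870, 2004))  # True
print(meets_date_conditions(1920, 1950, 2004))  # False: author died too recently
print(meets_date_conditions(1930, 1931, 2004))  # False: published after the cutoff
```

In 2004 the cutoff is 1923 because 2004 - 95 = 1909 is the earlier of the two; the 95-year rule only starts to move the cutoff from 2019 onward. A library that authorizes only pre-1900 works would simply add a stricter `pub_year < 1900` filter on top.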

Robert Charlton

Msg#: 27080 posted 6:51 pm on Dec 14, 2004 (gmt 0)

I was quite surprised at the way the Times treated this story. In the print version it was front page, above the fold, right-hand column -- the space reserved for the most important news of the day.

This is a significant step in organizing and preserving the history and scholarship of the world, and it belongs above the fold.

It's also a significant step in the history of the internet. It's not just about a local New York business deal.

Once again... thank you, Google.

vitaplease

Msg#: 27080 posted 7:10 pm on Dec 14, 2004 (gmt 0)

>>Well, I don't think it is a good idea. Who owns the copyright of those books?

That will be an interesting legal issue. I wonder whether judges would treat it as something similar to "library-royalty-free or royalty-limited viewing".

Doing a Google search for: "books subject"

and checking the content that the SERP sometimes suggests:

[print.google.com...]

shows only a limited image of a page; as far as I can see, the text itself cannot be selected.

I suppose the final situation will be similar in the public (non-search-engine) area: the text fully indexed and searchable, but no food for spiders or text copying.

It would be great if eventually some sort of page-view royalty were paid to authors, as happens with library lending in some countries. But then authors would probably start searching for themselves? ;)

[internetnews.com...]

ebizcamp

Msg#: 27080 posted 8:30 pm on Dec 14, 2004 (gmt 0)

Nothing special. Google is good at marketing. Dialog has much more valuable information than Google.

treeline

Msg#: 27080 posted 9:09 pm on Dec 14, 2004 (gmt 0)

Many younger people think that researching on the internet IS researching an issue completely. All the sources that came before and haven't made it onto the internet hardly exist for them.

Many of the finest writings are so good they underlie the character of our societies. Including them among the search results is a great improvement. Much better than reading interpretations of them, or the latest crackpot theories from websites. You can't hire content writers like these anymore....

Clark

Msg#: 27080 posted 9:15 pm on Dec 14, 2004 (gmt 0)

You're right about the marketing. This seems very similar to Amazon's Search Inside the Book, but I don't see many people using Amazon's search engine to access it.

I do from time to time, but somehow have not gotten into the habit, I don't know why.

hunderdown

Msg#: 27080 posted 9:24 pm on Dec 14, 2004 (gmt 0)

Amazon's Search Inside the Book is far more limited and doesn't work well. That's why you (and others) don't use it!

This is a couple of orders of magnitude bigger and likely to be far more useful.

A very bold move on Google's part.

gethan

Msg#: 27080 posted 10:23 pm on Dec 14, 2004 (gmt 0)

For the record, I think this is a good thing in principle. I hope Google has some principles left :) They started out with many good ones.

I'm concerned, though, that this is being undertaken by a company rather than a not-for-profit organisation such as Project Gutenberg. This is a vast part of mankind's heritage; digitising it and making it accessible is a great thing.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved