Google adds major libraries to its database - (deprecated) Google News Archive forum at WebmasterWorld - WebmasterWorld

Google adds major libraries to its database

hot on the heels of the Google Scholar announcement

Robert Charlton

9:05 am on Dec 14, 2004 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Top Contributors Of The Month

Google adds major libraries to its database

NY Times article [nytimes.com]

Google, the operator of the world's most popular Internet search service, plans to announce an agreement Tuesday with some of the nation's leading research libraries and Oxford University to begin converting their holdings into digital files that would be freely searchable over the Web.
It may be only a step on a long road toward the long-predicted global virtual library. But the collaboration of Google and research institutions that also include Harvard, the University of Michigan, Stanford and the New York Public Library is a major stride in an ambitious Internet effort by various parties. The goal is to expand the Web beyond its current valuable, if eclectic, body of material and create a digital card catalog and searchable library for the world's books, scholarly papers and special collections.

Particularly after the Google Scholar announcement, I think it's another significant step forward.

[edited by: vitaplease at 9:12 am (utc) on Dec. 14, 2004]
[edit reason] link to original article [/edit]

vitaplease

9:21 am on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

very interesting developments...

also "some" library information will be there quite soon:

The Google effort and others like it that are already under way, including projects by the Library of Congress to put selections of its best holdings online, are part of a trend to potentially democratize access to information that has long been available to only small, select groups of students and scholars
Last night the Library of Congress and a group of international libraries from the United States, Canada, Egypt, China and the Netherlands announced a plan to create a publicly available digital archive of one million books on the Internet. The group said it planned to have 70,000 volumes online by next April

gethan

10:35 am on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

The libraries of five of the world's most important academic institutions are to be digitised by Google. Scanned pages from books in the public domain will then be made available for search and reading online.

BBC Article [news.bbc.co.uk]

And the libraries in question: Michigan, Stanford, Harvard, Oxford and the New York Public Library

Any more sightings in the wild?

Clark

11:34 am on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Ah maybe this is the plan. Google has enough data from the web. And most new sites are getting too spammy so Florida stems the flow from new sites, but let's add non web data into the mix and send less traffic to web sites.

Mr Bo Jangles

12:20 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

Does anyone have knowledge of just how a project of this enormity would be completed in a reasonable time frame? Just how would you scan so much stuff - can someone provide info of how any like project was done?

bloke in a box

12:32 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

A couple of monkeys with typewriters and a long period of time.. ;)

vitaplease

2:07 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>>how a project of this enormity would be completed in a reasonable time frame?

..more than 15 million books and other documents covered in the agreements. Librarians involved predict the project could take at least a decade.

Let's say 15 years...
1 million books a year...
235 working days a year...
4255 books & documents a day...

that looks like a military exercise, even if they find volunteers
and use:

Two small start-up companies, 4DigitalBooks of St. Aubin, Switzerland, and Kirtas Technologies of Victor, N.Y., are selling systems that automatically turn pages to capture images.

the article comments on:

Google's technology is more labor-intensive than systems that are already commercially available

I'd assume so, if they want to get the Optical Character Recognition as well done as in the excellent Google catalogs:

[catalogs.google.com...]

Yet:

At Stanford, Google hopes to be able to scan 50,000 pages a day within the month, eventually doubling that rate, according to a person involved in the project.

200 pages a book on average? = 250 books a day to be doubled to 500 books a day..or eight library scanners as efficient as Stanford's to complete it in 15 years..
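
The back-of-the-envelope numbers in this post can be checked with a short sketch. Every input below is taken from the thread; the 200 pages per book is the poster's own guess rather than a figure from the article, and the variable names are only illustrative.

```python
# Rough check of the scanning estimate above.
# Inputs come from the thread; PAGES_PER_BOOK is the poster's guess.

TOTAL_ITEMS = 15_000_000          # "more than 15 million books and other documents"
YEARS = 15                        # poster's assumed project length
WORKING_DAYS_PER_YEAR = 235       # poster's assumed working days
PAGES_PER_BOOK = 200              # poster's guessed average
STANFORD_PAGES_PER_DAY = 50_000   # initial rate quoted for Stanford


def required_books_per_day(total_items, years, working_days):
    """Books that must be scanned each working day to finish on time."""
    return total_items / (years * working_days)


def books_per_day(pages_per_day, pages_per_book):
    """Convert a page-scanning rate into a book-scanning rate."""
    return pages_per_day / pages_per_book


needed = required_books_per_day(TOTAL_ITEMS, YEARS, WORKING_DAYS_PER_YEAR)
initial = books_per_day(STANFORD_PAGES_PER_DAY, PAGES_PER_BOOK)
doubled = 2 * initial
sites_needed = needed / doubled

print(f"books needed per working day: {needed:.0f}")      # 4255
print(f"Stanford rate, initial:       {initial:.0f}")     # 250
print(f"Stanford rate, doubled:       {doubled:.0f}")     # 500
print(f"sites at the doubled rate:    {sites_needed:.1f}")  # 8.5
```

Under these assumptions the ratio comes out at about 8.5, so "eight" is a slight round-down; nine operations at Stanford's doubled rate would be needed to finish inside 15 years.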

Mr Bo Jangles

2:45 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

very interesting! thank you vitaplease

SEOMike

3:38 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Wow... Something like this will only solidify Google as the #1 engine in the world. How is any other company going to come up with the resources / money for this from startup? In order for a company to even come CLOSE to being able to compete, they are going to have to be a world class player to begin with.

It seems like this is the nail in all the "start up" engines' coffins for at least a decade. How could anyone compete against Google after this w/o their own "library"?

I wonder how MSN search will react to THIS.

vitaplease

4:04 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>> "start up" engines' coffins...

..others involved estimate the figure at $10 for each of the more than 15 million books and other documents covered in the agreements.

and the deal is not exclusive

$150 million is not an insurmountable barrier to entry, but it does really look like the consolidation in the SE world has taken place.
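
The total follows directly from the estimate quoted in this post: about $10 per item across more than 15 million items. A minimal sketch of that multiplication, using only those two figures (the variable names are illustrative):

```python
# Total cost implied by the quoted per-item estimate.
COST_PER_ITEM_USD = 10        # "$10 for each" item, per the article quote
TOTAL_ITEMS = 15_000_000      # "more than 15 million books and other documents"

total_cost_usd = COST_PER_ITEM_USD * TOTAL_ITEMS
print(f"${total_cost_usd:,}")  # $150,000,000
```

Since the article says "more than" 15 million items, this is best read as a lower bound of roughly $150 million.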

gethan

4:07 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Ok - so this is very very interesting. If the pages are available online for reading and the works are in the public domain - anyone can copy the works and do with them what they wish - so that's how a startup or other engine can compete. Google does the digitising and everyone else spiders their archive.

Hmm - anyone see the irony in that?

hdpt00

4:10 pm on Dec 14, 2004 (gmt 0)

Anyone have a schedule for when they are going to these schools, and which libraries? I attend one of them, I want to sneak up on them and ask about the sandbox. This makes it a bit easier to ask. If anyone has the schedule of where they will be and the exact names of the libraries, let me know and I'll be put to work!

jimbeetle

4:15 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

I was quite surprised at the way the Times treated this story. In the print version it was front page, above the fold, right-hand column -- the space reserved for the most important news of the day.

Hmmmm.

SEOMike

4:39 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I want to sneak up on them and ask about the sandbox

I doubt the hourly people that they send for this project will know anything about search.

Better to ask them some particulars about what they are doing... like how long to scan a book, how are they storing the data, how they THINK it will be incorporated in search, etc...

HyperGeek

4:40 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

How much do you want to bet that, in less than a couple of years, these library results will take up the first 1-3 SERPs for every major keyphrase.

If you push back the optimized results, then the only option to get listed noticeably is Adwords.

If I were Google, I'd try to make as much money as I could before I self-destructed, as well. Unless they're planning to buy Yahoo! or MSN, I wish them good luck - and good riddance.

You can dress up a pig, but it's still a pig.

GoogleGuy

4:44 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Particularly after the Google Scholar announcement, I think it's another significant step forward.

This announcement--working with libraries and scanning significant amounts of offline books to allow people to search over them--along with Google Scholar (raising awareness of peer-reviewed and research papers), are two things I'm especially proud that we did this year.

This is a sizable commitment of time and effort, but it's absolutely worth it to my mind. I think that projects like catalogs.google.com were a great start for figuring out the issues with scanning books. I guess that seed was planted a while ago:
[researchbuzz.org...]

victor

4:57 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Ah maybe this is the plan. Google has enough data from the web. And most new sites are getting too spammy so Florida stems the flow from new sites, but let's add non web data into the mix and send less traffic to web sites.

It may be part of the plan.

If you read the Google mission statement: [google.com...] you'll note that it has never been their mission to provide traffic to websites.

A large number of people are attempting to secure their income by exploiting what is a side-effect of Google's purpose.

If I was one of them, I'd be deeply worried just in general terms, regardless of this development.

blaketar

5:40 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

I wonder if the content will be in the Sandbox for some amount of time until some backlinks to their new content appear :)

whoisgregg

6:32 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

It's nice to see a private company undertaking a public works project. Very pleased, very impressed. :D

Chndru

6:33 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Very pleased, very impressed.

Yup. I am excited too ;)

BReflection

6:35 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

I was quite surprised at the way the Times treated this story. In the print version it was front page, above the fold, right-hand column -- the space reserved for the most important news of the day.

The reason is that this is a local story for New Yorkers. Google has made a deal with the New York Public Library.

itisgene

6:37 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

Well, I don't think it is a good idea. Who owns the copyright of those books? the universities? Google? I think not. There must be millions of writers who are not happy about publishing their work for free without asking them. If google does this, what is the difference between Music file swap/server and Google Libraries?

Bad idea...

BReflection

6:50 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

Well, I don't think it is a good idea. Who owns the copyright of those books? the universities? Google? I think not. There must be millions of writers who are not happy about publishing their work for free without asking them. If google does this, what is the difference between Music file swap/server and Google Libraries?

Unless they are rolling in their graves I don't think they will care much. They are only scanning books that have fallen out of copyright and into the public domain. The Wikipedia article on 'Public domain' gives us the guidelines (in the US--for Oxford it will be a bit different):

* The work was created and first published before January 1, 1923, or at least 95 years before January 1 of the current year, whichever is later.
* The last surviving author died at least 70 years before January 1 of the current year.
* No Berne Convention signatory has passed a perpetual copyright on the work.
* Neither the United States nor the European Union has passed a copyright term extension since these conditions were last updated. (This must be a condition because the exact numbers in the other conditions depend on the state of the law at any given moment.)

At least one of the libraries that el GOOG is dealing with has authorized only works published before the year 1900.

Robert Charlton

6:51 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Top Contributors Of The Month

I was quite surprised at the way the Times treated this story. In the print version it was front page, above the fold, right-hand column -- the space reserved for the most important news of the day.

This is a significant step in organizing and preserving the history and scholarship of the world, and it belongs above the fold.

It's also a significant step in the history of the internet. It's not just about a local New York business deal.

Once again... thank you, Google.

vitaplease

7:10 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>>Well, I don't think it is a good idea. Who owns the copyright of those books?

That will be an interesting legal issue - I wonder if judges would rule it under similar "library-royalty-free or royalty-limited viewing"

Doing a Google search for: "books subject"

and checking the content that the SERP sometimes suggests:

[print.google.com...]

shows only a limited image of a page, the specific text as far as I can see cannot be selected.

Suppose the final situation will be similar in the public field (non-search engine area), that is text-wise totally indexed and searchable, but no food for spiders or text copying.

It would be great if eventually some sort of page-view royalty fee would be funded towards authors as can happen in some countries towards authors in the library structure? but then authors would probably start to search for themselves? ;)

[internetnews.com...]

ebizcamp

8:30 pm on Dec 14, 2004 (gmt 0)

10+ Year Member

Nothing special. Google is good at marketing. Dialog has much more valuable information than Google.

treeline

9:09 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Many younger people think that researching on the internet IS researching an issue completely. All the sources that came before and haven't made it onto the internet hardly exist for them.

Many of the finest writings are so good they underlie the character of our societies. Including them among the search results is a great improvement. Much better than reading interpretations of them, or the latest crackpot theories from websites. You can't hire content writers like these anymore....

Clark

9:15 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

You're right about the marketing. This seems very similar to Amazon's Search inside the Book. But I don't see many people using Amazon's search engine to access it.

I do from time to time, but somehow have not gotten into the habit, I don't know why.

hunderdown

9:24 pm on Dec 14, 2004 (gmt 0)

Amazon's Search Inside the Book is far more limited and doesn't work well. That's why you (and others) don't use it!

This is a couple of orders of magnitude bigger and likely to be far more useful.

A very bold move on Google's part.

gethan

10:23 pm on Dec 14, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

For the record - I think this is a good thing in principle - I hope google have some left :) They started out with many good ones.

I'm concerned though that this is an undertaking of a company and not a "not for profit organisation", such as Gutenberg - this is a vast part of mankind's heritage - digitising it and making it accessible is a great thing.

This 59 message thread spans 2 pages: 59
