Microsoft to put Books from British Library Online

engine

11:55 am on Nov 7, 2005 (gmt 0)

Software giant Microsoft Corp. said Friday it has signed a deal to scan and put online 100,000 books from the British Library.

Microsoft to put British Library books online [businessweek.com]

Receptional

1:49 pm on Nov 7, 2005 (gmt 0)

Interesting find Engine.

So - some maths. $2.5 million investment divided by 100,000 books is $25 per book - about UK£15. From that moment on, nobody needs to buy a copy of the book again, potentially.
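For anyone who wants to check the sums, here is a quick sketch in Python. The $2.5 million and 100,000 figures are the ones quoted above; the exchange rate of about 0.60 pounds to the dollar is only an assumption, back-derived from the rough sterling estimate in this post, not an official rate.

```python
# Figures quoted in the post above.
investment_usd = 2_500_000
books = 100_000

# Assumption: roughly 0.60 GBP per USD, implied by the post's own
# "$25 is about 15 pounds" estimate. Not an official exchange rate.
assumed_gbp_per_usd = 0.60

cost_per_book_usd = investment_usd / books
cost_per_book_gbp = cost_per_book_usd * assumed_gbp_per_usd

print(f"${cost_per_book_usd:.2f} per book, about £{cost_per_book_gbp:.2f}")
```

Change the assumed rate and the sterling figure moves with it; the dollar figure per book depends only on the two numbers from the article.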

Tough break for the village of Hay-on-Wye - the country's foremost location for buying out-of-date and rare second-hand titles. Great break if you can't get to Hay-on-Wye.

There's got to be some nifty equipment to be able to scan 100,000 books and retrieve the data reliably for £15 per book. Not exactly a £50 scanner from PC World! :)

Also - Once scanned, the data will immediately be available on the British Library website and indexed by Google! Interesting times.

engine

8:21 pm on Nov 7, 2005 (gmt 0)

I can see the value in creating the ability to search for information in a library. However, I'm not keen on the whole book being available. Authors will suffer in the long term unless there is going to be some kind of royalty paid.

IanKelley

7:45 am on Nov 8, 2005 (gmt 0)

I can't wait until the mob of books.Google detractors sees this one :-)

davec

3:09 pm on Nov 8, 2005 (gmt 0)

It does say they are only choosing books well out of copyright.

ergophobe

8:34 pm on Nov 8, 2005 (gmt 0)

the library says its deal with Microsoft is not exclusive -- the scanned books will be posted on the British Library's own Web site, currently freely searchable through Google.

When libraries were started, they were private, then some member-only libraries were born. Finally, the idea of a free and open library for the public was born and the most important libraries follow this policy (Bib. Nationale, British Library, Library of Congress) or something close (university libraries that generally allow public access, though not checkout or stack access).

If Microsoft wants to go and scan the books of the world that are out of copyright and then let that content be available on the BL site, that's fantastic. The Bibliothèque Nationale has thousands of public domain books online already, but they're in image scans, so not searchable. This will be a great improvement. If Google wants to do the same, that's great too.

My objections to the Google thing are:

1. The hyped up press-release about a beta test with public domain books, and yet all the obvious searches yield sales pitches for unrelated or loosely related books for sale. In other words, they have made an announcement and failed to deliver. It should at this point not be public. We'll see if MS is better or worse when the time comes.

2. The idea that I'm supposed to like Google because they're a "do no evil" company and should patronize them just like I should patronize Ben and Jerry's, whether I like their ice cream or not, because they donate profits to charity. I'll do good in my own fashion and patronize businesses based on the quality of the offerings. Microsoft has no illusions about how they are loved around the world, so they don't act silly about that sort of thing. They expect to sell their products through either quality or strongarm tactics. No illusions. If the Microsoft book offerings are superior, that's where I'll do my book searches. If not, I won't. Simple.

3. Copyright issues. Some books are read (e.g. novels), and some books are consulted (e.g. dictionaries). My books are of the "consult" sort, so if a search engine shows a paragraph or two, that really obviates the need for most people to buy the book. Fair Use says that researchers can use small extracts in their publications, but Google would essentially be monetizing my content and probably impacting sales. I believe that is in violation of copyright. Why would I publish a dictionary if Google is going to go scan it in and show snippets of 3-4 lines? Why would I buy a dictionary if Google had that available? Microsoft is choosing books "from the older end of the library's vast collection of 13 million titles". These are the books that are most likely to be rare and very hard to find and, of course, there are no copyright restrictions. I see this as providing great value.

Microsoft has put up with accusations of evil and malfeasance for so long that I actually think they are

1. just plain smarter on these issues
2. way better at thinking about how everything they do, good or bad, can be spun to make them look bad.

Google needs to learn those lessons.

IanKelley

3:28 am on Nov 9, 2005 (gmt 0)

This is clearly just the first step. MS is waiting to see how the lawsuits turn out against Google. If books.Google survives (technologically and legally) then MS will do exactly the same thing.

That's just how MS does things, and it works very well for them.

walkman

4:05 am on Nov 9, 2005 (gmt 0)

>> This is clearly just the first step. MS is waiting to see how the lawsuits turn out against Google. If books.Google survives (technologically and legally) then MS will do exactly the same thing.

Not a bad strategy. The issue will settle eventually. The judge will make a major ruling and one side will push for a settlement. Not wanting to risk a loss, both sides will sit down and agree to something. Among other things, as part of the settlement, Google must hand to the publishers the digital versions, so they can give it to Microsoft, Yahoo, AOL and Amazon.com. :)

Dante

11:58 pm on Nov 9, 2005 (gmt 0)

>> Why would I buy a dictionary if Google had that available?

Perhaps you wouldn't, but millions of others still do.

Merriam Webster's Collegiate Dictionary is still ranked in the top 500 books sold at Amazon out of millions of other titles.

ergophobe

5:31 pm on Nov 10, 2005 (gmt 0)

True, but that's the best case for a dictionary: a small, cheap, mass market general dictionary with huge sales anyway and the cost of the dictionary in print is low compared to the convenience gained.

If you look, however, at low-circulation dictionaries on specialized topics, you might be looking at more like $100 or $150 for a 300-page book and if the publisher's sales drop 10% it may no longer be worth it to publish. If it is no longer worth it to publish, it probably wouldn't be worth it to produce. And yet, it was useful enough for a few thousand people to pony up $150.

My publisher, who publishes a number of books that would fit this description ("consultation" books where people want to look up a few brief sections, not read the book cover to cover), noticed a big hit during the period that photocopiers dropped in price and became omnipresent. There are a number of titles that were pushed from profitable to unprofitable.

You might say, then, why put it in print at all? Since these books are research titles, they really should be published in an archival-grade medium and an online-only version doesn't have an adequate track record, while we know that 700-year-old paper and ink is still perfectly readable and available. The institutions that created the document have come and gone, the governments that have housed the document have risen and fallen. Yet there it is, for all to read in the library. It's great, once the publisher has recouped the investment required for publishing a low-circulation book with an archival-grade paper and binding, if it goes online (and increasingly that is happening to books that go out of print, but which still have a valid copyright). But there's something to be said for providing a financial incentive to get these published in the first place.

IanKelley

11:20 am on Nov 11, 2005 (gmt 0)

You make a good point, ergo; however, there are a couple of things which just can't go unmentioned...

Publishing at one point did some serious damage to word of mouth storytelling and teaching. I suspect back then a lot of people didn't like it much. I guess we'll never know though, since they didn't write the history. :-D

Countless industries and small businesses have taken a hit as a result of the internet. Some have even died. More will.

I personally have mixed feelings about technology but we would be silly not to at least consider all of this from the perspective of evolution.

If an idea makes things easier and simpler for people while making someone money then there isn't much that can stop it from happening. The most you can hope for is to slow it down.

Or to put it another way, everything your publisher prints is eventually going to make it onto the net and it will end up being either free or very cheap. It's sad maybe, but inevitable nonetheless.

ergophobe

5:16 pm on Nov 11, 2005 (gmt 0)

Sure, technology will obsolete many businesses or force many people out. Wheelwrights were put out of business by tires. That happens. Most low-circulation books should be published electronically and that business should mostly disappear except as an art form, just as we no longer use oil painting simply to record memories.

The problem with scanning in the back catalog, though, isn't that it obsoletes someone's business model; it's that it essentially takes someone else's property by force and stealth. In other words, it doesn't make it impossible for the wheelwright to sell new wheels; it goes down to his shop, takes the old wheels without asking and then sells those without giving him a cut.

Reminds me of the famous excerpt from Carl Sandburg's poem "The People, Yes" (stanza 37, I believe):

"Get off this estate."
"What for?"
"Because it's mine."
"Where'd you get it?"
"From my father."
"Where'd he get it?"
"From his father."
"And where did he get it?"
"He fought for it."
"Well, I'll fight you for it."

In other words, as someone mentioned, if it is okay for a search engine to use my property in a way that I disapprove of, why isn't it okay for me to break into the Googleplex, steal the secret sauce and then publish it? Of course, I would only show 3-4 line excerpts of the code. I'm sure they wouldn't be bothered.

It's not really about technology obsoleting a business model or a job, it's about property and what the meaning of intellectual property is. Is it my intellectual property, or isn't it? If it is, what rights do the SEs have within fair use?

Let's say this: if I took a book made up of quotations, scanned it in, and then made the quotations appear in search results, I would actually have made something that is more like the way people will use a book of quotations. A superior product. But I would have stolen someone's work. No two ways about it. Your analogy with storytelling says that non-searchable printed books of quotations will disappear in the face of superior technology. That's fine. However, it doesn't say that you have a right to steal my book and make it searchable. Big difference.

Matt Probert

6:24 pm on Nov 11, 2005 (gmt 0)

It's worth pointing out, perhaps, that The British Library holds a copy of EVERY BOOK EVER published in the UK. A few years ago they stopped collecting bus time tables and the like.

Trying to obtain a book from the BL is a bit like getting blood from a stone; quite often one has to make a request in person, and then wait two days or more while it is fetched.

Making old titles available online COULD be a great resource for researchers like myself. By old I mean 19th century and earlier, which are very hard to locate and even harder to use within the constraints of a reference library reading room.

I am, however, very nervous about anything that may bring about the demise of printed publications. I mourn Encyclopaedia Britannica going all electronic.

Matt

ergophobe

7:17 pm on Nov 11, 2005 (gmt 0)

>> I mourn Encyclopaedia Britannica going all electronic.

Why? I rather like it. I would probably never have owned it in the print version. I love that other works, like the Trésor de la Langue Française, which I could never afford, are now freely available online and available on CD-ROM for 69 euros (this was formerly over 1500 euros in print). I now also have Godefroy's Dictionnaire de l'Ancienne Langue Française on my hard drive (except for vol. 5, not available from the BN). Not only could I never have afforded that one, but it was out of print.

In general, I think it's fantastic that public domain works are going online and non-public domain works are going online in accord with the publishers' and authors' wishes.

For all my complaint about putting copyrighted works online, I think that in the long term, getting materials online is a great thing. Most of the documents I work with exist in one place in the world with few or no copies. Increasingly these are being put online and suddenly historical research is freed from geography.

IanKelley

5:13 am on Nov 13, 2005 (gmt 0)

I agree that allowing unlimited viewing of excerpts is very much like copying and reprinting, provided it's the kind of publication where relevant info is frequently going to fit into the excerpts.

That's not a deal breaker, though, that's just a bug. I'm sure it won't be long before you can only view a small number of excerpts from a given source (if not already).

Considering the energy that search engines have already put towards understanding text automatically I could even see them modifying the length of excerpts, or disabling them, for certain kinds of publications.

Searchable book indexes are a good idea that definitely still needs tweaking. Needing improvement and needing to be illegal are two very different things, however.

Or am I just being nostalgic? :-P

ergophobe

5:55 pm on Nov 13, 2005 (gmt 0)

I wouldn't say nostalgic. To give yet another analogy, let's say that you are testing a beta version of cruise control for your car. You set it to 95 km/h in a 100 km/h zone, but it's not working right and suddenly, you have a ticket for going 120 km/h.

Does this mean that, inadvertently or not, the device you were using caused you to break the law? Yes, it does. It's your responsibility to monitor the speed of the vehicle. Legislators don't have to pass special laws saying that this is a zone where even drivers using cruise control need to drive within the speed limit.

Does the fact that you meant well by using the device and had actually tried to set it to be 5 km/h below the speed limit get you out of paying the fine? Not usually.

Does this mean that cruise control is an inherently bad technology that should never be implemented under any circumstances? Of course not.

Does it mean that it can't be considered truly acceptable until it can be built in such a way that it does conform to current law or until current law is changed to make its behavior legal? I would say yes. You would say no (or at least that's how I take your comments).

I think that last one is the only point where we disagree and the simplistic analogy makes the extent of our disagreement overly stark. I'm just saying that if you don't have excellent controls on the way copyrighted material is redistributed, you should do your beta test with public domain materials until you get it dialed. Assuming that Microsoft sticks to plan and starts with the old back catalog of 100% public domain works at the British Library, I'll give them a huge hurrah. If they pull a fast one like Google did, I'll write them off as "doing evil" for lack of a better phrase. If they get it dialed to where they can offer reasonable protection to copyrighted works that need it (the "consultation" books I've mentioned) and then start offering those up as well, I will also cheer.

I believe that in the long run, this will be the greatest boon to research since the birth of the public library. I just want to be sure that in the short run it isn't some crass commercial gambit that plays fast and loose with existing law.

IanKelley

7:54 am on Nov 14, 2005 (gmt 0)

Point well made, aside from:

>> You would say no (or at least that's how I take your comments).

I think the cruise control analogy becomes strained at this point :-) The lawsuits regarding book indexing that I'm aware of appear to want to completely stop the indexing of a large part of all current and future books. This will cripple the potential of the technology as opposed to smoothing the rough edges.

I'm also not sure yet that there was any evil intent. I think the evil doer in question made the mistake of rushing a project that was (rightly) on pause pending further review because the competition was using the pause to develop... competition.

MS, on the other hand, is conducting their test at a very safe depth. Good call on their part, but maybe that's the luxury of being second.

ergophobe

7:26 pm on Nov 14, 2005 (gmt 0)

Heh heh. As I said, the analogy was too simplistic, so I thought you might object to how I characterize your position, but I couldn't think of a better one.