It’s something of a dream for many: to digitize and make accessible the vast number of books, documents, artifacts, photos, videos, and other materials housed at thousands of different institutions across the country. The Digital Public Library of America is working on making it a reality. We explore a collaboration between libraries, museums, and archives – including the Library of Congress and the Smithsonian – that aims to put it all online.
- Maura Marx, Director, Digital Public Library of America Secretariat, Berkman Center, Harvard University.
- Martin Kalfatovic, Associate Director of Digital Services, Smithsonian Libraries; Co-chair of the Technical Workstream, Digital Public Library of America.
- Maria Pallante, United States Register of Copyrights.
[Video: The Smithsonian 3D Digitization team does field work on whale fossils in Chile]
[Video: Smithsonian's Three-D Mummy Render]
MR. KOJO NNAMDI: From WAMU 88.5 at American University in Washington, welcome to "The Kojo Nnamdi Show," connecting your neighborhood with the world. It's a dream many share. Imagine all the material that exists in libraries, archives and museums across the country -- books, audio and video recordings, fossils, works of art, historical documents -- all of it available online. You could log on to a computer in Nebraska and read one of the millions of books in the collection of the New York Public Library, or examine any of the fossils at the Smithsonian in 3-D. Most institutions have been digitizing their collections for years, but now they're working together.
MR. KOJO NNAMDI: The Digital Public Library of America launched just a year ago, and its participants are hoping to combine their efforts into a project bigger than any single one. The challenge is huge, from the technical issues to the minefield of copyright law, not to mention decisions about the content and scope of such a project. Joining us to discuss how they plan to make the Digital Public Library of America a reality is Maria Pallante, United States Register of Copyrights. She joins us in studio. Maria Pallante, thank you so much for joining us.
MS. MARIA PALLANTE: Thank you, Kojo. My pleasure.
NNAMDI: Also with us is Martin Kalfatovic. He is the associate director of digital services at the Smithsonian Libraries and co-chair of the Technical Workstream for the Digital Public Library of America. Martin Kalfatovic, thank you for joining us.
MR. MARTIN KALFATOVIC: Thank you, Kojo. Glad to be here.
NNAMDI: And joining us from studios in Massachusetts is Maura Marx, director of the Digital Public Library of America Secretariat at the Berkman Center at Harvard University. Maura Marx, thank you for joining us.
MS. MAURA MARX: Hello, Kojo. Thanks for having me.
NNAMDI: You, too, can join this conversation. Just call us at 800-433-8850. Do you think all materials in public libraries and museums should be available online for free? 800-433-8850. You can go to our website, kojoshow.org. Join the conversation there. Send us a tweet @kojoshow or email to email@example.com. Maura Marx, I'll start with you. You wouldn't call the Digital Public Library of America an organization. Is it -- it's more of a vision, isn't it?
MARX: It's a vision. It's an idea. It's a project that seems to be turning into a movement, and it's been gaining momentum since it started last year. That would make, I think, all of us involved in it incredibly happy. You know, libraries have already done so much on the ground to digitize their collections and to work together. We see this as a way to bring those efforts together, to complement them, and also to get together to rethink how libraries work in the digital age.
MARX: You had a great expansion of libraries in the late 1800s, and it's that same kind of time now. Technology has moved along so much that there's so much more we can do with it. So it's about getting together with librarians and technologists and economists and everyone to rethink how we work.
NNAMDI: Martin, it's important for people to understand what the Digital Public Library is not. It's not a single website that will contain every digitized item in the world. Why is that not the vision? And why would it not even be practical?
KALFATOVIC: Well, the main reason is that there are already, sort of, portals into the whole Internet. You have Google, Bing and all of the other big search engines that can get you into that content. One of the things that libraries, archives and museums have historically brought is a level of curation, adding to the content rather than just throwing millions and millions of items out there for you to rummage through.
KALFATOVIC: So one of the things we hope to do with the Digital Public Library of America is bring the curation that libraries, archives and museums have brought to content over the years into the digital environment, in a more proactive way than we've seen with big, massive search tools.
NNAMDI: Maura, it's called the Digital Public Library of America. Is it public, private, both?
MARX: Well, the institutions involved in it are of every sort, both public and private. But the one tenet that's already been adopted is that access should always be free to all. That comes very much from the public library side of things. The inscription carved above the door of the Boston Public Library says "Free to All." That's a very public library sentiment, I guess you'd say.
NNAMDI: Martin, this is a Herculean project, especially the actual digitizing of millions of documents and artifacts. The Smithsonian, like a lot of institutions, has been digitizing for years. How did you approach such a mammoth task, and how much has been digitized so far?
KALFATOVIC: Very good question, and one for which we don't really have a terribly good answer, because the concept of digitization has changed over the last 10 years or so. How you digitize things, and what counts as digitization, also changes. Take our museum objects: we have, for instance, more than 2.2 million mosquito specimens at the Smithsonian.
KALFATOVIC: Do we really need to digitize all of those specimens as images, or is digitizing the metadata, the descriptions of those mosquitoes, good enough for access purposes? So in that sense, we have levels of digitization at the Smithsonian, ranging from simply digitizing the descriptive metadata about an object to some of our more complex projects, where we'll do full three-dimensional digitization.
NNAMDI: I was about to say, I don't have to see all 2.2 million mosquito specimens. But the Smithsonian does contain objects and artifacts that are not books or documents. So how do you digitize those? I guess you were just explaining that you're going to use 3-D imaging.
KALFATOVIC: Right. So we have all types of digitization mechanisms, from the basic book scanners we use, with which we've digitized over four million pages of books from the Smithsonian collections so far, to imaging systems that digitize whole drawers of specimens, where a camera sort of flows over the tops of the drawers to capture the specimens.
KALFATOVIC: There's a special tool called a HerbScan, which scans botanical specimens by turning the scanner upside down and shooting down on top of them. And at the top end are our state-of-the-art three-dimensional scanning tools, which rotate the images and scan things from the very tiny up to very large-scale objects like the space shuttle.
NNAMDI: And speaking of 3-D images, we've posted a video of the reconstruction of a skull on our website, kojoshow.org, where you can go look at it. Martin, one important thing digitization makes possible is the ability to enrich a document or artifact. Can you explain?
KALFATOVIC: One of the things we've found with digitization over the last few years is that just presenting the object, whether it's a book or a fossil, is only part of the story. So even with our books, what we like to do is provide the ability to annotate that book, so that you, as a consumer or reader of that object, can add additional information. That ranges from adding in the whole social networking angle, so that you can share those digital projects with your friends and others, all the way up to crowdsourcing, where you actually help us correct the text, add annotations and explain things that we don't know about those objects ourselves. That's one of the key things in providing this material to the world.
NNAMDI: Maura Marx, it can also mean putting information into context and linking to related data, can it not?
MARX: We talk a lot these days about linked open data, and that's really just a way of enriching the data that describes our objects, to put it in context. Think about when you go to a search engine: you search for something, you get back a very long list, and you start clicking through things to see which ones suit you. Linked open data is just a way of connecting things and people and events to each other to help enrich that context.
NNAMDI: Maura Marx is the director of the Digital Public Library of America Secretariat at the Berkman Center at Harvard University. She joins us for a conversation on the Digital Public Library of America with Martin Kalfatovic. He is the associate director of digital services at the Smithsonian Libraries and the co-chair of the Technical Workstream for the Digital Public Library of America.
NNAMDI: Maria Pallante is the United States Register of Copyrights. You can join the conversation at 800-433-8850. You can also send email to firstname.lastname@example.org, or send us a tweet @kojoshow. What documents, artifacts, images or books would you like to have access to online? 800-433-8850. Maria, you heard Maura talk about the inscription above the door of the Boston Public Library: Free to All.
NNAMDI: There are (unintelligible) the public library of the city of Boston, built by the people and dedicated to the advancement of learning, and much more. Free to all, Maria, is an inspiring concept, but it does bring us to one of the bigger challenges of a project like this, where free access to published work runs into copyright law. What are some of the challenges there?
PALLANTE: Yeah. Well, that's the million-dollar question, right?
PALLANTE: I'll just say as a preface, Kojo, that I run the United States Copyright Office. For those who don't know, the United States Copyright Office is part of the Library of Congress and has been since 1870. So I think it's fair to say that you won't find a group of people more respectful of the link between creativity and copyright law than the few thousand people who work at the Library of Congress. So, what are the copyright issues? Well, it depends on what the question is.
PALLANTE: What's the point of the digitization? Is it for preservation purposes, to make sure that the fragile original manuscripts, you know, are not lost for future generations? Or are we jumping straight to access across the board? And so the question becomes: we clearly all love books and creativity and songs and photos and movies, but that would presuppose that we also love authors and songwriters and publishers...
NNAMDI: And would like to see them be able to continue to earn a living or to earn some money.
PALLANTE: That's right. That's it.
NNAMDI: And that's why we can be very conflicted about these things. Maura, what are some of the issues with copyright law, as you see them, for the Digital Public Library?
MARX: Well, I was going to say, speaking from Cambridge, Massachusetts, we feel exactly the same way. We want to keep supporting all those creative people who are enriching our lives with their works. So when I say free to all, I do not mean a free-for-all. I do not mean that we would like to put content up and let anyone take all of it without making sure that people get paid.
MARX: But even if there is a paywall that users don't see, we think that access should be along the lines of what happens in your public library today. Anyone can go in, take out a book and read it, and we have to be sure that we don't allow only commercial publishers and the market to dictate how this happens in the future. You know, public libraries have played a very, very important role in our democratic society.
MARX: We have to make informed decisions. That means that even people who might not have money to go out and buy all kinds of materials have the right in our society to have access to information through their public libraries. So we have to make sure that that doesn't get totally chipped away by market forces.
NNAMDI: Get to the phones -- I'll get to the phones in a second. But, Maria, the Digital Public Library is not the only project working on mass digitization. One of the biggest cases came up earlier this year. Remind us about the Google Books case, please.
PALLANTE: Well, that's a case that's still ongoing, and a second, follow-up case was filed against HathiTrust. Generally speaking, the Google case involved Google partnering with libraries to scan millions and millions of books, and then perhaps to make snippets available, so that while you're looking for something on the Internet, a snippet would come up. And the question is, is that fair use?
PALLANTE: Well, the authors and publishers who sued said, that might be fair use, but we didn't think scanning the entire book in the first place was fair use. So we have a disagreement there. That's an unresolved question. Is systematically scanning entire libraries fair use if the end use of that scan is fair use? Unresolved.
PALLANTE: There was also an attempted settlement in that case. The United States government entered the case, and the Department of Justice filed a couple of briefs. To make a long story short, the judge said, you know, there's a lot of really interesting, innovative stuff in this attempted settlement, but it does seem to turn copyright upside down, in that normally one would go to a copyright owner and say, may I make this use, and how do I negotiate the fee for that, if a fee is warranted?
PALLANTE: What you've set up is a system where you scan and begin to make works available, and then people come to you and say stop, or I'm okay with it. It's a bit of a reversal. If you're going to do something like that, that's really a congressional process. That's legislation.
NNAMDI: And you, Martin?
KALFATOVIC: One of the key things we have found at the Smithsonian comes from a project called the Biodiversity Heritage Library, which we work on along with 13 other natural history libraries. What we've done there is exactly what Maria just said: go to the copyright holders for material that is in copyright and receive permission to digitize it and make it available. We have had good success with a number of scientific communities in actually getting that permission and working with them jointly to expand access to that material.
NNAMDI: That's a lot of work.
KALFATOVIC: That is one of the very hard parts about it: it's done on a one-by-one basis.
NNAMDI: Maria, how does the Google court decision relate to projects like the Digital Public Library?
PALLANTE: Well, in a number of ways. For one, we were beginning to think about how to update the exceptions for libraries. You have exclusive rights under copyright, and then you have exceptions for the public interest, one of which pertains to libraries; it's called the Section 108 exception. We all agree, the Copyright Office, the stakeholders, the libraries, the publishers, that it needs to be updated for the digital age so libraries can do a little more than they've been able to do with print materials.
PALLANTE: As we were working toward a legislative proposal on that, this case landed squarely in the middle of that dialogue, and so the legitimate question for libraries is, wait a minute, I don't want to agree to anything if ultimately Google is going to be able to do more than we are. So how does one reconcile that? That's one big issue. But the more interesting issues are, how do we move forward with our Copyright Act? How do we keep it updated and relevant?
PALLANTE: So we've just issued, in the U.S. Copyright Office, a report called Legal Issues in Mass Digitization. It's on our website, copyright.gov, and we go through these. So what are the choices? Well, some of it might be fair use. Fair use is an exception. It's grounded in First Amendment principles: the press and others should not have to go to copyright owners every time they need to make use of something, because that relates to having an informed citizenry.
PALLANTE: There are other applications of fair use that might affect libraries and educational institutions. Then we have preservation exceptions, and then we have orphan works, which is a hot topic.
PALLANTE: You can't find the owner, so what do you do? You're a good-faith user, you want to find the owner, but you just can't. Why should the whole system grind to a halt? We have proposals for that. But the piece that sometimes, not always, falls out of the dialogue is licensing. Should libraries license? Libraries in this country buy subscriptions, and they buy limited numbers of books and other works to lend out; they're part of the marketplace in that regard.
PALLANTE: But they don't necessarily buy a novel and then buy licensing rights to reproduce it electronically, so that rather than lending five or 10 copies to patrons, which is not a big impact on a market, they're lending hundreds or thousands. So that's where it begins to feel like, how are libraries going to intersect with the rights of authors and publishers? That's the fundamental question.
PALLANTE: We haven't solved that yet.
NNAMDI: ...Maura, it's my understanding that the Google case was, well, something of an inspiration for the Digital Public Library.
MARX: It was. Can I just hit on one thing that Maria said, and then I'd be happy to go back to Google? Maria talked about libraries lending out infinite copies of e-books, but I think there are other models emerging, where you see libraries buying a physical book, having a digital copy of that book and then lending out the digital copy one at a time, so that it corresponds to the physical book. And there are ways to do this so that our users can get the books the way they want. You know, if kids today are really reading online, they're not going to be happy with the choice of only having a physical book.
MARX: But back to the Google discussion. It absolutely inspired us. Google announced at the end of 2004 that it would digitize entire libraries. Up to that time, people had been digitizing, but never at that scale. So they opened eyes to what could be done, to the enormity of the work that could be done quickly.
MARX: But they also didn't do it exactly the way we would do it. We would've done it with more of a public mission, perhaps with different, richer data, and so on. So it's a little bit of an inspiration, where you could say, wow, thanks for getting us moving, but now we'd like to create our ideal over here.
NNAMDI: And you mentioned earlier that this was not intended to be a free-for-all. It's my understanding...
NNAMDI: ...that you're looking for ways to build a paywall.
MARX: Well, build a paywall. You know, libraries already spend enormous amounts of money on acquiring content. And...
NNAMDI: Licensing content.
MARX: ...licensing and acquiring content, both.
MARX: And making that content available to users for free. So, you know, we'd like to be able to do the same thing in a digital environment.
NNAMDI: Got to take a short break. When we come back, we will continue this conversation on the Digital Public Library of America, inviting your calls at 800-433-8850. Do you think digitizing all of the materials in libraries, archives and museums is a good use of resources? Call us at 800-433-8850, go to our website, kojoshow.org, or send us a tweet @kojoshow. I'm Kojo Nnamdi.
NNAMDI: Welcome back. We're having a conversation on the Digital Public Library of America with Maria Pallante, the United States Register of Copyrights, and Martin Kalfatovic, associate director of digital services at the Smithsonian Libraries and co-chair of the Technical Workstream for the Digital Public Library of America. They both join us in our Washington studio. Joining us from studios in Cambridge, Mass., is Maura Marx, director of the Digital Public Library of America Secretariat at the Berkman Center at Harvard University. And as I mentioned earlier, we're taking your calls at 800-433-8850, so I'd better start taking some. Here is Elizabeth in Woodbridge, Va. Elizabeth, you're on the air. Go ahead, please.
ELIZABETH: Hi, I'm a student at George Mason and a school bus driver, and the access I get through George Mason to the library and the e-journals is amazing for research. I would like to see that kind of system, even though I know it's really expensive and a lot of my tuition goes to it, available for the regular population.
NNAMDI: When you say that kind of system, what do you mean?
ELIZABETH: Like, when I went online yesterday, I'm doing research on year-round schools, and I have access to journals from, like, the American Psychological Foundation and all kinds of education journals, and experts in any field at my fingertips. I can click a few buttons, the databases go right through it, and I can print it off at home and use it for my papers. I don't know what I'd do without it. With a regular library that only gives access to books, it would be very difficult for me to do the kind of research my professors expect. And I think if the regular public had access to that kind of research and those kinds of thinkers, it could really open doors for innovation.
NNAMDI: Maura Marx, is that what we're trying to do here?
MARX: I was going to say, amen. There's so much rich content held in the academy, in our research libraries, and this effort actually started in the research libraries. It was born out of a desire to push that content out and to create a public good: how can we share this beyond the walls of academia with all people who seek knowledge? There are 9,000 public libraries in the country, something like that, and many of them are serving very small communities. They simply don't have these types of resources. So, like I say, amen.
NNAMDI: Elizabeth, thank you very much for your call. Maria, there are about 2 million books in the public domain, such as the works of William Shakespeare. There are presumably no restrictions on digitizing and making those available?
PALLANTE: That's right. And those who have a public vision, like the National Archives, the Smithsonian, Martin's organization, and mine, the Library of Congress, have been doing that for, you know, 20 years now: digitizing public domain works. Some may say that's a conservative approach, but it's the legally appropriate approach until the law allows you to do otherwise or until there are licensing models in place. So one question is, at what point does the public domain start? For those who are not copyright experts, we can generally say that anything published before 1923 is in the public domain in the United States. Well, that's a bit of an incomplete digital library if you stop in 1923. I don't think there's anybody who doesn't recognize that.
NNAMDI: A bit, yes. Here is Charlie in Silver Spring, and I suspect this is for you, Martin. Charlie, you're on the air. Go ahead, please.
CHARLIE: Well, good afternoon, and thank you for taking my call. I just wanted to ask the participants about the accessibility of the digitized images to persons who are blind and unable to read print without some sort of speech reader.
KALFATOVIC: It's a very important issue, and one we have been addressing with digital libraries over the years; a number of different tools and techniques are now available. One of the important things we do with pretty much all our book conversions now is optical character recognition, which converts those page images into machine-readable text. One of the problems, of course, is that the quality of that OCR varies greatly depending on the age of the text itself and the typeface that was used, so we get varying levels of quality.
KALFATOVIC: To improve on that, what we've been doing lately at the Smithsonian is converting all of those into EPUB format, the kind of thing you can read on a Kindle or any of the other e-readers out there. This is an added cost and may not be fully scalable to the millions of items we will be doing as part of these projects, but it's a start toward making those things available more widely.
CHARLIE: Well, thank you. I appreciate that.
NNAMDI: And Charlie, thank you very much for your call. You too can call us at 800-433-8850. What documents, artifacts, images or books would you like to have access to online? You can also go to our website, kojoshow.org, to ask a question or make a comment there. We got an email from Constance in Silver Spring that I would like both you, Maura, and you, Maria, to address, but Martin, feel free to jump in. The email is fairly long, but allow me to read it because I think it's important. Constance in Silver Spring writes, "A free digital library sounds lovely, but it's bad news for authors."
NNAMDI: "Authors only get paid if their books are bought, and library sales are an important source of revenue. With a nationwide digital library, very few books will be sold. In fact, the digital library only needs one copy, and potential customers won't have to buy the book at all. They'll just 'tune in' to the digital library. If authors can't sell books, they can't afford to write them. Very few authors can afford to work for nothing. Most have day jobs anyway."
NNAMDI: "In the long run, totally free access to books means that there will be fewer books, and probably fewer books by your favorite author, who will be working nights at McDonald's or minding the children rather than writing. Most readers don't see this reality behind the copyright law. I've heard of new models of paying authors, but none have shown up yet, and in the meantime authors are working extra jobs and getting paid less, if they get paid at all." First you, Maura Marx?
MARX: I wish I could be clearer. We are not advocating buying one copy of a digital book and lending it out an infinite number of times, totally ruining the publishing market. We in no way want that to happen. We simply want to find a way, while authors are being paid, to give our users the type of access they want. People want their books on their Kindles, so we're trying to figure out ways to do that while paying authors. I can't be any clearer than that. We're just not advocating for giving content away.
PALLANTE: Yes, so that's a fantastic question. I would just say it this way: the future of any Digital Public Library of America would have to be a collaboration with rights holders. And I think they have actually tried to do that, to include publishers and authors. And there's no question that if we're moving toward revisions of copyright law, that will not happen without rights holders like authors and publishers and songwriters and filmmakers being at the table. So whether there are new business models or not isn't really the question. The question is, will new business models for paying authors be instituted without authors wanting them to be instituted, right? That's the question.
NNAMDI: And here's Andrew in Washington, D.C. Andrew, you're on the air. Go ahead, please.
ANDREW: Hi, Kojo. I understand a lot of the different views that people go through, and I understand the promise of a library, meaning that you don't just purchase one copy and basically share it with everybody on the planet. But when you started this conversation, you know, 38 minutes ago, the first thought that went through my head was, are people up in arms about written works being published online for free? And then the next thing that went through my mind was, don't we have that already with the advent of pirating?
NNAMDI: We had a discussion earlier this week on piracy, when we received our first email from Constance about what authors need. I didn't get a chance to read it then, so I was really glad to get the chance to read it today. But is that, in fact, the reality? Are all of these things already available online for free, Maria, or is that merely a perception, and not necessarily an accurate one?
PALLANTE: Well, there's no question that piracy's a massive issue.
NNAMDI: Huge. Multibillions or, like, trillions.
PALLANTE: And there's no question that people would not group libraries in the category of pirates, but it's the lack of control once a work has been digitized that is the fear. So we can talk about a paywall. I'm not sure exactly what it means, but let's just all assume we can work out, and will continue to work out, new business models for payment. Copyright is more than that, though. Copyright is not a right of remuneration where a user of a work decides what a pay schedule should look like.
PALLANTE: It's the right of the author to control distribution of their work, dissemination, public access, and it lasts for life plus 70, right? The life of the author, plus 70 years. There are exceptions for the public interest, and those are narrowly tailored and appropriate to circumstances. Is it for educational use? Is it for libraries on site? Is it for digital copies in a limited number that would not affect the marketplace? So that's how copyright works.
NNAMDI: And your turn, Martin?
KALFATOVIC: I also think what we've learned from the music industry is that if you actually give people a fair and easy way to purchase electronic copies, or to borrow or rent them, they will flock to that and avoid piracy, because most people are generally honest and will seek out those networks.
NNAMDI: One of the discussions we had earlier this week is that on one hand, you can pay 99 cents for a tune; on the other hand, you can't buy a chapter of a book.
KALFATOVIC: Increasingly, I think you'll be able to do that. Even Amazon is now offering Kindle Singles, where you can get short books. I think we'll see that model grow, and I think the Digital Public Library of America can help encourage that type of activity by publishers and authors.
MARX: I would just applaud what Martin said. I think he said it well.
NNAMDI: Well, we have another fairly long email, from Cecilia, so bear with me. "I am an archivist, currently undertaking a large-scale, grant-funded digitization project. I think many archivists actually doing digitization would argue that digitizing everything, everywhere is a great idea, but it just doesn't happen. Digitization requires one thing above all: money, lots of it, for equipment, training, project planning and skilled workers to oversee the digitization. Digitization takes time, especially if your end goal is high-quality scans with meaningful metadata attached to the digital object."
NNAMDI: "Providing access to digitized content is certainly where the archives field is heading, but there's only so much we can do without funding. Cultural institutions are already struggling to fit their dreams within their recession budgets. I would love to hear a conversation about how we as archivists can ensure income to support the digitization of our collections." Martin, where's the money coming from?
KALFATOVICWell, in our example from the Digital Public Library of America, we have had the generous support of the Sloan Foundation and the Arcadia Foundation to provide us with some initial start-up funds. Many of our institutions have had long relationships with other funders to provide that. There's been some government funding through the Institute of Museum and Library Services, and also the National Endowments.
KALFATOVICAnd I think one of the key things, and I know Maura and I have been involved in a number of these projects, is how to actually make scanning less expensive. One of our colleagues, Emily Gore, has brought up the idea of a Scannabego (sp?) and what we could do to take scanning equipment and make it more mobile, so that we could actually put scanning equipment in a vehicle, a Scannabego, roll that out to small historical societies around the country, use staff and volunteers appropriately, and get some of the rich content in these smaller historical societies out there in a way that's appropriate for those collections to be used.
NNAMDIIt takes filthy lucre, Maura Marx.
MARXWell, this is a big part of what the DPLA would like to do. It would like to complement the work that's going on in all types of institutions and, like Martin said, there are some new, interesting things around mobile digitization. We've worked here in Massachusetts on what we call the Massachusetts model, which is using a fixed but regional digitization center to serve the entire state. So we recognize that not every institution is gonna be able to build this infrastructure, and that's a key part of what DPLA would like to help with.
NNAMDIGot to take a short break. If you have called, stay on the line, we will get to your call. The number is 800-433-8850. If you'd like to join the conversation now, what documents, artifacts, images, or books would you like to have access to online? You can also go to our website kojoshow.org, or send email to email@example.com. I'm Kojo Nnamdi.
NNAMDIWelcome back. We're having a conversation about what's been described as a movement, the Digital Public Library of America with Maura Marx, director of the Digital Public Library of America Secretariat at the Berkman Center at Harvard University, Maria Pallante, United States Register of Copyrights, and Martin Kalfatovic, associate director of digital services at the Smithsonian Libraries, and the co-chair of the Technical Work Stream for the Digital Public Library of America. We'll go to Melissa in Gaithersburg, Md. on the phone. Melissa, you're on the air. Go ahead, please.
MELISSAYeah. I'm curious about future access, because while you can open up a book that's, you know, 500 years old and read it, trying to read a 5 1/4 inch floppy disk is a little bit difficult now, so I'm wondering how you're gonna allow for the changes in software and hardware and still keep access to the books.
KALFATOVICThat's a very good question. One of the things we in the Digital Public Library of America are doing is hewing to as many standards as possible so that we can have that type of forward transition of the content. One of the things people don't really know about libraries is that they first started digitizing the metadata, the descriptions, the catalog records about their collections, over 40 years ago, and that data is still fully readable because we are using very well-known standards.
KALFATOVICWe continue to update the standards. We migrated that content over a period of time. So again, I think libraries, archives and museums are a good community to deal with this. We're long-term institutions, "memory institutions" is a term sometimes used to collectively describe us, so we're concerned more about the future than certain other types of institutions. So all of our plans look forward to the future and think about migration of the data, transformation of the data, so that it will be readable by our great-grandchildren.
NNAMDIMelissa, thank you very much for your call. Here now is Ross in Annandale, Va. Ross, your turn.
ROSS...for taking my call. I have a question about (word?) ebooks. I have an ereader, and I was really excited when I heard there were so many classic books available. I downloaded lots of them, but when I tried to read them, I guess they were scanned with the OCR technology. But I was disappointed, because when I tried to read them, all of them that I read were filled with mistakes. So I guess this is more of a comment, but I guess my question is, is there any kind of quality control for books in the public domain, because I was really kind of disappointed that these, you know, you know, really classic books were really -- they were just filled with mistakes and unreadable. In fact, every one I downloaded was just not readable, it had so many mistakes in them.
KALFATOVICI'm on the same page with you on that. It's incredibly disappointing when you download something from the free ebook content world and find all the different errors in it. The one key thing to remember when doing that is to look at the source of those files. You'll often find on Amazon or any of the other online bookstores, Barnes & Noble, multiple versions of those free books, and some of the people who have done those conversions do a better job than others, pay more attention to the quality of the conversion, and check for spelling errors that have occurred in the OCR. So don't be too disappointed.
KALFATOVICGo back and look for another free version of that book, or even go and pay 99 cents or $1.99 for a copy of that book that might actually have been fully edited. On Amazon or Barnes & Noble, you can find those classic books that have been edited, and they're still at a very low price. So again, sometimes you get what you pay for.
MARXAnd also, I'm just gonna jump in there and say this is one of the things that libraries have gotten kudos for, that we've actually done our scanning in a little more of a careful way. Google was known for speed, and perhaps not so much for accuracy. So...
PALLANTEAnd if I could also add...
PALLANTE...at the Library of Congress there's a program called NDIIPP, which is the National Digital Information Infrastructure and Preservation Program, and it's completely devoted to this issue. So how do you preserve things for posterity when they're digital?
NNAMDIRoss, thank you very much for your call. Maura, it's my understanding that some libraries will copyright what they're cataloging and digitizing. How is that possible?
MARXNo. I think what we're talking about there is the bibliographic data...
MARX...that we all use to describe our works.
MARXAnd, you know, if that data, which is a representation of fact is not copyrighted by the institutions that create it, then it can flow out into the Internet ecosphere and people can find it more readily, whereas if that data is copyrighted, then of course it's harder to push out into the wild Internet for people to find.
NNAMDIAnd Maura, Maria mentioned earlier Orphan Works. It's my understanding that it's quite expensive and time consuming to locate these lost authors or their descendants. How do you handle them?
MARXWell, you know, we're not actually handling anything right now. Luckily we're still in a planning phase, but part of our planning is a legal work stream which is run by Professor Pam Samuelson at Berkeley, and she's got a whole team of people investigating this area looking at the best solutions to the Orphan Works dilemma. There are projects underway. The University of Michigan is painstakingly going through and clearing orphans, you know, checking -- going through a meticulous process to look for creators.
NNAMDIAnd then as a clarification, we got a comment posted on our website from Shan who said, "Earlier in the program, one of the guests seemed to imply that the public will not have direct access to the Digital Public Library of America, but that instead the content will be curated by search engines such as Google and Bing." Does this mean that direct access will be restricted to such companies and programs, Martin?
KALFATOVICNo. Actually, I meant the exact opposite of that.
NNAMDII thought so.
KALFATOVICGoogle and Bing are a good entryway into the wilds of the Internet, but one of the things the Digital Public Library of America will do is actually help those search engines work better, and you, the public, can also go right into Digital Public Library of America content through many different means.
NNAMDIHere is Carlton in Falls Church, Va. Carlton, your turn.
CARLTONThank you. So my big question is, how do you relate to Project Gutenberg and existing projects out there for scanning books and those types of things?
NNAMDIProject Gutenberg, for those of you who don't know, is a volunteer effort to digitize and archive cultural works to encourage the creation and distribution of ebooks. It was founded in 1971 by Michael Hart, and is the oldest digital library. Maura Marx, can you answer Carlton's question?
MARXI'll give a short answer, and then I'll pass it over to Martin. Project Gutenberg is another great open source of content. You know, there are many of these that exist online, but today you have to as a user go to each individual place to look for stuff, and we're trying to make that process easier for the end user.
MARXI don't know if you have anything to add, Martin.
KALFATOVICYes. Michael Hart was one of the great visionaries in the world of digital libraries, and the work that he did, and again, going back to Ross's question, the Project Gutenberg is often a good source for those epubs that are really done well, and again, integrating that content, making it easier to find, will be one of the jobs that will be done by the Digital Public Library of America.
NNAMDIMaria Pallante, you have a comment?
PALLANTEI think Project Gutenberg is a fantastic example, but my understanding is that it's primarily public domain materials.
KALFATOVICCorrect. It's all public domain.
NNAMDIMaria, I don't know if you mentioned collective licensing earlier, did you?
PALLANTEYeah. Collective licensing is one of those areas in copyright reform discussions. So going forward, if one can't clear rights to many, many works, because it's just too time consuming and not cost effective, particularly for non-profits like libraries, is there some other kind of licensing, maybe a collective model or a blanket licensing, where many parties are involved and rights are cleared through one agency?
NNAMDIHere is Mike in Silver Spring, Md. Mike, you're on the air. Go ahead, please.
MIKEGood afternoon, Kojo, and welcome to your panel. I'm wondering if anybody on the panel has read Lewis Hyde's book "Common as Air," in which he deals with this problem of copyright and the public domain and the commons that was the original idea of the founding fathers in writing the patent law. My understanding was that the idea that Benjamin Franklin had was that it was to promulgate the greatest good, the greatest benefit for mankind, and to that extent, he did not patent any of his inventions, and just donated them to the world at large, and that copyright in general was considered a way of controlling information in the times of the kings of England, that they only licensed certain publishers to publish books.
MIKESo it's kind of interesting. It seems like, for instance, you're not the -- you mentioned 1923 is the cutoff date. You really don't see too many images of Mickey Mouse around, do you?
NNAMDIWell, we had a discussion on that before also, but I'd like to ask all of our panelists to weigh in on this, starting with you, Maria Pallante.
PALLANTEWell, I think you're going back so far that we have to remember if one controls the only printing press in town, one will have more control of the press. So today we're in a different place. You know, I would just say that, you know, the counterbalance here is that the Supreme Court has also said that copyright is the engine of free expression, right? That's a Supreme Court holding from a very famous case, and it's true.
PALLANTEI mean, we can't lose the connection between copyright being an incentive for creativity, especially for very talented people that we all respect to spend their lives and dedicate their careers to creativity. I love that we live in a country that has the kind of legal structure that allows that to happen, and the kind of copyright law that we have. So we all want a strong copyright law. We also all want more books, more songs, we want them faster, we want them when we want them and how we want them, but we don't want to, you know, somehow screw up authorship in the process, right?
NNAMDIWell, I'll let you speak on behalf of everybody else on that because Maura, we're running out of time, but there's an international component to the Digital Public Library now. You're collaborating with the European version. Tell us about that.
MARXI would love to just say copyright is, you know, it has to strike a balance between the public good and providing that incentive for authors to keep creating, so life plus 70 is a very long time for all works, perhaps it works for "Harry Potter," but it might not work for all...
NNAMDIWell, I also have to tell...
NNAMDI...our caller about Constance's email, but he might have heard me read that also. But yes, collaborating with the European version.
MARXWe're so excited about this. There is a pan-European digital library called Europeana, and it's been around and working for about four years already. It takes material from all of the national libraries in Europe and brings it together in a multilingual site, a resource for all, and it's free to all. And we have signed a memo with them saying that we're going to be working together. We're gonna be working on things like data standards, and also on some very practical things like a collection around the theme of immigration and emigration. So we're thrilled about that.
NNAMDIWhat can we learn from Europeana's experience so far?
MARXThey've just been wonderful in sharing all of their experience with us. They've shared their...
NNAMDIMistakes we should avoid making?
MARXYes. You know, they've told us they didn't start out with a cry for open data, so they have some licensed data in their library which makes it hard for them to push that data out. They've shared these types of things, and they're sharing their code with us. They're sharing their, you know, their best thinkers, so we're thrilled about that.
NNAMDIMaura Marx is the director of the Digital Public Library of America Secretariat at the Berkman Center at Harvard University. Maura Marx, thank you so much for joining us.
MARXThank you, Kojo. It's been wonderful.
NNAMDIMaria Pallante is the United States Register of Copyrights. Thank you for joining us.
PALLANTEMy pleasure, Kojo.
NNAMDIAnd Martin Kalfatovic is the associate director of digital services at the Smithsonian Libraries, and co-chair of the Technical Work Stream for the Digital Public Library of America. Martin, thank you for joining us.
NNAMDI"The Kojo Nnamdi Show" is produced by Brendan Sweeney, Michael Martinez, Ingalisa Schrobsdorff and Tayla Burnie, with assistance from Kathy Goldgeier and Elizabeth Weinstein. The managing producer is Diane Vogel. Our engineer is Timmy Olmstead. A.C. Valdez is on the phones. Podcasts of all shows, audio archives, CDs and free transcripts are available at our website kojoshow.org. We encourage you to share questions or comments with us by emailing firstname.lastname@example.org, by joining us on Facebook, or by tweeting @kojoshow. And thank you all for listening. I'm Kojo Nnamdi.