Although the Internet continues to expand its reach at a dizzying pace, the network of networks is still more an information swamp than a superhighway. Most of what’s there is junk, and precious nuggets are too often too difficult to find. So last July, when the Vatican made 20,000 images from rare documents available on-line to scholars around the world, it might have been a small step for the Apostolic Library--which holds more than 150,000 ancient volumes, or millions of pages--but it was a respectable leap for the Internet. “The Vatican collection is a fair-size digital library by Internet standards,” says Fred Mintzer, a computer scientist at IBM, a partner in the project. “Right now, there isn’t a lot out there.”
The Vatican’s collection is only the beginning of what may soon be a flood of new digital libraries available on the Internet. Of course, these won’t resemble your local library in any obvious way. For one thing, you won’t go to digital libraries; they will come to you--or rather to your personal computer. And if researchers succeed in their designs, the proliferation of cyberlibraries we are beginning to see now may one day seem more like one vast, fluid library without walls that encompasses information from the four corners of the Earth and from any medium--text, facsimile, film, video--you can think of. “You won’t talk about a digital library, but the digital library,” says Ramana Rao, a computer scientist at Xerox’s Palo Alto Research Center in California.
The idea of the digital library is fast becoming an even broader vision of the future of cyberspace in general. To a great extent, this trend is due to the phenomenal success of the World Wide Web in the past two years. The Web is a set of rules, protocols, and software that gives an Internet user access to information on a multitude of other computers. It is not the first technology to do so, but it is by far the most fun. Whereas other tools relied on text, the Web is a lively visual medium that allows for color graphics and even motion pictures and sound. Last spring, according to a National Science Foundation (NSF) audit, the Web eclipsed other search tools as the most widely used means of exploring the Internet.
Naturally, the Web is already becoming a way of selling things rather than learning things--a dark and no doubt inevitable trend that only makes sites like that of the Vatican library, one of the first serious scholarly collections to take advantage of the Web’s visual capabilities, more welcome. Mintzer and other IBM scientists had to develop new techniques for photographing and reproducing the ornately illustrated pages of ancient bibles and other documents. To minimize damage to the documents, they used filters to remove ultraviolet light and heat from their light source. To reproduce the colors accurately, they used filters that compensate for the redness of the light and the tendency of the electronic camera to emphasize reds and greens. According to IBM, what you see on your computer screen is almost indistinguishable from what you would see by going to Rome and begging admission to the library--provided your screen has been properly calibrated. To retain control over copyright of the documents, though, the Vatican had IBM devise a method of putting a shadowy watermark over the images. Even so, the Vatican is so concerned over copyright that for the time being it is restricting its digital collection to scholars. (At IBM’s site, however, you can see more than the two screen shots DISCOVER was allowed to reproduce.)
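IBM’s exact watermarking method was not published, but the general idea--compositing a faint, semitransparent mark over each image--can be sketched in a few lines of Python. Everything in the sketch is an illustrative assumption: the Pillow imaging library stands in for IBM’s software, and the file names and mark text are invented.

    # A minimal sketch of a visible, "shadowy" watermark -- an illustration
    # of the general idea, not IBM's proprietary method. Assumes the Pillow
    # library; the file names and mark text are invented.
    from PIL import Image, ImageDraw, ImageFont

    def watermark(page_path, text="Biblioteca Apostolica Vaticana"):
        page = Image.open(page_path).convert("RGBA")
        # Draw the mark on a transparent overlay so its opacity can be
        # tuned independently of the page itself.
        overlay = Image.new("RGBA", page.size, (0, 0, 0, 0))
        draw = ImageDraw.Draw(overlay)
        font = ImageFont.load_default()
        # A very low alpha (40 of 255) keeps the mark faint but legible.
        draw.text((page.width // 4, page.height // 2), text,
                  font=font, fill=(128, 128, 128, 40))
        return Image.alpha_composite(page, overlay).convert("RGB")

    watermark("vatican_folio.png").save("vatican_folio_marked.jpg")

The low opacity is the design choice that makes the mark “shadowy”: visible enough to assert ownership, faint enough not to obscure the illuminations.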
The Vatican’s is not the only digital library around. The Library of Congress began a cautious effort in 1993 to put its holdings on-line. It has continued to roll out new services, most recently offering up-to-date information about new legislation on the Web (http://thomas.loc.gov/). In 1993 the British Library started converting pages from the eleventh-century manuscript of Beowulf, making corrections electronically to portions of the text that had deteriorated with use and been damaged by fire. And the movement has gotten a boost from a $24.4 million research program funded by the NSF, NASA, and the Department of Defense at six U.S. universities to develop new technologies for digital libraries. “This effort has stirred up the pot and gotten a lot of activity going,” says Rao, a participant in the project. “Now everybody is talking about digital libraries.”
The biggest challenge in building one, researchers say, is automating the librarian. “Technical questions such as compressing the data have been dealt with,” says Toni Bearman, dean of the School of Library and Information Science at the University of Pittsburgh. “It’s the ability to search through the information and get what you need that’s the more difficult part.” The problem is compounded when the search must span not just a single library but a whole string of them. Hector Garcia-Molina, who heads a project at Stanford, is taking a first step toward cracking this problem. He is devising a common umbrella language that will take a single request for information and break it down into many different requests that designated libraries on the Internet can understand. The next step is to come up with more powerful ways of doing the search itself. Current methods, based mostly on key words, are woefully inadequate for a large-scale digital library. “We can search documents now for certain words or patterns, but what you get is often not what you want,” says Garcia-Molina. “When you scale up to a large network, these methods just won’t work.” What’s needed are more intelligent ways of searching, such as software agents imbued with some form of artificial intelligence.
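To make the umbrella-language idea concrete, here is a toy sketch, in Python, of how a single request might fan out to several libraries. The Library class, the translator functions, and the sample catalog are all hypothetical stand-ins; Garcia-Molina’s actual protocol is far more elaborate.

    # A toy version of federated search: one query, translated into each
    # library's native syntax, with the answers collected per library.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Library:
        name: str
        translate: Callable[[str], str]   # umbrella query -> native query
        search: Callable[[str], list]     # native query -> matching titles

    def federated_search(query, libraries):
        """Fan a single request out to every designated library."""
        return {lib.name: lib.search(lib.translate(query)) for lib in libraries}

    # Two imaginary catalogs that expect different query syntaxes:
    def keyword_search(native):
        catalog = {"Nowell Codex (Beowulf)": "beowulf manuscript fire damage"}
        return [title for title, words in catalog.items() if native in words]

    libs = [
        Library("plain keywords", str.lower, keyword_search),
        Library("fielded syntax", lambda q: "KW=" + q.lower(),
                lambda nq: keyword_search(nq[3:])),
    ]
    print(federated_search("Beowulf", libs))

The translation layer is the whole point: each library keeps its own native interface, and only the umbrella language has to be agreed on.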
When you’re dealing with visual information, the problem gets even tougher. Howard Wactlar, a computer scientist and the head of a digital library project at Carnegie Mellon in Pittsburgh, is designing a library prototype for video as well as text. The key, he says, is to massage the video data as they’re stored so that they’re easy to search through later. First he generates transcripts of video clips using a computer program that converts spoken language to written form, and then he makes corrections in the transcript with another program that can distinguish an English sentence from gibberish. He is also devising software that can automatically divide the video clips into video paragraphs, or logically coherent chunks. That way, key word searches on the transcripts can be used to assemble relevant video paragraphs. In addition, he is developing software that can recognize the objects in a frame, so that you could search for video clips of objects similar to, say, the Eiffel Tower merely by pointing to it on the screen. Using these techniques, it takes Wactlar more than 12 hours to process a one-hour video but only seconds to search through his library prototype.
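The payoff of all that up-front processing is that the search itself reduces to a fast lookup. The sketch below, with invented names and data, shows the basic shape in Python: transcribed “video paragraphs,” each tied to a timecode, inverted into a word index so a key word returns the exact clips to play. The speech recognition and paragraph segmentation--the hard parts--are assumed to have already happened.

    # A bare-bones index over "video paragraphs": transcribed, timecoded
    # chunks of video. All names and data here are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class VideoParagraph:
        start_sec: float    # where the chunk begins in the video
        end_sec: float
        transcript: str     # speech-to-text output, already corrected

    def build_index(paragraphs):
        """Invert the transcripts: map each word to the chunks containing it."""
        index = {}
        for p in paragraphs:
            for word in set(p.transcript.lower().split()):
                index.setdefault(word, []).append(p)
        return index

    def search(index, word):
        # The hours of processing happen once, at storage time; after that,
        # a lookup like this one takes almost no time at all.
        return index.get(word.lower(), [])

    clips = [VideoParagraph(0.0, 42.5, "the eiffel tower opened in 1889"),
             VideoParagraph(42.5, 90.0, "iron lattice construction methods")]
    hits = search(build_index(clips), "Eiffel")
    print([(p.start_sec, p.end_sec) for p in hits])    # [(0.0, 42.5)]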
At present, only a handful of libraries have actually made their contents available on the Internet, but more are expected to do so in the coming year. If the various projects live up to their promise, the Internet may become a bit like a walk down Fifth Avenue in midtown Manhattan: you pass a few fancy stores, you pass a lot of tawdry stores, and then, at Forty-second Street, you come to the gloriously imposing bulk of the New York Public Library, stately monument to scholarship and to the democracy of the printed word. Quite apart from its greater reach, the digital version may prove better than the original in one other respect: you may be able to go there without having to walk past all the junk.