The Renaissance of Medical History on the Web: Images and Archives

By Cindy Goldstein & Mary Holt

(presented at the Medical Library Association's Annual Meeting, Seattle, May 27, 1997)

The unique image capabilities of the World Wide Web have made it possible for libraries to highlight and broaden access to their underutilized special collections. Tulane University Medical Library has a wealth of historical resources, including photographs, medals, diplomas, and primary documents concerning medicine in Louisiana. What the library does not have is a wealth of imaging expertise or the money with which to bring these historical resources to life on the Web.

It was this lack of wherewithal, coupled with a short article in an AMIGOS Bibliographic Council publication, that spawned the idea of applying for an AMIGOS Fellowship to support the scanning of one of the most important primary documents in the collection, the Registre du Comite Medicale de la Nouvelle Orleans. The Registre is the manuscript record of the meetings of the Licensing Board of Eastern Louisiana from 1816 to 1854. The article described a new imaging technique being used by OCLC's Preservation Resources Division to scan microfilm and create digitized images for preservation purposes. Since the library had microfilmed the Registre years earlier, this seemed like a wonderful opportunity to get it into the proper format to mount on the Web. Unfortunately, the project did not get off the ground in time to meet the Fellowship application deadline, but Preservation Resources was very interested in the idea of scanning for the Web, as opposed to the more traditional preservation scanning, and offered to do it free of charge as an educational project.

The Registre consists of 230 pages, including an index of the names of the approximately 1200 people who came before the Comite Medicale petitioning for the right to practice medicine in Louisiana. As such, it is of interest to genealogists as well as medical historians. The reference department has often used the photo reproduction of the document available in house to answer questions from people who call requesting information on an ancestor who practiced medicine in Louisiana. Even though the document has an index, it is somewhat cumbersome to use because the text is handwritten in French. The names in the index are also handwritten, and because they are not in alphabetical order, the user must search through all the names that begin with a given letter to locate one.

Preservation Resources initially proposed scanning 130 double-page images in 8-bit grayscale at 300 dpi. Images were to be scanned in TIFF format with LZW compression. The images were not to be split because of text in the gutter margins. A second set of images would be created in GIF or JPEG for Web viewing in HTML pages. The resulting TIFF images were 9 MB each when compressed, and the JPEGs ran under 2 MB each at a compression ratio of 25 to 1. The files were enormous and completely unrealistic in terms of load time. This reflected the difference in orientation between preservation, which needs high-resolution images, and the Web, which needs lower resolution and faster access.

Paul Barone, Imaging Technician at Preservation Resources, managed to find a solution. It involved splitting the double-page frames in half. The original concerns about losing marginalia proved to be unfounded, and the split-frame images were done at 120 dpi, with the JPEGs having excellent legibility at a 30:1 compression ratio. The on-screen image was somewhat larger than in the initial proposal, but the larger image allowed the very small text and marginalia to be legible. Thanks to the compression, the size of each file was kept near or under the 200K target estimated for on-screen viewing, a far cry from the original 2 MB. Preservation Resources also provided compressed TIFF files for downloading and printing that were reduced from 9 MB to an average of 1.8 MB apiece. These files have excellent highlight and shadow detail, and even the smallest of the script is clear and legible. The TIFF files available on CD will allow the images to be reproduced as printouts on a standard laser printer. The project was completed on May 1, 1996, and the files were delivered to the library on a CD-ROM with 420 MB of data. The compressed JPEG files required only about 35 MB. The JPEG files were loaded on the University's web server in July.
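
The load-time problem can be made concrete with a little arithmetic. The sketch below estimates idealized transfer times for the file sizes mentioned in the text. The 28.8 kbit/s modem speed is an assumption chosen purely for illustration (the connection speeds of the library's users are not stated), and protocol overhead is ignored.

```python
# Rough transfer-time estimate for the file sizes mentioned in the text.
# ASSUMPTION: a 28.8 kbit/s dial-up modem; real throughput would vary.

MODEM_BITS_PER_SECOND = 28_800  # hypothetical connection speed

KB = 1024
MB = 1024 * KB


def transfer_seconds(size_bytes: int,
                     bits_per_second: int = MODEM_BITS_PER_SECOND) -> float:
    """Return the idealized time in seconds needed to transfer size_bytes."""
    return size_bytes * 8 / bits_per_second


# File sizes as reported in the text (approximate figures).
sizes = {
    "compressed TIFF, double page (9 MB)": 9 * MB,
    "first JPEG, double page (about 2 MB)": 2 * MB,
    "compressed TIFF, split page (1.8 MB)": int(1.8 * MB),
    "final JPEG, split page (about 200K)": 200 * KB,
}

for label, size in sizes.items():
    minutes = transfer_seconds(size) / 60
    print(f"{label}: about {minutes:.1f} minutes")
```

Under these assumptions a 2 MB page image would take just under ten minutes to arrive, while a 200K image would take just under one minute, which is the practical difference between the preservation-oriented and Web-oriented scans.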

Mary Holt, Monographs Librarian, did all of the work on the library side, writing the HTML and creating the Registre presence on the library's home page. Searching the Registre requires a multi-step process because the files are image files and not ASCII text. To find a specific entry, it is necessary to identify the proper page number from the personal name index, go to the page number index, display the page, and read or print the appropriate entry. Creating client-side image maps that link the index pages to the appropriate pages, allowing more direct and seamless navigation from index to content, has been discussed, but a lack of time and of the needed graphics programs in house has put this enhancement on hold for the time being. It is hoped that the client-side image mapping will be completed in the near future and that it will greatly enhance the site. We have already had real-life interest in the site from a physician researching the very first physician to be registered by the Comite. He contacted us through the e-mail link we made available at the site. The reference librarians have also found the web site to be an excellent resource for their needs. It also gives users direct access to a document for which they would normally have had to rely on the assistance of the reference staff. Now they can peruse the French script and the names as they need. The library is extremely pleased with the outcome of the project and has nothing but kudos for the work provided by Preservation Resources.
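
The client-side image map idea could be sketched roughly as follows. This is a minimal illustration and not the library's implementation: the personal names, page numbers, pixel rectangles, and file names (`index_a.jpg`, `page_012.html`) are all hypothetical, and in practice each rectangle would have to be measured on the scanned index page.

```python
# Sketch: generate a client-side image map for one scanned index page.
# ASSUMPTION: every name, coordinate, page number, and file name below is
# invented for illustration; none comes from the Registre itself.
from html import escape
from typing import List, NamedTuple, Tuple


class IndexEntry(NamedTuple):
    name: str                         # personal name as written in the index
    page: int                         # Registre page holding the entry
    rect: Tuple[int, int, int, int]   # left, top, right, bottom (pixels)


def page_url(page: int) -> str:
    """Hypothetical naming scheme for the HTML page that shows one image."""
    return f"page_{page:03d}.html"


def image_map(map_name: str, image_file: str,
              entries: List[IndexEntry]) -> str:
    """Return HTML for an index image plus a <map> linking names to pages."""
    lines = [
        f'<img src="{escape(image_file)}" usemap="#{escape(map_name)}" '
        f'alt="Name index">',
        f'<map name="{escape(map_name)}">',
    ]
    for entry in entries:
        coords = ",".join(str(value) for value in entry.rect)
        lines.append(
            f'  <area shape="rect" coords="{coords}" '
            f'href="{page_url(entry.page)}" alt="{escape(entry.name)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


entries = [
    IndexEntry("Example, Jean", 12, (40, 100, 400, 130)),
    IndexEntry("Sample, Pierre", 57, (40, 135, 400, 165)),
]
print(image_map("index_a", "index_a.jpg", entries))
```

With a map like this, clicking a handwritten name on the index image would open the corresponding page directly, collapsing the name-index and page-index steps into one.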

The Registre project was not the library staff's first experience with developing historical information on the web. The library's web team made an early decision to try to make available unique information and images that would not only enhance our site but also be a useful addition to the web community. The first projects undertaken were to scan a number of the historical photographs in the collection that pertain to the history of Tulane University and the Rudolph Matas Library. One of the first historic items scanned was the Prospectus, the document relating to the founding of the Medical College of Louisiana. The primary document was scanned, and the text of the document was added in HTML with links to the images of the document. There are a number of additional historical imaging projects currently in various stages of completion. One is the scanning of medical medals and medallions in the Weinstein Medallion Collection. The front and back of a few medals were scanned directly on the scanner bed as a test, and this direct scanning worked very well. Currently, however, the library does not own a scanner attached to a computer adequate to run the graphics program needed to scan all the medals in house. The finding aid for the collection is mounted at the library's web site, and the images serve as representative examples of the items in the collection. At present the library has developed HTML finding aids for a number of the Historical Collections, including the Diploma and Certificate Collection, Tulane Matriculation Tickets, and the Historic Works for the Library of Ernest Carroll Faust. The library has a number of large photographic collections, with holdings including slides, lantern slides, and black & white and color photographs. While these collections may never be scanned in their entirety, web finding aids with representative examples imaged will greatly enhance their availability.

The major impediments to furthering these historical projects on the web are the costs of upgrading computer hardware and software. The greatest cost, however, is that of the staff time needed to master the process, perform the imaging, and develop the HTML finding aids. The library may choose to have the university's photography and graphics department provide selected scanned images for mounting on the web, but the nature of the historical materials and the overall amount of historical imaging desired will require an in-house imaging project.

Mounting historical resources on the Web is one way to preserve their content and make the information contained in them available to people throughout the world. Historical resources tend to be the first thing libraries put on the Web because many of these resources do not involve copyright problems and expenses. Yet scanning and preparing images for loading on the Web can be costly. For libraries like Tulane, partnering with an organization such as OCLC's Preservation Resources can be a godsend. The project could not have been done without their involvement. The library staff had neither the expertise nor the facilities to do it, in spite of excellent support from Tulane Computing Services (TCS) on the main campus across town. It was the university webmaster who scanned the photographs and slides selected for the initial creation of the library's home page over two years ago, but a project like the Registre was simply too big and time-consuming for TCS to undertake. TCS reviewed the revised proposal from Preservation Resources and verified that what was being suggested would result in something that could realistically be mounted on Tulane's web server and accessed as desired.

The next phase of our historical web site development is to enhance availability by giving users better means of discovering that these resources exist. Notifying other archival collections in Louisiana through traditional professional communication, and using subject listservs such as CADUCEUS-L, a moderated history of medicine list, greatly enhances the community's awareness of a resource such as the Registre. These steps do not, however, meet the needs of all users in the greater community. In developing the historic digital resources, the library's intention was to make this information more readily available to a wide and diverse audience, and to reduce the need to physically handle the manuscripts or historic items or to undertake a time-consuming search through microfilm and paper for specific information. It is quite conceivable that the user may be someone in France searching for a family name as well as the local academic researcher who has traditionally obtained access to this document. The library must consider how to help the many types of users, in many varied locations, locate information that may be relevant to their needs.

There is a great deal of interest in the web community in providing descriptive information for digital resources, but the most effective way to do this is still under discussion. Documentation about digital documents and images is sometimes referred to as metadata. Cataloging and indexing are themselves metadata, a term that simply means information about information. Metadata element sets are currently under development, and one of these is known as the "Dublin Core". A series of workshops has made progress on developing common models for the description of Internet resources to support resource discovery and retrieval: the Dublin Metadata Workshop held in Dublin, Ohio (March 1995), the Warwick Metadata Workshop held in Coventry, UK (April 1996), and the Image Metadata Workshop held in Dublin, Ohio (September 1996). The Dublin Core Metadata Element Set, a simple resource description record, was the result of the first workshop. This element set has the potential to provide a basis for electronic description that may improve information discovery on the Internet. The most recent meeting concerning the "Dublin Core" was a workshop held in Australia in March of 1997 that worked to further refine the data elements and the syntax for embedding them in HTML META tags. One of the prevailing ideas is that the creator of a digital resource will provide an adequate description via the meta tags and that web search engines will be developed to provide access using the information located in those tags. AltaVista currently provides information on how to employ meta tags so that a document is best indexed by that search engine. This envisioned use of metadata, which involves embedding the fields within the document or image, is not the only way the Dublin Core may be utilized.

The information may also reside in a separate metadata record that provides information on how to access a document. Such a record may be viewed as a surrogate for the digital document and may be detailed or very brief. The information in these meta tags may adhere to prescribed policies or may be free-form. Tulane Medical Library plans to add embedded metadata to all unique digital holdings, such as the Registre. The meta tags will be embedded in an HTML document that serves as the finding aid, containing a description of the document as well as the navigational aid for locating the images that make up the entire document. While we continue to follow the progress of the Dublin Core with interest, it is only one of many metadata description schemes in development. There are also the Text Encoding Initiative (TEI) headers for electronic documents, which use SGML. Another metadata scheme that may be of interest to librarians describing historical materials is the Encoded Archival Description (EAD), which uses an SGML document type definition for finding aids. Jennifer A. Younger provides an excellent overview of metadata and digital description in her article "Resources Description in the Digital Age", which was published in Library Trends, Winter 1997.
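
As a rough illustration of embedded metadata, the sketch below renders a handful of Dublin Core elements as HTML META tags for a finding aid. It is a sketch under stated assumptions, not the library's actual markup: the `DC.` name prefix is one common convention for a syntax that was still being refined, and the element values (subject wording, date, format) are illustrative guesses based on the description in the text.

```python
# Sketch: render Dublin Core elements as HTML META tags for a finding aid.
# ASSUMPTIONS: the "DC." prefix is one common embedding convention, and the
# element values below are illustrative, not the library's actual record.
from html import escape
from typing import List, Tuple


def dublin_core_meta(elements: List[Tuple[str, str]]) -> str:
    """Return one <meta> tag per (element, value) pair, one tag per line."""
    lines = []
    for element, value in elements:
        lines.append(
            f'<meta name="DC.{escape(element)}" content="{escape(value)}">'
        )
    return "\n".join(lines)


# Hypothetical description of the Registre finding aid.
registre = [
    ("title", "Registre du Comite Medicale de la Nouvelle Orleans"),
    ("subject", "Medicine -- Louisiana -- History"),
    ("description",
     "Manuscript record of the meetings of the Licensing Board of "
     "Eastern Louisiana, 1816-1854"),
    ("publisher", "Tulane University Medical Library"),
    ("date", "1996"),
    ("format", "image/jpeg"),
    ("language", "fr"),
]
print(dublin_core_meta(registre))
```

The generated lines would be placed inside the `<head>` of the HTML finding aid, where a search engine that reads meta tags could pick them up.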

Librarians have a vast amount of expertise in describing resources, traditionally known as cataloging and indexing. While the information landscape is changing, resources may still be described using the traditional means of the established cataloging rules and routines. The objection to such traditional schemes of document description is the cost of creating the detailed information, together with a complicated record structure and rules that HTML creators are unlikely to use. A wealth of information on describing digital resources is available on the web. Nancy Olson has written a manual on the cataloging of electronic resources, and there is a listserv dedicated to Internet cataloging. A MARC record can be used to describe documents such as the Registre, with the addition of an 856 field, which is designed to allow hypertext access to the document itself. There are currently a number of integrated library systems that include web interfaces and Z39.50 protocols and that utilize the 856 linking field, which will greatly enhance the accessibility of digital resources. Tulane is currently in the process of choosing an integrated library system that will provide this capability. These traditional catalog records may employ many structured and controlled access points, including authority control for author and corporate names, subject heading schemes such as Medical Subject Headings (MeSH), and classification numbers such as the NLM classification. In addition, MARC records that fully describe a library's digital documents and images may be added to the OCLC database and become part of the records that provide electronic access (856 fields in the MARC record) in the online resources of OCLC and the InterCat Catalog for Internet-accessible resources (11,599 records as of April 18, 1997).
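
To show what the 856 linking field amounts to, the sketch below formats one in the conventional display form of tag, indicators, and subfields. Everything specific here is an assumption: the URL and the public note are invented, and the indicator values ("4" for HTTP access, "0" for the resource itself) follow the current MARC 21 definition, which may differ from the practice in use when this was written.

```python
# Sketch: format a MARC 856 (Electronic Location and Access) field for display.
# ASSUMPTIONS: the URL and note are invented for illustration; indicators
# "4" (HTTP) and "0" (the resource itself) follow the MARC 21 definition.


def marc_856(url: str, note: str = "", ind1: str = "4", ind2: str = "0") -> str:
    """Return a display string: tag, two indicators, then $-coded subfields."""
    subfields = [("u", url)]           # $u carries the URL
    if note:
        subfields.append(("z", note))  # $z carries a public note
    body = " ".join(f"${code} {value}" for code, value in subfields)
    return f"856 {ind1}{ind2} {body}"


print(marc_856("http://www.example.org/registre/index.html",
               "Scanned page images of the Registre"))
```

A catalog with a web interface can turn the `$u` subfield into a clickable link, which is what lets the catalog record lead directly to the digital document.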

Libraries should contribute their digital holdings to the national databases, just as they have historically contributed their journal and monographic holdings, greatly enhancing resource discovery and the availability of resources to users everywhere. Librarians should step forward, as they have a major role to play in digital description. OCLC is currently providing a great deal of leadership in this endeavor through its involvement both in traditional cataloging and in the development of metadata. OCLC is also involved in persistent uniform resource locators (PURLs) and in the development of NetFirst, a reference database of Internet resources. NetFirst allows catalogers to derive information from the reference database to be used as the core of a more detailed MARC record. The current objective of Tulane Medical Library is to make full use of both traditional cataloging and meta tags for search engines like AltaVista. We feel that the combination of these strategies is what will work for most people most of the time in the current age, in which no real guidelines have been fully developed and web search engines have a way to go before they can adequately meet the needs of users.

