Arnold Smolen, Ph.D., Principal Investigator
Joanne Grossman, Project Director
Margaret Graham, Digital Project Archivist, email@example.com
Claire McGuire, Metadata Archivist, Virginia.McGuire@DrexelMed.edu
Charles Dennis, Web Developer, firstname.lastname@example.org
Michael Ratti, Digital Resources Specialist, email@example.com
Interns and workstudy students:
Laura Stroffolino
Kerry Corrigan Annos
With additional support from:
Karen Ernst, Administrative Assistant
Barbara Williams, Reference Archivist
Ian Richmond, Systems Administrator
Stephen Janick, Archivist
Drexel College of Medicine IT department
ACCESS & DISCOVERY
Item records are fully browsable in addition to being searchable by metadata fields or keyword. A portion of the collection materials can be searched by their associated full-text file. These materials include all printed text and transcribed documents. For additional searching assistance, please see the help page.
The digital collection is managed and delivered from a custom-designed database built on the open-source LAMP platform: Linux, Apache, MySQL, PHP and Perl. The database supports the public interface for access and viewing and an administrative interface for capturing metadata.
Automated processes manage text and image processing, including generating OCR on image files of printed text pages; converting master TIFF files to tiled JPGs for web delivery; and converting image files to searchable PDFs.
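As a rough illustration of how such a batch process might decide which derivatives to produce for each master file, here is a minimal Python sketch. The function name, the file-naming scheme, and the assumption that OCR and searchable PDFs are generated only for printed-text pages are all hypothetical; the project's actual scripts are not described here.

```python
from pathlib import PurePosixPath


def plan_derivatives(master_tiff: str, printed_text: bool) -> list[tuple[str, str]]:
    """Return (step, output path) pairs for one master TIFF.

    Hypothetical naming scheme: derivatives sit next to the master
    and reuse its base name.
    """
    stem = PurePosixPath(master_tiff).with_suffix("")
    # Every image gets a set of tiled JPEGs for web delivery.
    steps = [("tile", f"{stem}_tiles/")]
    if printed_text:
        # Assumption: only printed-text pages are sent through OCR
        # and converted to a searchable PDF.
        steps.append(("ocr", f"{stem}.txt"))
        steps.append(("pdf", f"{stem}.pdf"))
    return steps


if __name__ == "__main__":
    for step, output in plan_derivatives("masters/a001_p01.tif", printed_text=True):
        print(step, output)
```

Keeping the planning step separate from the tools that do the work would make it simple to re-run a single stage for a single file.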
Image capture and processing
The majority of the digital images were captured on flatbed scanners as 400 ppi, uncompressed TIFFs. Scanning of oversize materials and bound volumes was outsourced to the University of Pennsylvania and the OCLC Preservation Division. Delivery JPEGs are created using Zoomify, an application that slices the master TIFF into tiled JPEGs for efficient and flexible web delivery in a Flash-based viewer. An additional watermarked JPEG is created for each page for printing purposes.
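To give a sense of what tiling involves, the sketch below computes a generic tile pyramid for one image: the image is cut into square tiles, then halved repeatedly until it fits in a single tile. The 256-pixel tile size and the halving rule are assumptions chosen for illustration; this is not Zoomify's own code, and its exact tiling rules may differ.

```python
import math


def tile_pyramid(width: int, height: int, tile: int = 256) -> list[tuple[int, int, int]]:
    """Return (width, height, tile_count) per tier, full resolution first.

    Assumes square tiles of `tile` pixels and a halving of both
    dimensions between tiers (an illustrative scheme only).
    """
    if width < 1 or height < 1:
        raise ValueError("image dimensions must be positive")
    tiers = []
    while True:
        cols = math.ceil(width / tile)
        rows = math.ceil(height / tile)
        tiers.append((width, height, cols * rows))
        if cols == 1 and rows == 1:
            break  # the whole image now fits in one tile
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)
    return tiers


if __name__ == "__main__":
    # An 8.5 x 11 inch page scanned at 400 ppi is 3400 x 4400 pixels.
    for w, h, count in tile_pyramid(3400, 4400):
        print(f"{w} x {h}: {count} tiles")
```

Under these assumptions a single letter-size page yields six tiers. The viewer only needs to fetch the tiles for the region and zoom level currently on screen, which is what makes this approach efficient for large masters.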
Text capture and processing
Vividata’s OCR Shop XTR provides command-line OCR processing and PDF conversion for the majority of machine-produced text in the collection. This OCR output generally remains uncorrected and supports full-text searching. ABBYY FineReader is used for desktop OCR where greater precision is needed. These OCR files are corrected and used for full-text searching and for display of transcriptions.
CATALOGUING & METADATA
Materials are catalogued at the item level, with some variation in how an “item” is defined. Generally, an item is a complete physical item such as an image, a pamphlet, or a letter. To allow greater precision in searching, the definition is applied flexibly: at times individual pages from a scrapbook are defined as items, and at other times discrete articles pasted in a scrapbook are catalogued individually.
The digital collection materials come from a number of different physical collections. Each physical collection is catalogued in the University Library OPAC and each digital object’s item record links to its originating physical collection. Eventually, the OPAC collection-level record will also link to corresponding digital objects.
Item-level cataloguing includes:
- Descriptive metadata describing the digital item and constructing its access points
- Administrative metadata that tracks the digital surrogate and its lifecycle
- Structural metadata dictating how the items are navigated online
Item-level records are based on an extended set of Dublin Core elements. Dublin Core was chosen for its simplicity and because it supports the harvesting of metadata through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Note that OAI-PMH requires only unqualified Dublin Core, the 15-element set. The extended set will be mapped to the relevant unqualified elements for external harvesting.
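The mapping from an extended element set to the 15-element set for harvesting can be sketched as follows. This is a minimal Python illustration, not the project's implementation: the dotted field names (such as `date.created`), the local `digitization.scanner` field, and the sample values are hypothetical. The two namespace URIs are those used by the standard `oai_dc` metadata format.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the simple Dublin Core element set.
SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_simple_dc(record: dict[str, list[str]]) -> dict[str, list[str]]:
    """Collapse hypothetical 'element.qualifier' fields to their base element.

    Fields whose base element is not one of the 15 are dropped.
    """
    simple: dict[str, list[str]] = {}
    for field, values in record.items():
        element = field.split(".", 1)[0]
        if element in SIMPLE_DC:
            simple.setdefault(element, []).extend(values)
    return simple


def to_oai_dc_xml(simple: dict[str, list[str]]) -> str:
    """Serialize a simple Dublin Core record as an oai_dc:dc element."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, values in simple.items():
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical item record with made-up sample values.
    record = {
        "title": ["Sample letter"],
        "date.created": ["1895"],
        "description.abstract": ["Sample abstract"],
        "digitization.scanner": ["flatbed"],  # local field, not harvested
    }
    print(to_oai_dc_xml(to_simple_dc(record)))
```

Dropping the qualifier loses some precision (a creation date becomes a plain date), which is the usual trade-off when exposing richer local records through the simple element set.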
The IMLS funding required the project to enlist a consultant to advise on digital preservation, which prompted several activities by project staff: Dr. Michael Lesk of Rutgers University met with project staff and provided project-level recommendations; the department co-sponsored a regional forum on digital preservation; and the forum launched a university-wide initiative to preserve digital assets.