"Ladies! History is useful only as its lessons may be made available for instruction. In entering upon your life work it is well to revert to the past, to review the difficulties encountered by those who have gone before and to note the victories won."

-- Mary Scarlett Dixon, M.D., WMC commencement address, 1870

In the fall of 2003, the Archives and Special Collections on Women in Medicine and Homeopathy of Drexel University College of Medicine was awarded an Institute of Museum and Library Services (IMLS) 2003 National Leadership Grant to build a digital collection on the history of women physicians. The project is now in its third year: its infrastructure is complete and provides access to over 27,000 pages of materials representing the history of women physicians, increasing usage and reaching a broader, more diverse audience. The project is part of Drexel's long-term plan to continue meeting the needs of a broad learning community.

View the terms of usage for this digital collection.

From the Collection
Report of the Committee on Necrology


Project Documentation:
IMLS Reports (.pdf)
Preservation report
Metadata schema (.pdf)
Related Projects

Legacy Center Archives and Special Collections
Drexel University College of Medicine

2900 West Queen Lane
Philadelphia, PA 19129

215-991-8172 (fax)

Research hours: 9:30 - 4:30 by appointment.


Arnold Smolen, Ph. D., Principal Investigator
Joanne Grossman, Project Director
Margaret Graham, Digital Project Archivist
Claire McGuire, Metadata Archivist
Charles Dennis, Web Developer
Michael Ratti, Digital Resources Specialist

Interns and workstudy students:
Laura Stroffolino
Kerry Corrigan Annos
Mustanser Badar

With additional support from:
Karen Ernst, Administrative Assistant
Barbara Williams, Reference Archivist
Ian Richmond, Systems Administrator
Stephen Janick, Archivist
Drexel College of Medicine IT department


Item records are fully browsable in addition to being searchable by metadata fields or keyword. A portion of the collection materials can be searched by their associated full-text file. These materials include all printed text and transcribed documents. For additional searching assistance, please see the help page.


The digital collection is managed and delivered from a custom-designed database built on the open-source LAMP platform: Linux, Apache, MySQL, PHP and Perl. The database supports the public interface for access and viewing and an administrative interface for capturing metadata.

Automated processes manage text and image processing, including generating OCR on image files of printed text pages; converting master TIFF files to tiled JPGs for web delivery; and converting image files to searchable PDFs.

Image capture and processing

The majority of the digital images were captured on flatbed scanners as uncompressed TIFFs at 400 ppi. Scanning of oversize and bound volumes was outsourced to the University of Pennsylvania and the OCLC Preservation Division. Delivery JPEGs are created using Zoomify, an application that slices the master TIFF into tiled JPEGs for efficient and flexible web delivery in a Flash-based viewer. An additional watermarked JPEG is created for each page for printing purposes.
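To give a sense of what tiling involves, the following is a minimal Python sketch of a tiled image pyramid of the general kind described above. It is not Zoomify's own code. The 256-pixel tile size, the rounding rule, and the function name tile_pyramid are assumptions made for illustration.

```python
import math


def tile_pyramid(width, height, tile_size=256):
    """Return one (width, height, columns, rows) tuple per zoom tier.

    Assumption: each tier halves the previous one (rounding up) until the
    whole image fits in a single tile. The 256-pixel tile size is also an
    assumption; a real tiling tool may round or size tiles differently.
    """
    tiers = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        tiers.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    tiers.reverse()  # smallest (most zoomed-out) tier first
    return tiers


if __name__ == "__main__":
    # An 8.5 x 11 inch page scanned at 400 ppi is 3400 x 4400 pixels.
    pyramid = tile_pyramid(3400, 4400)
    for w, h, cols, rows in pyramid:
        print(f"{w} x {h} px: {cols} x {rows} tiles")
    print("total tiles:", sum(cols * rows for _, _, cols, rows in pyramid))
```

Under these assumptions, a letter-size page scanned at 400 ppi yields 6 tiers and 344 tiles in total. The viewer only needs to fetch the handful of tiles visible at the current zoom level rather than the full master image.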

Text capture and processing

Vividata’s OCR Shop XTR provides command-line OCR processing and PDF conversion for the majority of machine-produced text in the collection. This OCR output generally remains uncorrected and supports full-text searching. ABBYY FineReader is used for desktop OCR where greater precision is needed. These OCR files are corrected and used for full-text searching and for display of transcriptions.


Materials are catalogued at the item level, with some variation in how an “item” is defined. Generally, an item is a complete physical object such as an image, a pamphlet, or a letter. Where finer granularity gives greater precision in searching, the definition is applied case by case: at times pages from a scrapbook are defined as an item, or discrete articles pasted in a scrapbook are catalogued individually.

The digital collection materials come from a number of different physical collections. Each physical collection is catalogued in the University Library OPAC and each digital object’s item record links to its originating physical collection. Eventually, the OPAC collection-level record will also link to corresponding digital objects.

Item-level cataloguing includes:

  • Descriptive metadata describing the digital item and constructing its access points
  • Administrative metadata that tracks the digital surrogate and its lifecycle
  • Structural metadata dictating how the items are navigated online

Item-level records are based on an extended set of Dublin Core elements. Dublin Core was chosen because of its simplicity and to support the harvesting of metadata through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Note that OAI-PMH requires only unqualified Dublin Core, the 15-element set. The extended set will therefore be mapped to the relevant unqualified elements for external harvesting.
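The mapping down to the 15-element set can be pictured with a short Python sketch. It is only an illustration of the general approach, not the project's actual schema or code. The dotted field names (such as date.created), the sample values, and the function names are hypothetical. The two namespace URIs are the standard ones for the OAI-PMH oai_dc format and the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the basic Dublin Core element set.
DC15 = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_simple_dc(record):
    """Reduce an extended record to (element, value) pairs in the 15-element set.

    Assumption: extended fields are named "element.refinement", so dropping
    the part after the dot gives the base element. Fields whose base is not
    one of the 15 elements are treated as local extensions and skipped.
    """
    pairs = []
    for field, values in record.items():
        base = field.split(".", 1)[0]
        if base not in DC15:
            continue
        for value in values:
            pairs.append((base, value))
    return pairs


def to_oai_dc_xml(record):
    """Serialize the reduced record as an oai_dc:dc element."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in to_simple_dc(record):
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record with made-up values, for illustration only.
example = {
    "title": ["Example pamphlet"],
    "date.created": ["1870"],
    "coverage.spatial": ["Philadelphia"],
    "local.scanner": ["flatbed"],  # local field with no Dublin Core equivalent
}

if __name__ == "__main__":
    print(to_oai_dc_xml(example))
```

Dropping the refinement loses some precision (a creation date becomes simply a date), which is the usual trade-off when exposing richer local records to external harvesters.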


The IMLS funding required that the project enlist a consultant to advise on digital preservation, prompting focus and activity by project staff: Dr. Michael Lesk, Rutgers University, met with project staff and provided project-level recommendations; the department co-sponsored a regional forum on digital preservation; and the forum launched a university-wide initiative to preserve digital assets.