Preservation Guidelines

The following points are based on a presentation by MacKenzie Smith at the CERN workshop on Innovations in Scholarly Communications in April of 2007.

What can go wrong...

  • System component failure - server, network, software, third-party services, or an entire data center

  • Media faults (e.g. CD-ROM bit rot)

  • Software and data format obsolescence

  • Human error - deletion

  • Policy changes that trigger harmful side effects

  • Natural disaster

  • Network intrusion and attacks

  • Copyright restrictions - preservation activities can accidentally violate the law

  • DRM - encryption prevents preservation, replication, and migration

  • Economic failure

  • Organizational failure - an institution or department can go out of business, eliminate positions, or change its mission

Preservation Services provide for...

  • Replication - multiple copies in multiple places (LOCKSS; use OAI)

  • Migration - of hardware, storage media, collections

  • Transparency - open source software and open [meta]data (so that problems can be inspected and fixed when needed)

  • Diversity - use multiple formats (Word, ODF, PDF and ASCII)

  • Auditing - checksum checking, file reads (detect and fix errors in data)
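
The auditing item above can be illustrated with a minimal sketch, assuming SHA-256 checksums and a simple in-memory manifest. The function names (`sha256_of`, `build_manifest`, `audit`) and the directory layout are hypothetical and are not part of the original guidelines.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (assumed layout)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum changed."""
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

In this sketch the manifest would be built once at ingest and stored separately from the files; running `audit` on a schedule then performs both the file read and the checksum comparison.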

Repair Services

  • Detect errors in the data and automatically restore it from secondary storage.
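
As a rough illustration of the repair step, the sketch below assumes each object has one replica on secondary storage and a SHA-256 checksum recorded at ingest. The names `file_digest` and `restore_if_damaged` are hypothetical; a production service would also log the event and handle more than one replica.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (read whole, for brevity)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_if_damaged(primary: Path, replica: Path, expected: str) -> str:
    """Restore primary from replica when primary fails its checksum.

    Returns "ok", "restored", or "unrecoverable".
    """
    if primary.is_file() and file_digest(primary) == expected:
        return "ok"
    # Only trust the replica if it still matches the recorded checksum.
    if replica.is_file() and file_digest(replica) == expected:
        primary.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(replica, primary)
        return "restored"
    return "unrecoverable"
```

Verifying the replica before copying matters: restoring from a copy that is itself damaged would overwrite one bad file with another.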

HDR Preservation Policies

Preservation Planning

  • Monitor / interact with the designated community to understand requirements and changes

  • Monitor the emerging technology and standards

  • Develop and recommend preservation strategies

  • Develop packaging designs and detailed migration plans and prototypes

  • Implement administrative policies and directives.

Short-Term Preservation Guidelines

  • Scan objects according to accepted best practices.

  • Store objects in a central location.

  • Create minimal metadata in a database that includes image and text filenames.

  • Print copies of your digital images onto archival-quality photographic paper.

  • Use Xena (xena.sourceforge.net/) to convert your documents into a format that will be readable in the future.

  • Make regular backups, ideally onto a removable hard disk drive.

  • Update file formats when you upgrade the software that created them.
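
The "minimal metadata in a database" guideline above could look something like the following sketch, which assumes SQLite. The table name, the column names other than the image and text filenames, and the helper functions are illustrative assumptions, not a schema prescribed by these guidelines.

```python
import sqlite3
from typing import Optional


def open_catalog(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small catalog with one row per scanned object."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS objects (
            id             INTEGER PRIMARY KEY,
            title          TEXT NOT NULL,
            image_filename TEXT NOT NULL,
            text_filename  TEXT,
            scanned_on     TEXT
        )
        """
    )
    return conn


def add_object(
    conn: sqlite3.Connection,
    title: str,
    image_filename: str,
    text_filename: Optional[str] = None,
    scanned_on: Optional[str] = None,
) -> int:
    """Insert one object record and return its row id."""
    cur = conn.execute(
        "INSERT INTO objects (title, image_filename, text_filename, scanned_on) "
        "VALUES (?, ?, ?, ?)",
        (title, image_filename, text_filename, scanned_on),
    )
    conn.commit()
    return cur.lastrowid
```

Passing a file path instead of `":memory:"` keeps the catalog on disk, where it can be included in the regular backups.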

Selection Requirements

  • Content must be curriculum or research related

  • Content must be of long-term interest

  • Content must have been created by the Hamilton community

  • Content must be copyright-free

Recommendations for File Formats for Time of Submission

  • Text documents may be submitted in their original formats (even if proprietary), such as Microsoft Word, but should also be submitted in PDF (150 dpi) and/or TIFF format (300 dpi).

  • Images should be submitted in one of these formats (in order of preference):

    1. TIFF format (200-600 dpi; slides: 3000-4000 dpi)
    2. JPEG2000 (ask about specifications)
    3. JPEG (as high a dpi as possible, 72 dpi min.)
    4. PNG (as high a dpi as possible, 72 dpi min.)
  • PDF files must meet these requirements:
    • Files must not be password protected.
    • Text files must not have embedded fonts.
    • If special fonts are required, a copy of the font must be archived with the object.
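
The image preferences listed above can be sketched as a small lookup. This is an illustration only: it assumes the lower end of each stated range is a minimum and does not reject higher resolutions, and the names `preferred_format` and `meets_minimum_dpi` are hypothetical. JPEG2000 has no stated figure in the list, so the check returns `None` for it.

```python
from typing import Iterable, Optional

# Order of preference, transcribed from the list above.
PREFERENCE_ORDER = ["TIFF", "JPEG2000", "JPEG", "PNG"]

# Minimum dpi per format; JPEG2000 is omitted because no figure is given.
MIN_DPI = {"TIFF": 200, "JPEG": 72, "PNG": 72}
SLIDE_MIN_DPI = 3000  # TIFF scans of slides


def preferred_format(available: Iterable[str]) -> Optional[str]:
    """Return the most preferred format among those available, if any."""
    available = set(available)
    for fmt in PREFERENCE_ORDER:
        if fmt in available:
            return fmt
    return None


def meets_minimum_dpi(fmt: str, dpi: int, is_slide: bool = False) -> Optional[bool]:
    """Check dpi against the stated minimum; None means no figure is stated."""
    if fmt == "TIFF" and is_slide:
        return dpi >= SLIDE_MIN_DPI
    if fmt not in MIN_DPI:
        return None
    return dpi >= MIN_DPI[fmt]
```

A `None` result is a prompt to ask about specifications, as the list itself suggests for JPEG2000, and should not be read as a pass.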

Recommendations for File Formats for Long-term Retention

The principles behind the file formats recommended for long-term retention (i.e., more than 5 years) of digital objects in the custody of the Hamilton College Library are these:

  • the format must assure the preservation of the content, that is, the completeness of all necessary data and the links between them
  • the format must be widely acknowledged and accepted, and usable with hardware and software that are widely used and supported on the market
  • the format must be directly ready for reproduction, or for migration to another format that can be used directly afterward
  • the format must enable transformation from the most commonly used formats (to this format) and facilitate automatic reporting of any errors that occur during transformation
  • the format must be independent of specific hardware, software, or other components of the operating environment
  • the format must be based on an international, national, or widely accepted and, as a rule, open standard
  • use of the format must not violate any laws

The current list of allowed long-term retention formats for all archival material in the Hamilton College Digital Collections Repository:

Images:

  • TIFF (TIFF 6.0, Adobe Systems)
  • JPEG2000 (.jp2) ISO/IEC 15444-1

Text and mixed:

  • ISO Latin-1 – ISO 8859-1
  • PDF/A - ISO 19005-1
  • XML - SGML – ISO 8879
  • TEI - an application of SGML ISO 8879
  • ODF – ISO/IEC 26300

Graphics:

  • SVG – v1.1 – W3C Specification

Video/audio:

  • MPEG-2 - ISO/IEC 13818
  • MPEG-4 - ISO/IEC 14496
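
The allowed-format list above could be encoded as a simple lookup table so that a submission tool can check a format name against it. The sketch below transcribes the list as given; the dictionary layout and the function name `find_retention_format` are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Category -> {format name: standard}, transcribed from the list above.
RETENTION_FORMATS = {
    "Images": {
        "TIFF": "TIFF 6.0, Adobe Systems",
        "JPEG2000": "ISO/IEC 15444-1",
    },
    "Text and mixed": {
        "ISO Latin-1": "ISO 8859-1",
        "PDF/A": "ISO 19005-1",
        "XML": "SGML, ISO 8879",
        "TEI": "an application of SGML, ISO 8879",
        "ODF": "ISO/IEC 26300",
    },
    "Graphics": {
        "SVG": "v1.1, W3C Specification",
    },
    "Video/audio": {
        "MPEG-2": "ISO/IEC 13818",
        "MPEG-4": "ISO/IEC 14496",
    },
}


def find_retention_format(name: str) -> Optional[Tuple[str, str]]:
    """Return (category, standard) for an allowed format, else None."""
    wanted = name.strip().lower()
    for category, formats in RETENTION_FORMATS.items():
        for fmt, standard in formats.items():
            if fmt.lower() == wanted:
                return category, standard
    return None
```

The lookup matches on the format name only and ignores case; it does not inspect file contents, so it cannot confirm that a given file actually conforms to the named standard.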

(Reviewed: September 27, 2010)