File Naming

File Naming Principles

Good filenames will...

  • ensure efficient management (storage and backups) of large numbers of digital files.
  • ensure the names are compatible with most software applications and file systems now and in the future, including compression utilities, Web protocols (URLs), optical storage file systems, and different operating systems.
  • allow managers to distinguish different versions of the same file from one another.
  • enable CONTENTdm to identify the correct sequence and captions of the images or pages.

Length

  • Do not exceed 27 characters for the base name. The whole filename (base name, period, and extension) should not exceed 31 characters. Some operating systems allow longer names, but limiting the base name to 27 characters will help prevent interoperability problems for your files in the future.
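Both limits can be checked mechanically. The following is a minimal Python sketch, assuming the base name is everything before the final period; the function name `check_length` is hypothetical and not part of any existing tool.

```python
MAX_BASE = 27   # maximum characters in the base name
MAX_TOTAL = 31  # maximum characters in base name + period + extension


def check_length(filename: str) -> list[str]:
    """Return a list of length problems; an empty list means the name passes."""
    base, dot, _ = filename.rpartition(".")
    if not dot:
        # No period at all: treat the whole string as the base name.
        base = filename
    problems = []
    if len(base) > MAX_BASE:
        problems.append(f"base name has {len(base)} characters (max {MAX_BASE})")
    if len(filename) > MAX_TOTAL:
        problems.append(f"filename has {len(filename)} characters (max {MAX_TOTAL})")
    return problems


print(check_length("yhm-spe-hod-pho-001.tif"))  # → []
print(check_length("a" * 27 + ".tiff"))          # → ['filename has 32 characters (max 31)']
```

With a three-letter extension such as .tif, a 27-character base name lands exactly on the 31-character total; a four-letter extension such as .tiff leaves room for only 26 characters in the base name.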

Characters

  • Use only numbers (0-9), lowercase letters (a-z), and hyphens. Uppercase letters and mixed-case filenames (such as so-called "camel case") are discouraged because they complicate use with case-sensitive systems.

  • Avoid the letters "l" (el) and "o" (oh) whenever you can, because they are easily mistaken for a "1" (one) or a "0" (zero), respectively.
  • Avoid special characters in filenames. Do not use any of these: \ / : * ? " < > | [ ] & $
    Some characters, such as _ , . (underscore, comma, and period), may be handled differently by some operating systems and applications, so the safest policy is to avoid them in the base name as well.

  • Do NOT use spaces in filenames, particularly for files that might be used in a URL. Spaces are also invalid in some systems and may cause confusing line breaks in reports and email messages.

  • Avoid underscores: they do not show up well in URLs, and CONTENTdm uses underscores to mark where the text of a caption begins (see "Digital objects that consist of more than one file" below).

  • Filenames and directory names should neither begin nor end with a punctuation character (period, hyphen, underscore). Avoid names such as:
    .myfile.tif
    -myfile.tif
    _myfile.tif
    myfile-.tif

    Filenames and directory names should not contain multiple consecutive punctuation characters. Avoid names such as:
    my--file.tif
    myfile..tif
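The character rules above can be combined into a single check. This Python sketch (function names hypothetical) reports hard errors for disallowed characters and misplaced punctuation, but only warns about "l" and "o", since the guideline says to avoid them "whenever you can" and its own examples (hod, pho) contain them. It assumes the base name is the text before the final period. Note that the CONTENTdm caption labels described later (e.g., _Cover) deliberately use an underscore and capitals, so this strict check would flag them.

```python
import re

PUNCTUATION = (".", "-", "_")              # period, hyphen, underscore
AMBIGUOUS = {"l": "1 (one)", "o": "0 (zero)"}


def character_problems(filename: str) -> list[str]:
    """Return rule violations for a filename or directory name (empty list = OK)."""
    base = filename.rpartition(".")[0] or filename  # text before the final period
    problems = []
    # Anything left after removing a-z, 0-9 and "-" is a disallowed character.
    bad = sorted(set(re.sub(r"[a-z0-9-]", "", base)))
    if bad:
        problems.append("disallowed characters: " + " ".join(repr(c) for c in bad))
    if filename.startswith(PUNCTUATION):
        problems.append("begins with punctuation")
    if base.endswith(PUNCTUATION) or filename.endswith(PUNCTUATION):
        problems.append("ends with punctuation")
    if re.search(r"[.\-_]{2,}", filename):
        problems.append("contains consecutive punctuation")
    return problems


def ambiguous_letter_warnings(base_name: str) -> list[str]:
    """Warn (but do not reject) where "l" or "o" could be misread as a digit."""
    return [f'position {i}: "{ch}" may be read as {AMBIGUOUS[ch]}'
            for i, ch in enumerate(base_name) if ch in AMBIGUOUS]


print(character_problems("yhm-spe-hod-pho-001.tif"))  # → []
print(character_problems("my--file.tif"))             # → ['contains consecutive punctuation']
```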

Extensions

  • Use only standard file extensions (e.g., .tif, .wav, .mov, .mp3, .mpg, .rm, .xml, .txt, etc.).

  • Filenames should not have two or more identical filename extensions (for example, NOT foo.tif.tif or bar.pdf.pdf).
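Both extension rules can be expressed in a few lines of Python. This is a sketch only: the function name is hypothetical, and the set of "standard" extensions is an assumption built from the examples above plus .pdf and .jpg, which appear elsewhere in these guidelines; adjust it to local practice.

```python
# Assumed list: extensions named in the guideline, plus .pdf and .jpg.
STANDARD_EXTENSIONS = {"tif", "wav", "mov", "mp3", "mpg", "rm", "xml", "txt", "pdf", "jpg"}


def extension_problems(filename: str) -> list[str]:
    """Check that a filename ends in one standard extension, not a repeated one."""
    parts = filename.split(".")
    if len(parts) < 2:
        return ["no extension"]
    extension = parts[-1]
    problems = []
    if extension not in STANDARD_EXTENSIONS:
        problems.append(f'".{extension}" is not a standard extension')
    # A repeated extension shows up as the last two dot-separated parts being equal.
    if len(parts) >= 3 and parts[-2] == extension:
        problems.append(f'extension ".{extension}" is repeated')
    return problems


print(extension_problems("yhm-spe-hod-pho-001.tif"))  # → []
print(extension_problems("foo.tif.tif"))              # → ['extension ".tif" is repeated']
```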

Digital objects that consist of a single file (such as a photograph)...

Although the base name of a filename may contain at most 27 characters, there is no standard way to divide those characters among its parts. The following recommendations are designed to make the best use of the available characters.

The filename can follow this pattern:

<institution>-<department>-<collection>-<item>.<extension>

<institution> = a three-character code identifying the institution responsible for this digital object ("yhm" is the OCLC code for Hamilton College, New York). Including an institutional identifier in the filename helps keep track of the file's origin: if the file ever ends up on a server outside Hamilton College, it can still be traced back to the Hamilton College Library.

<department> = a three-character code for the department that holds the original object.

  • Examples

    yhm-arc-... (College Archives)
    yhm-spe-... (Special Collections)
    yhm-jaz-... (Jazz Archive)

<collection> = A string of characters that uniquely identifies the digital collection this object belongs to. Check previous uploads to CONTENTdm to see whether a string has already been established for this collection.

Filenames without a <collection> identifier belong to objects in the "general" category. In the future, however, we should use "gen" as the <collection> identifier for such objects.

  • Examples (3 letters of the collection name)

    yhm-arc-pub-... (College Publications)
    yhm-spe-hod-... (House of David collection)
    yhm-jaz-pho-... (Photographs)
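Checking previous uploads for an established collection string can be partly automated if you already have a list of existing filenames (for example, from a directory listing; how you obtain the list is outside this sketch). The Python function below is hypothetical. It assumes every segment is separated by a hyphen, as in the examples above, so older filenames that omit the <collection> segment will be misread.

```python
from collections import defaultdict


def existing_collection_codes(filenames: list[str]) -> dict[str, list[str]]:
    """Map each department code to the collection codes already used with it."""
    codes = defaultdict(set)
    for name in filenames:
        base = name.rpartition(".")[0] or name  # drop the extension, if any
        # maxsplit=3 keeps hyphens inside <item> (e.g. "pho-001") together.
        parts = base.split("-", 3)
        if len(parts) == 4:
            _institution, department, collection, _item = parts
            codes[department].add(collection)
    return {dept: sorted(cols) for dept, cols in codes.items()}


uploaded = ["yhm-spe-hod-pho-001.tif", "yhm-arc-pub-cat-001.tif", "yhm-arc-kir-17920512.tif"]
print(existing_collection_codes(uploaded))
# → {'spe': ['hod'], 'arc': ['kir', 'pub']}
```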

<item> = A string of characters that uniquely identifies this object. If the item has an accession number or call number, convert that number to valid filename characters. If there is no such inventory number, construct one from a three-letter label and a sequence number padded to at least three digits.

  • Limit this segment to 13 characters.

  • Examples (3 letters of the sub-collection name)

    yhm-arc-pub-cat-... (college catalogs)
    yhm-arc-kir-17920512 (using an item's date, yyyymmdd)
    yhm-spe-hod-pho-... (photographs)

These guidelines should make filenames easy to sort alphabetically and to scan by eye, and should give every file a unique name. The next section discusses how to format filenames so that objects consisting of more than one file are ingested properly.
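Taken together, the segment rules can be expressed as a small builder. In this Python sketch the function names are hypothetical; the accession-number conversion (lowercase, then collapse each run of other characters into one hyphen) is just one reasonable way to "convert that number to valid filename characters", and the hyphen between the three-letter label and the sequence number is an assumption. The collection defaults to "gen", following the note on general-category objects.

```python
import re

MAX_ITEM = 13  # "Limit this segment to 13 characters."
MAX_BASE = 27  # maximum length of the whole base name


def item_from_accession(accession: str) -> str:
    """Convert an accession or call number to lowercase letters, digits, and hyphens."""
    # Replace each run of other characters (spaces, periods, slashes, ...) with one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", accession.lower()).strip("-")


def item_from_label(label: str, number: int) -> str:
    """Construct an item ID from a three-letter label and a zero-padded sequence number."""
    if not re.fullmatch(r"[a-z]{3}", label):
        raise ValueError(f"label must be three lowercase letters: {label!r}")
    return f"{label}-{number:03d}"  # hyphen separator is an assumption


def build_filename(institution: str, department: str, item: str,
                   collection: str = "gen", extension: str = "tif") -> str:
    """Assemble <institution>-<department>-<collection>-<item>.<extension>."""
    for role, code in (("institution", institution), ("department", department)):
        if not re.fullmatch(r"[a-z0-9]{3}", code):
            raise ValueError(f"{role} code must be three characters: {code!r}")
    if len(item) > MAX_ITEM:
        raise ValueError(f"item segment {item!r} exceeds {MAX_ITEM} characters")
    base = "-".join((institution, department, collection, item))
    if len(base) > MAX_BASE:
        raise ValueError(f"base name {base!r} exceeds {MAX_BASE} characters")
    return f"{base}.{extension}"


print(build_filename("yhm", "spe", item_from_label("pho", 1), collection="hod"))
# → yhm-spe-hod-pho-001.tif
print(build_filename("yhm", "arc", item_from_accession("1995.12.3"), collection="pub"))
# → yhm-arc-pub-1995-12-3.tif
```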

Digital objects that consist of more than one file (such as a book)...

If a digital object consists of more than one file, name the files as described in the single-file section above, but append a sequence number to each name so that the order of the files in the folder mirrors the order of the parts in the content. This allows CONTENTdm to display such compound objects automatically and in the correct order.

<filename>-<sequence>.<extension>

  • <filename> = the filename formatted as described in the single-file section above.

  • <sequence> = a number (always padded to 3 digits) representing the position of the file in the sequence of files for that object.

Examples:

yhm-spe-hod-pho-001.tif (= 1st file to be displayed)
yhm-spe-hod-pho-002.tif (= 2nd file to be displayed)
yhm-spe-hod-pho-003.tif (= 3rd file to be displayed)
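Generating the sequence numbers avoids typing mistakes in long books. A minimal Python sketch (function name hypothetical), using the three-digit padding described above:

```python
def compound_filenames(filename: str, count: int, extension: str = "tif") -> list[str]:
    """Return <filename>-<sequence>.<extension> for sequence numbers 1..count."""
    if not 1 <= count <= 999:
        raise ValueError("a three-digit sequence number allows 1 to 999 files")
    # :03d pads with leading zeros, so "002" sorts before "010" as plain text.
    return [f"{filename}-{n:03d}.{extension}" for n in range(1, count + 1)]


print(compound_filenames("yhm-spe-hod-pho", 3))
# → ['yhm-spe-hod-pho-001.tif', 'yhm-spe-hod-pho-002.tif', 'yhm-spe-hod-pho-003.tif']
```

Padding matters because a plain text sort, as used by many tools, would otherwise put page-10.tif before page-2.tif. The sketch refuses more than 999 files because a fourth digit would break that ordering ("1000" sorts before "999" as text).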

CONTENTdm-specific note: If you want an image to appear with a special label when displayed in CONTENTdm, you can include that label in the filename by adding an underscore followed by the label (e.g., yhm-spe-hod-pho-001_Cover.tif).

<filename>-<sequence>_<label>.<extension>

  • <label> = a word or phrase to be used by CONTENTdm as the caption for the image.

Examples:

yhm-spe-hod-pho-001_Cover.tif (29 total characters)
yhm-spe-hod-pho-002_Page-01.tif (31 total characters)
yhm-spe-hod-pho-003_Page-02.tif (31 total characters)

This convenient CONTENTdm feature makes it harder to keep the base name within the 27-character limit: yhm-spe-hod-pho-002_Page-01, for example, already uses all 27 characters.
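Because the caption label competes with the rest of the name for characters, it helps to compute how much room is left before choosing a label. This Python sketch (function names hypothetical) assumes a three-digit sequence number and applies the 27/31 limits from the Length section.

```python
MAX_BASE = 27
MAX_TOTAL = 31
FIXED = len("-001_")  # hyphen + three-digit sequence + underscore before the label


def max_label_length(filename: str, extension: str = "tif") -> int:
    """How many characters are left for a caption label after <filename>-<sequence>_."""
    used = len(filename) + FIXED
    return min(MAX_BASE - used, MAX_TOTAL - used - len("." + extension))


def labeled_filename(filename: str, sequence: int, label: str, extension: str = "tif") -> str:
    """Build <filename>-<sequence>_<label>.<extension>, rejecting names over the limits."""
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must fit in three digits")
    if len(label) > max_label_length(filename, extension):
        raise ValueError(f"label {label!r} is too long for {filename!r}")
    return f"{filename}-{sequence:03d}_{label}.{extension}"


print(max_label_length("yhm-spe-hod-pho"))                 # → 7
print(labeled_filename("yhm-spe-hod-pho", 2, "Page-01"))  # → yhm-spe-hod-pho-002_Page-01.tif
```

Applied to the directory example below, yhm-spe-hod-pho-001_Title-Page.tif (34 characters) would be rejected, since "Title-Page" needs 10 characters and only 7 are available.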

Finally, ensure the file has the appropriate file extension, which is ".tif" in most cases.

Ingesting Compound Objects using "Directory Structure"

If you want to upload compound objects to CONTENTdm using the "directory structure" method, use the following directory structure as prescribed by the CONTENTdm documentation.

  • yhm-spe-hod-pho (= name of the compound object)

    • scans (= image files to be uploaded, JPEGs or TIFFs)

      • yhm-spe-hod-pho-001_Title-Page.tif
      • yhm-spe-hod-pho-002_Page-1.tif
      • yhm-spe-hod-pho-003_Page-2.tif

    OPTIONAL DIRECTORIES

    • display (= customized images to be uploaded but not changed by CONTENTdm in any way)

      • yhm-spe-hod-pho-001_Title-Page.jpg
      • yhm-spe-hod-pho-002_Page-1.jpg
      • yhm-spe-hod-pho-003_Page-2.jpg
    • transcript (= ASCII or UTF-8 text)

      • yhm-spe-hod-pho-004.txt
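Assembling this layout by hand is error-prone for long books. The Python sketch below (function name hypothetical) copies prepared files into the folder names shown above; it does not upload anything or talk to CONTENTdm, and it assumes the files are already named according to these guidelines.

```python
import shutil
from collections.abc import Sequence
from pathlib import Path


def stage_compound_object(root: Path, object_name: str, scans: Sequence[Path],
                          display: Sequence[Path] = (),
                          transcripts: Sequence[Path] = ()) -> Path:
    """Copy files into <root>/<object_name>/{scans,display,transcript}/."""
    object_dir = Path(root) / object_name
    layout = (("scans", scans), ("display", display), ("transcript", transcripts))
    for subdir, files in layout:
        # "scans" is always created; the optional folders only when files are given.
        if subdir != "scans" and not files:
            continue
        target = object_dir / subdir
        target.mkdir(parents=True, exist_ok=True)
        for source in files:
            # copy2 keeps the original filename and file timestamps.
            shutil.copy2(source, target / Path(source).name)
    return object_dir


# Example call (paths are placeholders):
# stage_compound_object(Path("upload"), "yhm-spe-hod-pho", sorted(Path("tiffs").glob("*.tif")))
```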

File Naming Resources

(Reviewed: October 1, 2013)