This document does not tell you how to digitize audio from analog sources such as magnetic tape and vinyl records. ITS and MPC staff can advise you on the hardware and software required for that. Some resources that might help are listed in the Digital Audio and Video Resources under "Digital Audio," such as Creating digital audio files: a step-by-step guide. See also the book Guidelines on the Production and Preservation of Digital Audio Objects, by Kevin Bradley, August 2004 (IASA-TC 04).
The purpose of this document is just to present the basics of digital audio and then to present the specifications for digital audio files for purposes of long-term preservation (archiving) and for submitting to CONTENTdm for Web delivery.
Audio quality is influenced by several factors:
Sample Rate
Bit Rate
Sample Size
Compression Level
Sample rate is the number of samples from a recording captured per second. That is, how many times per second the voltage of the analog signal is measured. The more samples taken, the higher the amount of data captured, and, in most cases, the higher the quality. For example, CD audio is sampled at a rate of 44,100 times per second (44.1 kHz). The following table describes benchmarks for audio quality. These offer gradually improving quality and a not so gradual increase in file size.
| Sample Rate | Quality | File Size |
|---|---|---|
| 96 kHz | Preservation | Super large |
| 48 kHz | DVDs | Very large |
| 44.1 kHz | CDs | Large |
| 22.05 kHz | Good | Medium |
| 11.025 kHz | Internet | Smallish |
To record high quality audio a sample rate of 44.1 kHz should be adequate.
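As a rough illustration of what a sample rate means, the sketch below measures a sine wave (standing in for the analog voltage) a fixed number of times per second. The function name `sample_tone` and the 440 Hz test tone are illustrative assumptions, not part of any particular recording tool.

```python
import math

def sample_tone(frequency_hz, sample_rate_hz, duration_s):
    """Measure a sine wave sample_rate_hz times per second for duration_s seconds."""
    total = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(total)]

# The same one-second tone captured at three of the benchmark rates.
for rate in (11_025, 22_050, 44_100):
    samples = sample_tone(440, rate, 1.0)
    print(f"{rate} Hz: {len(samples)} measurements for one second of sound")
```

Doubling the sample rate doubles the number of measurements, which is why file size grows in step with the rate.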
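To calculate the file size, use the formula given after the summary chart below, which combines the sample rate with the sample size, the number of channels, and the length of the recording.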
Sample size is usually expressed in terms of the bit depth of each sample made of a signal. That is, how many bits are used to record each sample: 8, 16, or 24 bits. This in turn determines how big the range is from the lowest to the highest level recorded for that signal. CD audio uses a 16-bit depth, which provides a large dynamic range of 65,536 possible values (0 to 65,535), whereas an 8-bit depth would provide only 256 values (0 to 255).
Higher bit depths increase the dynamic range of the signal and thus improve quality. Speech can usually be recorded at a bit depth of 8 without seriously affecting quality, but why take the risk of failing to capture something that might be usable by later technology? Record speech in mono mode but at a depth of 16 bits.
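The relationship between bit depth and range can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the decibel figure is the usual theoretical estimate (roughly 6 dB per bit), not a measurement of any real device.

```python
import math

def levels(bit_depth):
    """Number of distinct values a sample of this bit depth can hold."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range in decibels for this bit depth."""
    return 20 * math.log10(levels(bit_depth))

for bits in (8, 16, 24):
    print(f"{bits}-bit: {levels(bits):,} levels, "
          f"about {dynamic_range_db(bits):.0f} dB")
```

Each added bit doubles the number of levels, so moving from 8 to 16 bits multiplies the available gradations by 256.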
Bit Rate refers to how many bits of data are being transmitted per second. Lower bit rates result in smaller file sizes but poorer sound quality and higher bit-rates result in better quality but larger files.
The bit rate can be recorded in two ways - constant or variable.
A constant bit-rate records audio data at a set bit rate irrespective of the content. This produces a replica of an analog recording, even reproducing potentially unnecessary sounds. As a result, the files are significantly larger than those encoded with variable bit-rates.
A variable bit-rate creates smaller files by spending fewer bits on simple or silent passages and more bits on complex ones, so that data is not wasted where the listener would hear no difference.
| Quality | Sample Rate | Sample Size | Bit Rate | Transmission Time |
|---|---|---|---|---|
| CD | 44.1 kHz | 16-bit | 1411 kbps | 10 MB/min. |
| ? | ? | ? | 256 kbps | ? |
| Good CD | ? | ? | 192 kbps | 1.5 MB/min. |
| Near CD | ? | ? | 128 kbps | 1 MB/min. |
| FM | ? | ? | 64 kbps | 0.4 MB/min. |
The formula for calculating the bit rate of uncompressed audio is:
(Sample Rate) x (Bit Depth) x (# of Channels)
For CD audio: (44,100) x (16) x (2) = 1,411,200 bits per second (about 1,411 kbps)
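The bit-rate formula translates directly into code. The sketch below is only an illustration; `bit_rate_bps` is a hypothetical helper name, and the formula applies to uncompressed audio only.

```python
def bit_rate_bps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed audio in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd_stereo = bit_rate_bps(44_100, 16, 2)
print(cd_stereo)          # 1411200
print(cd_stereo / 1000)   # 1411.2 (kbps)

# Halving the channel count halves the bit rate.
print(bit_rate_bps(44_100, 16, 1))   # 705600
```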
In order to preserve any spatial separation of sounds in a recording, more than one microphone must be used to record separate streams of the signal. Two microphones are generally needed to capture stereo sound, and each microphone creates one channel. Using just one channel reduces the amount of data by 50% and thus produces a file 50% smaller than one that is recorded with 2 channels.
The following chart puts together these three fundamental components of digital sound.
| Sample Rate | Sample Size | Channels | Bit Rate | Quality | File Size |
|---|---|---|---|---|---|
| 96 kHz | 24-bit | 1 or 2 | ? | Preservation quality | Super large |
| 48 kHz | 16-bit | 1 or 2 | ? | DVD quality | Very large |
| 44.1 kHz | 16-bit | 2 (stereo) | ? | CD quality all types | Large |
| 44.1 kHz | 16-bit | 1 (mono) | ? | CD quality speech | Half size of 2-ch. |
| 22.05 kHz | 16-bit | 2 (stereo) | ? | Good all types | Medium |
| 22.05 kHz | 16-bit | 1 (mono) | ? | Good all types | Half size of 2-ch. |
| 22.05 kHz | 16-bit | 1 (mono) | ? | Good speech | For small file size, |
| 11.025 kHz | 16-bit | 2 (stereo) | ? | Passable all types | Smallish |
| 11.025 kHz | 16-bit | 1 (mono) | ? | Passable speech | Half size of 2-ch. |
| 11.025 kHz | 8-bit | 1 (mono) | ? | Passable speech | Half size of 16-bit. |
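The file size in bytes of an uncompressed recording is calculated as:
(Sampling Rate) x (Resolution) x (# of Channels) x (time in seconds) / (Bits / Byte)
For one minute of CD audio: (44,100) x (16) x (2) x (60) / 8 = 10,584,000 bytes (about 10 MB)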
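The file-size formula can be sketched the same way. The helper name `file_size_bytes` and the three example settings are illustrative choices, and the megabyte figures assume 1 MB = 1,000,000 bytes.

```python
def file_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of uncompressed audio in bytes (8 bits per byte)."""
    return sample_rate_hz * bit_depth * channels * seconds // 8

settings = [
    ("44.1 kHz, 16-bit, stereo", 44_100, 16, 2),
    ("44.1 kHz, 16-bit, mono", 44_100, 16, 1),
    ("11.025 kHz, 8-bit, mono", 11_025, 8, 1),
]
for label, rate, depth, channels in settings:
    size = file_size_bytes(rate, depth, channels, 60)
    print(f"{label}: {size:,} bytes per minute ({size / 1_000_000:.1f} MB)")
```

The mono row comes out at exactly half the stereo row, matching the "Half size of 2-ch." entries in the chart.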
The size of audio files is affected by all the factors mentioned so far: Sample Rate, Sample Size, Number of Channels, and Bit Rate. But the size of the resulting file can also be affected by the codec that is used to generate the file.
Codecs that do not try to compress the data can indeed provide very high quality sound, but they produce large files. Uncompressed formats such as Microsoft's WAVE PCM encoding (.wav) and Apple's Audio Interchange File Format (.aiff) are great formats to use to capture high quality audio streams, but each minute of a CD-quality stereo recording takes about 10 MB of disk space. Thus, these formats are not appropriate for files that are to be uploaded to HDR. For purposes of efficient delivery, it is better to submit compressed audio files to HDR.
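To see how an uncompressed WAV file's size follows from its settings, the sketch below writes a short 16-bit PCM tone with Python's standard `wave` module and reads the parameters back. The function name `make_wav`, the 440 Hz tone, and the in-memory buffer are assumptions made for illustration; a real capture would come from recording software.

```python
import io
import math
import struct
import wave

def make_wav(sample_rate_hz=44_100, channels=2, seconds=1, tone_hz=440):
    """Return the bytes of a 16-bit PCM WAV file holding a sine tone."""
    frames = bytearray()
    for n in range(sample_rate_hz * seconds):
        value = int(0.5 * 32767 * math.sin(2 * math.pi * tone_hz * n / sample_rate_hz))
        # Write the same sample to every channel.
        frames += struct.pack("<h", value) * channels
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()

data = make_wav()
with wave.open(io.BytesIO(data), "rb") as wav:
    audio_bytes = wav.getnframes() * wav.getnchannels() * wav.getsampwidth()
print(f"audio data: {audio_bytes:,} bytes; whole file: {len(data):,} bytes")
```

The audio data for one second is the sample rate times two bytes times the number of channels; the file itself is slightly larger because of the WAV header.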
In order to make digital audio files smaller, you could just reduce the sample rate and/or the bit rate and record mono instead of stereo, but that results in grainy, brittle, unacceptable audio quality you can easily hear in a side-by-side test. MP3 is more clever than that.
Compressed digital audio codecs generate very small file sizes, but at the expense of sound quality. The MP3, Ogg and Real Audio formats are popular compressed formats. They try to remove insignificant audio data, such as sounds above roughly 20,000 Hz, which most people cannot hear. The following chart lists a few audio formats that use lossy compression and some of their most common uses.
| Format | Bit Rate | Streaming Support | Popularity |
|---|---|---|---|
| MPEG 1 Layer 3 (.mp3) | Variable | Yes | Common on all platforms |
| Ogg Vorbis (.ogg) | Variable | Yes | Limited support |
| RealAudio (.ra) | Variable | Yes | Popular for streaming |
| Windows Media (.wma) | Variable | Yes | Primarily for Windows |
Since MP3 is the most common lossy audio format used today (2007), we offer the following chart, which compares the file sizes produced by MP3 encoding at various bit rates with an uncompressed WAV file. As you can see, by lowering the bit rate of an MP3 file, you can achieve quite high compression.
| Format | Bit Rate | Compression | Transmission | Quality |
|---|---|---|---|---|
| WAV | 1411 kbps | 1:1 | 10 MB/min. | CD quality sound (if 16 bits, 44.1 kHz) |
| MP3 | 256 kbps | 6:1 | ? | ? |
| MP3 | 160 kbps | 9:1 | 1.5 MB/min. | Good CD quality |
| MP3 | 128 kbps* | 11:1 | 1 MB/min. | Average CD quality |
| MP3 | 64 kbps | 22:1 | 0.4 MB/min. | FM radio quality music |
| MP3 | 32 kbps | ? | ? | AM radio quality music |
| MP3 | 16-24 kbps | ? | ? | AM radio quality spoken word |
*Most common bit rate for MP3s. Spoken word audio can easily be set at a lower bit rate and a high compression ratio will not affect the resulting quality nearly as much as it would a music recording.
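The compression ratios in the chart can be approximated by dividing the uncompressed CD bit rate by the MP3 bit rate. This is a minimal sketch with hypothetical helper names; it assumes 1 MB = 1,000,000 bytes, so its results will not match every rounded figure in the chart.

```python
CD_KBPS = 1411.2   # 44,100 x 16 x 2 bits per second, in kbps

def compression_ratio(compressed_kbps, uncompressed_kbps=CD_KBPS):
    """How many times smaller the compressed stream is than the original."""
    return uncompressed_kbps / compressed_kbps

def megabytes_per_minute(kbps):
    """Data produced per minute of audio at a given bit rate."""
    return kbps * 1000 * 60 / 8 / 1_000_000

for kbps in (256, 160, 128, 64, 32):
    print(f"{kbps} kbps: about {compression_ratio(kbps):.0f}:1, "
          f"{megabytes_per_minute(kbps):.2f} MB/min.")
```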
There are some codecs that do a form of lossless compression, such as WMA9 (Windows Media Audio 9 Lossless) and FLAC (Free Lossless Audio Codec). These codecs compress the sound data in a way that results in a smaller sound file but one that has exactly the same quality as the original file. The resulting file is larger than one generated by a lossy compression scheme, but its sound quality will be better.
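What "lossless" means can be shown with a round trip: compress, decompress, and compare with the original. The sketch below uses Python's general-purpose `zlib` module purely as a stand-in; it is not an audio codec, and FLAC or WMA Lossless work differently and typically compress audio better. The helper `pcm_tone` is hypothetical.

```python
import math
import struct
import zlib

def pcm_tone(sample_rate_hz=44_100, seconds=1, tone_hz=440):
    """Raw 16-bit mono PCM bytes for a sine tone."""
    return b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * tone_hz * n / sample_rate_hz)))
        for n in range(sample_rate_hz * seconds)
    )

original = pcm_tone()
packed = zlib.compress(original, 9)     # stand-in for a lossless audio codec
restored = zlib.decompress(packed)

print(f"original: {len(original):,} bytes, compressed: {len(packed):,} bytes")
print("identical after round trip:", restored == original)
```

A lossy codec would fail the final comparison, because the discarded data cannot be recovered.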
Streaming audio avoids many of the problems of large audio files. Instead of having to wait for the entire file to download, you can listen to the sound as the data arrives at your computer.
Streaming audio players store several seconds' worth of data in a buffer before beginning playback. The buffer absorbs the bursts of data as they are delivered by the Internet and releases the data at a constant rate for smooth playback.
Many digital audio formats can be streamed by wrapping them in a streaming format, such as Microsoft's ASF (Active Streaming Format), which can be used to stream MS Audio, MP3 and other formats.
Some popular streaming audio systems are:
QuickTime (QuickTime format)
Windows Media Services (.asf format)
Helix (RealAudio format)
Icecast (MP3 and Ogg Vorbis formats)
Shoutcast (MP3 format)
(For links to these systems, see "Streaming Media Systems" in Digital Audio and Video Resources.)
If your MP3s are going to be streamed over a live connection, keep in mind that higher bit rates require faster connections; otherwise, the sound is going to "break up" in transit because there is not enough bandwidth.
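A quick way to judge whether a bit rate suits a connection is to compare the two, leaving some margin for network overhead. In the sketch below the function name, the 80% headroom figure, and the example connection speeds are all assumptions chosen for illustration.

```python
def can_stream(bit_rate_kbps, connection_kbps, headroom=0.8):
    """True if the stream fits within a fraction (headroom) of the connection speed."""
    return bit_rate_kbps <= connection_kbps * headroom

# Hypothetical connection speeds in kbps.
connections = {"dial-up modem": 56, "basic DSL": 768}

for name, speed in connections.items():
    for bit_rate in (32, 128, 256):
        verdict = "ok" if can_stream(bit_rate, speed) else "likely to break up"
        print(f"{bit_rate} kbps over {name} ({speed} kbps): {verdict}")
```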
Conversion between digital audio formats can be complex. If you are producing audio content for Internet distribution (as for HDR), a lossless-to-lossy (e.g. WAV to MP3) conversion will significantly reduce bandwidth usage. Only lossless-to-lossy conversion is advised. Lossy-to-lossy conversion will further degrade audio quality by removing additional data, producing unpredictable results.
It doesn't make any sense to apply lossless compression to a file that was previously encoded using a lossy codec. Such an action most likely will only result in a larger sound file, without any gain in sound quality.
Spoken word audio does not require the high sample rate or bit rate that music does. You can save both storage space and download time by recording spoken word audio in mono and compressing it at a lower sample rate and bit rate than you would normally use for music. Some common codecs used for highly compressing spoken word recordings are DSP Truespeech (8 kHz, 1 bit, mono) for monotone dictation, L&H SBC (8 kHz, 16 bit, mono) for phone quality speech, and MS ADPCM (8 kHz, 4 bit, mono) for talk radio quality. However, decoders for these codecs are not as commonly available to users as those for the more popular MP3.
Single-voice music recordings may be recorded in mono rather than stereo sound, which immediately decreases the file size by half.
Though the spoken word and music have very different dynamic ranges, they should both be captured with equal care to quality if you are recording them for long-term purposes. However, as you downsample them for ease of delivery, music recordings lose quality faster than spoken word recordings do.
Digital audio files streamed over the Web must be saved in a streaming format such as RealAudio (.ra) or Windows Media Audio (.wma) and hosted on a streaming server. MP3 files do not stream very well. They need to be downloaded before playing.
Do not record at a volume setting so high that the loudest signals are recorded at a dB (decibel) level beyond 0 dB, which is the highest possible level the recording device can reliably handle. Signals with dB levels above 0 dB will be clipped and can cause extreme distortion.
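Clipping can be checked after the fact by looking at the peak level of the recorded samples. The sketch below assumes 16-bit signed integer samples and measures level relative to digital full scale (often written dBFS), which is taken here to be the "0 dB" referred to above; both function names are hypothetical.

```python
import math

def peak_dbfs(samples, bit_depth=16):
    """Peak level in dB relative to full scale (0 dB is the maximum)."""
    full_scale = 2 ** (bit_depth - 1)
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")   # digital silence
    return 20 * math.log10(peak / full_scale)

def count_clipped(samples, bit_depth=16):
    """Count samples sitting at the very top or bottom of the range."""
    top = 2 ** (bit_depth - 1) - 1
    bottom = -(2 ** (bit_depth - 1))
    return sum(1 for s in samples if s >= top or s <= bottom)

quiet = [0, 8000, -16384, 12000]
hot = [0, 32767, -32768, 32767, 100]
print(f"quiet take: peak {peak_dbfs(quiet):.1f} dB, {count_clipped(quiet)} clipped")
print(f"hot take: peak {peak_dbfs(hot):.1f} dB, {count_clipped(hot)} clipped")
```

A run of samples pinned at the limits suggests the input level was set too high and the take should be repeated at a lower setting.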
To optimize digital audio quality (for preservation or for creating a very high quality, short clip), you should…
Use software designed to capture audio from the kind of source you are recording from
Use an uncompressed file format (WAV, BWF, or AIFF) or a losslessly compressed one (FLAC)
Record the data at a minimum sampling rate of 44.1 kHz and a bit depth of 24; if the clip must be compressed, use a bit rate of at least 192 kbps.
If the original is low quality (or even compressed), save a preservation copy in the WAV or AIFF file format to protect it from further degradation.
To optimize delivery of digital audio files over the Internet, you need to…
Shorten the clip as much as possible
Use a compressed file format (MP3, etc.)
Reduce the sample rate (22.05 kHz)
Reduce bit depth to 16.
Reduce the bit rate (128 kbps)
Digital Audio Digitization and Editing Software
Handbrake
Digital Audio Players:
View Digital Audio Specifications
*24 bit/96 kHz should be necessary only for recording extremely high fidelity musical recordings, not spoken word recordings nor most music recordings. Some hardware and software compatibility problems could arise later if 24 bit/96 kHz is used.
Chart Definitions:
Sample Rate - controls how many samples of sound are taken per second. The higher the number of samples taken per second, the better the resulting quality.
Bit Depth (= Sample Size/Sampling Resolution/Sampling Precision) - controls how many different gradations are possible per sample, measured in bits.
(Reviewed: October 1, 2013)