Archival Formats for Video Digitization

As digital camcorders become more prevalent, a growing number of videos are being created in digital format on magnetic tape and then transferred to computers. However, it is not yet clear that servers are suitable for storing video files. For long-term preservation, uncompressed formats are preferred because they avoid loss of data, but uncompressed video files are extremely large. In a recent example, one minute of video from a linguistic project was digitized. As an uncompressed AVI file, it occupied 214 megabytes, compared to 10.4 megabytes in MPEG-1 format and 12.4 megabytes in WMV format. At more than 200 megabytes per minute, a single hour of uncompressed video would require more than 12 gigabytes. Common wisdom now says that storage is cheap, but it may not be cheap enough for large amounts of uncompressed video. Furthermore, digital video standards are still evolving. The Library of Congress and the Survivors of the SHOAH Visual History Foundation have both been using Digital Betacam tapes for archival purposes. Of course, this approach entails periodic copying onto fresh tapes.
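The storage arithmetic above can be reproduced with a short Python sketch. The per-minute figures are the ones reported in the text; the function name `storage_gb` and the choice of 1000 megabytes per gigabyte are assumptions made for illustration only.

```python
# File sizes in megabytes for one minute of video, as reported in the text.
MB_PER_MINUTE = {
    "Uncompressed AVI": 214.0,
    "MPEG-1": 10.4,
    "WMV": 12.4,
}


def storage_gb(mb_per_minute: float, minutes: float, mb_per_gb: float = 1000.0) -> float:
    """Estimate storage in gigabytes for a recording of the given length.

    mb_per_gb defaults to 1000 (decimal gigabytes); pass 1024 for binary units.
    This is a linear extrapolation and assumes a constant data rate.
    """
    return mb_per_minute * minutes / mb_per_gb


if __name__ == "__main__":
    # Extrapolate each one-minute sample to a full hour of video.
    for name, rate in MB_PER_MINUTE.items():
        print(f"{name}: {storage_gb(rate, 60):.2f} GB per hour")
```

With either decimal or binary gigabytes, the uncompressed figure stays above 12 gigabytes per hour, while the two compressed formats stay below one gigabyte per hour.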

Although uncompressed formats are optimal for archiving, compressed video formats might well be adequate for most linguistic research, since the crucial information is aural, not visual. At this time, MPEG-2 is one of the best formats for digital storage. When saving the file in MPEG-2 format, be sure to set the Quality to high. Note that MPEG compresses individual frames with a JPEG-like method and also compresses across frames over time; both compression processes cause loss of data.

A newer codec, Motion JPEG 2000, was recommended in 2004 by the Dance Heritage Coalition for digital preservation of dance videos, after a careful comparison of the existing formats. In 2005, the Library of Congress chose to begin digitizing its audiovisual collections in this format. Motion JPEG 2000 uses wavelet technology to perform lossless compression on each frame (that is, each individual digital image) of the video; it does not apply inter-frame compression. This means that Motion JPEG 2000 can preserve video without loss of quality, but it also means that the resulting files are still extremely large. Motion JPEG 2000 may not be practical for smaller projects at this time, but it should be considered as an alternative to lossy compression schemes.

Whichever format is chosen for digitization, the original tape should be saved as well, because future technological advances may make it possible to re-digitize the video in a higher-quality, uncompressed format.

Extraction of Audio

There is a second problem with digitizing video of endangered-language data: camcorders are primarily designed to record video, not audio, and their built-in microphones are generally unsuitable for linguistic research. Higher-quality microphones may be attached to the camcorder, or, when possible, a separate sound recording might be made. Make sure that the camcorder is set to capture audio at 48 or 44.1 kHz (many cameras come with a default setting of 32 kHz). If the only audio available is from the video recording, it is important to extract it from the video and save it separately, because when video is saved in MPEG-2 or MPEG-4 format, the audio is also compressed with a lossy codec (typically an MPEG audio format such as MP3 or AAC).
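As one possible way to carry out the extraction step, the sketch below builds a command for the ffmpeg command-line tool that copies the audio track of a captured video into an uncompressed WAV file. The use of ffmpeg is an assumption (the source does not name a tool), and the function names and file names are hypothetical. The extraction should be run on the original capture, before the video is saved in a compressed format.

```python
import subprocess
from typing import List


def build_extract_command(video_path: str, wav_path: str, sample_rate: int = 48000) -> List[str]:
    """Build an ffmpeg command that writes the audio track as uncompressed PCM.

    Assumes ffmpeg is installed and on the PATH. The paths are placeholders.
    """
    return [
        "ffmpeg",
        "-i", video_path,          # input video file
        "-vn",                     # drop the video stream
        "-acodec", "pcm_s16le",    # 16-bit uncompressed PCM audio
        "-ar", str(sample_rate),   # output sample rate in Hz (48000 or 44100)
        wav_path,                  # output WAV file
    ]


def extract_audio(video_path: str, wav_path: str, sample_rate: int = 48000) -> None:
    """Run the extraction; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_command(video_path, wav_path, sample_rate), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(" ".join(build_extract_command("interview.avi", "interview.wav")))
```

Setting the sample rate to match the rate at which the camcorder recorded avoids resampling the audio during extraction.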

The content of this page was developed following the recommendations from The NINCH Guide to Good Practice, Multimedia Format for the Linguistic Data, Digital Audio, Audio for Video, Digital Video, and Digital Video Preservation Reformatting Project: A Report.

