Digitization of Video Files

Introduction

Analog video recordings have been produced in a multitude of physical formats, including 8mm, 16mm, and 35mm film, four sizes of reel-to-reel tape, and videocassettes in 8mm (Video8 and Hi8) and 1/2" (VHS, S-VHS, Betacam, and M-II), each requiring a specific player. Digital video camcorders have recently become popular, providing quality video at an affordable price, but even digital video camcorders initially store the data on magnetic tape. All of these media are vulnerable to physical decay. There are clear advantages in transferring video to a computer, including easier presentation on the Internet, distribution on DVD or CD, and consolidation from multiple physical formats. However, electronic storage of video presents problems for long-term preservation: uncompressed video files are extremely large, and formats have not been standardized.

To capture (transfer to computer) video from a digital video camcorder, the main requirements are the proper cable and a video editing application. Digitization from an analog format, such as VHS or 16mm film, requires analog-to-digital conversion, preferably using a video capture card (external video converters are also available, but the results are of lower quality).

Video Digitization Terms

Video is an especially complex multimedia format, requiring the synchronization of a series of still images with a soundtrack. Digitization of video requires a codec, an algorithm that determines how to encode and (usually) compress the data electronically. The file can then be stored in a number of formats. Uncompressed formats retain all of the original data, but they are extremely large files, unsuitable for presentation on the Web. In general, compressed formats compress each individual frame, then apply further compression by using data from the previous frame to determine how much needs to be stored in the current frame, resulting in much smaller files. It is important to realize that uncompressed files can later be compressed and stored in other formats for presentation, but the reverse is not true for lossy compressed files. Although it is technically possible to convert a compressed file back to an uncompressed format, the data lost in the initial compression will not be restored.
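
To make the size and compression points concrete, here is a minimal Python sketch. The frame dimensions, color depth, and frame rate (720x480, 24-bit, 30 frames per second) are illustrative assumptions, not values taken from this page, and the function names are hypothetical. The second function is only a toy picture of inter-frame compression: real codecs use motion estimation and transforms rather than a simple list of changed pixels.

```python
def uncompressed_size_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size in bytes of raw video frames (audio not included)."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps * seconds


def frame_delta(previous, current):
    """Return (index, new_value) pairs for the pixels that changed.

    Toy stand-in for inter-frame compression: only the differences from
    the previous frame are kept, not the whole current frame.
    """
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


# One minute of assumed 720x480, 24-bit, 30 fps video.
one_minute = uncompressed_size_bytes(720, 480, 24, 30, 60)
print(one_minute)  # 1866240000 bytes, roughly 1.87 GB

# Two tiny one-dimensional "frames": only two of six pixels change.
prev_frame = [10, 10, 10, 200, 200, 10]
curr_frame = [10, 10, 10, 10, 200, 200]
print(frame_delta(prev_frame, curr_frame))  # [(3, 10), (5, 200)]
```

Under these assumptions a single minute of uncompressed video approaches two gigabytes, while the second frame in the toy example can be described by two changed values instead of six.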

AVI: Audio Video Interleave, a storage format that specifies how video and audio are put together within the file. AVI can use a number of different codecs, including MPEG and DV, and may be compressed or uncompressed. It is an instance of Microsoft's RIFF (Resource Interchange File Format).
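
Because AVI is a RIFF file, it can be recognized from its first 12 bytes: the ASCII tag "RIFF", a 32-bit little-endian size, and the form type "AVI " (with a trailing space). The following is a minimal sketch of that check; the function names are hypothetical, and it inspects only the header, not the codec or the integrity of the rest of the file.

```python
import struct


def read_riff_header(data):
    """Parse the first 12 bytes of a RIFF file.

    Returns (form_type, declared_size), or None if the data is not RIFF.
    """
    if len(data) < 12:
        return None
    chunk_id, size, form_type = struct.unpack("<4sI4s", data[:12])
    if chunk_id != b"RIFF":
        return None
    return form_type.decode("ascii", errors="replace"), size


def looks_like_avi(data):
    """True if the header identifies a RIFF file with form type 'AVI '."""
    header = read_riff_header(data)
    return header is not None and header[0] == "AVI "


# Synthetic 12-byte header built in memory for demonstration.
sample = b"RIFF" + struct.pack("<I", 4) + b"AVI "
print(looks_like_avi(sample))  # True
```

For a file on disk, the same check could be applied to the result of reading 12 bytes from a file opened in binary mode.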

DV: An encoding format developed for camcorders by a consortium of manufacturers that included Sony. DV is high quality video, but it requires a lot of storage space, and its proprietary nature makes it unsuitable for archiving.

MPEG: The Moving Picture Experts Group has developed several codecs (see below), all of which can be saved in the MPEG, or .mpg, storage format.

MPEG-1: The first of several compression algorithms created by the Moving Picture Experts Group, designed for CD storage and medium-bandwidth applications.

MPEG-2: Not a replacement for MPEG-1, but an additional compression algorithm, designed for high-bandwidth and broadband applications, including digital television broadcast, HDTV, and DVD storage. Currently, this is the compressed format that provides the highest quality, and the one used for storage by some large archival institutions.

MPEG-4: A standard designed for handling several different streams of media information while supporting user interaction. It also includes a new compression algorithm that provides DVD-quality video for low-bandwidth applications, such as quick downloading and streaming from the Internet.

MPEG-7: This standard does not involve compression or content; rather, it's a metadata standard based on XML, providing the ability to annotate the video file, by shot or even by frame.

SMIL: Like MPEG-7, this standard does not involve compression or content. SMIL (Synchronized Multimedia Integration Language) is an XML-based language, published as a W3C Recommendation, for describing multimedia presentations: it specifies how separate media elements such as video, audio, images, and text are laid out and synchronized in time.

Motion JPEG 2000: The newest standard uses JPEG 2000, a compression algorithm based on wavelet technology that can operate in either lossless or lossy mode. This algorithm is applied to each individual frame, but unlike the MPEG formats, no inter-frame compression is applied. Because it can preserve the data without loss, Motion JPEG 2000 shows promise as an archival format, but it still requires a great deal of storage capacity.

QuickTime (QT): An API (Application Programming Interface) from Apple that plays video, and a proprietary format (.mov).

Windows Media: An API (Application Programming Interface) from Microsoft that plays video, and a proprietary format (.wmv).

For long-term storage, choose an archival format that offers LOTS:

FORMAT             FILE EXT.   L   O   T   S   ACCEPTABLE BEST PRACTICE?
AVI                .avi        +   +   +   +   YES
MPEG-2             .mpg        -   +   -   +   MINIMAL
MPEG-4             .mpg        -   +   -   +   NO
Motion JPEG 2000   .mj2        +   +   -   +   POSSIBLY
QuickTime          .mov        -   -   -   -   NO
RealPlayer         .rm         -   -   -   -   NO
Windows Media      .wmv        -   -   -   -   NO
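
The table above can also be held as data, which makes it easy to filter formats by the four LOTS criteria. The sketch below transcribes the "+" and "-" marks from the table as booleans; the dictionary layout and function names are hypothetical, and the ratings themselves are those of this page, not an independent assessment.

```python
# "+" in the table becomes True, "-" becomes False.
FORMATS = {
    "AVI":              {"ext": ".avi", "L": True,  "O": True,  "T": True,  "S": True},
    "MPEG-2":           {"ext": ".mpg", "L": False, "O": True,  "T": False, "S": True},
    "MPEG-4":           {"ext": ".mpg", "L": False, "O": True,  "T": False, "S": True},
    "Motion JPEG 2000": {"ext": ".mj2", "L": True,  "O": True,  "T": False, "S": True},
    "QuickTime":        {"ext": ".mov", "L": False, "O": False, "T": False, "S": False},
    "RealPlayer":       {"ext": ".rm",  "L": False, "O": False, "T": False, "S": False},
    "Windows Media":    {"ext": ".wmv", "L": False, "O": False, "T": False, "S": False},
}


def criteria_met(name):
    """Number of the four LOTS criteria marked '+' for a format."""
    return sum(FORMATS[name][key] for key in "LOTS")


def meets_all_lots(name):
    """True only if every LOTS column is marked '+'."""
    return criteria_met(name) == 4


print([name for name in FORMATS if meets_all_lots(name)])  # ['AVI']
print(criteria_met("Motion JPEG 2000"))  # 3
```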

Equipment

When selecting and using equipment for field work, keep the following points in mind:

Metadata for Digitized Video

Metadata provides information about resources. Along with the general forms of metadata recommended for linguistic resources, it is often useful to include technical metadata specific to video, including original medium (e.g., MiniDV), digitization date, digitization software used, etc.
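
As an illustration, the technical metadata fields mentioned above could be recorded in a small structured record and saved alongside the video file. This is only a sketch: the class name, field names, date, and software name are hypothetical placeholders, not part of any metadata standard.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class VideoTechnicalMetadata:
    """Hypothetical record of video-specific technical metadata."""
    original_medium: str          # e.g. "MiniDV"
    digitization_date: str        # ISO 8601 date string
    digitization_software: str    # name and version of the capture tool


record = VideoTechnicalMetadata(
    original_medium="MiniDV",
    digitization_date=date(2005, 1, 15).isoformat(),  # placeholder date
    digitization_software="ExampleCapture 1.0",       # hypothetical tool name
)

# Serialize to JSON so the record can be stored next to the video file.
print(json.dumps(asdict(record), indent=2))
```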



The content of this page was developed following the recommendations from The NINCH Guide to Good Practice, and Multimedia Format for the Linguistic Data, and Digital Audio, Audio for Video, and Digital Video.
