10. Digital target formats and accuracy

As with all forms of digital technology, digital coding schemes are subject to ongoing development, so discussion of the most appropriate formats for preservation will also continue to evolve. Irrespective of the options available, however, several principles can be applied in choosing target formats.

  • File-based formats offer greater data security and integrity monitoring capability than do carrier-based formats containing data streams such as DAT, audio CD or Digital Betacam.
  • When transferring digital carrier-based content (for example from DAT or DV cassette formats) the resultant file must, when deemed appropriate, retain the coding scheme of the original data stream. Where this is not appropriate, for example where a lossy and proprietary coding scheme has been used (see section 11), a coding scheme should be chosen which preserves the integrity of the original.
  • An essential requirement of any archival file format is that coding schemes used for preservation purposes be openly defined, and not proprietary to a limited number of manufacturers.
  • Where there is little or no consensus throughout the archival community on the choice of target format for a given purpose, a repository must choose a format that it can be reasonably confident of supporting sustainably. This requires sufficient available resources, including expertise, as well as ongoing wider industry support for the format.
  • A repository must ensure that a chosen target format will retain the minimum required combination of primary and secondary information.
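
The integrity monitoring that file-based formats make possible is commonly implemented as a fixity check: a checksum is recorded when the file is created and recomputed later to detect any change. The following is a minimal sketch only. The choice of SHA-256 and the function names `sha256_of_file` and `verify_fixity` are assumptions made for illustration and are not prescribed by this document.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Reading in chunks keeps memory use low for large preservation masters.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_hex):
    """Return True if the file's current checksum matches the stored one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

In such a scheme, a repository would store the digest alongside the file at ingest and call `verify_fixity` at regular intervals; a result of `False` indicates that the file no longer matches the recorded state and should be restored from another copy.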

 

Comment:

Preservation master recordings are generally carried by a target format that consists of a single file, in which a container (wrapper) carries the primary sound or sound-and-picture information, together with secondary information like captions, subtitles, timecode, and other ancillary data. In some cases, however, the secondary information may be carried in what are sometimes called “sidecar” files. This approach is not uncommon for subtitles or captions, and may be used for such corollary materials as record labels.

For audio, the Broadcast WAVE (BWF) format has become a de facto standard. This format is officially recommended by the Technical Committee (see IASA-TC 04, 6.1.2.1). Broadcast WAVE files, like all WAVE files, cannot exceed 4 GB in size, and are limited to mono or two-channel stereo recordings. To accommodate greater amounts of audio data and multiple audio channels, the European Broadcasting Union has defined the RF64 BWF file, with a maximum file size of approximately 16 exabytes and up to 18 channels.

For digitisation of original analogue audio recordings, IASA recommends a minimum digital resolution of 48 kHz sampling rate at 24 bit word length, using linear pulse code modulation (LPCM) encoding. In heritage/memory institutions a resolution of 96 kHz / 24 bit has become widely adopted. Better transfers of the unintended parts of a sound document now (see section 8) will make the future removal of these artefacts by digital signal processing easier when making access copies. Because of the transient character of consonants, speech recordings must be treated like music recordings.
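
To illustrate how the resolutions above interact with the 4 GB limit on WAVE files, the data rate of uncompressed LPCM audio can be calculated as sampling rate × bytes per sample × number of channels. The sketch below is a rough estimate under stated assumptions: the limit is taken as 2^32 bytes, header and metadata overhead is ignored, and the function names are hypothetical.

```python
# Assumption: the WAVE size limit is treated as 2**32 bytes, and the small
# overhead of headers and metadata chunks is ignored.
WAVE_LIMIT_BYTES = 2 ** 32


def lpcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Data rate of uncompressed LPCM audio in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


def max_duration_seconds(sample_rate_hz, bit_depth, channels):
    """Approximate longest recording that fits under the WAVE size limit."""
    return WAVE_LIMIT_BYTES / lpcm_bytes_per_second(
        sample_rate_hz, bit_depth, channels
    )


def needs_rf64(duration_seconds, sample_rate_hz, bit_depth, channels):
    """True if the audio data alone would exceed the WAVE size limit."""
    size = duration_seconds * lpcm_bytes_per_second(
        sample_rate_hz, bit_depth, channels
    )
    return size > WAVE_LIMIT_BYTES


if __name__ == "__main__":
    for rate in (48_000, 96_000):
        hours = max_duration_seconds(rate, 24, 2) / 3600
        print(f"{rate} Hz / 24 bit stereo: about {hours:.1f} hours per file")
```

Under these assumptions, a two-channel 24 bit file holds roughly 4.1 hours of audio at 48 kHz and roughly 2.1 hours at 96 kHz, so longer continuous recordings at these resolutions would call for RF64 or for splitting across files.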

When primary information on disc and cylinder sound recordings is captured by non-contact optical scanning techniques, the scanning data itself may comprise the main element in the preservation master file, rather than a subsequently derived conventional audio bitstream.

In memory institutions, target formats for moving image preservation masters are in the early phases of implementation. For video, several institutions have been using a variant of the MXF wrapper standardised by SMPTE, with the picture signal encoded as lossless-compressed JPEG 2000. Meanwhile, other institutions are moving forward with the FFV1 lossless encoding, carrying the picture signal and accompanying soundtracks in wrappers such as QuickTime, Matroska, or AVI.

The most frequently selected target format for memory institution motion picture film scanning is DPX, standardised by SMPTE. At the same time, some archives are exploring approaches that will permit the carriage of synchronised sound and picture signals in the same wrapper, and/or the ability to incorporate additional colour and tonal data. These explorations entail the reformatting of the initially captured DPX picture signals (and soundtracks) into preservation master formats like those selected for video, e.g., lossless JPEG 2000 in MXF or FFV1 picture in QuickTime or Matroska.

In some circumstances it may not practically be possible to migrate audiovisual content. This could be due to specific integral functionality as encountered in video games for example, or the use of copy protection technology. Future access (and thus preservation) may therefore depend on the emulation of the original operating systems and/or application software.

Archives may acquire material in file-based forms, whose transcoding to archival formats may result in irreversible changes being made to the representation of the content. In such cases, authenticity and the promise of better transcoding methods in the future must be considered. The archive may choose to retain the original (as-acquired) file, as well as the transcoded version that is considered a better bet for long-term playability, or simply to transcode, retain the new copies, and delete the originals. The latter option may apply particularly in “edge” cases such as video clips that have been gathered as a part of a Web harvesting project.

In the very long term, further migration from any given format would seem inevitable. Therefore, as far as is possible, a repository must aim to ensure that future migration from any chosen target format will equally preserve the required primary and secondary information.