6: Preservation Target Formats and Systems

6.1.1 Introduction The following information on the management, long-term storage and preservation of digitally encoded audio is based on the premise that there is no ultimate, permanent storage medium, nor will there be in the foreseeable future. Instead, those managing digital audio archives must plan to implement preservation management and storage systems designed to support the processes that accompany the inevitable changes in format, carrier or other technologies. The rate and direction of technological change is something over which archives have no control and very little influence. The aim and emphasis in digital preservation is to build sustainable systems rather than permanent carriers. The choice of technological storage system is dependent on many factors, of which cost is but one. Though the type of technology selected for preserving a collection may differ according to the specific circumstances of the individual institution, the basic principles outlined here apply to any approach to the management and long-term storage of digital audio.

6.1.2 Data or Audio Specific Storage To manage and maintain digital audio effectively it is necessary to transform it to a standard data format. Data formats are the file types, such as .wav, BWF, or AIFF, which computer systems recognise. These files, unlike audio specific carriers, technologically define the limits of their own content and are generally encoded in such a way that a loss of data is recognised and remedied by the host system. IASA recommends the use of BWF as defined in Section 2.8 File Formats.

Audio specific recording formats which have been available in the past include DAT (Digital Audio Tape) and CD-DA (Compact Disc-Digital Audio). DAT, though once widely used for the remote or field recording of 16 bit, 48 kHz audio, is now an obsolete recording system. IASA recommends that any significant content recorded on DAT tape be transferred to a more reliable storage system in accordance with the guidance provided in section 5.5 Reproduction of Digital Magnetic Carriers.

The recordable compact disc can be used to record audio in either audio-only (CD-A or CD-DA) or data (CD-ROM) formats. In CD-DA format the encoded digital audio resembles an audio stream and so does not have the advantages of a closed file such as might be recorded on a CD-ROM formatted disc. In the latter, though, less data can be stored on the same amount of disc space. IASA does not recommend recording audio in CD-DA form as a preservation target format. There are considerable risks associated with using a recordable CD as a target format in any form, and those risks are outlined in Chapter 8 Optical Disks: CD/DVD Recordable. The ever reducing prices and increasing reliability of data management and storage systems make media specific storage approaches, such as CD-R, unnecessary, or at least uneconomic.
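The observation that data files define the limits of their own content can be illustrated with a short sketch. A RIFF/WAVE file (the container on which BWF is built) declares its own length in its header, and each chunk declares its own size, so a reader can compare what the file claims against what is actually present. The Python below is a minimal illustration only, not a validator: the function names scan_riff_chunks and build_minimal_wav are hypothetical, the test file is synthetic, and the 'bext' chunk written here is a zero-filled placeholder rather than real broadcast extension metadata.

```python
import io
import struct
from typing import BinaryIO, Dict


def scan_riff_chunks(stream: BinaryIO) -> Dict[str, object]:
    """Compare the length a RIFF/WAVE stream declares with its real length."""
    stream.seek(0, io.SEEK_END)
    actual_length = stream.tell()
    stream.seek(0)

    header = stream.read(12)
    if len(header) < 12:
        raise ValueError("too short to be a RIFF file")
    riff_id, declared_size, form_type = struct.unpack("<4sI4s", header)
    if riff_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    # Walk the chunk headers without reading the audio payload into memory.
    chunks = []
    offset = 12
    while offset + 8 <= actual_length:
        stream.seek(offset)
        chunk_id, chunk_size = struct.unpack("<4sI", stream.read(8))
        chunks.append((chunk_id.decode("ascii", "replace"), chunk_size))
        offset += 8 + chunk_size + (chunk_size & 1)  # chunks are word aligned

    return {
        # The RIFF size field excludes the 8-byte "RIFF" + size prefix.
        "declared_length": declared_size + 8,
        "actual_length": actual_length,
        "length_matches": declared_size + 8 == actual_length,
        "chunks": chunks,
        "has_bext": any(name == "bext" for name, _ in chunks),
    }


def build_minimal_wav(num_samples: int, with_bext: bool = False) -> bytes:
    """Build a tiny synthetic 16 bit, 48 kHz mono file (illustration only)."""
    fmt = struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, 1, 48000, 96000, 2, 16)
    samples = b"\x00\x00" * num_samples
    data = struct.pack("<4sI", b"data", len(samples)) + samples
    # Placeholder: 602 zero bytes stand in for a real bext chunk body.
    bext = struct.pack("<4sI", b"bext", 602) + bytes(602) if with_bext else b""
    body = b"WAVE" + bext + fmt + data
    return b"RIFF" + struct.pack("<I", len(body)) + body
```

Scanning a complete file should report matching lengths, while a file that has lost its final bytes reports a mismatch that a collection manager can act on. An audio-only stream such as CD-DA carries no equivalent self-description.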

6.1.3 Principles of Digital Preservation Digital Mass Storage Systems (DMSS) Principles The following information is based very closely on the practical aspects of Data Protection Strategies from the UNESCO Guidelines for the Preservation of Digital Heritage. It is modified only to reflect the possibility of systems that incorporate non-automated back up, and to reflect the single format concerns of audio digital preservation. The section is included with the kind permission of the author (Webb 2003:16.13).

6.1.4 Practical Aspects of Data Protection Strategies There is a reasonably standard suite of strategies used to manage data in long-term storage. Most are predicated on an assumption that the data carrier itself does not need to be preserved, only the data. The following comprises, in part, those strategies.

Allocation of responsibility: Someone must be given unambiguous responsibility for managing data storage and protection. This is a technical responsibility requiring a particular set of skills and knowledge as well as management expertise. For all collections, data storage and protection require dedicated resources and an appropriate plan, and those responsible must be accountable for these strategies. Even very small collections must have access to the necessary expertise and a dedicated person responsible for the task.

Appropriate technical infrastructure to do the job: Data must be stored and managed with appropriate systems and on an appropriate carrier. There are digital asset management systems or digital object storage systems available that meet the requirements of audio digital preservation programmes, some approaches to which are discussed below. Once requirements have been determined, they should be thoroughly discussed with potential suppliers. Different systems and carriers are suited to different needs and those chosen for preservation programmes must be fit for their purpose. The overall system must have adequate capabilities including:

Sufficient storage capacity: Storage capacity can be built up over time, but the system must be able to manage the amount of data expected to be stored within its life cycle.

As a fundamental capability, the system must be able to duplicate data as required without loss, and transfer data to new or ‘refreshed’ carriers without loss.

Demonstrated reliability and technical support to deal with problems promptly.

The ability to map file names into a file-naming scheme suitable for its storage architecture. Storage systems are based around named objects. Different systems use different architectures to organise objects. This may impose constraints on how objects are named within storage; for example, disk systems may impose a hierarchical directory structure on existing file names, different from those that would be used on a tape system. The system must allow, or preferably carry out, a mapping between system-imposed file names and existing identifiers.

The ability to manage redundant storage. As digital media have a small but significant failure rate, redundant copies of files at every stage are a necessity, especially in the final storage phase.

Error checking. A level of automated error checking is normal in most computer storage. Because audio and audio-visual materials must be kept for long periods, often with very low human usage, the system must be able to detect changes or loss of data and take appropriate action. At the very least the strategies in place must alert collection managers to potential problems, with sufficient time to allow appropriate action.

Technical infrastructure must also include means of storing metadata and of reliably linking metadata to stored digital objects. Large operations often find they need to set up digital object management systems that are linked to, but separate from, their digital mass storage system, in order to cope with the range of processes involved, and to allow metadata and work interfaces to be changed without having to change the mass storage.
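The requirements for redundant storage and error checking described above can be sketched in a few lines. One common approach, assumed here rather than prescribed by the text, is to record a checksum for every file when it enters storage and to audit each redundant copy against that record at intervals. The Python below is a minimal sketch: the function names, the choice of SHA-256 and the file names in demo() are all illustrative assumptions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large audio files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a checksum for every file below root, keyed by relative path."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit_copy(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Check one storage copy against the manifest and report any problems."""
    report: Dict[str, List[str]] = {"ok": [], "missing": [], "changed": []}
    for name, expected in manifest.items():
        candidate = root / name
        if not candidate.is_file():
            report["missing"].append(name)
        elif sha256_of(candidate) != expected:
            report["changed"].append(name)
        else:
            report["ok"].append(name)
    return report


def demo() -> Dict[str, List[str]]:
    """Simulate a primary store and a replica that has suffered damage."""
    with tempfile.TemporaryDirectory() as scratch:
        primary, replica = Path(scratch, "primary"), Path(scratch, "replica")
        for store in (primary, replica):
            store.mkdir()
            (store / "item_0001.wav").write_bytes(b"\x00\x01" * 512)
            (store / "item_0002.wav").write_bytes(b"\x02\x03" * 512)
            (store / "item_0003.wav").write_bytes(b"\x04\x05" * 512)
        manifest = build_manifest(primary)
        # Hypothetical damage: one file silently altered, one lost.
        (replica / "item_0001.wav").write_bytes(b"\x00\x01" * 511 + b"\xff\xff")
        (replica / "item_0002.wav").unlink()
        return audit_copy(replica, manifest)
```

A report with entries under "missing" or "changed" is the alert to collection managers that the text calls for; the damaged files can then be restored from a copy that still passes the audit.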

6.1.5 Philosophy of System Sustainability All technology, whether it be hardware or software, formats or standards, will eventually change as a result of market forces, performance requirements or other needs or expectations. The task of the audio archivist charged with maintaining digital and digitised audio content is to navigate a way through these technological changes such that the content of their collections is maintained for current and future users in a reliable and authentic form, in as cost-effective a way as can be managed.

6.1.6 Long Term Planning Long term planning for a digital audio archive involves more than just the technical standards for a data storage system. The technical issues must be carefully resolved, but the social and economic aspects of running a digital storage system are vital to ensuring continued access to the content. Long term planning should consider the following issues.

The sustainability of the raw data: That is, the retention of the byte-stream in its proper and logical order. The data in the storage system must be returned to the system without change or corruption. It is worth noting that computer systems expertise identifies a considerable risk in the maintenance and refreshment of data, and only a well managed and designed approach to IT will ensure adequate results.

Formats and ability to replay: Digital data is only useful in a sound archive if it can be rendered as audio in the future. The proper choice of file format ensures that the future sound archive can replay the content of the data files, or will be able to acquire the technology to migrate the files to a new format. Not incorporating a lossy compression algorithm in that format allows that future transformation process to occur without altering the original audio content.

Metadata, identification and long term access: All digital audio files must be identifiable and findable in order for that audio material to be used and the value of the content realised.

Economics and Sound Archives: This includes the continued viable existence of the institutions that support the data storage systems and repositories as well as those that own, manage, or gain value from the digital audio stored therein. The cost of maintaining a digital audio collection is ongoing, and there must be a plan and a budget that realistically provide for the long term preservation of collections. The cost of curating and managing the audio collections is also ongoing. Digital preservation is as much an economic issue as a technical one. The requirements of ongoing sustainability demand at their base a source of reliable funding, necessary to ensure that the constant, albeit potentially low level, support for the sustainability of the digital content and its supporting repositories, technologies and systems can be maintained for as long as it is required.

Storage, management and preservation alternatives: Given that the economic and technical environment may well be volatile, it is recommended that agreements be established between archives and institutions regarding the storage of data as archives of last resort. This would require some standard agreement about file formats and data organisation as well as the social and technical aspects of management of the content.

Tools, Software and long term planning: Hardware, software and systems are not things in themselves to be preserved, but are merely tools to support the task of preserving the content. The repository software DSpace, for example, does not describe itself as a preservation solution, but only as useful in “enabling institutions with a sustainable ability to retain information assets and offer services upon them.” (DSpace, Michael J. Bass et al. 2002). The repository software itself is a tool, as are the various components designed to aid in operation, simplify processes, and automate and validate the harvesting of metadata. Long term planning involves being able to change or upgrade systems without endangering the content.

6.1.7 Defining the Digital Object The audio file is only one part of the information that is to be preserved. The Reference Model for an Open Archival Information System (OAIS) identifies four parts to the digital object, which it describes as the information package. These are the content information and the preservation description information, which are bound together by the packaging information, with the resulting package made discoverable by virtue of the descriptive information.

Figure: Information Package concept and relationships

Though the information may be distributed across the storage system, it is well to remember that the conceptual package is the audio information, the ability to replay that audio, to know its provenance and to describe and find it. There may also be critical relationships between the one audio file and others in the collection, and these relationships are important to the use of the material and so must also be preserved.
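The four-part package and the relationships between items can be pictured as a simple data structure. The Python below is a conceptual sketch only: OAIS does not prescribe any implementation, and the class name, field names, identifiers and metadata values are all hypothetical. It models one package per audio item and checks that every relationship points at a package that actually exists in the collection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationPackage:
    """One conceptual package; its parts may live anywhere in the system."""
    identifier: str
    content_information: Dict[str, str]        # the audio and how to replay it
    preservation_description: Dict[str, str]   # e.g. provenance of the item
    packaging_information: Dict[str, str]      # how the parts are bound together
    descriptive_information: Dict[str, str]    # what makes the item findable
    related_to: List[str] = field(default_factory=list)


def dangling_relationships(
    packages: List[InformationPackage],
) -> Dict[str, List[str]]:
    """Return, per package, any related identifiers missing from the collection."""
    known = {package.identifier for package in packages}
    problems: Dict[str, List[str]] = {}
    for package in packages:
        missing = [target for target in package.related_to if target not in known]
        if missing:
            problems[package.identifier] = missing
    return problems


def demo() -> Dict[str, List[str]]:
    """Two sides of one hypothetical tape, one citing notes that were never ingested."""
    side_a = InformationPackage(
        identifier="tape-0042-side-a",
        content_information={"file": "tape-0042-a.wav", "format": "BWF"},
        preservation_description={"provenance": "transferred from open reel tape"},
        packaging_information={"layout": "one file per side"},
        descriptive_information={"title": "Interview, part 1"},
        related_to=["tape-0042-side-b"],
    )
    side_b = InformationPackage(
        identifier="tape-0042-side-b",
        content_information={"file": "tape-0042-b.wav", "format": "BWF"},
        preservation_description={"provenance": "transferred from open reel tape"},
        packaging_information={"layout": "one file per side"},
        descriptive_information={"title": "Interview, part 2"},
        related_to=["tape-0042-side-a", "tape-0042-notes"],
    )
    return dangling_relationships([side_a, side_b])
```

A non-empty result signals that a relationship on which the use of the material depends has not itself been preserved.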

6.1.8 The Open Archival Information System (OAIS) The Reference Model for an Open Archival Information System (OAIS) is a widely adopted conceptual model for a digital repository and archival system. The OAIS reference model provides a common language and conceptual framework that digital library and preservation specialists now share. The framework has been adopted as an International Standard, ISO 14721:2003. Though some critics identify shortcomings in the detail of the OAIS, the concept of constructing repository architectures in a form that corresponds with the OAIS functional categories is critical to the development of modular storage systems with interoperable exchange of content. The following sections of the Guidelines adopt the major functional components of the OAIS reference model to assist in the analysis of the available software and to develop recommendations for necessary development. There are a finite number of functions an archival digital repository must be able to perform in order for it to reliably and sustainably perform the purpose for which it is designed. These are defined in the Reference Model for an Open Archival Information System (OAIS) as Ingest, Access, Administration, Data Management, Preservation Planning and Archival Storage.

Figure: Open Archival Information System (OAIS)

The OAIS also defines the structure of the various information packages that are necessary for the management of the data, according to their place in the digital life cycle. These are the Submission Information Package (SIP), Dissemination Information Package (DIP) and Archival Information Package (AIP). A package is the conceptual parcel of the data and the relevant metadata and descriptive information necessary to the particular object. This object is conceptual only in the sense that the package contents may be dispersed in the system or collapsed into a single digital object. OAIS defines an information package as the Content Information and associated Preservation Description Information which is needed to aid in the preservation of the Content Information.

The SIP is an Information Package that is delivered to the system for ingest. It contains the data to be stored and all the necessary related metadata about the object. The SIP is accepted into the system and used to create an AIP.

The AIP is the Information Package which the system stores, preserves and sustains.

The DIP is the Information Package created to distribute the digital content. It has three roles in this system. The first is access, and this DIP would be in a form that the users can use and understand. The second is exchange for the purpose of distributing risk. An archival repository may choose to share parts of its content with other similar institutions, or with an organisation whose role is archival storage. In this case the DIP would contain all the relevant metadata necessary to undertake this role. The third is distributing content to archives of last resort. The scenario where a particular archive or institution no longer has the resources to maintain its collection is not difficult to imagine. A standard DIP for this purpose allows other similarly architected systems to undertake the role with the minimum of manual intervention.
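The flow from SIP to AIP to DIP described above can be sketched as two small functions. This is a conceptual illustration under stated assumptions, not an OAIS implementation: the Package class, the required metadata keys, the addition of a SHA-256 checksum at ingest and the choice of which metadata an access DIP carries are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Package:
    kind: str                  # "SIP", "AIP" or "DIP"
    identifier: str
    payload: bytes             # stands in for the audio data
    metadata: Dict[str, str]


# Assumed minimum metadata a submission must carry (illustrative only).
REQUIRED_ON_INGEST = {"title", "provenance"}


def ingest(sip: Package) -> Package:
    """Accept a SIP and create the AIP that the system will store and sustain."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    missing = REQUIRED_ON_INGEST - sip.metadata.keys()
    if missing:
        raise ValueError(f"SIP lacks required metadata: {sorted(missing)}")
    metadata = dict(sip.metadata)
    # Assumption: the repository records a checksum at the moment of ingest.
    metadata["fixity_sha256"] = hashlib.sha256(sip.payload).hexdigest()
    return Package("AIP", sip.identifier, sip.payload, metadata)


def disseminate(aip: Package, purpose: str) -> Package:
    """Create a DIP for access, exchange or transfer to an archive of last resort."""
    if aip.kind != "AIP":
        raise ValueError("a DIP is created from an AIP")
    if purpose == "access":
        # Users need only what helps them use and understand the item.
        metadata = {"title": aip.metadata["title"]}
    elif purpose in ("exchange", "last_resort"):
        # Another repository needs all the metadata to take over the role.
        metadata = dict(aip.metadata)
    else:
        raise ValueError(f"unknown purpose: {purpose}")
    return Package("DIP", aip.identifier, aip.payload, metadata)
```

For example, ingesting a SIP with a title and a provenance statement yields an AIP carrying those fields plus the checksum; disseminate(aip, "access") then returns a DIP with the title alone, while disseminate(aip, "exchange") carries the full metadata so that a similarly architected system can ingest it with little manual intervention.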

6.1.9 Trusted Digital Repositories (TDR) and Institutional Responsibility The technical specification of the digital storage environment is an important part of ensuring that the digital content that is managed is still accessible to researchers in the future. It is not, however, enough on its own to ensure that this will be achieved. The institution within which the digital archive resides has to be able to ensure that the content it manages is curated and maintained responsibly. In 2002 the Research Libraries Group (RLG) and the Online Computer Library Center (OCLC) jointly published “Trusted Digital Repositories: Attributes and Responsibilities” (TDR), which articulated a framework of attributes and responsibilities for trusted, reliable, sustainable digital repositories which were “required for an archive to provide permanent or indefinite long-term preservation of digital information”. These attributes include compliance with the OAIS reference model, organisational viability, financial sustainability, technological and procedural suitability, the security of the system and the existence of appropriate policies to ensure that steps are taken to manage and preserve the data. The practical instantiation of this is a document known as the “Trustworthy Repositories Audit and Certification (TRAC): Criteria and Checklist” (2007). Using this document an archival institution can establish whether the practices, approaches and technologies it has implemented or is planning to implement are appropriate to the permanent preservation of the digital information for which it has responsibility. The checklist addresses three main areas: organisational infrastructure; digital object management and technologies; and technical infrastructure and security.
The organisational infrastructure section provides a series of checks against appropriate governance and organisational viability, organisational structure and staffing, procedural accountability and policy framework, financial sustainability, and a consideration of licences and liabilities. The digital object management section considers the acquisition of content, the creation of an archivable package, planning for preservation, archival storage and planning, information management and access control. The third part of the checklist audits the system infrastructure, the use of technologies appropriate to the tasks, and system and institution security. The terminology used in the “Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist” is chosen to represent digital archives in the broadest sense of the word, and so the document’s meaning may occasionally appear opaque to an audio archivist. Nonetheless, the issues examined and tested by it are critical to the planning and management of a digital audio archive. It is strongly recommended that the digital sound archivist use the checklist to examine the suitability of an institution to manage a digital collection, or to identify weaknesses within an existing digital preservation strategy.

6.1.10 Audio Archives and Technical Responsibility Though a particular institution may be responsible for the management of a collection or set of audio items, it does not necessarily follow that the institution will undertake the responsibility for maintaining the digital storage system. An institution may instead become a part of a distributed storage system, or may identify a third party provider to archive its content in a more standard approach. A distributed data storage approach such as that being promoted and developed for web based material by Stanford University under the name of LOCKSS (Lots of Copies Keep Stuff Safe) replicates data in a number of places on the web. The system manages the data on the grid, and the risk of loss of data is reduced because the information can be found in many different places. Such a system is not appropriate for material which has access restrictions or copyright which prohibits dissemination. Such a system also requires that a development and management responsibility be shouldered by an institution. An institution may decide that it does not have the technical capability to undertake the development and management of a digital storage system. In this case it may establish a relationship with a third party provider. That provider may be another archive which will take the collection and store its content, or may be a commercial provider who will provide and manage the storage and content for a fee. The information provided here is presented as though the institution is intending to take on its own preservation. However, if any of the above alternatives are considered, then this information is useful for determining whether those approaches are reliable and valid.

6.1.11 Digital Repository Software, Data Management and Preservation Systems Digital repository software is generally that software which supports storage of and access to the digital content. It should incorporate indexing and metadata systems that manage information about the content, and a variety of tools to find and report on the content. Data management is the management of the byte stream, or data, for which the system is responsible. This may include backup procedures, multiple copies and changes. Preservation processes are those that ensure the content remains accessible in the long term, that the content is still meaningful and that the data management system’s tasks are documented and maintained. All three of these are necessary to achieve long term preservation of content.