3.3 Infrastructure

3.3.1 We do not need a ‘discographic’ metadata standard: a domain-specific solution would be an unworkable constraint. We need a metadata infrastructure that has a number of core components shared with other domains, each of which may allow local variations (e.g. in the form of extension schemas) that are applicable to the work of any particular audiovisual archive. The following essential qualities help to define the structural and functional requirements:
Versatility: For the metadata itself, the system must be capable of ingesting, merging, indexing, enhancing, and presenting to the user metadata from a variety of sources describing a variety of objects. It must also be able to define logical and physical structures, where the logical structure represents intellectual entities, such as collections and works, while the physical structure represents the physical media (or carriers) which constitute the source for the digitized objects. The system must not be tied to one particular metadata schema: it must be possible to mix schemas in application profiles (see 3.9.8) suited to the archive’s particular needs, though without compromising interoperability. The challenge is to build a system that can accommodate such diversity without needlessly complicating matters for low-threshold users or preventing more complex activities by those who require more room for manoeuvre.
Extensibility: The system must be able to accommodate a broad range of subjects, document types (e.g. image and text files) and business entities (e.g. user authentication, usage licenses, acquisition policies, etc.). It must allow extensions to be developed and applied, or ignored altogether, without breaking the whole. In other words, it must be hospitable to experimentation: implementing metadata solutions remains an immature science.
Sustainability: The system must be capable of migration, cost-effective to maintain, usable, relevant and fit for purpose over time.
Modularity: The systems used to create or ingest metadata, and to merge, index and export it, should be modular in nature, so that a component that performs a specific function can be replaced with a different component without breaking the whole.
Granularity: Metadata must be of sufficient granularity to support all intended uses. Metadata can easily be insufficiently granular, whereas it would be a rare case in which metadata were too granular to support a given purpose.
Liquidity: Write once, use many times. Liquidity will make digital objects and representations of those objects self-documenting across time. The metadata will work harder for the archive in many networked spaces and provide high returns on the original investments of time and money.
Openness and transparency: The system must support interoperability with other systems. To facilitate requirements such as extensibility, the standards, protocols, and software incorporated should be as open and transparent as possible.
Relational (hierarchy/sequence/provenance): The system must express parent-child relationships, correct sequencing (e.g. the scenes of a dramatic performance) and derivation. For digitized items, it must be able to support accurate mappings and instantiations of original carriers and their intellectual content to files. This helps ensure the authenticity of the archived object (Tennant 2004).
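
The distinction between logical and physical structure, together with the relational quality (hierarchy, sequence, provenance), can be illustrated with a minimal Python sketch. All class names, field names, identifiers and file names here are hypothetical and chosen only for illustration; a real archive would more likely express these relationships in a container format than in ad hoc code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Carrier:
    """Physical structure: an original medium, such as one tape."""
    carrier_id: str
    description: str


@dataclass
class FileInstance:
    """A digitized file that records which carrier it was derived from."""
    path: str
    source_carrier: Carrier
    start_seconds: float
    end_seconds: float


@dataclass
class Entity:
    """Logical structure: a collection, a work, or a part of a work."""
    title: str
    sequence: int = 0
    children: List["Entity"] = field(default_factory=list)
    files: List[FileInstance] = field(default_factory=list)

    def add(self, child: "Entity") -> "Entity":
        """Attach a child entity (parent-child relationship)."""
        self.children.append(child)
        return child

    def ordered_children(self) -> List["Entity"]:
        """Return children in their intended sequence."""
        return sorted(self.children, key=lambda c: c.sequence)


# Hypothetical example data.
tape = Carrier("TAPE-001", "Open-reel tape, side A")
play = Entity("Recorded play")
# Scenes are added out of order on purpose; the sequence number restores order.
scene2 = play.add(Entity("Scene 2", sequence=2))
scene1 = play.add(Entity("Scene 1", sequence=1))
scene1.files.append(FileInstance("tape001_a_01.wav", tape, 0.0, 612.5))
scene2.files.append(FileInstance("tape001_a_02.wav", tape, 612.5, 1405.0))

for scene in play.ordered_children():
    for f in scene.files:
        print(scene.title, "->", f.path, "from", f.source_carrier.carrier_id)
```

Keeping the carrier as a separate object, referenced from each file, means the provenance of every file can be traced back to its physical source without duplicating the carrier description.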

3.3.2 This recipe for diversity is itself a form of openness. If an open W3C (World Wide Web Consortium) standard such as Extensible Markup Language (XML), a widely adopted mark-up language, is selected, this will not prevent particular implementations from including a mixture of standards, such as the Material Exchange Format (MXF) and Microsoft’s Advanced Authoring Format (AAF) interchange formats.
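
As a rough illustration of how XML namespaces let one record mix a shared standard with a local extension, the following Python sketch reads only the Dublin Core elements from a record and ignores everything else. The Dublin Core namespace URI is the published one; the `record` wrapper element, the `http://example.org/archive/ext` namespace, the `shelfmark` element and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
LOCAL = "http://example.org/archive/ext"  # hypothetical local extension namespace

# A hypothetical record mixing Dublin Core with one local extension element.
RECORD = f"""
<record xmlns:dc="{DC}" xmlns:loc="{LOCAL}">
  <dc:title>Field recording, market day</dc:title>
  <dc:creator>Unknown recordist</dc:creator>
  <loc:shelfmark>C123/45</loc:shelfmark>
</record>
"""


def dublin_core_view(xml_text: str) -> dict:
    """Return only the Dublin Core elements, skipping unknown namespaces."""
    root = ET.fromstring(xml_text.strip())
    prefix = "{" + DC + "}"  # ElementTree writes qualified tags as {uri}name
    view = {}
    for element in root:
        if element.tag.startswith(prefix):
            view[element.tag[len(prefix):]] = (element.text or "").strip()
    return view


print(dublin_core_view(RECORD))
```

A consumer that knows nothing about the local extension still obtains a usable description, which is the sense in which extensions can be ignored without breaking the whole.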

3.3.3 Although MXF is an open standard, in practice metadata is commonly included in MXF files in a proprietary way. MXF has further advantages for the broadcast industry because it can be used to stream content in professional settings, whereas other wrappers only support downloading the complete file. The use of MXF for wrapping content and metadata would only be acceptable for archiving once any metadata represented in proprietary formats has been replaced by open metadata formats.

3.3.4 So much has been written and said about XML that it would be easy to regard it as a panacea. XML is not a solution in itself but a way of approaching content organisation and re-use; its immense power is harnessed by combining it with an impressive array of associated tools and technologies that continue to be developed in the interests of economical re-use and repurposing of data. As such, XML has become the de facto standard for representing metadata descriptions of resources on the Internet. A decade of euphoria about XML is now matched by the means to handle it, thanks to the development of many open source and commercial XML editing tools (see 3.6.2).

3.3.5 Although reference is made in this chapter to specific metadata formats that are in use today, or that promise to be useful in the future, these are not meant to be prescriptive. If the key qualities in section 3.3.1 are observed, and explicit, comprehensive and discrete records are maintained of all technical details, data creation and policy changes, including dates and responsibility, then future migrations and translations will not require substantial changes to the underlying infrastructure. A robust metadata infrastructure should be able to accommodate new metadata formats by creating or applying tools specific to that format, such as crosswalks (algorithms for translating metadata from one encoding scheme to another in an effective and accurate manner). A number of crosswalks already exist for formats such as MARC, MODS, MPEG-7 Path, SMPTE and Dublin Core. Besides moving metadata from one format to another, crosswalks can also be used to merge two or more different metadata formats into a third, or into a set of searchable indexes. Given an appropriate container/transfer format, such as METS, virtually any metadata format (MARC-XML, Dublin Core, MODS, SMPTE, etc.) can be accommodated. Moreover, this open infrastructure will enable archives to absorb catalogue records from their legacy systems, in part or in whole, while offering new services based on them, such as making the metadata available for harvesting (see OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting).
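
The idea of a crosswalk, and of using it to merge records from different sources, can be sketched in a few lines of Python. The mapping table below is deliberately simplified and illustrative: it is not a published MARC to Dublin Core crosswalk, and the function names, the tuple-based record layout and the sample values are assumptions made for the example.

```python
from collections import defaultdict

# Illustrative mapping only: (MARC tag, subfield code) -> Dublin Core element.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}


def marc_to_dc(marc_fields):
    """Translate (tag, subfield, value) tuples into Dublin Core elements.

    Returns the translated record and the fields that had no mapping,
    so that nothing is silently discarded during migration.
    """
    dc = defaultdict(list)
    unmapped = []
    for tag, subfield, value in marc_fields:
        element = CROSSWALK.get((tag, subfield))
        if element is None:
            unmapped.append((tag, subfield, value))
        else:
            dc[element].append(value)
    return dict(dc), unmapped


def merge(*records):
    """Merge several Dublin Core style records, dropping duplicate values."""
    merged = defaultdict(list)
    for record in records:
        for element, values in record.items():
            for value in values:
                if value not in merged[element]:
                    merged[element].append(value)
    return dict(merged)


# Hypothetical legacy catalogue record and a record already in Dublin Core.
legacy = [
    ("245", "a", "Market day songs"),
    ("100", "a", "Doe, Jane"),
    ("650", "a", "Folk songs"),
    ("300", "a", "1 sound tape reel"),
]
dc_record, leftovers = marc_to_dc(legacy)
native_dc = {"title": ["Market day songs"], "subject": ["Field recordings"]}
combined = merge(dc_record, native_dc)

print(combined)
print("Not mapped:", leftovers)
```

Reporting the unmapped fields, rather than dropping them, supports the record keeping recommended above: the archive can see exactly what a given translation did not carry across.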