4.3 File Naming Conventions and Unique Identifiers

4.3.1     Care should be taken when discussing this subject to maintain the distinction between the persistent identifier used to refer to a work and the file naming conventions. In many practical systems there may well be links between the two. This section makes recommendations about file naming conventions. Data files managed in any given repository may include several types of data, not just audio. A Unique Identifier (UID) uniquely identifies a resource. This means that the identifier may change for each particular embodiment of the resource, and each copy of the resource has its own ID. It consequently means that UIDs are URLs. For the purposes of this discussion, file names will also be referred to as unique identifiers.

4.3.2     For linkages within and external to any system, the unique identifier is the primary key to managing audio data and all of its associated files, e.g. the master copies, playback copies, compressed versions of playback copies, metadata files, edit lists, accompanying texts, images, and versions of any one of those master files or derivatives. Therefore, unless the archive is using system-assigned ‘dumb’ identifiers, it is vitally important that the unique identifier’s structure is logically determined, clearly understood by those who have to apply it, and able to be read by people and machines. It is also important to reveal the connections between ‘families’ of data files: one commentator likens this connectivity to “the persistent ‘thread’ that enables resources to be re-tagged or re-stitched on the Web”. Talking in terms of ‘resources’ rather than collections is an important underlying concept in these guidelines.

4.3.3     One of the most powerful ways of constructing an identification system that reveals those connections is to base it on the concept of a Root ID (RID). The RID is the identifier of the entity. All the files and folders involved in the representation of the entity are derived from the RID by the addition of prefixes and suffixes, so as to create unique identifiers.
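The derivation described above can be sketched as follows. This is a minimal illustration only: the RID value, the role suffixes (`_m`, `_p`, `_md`), the `ARC-` prefix and the function names are all hypothetical and are not prescribed by these guidelines. An archive would substitute its own documented scheme.

```python
# Hypothetical role suffixes; an archive would define and document its own.
ROLE_SUFFIXES = {
    "master": "_m",     # preservation master copy
    "playback": "_p",   # access or playback copy
    "metadata": "_md",  # accompanying metadata file
}


def derive_identifier(rid: str, role: str, extension: str, prefix: str = "") -> str:
    """Build a file name from a Root ID by adding a prefix and a role suffix."""
    if role not in ROLE_SUFFIXES:
        raise ValueError(f"unknown role: {role!r}")
    return f"{prefix}{rid}{ROLE_SUFFIXES[role]}.{extension}"


def root_of(identifier: str, prefix: str = "") -> str:
    """Recover the Root ID from a name produced by derive_identifier."""
    stem = identifier.rsplit(".", 1)[0]       # drop the file extension
    if prefix and stem.startswith(prefix):
        stem = stem[len(prefix):]             # drop the optional prefix
    return stem.rsplit("_", 1)[0]             # drop the role suffix


if __name__ == "__main__":
    rid = "A0001234"  # illustrative Root ID
    for role, ext in [("master", "wav"), ("playback", "mp3"), ("metadata", "xml")]:
        print(derive_identifier(rid, role, ext))
    # prints:
    # A0001234_m.wav
    # A0001234_p.mp3
    # A0001234_md.xml
```

Because every derived name carries the RID, the ‘family’ of files belonging to one entity can be gathered by comparing the result of `root_of` for each file, without consulting a separate lookup table.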

4.3.4     Regardless of whether identifiers have embedded intelligence or not, it is normal for computer-generated and computer-readable identifiers to use fixed-length codes as the primary key. This offers the following advantages:

4.3.4.1     They enable rules to be established for creating new unique identifiers.

4.3.4.2     They guarantee unambiguous recognition in the system (and for users who know the rules).

4.3.4.3     They permit validation of the code or components of the code.

4.3.4.4     They support searching, sorting and reporting.
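The advantages listed above can be illustrated with a short sketch. It assumes a purely hypothetical fixed-length format of one upper-case letter followed by seven digits; the pattern and the function names are illustrative and are not part of these guidelines.

```python
import re

# Assumed format for illustration: one upper-case letter plus seven digits.
RID_PATTERN = re.compile(r"[A-Z][0-9]{7}")


def is_valid_rid(code: str) -> bool:
    """Validate a code against the fixed-length rule."""
    return RID_PATTERN.fullmatch(code) is not None


def next_rid(last: str) -> str:
    """Create the next identifier in sequence, keeping the fixed length."""
    if not is_valid_rid(last):
        raise ValueError(f"not a valid RID: {last!r}")
    number = int(last[1:]) + 1
    if number > 9_999_999:
        raise OverflowError("numeric range of the code is exhausted")
    return f"{last[0]}{number:07d}"  # zero-padding preserves the length


if __name__ == "__main__":
    print(is_valid_rid("A0001234"))  # True
    print(is_valid_rid("A1234"))     # False: too short
    print(next_rid("A0000009"))      # A0000010
    # With fixed-length codes a plain text sort matches numeric order.
    print(sorted(["A0000010", "A0000002", "A0000001"]))
```

The zero-padding is what makes plain alphabetical sorting agree with numeric order; with variable-length codes such as `A2` and `A10` the two orders would differ.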

4.3.5     There has been a prolonged debate about the relative merits of dumb and intelligent or expressive unique identifiers. Most systems allocate a dumb identifier the moment data are saved. Such identifiers are quickly applied, require no human intervention, and their uniqueness is guaranteed. However, their randomness and arbitrariness mean that other ways have to be found to show how the different files generated in the life-cycle of a digital resource connect. A better way to do this is to use intelligent, expressive identifiers.
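The trade-off can be made concrete with a small sketch. It uses a random UUID as one example of a system-assigned dumb identifier; this choice, the `family_index` table and the function names are assumptions made for illustration, not a description of any particular system.

```python
import uuid

# With dumb identifiers, the relationships between files have to be recorded
# somewhere else. This in-memory table stands in for that external record.
family_index = {}


def new_dumb_identifier() -> str:
    """Return a random identifier that carries no meaning (32 hex characters)."""
    return uuid.uuid4().hex


def register(work: str, role: str) -> str:
    """Assign a dumb identifier to a file and note which work it belongs to."""
    uid = new_dumb_identifier()
    family_index.setdefault(work, {})[role] = uid
    return uid


if __name__ == "__main__":
    master = register("interview-0042", "master")
    playback = register("interview-0042", "playback")
    # Nothing in the two identifiers shows that they are related ...
    print(master, playback)
    # ... so the connection is recoverable only through the separate table.
    print(family_index["interview-0042"])
```

If the external table is lost or falls out of step with the files, the connection between the files is lost with it, which is the weakness the paragraph above describes.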