Today I’d like to think about the subject of dynamic links. I’m hoping to start off in a document management context, but it also opens up questions from a digital preservation point of view.
Very few of the ideas here are my own. Last December I heard Barbara Reed of Recordkeeping Innovation Pty Ltd speaking at the Pericles Conference, Acting on Change: New Approaches and Future Practices in Digital Preservation, as part of a panel about the risk assessment of complex objects. She made some insightful remarks that very much resonated with me.
She described dynamic links, or self-referential links, as machine-readable links. These are now familiar to many of us, particularly if we’re choosing to work in a cloud-based environment such as Google Drive or, more recently, Office 365 or SharePoint.
These environments make it very easy to create a dynamic link to a resource and share that link with others, e.g. colleagues in your organisation, or even external partners. It’s a grand way to enable and speed up collaboration. On a shared drive accessed through Windows Explorer, the limitation was that only one person could edit a document at a time; collaborators often got the message “locked for editing by another user”. With these new environments, multiple editors can work simultaneously, and dynamic links help to manage it.
Dynamic links don’t always depend on cloud storage, of course, and I suppose we can manage dynamic links just as well on our own local network. Spreadsheets can link to other documents, and links can be held in emails.
Well, it seems there might be a weakness in this way of working. Reed said that these kinds of links only work if the objects stay in the same place. That’s fine if nothing changes, but a change to the server configuration, such as to the network store, can alter that “same place” quite drastically.
If that is true, then the very IT architecture itself can be a point of failure. “Links are great,” said Reed, “but they presume a stable architecture.”
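To make this concrete, here is a minimal Python sketch. It assumes a link is stored simply as a file path, which stands in for a location-based link. The folder names `old_share` and `new_share` are hypothetical stand-ins for a network store before and after a server reconfiguration. The document survives the move, but the link to it does not.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional

def follow_link(link: str) -> Optional[str]:
    """Open the document a link points to; return None if the link is broken."""
    target = Path(link)
    return target.read_text() if target.exists() else None

with tempfile.TemporaryDirectory() as root:
    old_share = Path(root, "old_share")  # hypothetical network store before the change
    new_share = Path(root, "new_share")  # hypothetical store after reconfiguration
    old_share.mkdir()
    doc = old_share / "minutes.txt"
    doc.write_text("Board minutes")

    link = str(doc)              # the link records a location, not an identity
    before = follow_link(link)

    shutil.move(str(old_share), str(new_share))  # IT relocates the whole store
    after = follow_link(link)
    still_there = (new_share / "minutes.txt").exists()

print(before)       # Board minutes
print(after)        # None
print(still_there)  # True: the document survived; only the link broke
```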
Part of the weakness could be the use of URLs for creating and maintaining these links. Reed said she has worked in places where there are no protocols for unique identifiers (UIDs), and instead it was more common to use URLs, which are based on storage location.
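Python’s standard `urllib.parse` module shows why a URL is tied to storage location: once it is parsed, most of its parts say where the resource lives. The link below is a made-up example of the kind a collaboration platform might generate.

```python
from urllib.parse import urlparse

# A made-up link; the host and folder names are hypothetical.
link = "https://intranet.example.org/sites/finance/Shared%20Documents/budget.xlsx"
parts = urlparse(link)

print(parts.scheme)  # https
print(parts.netloc)  # intranet.example.org  (which server)
print(parts.path)    # /sites/finance/Shared%20Documents/budget.xlsx  (where on that server)
```

Rename the server or reorganise the folders, and the same string no longer leads to the document.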
The problem scales up to larger systems, such as an Electronic Document and Records Management System (EDRMS), and to digital repositories generally. Many an EDRMS anticipates sharing and collaboration when working with live documents, and may have a built-in link generator for this purpose.
But when a resource is moved out of its native environment, you run the risk of breaking the links. Vendors of systems often have no procedure for this, and will simply recommend a redirect mechanism. We can’t seem to keep, or preserve, this dynamism. “This is everyone’s working environment now,” said Reed, “and we have no clear answer.”
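A redirect mechanism can be sketched as a lookup table from old locations to new ones. This is a hypothetical illustration, not any vendor’s product, and the server names are invented. It shows the maintenance burden: every move adds another entry, and after a few migrations a link may have to hop through a chain of redirects. The table itself then becomes something that has to be kept alive indefinitely.

```python
from typing import Dict

def resolve(url: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow redirects from an old location to the current one."""
    start = url
    hops = 0
    while url in redirects:
        if hops == max_hops:  # guards against loops and runaway chains
            raise RuntimeError(f"too many redirects from {start}")
        url = redirects[url]
        hops += 1
    return url

# Hypothetical: one document's three successive locations.
redirects = {
    "http://fileserver01/projects/report.docx": "http://fileserver02/projects/report.docx",
    "http://fileserver02/projects/report.docx": "https://intranet.example.org/sites/projects/report.docx",
}

print(resolve("http://fileserver01/projects/report.docx", redirects))
# https://intranet.example.org/sites/projects/report.docx
```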
There is a glimmer of hope, though, and it seems to involve using UUIDs instead of URLs. I wanted to understand this a bit better, so I did a small amount of research as part of a piece of consultancy I was working on; very coincidentally, the client wanted a way to maintain the stability of digital objects migrated out of an EDRMS into a digital preservation repository.
URLs vs UUIDs
From what I understand, URLs and UUIDs are two fundamentally different methods of identifying and handling digital material. The article On the utility of identification schemes for digital earth science data: an assessment and recommendations (Duerr, R.E., Downs, R.R., Tilmes, C. et al. Earth Sci Inform (2011) 4: 139. doi:10.1007/s12145-011-0083-6) offers the following definitions:
A Uniform Resource Identifier (URI), of which the familiar Uniform Resource Locator (URL) is one type, is a naming and addressing technology that uses “a compact sequence of characters to identify” World Wide Web resources.
A Universally Unique Identifier (UUID) is a 16-byte (128-bit) number, as specified by the Open Software Foundation’s Distributed Computing Environment. Written out, a UUID is 36 characters long: 32 hexadecimal digits arranged in five hyphen-separated groups of 8, 4, 4, 4 and 12 digits, in the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
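Python’s standard `uuid` module can generate one and confirm the layout described above. `uuid.uuid4()` produces a random (version 4) UUID. The regular expression is my own check of the 8-4-4-4-12 pattern and is not part of any standard library.

```python
import re
import uuid

identifier = uuid.uuid4()  # random 128-bit UUID (version 4)
text = str(identifier)     # canonical 36-character string form, lowercase hex

# My own check: 32 hexadecimal digits in five hyphen-separated groups, 8-4-4-4-12
PATTERN = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

print(text)                           # different on every run
print(len(text))                      # 36
print(len(text.replace("-", "")))     # 32
print(len(identifier.bytes))          # 16 (bytes, i.e. 128 bits)
print(bool(PATTERN.fullmatch(text)))  # True
```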
As I understand it, this is how it applies to the subject at hand:
URLs, which is how dynamic links tend to be expressed, will only continue to work if the objects stay in the same places and the environment is stable. Reconfiguring a server is one profound change that can break this.
UUIDs are potentially a more stable way of identifying objects, because the identifier says nothing about where an object is stored. They require less maintenance while still ensuring integrity. According to the article:
“An organization that chooses to use URIs as its identifiers will need to maintain the web domain, manage the structure of the URIs and maintain the URL redirects (Cox et al. 2010) for the long-term.”
“Unlike DOIs or other URL-based identification schemes, UUIDs do not need to be recreated or maintained when data is migrated from one location to another.”
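Here is a rough sketch of how that can work. It assumes a simple in-memory registry; the `Registry` class, its methods and the location strings are hypothetical and do not represent any particular EDRMS or repository API. People share the UUID, and only the registry records where the object currently lives. A migration updates one registry entry, and the identifier already in circulation stays the same. Something still has to maintain that mapping. What the UUID removes is the need to recreate the identifier itself.

```python
import uuid
from typing import Dict

class Registry:
    """Hypothetical resolver: persistent UUIDs mapped to current storage locations."""

    def __init__(self) -> None:
        self._locations: Dict[uuid.UUID, str] = {}

    def register(self, location: str) -> uuid.UUID:
        """Mint a new UUID for an object held at `location`."""
        uid = uuid.uuid4()
        self._locations[uid] = location
        return uid

    def migrate(self, uid: uuid.UUID, new_location: str) -> None:
        """Record that an object has moved; its UUID is left untouched."""
        if uid not in self._locations:
            raise KeyError(f"unknown identifier {uid}")
        self._locations[uid] = new_location

    def resolve(self, uid: uuid.UUID) -> str:
        """Look up where the object currently lives."""
        return self._locations[uid]

registry = Registry()
uid = registry.register("edrms://finance/policy.docx")  # hypothetical EDRMS location
shared_link = str(uid)  # the identifier colleagues are given

# Migration into a (hypothetical) preservation repository: one registry update.
registry.migrate(uid, "archive://preservation/objects/policy")

print(registry.resolve(uuid.UUID(shared_link)))  # archive://preservation/objects/policy
```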
What This Means For Digital Preservation
I think it means that digital archivists need to understand this basic difference between URLs and UUIDs, especially when communicating their migration requirements to a vendor or other supplier. Otherwise, there is a risk that this requirement will be misunderstood as a simple redirection mechanism, which it isn’t. For instance, one vendor offering an export service states online that:
“It is best to utilize a redirection mechanism to translate your old links to the current location in SharePoint or the network drive.”
Redirection feels to me like a short-term fix, one that extends the shelf-life of dynamic links but does nothing to stabilise the volatile environment. Conversely, UUIDs will give us more stability, and will not need to be recreated or maintained in future. This approach feels closer to digital preservation; indeed, I am fairly certain that a good digital preservation system manages its objects using UUIDs rather than URLs.
UUIDs might be more time-consuming or computationally expensive to create – I honestly don’t know if they are. But that 36-character reference looks like a near-unbreakable machine-readable way of identifying a resource, and I would tend to trust its longevity.
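On the cost question, an informal check is easy to run. This sketch uses Python’s standard `uuid` and `time` modules; the batch size of 100,000 is an arbitrary choice. It generates a batch of random UUIDs, times the run and confirms that none repeat. Timings vary by machine, so I haven’t quoted a figure.

```python
import time
import uuid

COUNT = 100_000  # arbitrary batch size for an informal test

start = time.perf_counter()
batch = [uuid.uuid4() for _ in range(COUNT)]
elapsed = time.perf_counter() - start

print(f"generated {COUNT} UUIDs in {elapsed:.3f} s")
print(len(set(batch)) == COUNT)  # True: no duplicates (a collision is astronomically unlikely)
```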
It also means that the conscientious archivist or records manager will at least want to be aware of changes to the network, or server storage, across their current organisation. IT managers may not regard these architecture changes as something that impacts on record-keeping or archival care. My worry is that the impact could be considerable, and we might not even know about it. The message here is to be aware of this vulnerability in your environment.
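One practical way to act on this is a simple, periodic link audit, so that breakage gets noticed early rather than discovered later. This is a minimal sketch. It assumes the links to check are local or network file paths kept in a plain list; the function name `audit_links` and the example paths are hypothetical. Checking web URLs would need an HTTP request instead.

```python
from pathlib import Path
from typing import Iterable, List

def audit_links(links: Iterable[str]) -> List[str]:
    """Return the links whose target no longer exists at the recorded location."""
    return [link for link in links if not Path(link).exists()]

if __name__ == "__main__":
    # Hypothetical links harvested from spreadsheets, emails or an EDRMS export.
    links = [
        "/mnt/shared/finance/budget.xlsx",
        "/mnt/shared/hr/policy.docx",
    ]
    for broken in audit_links(links):
        print("BROKEN:", broken)
```

Run after every announced server or storage change, a report like this would at least make the breakage visible.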