Wednesday, October 22, 2014

Evaporating Web Records

You’re hip, you’re cool – ‘business enablement’ is your middle name, and you’ve got social media accounts, blogs, forums, Atom/RSS feeds and wikis rocking and rolling. As Chief Records Officer (CRO) you help your agency move into the 21st Century full speed ahead.

Except. Things are never that easy, and what we find is that web records are sadly missing.

We know that a short URL was used in this tweet, but two years later the URL has been re-used and no longer points where it did when we created the tweet. Worse yet, we relied on identifying records at creation and we missed one: it never got recorded, and Facebook’s API is refusing to give it to us.

The wiki system was migrated to a new platform and the old edit history has been lost – worse, the new system tracks comments in a different form, and they have been lost too.

An ill-considered rollout of a new website neglected to ensure that all of our old URLs were migrated, and apart from losing Google ranking, we also now can’t identify what content a user might have seen on a given date for a given URL.

In other words, our web records are evaporating. It’s not your EDRMS that’s failing; it’s that all of these web systems exist outside the EDRMS, and compliance needs are seen as a secondary (unimportant?) requirement for replacement systems. The practical need to deliver services now is overwhelming the old centralised compliance needs.

The “Review of Social Media and Defence” report in 2011 by George Patterson Y&R is a good example of the sorts of problems agencies face:

“Given the dynamic nature of social media communications and the collaborative approach to the creation of user generated content, Defence will need to take particular care to ensure that such content is properly identified as a Commonwealth record as and when it is created. An accurate and authentic copy of such content will need to be captured and saved as a record so as to ensure that obligations under the relevant auditing, recordkeeping and disclosure legislation can be met. This is likely to require the development of a specific Defence social media records policy that provides guidance for each particular social media channel to be used by Defence during Professional Use.”
Review of Social Media and Defence, p.102

“The simplest interpretation of international record-keeping policy is that all outgoing communication should be housed on an official website that provides both a credible source for the community and a method of archiving content. The content can then be shared easily into social media, and important or significant conversations can be selected for archiving.”
Review of Social Media and Defence, p.124

“Because the National Archives of Australia (NAA) considers social media to simply be channels in which Commonwealth records can be shared, existing record management and archiving protocols need to be followed. The challenge lies in identifying commonwealth records worthy of archiving but also in the resourcing and processes required to ensure compliance. The government’s response to the Government 2.0 Taskforce (p. 15) states explicitly that the Archives will produce guidance on what constitutes a Commonwealth record in the context of social media. The NAA should be consulted to provide greater clarification for DEOC.”
Review of Social Media and Defence, p.157

Rebecca Stoks produced an academic paper in October 2012 that summarised a survey of actual recordkeeping practices for social media records amongst Australian government agencies: mostly state (33), with some local (20) and federal (9) agencies. Her summary was damning:

“The transient nature of social media opposes traditional recordkeeping methods; consequently, most government agencies are not meeting their legal obligation to keep records.”
Taming the Wild West: Capturing Public Records Created on Social Media Websites, p.8

“In this study, only a minority of government agencies were found to be capturing social media records. Most of those capturing records were not very confident that they are meeting their legal obligations or that their methods are sustainable. Within the sample, the level of internal support, be it strong or lacking, was found to affect the degree to which social media records were being captured. Although well regarded as a resource, the guidance provided by PROs did not seem to have an impact on whether or how agencies were capturing records, with several respondents expressing a desire for more practical advice.”
Taming the Wild West: Capturing Public Records Created on Social Media Websites, p.48

What do we want to know about web records when we capture them?

  • URL
  • AGLS metadata (author, publish date/time, country, copyright, etc.)
  • Re-use (trackbacks, retweets, inbound links, ratings, likes, votes)
  • Outbound links, and their status (if they redirect, then to what URL? are they marked with link attributes such as nofollow?)
  • Linked resources (images, JavaScript, iframes, Flash files, video/audio) – not always useful, but at the very least it is worth bringing images into the content as embedded images
  • Conversations started by the record (comments, replies, threads in general)
  • Relative site-map location compared to other web records (requires the concept of a site, perhaps leverage Google site maps?)

Much of this comes from Atom/RSS feeds, but some of it requires post-capture processing.
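
To make that concrete, here is a minimal sketch in Python of pulling a few of the fields above out of an Atom feed using only the standard library. The sample feed, the function name capture_entries and the shape of the resulting record are all my own assumptions for illustration, not any particular product’s format. It only covers what the feed itself carries (URL, author, publish date, outbound links and whether they are marked nofollow); redirect status, re-use and conversations would still need the post-capture processing mentioned above.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 XML namespace

# Hypothetical sample feed, made up for this illustration.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Agency News</title>
  <entry>
    <title>New report released</title>
    <link rel="alternate" href="http://example.org/news/1"/>
    <author><name>Media Unit</name></author>
    <published>2014-10-22T09:00:00Z</published>
    <content type="html">&lt;p&gt;See &lt;a href="http://example.org/report" rel="nofollow"&gt;the report&lt;/a&gt;&lt;/p&gt;</content>
  </entry>
</feed>
"""


class LinkCollector(HTMLParser):
    """Collects the href and nofollow flag of every <a> tag in entry content."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            self.links.append({"href": attrs["href"], "nofollow": "nofollow" in rel})


def capture_entries(feed_xml):
    """Return one dict per Atom entry with the fields the feed itself provides."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.iter(ATOM + "entry"):
        # The entry's own URL is the link with rel="alternate" (the default rel).
        url = None
        for link in entry.findall(ATOM + "link"):
            if link.get("rel", "alternate") == "alternate":
                url = link.get("href")
                break
        collector = LinkCollector()
        collector.feed(entry.findtext(ATOM + "content", default=""))
        records.append({
            "url": url,
            "title": entry.findtext(ATOM + "title", default=""),
            "author": entry.findtext(ATOM + "author/" + ATOM + "name", default=""),
            # Fall back to <updated> when the feed omits <published>.
            "published": entry.findtext(ATOM + "published")
                         or entry.findtext(ATOM + "updated"),
            "outbound_links": collector.links,
        })
    return records


records = capture_entries(SAMPLE_FEED)
```

Each record is deliberately a flat dictionary, so it could be serialised to whatever storage format you settle on. Note what is missing: nothing in the feed tells you where a short URL pointed on the day of capture, which is exactly the sort of thing that has to be resolved and stored at capture time.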

How do we want to see web records that we capture?

  • As an HTML page, even though stored as XML.
  • As a PDF, even though originally seen as an HTML page and stored as XML.
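
As a rough illustration of the first of those views, the sketch below renders a record stored as XML into a simple HTML page, again in Python with only the standard library. The record layout (a record element with url, title, author, published and body children) is purely hypothetical; a real capture store would have its own schema. Producing the PDF view would need an extra conversion step that is not shown here.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stored record; the element names are assumptions for this sketch.
SAMPLE_RECORD = """
<record>
  <url>http://example.org/news/1</url>
  <title>Budget &amp; Estimates</title>
  <author>Media Unit</author>
  <published>2014-10-22T09:00:00Z</published>
  <body>The report is now available.</body>
</record>
"""


def render_record_html(record_xml):
    """Render one stored XML record as a minimal standalone HTML page."""
    rec = ET.fromstring(record_xml)

    def field(name):
        # Escape every value so stored text cannot break the generated markup.
        return escape(rec.findtext(name, default=""))

    url = field("url")
    title = field("title")
    author = field("author")
    published = field("published")
    body = field("body")
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        '<head><meta charset="utf-8"><title>' + title + "</title></head>",
        "<body>",
        "<h1>" + title + "</h1>",
        '<p>Captured from <a href="' + url + '">' + url + "</a>, "
        "published " + published + " by " + author + ".</p>",
        "<pre>" + body + "</pre>",
        "</body>",
        "</html>",
    ])


page = render_record_html(SAMPLE_RECORD)
```

The page shows the captured metadata and text only; it makes no attempt to reproduce the original site design.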

Of course, these views only get us 80% of what we need: there will always be the missing context of what the page design looked like when that content was displayed (and what other content was dynamically displayed alongside it). With social media there is also the context of any responses, retweets, likes, shares or trackbacks to consider.

Do we organise web records by the site they belong to, the Atom/RSS feed they come from, or by some other more definite measure?

I don’t know anyone who has all the answers to those questions; I’m not even sure I know that many people who care about all those questions! However, I do know that without those answers there are essential government records that are literally evaporating every minute of the day, never to be seen or known about again. They may not be important now, they may not ever be important, but our lack of care with them is likely to be lamented by future generations seeking to understand what motivated, inspired and drove us into action (or not).
