Thursday, 12 July 2012

Kilts, ceilidh and data identifiers

Last night I arrived with Paula Callan at the conference dinner, which was held at the magnificent National Museum of Scotland. At the entrance we were greeted by a bagpiper in a kilt, who was at pains to keep concentrating on his playing while we posed for a few pictures. This set the scene for dinner in the main hall, surrounded by unique museum pieces on the walls and throughout the hall, including a row of weapons such as double-headed axes, an Egyptian statue and a huge lighthouse beacon. After dinner came the ‘ceilidh’, which, fortunately, I heard someone pronounce (‘kay-lay’) before I needed to say it aloud. It’s an evening of Scottish dancing and music. A whirl of kilts and people bumping into each other like dodgems was the order of the day for the first few dances. It was certainly a memorable night in Edinburgh!

The best conference session I attended yesterday was on name and data identifiers. Simeon Warner from Cornell University opened the session with an update on the ORCID project and made the case for using ORCIDs in your repository. He made the point that other author identification systems have failed because they were not adopted widely enough. Certainly the National Library of Australia’s Party Infrastructure needs a critical mass in the research sector to really take off and realise its full potential. ORCID has 328 participant organisations. Researchers can self-claim in the ORCID model, and institutions can also register. There will be two tiers of API: a Tier 1 API available to everyone and a Tier 2 API available to members only. There is also a development site.

My thoughts arising from the session:

·         Publisher buy-in is critical for a names project like this. If ORCID gets that buy-in, people will certainly use it because the benefits will be obvious.

·         Simeon said that organisations will be prompted to resolve duplicates. This happens in workflows where the author records an organisation provides are automatically matched (or at least matching is attempted) against author records already in ORCID. However, he said ORCID had not yet worked out how organisations would resolve the duplicates, and was working with some organisations to sort that out. This is also critical. No one wants to match records by hand, but there is only so much automatic matching can achieve before a person needs to review the records and disambiguate them. Hand matching costs time and money and may require training.

·         It’s unclear how ORCID will relate to the wide range of national name identifier schemes out there, including the one we have in Australia.
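To make the duplicate problem more concrete, here is a minimal Python sketch of the kind of triage Simeon described: incoming author names are compared against existing records, near-certain matches are accepted automatically, borderline ones are set aside for a person to review, and the rest become new records. This is not ORCID’s algorithm (the session didn’t go into that); the name normalisation, the use of the standard library’s `difflib.SequenceMatcher`, the function names and the two thresholds are all hypothetical choices for illustration.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical thresholds, chosen only for illustration (not ORCID's values).
AUTO_MATCH = 0.95    # at or above this, accept the match without human review
NEEDS_REVIEW = 0.70  # from here up to AUTO_MATCH, queue for a person to check


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def similarity(a: str, b: str) -> float:
    """Score two names between 0.0 and 1.0 after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def triage(incoming: list[str], existing: list[str]) -> dict[str, list]:
    """Sort incoming names into auto-matched, needs-review, or new records."""
    result: dict[str, list] = {"matched": [], "review": [], "new": []}
    for name in incoming:
        # Closest existing record, or None if there are no records yet.
        best = max(existing, key=lambda rec: similarity(name, rec), default=None)
        score = similarity(name, best) if best is not None else 0.0
        if score >= AUTO_MATCH:
            result["matched"].append((name, best))
        elif score >= NEEDS_REVIEW:
            result["review"].append((name, best))
        else:
            result["new"].append(name)
    return result


if __name__ == "__main__":
    existing = ["Warner, Simeon", "Hill, Amanda"]
    incoming = ["Warner, Simeon", "Warner, S.", "Scherle, Ryan"]
    print(triage(incoming, existing))
```

With these sample names, the exact match is accepted, "Warner, S." lands in the review queue and "Scherle, Ryan" becomes a new record. In practice you would want to compare more than the name string (affiliations or existing identifiers, say); the point of the sketch is only that some band of uncertain scores always ends up on a person’s desk.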

Amanda Hill from the University of Manchester then spoke about the UK Names Project, run by MIMAS and the British Library with funding from JISC. The project has been in a pilot phase, and its ultimate aim, should it secure further funding, would be a high-quality set of persistent unique identifiers for UK researchers and research institutions. However, ORCID may occupy this space and render national name identifier schemes, including the one we have in Australia, somewhat superfluous. We’ll have to wait and see on that one. If ORCID works, it will be fantastic. Two other interesting points from Amanda’s talk:

·         They ran a survey asking whether people would be willing to pay for additional name-related services (disambiguation), and about a quarter said that they would.

·         They developed an API that allows flexible searching of names data, along with a plugin for EPrints that uses it. The plugin lets repository users choose from a list of name identities and create a new record if none exists [but I did not see identifiers used as a match point].

The final talk in this session was by Ryan Scherle from Dryad in the USA on ‘creating citeable data identifiers’. As a ‘DOI geek’ myself, I was keen to hear what he had to say. While he presented a somewhat controversial list of ‘principles of citable identifiers’, he made some excellent points. He said you should use DOIs because they are familiar to scientists. A DOI lends your paper a certain amount of weight, and DOIs are supported by many tools and services. But most repositories don’t support DOIs. This echoes my own feeling that the value of DOIs for research data collections (as distinct from other persistent identifiers) lies in their political clout. They are standard in the publishing world, researchers understand them, they add value in researchers’ eyes, and they enable citation and citation tracking, which researchers value highly. I spoke to Ryan afterwards and he described the workflow: a scientist with an article about to be published uploads their data to Dryad, Dryad gives them a DOI for the dataset, and that DOI is then included in the published article. Tracking data citation is still a manual process for them (sigh), but it was good that he could show some statistics on it. Check out the Dryad data repository.

Another good session I attended yesterday was on shared repository services and infrastructure. I’d like to find out more about the ‘Collabratorium Digitus Humanitas’ project between a few institutions in North America. The NYPL are part of this, and they are reconfiguring their repository to take in a lot more video and audio from their collections, looking at online streaming platforms, a nice front end and so on. This is relevant to Griffith, and reflects a theme at the conference: institutions are looking to increase their multimedia content but are unsure which platforms will suit it. There’s a session on multimedia in Islandora on Friday; that may offer some insight.

I gave my talk on Griffith’s research data journey as part of a panel on research data infrastructure. The other speakers were Anthony Beitz from Monash University and Sally Rumsey from the University of Oxford. I think the talk was well received, as was the session in general, since we each offered different insights into managing research data at our respective institutions. There were some excellent questions, including one from Simon Hodson from JISC about the availability of statistics showing how successfully research data infrastructure facilitates discovery and re-use.

I’ve ticked one more item off my to-do list: finding out what a ‘sporran’ is (a small bag worn with a kilt).
