Wednesday, 31 October 2012

An eResearch journey to Antarctica and beyond


I’ve decided to move to Antarctica. Yep, it’s true. I know it’s cold and there will only be penguins and scientists to talk to, but I’m going. Dave Connell from the Australian Antarctic Division, which is based in Tasmania, has convinced me that Antarctica is Nirvana for data management. At his fascinating talk at the eResearch Australasia conference in Sydney, he said that the Antarctic Treaty demands ‘free and open access to observations and results’. This means that they have more than just carrots to get research data plans and collections from scientists; they also have a mandate, and it has teeth. Even better, NASA runs the metadata tools used by the Australian Antarctic Division. In the near future they will be minting Digital Object Identifiers (DOIs) for their data collections, and they are involved with the Thomson Reuters Citation Index. So I’m going. Penguins (and scientists) are cool. Literally.

One project I’ll be keeping tabs on, having found out about it at the eResearch conference, is the ODIN project. While you may know Odin as the legendary ruler of Asgard, the ODIN project is something else entirely. The ORCID and DataCite Interoperability Network (ODIN) is a two-year project that will ‘build on the ORCID and DataCite initiatives to uniquely identify scientists and data sets and connect this information across multiple services and infrastructures for scholarly communication’. CERN, the British Library, ORCID, DataCite, Dryad, arXiv and the Australian National Data Service are all ODIN partners, and the project is funded by the European Union. What’s exciting about this project is that it links the leading international initiative for unique and persistent identification of researchers (ORCID) with the force leading an international culture of research data citation using DOIs (DataCite). This has huge potential to increase the ‘carrots’ for sharing data, such as citations, because ‘it will address some of the critical open questions in the area: Referencing a data object; Tracking of use and re-use; Links between a data object, subsets, articles, rights statements and every person involved in its life-cycle’.

At the panel discussion on the ‘Australian eResearch Forum’, Professor Tom Cochrane suggested a university might ask itself some questions to determine where it is up to in the eResearch space:
·      Is eResearch referenced in university plans?
·      Is there a Director of eResearch?
·      Is there an eResearch support unit?
·      Are research management, postgrad training, technology and the library involved?
·      Are regional, national and transnational collaborations identified and underway?

He suggested that Australia has invested heavily in eResearch and much has been achieved: national capability and national coherence in particular. But there’s not such a big tick for national sustainability, and that’s something we need to work on.

As the conference wrapped up yesterday, I reflected that in some areas we seem to have come quite a long way in a short space of time, but in other areas we seem to be progressing very slowly indeed. I hope that next year’s conference in Brisbane will show improved progress, new ideas, new innovations and new ways of collaborating. Of course, I’ll have to travel all the way from Antarctica for that.

Monday, 29 October 2012

The importance of community at eResearch 2012

It was a busy and exciting day yesterday at eResearch Australasia in Sydney. Under the theme ‘emPower eResearch’, the conference brings together eResearch practitioners, researchers, business providers and educators to discuss the latest and greatest innovations in information and communication technology for enhancing research management and analysis. Racing from session to session in a very tightly packed programme, I did manage to spend some time catching up with new and familiar faces and hanging out during the breaks at the Griffith booth, which has a fabulous red couch and free chocolates!

Reflecting on yesterday, I noticed that the discussion has matured since I first attended this conference three years ago. For those of us who have carried out projects funded by the Australian National Data Service (ANDS), and there are 250 such projects at Australian research institutions, it has now been four years since ANDS began. While it’s easy to be critical, to my mind a great deal has been achieved during that time. At Griffith we’ve had some true successes, such as building the Griffith Research Hub, which started with funding from ANDS and then grew with internal funding. We’re now at the point where we can build on those successes to improve existing infrastructure, build new services and aim to significantly increase the use of research infrastructure to support our hard-working researchers. As past guinea pigs, we can share our experiences with those closer to the start of their eResearch journey, which could help speed them on their way. It struck me yesterday that we’re never really going to finish our eResearch journey because, as Dr Clifford said in his keynote speech on EarthCube: ‘we are trying to stay on top of the galloping pony of technology’.

A theme that ran through every session I attended yesterday was the importance of community. Dr Clifford talked about the way in which the National Science Foundation (NSF) has fostered synergy within and across communities in the geosciences that had never existed before. He said that one of their end goals was to help reduce the time geoscientists spend gathering data and increase the time they spend analysing it. Other talks I attended focussed on engaging with discipline-specific research communities, such as neonatal specialists, crystallographers and hydrologists, to build innovative technologies that help researchers manage, access and analyse their data. Some terrific tools have come out of this work, particularly for visualisation and animation. Finally, at the session on sustainable software development, I participated in a focus group on building communities. The point was to ensure that software produced during a project does not die when the project funds run out. Instead, its development is sustained by an active and caring community of users. This approach is critical given the project-based nature of eResearch funding and the conclusion of ANDS in the middle of next year.

Thursday, 12 July 2012

The sound of conversation (from 40 different countries)

Yesterday was the official close of the conference, as we move into workshops on Friday. First up, I gave my Pecha Kucha (PK) talk on the Party Infrastructure Project that I was involved with at the National Library of Australia. I gave this presentation as an individual, as I no longer work at the NLA and it doesn’t reflect my work at Griffith University. For those not familiar with it, Pecha Kucha comes from the Japanese and refers to ‘the sound of conversation’. The format is 20 slides at 20 seconds per slide, which makes for a presentation of 6 minutes and 40 seconds. The slides advance automatically, so the presenter has no choice but to keep ploughing on, even if they haven’t finished talking about the previous slide when the next one comes up. At OR2012 this is a Fringe event with several PK sessions, each very well attended. The format gives the audience a flavour of the topic, which they can find out more about afterwards should they choose, and frankly, it’s a bit of an intellectual blood sport. There’s a sort of perverse enjoyment in seeing the fear in a presenter’s eyes as they race through slides at breakneck speed. As that presenter yesterday, I admit to being terribly fearful and to speaking awfully quickly. I lagged by half a sentence or so on a couple of slides but kept on time for most of them. There’s an art, which I didn’t quite get right, to preparing slides with more pictures and very few words, and you essentially have to script what you say: only three sentences or so per slide. If you stumble over words or blither on, precious time is lost. I found it more difficult than the full paper talk I gave yesterday on Griffith’s data evolution journey, particularly as the party infrastructure involves some very complex concepts that you simply can’t explain satisfactorily at 20 seconds per slide. I have total admiration for those who speak English as a second language and rose to the challenge of presenting a PK!

We gathered in the George Square lecture theatre at the University of Edinburgh to hear the official conference close before moving into workshops over the next day and a half. 460 people from 40 countries registered for OR2012. Of these, I counted six Australians and one New Zealander, though the delegate list may reveal some I didn’t catch up with in person. Queensland had the greatest representation, with QUT, UQ and Griffith all present. The Australasians had a small but surprisingly loud presence, as each of us gave a talk, PK or workshop. The charismatic Peter Sefton from the University of Western Sydney was on the conference committee and chaired the Developers Challenge. The challenge, sponsored by DevCSI, was to “show us something new and cool in the world of open repositories”. It had its best ever response this year, with 28 ideas. The winner was Patrick McSweeney with his ‘Data Engine’ idea, and the runners-up came up with a great idea about using mobile devices in the field to upload audio and video files, with transcriptions, into a repository. You can read about the Developers Challenge winners here.

At the closing plenary we saw a nice little Wordle based on the conference tweets. The word ‘data’ was very prominent, as was (of course) ‘repositories’, which showed that data is now mainstream, whereas at past conferences it was more of a side issue. Reflecting the theme of ‘Open Services for Open Content: Local In for Global Out’, the discussion highlights were summarised as: recognition of the role of registries; identifiers (how to use, manage and economise); citation (sufficiency, connectivity); and the Repository Fringe (its success). The folks from Prince Edward Island, off the east coast of Canada, will be hosting next year’s OR and gave a very enjoyable presentation on what we can expect, should we have the good fortune to attend. Personally, I thought this was an excellent conference. Edinburgh is a beautiful city and the conference had a wonderful and distinctly Scottish flavour. At the same time, it was a truly international conference, and that diversity of experiences and ideas made for rich discussions about repositories and related issues, particularly shared challenges such as name and data identifiers. I’ve had the opportunity to meet and talk with some very clever and talented people, and I’ll take home some good ideas and pertinent thoughts on difficult topics. Memories of this unique conference will linger for quite some time yet, and I hope I have been able to share some of it with you. Finally, it’s stopped raining!

Kilts, ceilidh and data identifiers

Last night I arrived with Paula Callan at the conference dinner, which was held at the magnificent National Museum of Scotland. At the entrance we were greeted by a man in a kilt playing the bagpipes, who took pains to keep concentrating on his playing as we posed for a few pictures. This set the scene for dinner in the main hall, surrounded by unique museum pieces on the walls and throughout the hall, including a row of weapons such as double-headed axes, an Egyptian statue and a huge lighthouse beacon. After dinner came the ceilidh, which, fortunately, I heard someone pronounce before I needed to say it aloud: ‘kay-lee’. It’s an evening of Scottish dancing and music. The first few dances were a whirl of kilts, with people bumping into each other like dodgems. It was certainly a memorable night in Edinburgh!

The best conference session I attended yesterday was on name and data identifiers. Simeon Warner from Cornell University opened the session with an update on the ORCID project and made a case for using ORCIDs in your repository. He made the point that other author identification systems have failed because they were not adopted widely enough. Certainly the National Library of Australia’s party infrastructure needs a critical mass to really take off in the research sector and realise its full potential. ORCID has 328 participant organisations. In the ORCID model, researchers can self-claim and institutions can also register. There will be two tiers of API: a Tier 1 API available to everyone and a Tier 2 API available to members only. There is a development site at http://dev.orcid.org/. My thoughts arising from the session:

·         Publisher buy-in is critical for a names project like this. If publishers get on board, people will certainly use it because the benefits will be obvious.

·         Simeon said that organisations will be prompted to resolve duplicates. This happens in workflows where the author records you provide as an organisation are automatically matched (or an attempt is made to match them) against author records already in ORCID. But he said they had not yet worked out how organisations would resolve the duplicates and that they were working through this with some organisations. This is also critical. No one wants to hand-match records, but there is only so much automatic matching can achieve before a person needs to review the records and disambiguate them. Hand-matching costs time and money and may require training.

·         It’s unclear how ORCID will relate to the wide range of national name identifier schemes out there, including the one we have in Australia.

Amanda Hill from the University of Manchester then spoke about the UK Names Project, run by MIMAS and the British Library with funding from JISC. The project has been in a pilot phase, and its ultimate aim, should it secure further funding, is to provide a high-quality set of persistent unique identifiers for UK researchers and research institutions. However, ORCID may occupy this space and render national name identifier schemes, including the one we have in Australia, somewhat superfluous. We’ll have to wait and see on that one. If ORCID works, it will be fantastic. Two other interesting points from Amanda’s talk:

·         They ran a survey asking if people would be willing to pay for additional name related services (disambiguation) and about a quarter said that they would.

·         They developed an API that allows flexible searching of names data, with a plugin for EPrints. It allows repository users to choose from a list of name identities and to create a new record if none exists [but I did not see identifiers as a match point].

The final talk in this session was by Ryan Scherle from Dryad in the USA on ‘creating citeable data identifiers’. As a ‘DOI geek’ myself, I was keen to hear what he had to say. While he presented a somewhat controversial list of ‘principles of citable identifiers’, he made some excellent points. He said you should use DOIs because they are familiar to scientists: when you use a DOI, a certain amount of weight is assigned to your paper, and DOIs are supported by many tools and services. But most repositories don’t support DOIs. This echoes my own feeling that the value of DOIs for research data collections, as distinct from other persistent identifiers, lies in their political clout. They are standard in the publishing world, researchers understand them, they add value in researchers’ eyes, and they allow for citation and citation tracking, which researchers value highly. I spoke to Ryan afterwards and he described the workflow: a scientist with an article about to be published uploads their data to Dryad, which gives them a DOI for the dataset, and that DOI is then included in the published article. Tracking data citation is still a manual process for them (sigh), but it was good to see that he could show some statistics on it. Check out the Dryad data repository at http://datadryad.org/

Another good session I attended yesterday was on shared repository services and infrastructure. I’d like to find out more about the ‘Collabratorium Digitus Humanitas’ project between a few institutions in North America. The New York Public Library (NYPL) is part of this, and they are reconfiguring their repository to take in a lot more video and audio from their collections, looking at online streaming platforms, a nice front end and so on. This is relevant to Griffith and reflects a theme at the conference: institutions are looking to increase their multimedia content but are unsure which platforms will suit it. There’s a session on multimedia in Islandora on Friday that may offer some insight.

I gave my talk on Griffith’s research data journey as part of a panel on research data infrastructure. The other speakers were Anthony Beitz from Monash University and Sally Rumsey from the University of Oxford. I think the talk was well received, as was the session in general, as we each offered different insights into managing research data at our respective institutions. There were some excellent questions, including one from Simon Hodson from JISC about the availability of statistics to show the success of research data infrastructure in facilitating discovery and re-use.

I’ve ticked one more off my to-do list: found out what a ‘sporran’ is (a small bag worn on a kilt).

Wednesday, 11 July 2012

The conference begins! Haggis, Gaelic and poster minute madness

I’m going to start with the end of day two of the conference and work backwards from there. At the end of the day, there was a drinks reception and poster session at the Playfair Library, University of Edinburgh. The library is in the old part of the University. It’s an incredible building, built in the 1820s, with an ornately decorated, high vaulted ceiling and rows of books silently looked upon by busts of University professors long past. You can take a look at it here. It was a wonderful venue to view the 60+ posters presented at this conference. And yes, they served haggis, as balls on a stick that you dipped into a sauce.

In the morning I participated in a workshop on the Confederation of Open Access Repositories (COAR). COAR was created in October 2009 by partners of the DRIVER project. It’s a not-for-profit organisation with 90 members, none of which are in Australia. Its purpose is to increase the visibility of open access repositories, and it has three working groups to do this, each with a different focus: content, interoperability and training/support. The workshop was about the interoperability roadmap. There were certainly a lot of useful references in the document the interoperability working group has produced, but it’s unclear to me how COAR members will actually use the information. Perhaps the training group is of greatest value to those of us in Australia, because the next round of training is free, online and open to non-members. There is a list of COAR’s upcoming training here. Membership costs some 2,500 euros, so perhaps that’s another reason why there are no Australian members yet.

After lunch the conference officially opened with a plenary from Cameron Neylon (PLoS). The PLoS website has an article on the Finch Report, which is worth a read, as it’s such a hot topic in the UK. Cameron highlighted the qualitative leap forward made possible by researchers collaborating with internet tools. Complex problems that even the leading expert in a field thinks are too difficult may be solved through collaboration over the internet, even with its most rudimentary tools. Following his talk, we had an hour of poster minute madness. This is where poster presenters have one minute to pitch to the audience why they should visit their poster at the drinks reception later in the evening. If you don’t finish by the end of the minute, the chair blows a whistle in your ear. It was a real hoot, and it gave a flavour of the huge variety of repository initiatives around the world in a short space of time. The session showed that this is truly an international conference, with a huge number of countries represented among delegates and poster presenters (including our very own Paula Callan). A few poster highlights (there were 68 posters!): ‘Can LinkedIn enhance access to open repositories?’; History data management plan at the University of Hull; OpenAIRE in Europe linking articles with data; Creating, attracting and depositing non-traditional content.

After this I went to the session on research data management and infrastructure. The University of Exeter is an interesting case, as they have three repositories and are looking to merge all three using DSpace, Oracle and SWORD2. The talk was mostly about their postgrad initiatives, but I will try to find the speaker over the course of the conference and find out more about their repository project. I enjoyed the talk by Leslie Johnson from the Library of Congress. It was an honest and revealing look at how the LoC is now interacting with faculty and researchers, and how this has inspired new ways of delivering data, including a transition to a self-serve model. This is very different from the old model, in which researchers needed to come physically into the library and interact with a librarian. She referred to the Digging into Data Challenge.

The day ended with the drinks and poster reception, and then I had dinner out with Jackie Wickham and others from the RSP and the University of Nottingham. I’ve crossed a few more things off my ‘to do’ list: hear someone speak Gaelic (at the drinks reception) and listen to ‘Scotland the Brave’ played on the bagpipes (actually this is rather unavoidable if you are anywhere near the Royal Mile, a road over 1,000 years old that leads from the Castle to Holyrood Palace). As an aside, I discovered that the word ‘Kirk’ does not refer to ‘James T’ (as in Star Trek, for those less geeky) but is the Scottish word for ‘church’. And yes, it’s still raining.

Monday, 9 July 2012

Green with envy (and just a little damp)

This is my second visit to Edinburgh, and the real character and old-world charm of ‘Auld Reekie’ is captivating despite the constant drizzle. I walked to OR2012 from my apartment through the old town, which takes you past the magnificent Edinburgh Castle and up winding cobblestone streets in an area once known for public hangings and witch trials (now home to restaurants, cafes and pubs). Then it was past the museum and on to the University of Edinburgh, once home to two of my favourite authors: Sir Arthur Conan Doyle and Robert Louis Stevenson. On the way to the conference buildings, I was fortunate to pass Doyle’s old home at the university, marked only by a simple commemorative plaque. A few doors down is the former residence of Sir Walter Scott.

Most of the conference takes place in Appleton Tower, which houses the offices of the Digital Curation Centre (DCC). Anthony Beitz from Monash University arranged for us to meet with staff from the DCC, and Peter Sefton from UWS also joined in. With a backdrop of magnificent views of Edinburgh, we exchanged experiences, thoughts and ideas. The DCC has produced some terrific guides on data management planning and data citation that we can use back home. They also do some great work supporting institutions with their research data initiatives, despite some interesting challenges in this area.

In the afternoon I attended a workshop on ‘Building a National Network’ chaired by the awesome Jackie Wickham from the Repositories Support Project (RSP) at the University of Nottingham. It confirmed for me that the UK has done a great job of establishing a support network for repository development and repository staff, one that made me green with envy. The RSP, for example, offers or facilitates training, conferences, residential schools, webinars, support tools, site visits, a help desk, a blog and a buddy scheme (among other things!). UKCoRR offers targeted support for repository staff and, most importantly, JISC provides support through its infrastructure-building programmes and through project funding. UK repositories face similar challenges to ours: increasing full-text content, advocating for open access, building systems around academics, getting critical support from senior management and increasing the range of content types in their repositories, specifically creative arts and research data.

Some thoughts arising from the workshop:
·       “Mandates with no teeth.” Just because deposit is policy doesn’t mean academics will comply with it.
·         When a metadata record is created in a repository, an email is automatically generated and sent to the author asking if they can supply the full text of the article. Could we do the same for research data?
·         The University of Glasgow has a showcase repository, having carried out a number of initiatives to drive deposit, such as presentations, flyers and reminders to academics about compliance with funding requirements. Could we use any of this in our Open Access Week?
·         How can we strengthen our repository network in Australia/NZ? We can definitely participate in the RSP webinars, though the time zones may not work in our favour.
·         The Finch Report is a hot topic here. I haven’t read much about it yet, but I’ve heard that while it’s good the UK government is recognising the importance of open access, the report takes repositories out of the picture.
·         Balviar Notay from JISC mentioned Repnet, which pulls different services together. There is an innovation zone for injecting new ideas and working with developers to try them out. There are rapid innovation projects and an oversight group with international membership.
·         You’ve heard of Sherpa Romeo. Well, now there’s Juliet. While he summarises publishers’ policies on self-deposit, she lists funders’ requirements for open access.

Tomorrow will be day two. In the meantime, I’m working on my ‘to do’ list. So far I have crossed off ‘visit the castle, try a Jammie Dodger (jam biscuit) and buy something tartan’.

Thursday, 5 July 2012

Headed to Edinburgh for OR2012



In just under 7 hours I'll be stepping on a plane. 25 hours of flying (and countless more sitting around at airports in transit, desperate for a shower) and 16,315 kilometres later, I'll touch down in Edinburgh. And it'll still be the same day I left Brisbane! After a short recovery, I'll be going to the Open Repositories conference (OR2012) at the University of Edinburgh. There are over 400 people registered and the agenda looks terrific. I'm looking forward to the 'buzz' as people who work with each other across distances meet up in person. On Monday morning I'm heading to the Digital Curation Centre to meet up with some DCCers, and in the afternoon I'll be going to a workshop chaired by Jackie Wickham from the Repositories Support Project. I'm a bit of a fan of her work and I'm looking forward to catching up with her and Dominic Tate from UKCoRR over lunch. On Wednesday I'm presenting on Griffith's Data Evolution Journey (the Research Hub), and then on Thursday I give a Pecha Kucha on the Party Infrastructure Project. The latter was pretty tricky to nail down in such a short presentation time; hopefully it will go smoothly. This is a new blog for me, a break from my ANDS Gold Standard Project blog, and over the week of the conference I'll be using it to share the ideas and excitement that OR2012 generates. Scotland, here I come!
- Natasha